Intro
At my company, we have a web app that monitors the websites of investment firms for new investment deals. The web app runs on Python Flask with Gunicorn and queues scraping jobs to Celery workers. Lately, we ran into an issue where a database query returned empty results when it should not have. Since the issue occurred sporadically, I suspected our multithreading code. The ORM we use, SQLAlchemy, warns that its Session is not thread-safe and should not be accessed concurrently from multiple threads. I dug into our code and found that we passed Website entity objects (which are attached to the main thread's session) from the main thread to each of the 16 child threads. Since each object in a child thread is still attached to the main thread's session, running queries on it results in concurrent access to the main thread's session. Bingo, that must be the bug!
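To make the failure mode concrete, here is a minimal sketch of the problematic pattern. The model, engine, and scraping function are illustrative stand-ins, not our actual code:

```python
from concurrent.futures import ThreadPoolExecutor

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Website(Base):
    __tablename__ = "websites"
    id = Column(Integer, primary_key=True)
    url = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)  # the main thread's session

def scrape(website):
    # Runs in a child thread, but `website` is still attached to the main
    # thread's session; attribute access can trigger a lazy load through
    # that session -- exactly the concurrent access SQLAlchemy warns about.
    return website.url

websites = session.query(Website).all()  # loaded via the main thread's session
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(scrape, websites))
```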
Quick fix
I fixed the issue by having the main thread pass the id of each Website entity to the child threads instead of the object itself. The child threads then use their own thread-local session to query the Website object by id, so no session is shared between threads. We deployed the fix to production and, 3 days later, to my surprise, the same bug returned!
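Continuing the illustrative sketch above, the fix looks roughly like this: pass only the primary key across the thread boundary and re-query inside a thread-local session (scoped_session is one way to get one; our real code differs in detail):

```python
from sqlalchemy.orm import scoped_session, sessionmaker

# One session per thread; scoped_session uses thread-local storage by default.
SessionLocal = scoped_session(sessionmaker(bind=engine))

def scrape(website_id):
    db = SessionLocal()  # this thread's own session
    try:
        # Re-query by primary key (Session.get is SQLAlchemy 1.4+;
        # Query.get on older versions).
        website = db.get(Website, website_id)
        return website.url
    finally:
        SessionLocal.remove()  # close and discard this thread's session

website_ids = [w.id for w in session.query(Website).all()]
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(scrape, website_ids))
```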
Revenge of the bug
It turned out the bug was due to the database connection pool being shared by multiple Celery worker processes. SQLAlchemy warns that database connections should not travel across process boundaries: in multiprocess situations, Engine.dispose() should be called by each child process after forking, so that it creates a new pool of database connections. This bug surprised me because both Celery and Gunicorn use the pre-fork worker model, yet we had never seen this issue with Gunicorn. What's the difference between Celery's pre-fork and Gunicorn's? Since both Celery and Gunicorn are open-source, I delved into their source code to study the difference.
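Outside of any framework, the recommended pattern looks roughly like this (the database URL and worker function are illustrative):

```python
import multiprocessing

from sqlalchemy import create_engine

engine = create_engine("postgresql://scraper@localhost/scraper")

def worker():
    # In the child process: drop the pool inherited from the parent so this
    # process lazily opens its own connections instead of reusing shared ones.
    engine.dispose()
    with engine.connect() as conn:
        ...  # run queries on connections owned by this process only

if __name__ == "__main__":
    for _ in range(4):
        multiprocessing.Process(target=worker).start()
```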
Gunicorn Source
When Gunicorn starts, it launches an arbiter process. The arbiter maintains the worker processes, launching or killing them as needed. By default, the arbiter forks itself to create new worker processes, and each worker process then loads the WSGI app. Since each worker loads the WSGI app after forking, the workers do not share any app memory. Interestingly, there is an option called preload_app that makes the arbiter load the WSGI app before forking. This speeds up server boot times, since the import of the web app happens once before forking and each worker process does not need to re-import it.
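For reference, enabling this takes one line in a Gunicorn config file. Everything here besides the preload_app setting itself is an illustrative placeholder:

```python
# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py app:app
bind = "0.0.0.0:8000"
workers = 4
preload_app = True  # load the WSGI app in the arbiter, then fork the workers
```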
Celery Source
Celery workers load the web app before forking because the worker needs the Celery object, which contains the configuration settings (e.g. message broker host and port, where to import tasks from, etc.). In our case, we initialized the SQLAlchemy engine, along with its connection pool, at import time. So when Celery forked the worker processes, all of them shared the same engine, which resulted in the strange behaviour.
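Here is a simplified sketch of how such a setup looks. The module layout, broker URL, and database URL are illustrative, not our actual configuration:

```python
# tasks.py -- the engine (and its connection pool) is created at import time,
# i.e. in the parent process, before Celery's prefork pool forks the workers,
# so every worker process inherits the same pooled connections.
from celery import Celery
from sqlalchemy import create_engine

app = Celery("scraper", broker="redis://localhost:6379/0")

engine = create_engine("postgresql://scraper@localhost/scraper")  # runs before forking

@app.task
def scrape_website(website_id):
    # May check out a connection that was inherited from the parent process
    # and is simultaneously in use by a sibling worker.
    with engine.connect() as conn:
        ...
```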
3 Possible Solutions
1. Disable connection pooling in SQLAlchemy by using NullPool, so connections are not kept in memory (all three options are sketched in code after this list).
- Pros: No chance of sharing connections if connections are not kept in memory.
- Cons: Performance decreases, since each query requires a new connection to be opened and closed.
2. Listen to the worker_process_init signal in Celery to dispose of the engine after each worker process is initialized.
- Pros: The engine is disposed of on init, so each worker starts with a fresh connection pool.
- Cons: Does not guard against sharing of the connection pool with other multiprocessing libraries (e.g. Gunicorn with preload_app=True, or Python's own multiprocessing forks).
3. Add a register_after_fork hook to dispose of the engine after multiprocess forking.
- Pros: Engine.dispose() will run after each fork, no matter which framework or library is doing the forking.
- Cons: register_after_fork is an internal, undocumented method and may change without warning.
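For concreteness, here are rough sketches of the three options side by side. DB_URL and the surrounding scaffolding are illustrative placeholders; treat these as sketches rather than drop-in code:

```python
# Three alternative sketches -- pick one; DB_URL is an illustrative placeholder.
from sqlalchemy import create_engine

DB_URL = "postgresql://scraper@localhost/scraper"

# --- Option 1: disable pooling with NullPool ---
from sqlalchemy.pool import NullPool

engine = create_engine(DB_URL, poolclass=NullPool)  # no connections kept around

# --- Option 2: dispose on Celery's worker_process_init signal ---
from celery.signals import worker_process_init

engine = create_engine(DB_URL)

@worker_process_init.connect
def dispose_engine(**kwargs):
    # Runs in each worker process after the fork; drops the inherited pool
    # so the worker lazily opens fresh connections of its own.
    engine.dispose()

# --- Option 3 (what we chose): dispose after any fork ---
from multiprocessing.util import register_after_fork  # internal, undocumented API

engine = create_engine(DB_URL)
# In the child process, register_after_fork calls the given function with
# `engine`, regardless of which framework or library performed the fork.
register_after_fork(engine, lambda e: e.dispose())
```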
Conclusion: Not All Pre-fork Models Are The Same
In the end, we went with solution 3, since we want to ensure the engine is disposed of on every fork, whether it happens in a Celery worker, a Gunicorn worker, or our own forks. The lesson I've learned is that not all pre-fork worker models fork at the same time, and that I should always check the source code to verify.