Here I'll outline some attempts to optimize a slow view – the facilitator view – in UELC.
To find out where time was being spent, I added print('a'), print('b'),
etc. throughout the get function, and loaded the view while watching
the logs.
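A slightly more informative variant of the same idea is to print elapsed times rather than bare markers. This is a minimal sketch, not the code used in UELC: the timed helper, the timings list, and the "build columns" label are hypothetical names, and time.sleep stands in for the real work in the view.

```python
import time
from contextlib import contextmanager

timings = []  # (label, seconds) pairs, in the order the blocks finished


@contextmanager
def timed(label):
    """Record how long the enclosed block takes and print it to the log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.append((label, elapsed))
        print(f"{label}: {elapsed:.3f}s")


# Hypothetical usage; in the view this would wrap a section of the get function.
with timed("build columns"):
    time.sleep(0.01)  # stand-in for the real work
```

Wrapping each suspect section of the get function this way shows directly which one dominates, instead of having to judge the gaps between markers by eye.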
After reducing the query count by removing duplicate calls to
Pagetree's .block(), I noticed that this page loads four columns of
data, displayed as a column for each "group user" in the template.
Each iteration of the loop that calculates the columns takes about
half a second. Not really thinking things through, I figured: why not
calculate these concurrently, splitting the work across multiple
processes, potentially running on different CPUs? After reading about
Python's multiprocessing library, I refactored the for loop into the
function render_user_gates, and came up with this:

    import multiprocessing

    pool = multiprocessing.Pool(processes=4)
    # Pool.map passes one argument per call, so pack each call's arguments into a list
    args_list = [[u, hierarchy, section, hand, gateblocks] for u in cohort_users]
    user_sections = pool.map(render_user_gates, args_list)
    print('done', len(user_sections))

This doesn't work. I think it may even have messed up my database:
I got a "multiple hierarchies returned" error from Pagetree, so I had to
get a new database. Because my render_user_gates function makes
database queries, it's not an option to just disconnect from the database
at the beginning of this function.
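One likely explanation for this kind of failure is that worker processes forked from the parent inherit its open database connection and end up sharing it. A commonly used alternative to disconnecting is to have each worker open its own connection when it starts. Here is a minimal sketch of that pattern, using the standard library's sqlite3 in place of the Django ORM. The gates table and the names init_worker, count_gates, and count_all are hypothetical, made up for illustration; none of this is UELC code, and it has not been tried against Pagetree.

```python
import multiprocessing
import sqlite3

_conn = None  # per-process connection, set by init_worker


def init_worker(db_path):
    """Runs once in each worker process: open a connection owned by that process."""
    global _conn
    _conn = sqlite3.connect(db_path)


def count_gates(user_id):
    """Stand-in for render_user_gates: one query per user on the worker's own connection."""
    row = _conn.execute(
        "SELECT COUNT(*) FROM gates WHERE user_id = ?", (user_id,)
    ).fetchone()
    return user_id, row[0]


def count_all(db_path, user_ids, processes=4):
    """Map count_gates over user_ids in a pool whose workers each connect separately."""
    with multiprocessing.Pool(
        processes, initializer=init_worker, initargs=(db_path,)
    ) as pool:
        return pool.map(count_gates, user_ids)
```

The point of the initializer is that no connection object ever crosses a process boundary: the parent passes only the database path, and each worker builds its own connection from it.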
I asked on #django if anyone's using multiprocessing.Pool
with Django's ORM. Someone named "moldy" said probably not, because it's
a really strange thing to do. I guess I didn't even think about how
this would behave when deployed on a server, running through gunicorn or
something.
Moldy mentioned that this could be a use case for Celery. I know we use that on some other projects, like PMT. I'm reading more about Celery now. I don't want to put in the time to set everything up until I'm sure that it would work, and that it would actually improve performance in this view.