Django group by

I have often been confused by the way that the Django ORM implements the equivalent of the “group by” statement of SQL. When I want to group some data I often think in a structure very similar to SQL. And because Django has a higher level abstraction I often forget how to string it together properly. In this post, I will describe how I understand the grouping in Django after experimenting a bit with the feature.

Values

Values() is the method that does the grouping of the data. Calling values() without a subsequent annotate() call just pulls out the values of the specified fields.

Calling annotate() after a values() call on a queryset groups that data according to the specified fields, such that each group has a unique combination of the fields. The annotation is then performed on the resulting groups.

Annotate

The annotate call defines what information to add to each group. Usually this is some kind of aggregation – Count, Sum, Avg, Min or Max. Each line returned by the values() call will be annotated with an aggregation of all the rows that shares that combination of fields. A Sum annotation will therefore be the sum of all rows that share some combination of the fields used in the values() call.

Order_by

An important quirk is the interaction between annotate() and order_by(). If a model has a default ordering, or the query has an order_by() part, this will be used when grouping data. q.values(‘name’).order_by(‘id’) will be treated as grouping on distinct (name, id) pairs, which might not be intended. To ensure that a distant order_by is not interfering with a grouping, you can call order_by() without arguments to clear any previous ordering.

I hope this cleared up how to group data and add annotations with the Django ORM. The official documentation is brilliant as always, so also check that out for more information on aggregation: https://docs.djangoproject.com/en/2.2/topics/db/aggregation/#values