On client project I have to do a search in a structure of tables and return
results including some child objects. A classical instance of the N+1 query
problem, if I was using an ORM. I decided not to use the ORM for the feature
because it will be one of the hottest paths of the application and I wanted
more control over the queries (rather complex logic with LIKE & sorting).
But the filtering/sorting is not the topic today, so I will leave it out in
the examples.
For illustration, let’s assume the following schema:
CREATE TABLE parents (
  parent_id INTEGER PRIMARY KEY,
  name VARCHAR
);
CREATE TABLE children (
  child_id INTEGER PRIMARY KEY,
  name VARCHAR,
  birthdate DATE,
  parent_id INTEGER,
  FOREIGN KEY (parent_id) REFERENCES parents
);Without any further thinking I would do a query like this and a bit of code that constructs parent objects with their respective children.
SELECT * FROM parents JOIN children USING (parent_id) ORDER BY parent_id; parent_id | name  | child_id |  name   | birthdate  
-----------+-------+----------+---------+------------
         1 | Jim   |        1 | Tamara  | 2017-02-01
         1 | Jim   |        3 | Tom     | 2005-10-01
         1 | Jim   |        5 | Tonja   | 2011-07-17
         2 | Jenny |        2 | Tim     | 2000-11-02
         2 | Jenny |        4 | Theresa | 2017-04-30Just before I started writing the logic to pull apart the result and put it into it’s object structure, I thought, “It would be nice to let the database put together all the children of one parent into an array.”
I already knew
array_agg
which aggregates all values into an array. After some reading I discovered
json_build_object
which takes a sequence of keys and values and creates a JSON object from it.
So my final query looked like this:
SELECT
    parent_id
  , p.name
  , array_agg(json_build_object(
      'child_id', c.child_id
    , 'name', c.name
    , 'birthdate', c.birthdate
  )) as children
FROM parents p
JOIN children c USING (parent_id)
GROUP BY 1, 2; parent_id | name  |                                                                                                             children                                                                                                              
-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
         1 | Jim   | {"{\"child_id\" : 1, \"name\" : \"Tamara\", \"birthdate\" : \"2017-02-01\"}","{\"child_id\" : 3, \"name\" : \"Tom\", \"birthdate\" : \"2005-10-01\"}","{\"child_id\" : 5, \"name\" : \"Tonja\", \"birthdate\" : \"2011-07-17\"}"}
         2 | Jenny | {"{\"child_id\" : 2, \"name\" : \"Tim\", \"birthdate\" : \"2000-11-02\"}","{\"child_id\" : 4, \"name\" : \"Theresa\", \"birthdate\" : \"2017-04-30\"}"}When the query is executed with sqlalchemy, children in the resulting rows
is already correctly typed as a list of dictionaries.
The explain analyze output for both queries (on a trivially small test set)
shows that the aggregated version is 50-100% slower (150-200μs vs. 250-350μs),
but I guess that will rarely be a real problem because - at least in my case
- the execution time of my query is dominated by filtering/sorting the parent rows.
 
If you would like to play with the example yourself, get the SQL file here.