I really like to create SQL queries using SQLAlchemy’s explicit and declarative API. When using this instead of raw strings, roughly half of errors I tend to introduce are caught before even sending the query.
Here is an example:
A huge advantage over dealing with string queries is that you can create
different parts of the query on their own and combine them. In the example I
SELECT statements in a central place and allow to pass in parts of
the where clause.
Unfortunately, this pattern has a dangerous property: It will implicitly add a
CROSS JOIN when you pass a filter expression that contains a column from a
table that is not part of the select statement already.
On small tables this will just create an unexpected result set, but when the involved tables are large this query might well exhaust the DB server’s resources. The concrete problem can be fixed by adding a join to all the tables that should be allowed in the filters:
A more abstract problem is that we can create queries that will lead to unexpected results without noticing. A good but pretty expensive solution would be to validate the parts that are passed in. Depending on how variable the input can be, I would go for this solution. But in other cases it’s just myself using my query construction logic in new ways. So I prefer a cheaper solution that makes me aware of the problem.
Thankfully, SQLAlchemy’s events allow us to be notified when a query is about to be executed. We can create a listener that raises an exception when we try to run a problematic query.
It’s not perfect because it does not check subqueries or CTEs but it gives us a line of defense. I am also thinking of adding an assertion that checks for the problem to the tests that exercise the query construction logic.
Here is the complete example code: sqlalchemy_implicit_cross_join.py.