In database modelling, I often see that flags get introduced. Especially, as the the project gets older, important tables often accumulate more and more flags.
Of course there are cases, where flags do make sense. But as innocent as they might seem at first, they can be a serious anti-pattern, which is hard to get rid of after the fact.
Let's see why.
Rethink the Model
Flags often indicate that the domain model changed, but you are too busy (or lazy?) to refactor it accordingly. It might be better to split up the current model into multiple, more specific entities.
It gets especially apparent if you also experience some of the following symptoms:
- You have to remove constraints to make the schema work with your app logic.
- Some columns are mutually exclusive, depending on a flag.
- Certain columns are always
NULL
, depending on a flag. - Columns change their meaning depending on a flag.
If these effects occur, chances are high that the table does not represent the actual domain model any more. To avoid further technical debt, a refactoring is unavoidable.
A less problematic cause of such symptoms is, that the model would be better decomposed at its life cycle boundaries. This approach often helps to tighten the constraints and get rid of nullable columns.
Consider a blogging system which contains a table for blog posts with fields like "title", "content", "publication date", "preview URL" and of course, a flag is_published
. The blog starts out as a draft and might get published later. That's the life cycle boundary, which toggles the flag.
As long as the blog is a draft, the publication date is always NULL
and the preview URL not NULL
. But for published blog posts the publication date is never NULL
and the preview URL is always NULL
.
It can make sense to split the data model into "drafts" and "posts". Drafts would have no publication date, but a non-nullable column "preview URL". The posts table has a non-nullable column "publication date", but no "preview URL".
If the new entities reflect a deeper insight in the domain, it makes sense to also model them as classes and expose them throughout the application code.
If it's just a technical optimization, the different tables can be hidden in a repository and the rest of the application code does not need to know about it.
Invalid Combinations
If you are going to have multiple flags in a table, you can run into a situation, where there are logically invalid combinations.
Consider an "invoices" table, where an invoice can either be paid or overdue. With two flags, there are four possibilities:
- It's possible for an invoice to be unpaid and not overdue yet.
- An invoice can also be paid and hence not overdue.
- And it's possible, that an invoice was not paid and is now overdue.
- But it's logically invalid for an invoice to be paid and overdue at the same time.
overdue = FALSE | overdue = TRUE | |
paid = FALSE | ✅ | ✅ |
paid = TRUE | ✅ | ❌ |
Of course, this gets more complex to think about when there are even more flags involved. Try to avoid situations like this.
If you are certain that you have to introduce flags which can produce invalid states, add constraints to forbid them.
ALTER TABLE invoices
ADD CONSTRAINT check_invoices_paid_overdue_exclusive CHECK (
NOT (paid = TRUE AND overdue = TRUE)
);
Even if the application should enforce the constraints, they serve as documentation and prohibit invalid manual changes.
I have seen cases in the wild with more than four flags, where this was an issue and a serious amount of records already contained invalid flag combinations. This was a huge mess and really hard to clean up.
👉 Did you also detect such cases in your projects? How did you handle them?
Use Nullable Timestamps Instead of Boolean Flags
Finally, there are valid use cases, where a flag makes sense. But in such cases, you can also consider using a nullable timestamp instead of a plain boolean flag.
Instead of adding the boolean flags paid
and overdue
, you could add the timestamps paid_at
and overdue_since
:
CREATE TABLE invoices (
-- ... other columns
paid_at TIMESTAMP,
overdue_since TIMESTAMP
);
It adds information, and I couldn't find any drawback with this approach. To filter the column just use paid_at IS NULL
and paid_at IS NOT NULL
instead of paid_at = FALSE
and paid_at = TRUE
. The constraints above can be adapted accordingly.
Most web frameworks already use nullable timestamps for typical flags, like deleted_at
for soft-deletes, instead of boolean flags and have the capabilities to handle them well.
👉 Have you found any problems with this approach?