
Which is best practice?

Scenario: I have two tables, topics and posts, and each post belongs to a specific topic via its topic_id.

Would it be best to have a column in the topics table called post_count, which I would update whenever a post is created or deleted under that topic_id?

Or would it be best to count the number of rows with a matching topic_id using a SELECT COUNT query?

Are there any disadvantages to either approach, or differences in efficiency?
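To make the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. This is an assumption for illustration: the question doesn't name a database, and the answer below mentions MySQL. The helper names create_post, delete_post, count_from_column and count_from_posts are hypothetical, as is the index on posts.topic_id. Option 1 keeps topics.post_count in step by changing it in the same transaction as each insert or delete. Option 2 simply counts the matching rows when asked.

```python
import sqlite3

# In-memory database for illustration; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (
        id         INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        post_count INTEGER NOT NULL DEFAULT 0   -- option 1: denormalised counter
    );
    CREATE TABLE posts (
        id       INTEGER PRIMARY KEY,
        topic_id INTEGER NOT NULL REFERENCES topics(id),
        body     TEXT NOT NULL
    );
    -- Assumed index so COUNT(*) only has to visit the matching rows.
    CREATE INDEX idx_posts_topic_id ON posts(topic_id);
""")

def create_post(conn, topic_id, body):
    """Option 1 write: insert the post and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO posts (topic_id, body) VALUES (?, ?)", (topic_id, body))
        conn.execute("UPDATE topics SET post_count = post_count + 1 WHERE id = ?", (topic_id,))

def delete_post(conn, post_id):
    """Option 1 write: delete the post and decrement its topic's counter."""
    with conn:
        row = conn.execute("SELECT topic_id FROM posts WHERE id = ?", (post_id,)).fetchone()
        if row is None:
            return  # nothing to delete
        conn.execute("DELETE FROM posts WHERE id = ?", (post_id,))
        conn.execute("UPDATE topics SET post_count = post_count - 1 WHERE id = ?", (row[0],))

def count_from_column(conn, topic_id):
    """Option 1 read: a single-row lookup."""
    return conn.execute("SELECT post_count FROM topics WHERE id = ?", (topic_id,)).fetchone()[0]

def count_from_posts(conn, topic_id):
    """Option 2 read: count the matching rows on demand."""
    return conn.execute("SELECT COUNT(*) FROM posts WHERE topic_id = ?", (topic_id,)).fetchone()[0]

conn.execute("INSERT INTO topics (id, title) VALUES (1, 'General')")
conn.commit()
for i in range(3):
    create_post(conn, 1, f"post {i}")
delete_post(conn, 1)
print(count_from_column(conn, 1), count_from_posts(conn, 1))  # 2 2
```

Wrapping the post write and the counter update in one transaction is what keeps option 1 consistent; any code path that writes to posts without going through these helpers will leave the counter wrong.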

  • I prefer to always update the topics table. If one topic has 10,000+ posts and the user only wants to know how many posts it has, reading the stored count can be faster. Commented Feb 12, 2015 at 1:13
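One rough way to check this claim on your own data is to time both reads side by side. This sketch assumes SQLite in memory (via Python's sqlite3 and timeit modules), a single topic with 10,000 posts, and an index on posts.topic_id. Absolute numbers from SQLite say little about MySQL or PostgreSQL on a real server, so no expected output is given.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
    CREATE INDEX idx_posts_topic_id ON posts(topic_id);
""")

N_POSTS = 10_000  # roughly the size mentioned in the comment
conn.execute("INSERT INTO topics (id, post_count) VALUES (1, ?)", (N_POSTS,))
conn.executemany("INSERT INTO posts (topic_id) VALUES (?)", [(1,)] * N_POSTS)
conn.commit()

def read_column():
    """Read the stored counter: one row from topics."""
    return conn.execute("SELECT post_count FROM topics WHERE id = 1").fetchone()[0]

def read_count():
    """Count the rows in posts for the topic."""
    return conn.execute("SELECT COUNT(*) FROM posts WHERE topic_id = 1").fetchone()[0]

# Timings depend heavily on the engine, hardware, caching and table size,
# so treat the printed numbers as indicative only.
for name, fn in [("post_count column", read_column), ("SELECT COUNT", read_count)]:
    seconds = timeit.timeit(fn, number=1_000)
    print(f"{name}: {seconds * 1000:.1f} ms per 1,000 reads")
```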

1 Answer


Storing post_count in a column is a form of denormalisation.

Reading a single stored field will be faster than performing a SELECT COUNT, but most people would agree that this is a premature optimisation that introduces the possibility of an update anomaly: the stored count can drift out of sync with the actual rows in posts. The database should store normalised (i.e. non-redundant) data unless you have benchmarks showing that this query is the performance bottleneck in your application.
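A small illustration of that update anomaly, assuming SQLite via Python's sqlite3: the normal code path updates both tables, while a second, hypothetical path (say, a bulk-import script) inserts a post and never touches topics.post_count. The stored counter drifts, but a COUNT(*) over posts remains correct because it is derived from the data itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
    INSERT INTO topics (id) VALUES (1);
""")

# The "normal" code path keeps both tables in step.
with conn:
    conn.execute("INSERT INTO posts (topic_id) VALUES (1)")
    conn.execute("UPDATE topics SET post_count = post_count + 1 WHERE id = 1")

# A second, hypothetical code path (e.g. a bulk-import script) writes
# to posts directly and forgets the counter: an update anomaly.
with conn:
    conn.execute("INSERT INTO posts (topic_id) VALUES (1)")

stored = conn.execute("SELECT post_count FROM topics WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM posts WHERE topic_id = 1").fetchone()[0]
print(stored, actual)  # 1 2: the redundant copy has drifted; COUNT(*) is still right
```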

You would probably be better served by a caching layer between the database and the application, so that the count is not recomputed on every request but is refreshed when the contents of posts change. You may even find that MySQL is already caching the result.
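A minimal sketch of the caching idea, under loud assumptions: PostCountCache is a hypothetical class, the cache is a plain per-process Python dict, and SQLite stands in for the real database. In a real deployment you would more likely use a shared cache such as Memcached or Redis. The count is computed with SELECT COUNT(*) only on a cache miss, and every write to a topic's posts invalidates that topic's cached entry, so the tables themselves stay normalised.

```python
import sqlite3

class PostCountCache:
    """Hypothetical application-side cache: COUNT(*) runs only on a cache miss,
    and any write to a topic's posts invalidates that topic's entry."""

    def __init__(self, conn):
        self.conn = conn
        self._counts = {}     # topic_id -> cached count
        self.queries_run = 0  # how many COUNT(*) queries actually hit the database

    def post_count(self, topic_id):
        if topic_id not in self._counts:
            self.queries_run += 1
            row = self.conn.execute(
                "SELECT COUNT(*) FROM posts WHERE topic_id = ?", (topic_id,)
            ).fetchone()
            self._counts[topic_id] = row[0]
        return self._counts[topic_id]

    def add_post(self, topic_id, body):
        with self.conn:
            self.conn.execute("INSERT INTO posts (topic_id, body) VALUES (?, ?)", (topic_id, body))
        self._counts.pop(topic_id, None)  # invalidate; the next read recounts

    def delete_post(self, post_id):
        with self.conn:
            row = self.conn.execute("SELECT topic_id FROM posts WHERE id = ?", (post_id,)).fetchone()
            if row is None:
                return  # no such post, nothing to invalidate
            self.conn.execute("DELETE FROM posts WHERE id = ?", (post_id,))
        self._counts.pop(row[0], None)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL, body TEXT)")
cache = PostCountCache(conn)
cache.add_post(1, "hello")
cache.add_post(1, "world")
print(cache.post_count(1), cache.post_count(1), cache.queries_run)  # 2 2 1
cache.delete_post(1)
print(cache.post_count(1), cache.queries_run)  # 1 2
```

This version neither expires entries nor handles concurrent access; it only shows the read-through-and-invalidate pattern.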

  • Would it also be practical to have a feature that periodically checks the accuracy of the post_count column by comparing it to a SELECT COUNT? (It would be scheduled to run every week or so, and an admin could also trigger it manually.)
    – Sakuya
    Commented Feb 12, 2015 at 1:25
  • It really depends on the stage of your application's lifecycle, and indeed on your mentality/methodology. I would say that if you have a working, deployed application with thousands to millions of topics and posts, and you have identified a noticeable performance issue with this particular query that caching cannot mitigate, then go for it. Otherwise I would say forget it: don't waste your time introducing possible bugs when you could be working on real features. Commented Feb 12, 2015 at 1:41
  • For fun I ran some timings on a PostgreSQL database on my laptop (i.e. non-scientific), first with 200 records and then with 200k records. See results: pastebin.com/C6xRy6K3. The SELECT COUNT was actually faster with a smaller number of posts. Commented Feb 12, 2015 at 2:13
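The periodic accuracy check asked about in these comments could look roughly like this. It is a sketch assuming SQLite via Python's sqlite3, with hypothetical function names find_drift and reconcile. find_drift lists topics whose stored post_count disagrees with a COUNT over posts (the LEFT JOIN keeps topics with no posts, which count as 0), and reconcile recomputes every counter from posts. How the job gets scheduled (cron, an admin button) is left out.

```python
import sqlite3

def find_drift(conn):
    """Return (topic_id, stored, actual) for every topic whose counter is wrong."""
    return conn.execute("""
        SELECT t.id, t.post_count, COUNT(p.id) AS actual
        FROM topics AS t
        LEFT JOIN posts AS p ON p.topic_id = t.id
        GROUP BY t.id, t.post_count
        HAVING t.post_count <> COUNT(p.id)
        ORDER BY t.id
    """).fetchall()

def reconcile(conn):
    """Recompute every counter from posts; return the rows that were wrong."""
    drift = find_drift(conn)
    with conn:  # one transaction for the whole rewrite
        conn.execute("""
            UPDATE topics
            SET post_count = (SELECT COUNT(*) FROM posts WHERE posts.topic_id = topics.id)
        """)
    return drift

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
    INSERT INTO topics (id, post_count) VALUES (1, 2), (2, 5), (3, 0);
    INSERT INTO posts (topic_id) VALUES (1), (1), (2);
""")
print(reconcile(conn))   # [(2, 5, 1)]
print(find_drift(conn))  # []
```

In this SQLite sketch the UPDATE runs as a single transaction; on a busy MySQL or PostgreSQL server it would be worth checking how such a job interacts with concurrent counter updates before relying on it.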
