Denormalization

The normalization process aims to remove redundancy from the modeled data. This leads to efficient updates: writes don't need to touch data in many places to preserve overall consistency and data integrity.

However, there are limitations to this approach. One major limitation is performance: certain reads may need so many database operations (joins, scans, and so on) that they become prohibitively expensive. For example, let's say we have a use case of supporting resellers on the travel website. These resellers would take inventory and make bookings for customers just like regular travel agents, in return for fees (paid at the end of every month) from the travel website. Let's say the bookings are modeled as follows:

  • Bookings:
    • BookingId
    • Date
    • SKU
    • ResellerId
    • Amount
    • Fee
  • Resellers:
    • ResellerId
    • Name
    • Address
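
For concreteness, here is a minimal sketch of this model in SQL. The column names come from the lists above; the types and constraints are assumptions, since the text only names the columns.

  -- Resellers comes first, since Bookings references it.
  CREATE TABLE Resellers (
      ResellerId  INTEGER PRIMARY KEY,
      Name        TEXT NOT NULL,
      Address     TEXT
  );

  -- One row per booking; Fee is the reseller's cut for that booking.
  CREATE TABLE Bookings (
      BookingId   INTEGER PRIMARY KEY,
      Date        DATE NOT NULL,
      SKU         TEXT NOT NULL,
      ResellerId  INTEGER NOT NULL REFERENCES Resellers (ResellerId),
      Amount      NUMERIC(12, 2) NOT NULL,
      Fee         NUMERIC(12, 2) NOT NULL
  );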

Here, the ResellerId in the bookings table is a foreign key to the resellers table. Whenever a booking is made, the ResellerId and the applicable fee are populated by the booking DB transaction.

Now, there is a new requirement: a reseller should be able to find the total fees due to them for the current month. This can be computed by doing a GROUP BY ResellerId on the bookings table, scanning the required time range, and summing the fees (sketched in SQL after the list below). However, the performance of this query might not be acceptable, and depending on the isolation level in use, it might cause bottlenecks on the (business-critical) write path. One way to solve this problem is to maintain a running total of the fees due for the current month in the resellers table itself, like so:

  • ResellerId
  • Name
  • Address
  • FeesDue
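
To make the trade-off concrete, here is roughly what the two read paths look like in SQL, assuming the schema sketched earlier plus the new FeesDue column; the date range is just one way to express "current month", and the reseller id 42 is a placeholder:

  -- Normalized read: scan and aggregate a month's worth of bookings.
  SELECT ResellerId, SUM(Fee) AS FeesDue
  FROM Bookings
  WHERE Date >= DATE '2024-06-01'
    AND Date <  DATE '2024-07-01'
  GROUP BY ResellerId;

  -- Denormalized read: a single-row lookup.
  SELECT FeesDue
  FROM Resellers
  WHERE ResellerId = 42;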

Every time a booking is made, the DB transaction adds the fee of the current booking to the FeesDue column; getting the current fees due then becomes a simple table lookup. The trade-off, of course, is that the write path needs to do a bit more work to maintain this aggregated data. In many cases, this trade-off is very much a sane choice.

It is important that the update to the denormalized column happens in the same transaction as the booking write; otherwise, a failure between the two writes would leave the running total out of sync with the bookings.
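
A minimal sketch of such a transaction, with placeholder values; the exact transaction and literal syntax will vary by database:

  BEGIN;

  -- Record the booking itself.
  INSERT INTO Bookings (BookingId, Date, SKU, ResellerId, Amount, Fee)
  VALUES (1001, DATE '2024-06-15', 'SKU-123', 42, 250.00, 12.50);

  -- Maintain the denormalized running total in the same transaction,
  -- so both writes commit together or not at all.
  UPDATE Resellers
  SET FeesDue = FeesDue + 12.50
  WHERE ResellerId = 42;

  COMMIT;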

Another reason we might want to denormalize is to maintain a history of changes. Normalized schemas retain only the current state of the system, but many use cases call for a change log. Denormalization helps here by maintaining the changes in a separate model from the current state of the data.
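
A common way to realize this is an append-only audit table written in the same transaction as the change; the table and column names below are illustrative, not part of the original model:

  -- Bookings keeps only current state; BookingsHistory preserves every change.
  CREATE TABLE BookingsHistory (
      ChangeId    INTEGER PRIMARY KEY,
      BookingId   INTEGER NOT NULL REFERENCES Bookings (BookingId),
      ChangedAt   TIMESTAMP NOT NULL,
      OldAmount   NUMERIC(12, 2),
      NewAmount   NUMERIC(12, 2)
  );

  BEGIN;

  UPDATE Bookings SET Amount = 300.00 WHERE BookingId = 1001;

  INSERT INTO BookingsHistory (ChangeId, BookingId, ChangedAt, OldAmount, NewAmount)
  VALUES (501, 1001, CURRENT_TIMESTAMP, 250.00, 300.00);

  COMMIT;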
