Reporting database
23 April 2014
At peerTransfer we have a database system that has been evolving since the beginning. It started as a small relational MySQL database attached to a Ruby on Rails application. Over time it has grown in size and complexity into a complex operational system that now spans several systems and applications, including event-driven storage, document-oriented databases and plain text files.
One day, the product team came up with the idea of retrieving and aggregating data from the different storages and sharing it in a digestible way across the company. That moment was the start of our reporting database.
This article by Martin Fowler describes very accurately the idea behind a reporting database: a database with a very different purpose from the operational one. It needs a different schema, since it stores derived and aggregated data for read-only purposes.
The following schema represents the structure of our reporting database and how we feed it.
- Tracker, an event-driven component, stores real-time data in the reporting database as well as in Keen.io, a service that helps us track our conversion funnel.
- We run daily batches that aggregate data and dump it to the reporting database.
- Keen.io charts are displayed in our lounge application, which is visible to the whole company.
- The reporting database's relational schema is visualised through Tableau, also integrated into the lounge.
- Some co-workers skilled in SQL also query the reporting database directly.
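The daily batch step can be sketched in plain Ruby. This is a minimal illustration, not our actual code: the event shapes, field names and the `daily_summary` helper are all hypothetical. The point is the pattern — collapse raw operational events into one denormalised row per (day, currency), ready to be dumped into the reporting database.

```ruby
require "date"

# Hypothetical raw events as they might arrive from the operational system.
EVENTS = [
  { at: Date.new(2014, 4, 1), type: "payment_completed", amount_cents: 12_000, currency: "USD" },
  { at: Date.new(2014, 4, 1), type: "payment_completed", amount_cents: 8_500,  currency: "EUR" },
  { at: Date.new(2014, 4, 1), type: "payment_started",   amount_cents: 3_000,  currency: "USD" },
  { at: Date.new(2014, 4, 2), type: "payment_completed", amount_cents: 4_000,  currency: "USD" },
]

# Nightly batch: aggregate completed payments into one row per
# (day, currency). Each row is derived, read-only data — exactly the
# kind of record the reporting database is meant to hold.
def daily_summary(events)
  events
    .select { |e| e[:type] == "payment_completed" }
    .group_by { |e| [e[:at], e[:currency]] }
    .map do |(day, currency), group|
      {
        day: day,
        currency: currency,
        payments: group.size,
        total_cents: group.sum { |e| e[:amount_cents] },
      }
    end
end
```

Because the reporting rows are derived, they can be recomputed from the raw events at any time, which keeps the reporting schema free to change independently of the operational one.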
Over time we have become very confident in this architecture; it provides many benefits that would otherwise be hard to achieve:
- Tailored data for different user profiles
- Normalisation not needed
- Storage is easy to manage
- Decoupled from the core system and operations database
- Easy to extend
- Evolves at a different pace than the rest of the system
Fowler points out some drawbacks of the reporting database architecture, such as keeping it up to date and handling real-time data. We use a combination of nightly batch processes, event-driven components and backup storage systems that gives us a moderately easy way to recover data after exceptional system failures.
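One property that makes recovery manageable is that the load step can be made idempotent. The sketch below assumes an in-memory Hash standing in for a reporting table, with rows keyed by (day, currency) — both the `upsert_rows` helper and the row shapes are hypothetical. Re-running the batch after a partial failure then overwrites rows instead of duplicating them.

```ruby
# Idempotent load step: each row is written under its natural key, so a
# replayed batch converges to the same table contents.
def upsert_rows(table, rows)
  rows.each { |row| table[[row[:day], row[:currency]]] = row }
  table
end

table = {}
rows = [
  { day: "2014-04-01", currency: "USD", payments: 2, total_cents: 15_000 },
  { day: "2014-04-01", currency: "EUR", payments: 1, total_cents: 8_500 },
]

upsert_rows(table, rows)
upsert_rows(table, rows) # replaying after a failure leaves the table unchanged
```

With a real SQL reporting database the same idea maps onto an upsert keyed on (day, currency), so a failed night's batch can simply be rerun.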