by Miguel Angel

Dismembering the monolith. The pT way


Yet another post about how to deal with a monolith. Well, yes: when you’re a tech company whose early stages were built with Ruby on Rails, what would you expect? Like other successful companies that get well funded after a few years of modest resources, we now deal with new business requirements, system scalability challenges, and a fast-growing team. Now is the time for change, because I don’t know of any monolith capable of holding up under such stress.

Let’s take a look at the past. I see a brand new office with a small dev team and a few other employees, everyone with one goal in mind: create an MVP to survive until next year. The dev team decides to kick off the platform with the technology they know best: Ruby on Rails, of course. Building feature by feature, at the fast, steady pace one could imagine given a small team and a greenfield… yeah, the good ole days. Four years later, here we are, with years of fast RoR development behind us and the need to survive not the next year but the next decade. The situation is different and so are the techniques and technologies applied.

We realized that a new requirement for the product was emerging: it must be built to last. That new requirement came together with tensions inherent to a large Rails-based codebase. To mention a few points:

  • Different rhythm of development for different logical components that share the same codebase.
  • Scalability only desired for a subset of components.
  • Undesired single points of failure.
  • Performance issues due to coupling.

All of these issues have something in common: components with different requirements were put together for the sake of development speed. The team recognized the situation and swore by the SOLID principles not to do it again. And that is how pT entered a new era: the era of microservices. New features are designed with a wider set of technical requirements in mind, and are either grouped with similar features or extracted into new isolated components. For example, sending a notification to a user is no longer coded into the callback of an ActiveRecord model. Instead, an observer on the model pushes a message to a queue, and the notification is triggered when that message is read from the queue. It is easy to see that the complexity has moved from the code to other places, like infrastructure. In fact, we already had a few isolated components: the ones dealing with external third-party services were already using a decoupled queue-based system.
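
To make the notification example concrete, here is a minimal sketch in plain Ruby. It is not our production code: the names `PaymentObserver`, `NotificationWorker` and `Payment` are hypothetical, the in-process `Queue` from the standard library stands in for a real message broker, and a `Struct` stands in for the ActiveRecord model.

```ruby
require "json"

# Assumption: an in-process Queue stands in for a real message broker.
NOTIFICATION_QUEUE = Queue.new

# Assumption: a plain Struct stands in for an ActiveRecord model.
Payment = Struct.new(:id, :user_id)

# Hypothetical observer: it only publishes a message about the change.
# It knows nothing about how (or whether) the user gets notified.
class PaymentObserver
  def initialize(queue)
    @queue = queue
  end

  def after_create(payment)
    message = {
      "event" => "payment_created",
      "payment_id" => payment.id,
      "user_id" => payment.user_id
    }
    @queue << JSON.generate(message)
  end
end

# Hypothetical consumer: in a real system this would live in a separate
# process or service and would actually deliver the notification.
class NotificationWorker
  attr_reader :sent

  def initialize(queue)
    @queue = queue
    @sent = []
  end

  # Reads every pending message and records the notification it would send.
  def drain
    until @queue.empty?
      message = JSON.parse(@queue.pop)
      @sent << "notify user #{message['user_id']} about #{message['event']}"
    end
  end
end

observer = PaymentObserver.new(NOTIFICATION_QUEUE)
observer.after_create(Payment.new(42, 7))

worker = NotificationWorker.new(NOTIFICATION_QUEUE)
worker.drain
puts worker.sent
```

The model side and the notification side now share only the message format, which is exactly where the complexity goes: someone has to run, monitor and version that queue.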

The road is clear for new features, but how do we deal with the code that is already written (the vast majority)?

We began by locating the areas that make the system more rigid: the ones that prevent us from developing fast, produce a high number of bugs, or cause stability issues.

I can name some of them:

  • RoR has an awesome model for generating REST APIs. It is almost automatic: with only a few lines of configuration, voilà! A full CRUD API over a model is ready for outside clients to consume. The benefit turns out to be a drawback too: who wants to have the internals of an application exposed as is to outside clients? The technique had its moment years ago and is still useful in certain situations, like exploratory development, but it is hard to maintain over time. Now that the business model is defined and there is a roadmap for the coming year, a proper stable API that hides the internals of the domain behaviour is what we need and what we are building.

  • Fat ActiveRecord models. A very well-known and well-documented bad practice in RoR that affects almost every RoR application.

  • Very very fat ActiveRecord models. The bad practice at its best. Not only is behaviour mixed up with the model, but several logical models are packed into a single ActiveRecord model, mixing their behaviour and persistence all together. Lovely! Ready for a challenging refactor? New abstractions must be built at different levels, from the database to JavaScript. And remember: the development of new features cannot stop, and uptime must stay at 100%.
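
As an illustration of the first point, a stable API can be sketched as an explicit representation object that sits between the model and the outside client. This is only a sketch under assumptions: `InternalPayment`, its columns, the state names and `PaymentRepresentation` are all hypothetical, and a `Struct` stands in for the ActiveRecord model.

```ruby
require "json"

# Assumption: a Struct stands in for an ActiveRecord model whose columns
# (including the awkward internal ones) are invented for this example.
InternalPayment = Struct.new(
  :id, :amount_cents, :currency, :internal_state, :legacy_flags,
  keyword_init: true
)

# Hypothetical representation: the only shape outside clients ever see.
# Internal columns can be renamed or split without breaking the API.
class PaymentRepresentation
  # Maps internal state names to the public vocabulary.
  STATUS_BY_INTERNAL_STATE = {
    "st_1_new" => "pending",
    "st_3_settled" => "completed"
  }.freeze

  def initialize(payment)
    @payment = payment
  end

  # Builds the public hash; internal_state and legacy_flags never leak out.
  def to_h
    {
      "id" => @payment.id,
      "amount" => format("%.2f", @payment.amount_cents / 100.0),
      "currency" => @payment.currency,
      "status" => STATUS_BY_INTERNAL_STATE.fetch(@payment.internal_state, "unknown")
    }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

payment = InternalPayment.new(
  id: 1, amount_cents: 12_550, currency: "EUR",
  internal_state: "st_3_settled", legacy_flags: 5
)
puts PaymentRepresentation.new(payment).to_json
```

The price is a few more lines than the automatic CRUD API, but the contract with clients is now written down in one place instead of being whatever the table happens to look like.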

I have to say that it feels very pleasant to work on such problems. Not everybody is lucky enough to work in an organization, and belong to a team, that recognizes the tradeoffs made and the path to follow. Such an environment defines what we are and what we do: the problems described and the techniques we use to deal with them describe our current situation. That does not mean that these practices would be successful with a different technology, timing or team. In fact, it does not even mean that they will be successful for us in the long run. The time is now.

Miguel Angel Fernandez.
Developer at peerTransfer.