
Web application architecture: dumb entity and smart+fat model

Friday, August 7th, 2015

Subtitle: Entity Repository vs. Model Repository vs. Services

This article is a brain dump of my web application architecture design decisions, together with the experience and reasons that led to them. It is also the place I will come back to when I have to pick things up where I left them, after another year passes without any significant web software development.

 

What should a web application provide, and how should we go about it?

Imagine a new web app you are just starting with. You are starting from scratch, you have some business requirements to meet, and then you are basically on your own. You can go about writing SQL queries directly in HTML templates, or do a little bit of separation of concerns. Currently there are two leading approaches in the wild:

  1. SOA: Service-Oriented Architecture, and
  2. DDD: Domain-Driven Design.

For various reasons, the main one being that I agree with Martin Fowler’s critique of the Anemic Domain Model, I scrapped the SOA approach in favour of DDD. DDD seems more naturally aligned with how humans think about specific domains. To use a car analogy: you do not take your car to the EngineStartingService; you turn the ignition key in the car (this corresponds to calling the $Car->turnIgnitionKeyOn() method) and the car starts itself. Although I must say I believe a payable EngineStartingService would be a car maker’s wet dream :)
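The car analogy can be sketched in a few lines of PHP. Apart from $Car->turnIgnitionKeyOn() and EngineStartingService, every name here (AnemicCar, isRunning(), the engineRunning property) is made up for illustration; it is a sketch of the two styles, not code from a real project.

```php
<?php

// Anemic style: the car is a bag of data, a separate service holds the behaviour.
class AnemicCar
{
    public $engineRunning = false;
}

class EngineStartingService
{
    public function start(AnemicCar $car): void
    {
        $car->engineRunning = true;
    }
}

// Domain-model style: the car owns both its state and its behaviour.
class Car
{
    private $engineRunning = false;

    public function turnIgnitionKeyOn(): void
    {
        $this->engineRunning = true;
    }

    // Hypothetical accessor, added only so the state can be inspected.
    public function isRunning(): bool
    {
        return $this->engineRunning;
    }
}

$Car = new Car();
$Car->turnIgnitionKeyOn();
```

In the second style nobody outside Car can flip engineRunning directly, which is the point of keeping behaviour next to the data.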

Back to our imaginary web application: the app will do something business-specific. Got it. The app will have to communicate with its users (or clients) – darn, why the lusers (in BOFH speak)? 🙂 Anyway, our imaginary web app should provide what any modern web application does:

  1. Web access, and
  2. API access based on REST. Even if it does not provide an API out of the box, it should not be too hard to bolt one on later.

This is about it as far as requirements are concerned.

 

Layers of web application

Once the requirements are set in stone, as they usually are NOT, we need to design the general architecture of the software we are about to develop. Here I will leave the beautiful meadows of general software architecture theory and describe what I use when I write a web application from scratch. I usually use a framework of sorts, Symfony or Zend Framework (>=2) being the preferred two, but I try not to integrate with the framework too deeply. Therefore I usually design a Core component of the application, which interacts with the framework via an ExternalEnvironmentDriver if it needs to (for example for session storage, a DB connection handle, etc.).
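To make the Core/framework boundary concrete, here is a minimal sketch of what an ExternalEnvironmentDriver could look like. Only the driver's name comes from the text above; the method names, the ArrayEnvironmentDriver test implementation and the locale example are assumptions, and only session storage is shown.

```php
<?php

// Hypothetical contract between the Core and whatever framework hosts it.
interface ExternalEnvironmentDriver
{
    public function getSessionValue(string $key, $default = null);
    public function setSessionValue(string $key, $value): void;
}

// A framework-free driver, e.g. for unit tests or CLI scripts.
class ArrayEnvironmentDriver implements ExternalEnvironmentDriver
{
    private $session = [];

    public function getSessionValue(string $key, $default = null)
    {
        return array_key_exists($key, $this->session) ? $this->session[$key] : $default;
    }

    public function setSessionValue(string $key, $value): void
    {
        $this->session[$key] = $value;
    }
}

// The Core only ever sees the interface, never a Symfony or ZF class.
class Core
{
    private $env;

    public function __construct(ExternalEnvironmentDriver $env)
    {
        $this->env = $env;
    }

    public function rememberLocale(string $locale): void
    {
        $this->env->setSessionValue('locale', $locale);
    }

    public function currentLocale(): string
    {
        return $this->env->getSessionValue('locale', 'en');
    }
}

$core = new Core(new ArrayEnvironmentDriver());
$core->rememberLocale('sl');
```

A Symfony- or ZF-backed driver would implement the same interface on top of the framework's session object, so the Core would not change when the framework does.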

So, without further ado, here are the layers of web application architecture as I see fit for my needs:

  1. View layer: manages data presentation for the client.
  2. Controller layer: this is where clients interact with the application. It takes requests, passes them to the application core, takes the core’s responses and translates them into client-viewable responses. For the web part, controllers also make a few UI decisions (mostly redirecting). This part must be done cautiously, as it must not interfere with API/REST access. This layer does a bit of actual data validation, but that is not its primary occupation.
  3. Model + Model repository layer, also known as the business-logic layer: this is where our business/domain logic resides. It provides all intelligent operations, including all data validation. This layer could potentially be replaced with a collection of services, if an SOA design is desired. This part is what I call The Core of the application.
  4. Entity + Entity repository layer, also known as the data-storage layer: this is where data is stored and retrieved. It is separated from the model in order to provide a simple way of switching storage subsystems.

Let us now dive into each layer.

 

Layer #1: View – data presentation

This layer is responsible for converting the controller’s output into the appropriate format, according to what the client has requested. The same data should either be converted into a REST JSON response or used to render a full-blown HTML page. The controller should provide all the data that the view layer needs.

I must admit I deviate a bit on this point: I really hate it when controllers fetch data for the view, or at least do more than point to what data should be returned. It makes them bulkier, as you have to know up front what data the view will need (at least the HTML one). Therefore I generally just pass the view a list of Models (for a list view) or one specific Model (for a view of that model’s details), and the view layer then calls the model’s data retrieval methods for whatever data it needs to put into the response. This goes against the good practice of separation of concerns, as it enables you to call $Model->update() in a template. I refrain from doing data manipulation in templates, of course, and use the passed models for data retrieval only.

You might ask “why do you do this?” Well, mostly for the HTML view, where you need a trivial decision such as whether to display the “Delete” button. The controller’s delete action does the actual checking of whether deletion is allowed, of course. But do I really want to pollute the controller’s display action with if ($Model->isDeleteable()) { $view->displayDeleteButton = true; }? No, I most certainly do not.
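A minimal sketch of the view asking the model directly might look like this. ArticleModel, the "published articles cannot be deleted" rule and the renderDetails() function (a stand-in for a real template) are all hypothetical; only isDeleteable() comes from the text.

```php
<?php

// Hypothetical model; the view is expected to call its read-only methods only.
class ArticleModel
{
    private $title;
    private $published;

    public function __construct(string $title, bool $published)
    {
        $this->title = $title;
        $this->published = $published;
    }

    public function getTitle(): string
    {
        return $this->title;
    }

    // Business rule (assumed for illustration): published articles cannot be deleted.
    public function isDeleteable(): bool
    {
        return !$this->published;
    }
}

// Stand-in for an HTML template: it asks the model, the controller sets no flags.
function renderDetails(ArticleModel $Model): string
{
    $html = '<h1>' . htmlspecialchars($Model->getTitle()) . '</h1>';
    if ($Model->isDeleteable()) {
        $html .= '<button name="delete">Delete</button>';
    }
    return $html;
}

$draftHtml = renderDetails(new ArticleModel('Draft & notes', false));
$publishedHtml = renderDetails(new ArticleModel('Released', true));
```

The button is only a hint for the user; the delete action still has to repeat the isDeleteable() check before doing anything.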

 

Layer #2: Controller – user interaction

The controller layer is a very slim one. Mostly it takes care of routing requests from clients to the appropriate method calls on various models or their respective repositories. The following statements are true for my controllers:

  1. Does the controller perform model instantiation? No, that is the model repository’s responsibility.
  2. Does the controller instantiate data manipulation forms? No, it calls the model’s methods for that.
  3. Does the controller do data validation before CRUD operations? Nope, the model does that. Well, this one is a bit of a slippery slope, true. Usually I call the model to get a form (which includes all validation constraints), then do an if ($form->isValid()) check and, based on the result, either perform the data manipulation or display the form again, now with added error messages.

Well, what does Controller do then?

  1. For “getList()” requests, the controller retrieves the requested list of models from the appropriate model repository and passes the list to the view.
  2. For “getDetails()” requests, the controller retrieves the requested model from the repository and passes it to the view.
  3. For CUD operations (note the missing R from CRUD), it does the following: retrieves the appropriate form from the corresponding model, or from the model repository for creation; binds the submitted data to this form; asks the model (or repo) to verify the form with the updated data; and, if data validation is successful, calls the model’s (or repo’s) method that executes the requested operation.
  4. For the HTML view that precedes a CUD operation, it retrieves the form from the model; for an Update operation it also populates the form with the existing model data; then it passes the form to the view layer.
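The update flow from point 3 can be sketched as follows. SimpleForm is a deliberately tiny, hypothetical stand-in for a framework form component (it is not the Symfony or ZF2 form API), and UserModel, UserController and the array-shaped return value are assumptions made for the example.

```php
<?php

// Minimal hypothetical form: holds data plus one validation rule per field.
class SimpleForm
{
    private $rules;
    private $data = [];
    private $errors = [];

    public function __construct(array $rules)
    {
        $this->rules = $rules;
    }

    public function bind(array $data): void
    {
        $this->data = $data;
    }

    public function isValid(): bool
    {
        $this->errors = [];
        foreach ($this->rules as $field => $rule) {
            if (!$rule($this->data[$field] ?? null)) {
                $this->errors[$field] = "Invalid value for $field";
            }
        }
        return $this->errors === [];
    }

    public function getData(): array { return $this->data; }
    public function getErrors(): array { return $this->errors; }
}

// The model owns both the validation constraints and the update itself.
class UserModel
{
    private $email;

    public function __construct(string $email) { $this->email = $email; }

    public function getForm(): SimpleForm
    {
        return new SimpleForm([
            'email' => function ($v) {
                return is_string($v) && filter_var($v, FILTER_VALIDATE_EMAIL) !== false;
            },
        ]);
    }

    public function update(array $data): void { $this->email = $data['email']; }
    public function getEmail(): string { return $this->email; }
}

// The controller only routes: fetch form, bind, check, delegate.
class UserController
{
    public function updateAction(UserModel $model, array $submitted): array
    {
        $form = $model->getForm();
        $form->bind($submitted);
        if ($form->isValid()) {
            $model->update($form->getData());
            return ['status' => 'ok'];
        }
        return ['status' => 'invalid', 'errors' => $form->getErrors()];
    }
}

$user = new UserModel('old@example.com');
$controller = new UserController();
$bad = $controller->updateAction($user, ['email' => 'not-an-email']);
$good = $controller->updateAction($user, ['email' => 'new@example.com']);
```

The controller never learns what a valid e-mail looks like; the same updateAction() result could be rendered as an HTML form with errors or as a JSON response.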

 

Layer #3: Model and Model Repository – business logic

This is where all domain-specific logic and processes should live.

Model repositories are the much-dreaded singletons. Even when they are not strictly singletons, the way they are instantiated and retrieved ensures that only one instance is ever created. And for a good reason, too: model repositories cache instantiated models. Having more than one instance of the same model would be confusing at best, and more probably disastrous. Therefore, once a particular model is instantiated, only a reference to it is returned to the requester.
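The caching behaviour can be illustrated with a small sketch. ProductModel, ProductModelRepository and the find() method are hypothetical names, and the model is fabricated in place where a real repository would ask the entity repository for data.

```php
<?php

class ProductModel
{
    private $id;
    public $name;

    public function __construct(int $id, string $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId(): int { return $this->id; }
}

class ProductModelRepository
{
    private static $instance = null;

    // Cache of already instantiated models, keyed by id.
    private $loaded = [];

    private function __construct() {}

    public static function getInstance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function find(int $id): ProductModel
    {
        if (!isset($this->loaded[$id])) {
            // A real repository would load data via the entity repository here.
            $this->loaded[$id] = new ProductModel($id, "Product #$id");
        }
        return $this->loaded[$id];
    }
}

$a = ProductModelRepository::getInstance()->find(7);
$b = ProductModelRepository::getInstance()->find(7);
$a->name = 'Renamed';
```

Because $a and $b are the same object, a change made through one is immediately visible through the other, which is exactly what two separately instantiated copies would break.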

Layer #3 communicates with layer #4 by using a few standard calls:

  • entityRepo:create(newEntity),
  • entityRepo:update(existingEntityWithUpdatedData),
  • entityRepo:delete(thisEntity),
  • and various findBy methods, which may be model/entity specific.

Entity repositories are injected into model repositories and models. This makes the code more testable, as you do not need a live database to run a test on a model – a fake entity repository that stores entities in memory will do just fine.
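A minimal sketch of such a fake, assuming a hypothetical UserEntity and an interface that mirrors the create/update/delete/findBy calls listed above; the uniqueness rule in register() is invented purely to give the model repository something to decide.

```php
<?php

// Dumb entity: data only.
class UserEntity
{
    public $id = null;
    public $email = '';
}

// Hypothetical contract mirroring the standard calls between layers #3 and #4.
interface UserEntityRepositoryInterface
{
    public function create(UserEntity $entity): void;
    public function update(UserEntity $entity): void;
    public function delete(UserEntity $entity): void;
    public function findByEmail(string $email): ?UserEntity;
}

// Test double: keeps entities in an array instead of a database.
class InMemoryUserEntityRepository implements UserEntityRepositoryInterface
{
    private $rows = [];
    private $nextId = 1;

    public function create(UserEntity $entity): void
    {
        $entity->id = $this->nextId++;
        $this->rows[$entity->id] = $entity;
    }

    public function update(UserEntity $entity): void
    {
        $this->rows[$entity->id] = $entity;
    }

    public function delete(UserEntity $entity): void
    {
        unset($this->rows[$entity->id]);
    }

    public function findByEmail(string $email): ?UserEntity
    {
        foreach ($this->rows as $row) {
            if ($row->email === $email) {
                return $row;
            }
        }
        return null;
    }
}

// The model repository receives its storage through the constructor.
class UserModelRepository
{
    private $entities;

    public function __construct(UserEntityRepositoryInterface $entities)
    {
        $this->entities = $entities;
    }

    // Business rule (assumed for illustration): e-mail addresses are unique.
    public function register(string $email): bool
    {
        if ($this->entities->findByEmail($email) !== null) {
            return false;
        }
        $entity = new UserEntity();
        $entity->email = $email;
        $this->entities->create($entity);
        return true;
    }
}

$repo = new UserModelRepository(new InMemoryUserEntityRepository());
$first = $repo->register('a@example.com');
$second = $repo->register('a@example.com');
```

In production the same UserModelRepository would receive a Doctrine-backed implementation of the interface; the business rule itself is tested without any database.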

Should a need arise to convert the application to SOA, this is where refactoring would begin. It would touch layer #4 too, but not layers #1 and #2.

 

Layer #4: Entity and Entity Repository – data storage

Initially I really wanted this functionality to be merged into Layer #3, as it seemed redundant. However, it later became apparent that decoupling data storage logic from business logic is beneficial in multiple ways. The main one is database schema migrations. Since I started using Doctrine Migrations, I do not touch the database more than I really have to, and that is only to create it. After that, the schema is completely defined by entities (for me, preferably via annotations). Then doctrine:migrations:diff creates the necessary schema diff, which I quickly edit to append data transformation statements. That is followed by doctrine:migrations:migrate, git add and git commit, and that is it.

To be completely honest, since entities are so clueless about the business part of the application, I usually do not configure relations between them. I configure indexes on particular columns, yes, but relation mappings are omitted. One reason is that Doctrine fetches related entities instead of field contents. Skipping the mapping seems weird for relations that are based on IDs, I agree. But for relations that are based on string values (a domain name, for example), I prefer retrieving related entities manually. Automatic fetching may or may not suit your programming style. For me it does not, thus entities do not know much about one another.
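As an illustration, an entity of this kind might look like the sketch below: Doctrine ORM 2.x annotations with an index on a string column and no association to any other entity. MailboxEntity, the table, column and index names are invented for the example; the annotations live in docblocks, so the class also loads without Doctrine installed.

```php
<?php

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 * @ORM\Table(name="mailbox", indexes={
 *     @ORM\Index(name="idx_mailbox_domain", columns={"domain_name"})
 * })
 */
class MailboxEntity
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     */
    private $id;

    /**
     * Plain indexed string column; deliberately no relation to a domain entity.
     *
     * @ORM\Column(name="domain_name", type="string", length=255)
     */
    private $domainName;

    public function getId(): ?int { return $this->id; }
    public function getDomainName(): ?string { return $this->domainName; }
    public function setDomainName(string $domainName): void { $this->domainName = $domainName; }
}

$mailbox = new MailboxEntity();
$mailbox->setDomainName('example.com');
```

With this layout the related domain would be looked up explicitly when needed, for instance through a findBy method on its own entity repository, rather than being fetched by the ORM.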

 

Entity Repository AND Model Repository – WTH?

It took me a while to reach the decision to use two types of repositories. One reason for it is Doctrine ORM+Migrations. The ORM part not so much, but schema migrations are so well done in Doctrine that I was almost instantly addicted to them. Once I started using migrations (and implicitly the ORM), I realised sooner rather than later that the objects Doctrine uses as its entities cannot be made smart. Doctrine entities are plain (stupid) PHP objects which cannot be taught any business logic at all. They are simply data containers, and for a good reason (SOA).

Once it became clear that entities would stay stupid, I decided on a split approach (use entities AND models). It struck me as the most effective way of separating the layers of the architecture, though it is certainly not RAD-friendly. Funnily enough, I had unknowingly used this same approach earlier, when I was not yet using Doctrine but ZF2’s Db abstraction layer.
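The split can be sketched as a smart model wrapping a dumb entity. InvoiceEntity, InvoiceModel and the payment rules are hypothetical and only meant to show where the data and the logic each end up.

```php
<?php

// Dumb entity: what the ORM hydrates and persists.
class InvoiceEntity
{
    public $amountCents = 0;
    public $paid = false;
}

// Smart model: wraps one entity and carries the business rules.
class InvoiceModel
{
    private $entity;

    public function __construct(InvoiceEntity $entity)
    {
        $this->entity = $entity;
    }

    public function getAmountCents(): int { return $this->entity->amountCents; }
    public function isPaid(): bool { return $this->entity->paid; }

    // Rule (assumed): an invoice can be paid once, and only in full.
    public function pay(int $cents): void
    {
        if ($this->entity->paid) {
            throw new DomainException('Invoice is already paid.');
        }
        if ($cents !== $this->entity->amountCents) {
            throw new DomainException('Partial payments are not accepted.');
        }
        $this->entity->paid = true;
    }
}

$entity = new InvoiceEntity();
$entity->amountCents = 1250;
$invoice = new InvoiceModel($entity);
$invoice->pay(1250);
```

The price of the split is some duplication (getters on the model that simply forward to the entity), which is the not-RAD-friendly part.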

 

Conclusion

I hope I have explained my preferred application architecture and reasons for choosing it well enough for anyone to understand. Comments and constructive criticism are welcome.

 
