
Right now we have a monolith application, and we will probably still have a monolith for the next few years. What we are trying to do, though, is to split the responsibilities more clearly into well-defined bounded contexts. For this question, let's say we have two bounded contexts:

  • Company bounded context: in this bounded context we handle companies and their users.
  • Objects bounded context: this is like a CRM, where we handle a ton of different objects that can be extended following certain rules, etc.

The main thing here is that Objects can be owned, modified, and created by users in a company, so we save that data in the Objects bounded context. However, the User aggregate contains far more data than the Objects bounded context needs. For instance:

  • To store who created an Object, we just need the user id. Same with who updated it, who is the owner, etc.
  • In some report operations, we do need the email of the user, because instead of showing the user id in the CSV report, we prefer to show the email.

Right now we are sharing the same class in both bounded contexts, using just a direct import. Nonetheless, this has one problem: when testing, we need to create User instances that have too much data that we do not care about, making our tests harder to write, read and maintain.

To fix this, we are thinking about decoupling, but we are not sure how. Our main idea is to create a repository for users in the Objects bounded context and use a different representation there. This repository would not update any data, because read-only access to Users is more than enough. Something like this:

companies/
    domain/
        users/
            user.py # Full User object with all the data
            users_repository_interface.py
    application/
        users/
            users_repository.py # This handles full User objects

----------------------------------------------------------------------------------

objects/
    domain/
        users/
            user.py # minimized version of the user
            users_repository_interface.py
    application/
        users/
            users_repository.py # Acts as an anti-corruption layer: calls the
                                # Company repository under the hood and converts
                                # the full User into the reduced representation
                                # of this bounded context

We are not sure if this is the correct approach to decoupling. Maybe the class does not need to be a repository and could be called something else, like a connector. Any ideas/recommendations on how to approach this?

Note: One justification for this approach is that right now the Objects SQL table has a foreign key to Users. We would also like to remove that and use a plain field (varchar or int) to store the user id, without the foreign key constraint. This would make our integration tests in the Objects bounded context much easier to write, read and maintain.
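To make the proposal concrete, here is a minimal sketch of the reduced representation plus the anti-corruption repository described in the tree above. All class and attribute names are illustrative assumptions, not the actual codebase:

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class ObjectsUser:
    """Minimized, read-only view of a user inside the Objects bounded context."""
    user_id: str
    email: str


class ObjectsUsersRepository(Protocol):
    """Interface the Objects bounded context depends on."""
    def find_by_id(self, user_id: str) -> Optional[ObjectsUser]: ...


class CompanyBackedUsersRepository:
    """Anti-corruption layer: wraps the Company context's repository and
    translates its full User aggregate into the reduced representation."""

    def __init__(self, company_users_repository):
        self._company_users = company_users_repository

    def find_by_id(self, user_id: str) -> Optional[ObjectsUser]:
        full_user = self._company_users.find_by_id(user_id)
        if full_user is None:
            return None
        # Only copy the fields the Objects context actually cares about.
        return ObjectsUser(user_id=full_user.id, email=full_user.email)
```

Tests in the Objects context can then build small `ObjectsUser` instances directly, or stub the `ObjectsUsersRepository` protocol, without touching the full Company aggregate.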

4 Answers


Right now we are sharing the same class in both bounded contexts, using just a direct import. Nonetheless, this has one problem: when testing, we need to create User instances that have too much data that we do not care about, making our tests harder to write, read and maintain.

As @Ewan writes in another answer, you should not duplicate the User data in other databases/repositories/ARs. And that is especially true if you want to duplicate the data to solve a testing problem.

To solve a problem with dependencies during testing, you should look for a solution within your testing environment.

The easiest solution here is that the User aggregate also offers a factory function that can be used by the test cases to obtain a User instance with reasonable defaults for all fields. If a particular test case needs a specific value in one or more fields, they could either specify that in the call to the factory function, or they can modify the User object afterwards.
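Such a factory function could look like the following sketch; the User fields and default values are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    email: str
    name: str
    company_id: str
    is_active: bool


def make_user(**overrides) -> User:
    """Build a User with reasonable defaults; a test overrides only the
    fields it actually cares about."""
    defaults = dict(
        user_id="user-1",
        email="user@example.com",
        name="Test User",
        company_id="company-1",
        is_active=True,
    )
    defaults.update(overrides)
    return User(**defaults)
```

A test that only cares about the email then reads `make_user(email="ceo@acme.test")`, and every irrelevant field stays out of the test body.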


Unlike Ewan, I think that data duplication is a valid solution here.

You are simply creating a local read model that is updated based on events from the source domain. This is a great way to decouple the domains: the domain using the local copy does not care where the user data comes from, as long as the events are correct. It also makes debugging and testing easier, as the consuming domain can run without the source domain being available or accessed.

It also gives you a way to optimize the read model for the domain's specific use case, instead of having to communicate your specific query and storage needs to the source domain.
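A local read model fed by events could be sketched like this; the event name and in-memory storage are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UserEmailChanged:
    """Event published by the Company (source) domain."""
    user_id: str
    new_email: str


class LocalUserReadModel:
    """In-memory stand-in for the Objects context's own user projection."""

    def __init__(self) -> None:
        self._emails: Dict[str, str] = {}

    def handle(self, event: UserEmailChanged) -> None:
        # Project the event into the local copy; the Objects context never
        # queries the Company context directly.
        self._emails[event.user_id] = event.new_email

    def email_of(self, user_id: str) -> Optional[str]:
        return self._emails.get(user_id)
```

In tests, the consuming domain is exercised by feeding it events directly, with no Company-side code involved.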

    I would go a bit further. Data can be duplicated, but it doesn't need to be materialized as a new persistent, read-only User entity in the Objects domain. User ids and user emails are fields/attributes of Reports, Objects, etc. It is worth investigating whether the Objects context really needs a persistent, read-only User entity. If not, the problem vanishes.
    – Laiv
    Commented May 10, 2023 at 7:47

In addition to @Euphoric's answer.

and we will probably still have a monolith for the next few years

Fine. That doesn't mean you cannot think in the long run. For example, approach each bounded context as if it were a separate module or library.

Each module:

  • represents a domain, so get rid of the application layer of each module, because the notion of application is foreign to the domain.
  • the outermost layer is no longer application but service. Modules expose domain services as entry points. Think also of factories, as @Bart van Ingen Schenau suggests. In a nutshell, provide each module with an API (application programming interface).

Now

  • Take modules out of the monolith.
  • Make them standalone codebases. This will highlight the interdependencies among boundaries, forcing you to introduce abstractions. Think of abstract classes, interfaces, ports, etc. (Sorry, I'm unfamiliar with Python.)
  • Test modules apart from each other
  • Add the modules to the monolith as dependencies.

Now your monolith has, at least, 4 layers:

  • configuration (setup)
  • infrastructure (adapters and integrations)
  • business (the layer operating with domains)
  • application (the layer allowing clients to operate the business).

The business layer is the one solving the interdependencies among bounded contexts. For example, fetching users' data to complete reports or objects' data.

To store who created an Object, we just need the user id. Same with who updated it, who is the owner, etc.

Good, Object holds a reference, not a whole user. So far so good.

In some report operations, we do need the email of the user, because instead of showing the user id in the CSV report, we prefer to show the email.

Good, Report holds a copy of the email. So far so good.

Unless the Objects domain needs a full and up-to-date representation of User, there is no reason to have a persistent, up-to-date, read-only User entity in the Objects domain.

Wait, isn't that duplicated data? Yes, it is. Isn't that a horrible idea? Not necessarily.

Ask yourself the following question: should changes to User propagate (magically) to other contexts, or should you handle them yourself? For example: are users' emails in Report immutable?

  • Yes? Keep a copy of the email in Report.
  • No? Keep a copy anyway. Whenever the user's email changes, handle the change and update all the reports. But do it yourself, by code. Have each boundary handle changes from external sources explicitly, so the state of the bounded context remains deterministic.
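Handling the change "yourself, by code" could look like this minimal sketch; the Report shape and in-memory storage are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportRow:
    user_id: str
    user_email: str  # denormalized copy of the user's email


def on_user_email_changed(rows: List[ReportRow], user_id: str, new_email: str) -> None:
    """Explicitly propagate an email change from the Company context into the
    stored report rows, instead of relying on a foreign key or a join."""
    for row in rows:
        if row.user_id == user_id:
            row.user_email = new_email
```

Because the update is an ordinary function in the boundary, tests can drive it directly and assert on the resulting state.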

The simple answer is no. You shouldn't duplicate the User data in other databases/repositories/ARs.

Doing this and keeping it in sync with the 'main' user is just asking for trouble. What happens when you change the schema? Or delete a user? Or do some update that results in a duplicate?

You are allowed to put user ids in other objects as much as you like, without the FK constraint. This is encouraged as a third-party key. But you have to allow for orphaned records.

You are allowed to put user data such as the email in other objects. Just because it's part of the user doesn't mean it can't also be part of other objects. But don't try to keep them in sync.

You are allowed to have data lakes/swamps/warehouses with denormalised copies of everything for reporting.

  • Thanks for your answer! But my problem is not about storing the data, it is about the code in the bounded context. Let me know if I should clarify the question more. Commented May 9, 2023 at 22:35
  • Reading between the lines on your question I think you are conflating bounded context objects with the data model and persistence. If I said, 1. you are not allowed to inject repositories into domain objects, and 2. your methods have to go in domain services rather than domain objects. Would "User" still be in both BCs? or would you have CompanyBC.User and ObjectBC.ObjectCreationService(string userId, string idSourceSystem, string email)?
    – Ewan
    Commented May 10, 2023 at 16:36
  • in the second case you can obviously have some program which loads both BCs, takes data from one and calls methods on the other. You can even have two different user objects and a mapper. The question is are you going to try and keep both in sync? If you do then you need some sort of sync service that runs and does the same thing, loads from one context and saves to the other, you can't share the repo/AG between contexts without breaking the boundary
    – Ewan
    Commented May 10, 2023 at 16:40
