
How do you document the logical data model of a document-oriented database like MongoDB?

For relational databases, Entity-Relationship diagrams or UML class diagrams are used.

What is the practice for NoSQL document databases?

In particular, how do you describe the existence of nested entities versus referenced entities?

Also, how do you highlight the data redundancy that was introduced by design?

  • You can find the answer to your question right here on SO: stackoverflow.com/a/67532753/3723423
    – Christophe
    Commented Mar 31, 2022 at 10:57
  • In that answer, there are three bullets at the end regarding the document-to-object mapping strategy: the first is about referenced entities, the second about nested entities. If anything is unclear, you could comment under the answer on SO.
    – Christophe
    Commented Mar 31, 2022 at 11:27
  • Thanks for the pointer! It is relevant but it does not answer my question on how to 'draw' (document) those choices regarding the embedding / linking of entities. Basically, how to represent the logical data model in UML.
    – ssn
    Commented Mar 31, 2022 at 17:41
  • The first sentence of the accepted answer already makes a recommendation: "You can use UML class diagrams to model entities and aggregates of an application domain, regardless of the implementation technology." The last sentence of the first paragraph gives you the reason why: "Objects are stored in the database are kind-of dehydrated (i.e. object data without their behavior) into a document." I'm not sure what else we can recommend. A UML class diagram is how I would do it. You are documenting a data structure. Commented Mar 31, 2022 at 19:20
  • If you want entity relationships, you could use Neo4j, which does have clear relationship definitions, and can store arbitrary labels in nodes rather than as documents. Otherwise, Mongo doesn't really have this concept, but you can still use nested document objects or ID references to external documents, and those can still be represented as UML aggregation/composition relationships. Commented Mar 31, 2022 at 20:46

2 Answers

1

Nesting data is an artifact of how data is stored in a document database — the physical data model. Translating that to a logical data model requires you to reorient how you think about the data in terms of relationships rather than "nesting." However you choose to document your logical data model, remember that you are describing relationships between data, and nesting is a kind of relationship. To me, nesting implies ownership. Given that nested data structures are "owned" by their parent structure, I would recommend a UML class diagram showing the aggregate associations.

UML aggregate associations come in two main flavors. Composition, drawn with a solid-filled diamond, implies "ownership"; shared aggregation, drawn with a hollow diamond, doesn't. The diamond sits at the end of the entity that holds the other one. A diamond with a solid fill color would probably be most useful in your case, because this shape claims that one entity "owns" another. This is the closest I can think of to a "nested" relationship.

For example, say your "Customer" document is in this structure:

{
    "FirstName": "...",
    "LastName": "...",
    "Email": "...",
    "Purchases": [
        { ... },
        { ... },
        { ... }
    ]
}

The relationship between the parent entity "Customer" and the nested collection "Purchases" could look like this in a UML diagram:

UML diagram showing an aggregate relationship between customer and purchases using a solid fill diamond to denote the nested relationship.

The 1..* notation under the line between the two entities indicates the multiplicity of that relationship. Read literally, 1..* means that one customer has one or more purchases; if a customer can have zero or more purchases, write 0..* instead. The diamond shape is resting against the "Customer" entity, because that is the entity that contains the nested purchases.
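
As a rough sketch of what that composition means in object terms, the Python dataclasses below mirror the "Customer" document. The `Purchase` fields (`item`, `amount`) and the sample values are invented for illustration, since the document above elides the purchase contents.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Purchase:
    # Hypothetical fields: the source document does not show them.
    item: str
    amount: float


@dataclass
class Customer:
    first_name: str
    last_name: str
    email: str
    # Composition: purchases live inside the customer and have no
    # lifetime of their own. An empty list corresponds to 0..*.
    purchases: list[Purchase] = field(default_factory=list)


customer = Customer("Ada", "Lovelace", "ada@example.com",
                    [Purchase("notebook", 4.5)])

# asdict() recurses into nested dataclasses, giving the nested document shape.
document = asdict(customer)
```

The resulting `document` is a plain dictionary with the purchases embedded in a list, which is the shape a document database would store.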

For other cases where your data structure only has one object being nested, e.g.

{
    "Person": {
        "Name": "...",
        ...
    },
    ...
}

You would still use the solid fill diamond, but change the multiplicity to indicate that only one "Person" is nested: 1..1. This sort of notation might make the nesting less obvious, but it would translate more easily to an object-oriented model.
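
To illustrate how a 1..1 composition carries over to an object-oriented model, here is a minimal sketch. The parent class name `Account` is an assumption, because the document above does not name its root entity.

```python
from dataclasses import dataclass, asdict


@dataclass
class Person:
    name: str


@dataclass
class Account:  # hypothetical name for the unnamed parent document
    # Multiplicity 1..1: exactly one Person, so a plain attribute, not a list.
    person: Person


doc = asdict(Account(person=Person(name="Ada")))
```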

To help clarify the meanings, consider adding a Key to your diagram to describe what this notation means for a document database.

2

Why UML is a good candidate for document modeling

Document databases store collections of data objects that have a dynamic, self-describing structure. MongoDB and many other such databases use JSON or its binary equivalent BSON for data objects.

There is no modeling technique dedicated to document databases. Moreover, traditional ERD techniques, including extended ERD, are not well suited for dynamic data formats.

UML, on the other hand, is ideally suited for documenting object-oriented models, including data object models: it can model specific object instances as well as general classes of objects. It's not well known, but the UML classifier semantics correspond very closely to what you do when trying to group sets of JSON objects that match some specific expectations.

How does it work?

You will find an explanation about mapping document databases to UML classes here on Stack Overflow. In addition, here are some more practical details about referential links and nested objects:

  • Links to references are usually modeled with an association between two classes, where the second class would correspond to the kind of objects you would expect/require at the other end.
  • Nested objects are usually modeled with a UML composition, with the containing class on the side of the black diamond and the contained class at the other end, and with a multiplicity greater than one if there is a collection of nested objects. UML composition states that if the container is deleted, all the contained objects are deleted as well.
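
The difference between the two bullets can be sketched with plain dictionaries standing in for two collections. All names here (`customers`, `orders`, `customer_id`, `lines`) are hypothetical and chosen only for illustration.

```python
# Two hypothetical collections, keyed by _id as documents usually are.
customers = {
    "c1": {"_id": "c1", "name": "Ada"},
}
orders = {
    "o1": {
        "_id": "o1",
        # Reference (plain association): only the key of the other document.
        "customer_id": "c1",
        # Nested (composition): the lines exist only inside this order.
        "lines": [{"sku": "A-1", "qty": 2}],
    },
}


def resolve_customer(order: dict) -> dict:
    """Follow the reference from an order to its customer document."""
    return customers[order["customer_id"]]


def delete_order(order_id: str) -> None:
    """Remove an order; its nested lines go with it, the customer stays."""
    del orders[order_id]


owner = resolve_customer(orders["o1"])
delete_order("o1")
```

After `delete_order("o1")` the nested lines are gone with their container, while the referenced customer is still present, which is the behaviour the composition and the plain association each describe.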

If you think this might be confusing, you could consider creating a profile in your UML model with some explicit stereotypes for associations, e.g. <<nested>> and <<reference>>.
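
If the classes also exist in code, the same two stereotypes could be carried as field metadata so that diagram and code stay aligned. This is only a sketch: the `stereotype` and `target` metadata keys, the `associations` helper and the example classes are assumptions, not part of UML or of any library.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OrderLine:
    qty: int
    # <<reference>>: stores only the key of a Product document.
    product_id: str = field(
        metadata={"stereotype": "reference", "target": "Product"})


@dataclass
class Order:
    # <<nested>>: order lines are embedded in the order document.
    lines: list = field(
        default_factory=list,
        metadata={"stereotype": "nested", "target": "OrderLine"})


def associations(cls):
    """List (owner, attribute, stereotype, target) for every tagged field."""
    return [
        (cls.__name__, f.name, f.metadata["stereotype"], f.metadata["target"])
        for f in fields(cls)
        if "stereotype" in f.metadata
    ]


order_assocs = associations(Order)
line_assocs = associations(OrderLine)
```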

Logical model?

UML is method-neutral. There is no distinction between a conceptual, a logical and a physical model: it's the same UML notation and semantics.

It's up to you to decide if you want one model or three distinct ones, and what level of detail you want to include. For example, in a logical data model you would represent the classes and the associations, but you would not yet care about implementation details, such as repeating some keys to implement a relation.

  • The use of stereotypes in associations is a good suggestion, but how can the aggregate boundaries be defined in UML? Considering a multi-level nested structure (customer > orders > products), how can the aggregate be defined and associated with the <<nested>> associations?
    – ssn
    Commented Apr 1, 2022 at 9:15
  • @ssn: nested implies ownership. Use a connector between entities that has a solid diamond rather than a diamond with an outline. Don't overcomplicate this. Commented Apr 1, 2022 at 11:29
  • @GregBurghardt Agreed, but I want to be able to distinguish a composition (ownership) that is modeled 1) as two separate documents from one that is modeled 2) as a single document with nesting. I think the stereotyped association is the way to go.
    – ssn
    Commented Apr 1, 2022 at 11:38
  • @ssn Composition is my preferred option, as I use it very sparingly. But it is true that it is not sufficient if you intend to use composition in other circumstances (e.g. to clarify ownership and the need to cascade deletes across reference links). The stereotype is then the solution you need (by the way, I'd use the stereotype on a composition). In most cases this is sufficient. However, if a class could be used in different situations, once with nesting and once without, it can be difficult to set the nesting boundaries. There is no easy solution, except to work with specializations.
    – Christophe
    Commented Apr 1, 2022 at 15:46
  • @ssn With this last-resort approach, I cannot imagine a situation that can't be dealt with. If you have such a case, as this question is for the general case, you could consider opening a new, more specific question with an example.
    – Christophe
    Commented Apr 1, 2022 at 15:50
