1

I believe this forum is the correct one for my question, based on the community guidelines, but let me know if not. It seemed like the best fit.

I am performing a data modeling exercise to formulate a conceptual model of our business to support our requirements. An obvious step is discovering the entities.

My challenge is deciding whether some of the concepts that exist in our current tools are entities in their own right and, if they are, whether we should make a distinction between them. Or are they simply implementation details?

An example:

  • An old-fashioned newspaper would likely have entities like Publication, Article, Advert, Editor, Writer, Copy, Photograph etc.
  • 50 years ago the technology to print those things would have included things like a matrix and castings.
  • Today perhaps it's a different technology with master plates.

My question is: are these implementation things (matrices and castings, or master plates, depending on when you did the exercise) true entities or not?

They're clearly conceptual things. But they are also implementation details that can and will change over time, whereas articles and adverts are stable entities.

Our problem space is content management, and so the question becomes are implementation "things" in the CMS (like a "copy-asset") entities in the same way that webpages and footnotes are? Clearly if we replace the CMS those things won't exist anymore, but we'll still have webpages and footnotes!

---------- Additional Edits adding more context based on provided answers -----------

As mentioned, context is key, so I am adding detail to respond to some of the thoughtfully provided answers below.

  • The problem space is a content management software for publications.
  • There is a preexisting custom-built (and dated) legacy system that does little to abstract the technical implementation from the users (content managers). For instance there is no concept of "Article" in the CMS, but there are things like "copy_asset", "document_segment", and "document" (which go as far as surfacing keys to the users) that the users must piece together to form an article, or an advert, or an... etc etc
  • We are planning to replace the existing CMS as it's overly complicated (people spend more time working around the nuances of the tool than actually managing content), inefficient and error prone.
  • As part of our initial analysis we are modeling the existing workflow and "What" is done (free of "Who" and "How").
  • Subsequent modeling will then identify "Who" performs those processes today and "How", which we'll analyze before moving on to designing the future desired state.
  • To support that workflow modeling exercise we are working towards identifying the data model related to this work (not written down anywhere today)
  • And now to the question. And where I'm going back and forth. If we were to simply come up with a Conceptual Model from scratch based on "What" the business does (which in my mind reflects the entities that are truly important to the business) I wouldn't expect to see entities like "copy_asset", or "document_segment" as they are implementation details that this particular CMS happens to surface to the users. For instance if we happened to use a different CMS they would definitely not appear. But when we ask the users "What are the things you work with" then of course they raised these. And of course the things they implement, like "Article", "Advert", "Headline", "Byline" etc were raised as well.
  • Essentially our (legacy) Physical Model — which was created years ago — does not align very well with the Conceptual Model of what the business does (hence why the tool is complicated to use)
  • So the question becomes: should we exclude these technical elements ("copy_asset", "document_segment") from our Conceptual Model? That would then highlight the discrepancy between the Conceptual and Physical models, which we'll need to address.
  • Or perhaps I'm overcomplicating this and I should just create the AS-IS data model and then, as with the workflows, come up with a "TO-BE" data model with entities that better reflect the true work that is done.
5
  • 1
    What is is in the eye of the beholder. What is relevant is what a user wants to ask about. You don't make that clear. A retail user interacting is different than a helpdesk employee is different from an admin is different from a programmer. But where & how are you stuck following what information modelling & DB design method from what reference/textbook? PS ER modelling makes arbitrary distinctions between what is an entity & what is just a value. Best design approaches are the "fact based". Eg ORM2 & its ancestor NIAM (when properly presented, with n-ary facts).
    – philipxy
    Commented Sep 23, 2023 at 21:40
  • Your problem space is content management, so are you building software to manage content for a print publication? Commented Sep 24, 2023 at 14:11
  • Yes it is. I provided additional context to the question to better reflect this.
    – Steven
    Commented Sep 24, 2023 at 17:48
  • 1
    @Steven what will this conceptual model be used for, when and by whom? Models are not made to capture some kind of philosophical essence of things but to serve a purpose in a project :) Commented Sep 25, 2023 at 7:54
  • 1
    The conceptual model or the data model will be used, in conjunction with the To-Be workflow models, as key documents to inform and support requirements for the new CMS. As such it will be used by the owners of the Product and Development teams. The reason for my focus on the philosophical essence, as you stated it, is to get to the heart of what the users and business really do, and not cloud it with implementation details from a dated platform.
    – Steven
    Commented Sep 26, 2023 at 6:10

5 Answers

9

Your original question gave the impression you were trying to model entities without knowing how they will be used. That's a classic road to hell, paved with good intentions.

Data modeling requires context. The same entity can be modeled from several different perspectives, at different levels of abstraction and with different kinds of generalization. Use cases are the litmus test to validate whether you modeled the "right" entities, the ones which are required in your system, and also whether you modeled them at a sensible level of abstraction. That's why entity modeling and use case modeling should go hand-in-hand.

In my experience it is most successful to develop a system in iterations:

  • find and describe a use case
  • model a feature which supports the use case (which may include new entities, or changes to existing entities)
  • implement the feature in an application (and correct the errors made in the two former steps)

Data modeling alone only works to a certain degree. Without a way of regularly making a "reality check", it can easily become a mere mind game.


Now some words on your edited question: it seems you have your business context and probably some use cases, though your original question and specifically the examples you gave did not explain your issues well. My main recommendation here is still to design your entities iteratively as you go, and implement parts of your new CMS, so you can validate your new data model.

In your specific case, it seems one of your issues is that users may have to deal with parts of an article, things which are currently named "document_segment" or "copy_asset". Your real problem here is that your users are currently trained to use those old terms, though you expect those terms will not be required in the new system.

My approach here would be to sketch and implement (!) some UI prototype of the new CMS, maybe without any "document_segment", and then only model the entities you need for this. When this works so far, add more use cases and features and extend your business model accordingly. Maybe later you will reach the point where you need an entity for something like "part of an article", maybe not, I don't know, but when you reach that point, I am pretty sure you can easily work out how to name it and what it needs to look like.
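
A throwaway spike of this kind can be very small. The following Python sketch is only an illustration: the `article` table, its columns and the "publish an article" use case are assumptions made up for the example, not taken from your system. It shows a first data model being checked against one use case, with no "document_segment" in sight.

```python
import sqlite3

# In-memory database: nothing here is meant to survive the experiment.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE article (
        id       INTEGER PRIMARY KEY,
        headline TEXT NOT NULL,
        byline   TEXT NOT NULL,
        body     TEXT NOT NULL,
        status   TEXT NOT NULL DEFAULT 'draft'
    )
    """
)


def add_article(headline: str, byline: str, body: str) -> int:
    """Use case step 1: a content manager drafts an article."""
    cur = conn.execute(
        "INSERT INTO article (headline, byline, body) VALUES (?, ?, ?)",
        (headline, byline, body),
    )
    return cur.lastrowid


def publish(article_id: int) -> None:
    """Use case step 2: the article is released for publication."""
    conn.execute(
        "UPDATE article SET status = 'published' WHERE id = ?", (article_id,)
    )


def published_headlines() -> list[str]:
    """Key query: what would appear in the publication right now?"""
    rows = conn.execute(
        "SELECT headline FROM article WHERE status = 'published' ORDER BY id"
    )
    return [row[0] for row in rows]
```

If a use case later turns out to need something like "part of an article", that is the moment to add an entity for it and rerun the same kind of check.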

2
  • Thanks. I added additional context to help clarify my original, too vague, question. As you can see data modeling is not the sole activity, but supporting a workflow modeling and redesign effort.
    – Steven
    Commented Sep 24, 2023 at 17:49
  • +1 In some cases, it may make sense to implement something resembling the feature in a quick spike solution first, to demonstrate the usefulness of the modeling before going any further down that path. For example, if you're developing a new database-backed application, it's sometimes useful to write and test a few key queries as a standalone experiment to see how the data model works for some of your most important planned use cases before trying to implement the feature "for real." Commented Sep 24, 2023 at 19:41
3

An obvious step is discovering the entities. [...] They're clearly conceptual things. But they are also implementation details that can and will change over time (whereas articles and adverts are stable entities)

"Discovering" is completely the wrong way to think about it.

Concepts are not generally discovered, like Egyptian tombs or footings of long-lost buildings. They exist as a result of a process of design - a human cognitive process which occurs constantly.

Any existing concepts in a business are designed to fit the perceived operational needs at the current time - largely the needs of organising the people involved and allowing them to communicate with one another.

Those concepts aren't intended to be stable for all time, and many concepts which may be necessary for computer programming may not exist when handled by human staff (because the staff achieve their work differently than how the computer will work), or will be fully implicit (unacknowledged, and having no familiar terminology).

Also, existing conceptualisations are not always particularly sound even in their own terms. People can leave a lot of bridges to be crossed only when they come to them, which of course is inconsistent with programming a computer to (as much as possible) make progress automatically and independently from human intervention.

So any kind of computerisation often involves the developer performing a redesign of any existing system of concepts.

Even if you accept that things like "articles" and "adverts" will always exist, different incarnations over time may be unrecognisable. A radio advert is nothing like a paper advert in form - the only nexus is the shared purpose of inducing commercial demand. Computer programming consists only of specifying concrete forms of data storage and physical processing, not expressing the human purposes for which that storage and processing occurs.

A paper advert and a radio advert - despite the same name - involve forms of physical storage and processing that are completely different in every possible respect.

It's a sad fact that the relationships between human purposes, human conceptualisations, concepts expressed through natural language, concepts expressed in programming languages, and the true nature and role of computers and software in human affairs are poorly studied and articulated, and it's why any kind of "modelling" in software remains a craft activity.

2

Generally speaking, an "Entity" is some set of data that you give a name and an Id, and of which you save multiple instances to your database.

So, to use your newspaper example, "Article" is a good entity: it has a set of related data (author, text, date) and you have multiple of them: 1, 2, 3, etc.

Now a printing press could also be an entity, say if you have 10 of them of different makes and models. But it's more likely to translate into an object with methods, e.g. Printer.Print(article)

If you are doing classic OOP it can get confusing, as you would be pushed to write Article.Print(), but dependency injection comes to save you with:

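interface IPrinter
{
   void Print(string words);
}

class Article
{
   public string Id;
   public string Words;
   private readonly IPrinter printer;   // injected, so Article does not know how printing works

   public Article(IPrinter printer) { this.printer = printer; }

   public void Print()
   {
      printer.Print(Words);   // delegate to whichever printer was injected
   }
}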

And you have now clearly separated the two again.

Entities tend to be more stable than processes. You might add a new field now and again, but the paper from 100 years ago still had articles and authors, whereas the processes around publishing them change on a weekly basis:

  • We should allow people to pay for big 'articles' which are mainly a picture of cigarettes or washing powder!
  • Articles should be spell checked!
  • I want to print articles in columns!
  • Don't print the big picture articles when the user is paying us a subscription!

etc

The article stays much the same but IPrinter has multiple implementations
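
To make that concrete, here is a minimal sketch in Python rather than the C#-style code above. `PlainPrinter` and `ColumnPrinter` are hypothetical names invented for this illustration, and the methods return the rendered text instead of sending it to a real press. The `Article` class never changes while the printing behaviour does.

```python
import textwrap
from dataclasses import dataclass
from typing import Protocol


class Printer(Protocol):
    """Stands in for IPrinter: anything that can render article text."""

    def print(self, words: str) -> str: ...


class PlainPrinter:
    """Hypothetical printer that leaves the text as it is."""

    def print(self, words: str) -> str:
        return words


class ColumnPrinter:
    """Hypothetical printer for the 'print articles in columns' request."""

    def __init__(self, width: int) -> None:
        self.width = width

    def print(self, words: str) -> str:
        # Wrap the text so that no line is longer than the column width.
        return textwrap.fill(words, width=self.width)


@dataclass
class Article:
    id: str
    words: str
    printer: Printer  # injected; Article does not care which one it gets

    def print(self) -> str:
        return self.printer.print(self.words)
```

Swapping `PlainPrinter()` for `ColumnPrinter(width=12)` changes the output of `Article.print()` without touching the entity's data.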

------------- Post Question Edit -----------------

Given the above, I'd be surprised if the data model has changed much. You still have to save the same underlying document, so I would expect the old objects to map to equivalents in your new one. Maybe document_segment maps to Paragraph, maybe they all just join together into one Article, maybe copy_asset is an Article with status = "ready for publish", etc.

In any case you should create either an "Anti Corruption Layer" to hide the old implementation and prevent it seeping into your new design, OR a converter which will take all the old data and save it in the new format.
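
As a rough sketch of the translation step, in Python: only the names "document" and "document_segment" come from your question. Every field on the legacy classes, and the idea that segments join in `position` order, is a guess made up for illustration. The same function could sit inside an Anti-Corruption Layer or be run once as a converter.

```python
from dataclasses import dataclass


# Legacy shapes. The fields are hypothetical; the real ones will differ.
@dataclass
class LegacyDocument:
    key: str
    title: str


@dataclass
class DocumentSegment:
    document_key: str
    position: int
    text: str


# New-model entity, named after what the business actually works with.
@dataclass
class Article:
    id: str
    headline: str
    body: str


def to_article(document: LegacyDocument, segments: list[DocumentSegment]) -> Article:
    """Translate legacy records into an Article so new code never sees them."""
    own_segments = sorted(
        (s for s in segments if s.document_key == document.key),
        key=lambda s: s.position,
    )
    return Article(
        id=document.key,
        headline=document.title,
        body="\n\n".join(s.text for s in own_segments),
    )
```

Everything on the new side of `to_article` can then be written purely in terms of `Article`, whichever of the two options you pick.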

1

This boils down to a question of semantics.

If [thing] is not particularly part of your codebase but it is relevant for the persistence technology that your codebase uses, is it therefore an entity or is it part of the persistence tech library?

Furthermore, where do we draw the line? If the persistence technology somehow forced us to model our entities (i.e. the things we can definitely agree on as entities) in a certain way, does that now make the entity less of an entity? Is there a ratio of entity vs tech library implementation that can shift where this class belongs?

I think these questions are philosophical and hard to productively answer. I would offer the following conclusion:

  • Yes, we can acknowledge that there is another end of the spectrum where a particular implementation detail is not necessarily an "entity" anymore. For the purpose of this answer I'll call them "tech models".
  • Let's not get sidetracked by trying to draw an arbitrary line on that spectrum.
  • As a baseline, it's fine for these tech models to live with the entities.
  • If there is value to abstracting these tech models, e.g. because you have multiple codebases that implement the same persistence technology and you want to be able to lean on some reusability here, it would be perfectly acceptable to migrate these tech models to some kind of library/package that can be shared between your codebases.
    • Entities should generally not migrate towards these kinds of libraries/packages since entities are defined by the codebase they belong to.
    • The exercise as to which classes get to move to libraries/packages and which don't (i.e. when they're on the spectrum but not on either clearly delineated end) should be judged on a more concrete case-by-case approach instead of trying to divine a general rule that fits every possible scenario.

That's where I'd leave it, for the sake of productivity.

1

Are technical concepts within a tool that implement a business entity entities as well?

Since the model you're designing defines a shared language between product owners and devs, no, these things shouldn't be in there.

If the domain experts don't understand the model, there is something wrong with the model.

Trying to align the vocabulary used in a purely technical model (code, technical documentation) with the real language of the business is also a good thing, although not always feasible. Devs constantly translating back and forth between domain concepts and their equivalent in tech lingo is unproductive and prone to leaks that will muddle collaboration with domain experts and affect user experience.
