0

I am building an application that requires a lot of tables in one database and while the joining and aggregation of data is really nice and seamless, I am starting to wonder if I am creating too many tables in one database rather than organizing better by creating multiple databases.

  1. Is this standard practice in enterprise level companies?
  2. How do you usually join data from two different databases if its normal to do so?
  3. Does this cause latency issues?

Any help or guidance would help,

2 Answers 2

3
  1. Is this standard practice in enterprise level companies?

Typically the natively generated data of an application doesn't span multiple databases, especially an enterprise level application, unless there's a special use case for doing so. For example, some ERP (Enterprise Resource Planning) systems store files (such as PDF attachments) in the database directly as opposed to in a file share on disk. As a way to mitigate imploding the size of the main database for the application, they'll store those files in a special separate database. Storing files in the database is generally a bad design choice to begin with, but that's besides the point of your question.

There's nothing terrible about storing the data between multiple databases, it's just there's not much benefit to be gained from doing it, and there are a few drawbacks, besides it being a bit awkward of a design. Also there are alternative solutions.

Drawbacks

  1. Cross-database security can be a bit of a pain to manage, depending on the business needs.
  2. Referential integrity goes out the window a bit, since a foreign key can't be created across databases.
  3. Backups can become a little bit more of a pain to manage if the application depends on consistent data across the multiple databases.

Alternative Solution

Use schemas within the same single database as way to organize the data objects better and also improve security management at a more granular level. Backups are now atomic as well, since everything lives under a single database.

Schemas are a great way to categorize related data objects together. For example, in an ERP system, there might be application modules and screens for Payroll, Production, and Sales. In the database layer, each of those domains could be its own schema in the same ERP application database. Then your tables could look like Payroll.Employees, Payroll.Timecards, Production.Products, Production.Components, Sales.SalesOrders, Sales.SalesLines, and Sales.Customers, etc.

  1. How do you usually join data from two different databases if its normal to do so?

All the previous being said, sometimes you'll have a use case to join data together from multiple databases anyway, such as when you're pulling in other systems' data together into a single place to be referenced from your application. When that is the case, what I like to do first is create a separate schema for each source database, e.g. SourceDatabaseA, SourceDatabaseB, etc, in my application's database. Then create views in those schemas that do a cross-database reference to the object from the correlating source database.

For example:

-- This view makes a cross-database reference to the native source database
CREATE VIEW SourceDatabaseA.SomeDataObject
AS 

SELECT
    Field1,
    Field2,
    Field3
FROM SourceDatabaseA.dbo.SomeDataObject;

This gives a layer of abstraction to normalize the names and / or data types from the source database, and inject any global logic. Finally I'll create a schema that combines the multiple source views together, with a representative name of the domain for what these objects belong to. E.g. again if I was pulling in the Sales data from multiple applications' databases then I'd have a Sales schema and create a view that unions all the sources together in this schema like so:

-- This view references only other data objects (views) that live in my application's database
CREATE VIEW Sales.SalesOrders
AS 

SELECT
    Field1,
    Field2,
    Field3
FROM SourceDatabaseA.SomeDataObject

UNION ALL

SELECT
    Field1,
    Field2,
    Field3
FROM SourceDatabaseB.SomeDataObject;
  1. Does this cause latency issues?

No, cross-database queries, or data objects that reference cross-database objects don't incur any additional measurable overhead / latency.

-3

Depends on what kind of output you want/are expecting.

this idea might be useful to get general insights or reports or simply data cleaning.

0

Not the answer you're looking for? Browse other questions tagged or ask your own question.