Database modeling, software development, and version control

Question

Here is the framework

We have an SQL database in which our database model lives in. On the other side, there is a bunch of code using and filling that database.

Here is what we want

We want to put both, the database model and the software code, under version control to be able to go forth and back within different versions.
Since our database model becomes nasty, we want a graphical tool that assists us with the logical database modelling and that creates our SQL DLL scripts

This post is about the question how to get both of these requirements satisfied at the same time. Personally, I tend to skip the requirement for the graphical tool and to simply write our hole database model within SQL scripts and add them to git. However, its very hard to maintain the model as the model becomes more complicated and (easy) logical changes can be huge code changes.

Here is our current "solution"

Our software code is under version control with git. Our logical database model on the other hand is maintained with PowerDesigner, which brings (with some limitations) its own version control. At the end of the day, we generate our SQL DDL statements from the logical model.

Here are our problems with the current situation

Eventhough the code is under version control, the development of the code is somehow disconnected from the development of the database model. A change of an object in the logical database model (like the change of a column name) is not visible in git. Even if we checkout a certain version in git, an additional mapping is needed that tells us which version of the logical database model I have to use. Moreover, branches in git cannot be mapped to branches within PowerDesigner, which makes development chaotic.

Here are some unification-ideas we came up with

Exporting the logical model from PowerDesigner as (XML) and put it under version control in git.

Here, I know which version of the code relates to which version of the PowerDesigner, since I simply can import the XML back. However, the git diffs on the XML are useless and merging to different logical models in git is hopeless. Thus, there has to be a merge within git and within PowerDesigner. Moreover, I have to export the model to an XML for every commit that involves changes on the database model (which are unfortunately quite frequent) and which slows down the development process a lot. Moreover, bugs are hard to track, because I have to search my code with git and my logical model within Powerdesigner seperately.

Put the database scripts generated by PowerDesigner under version control.

Here, the git diffs have a meaning, and I can relate changes of the code precisely to changes of the logical model. But, I cannot go back a version within PowerDesigner just with the generated database scripts. So, again, I need to export the XML of the model as well and I get the same problems than above.

Here are my questions

How do you connect the world of code, the world of logical database modeling, and the world of version-control?
Is it feasible for complex databases to quit on a graphical tool and to code everything with SQL scripts?
What are good working models to maintain these two repositories and to coordinate the developers?

How big is (or would be) your database? dozen of thousand rows and a few megabytes, or billions of rows and many gigabytes? — Basile Starynkevitch, Commented Feb 11, 2017 at 13:49
@BasileStarynkevitch: We have two types of tables: Tables that contain control information for the application and tables that contain the "real" data. The control tables are rather small, but the data tables will have several gigabytes. — Tobias Windisch, Commented Feb 13, 2017 at 21:03

Joppe · Accepted Answer · 2017-02-11 13:43:26Z

You should look into database migrations, it's a common strategy used on databases.

Basically you associate your codebase with a database schema version. To keep your schema up to date you maintain a chain of incremental change scripts each carrying a sequential version number.

This has the added advantage that you can use the scripts to upgrade an existing database with user data in it (your SQL create scripts will not work there).

You include the 'delta' migrations scripts in you version control system, along side your codebase. When someone gets the latest version they will also get all the scripts they need to execute to make their database current.

You can sometimes still use your visual tooling if you can somehow extract diff scripts, there are software tools that can help you with this.

see for example: https://martinfowler.com/articles/evodb.html#AllDatabaseChangesAreMigrations

Robert Baron · Accepted Answer · 2017-02-11 15:22:22Z

2

I would definitely store the SQL used to build the data without needing the tool to get the XML in the code respository. That way everything that is needed to build the system is in one place. This SQL should contain the database structure plus any data that needs to be there in order for the application to actually work.

I have seen very few logical database models. I have constructed ER diagrams from the actual database (reverse engineer). This is usually done while I am learning what the relationships are within the database, and no one has seen an ER of the database in years. It usually uncovers both inconsistent use and just simply bad design.

Let us know what you finally do.

answered Feb 11, 2017 at 15:22

Robert Baron

1,1327 silver badges10 bronze badges

Interesting idea to extract the ER diagram on-demand from the database. Do you know of any good strategies to keep the SQL files overseeable (like everything related to one table in one file etc)?
– Tobias Windisch
Commented Feb 13, 2017 at 21:10
1

@Tobias to facilitate easy reconstruction of a useable datamodel, using constranints and a somewhat consistent naming convention are more helpful.
– Robert Baron
Commented Feb 14, 2017 at 12:07

Add a comment |

Basile Starynkevitch · Accepted Answer · 2017-02-11 22:08:35Z

If your database is reasonably small (e.g. less than a megabyte, or a few dozen of megabytes) you might consider dumping it in SQL format (e.g. with sqlite3 yourdb.sqlite .dump > yourdb.sql if using an SQLite database yourdb.sqlite) and version control that yourdb.sql textual dump file.

If you want to version only the database schema (without table contents), you can dump only that (in SQL format) and version control the output (e.g. of sqlite3 yourdb.sqlite .fullschema if using an SQLite database). Then, it is very likely to fit into a reasonable size.

You could even automatize that (dumping at pre-commit, loading at post-merge), using git hooks. (Look at my old-melt-monitor github project for an example).

Alternatively, if your code just needs some tables to be created (but no real data in it), you might consider having some initialization routine (like my create_tables_for_dump_mom routine) doing some CREATE TABLE IF NOT EXISTS SQL requests (and other similar CREATE "if not exists" requests for indexes, etc...) . Then the database schema (or database model) is in your code (as literal strings). You might add to that the few tables -and their contents- keeping nearly non-changing information (e.g. error messages).

I think you mean to have the SQL used to build the tables and constraints (DDL) and some of the data tables (tables used to define status codes, text for error messages ... ), and not the complete data set as this would be expected to change between versions. — Robert Baron, Commented Feb 11, 2017 at 15:30

Gijs Bürmann · Accepted Answer · 2020-01-07 14:58:31Z

I recommend to not version control DDL scripts, instead version control Logical Data Model (LDM) + Entity Relationship Diagrams (ERD's) Build/buy a DDL generator that compares an 'as is' Physical Data Model (PDM) deployed on a DTAP instance, with the 'to be' PDM that is automatically derived from the version controlled 'to be' LDM (identified by a specific release number). The DDL-generator then automatically generates diff-DDL and deployes this diff-DDL automatically to close the gap between 'as is' and 'to be as of release xyz'. Similar approach (automatic software generation + automatic deployment) can be implemented to build and maintain LDM specific RESTful/GraphGL web services + Swagger definitions and UI HTML components/base building blocks like LDM specific form, grid/table, chart, widget etc. all derived form the same version controlled LDM

Stack Exchange Network

Database modeling, software development, and version control

4 Answers 4

Not the answer you're looking for? Browse other questions tagged
version-control
software
database-development
or ask your own question.

Linked

Hot Network Questions

Database modeling, software development, and version control

4 Answers 4

Not the answer you're looking for? Browse other questions tagged version-controlsoftwaredatabase-development or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
version-control
software
database-development
or ask your own question.