
I have multiple databases from previous projects that I would like to combine into one large database. These databases are stored in .sql dump files. The issue is that I only need specific columns and data, and I'd like to filter out any columns that aren't the ones I need. Some of the databases are MySQL and others are Postgres.

A friend suggested I write a parser for these files, parse each one, and then add all of them to my PostgreSQL database, but I really don't know how I'd do something like this. I've tried looking for npm packages that do this for you, but they just aren't exactly what I'm looking for, and I doubt I would get very far writing my own parser.

Could someone give me a push in the right direction and maybe help out with an algorithm for doing something like this? Or maybe there's already something that does this for you? Sorry for the odd question; I'm very lost and in need of support.

  • I'd suggest importing the .sql files into the appropriate engine (PostgreSQL or MySQL), making the changes you need in SQL, dumping the results, combining them, and then re-importing the new, modified results. Commented Aug 4, 2022 at 11:35
  • I would do this, but it's incredibly inefficient and time-consuming, as I have multiple of these files (10+).
    – ste
    Commented Aug 4, 2022 at 11:43
  • Less efficient than writing and debugging a parser? I very much doubt it. Commented Aug 4, 2022 at 11:57
  • And 10+ SQL files is not a large number of files. Commented Aug 4, 2022 at 12:15
  • Are the schema names all different? Commented Aug 4, 2022 at 12:16

2 Answers


The easiest way might be to recreate each schema in your new database. Create all of the "old" tables in those schemas, and load them with data. After that, you can write additional SQL scripts that selectively move data from the old schemas and tables into the new schema and tables.

This has a number of advantages:

  1. You can view the imported data before migrating to the new tables.
  2. You can correct the imported data.
  3. You can attempt the migration any number of times. If things go wrong, delete data from the new tables. The old data remains unchanged.
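For the first two steps (recreating an old schema and loading it), a minimal sketch could look like the following. All schema, table, and file names are made up for illustration. Note that PostgreSQL cannot execute MySQL dump syntax directly; for those files you would first restore the data into MySQL and re-export it (for example as CSV), or use a migration tool such as pgloader.

-- One schema per old database, so the imports stay separated.
create schema old_project1;

-- Recreate the old table *without* constraints, so the raw data loads
-- even if it would violate keys; add constraints only once the data looks good.
create table old_project1.customers (
    id         integer,
    name       text,
    email      text,
    created_at timestamp
);

-- Load the exported data (use \copy in psql if the file lives on the client).
copy old_project1.customers from '/tmp/project1_customers.csv' with (format csv, header);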

In PostgreSQL, an insert-select statement can be used to bulk insert data from a number of other tables:

insert into newSchema.newTable (newColumn1, newColumn2)
select oldColumn1 as newColumn1,
       oldColumn4 as newColumn2
from oldSchema1.oldTable1
-- where <filter condition, if necessary>
  • Theoretically, this should work, but I'm afraid constraint violations will not make this so easy, especially if the OP is migrating PKs and FKs from different schemas. I would dare to say the OP needs to program this.
    – Laiv
    Commented Aug 4, 2022 at 12:49
  • @Laiv: that's the part I forgot. Don't put constraints on any of the tables until the data looks good. I'll try updating my answer in a bit. Commented Aug 4, 2022 at 13:08
  • Well, but you will have to enable them later, and it's likely you won't be able to if any conflict arises. Then looking for the exact entry causing the issue is going to be a pain.
    – Laiv
    Commented Aug 4, 2022 at 13:12
  • It's been my experience that data migrations like this are always a pain. The implication here is that you are not blindly taking PK and FK values. The OP is aggregating data from multiple sources. Surrogate key values should be regenerated, if possible. Commented Aug 4, 2022 at 13:14

In addition to @Greg Burghardt's answer.

I agree with Greg, to some extent. For example, mimicking the real environment locally is a must.

Importing the data is what I don't think you can solve with SQL alone. You need to program the migration and do it separately for each data source (schema, database engine, etc.), mainly because you will have to handle PK and FK conflicts (among other possible constraint and data conflicts), and it's unlikely you can manage it with default or generic (one-size-fits-all) solutions.

To this end, I would look for solutions based on ETLs1. In other words, a way to program and customize the data extraction from each data source, the transformation of the data set, and its loading into the new database.
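To make the kind of PK and FK handling concrete, here is a minimal sketch of the transform/load step, written as the SQL your ETL code would run against a staging schema like the one in the previous answer. All schema, table, sequence, and column names are hypothetical, and the new tables are assumed to use serial keys; the extraction and orchestration around this would still be your own code.

-- Keep a mapping from old surrogate keys to freshly generated ones.
create schema if not exists migration;
create table migration.customer_id_map (
    old_id integer,
    new_id integer
);

insert into migration.customer_id_map (old_id, new_id)
select id, nextval('new_schema.customers_id_seq')   -- sequence name assumed
from old_project1.customers;

-- Load the rows under their new keys, taking only the columns you need.
insert into new_schema.customers (id, name, email)
select m.new_id, c.name, c.email
from old_project1.customers c
join migration.customer_id_map m on m.old_id = c.id;

-- Remap foreign keys in dependent tables through the same mapping table.
insert into new_schema.orders (id, customer_id, total)
select nextval('new_schema.orders_id_seq'), m.new_id, o.total
from old_project1.orders o
join migration.customer_id_map m on m.old_id = o.customer_id;

Each old schema would get its own mapping run, which is exactly the per-source work described above.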

You can code your own ETLs or use a product like Node-RED (to mention an open-source tool built on and programmable in Node.js). Node-RED is a data-flow designer, but I have used it for purposes similar to yours, and it worked like a charm.

In my opinion (biased by my own experience in the field), I would code my own ETLs. Nothing sophisticated. Something to do the job and forget.

Adopting a product takes time, and you have to deal with the learning curve, let alone hacking the product to make it fit your requirements.


1: Extract, Transform, Load
