
All Questions

7 votes
5 answers
376 views

What can I do to get a message processor to slow down the rate of writes that it is making to a database?

We have this architecture: queue -> message processor (horizontal scaling) -> RDBMS. Sometimes external systems dump 10k messages onto the queue and the message processor of course dutifully ...
jcollum
  • 229
-1 votes
1 answer
190 views

Modeling a CSV file: What is the standard? Python or SQL?

I have a wide CSV file of about 350 MB, and want to load it into a SQL database and properly model the data to make it easier to use for analysis. I could split the data into tables with Python and ...
HappilyCoding
0 votes
3 answers
221 views

In data engineering, why is data integrity checked on the DW rather than on the data sources?

I'm a software developer and new to data engineering, so this may be a newbie question, but I'm wondering why data integrity checks (for instance, dbt tests) are run on the data warehouse, rather than ...
samdouble
  • 243
2 votes
2 answers
509 views

Reading a large CSV file and then loading data to a DB

I have a Django application of 2 GB running and I need to receive a CSV file of more than 1 GB, read it and load the data to a PostgreSQL DB in IBM Cloud. The problem is that if I receive the file, it ...
Elvin Quero
1 vote
2 answers
300 views

How to handle manual corrections to data in ETL pipeline

We receive product data from vendors on a regular basis to be incorporated into our catalog. The data looks like this: [ { id: 123, collection: Spring, name: New Beginnings, size: 8, price:...
user2468842
2 votes
0 answers
393 views

Data pipeline architecture: airflow triggered by message broker

Let us say we have: a web app with a Postgres DB that produces data over time, another DB optimized for analytics that we would like to populate over time. My goal is to build and monitor an ETL ...
sunless
  • 151
1 vote
1 answer
1k views

How/when to normalize during ETL?

Let's say you're loading a denormalized flat file of purchase transactions that looks like this: | location_name | location_zip | product | product_price | |---------------|--------------|---------|--...
seriestoo2
1 vote
1 answer
1k views

How to implement ETL with MySQL?

I have a legacy MySQL database (A), and a new, revised structure for a MySQL database (B). Problem number one is that the database has to stay alive and keep receiving data from legacy apps. What I need is ...
koalaok
  • 513
3 votes
1 answer
1k views

How are one or more aggregate functions implemented in most SQL engines?

In the book Database Fundamentals by Silberschatz, it is explained that aggregate functions can be calculated on the fly. This makes sense. What it means is that for calculating the maximum, average ...
jgomo3
  • 336
3 votes
1 answer
221 views

Enterprise Wide Keys [closed]

I have for a long time been working on an ODS as well as a data warehouse. Both integrate a wide variety of data sources from stovepipe applications. One of the uses of the ODS is to provide ...
AaronLS
  • 206