All Questions
10
questions
7
votes
5
answers
376
views
What can I do to get a message processor to slow down the rate of writes that it is making to a database?
We have this architecture:
queue -> message processor (horizontal scaling) -> RDBMS
Sometimes external systems dump 10k messages onto the queue and the message processor of course dutifully ...
-1
votes
1
answer
190
views
Modeling a CSV file: What is the standard? Python or SQL?
I have a wide CSV file of about 350 MB, and want to load it into a SQL database and properly model the data to make it easier to use for analysis.
I could split the data into tables with Python and ...
0
votes
3
answers
221
views
In data engineering, why is data integrity checked on the DW rather than on the data sources?
I'm a software developer and new to data engineering, so this may be a newbie question, but I'm wondering why data integrity checks (for instance, dbt tests) are run on the data warehouse, rather than ...
2
votes
2
answers
509
views
Reading a large CSV file and then loading data to a DB
I have a Django application of 2 GB running and I need to receive a CSV file of more than 1 GB, read it, and load the data into a PostgreSQL DB in IBM Cloud. The problem is that if I receive the file, it ...
1
vote
2
answers
300
views
How to handle manual corrections to data in ETL pipeline
We receive product data from vendors on a regular basis to be incorporated into our catalog. The data looks like this:
[
{
id: 123,
collection: Spring,
name: New Beginnings,
size: 8,
price:...
2
votes
0
answers
393
views
Data pipeline architecture: Airflow triggered by message broker
Let us say we have:
a web app with a Postgres DB that produces data over time,
another DB optimized for analytics that we would like to populate over time.
My goal is to build and monitor an ETL ...
1
vote
1
answer
1k
views
How/when to normalize during ETL?
Let's say you're loading a denormalized flat file of purchase transactions that looks like this:
| location_name | location_zip | product | product_price |
|---------------|--------------|---------|--...
1
vote
1
answer
1k
views
How to implement ETL with MySQL?
I have a legacy MySQL database (A) and a new, revised structure for a MySQL database (B).
Problem number one is that the database has to stay live and keep receiving data from legacy apps.
What I need is ...
3
votes
1
answer
1k
views
How are one or more aggregate functions implemented in most SQL engines?
In the book Database Fundamentals by Silberschatz, it is explained that aggregate functions can be calculated on the fly.
This makes sense. What it means is that for calculating the maximum, average ...
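To make the idea of calculating aggregates in one pass concrete, here is a minimal Python sketch. It is an illustration only, not a description of how any particular SQL engine works; the names `RunningAggregates` and `aggregate` are hypothetical. It keeps a small running state (count, sum, maximum) and updates it once per row, so no row needs to be stored.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RunningAggregates:
    """Hypothetical running state for COUNT, SUM, MAX and AVG."""
    count: int = 0
    total: float = 0.0
    maximum: Optional[float] = None

    def update(self, value: float) -> None:
        # Each incoming row adjusts the state in constant time.
        self.count += 1
        self.total += value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

    @property
    def average(self) -> Optional[float]:
        # AVG is derived from SUM and COUNT; undefined for zero rows.
        return self.total / self.count if self.count else None


def aggregate(rows: Iterable[float]) -> RunningAggregates:
    """Consume the rows once, never holding more than one in memory."""
    state = RunningAggregates()
    for value in rows:
        state.update(value)
    return state
```

Calling `aggregate(iter([3.0, 9.0, 6.0]))` yields a state whose `maximum`, `average` and `count` are all available after a single scan, which is why several aggregates can share one pass over the data.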
3
votes
1
answer
221
views
Enterprise Wide Keys [closed]
I have been working for a long time on an ODS as well as a data warehouse. Both integrate a wide variety of data sources from stovepipe applications. One of the uses of the ODS is to provide ...