Questions tagged [etl]
Extract, Transform, Load - process in a database
53 questions
7 votes, 5 answers, 371 views
What can I do to get a message processor to slow down the rate of writes that it is making to a database?
We have this architecture:
queue -> message processor (horizontal scaling) -> RDBMS
Sometimes external systems dump 10k messages onto the queue and the message processor of course dutifully ...
-1 votes, 1 answer, 190 views
Modeling a CSV file: What is the standard? Python or SQL?
I have a wide CSV file of about 350 MB, and want to load it into a SQL database and properly model the data to make it easier to use for analysis.
I could split the data into tables with Python and ...
0 votes, 3 answers, 219 views
In data engineering, why is data integrity checked on the DW rather than on the data sources?
I'm a software developer and new to data engineering, so this may be a newbie question, but I'm wondering why data integrity checks (for instance, dbt tests) are run on the data warehouse, rather than ...
1 vote, 1 answer, 121 views
Data pipeline design - robust and resilient to future variations
I need to build a data pipeline to populate a database from various files. This is a common scenario. However, I want to have expert opinions for implementing a pipeline that is robust, modular and ...
0 votes, 1 answer, 83 views
Better design for a REST import into web store
I have an import that needs to grab data from a REST service and import it into a web store. It's basically an ETL type of service, but because the REST service can be slow and I don't want to call it ...
1 vote, 1 answer, 444 views
Is a microservice approach always the best fit for ETL processes?
In our project we are using Django and Django Rest Framework as the main application to get/query the data from the database and send it to the frontend. Those endpoints are very fast, as they should be. ...
2 votes, 2 answers, 506 views
Reading a large CSV file and then loading data to a DB
I have a Django application of 2 GB running and I need to receive a CSV file of more than 1 GB, read it and load the data to a PostgreSQL DB in IBM Cloud. The problem is that if I receive the file, it ...
0 votes, 1 answer, 72 views
Running ad hoc queries on JSON log files
I have a situation where let's say I have a folder called logs which has N folders.
Each folder contains events for a specific event type and each folder has N .log files where each file has multiple ...
-1 votes, 1 answer, 548 views
Agile approach in ETL/ELT development
What are the pros and cons of using an agile/iterative approach in the development of ETL/ELT (Extract Transform Load or Extract Load Transform) data warehouse, data lake, and lakehouse systems?
I often find that ...
-2 votes, 2 answers, 401 views
What happens after the ETL process?
I have thousands of .csv files with the same structure and, in most of the cases, some column values are the same ones recurring. Each file represents a report on some structures, with numeric ...
4 votes, 2 answers, 116 views
Designing an ETL where there are a few points of entry
I'm trying to think of a scalable solution for my current system.
The current system is
3 microscopes
1 processing machine
1. 60-100 GB files come from 2-3 microscopes every 30 minutes
2. That data ...
-1 votes, 1 answer, 33 views
Duplicating API implementations for declaring intention
I'm developing an ETL process in Python and Pandas to pull data from a REST API, and then dump it into a relational database. A few of the fields that come back contain sensitive data that I do not want to ...
1 vote, 2 answers, 296 views
How to handle manual corrections to data in ETL pipeline
We receive product data from vendors on a regular basis to be incorporated into our catalog. The data looks like this:
[
{
id: 123,
collection: Spring,
name: New Beginnings,
size: 8,
price:...
0 votes, 1 answer, 38 views
Are there any general guidelines for allocating table space quota to different layers in ETL?
I am looking for any general guidelines for allocating table space quota to the different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer).
As per my research, ETL flow can ...
0 votes, 1 answer, 259 views
Do data warehouse standards allow foreign key constraints in a dimensional model?
Is it true that we never enable foreign key constraints in the dimensional model of a data warehouse? If yes, then what is the rationale behind that?
As per my research:
Some experts told me in a ...