
Questions tagged [etl]

Extract, Transform, Load: a process used with databases

7 votes
5 answers
371 views

What can I do to get a message processor to slow down the rate of writes that it is making to a database?

We have this architecture: queue -> message processor (horizontally scaled) -> RDBMS. Sometimes external systems dump 10k messages onto the queue, and the message processor, of course, dutifully ...
jcollum (reputation 229)
-1 votes
1 answer
190 views

Modeling a CSV file: What is the standard? Python or SQL?

I have a wide CSV file of about 350 MB, and I want to load it into a SQL database and model the data properly so that it is easier to use for analysis. I could split the data into tables with Python and ...
HappilyCoding
0 votes
3 answers
219 views

In data engineering, why is data integrity checked on the DW rather than on the data sources?

I'm a software developer and new to data engineering, so this may be a newbie question, but I'm wondering why data integrity checks (for instance, dbt tests) are run on the data warehouse rather than ...
samdouble (reputation 243)
1 vote
1 answer
121 views

Data pipeline design: robust and resilient to future variations

I need to build a data pipeline to populate a database from various files. This is a common scenario. However, I would like expert opinions on implementing a pipeline that is robust, modular and ...
Imtiaz (reputation 23)
0 votes
1 answer
83 views

Better design for a REST import into a web store

I have an import that needs to grab data from a REST service and import it into a web store. It's basically an ETL-type service, but because the REST service can be slow and I don't want to call it ...
user204588
1 vote
1 answer
444 views

Is a microservice approach always the best fit for ETL processes?

In our project we are using Django and Django REST framework as the main application to get/query data from the database and send it to the frontend. Those endpoints are very fast, as they should be. ...
Alex T (reputation 161)
2 votes
2 answers
506 views

Reading a large CSV file and then loading the data into a DB

I have a 2 GB Django application running, and I need to receive a CSV file of more than 1 GB, read it, and load the data into a PostgreSQL DB in IBM Cloud. The problem is that if I receive the file, it ...
Elvin Quero
0 votes
1 answer
72 views

Running ad hoc queries on JSON log files

Suppose I have a folder called logs that contains N folders. Each folder holds events for a specific event type, and each folder has N .log files, where each file has multiple ...
Sriram R
-1 votes
1 answer
548 views

Agile approach in ETL/ELT development

What are the pros and cons of using an agile/iterative approach in developing ETL/ELT (Extract Transform Load or Extract Load Transform) systems such as data warehouses, data lakes, or lakehouses? I often find that ...
Eugene Lycenok
-2 votes
2 answers
401 views

What happens after the ETL process?

I have thousands of .csv files with the same structure, and in most cases some column values recur across files. Each file represents a report on some structures, with numeric ...
BoardsOfConsulting
4 votes
2 answers
116 views

Designing an ETL pipeline with a few points of entry

I'm trying to think of a scalable solution for my current system. The current system consists of 3 microscopes and 1 processing machine: 1. 60-100 GB files come from 2-3 microscopes every 30 minutes. 2. That data ...
user3145912
-1 votes
1 answer
33 views

Duplicating API implementations for declaring intention

I'm developing an ETL process in Python and pandas to pull data from a REST API and then dump it into a relational database. A few of the fields that come back contain sensitive data that I do not want to ...
ADataGMan (reputation 181)
1 vote
2 answers
296 views

How to handle manual corrections to data in ETL pipeline

We receive product data from vendors on a regular basis to be incorporated into our catalog. The data looks like this: [ { id: 123, collection: Spring, name: New Beginnings, size: 8, price:...
user2468842
0 votes
1 answer
38 views

Are there any general guidelines for allocating table space quota to different layers in ETL?

I am looking for general guidelines for allocating table space quota to the different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer). As per my research, ETL flow can ...
Curious_Mind
0 votes
1 answer
259 views

Do data warehouse standards allow foreign key constraints in a dimensional model?

Is it true that we never enable foreign key constraints in the dimensional model of a data warehouse? If yes, then what is the rationale behind that? As per my research: Some experts told me in a ...
Curious_Mind
