Questions tagged [etl]
Extract, Transform, Load - process in a database
53
questions
-3
votes
1
answer
86
views
What is event-based data integration? [closed]
Please help me understand what event-based data integration is, in simple layman's terms, with some examples.
How is it different from other forms of data integration?
Some sample use cases will be ...
-5
votes
1
answer
563
views
Sync local database with remote
My client has a business that works mostly in remote areas where internet facilities are limited. We have a central database, and the branches in remote areas need to connect to the central database.
We ...
4
votes
2
answers
995
views
How should a data warehouse be maintained for a quickly changing schema
I am currently in the process of maintaining a data warehouse for a quickly growing startup company. There are a lot of reporting demands from the clients, and this is usually handled by a data ...
2
votes
0
answers
42
views
How to manage scheduled ETL jobs that are time sensitive?
We have some ETL jobs that are scheduled to run every day, and some that are scheduled to run every week via Control-M. These types of jobs tag data with the date the job was run and perform filter ...
1
vote
0
answers
822
views
Parsing a JSON file from S3 using Airflow
I'm new to Airflow and I'm working on a proof of concept. The project is fairly simple... every day some 10,000 JSON files are loaded onto a folder on AWS S3. I have to get each one of them, parse ...
1
vote
1
answer
4k
views
Best practice for sharing code and data between Airflow worker nodes?
New to Apache Airflow and curious about how code and data are expected to be used across worker nodes in a multi-node Airflow setup.
When considering if ETL logic should be in the DAGs or in separate ...
2
votes
0
answers
393
views
Data pipeline architecture: Airflow triggered by message broker
Let us say we have:
a web app with a Postgres DB that produces data over time,
another DB optimized for analytics that we would like to populate over time.
My goal is to build and monitor an ETL ...
2
votes
2
answers
1k
views
Micro-services architecture for Data Ingestion/Transformation pipeline project
I am working on designing a brand new data ingestion pipeline. The key highlights of the new project are as follows:
Download and Update data to/from SharePoint using SharePoint APIs
Download and ...
1
vote
2
answers
81
views
Should data be pre-processed before being handled by an ETL framework?
So I was discussing coding with an associate of mine at work, and was mentioning how I was working on a project where I'd need to transform the data that was provided into a standardized format before ...
1
vote
1
answer
1k
views
How/when to normalize during ETL?
Let's say you're loading a denormalized flat file of purchase transactions that looks like this:
| location_name | location_zip | product | product_price |
|---------------|--------------|---------|--...
-1
votes
1
answer
279
views
WebApp for ETL with visual mapping - read csv and map it to data model
A few years ago I wrote a Python script for reading CSV, handling the headers, filtering data, renaming stuff via regex... basically to do various ETL stuff.
This was done using an exhaustive ...
-1
votes
1
answer
152
views
How to incrementally update value of features in a machine learning pipeline?
I am working on a machine learning pipeline where we have to compute certain measures on streaming data. Every day, new raw data enters our pipeline. To update our features, we have to run an ETL that ...
-3
votes
2
answers
700
views
Why are there multiple backends in this system? [closed]
I am trying to understand the architecture of the system described in this patent about aggregating and analyzing confidential data:
https://patents.justia.com/patent/20180089196.
The general ...
0
votes
2
answers
102
views
How to automatically test the result of an ETL tool?
If an ETL tool is being used to move data from an OLTP database into a "business intelligence reporting" database, is there any standard way of automatically testing that the data in the reporting ...
0
votes
1
answer
354
views
Data Integration Design Using Microsoft SSIS
I am working on a data integration project where I need to extract data from an Oracle source and load it into an XML file. The requirement is to get the list of customers and, for each customer, create an XML ...