By far one of the best I have seen on data mapping and the stages of ETL... Thanks Ben! Keep them coming!!
Awesome explanation. Thank you. I have lots of questions about this topic. How do you handle additions of new data? Do the raw and stage databases get cleared out each time new data is added to the data warehouse? Also, how do you handle changes in ETL logic over time? How would you handle a situation where a portion of historical data needs to be reloaded into the data warehouse, possibly using new ETL logic? Should it always be possible to recreate the entire data warehouse from the flat files? If so, how do you ensure that the current state of the data warehouse is the same as it would be if it were blasted away and recreated from scratch? Are there any strategies for version control of a data warehouse?
You’re a natural teacher. Great vid!
Very interesting video, with a lot of ideas. Because of the title, I expected something a bit different: I thought it would focus more on presenting the "players" used to construct an ETL. The idea of raw data seems to come more from the operational database than from the flat files (CSV, etc.), and after the cleaning/staging/mapping comes the Stage DB and then the data warehouse. I can't wait to see part 2!
Is a Part 2 coming? I could not find it. Very good vid!
No part 2? What about the logging and error tracking video? Really nice video, thank you very much.
Have you ever built a data quality monitoring system that sends notifications for loading issues? It helps to track issues at every level of loading.
Great explanation!
Would be great if you could make this more specific with something like dbt or Prefect/Dagster/Airflow involved.
Thank you so much. I needed this for an interview I have coming up. I need some formal concepts for what I am doing at a current job where things aren't exactly referred to in these ways.
As a financial auditor, I want to extract data from our clients' databases and then manipulate it to obtain audit information. Is learning SQL the best thing to do? I'd like to hear from you.
thanks for this
Wish you made more content like this
Hi there. Got a question on raw data (flat files). These don't have identities or keys, so you form a candidate key by combining some columns (product, location, target_year). Here's the question: if some columns need to be corrected and they are part of the candidate key, how can the data be corrected or updated? What approach should be taken?
Great video! I have one clarifying question. Is there a need to create CSV, XML, etc. files from the operational DB and then load the data into the Raw DB? Wouldn't it be easier and more efficient to simply load the data from the operational DB into the Raw DB without creating any files in between?
Where should the slowly changing dimension process be done: in staging or in the DW?
What about removing nulls and malformed entries -- would you recommend doing that prior to Staging or afterwards?
Very nice, but could you provide more videos on ETL, please? Thanks.
Hi, I have a few questions related to this topic.