Moussa Taifi

Data science platform architect focused on data science productivity, reliability, performance, and cost.

Working on designing and implementing large-scale AI products through data collection, analysis, and warehousing.

Passionate about building scalable machine learning pipeline architectures with high business impact.

Aspiring author.


Sessions

11-02
15:20
40min
Modern Data Pipelines Testing Techniques: A Visual Guide
Moussa Taifi

"Should I just run it in production to see if it works?" is a common starting point for many Python data engineers. Don't let it be your end point. Any software product deteriorates rapidly without disciplined testing. However, testing data pipelines is a hellish experience for new data developers. Unfortunately, there are some things that are only learned on the job. The crude reality is that data pipeline testing is one of those fundamental skills that gets glossed over during the training of new data engineers. It can be so much more fun to learn about the latest and greatest Python data processing library. But how can a new Python data engineer transform a patchwork of scripts into a well-engineered data product? Testing is the cornerstone of any iterative development process that aims to reach acceptable confidence in the outputs of a data pipeline. This talk provides an overview of modern data pipeline testing techniques as a visual and coherent game plan.

Why bother testing data pipelines? Billions of budget dollars regularly rely on the excellence of the data scientists, data engineers, and machine learning engineers behind the countless software data pipelines that inform critical business decisions.

Winter Garden (Room 5412)