Machine Learning in your Data Warehouse using Python
11-03, 11:45–12:25 (America/New_York), Music Box (Room 5411)

Moving data in and out of a warehouse is both tedious and time-consuming. In this talk, we will demonstrate a new approach using the Snowpark Python library. Snowpark for Python is a new interface for Snowflake warehouses that provides Pythonic access: you can query data through DataFrames instead of SQL strings, use open-source packages, and run your model without moving your data out of the warehouse. We will discuss the framework and showcase how data scientists can design and train a model end-to-end, upload it to a warehouse, and append new predictions using notebooks.


Objective: If you are a data scientist who already stores data in a warehouse, this talk will demonstrate how to run ML models with the new Snowpark Python library. If you are new to warehouse data storage, the demonstration walks through integrating a Snowflake database with a Python notebook.

(10-15 mins) Snowpark Overview: We will run through the process of transforming data, training a model, and running the model while keeping all the data in one place. The Snowpark library provides an intuitive API for querying and processing data in a data pipeline.
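
As a rough illustration of the DataFrame-style querying described above, here is a minimal sketch rather than the code used in the talk. The table name `CUSTOMERS`, the columns `REGION` and `TOTAL_SPEND`, and the `SNOWFLAKE_*` environment variable names are all hypothetical placeholders. The Snowpark imports are deferred into the functions so the module can be loaded even where the library is not installed.

```python
import os


def connection_parameters(env=os.environ):
    """Collect Snowflake connection settings from environment variables.

    The SNOWFLAKE_* variable names are an illustrative assumption,
    not an official convention.
    """
    keys = ("account", "user", "password", "warehouse", "database", "schema")
    return {key: env.get(f"SNOWFLAKE_{key.upper()}", "") for key in keys}


def average_spend_by_region(session, table_name="CUSTOMERS"):
    """Build a lazy Snowpark DataFrame without writing a SQL string.

    Nothing is executed here; Snowpark translates the chained calls into
    a query that runs in the warehouse once an action such as show() is called.
    """
    # Imported lazily so this module loads without Snowpark installed.
    from snowflake.snowpark.functions import avg, col

    return (
        session.table(table_name)              # hypothetical table
        .filter(col("TOTAL_SPEND") > 0)        # hypothetical column
        .group_by("REGION")                    # hypothetical column
        .agg(avg(col("TOTAL_SPEND")).alias("AVG_SPEND"))
    )


def main():
    """Open a session, run the query in the warehouse, and print a preview."""
    from snowflake.snowpark import Session

    session = Session.builder.configs(connection_parameters()).create()
    try:
        average_spend_by_region(session).show()
    finally:
        session.close()
```

In a notebook, calling `main()` in a cell would print the first rows of the aggregated result; only that small preview leaves the warehouse.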

(15 mins) ML Model Demonstration: The audience will be able to open the notebook, run the code themselves, and leave with a more seamless ML workflow built on a Python pipeline.
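
The kind of workflow such a demonstration covers might look like the sketch below; it is an assumption-laden outline, not the notebook from the talk. The tables `CUSTOMERS_TRAIN`, `CUSTOMERS_NEW`, and `CHURN_PREDICTIONS`, the feature and label columns, and the choice of scikit-learn's `LogisticRegression` are all hypothetical. Note that this version pulls the training sample into the notebook with `to_pandas()`; only scoring runs inside the warehouse, through a temporary UDF.

```python
FEATURES = ["TENURE_MONTHS", "MONTHLY_SPEND"]  # hypothetical feature columns
LABEL = "CHURNED"                              # hypothetical label column


def train_model(session, table_name="CUSTOMERS_TRAIN"):
    """Fit a small classifier on a training table fetched into the notebook."""
    from sklearn.linear_model import LogisticRegression

    training = session.table(table_name).to_pandas()
    model = LogisticRegression()
    model.fit(training[FEATURES], training[LABEL])
    return model


def make_scorer(model):
    """Wrap a fitted model as a plain function of one float per feature."""

    def score(tenure_months: float, monthly_spend: float) -> float:
        # Probability of the positive class for a single row.
        return float(model.predict_proba([[tenure_months, monthly_spend]])[0][1])

    return score


def append_predictions(session, model,
                       source="CUSTOMERS_NEW", target="CHURN_PREDICTIONS"):
    """Register the scorer as a temporary UDF and append scored rows to a table."""
    # Imported lazily so this module loads without Snowpark installed.
    from snowflake.snowpark.functions import col
    from snowflake.snowpark.types import FloatType

    churn_udf = session.udf.register(
        make_scorer(model),
        return_type=FloatType(),
        input_types=[FloatType(), FloatType()],
        packages=["scikit-learn"],  # resolved inside the warehouse
    )
    scored = session.table(source).with_column(
        "CHURN_PROBABILITY",
        churn_udf(col("TENURE_MONTHS"), col("MONTHLY_SPEND")),
    )
    # Scoring and the write both happen in the warehouse.
    scored.write.mode("append").save_as_table(target)
```

Given an open `session`, running `append_predictions(session, train_model(session))` would score the new rows and append them to the target table. The scikit-learn version in the notebook should match the one available in the warehouse so the pickled model loads cleanly.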

Thesis: Snowpark speeds up Python-based workflows by providing seamless access to open-source packages and a package manager via the Anaconda integration, without having to move data.

This talk is for data scientists who have familiarity with data warehouses. A background in writing ML models in Python is recommended, but not necessary, as we will be going over the process from start to finish and providing all the code.


Prior Knowledge Expected

Previous knowledge expected

Megan Lieu is a Data Advocate at Deepnote, where she talks about data science careers, workflows, and tools. She is also a thought leader in the data space and writes daily on LinkedIn to an audience of 85k.