Aaditya Bhat

Aaditya Bhat is a seasoned engineer with a proven track record of building end-to-end data projects at scale. In his current role as a Senior Data Engineer at Robinhood, Aaditya works with software engineers, product managers, and data scientists.

Prior to joining Robinhood, Aaditya worked as a Data Engineer at Facebook and held various data science roles at Cohen Veterans Network and Apptopia.

  • Self-Service Analytics using LLMs
Akriti Dhasmana
  • The Billion-Request Content Recommendation System Challenge
Aleksander Molak

Aleksander Molak is a Machine Learning Researcher, Educator, Consultant and Author who gained
experience working with Fortune 100, Fortune 500, and Inc. 5000 companies across Europe, the USA,
and Israel, designing and building large scale machine learning systems.

On a mission to democratize causality for businesses and machine learning practitioners, Aleksander is a prolific writer, creator, international speaker, and the author of the best-selling book Causal Inference and Discovery in Python.

He’s a founder of Lesprie.io, a company that provides machine learning training for corporate
teams, the leader of the CausalPython.io community, and the host of the Causal Bandits Podcast.

Aleksander has provided workshops and trainings for companies across industries, including
market leaders like Mercedes Benz, innovative disruptors like e:fs TechHub, and more.

  • The Causal Toolbox: A Practical Guide to Causality in Python (For The Perplexed)
Aleksandra Płońska

Lawyer, a graphic designer with a passion for promoting data science tools. Open source enthusiast. From 2019, Executive Director at MLJAR - (the best open-source AutoML available).

  • No frontend, no notebook rewriting, no callbacks. Turning notebook into a web app with Mercury.
Alexander CS Hendorf

Alexander Hendorf runs his own company, opotoc, providing expertise in data and artificial intelligence, e.g. at the digital excellence consultancy KÖNIGSWEG. He has many years of experience in the practical application, introduction and communication of data and AI-driven strategies and decision-making processes.
Through his commitment as a speaker and chair of various international conferences such as PyConDE & PyData Berlin, he is a proven expert in the field of data intelligence. He has been appointed a Python Software Foundation and EuroPython fellow for his various contributions. Currently he is a sitting board member of Python Software Verband (Germany) and the EuroPython Society (EPS).

  • Ten Years of Community Organizer
Andrea Dobrindt

Andrea is the Competency Lead for AI/ML automation and Generative AI Automation at Data & Technology Transformation Organization at IBM Consulting. She has over 10 years of technical leadership experience managing complex data-centric and AI-driven programs and projects in both the consulting and life science industries. In her current role, she assists clients in Healthcare, Life Sciences, State & Local Government, and Higher Education with their digital transformation journey.

  • Transformative Power: Synthetic Image Generation in Rare Disease Diagnosis
Andy Fundinger

Andy Fundinger is a senior engineer at Bloomberg, where he develops Python applications in the Data Gateway Platform team and supports Python developers throughout the firm through the company's Python Guild. Andy has spoken several times at PyGotham, as well as other conferences such as QCon, PyCaribbean, and EuroPython.

In the past, Andy has worked on private equity and credit risk applications, web services, and virtual worlds. Andy holds a master's degree in engineering from Stevens Institute of Technology.

  • Adventures in not writing tests
Andy Terrel

I lead CUDA Python Product Management, working closely with RAPIDS, Omniverse, and Math Libraries to unify NVIDIA's foundational offering for Python developers and the Python community.

I received my Ph.D. from the University of Chicago in 2010, where I built domain-specific languages to generate high-performance code for physics simulations with the PETSc and FEniCS projects. After spending a brief time as a research professor at the University of Texas and Texas Advanced Computing Center, I have been a serial startup executive, including a founding team member of Anaconda.

I am a leader in the Python open data science community (PyData). A contributor to Python's scientific computing stack since 2006, I am most notably a co-creator of the popular Dask distributed computing framework, the Conda package manager, and the SymPy symbolic computing library. I was a founder of the NumFOCUS foundation. At NumFOCUS, I served as the president and director, leading the development of programs supporting open-source codes such as Pandas, NumPy, and Jupyter.

  • The Beauty and the Beast: Python on GPUS
Anjani Prasad Atluri

Anjani Prasad Atluri is a Data Scientist in the Sustainability Software division of IBM, focusing on building AI solutions for businesses using remote-sensing imagery and client-related data sets. He graduated with a Master's in Data Science from Columbia University in 2022. He has extensive experience working with satellite imagery (at several resolutions) and lidar imagery. Currently, he is focused on building and deploying AI solutions in the IBM Environment Intelligence Suite for Vegetation Management, which uses 3D imagery to determine where a utility needs to trim trees that are closest to power lines.

  • How to Empower Utility Vegetation Management: A Blend of AI and LiDAR Data
Ara Ghukasyan

Ara is a Research Software Engineer at Agnostiq Inc. He has a B.Sc. in Math & Physics and a
Ph.D. in Engineering Physics from McMaster University in Hamilton, Ontario. Ara’s interests
include Machine Learning, Physics, and Quantum Computing. In his spare time, he also enjoys
playing guitar and bass.

  • Covalent - A New Paradigm for High Compute Cloud Workloads in the Age of LLM and Generative AI
Avishek Panigrahi

Avishek Panigrahi is the founder and CEO of Logarithm Labs, a Y Combinator-backed company that helps companies accelerate data efforts.

Prior to Logarithm Labs, Avishek worked at Google on analytics and infrastructure as part of the TPU (Tensor Processing Unit) team. Before that, he worked at Xilinx, Samsung and MIPS on various aspects of hardware design and automation.

He enjoys skiing, camping, and low-frequency quantitative trading.

  • Bad data - anecdotes and examples from the real world
Chris Hoge

Chris Hoge is the Head of Community for HumanSignal, helping to grow the Label Studio community. He has spent over a decade working in open-source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, where he studied numerical methods for simulating physical systems. He splits his time between the PNW and NYC, where he spends his free time trail running and playing piano.

  • Building an Expert Question/Answer Bot with Open Source Tools and LLMs
Cliff Kerr

Cliff Kerr is a Senior Research Scientist at the Institute for Disease Modeling, part of the Bill & Melinda Gates Foundation, where he works on COVID, HPV, and family planning. Previously, he completed a B.Sc. in neuroscience and a Ph.D. in physics, was a lecturer in scientific computing at the University of Sydney, co-founded two startups (on data analytics and health economics), worked on a DARPA project teaching robots to pick up balls, and developed an algorithm that composes music in real time based on brain activity recordings. He is passionate about architecture, cooking, and promoting equity in global health. He lives in New York.

  • Sciris: Simplifying scientific software in Python
Daniel Goldfarb

Daniel is an engineer at Bloomberg L.P. with experience developing Trading Systems, Risk Analytics, and applications for Financial Analysis of Equities and Fixed Income securities. He holds a Ph.D. in Molecular Biophysics from the University of Virginia, and was a CFA charter holder and member of the Chartered Financial Analyst Institute for more than 10 years. He is the Open Source maintainer of Matplotlib's MPLFINANCE package (https://pypi.org/project/mplfinance/), and the author of McGraw-Hill's "Biophysics Demystified."

  • Adding Your Own Data Apps to JupyterLab
Dharhas Pothina

Dharhas Pothina is the CTO at Quansight, where he helps clients wrangle their data using the PyData stack. He leads the development teams for the Nebari, conda-store and Ragna open source projects.
His background includes expertise in computational modeling, big data/high performance computing, visualization and geospatial analysis. Prior to his current position he worked for 15 years in state and federal research labs where he led large multi-disciplinary, multi-agency research projects.

He holds a PhD in Civil Engineering and an MS in Aerospace Engineering from the University of Texas at Austin and a BTech in Aerospace Engineering from the Indian Institute of Technology Madras.

Dharhas is passionate about enabling scientists and engineers with tools that let them scale as well as share their analyses. He loves woodworking, photography and teaching his daughters to love science.

  • Taming the toxic python environment on your laptop
  • From RAGs to riches: Build an AI document interrogation app in 30 mins
  • Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets
Dhaval Patel

Dhaval Patel is a full-time YouTuber (Over 870k subscribers on his channel Codebasics). Prior to going full-time on YouTube, he worked as a data engineer at Bloomberg and a software engineer at NVIDIA for over 14 years. He has been a dedicated Python programmer for more than a decade. On his channel, he teaches data science, machine learning, data analytics, etc. He enjoys mentoring data enthusiasts and has spoken at various colleges, podcasts, and meetups.

  • How to Become a Successful Tech YouTuber?
Elaine Liu

Machine Learning Engineer @ Chartbeat

  • The Billion-Request Content Recommendation System Challenge
Erin Mikail Staples

Erin Mikail Staples is a developer advocate, community human, tech educator, and comedian living in Brooklyn, NYC🗽 by way of Reno, NV 🤠.

She is currently a Developer Experience Engineer at LaunchDarkly. Erin hosts the DevRel(ish) podcast and performs and produces comedy in NYC.

Erin's various shenanigans have taken her to speak at events like PyData Berlin and PyCon, garnered Jeff Staple's attention, appeared in The New York Times, been broadcast on the Practical AI podcast and in USA Today's Storytellers Project, and live on in the archive of the Museum of Alternate History and her mom's dogs' Instagram account.

  • Shipping like a Spy; Enabling continuous deployment with stealth, security, and smooth moves in mind.
  • Fireside Chat with Michelle Gill
  • Lightning Talks
  • Fireside Chat with Soumith Chintala
Fabio Buso

Fabio Buso is VP of Engineering at Hopsworks, leading the Feature Store development team. Fabio holds a master’s degree in Cloud Computing and Services with a focus on data intensive applications.

  • Build Simpler Production ML Systems using Feature/Training/Inference Pipelines
Florian Jacta
  • Specialist of Taipy, a low-code open-source Python package enabling any Python developers to easily develop production-ready AI applications.

  • Data Scientist for Groupe Les Mousquetaires (Intermarche) and ATOS.

  • Developed Predictive Models as part of strategic AI projects.

  • Master in Applied Mathematics from INSA, Major in Data Science & Mathematical Optimization.

  • Turning your Data/AI algorithms into full web applications in no time with Taipy
Gastón Barbero

Born in Córdoba, Argentina. Started learning web development at the age of 13 and developed a love for machine learning back in 2018, after attending my first PyData event in my home city. Currently working as a Senior Machine Learning Engineer for an American real estate company.

  • Basics of cloud computing for data scientists
Gil Forsyth

Gil Forsyth is a staff software engineer at Voltron Data. He followed the common career path of Japanese language specialist -> administrative assistant -> mechanical engineer -> computational fluid dynamicist -> data scientist -> software engineer -> machine learning engineer -> software engineer. Gil contributes to several projects in the PyData ecosystem and is a core maintainer of xonsh and Ibis. He served as the program chair for the Scientific Computing with Python (SciPy) conference from 2017 to 2020.

  • Ibis: A fast, flexible, and portable tool for data analytics.
Gordon Shotwell

Gordon is a Software Engineer at Posit PBC where he works on Shiny for Python. He has ten years of experience building data science applications in various industries, and most recently was a Lead Data Scientist at Socure where he was responsible for data science tooling.

  • Build Simple and Scalable Apps with Shiny
Gurkanwar Singh

Data Scientist at IBM

  • Using Generative AI and Foundation Models to Predict Above Ground Biomass for Nature Based Carbon Sequestration
Hajime Takeda

Hajime is a data professional with five years of expertise in marketing, retail, and eCommerce, working across Japan and the United States.

As a Data Analyst at Procter and Gamble and MIKI HOUSE Americas, Hajime has led data-driven strategy formulation and implemented technology initiatives such as e-commerce expansion, advertising optimization, and the identification of growth opportunities.

  • Customer Lifetime Value Prediction with PyMC Marketing
Hannah Aizenman

Hannah Aizenman studies visualization as a graduate student in Computer Science at The Graduate Center (CUNY) and is the Matplotlib Community Manager.

  • Plotting with Matplotlib; Telling Static, Animated, & Interactive Stories
Harini Srinivasan

Harini Srinivasan is a Senior Technical Staff Member in the IBM Sustainability Software organization. She currently manages a team of Data Scientists and Machine Learning Engineers and focuses on building AI solutions using weather data, satellite imagery, and other geo-spatial data. Over her tenure at IBM, Harini has worked with several clients – be it in analyzing and fixing performance problems in enterprise applications, or bringing new innovative solutions to clients in various technical areas like deployment, Social Media Data Analysis, B2B solutions using Weather and other geo-spatial data. Her current focus is application of AI models (including Generative AI) that use environment data such as satellite imagery, weather for sustainability solutions such as Outage Prediction, Vegetation Management and Carbon Sequestration.

  • Using Generative AI and Foundation Models to Predict Above Ground Biomass for Nature Based Carbon Sequestration
Ines Montani

Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.

  • Half hour of labeling power: Can we beat GPT?
JJ Allaire

J.J. Allaire is the founder of RStudio and the creator of the RStudio IDE. J.J. also worked on the R Markdown and Shiny packages, as well as on the R interfaces to Python and TensorFlow. J.J. is now leading the Quarto project, which is a new Jupyter-based scientific and technical publishing system.

  • Keynote: Dashboards with Jupyter and Quarto
  • Fireside Chat with J.J. Allaire
James Powell

James has played a vital role in organizing the PyData community while serving as a liaison during his years as a NumFOCUS Vice President. He has attended 30+ PyData events throughout the world helping to recruit the best talent and ideas to better the community. James has also worked with core developers behind numerous NumFOCUS projects to advance their communities.

  • Simple Simulators with pandas and Generator Coroutines
  • Pub Quiz
Jonathan Bechtel

Jonathan is the principal data scientist for the Data Science and Machine Learning Research Group, which is a consultancy that allows ambitious academics to do transformative work in the commercial sector. He's also a contributor and community council member for SKTime.

In the past he's worked with organizations such as General Assembly, NYPD, Amber Capital and Advent International to assist them with their data science needs.

  • Forecasting With Classical and Machine Learning Methods Using SKTime
Kewen Gu

Kewen Gu is a Machine Learning Engineer in the Sustainability Software division of IBM, focusing on building AI solutions for businesses using weather, remote-sensing Imagery and client related data sets. He has extensive experience working with very large data sets including scaling runs using PySpark, Kubernetes, Argo. Currently, he is focused on building and deploying two main AI solutions in the IBM Environment Intelligence Suite - a) Outage Prediction which predicts where and how many outages a Utility should expect during a storm. b) Vegetation Management, which uses 3D Imagery to determine where a Utility needs to trim trees that are closest to power lines.

  • How to Empower Utility Vegetation Management: A Blend of AI and LiDAR Data
Keyur Patel

Keyur is a proud member of the team that helped bring Python to Excel. He's excited about sharing what new capabilities are now unleashed and excited

  • Python in Excel
Kim Pevey

Kim Pevey is a Senior Software Engineer at Quansight. She has an extensive background in big data analytics using Dask and delivering those computational insights to clients. She has architected a wide variety of packages to analyze client data, build and automate ML workflows, parallelize existing codebases, and everything in between. She also has a special interest in visualization and building dashboards.

  • Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets
Krishi Sharma

Krishi Sharma is a software developer at KUNGFU.AI where she builds software applications that power machine learning models and deliver data for a broad range of services. As a former data scientist and machine learning engineer, she is passionate about building tools that ease the infrastructure dependencies and reduce potential technical debt around handling data. She helped build and maintains an internal Python tool, Potluck, which allows machine learning engineers the ability to bootstrap a containerized, production ready application with data pipelining templates so that her team can focus on the data and metrics without squandering too much time finagling with deployment and software.

  • Innovation in the Age of Regulation: Federated Learning with Flower
Kriti Kohli

Kriti Kohli is a Senior Manager, Applied Machine Learning at Shopify. Kriti has over 10 years of experience in enterprise applications of natural language processing, machine learning and time series forecasting models. Prior to joining Shopify, Kriti held AI/ML leadership roles in AI centers-of-excellence organizations, developing industry expertise in AI solutions for supply chain, finance, HR and hardware. Kriti earned a PhD in Physics from King’s College London and a B.S. degree in Electrical Engineering from the University of Notre Dame. She has over 30 publications and patents.

  • Leveraging Generative AI for Enhanced E-commerce
Liz Johnson

Liz is a full stack software engineer with a passion for learning new things. Liz is passionate about data analysis and machine learning but even more passionate about ways we can improve it through good testing practices and using the tools that exist for all the awesome things they provide. Liz has worked as a software consultant for the last couple of years and in the process has built complex rules engines in Python and then done extensive data analysis on their performance and possible optimizations. She has also worked on ETL pipelines for large data migrations.

  • Harnessing Test-Driven Development and CI/CD for Smarter Data Analysis
Lucas Durand

Lucas Durand (he/him/his) is the Director of Data Science Engineering at TD Securities and the Product Owner for TDS Notebooks, the TD Securities "Data Platform as a Service". Lucas has been with TD for upwards of 7 years as a Quant, Software Engineer, and Data Scientist.

Lucas holds a Master of Science in Theoretical Physics from York University as well as an Honours Bachelor of Science from the University of Toronto. He is a passionate teacher, avid musician, and big advocate for Python as a first-class language in banking.

  • Building an Interactive Network Graph to Understand Communities
Manuel Illanes

Manuel is a Data Scientist specializing in AI, software development, and Brain-Computer Interfaces (BCIs). With an MSc in Data Science and a background in Psychology from the Bolivian Catholic University, he has navigated through a spectrum of roles in neurotechnology and software development. Passionate about sharing knowledge, he has been a speaker and presenter at various conferences, delving into topics like Big Data, Brain-Computer Interfaces, Neurotechnology, and Neural Data Science.

  • Deep Learning with PyTorch for the Analysis of Time Series Neural Data
Mars Lee

Technical Illustrator.

  • Comics in NumPy? More Likely than You Think!
Martha Norrick

Martha Norrick is the Chief Analytics Officer for New York City and is the head of the Office of Data Analytics (ODA). The ODA works with City agencies, OTI staff, and their data to serve New Yorkers more equitably and efficiently. They support citywide programs, conduct analytics projects to improve City operations, and advance citywide data analytics, infrastructure, integration, and sharing. Our office includes the NYC Open Data team, which implements the City's Open Data Law, strives to improve the accessibility of public datasets, and advocates for the use of open data in the community.

  • Opening Remarks: Martha Norrick
Matthew Rocklin

Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.

Matthew holds a bachelor's degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago.

  • Website: https://matthewrocklin.com
  • Dask: https://dask.org/
  • Coiled: https://coiled.io
  • Spark, Dask, DuckDB, and Polars: Benchmarks
Megan Lieu

Megan Lieu is a Data Advocate at Deepnote, where she talks about data science careers, workflows and tools. She also is a thought leader in the data space and writes daily on LinkedIn to an audience of 85k.

  • Machine Learning in your Data Warehouse using Python
Mei Chen

Mei is a machine learning engineer at MunichRE. She holds a MASc from the University of Waterloo and a BHSc from McMaster University. Mei has 20+ publications at the intersection of machine learning and healthcare, with a focus on brain-computer interfaces, intensive care, and musical mindfulness.

  • Using Open Source LLM in ETL
Michael Zargham

Dr. Zargham is the founder and Chief Engineer at BlockScience, as well as a Research Director of a nonprofit, The Metagovernance Project. Additionally, he serves on the Advisory Council at NumFOCUS. He holds a PhD in Electrical and Systems Engineering from the University of Pennsylvania with a focus on Optimal Dynamic Resource Allocation Policies.

  • Crafting Reliable Rules with Robust Control
Michelle Gill

Michelle Gill is a Manager for Research & Development and a Senior Deep Learning Scientist in
Life Sciences at NVIDIA, where she focuses on deep learning and HPC methods for scientific
discovery. She is also the technical lead for BioNeMo, a platform for development and access to
generative AI models in drug discovery. She holds a PhD in Molecular Biophysics and
Biochemistry from Yale University and completed a postdoctoral research fellowship at
Columbia University Medical School.

  • Fireside Chat with Michelle Gill
  • Keynote: Scientific Discovery: From the Lab Bench to the GPU
Mine Çetinkaya-Rundel

Mine Çetinkaya-Rundel is Professor of the Practice at Duke University and Developer Educator at Posit. Mine's work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education as well as pedagogical approaches for enhancing the retention of women and underrepresented minorities in STEM. Mine works on the OpenIntro project, whose mission is to make educational products that are free, transparent, and lower barriers to education. As part of this project, she co-authored four open-source introductory statistics textbooks. She is also the creator and maintainer of datasciencebox.org, co-author on R for Data Science (2nd Edition), and she teaches the popular Statistics with R MOOC on Coursera.

  • From Jupyter Notebooks to websites with Quarto
Moussa Taifi

Data science platform architect focused on data science productivity, reliability, performance, and cost.

Working on designing and implementing large scale AI products through data collection, analysis, and warehousing.

Passionate about building scalable machine learning pipeline architectures with high business impact.

Aspiring author.

  • Modern Data Pipelines Testing Techniques: A Visual Guide
Nitya Narasimhan

Nitya Narasimhan is a PhD and Polyglot with 25+ years of experience in software research, engineering and advocacy across industry, startups and academia. Her interests span distributed systems, mobile & web development, cloud and AI. She is currently a member of the JavaScript Advocacy team at Microsoft where she works on empowering the web developer ecosystem to build intelligent apps with Azure and AI. You can find her talking about tech and career @nitya and visualizing tech @SketchTheDocs.

  • Simplifying Data Analysis with GitHub Codespaces, Jupyter Notebooks & Open AI
Patrick Hoefler

Patrick Hoefler is a member of the pandas core team and a Dask maintainer. He is currently working at Coiled, where he focuses on Dask development and the integration of a logical query planning layer into Dask. He holds an MSc degree in Mathematics and is working towards an MSc in Software Engineering at the University of Oxford.

  • Dask tutorial
  • The Arrow revolution in pandas and Dask
Pavithra Eswaramoorthy

Pavithra Eswaramoorthy is a Developer Advocate at Quansight, where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she maintains the Bokeh visualization library, and contributes to the Nebari (adjacent to the Jupyter community) and conda-store (part of the conda ecosystem) projects. Pavithra has been involved in the open source community for over 5 years, notably as a maintainer of the Dask library and an administrator for Wikimedia’s OSS programs. In her spare time, she enjoys a good book and hot coffee. :)

  • From RAGs to riches: Build an AI document interrogation app in 30 mins
  • Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets
Peter Vidos

Peter, the CEO, and Co-Founder of Vizzu, is on a mission to redefine how we perceive and interact with data. His passion lies in uncovering innovative solutions to the challenges faced by data professionals when it comes to chart creation and presentation.

With over 15 years of experience in digital product development, Peter's career has spanned a wide array of projects, from mobile app testing to online analytics, decision support systems, and e-learning solutions.

In his current role at Vizzu, Peter is dedicated to driving innovation in data visualization and empowering data professionals to effortlessly convey their insights through interactive and animated data stories.

  • Empowering Data Exploration: Creating Interactive, Animated Reports in Streamlit with Vizzu
Phillip Cloud

I'm fascinated by a variety of problems related to computers. I've solved hard problems in a variety of software engineering domains including digital video, Rust, systems programming, computer vision, and analytics. I'm currently helping build the future of analytics at Voltron Data.

  • Ibis: A fast, flexible, and portable tool for data analytics.
Piero Ferrante

Piero Ferrante is an AVP & Data Science Fellow at CVS Health, a Fortune 6 health solutions company, where he and his team are focused on building scalable machine learning systems and developing tools to enhance the productivity and efficacy of hundreds of fellow data scientists and engineers.

Piero has 15+ years of applied AI/ML experience in healthcare, telecom, insurance, mobile advertising, and fintech at companies ranging in size from unicorn startups to Fortune 500s. He holds an M.S. in Predictive Analytics from Northwestern University, a B.S. in Finance and Management Information Systems from the University of Delaware, and has served as an adjunct at New York University, the University of Kansas, and Rockhurst University. Piero also advises Play-it Health, a digital health startup, on algorithms and data strategy.

  • Low(er) Code ML Pipelines with Conduit
Piotr Płoński

PhD in computer science, author and creator of data science tools:

MLJAR AutoML - Automated Machine Learning framework https://github.com/mljar/mljar-supervised
MERCURY - Transform your notebook into a web app easily! https://github.com/mljar/mercury

  • No frontend, no notebook rewriting, no callbacks. Turning notebook into a web app with Mercury.
Ramon Perez

Ramon is currently a developer advocate at Seldon. Before joining Seldon, he worked as an independent freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Going a bit further back, Ramon used to wear different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.

  • Artificial Rhythms: The Merger of Machine Learning, Music and Programming
  • Architecting Data: A Deep Dive Into the world of Synthetic Data
Randy Au

Randy is a Senior Quantitative UX Researcher, using data science methods to understand how users use products in order to help build better experiences. Outside of work, he writes the Counting Stuff newsletter and engages in way too many hobbies.

  • Solving the problems in front of you
Ritchie Vink

Ritchie Vink is the author of the Polars DataFrame library. He originally has a background in Civil Engineering, but he soon made the switch to data/software development. He worked as a Machine Learning Engineer and a Software Engineer for 5 years before devoting all of his time to the Polars project. Those years have been filled with side projects to feed his curiosity. He is currently the CEO of the newly started Polars Inc.

  • Polars; DataFrames in the multi-core era.
Roni Kobrosly

I spent nearly a decade employing causal modeling and inference in academia as an epidemiologist, and since 2015 I've been employing these approaches as an industry data scientist / ML engineer. I am also a member of the open-source community, being the author and maintainer of the causal-curve Python package (https://github.com/ronikobrosly/causal-curve). I am currently a Director of Data Science at Capital One.

  • The dangers of storytelling with feature importance
Ryan Curtin

Ryan earned a Ph.D. studying the acceleration of statistical algorithms at Georgia Tech in 2015. During his time there, he became the maintainer of the mlpack C++ machine learning library (in 2009), and has been contributing to the C++ data science ecosystem ever since. Ryan also maintains the ensmallen optimization library, the Bandicoot GPU linear algebra library, and contributes to the Armadillo linear algebra library. His interest is in making machine learning fast, both by high-quality, efficient implementations and by choice of asymptotically effective algorithms.

  • Lightweight, low-overhead, high-performance: machine learning directly in C++
Ryan Wesslen

Ryan is an ML Engineer and Prodigy Customer Lead at Explosion. Previously, he led the NLP team in Bank of America's Chief Data Scientist Organization and worked in various roles in credit risk management, analytics, and modeling. His academic work covers research from fields like human-computer interaction, cognitive science, information visualization, and computational social science.

  • Half hour of labeling power: Can we beat GPT?
Saba Nejad

Saba Nejad is a Data Engineer at Point72 working mostly with alternative data within the energy and industrials sector. She is broadly interested in using mathematics and programming to gain insight from real world data. Prior to joining Point72, she was studying at MIT where she was doing research at the Institute for Data, Systems, and Society. She was previously a Product Manager at Quantopian.

  • How to Use Python and Mathematical Modeling to Better Understand the Impact of Electricity Pricing on Consumption
Sebastian Benthall

Sebastian Benthall, PhD, is a contributor to Econ-ARK, an open source toolkit for heterogeneous agent modeling. He is a Principal Investigator at the International Computer Science Institute, and a Senior Research Fellow at New York University School of Law, where he researches computational economics approaches to data protection and AI regulation.

  • Open Source Computational Economics: The State of the Art
Sinclair Target
  • The Billion-Request Content Recommendation System Challenge
Soumith Chintala

Soumith Chintala is an Engineer at Meta, where he works on high-performance Artificial Intelligence (AI). Soumith co-created PyTorch, a major scientific computing framework within the AI community. Soumith also holds a visiting research position at NYU, focusing on Robotics. Soumith focuses on high-performance computing, computer vision, generative AI and robotics.

  • Keynote: AI and the stuff built for AI -- are they actually useful for data science?
  • Fireside Chat with Soumith Chintala
Suri Chen

Suri Chen is a Principal Data Scientist at PepsiCo eCommerce. With a background in Operations Research from Columbia University, she possesses expertise in areas including optimization, neural networks, Bayesian modeling, consumer clustering, and text mining. As an applied scientist in PepsiCo's cross-functional team, Suri addresses business challenges by harnessing the power of data and machine learning. Beyond her professional role, she is passionate about generative AI research, particularly in the realm of music generation.

  • Deciphering Sales Drivers at PepsiCo: Exploring Bayesian and Frequentist Approaches to Media Mix Modeling
Theophilus Ijiebor

Theophilus Ijiebor is a distinguished data scientist and researcher, renowned for his contributions to the field of data analytics and artificial intelligence. Born on Sept 5, 1990, in Edo State, Nigeria, Theophilus has dedicated his career to harnessing the power of data and AI to drive innovation.
His academic journey began with an Associate Degree in Computer Engineering from the University of Benin, Nigeria, where he developed a strong foundation in Mathematics and Statistics. His passion led him to pursue a Bachelor's degree in Statistics and Computer Science from the same university, where he developed a strong foundation in programming and data analysis. With a growing interest in Software Engineering, he obtained a master’s degree in Computer Science from the same university, focusing on data-driven insights in software development, where he conducted groundbreaking research on using machine learning algorithms in software component testing and prediction. In 2021, drawn by the growing interest in Artificial Intelligence and its applications in healthcare and other fields, he decided to pursue another Master’s degree in Computer Science with a specialization in Artificial Intelligence and Machine Learning at the Schaefer School of Engineering and Science at Stevens Institute of Technology, New Jersey, USA, where he conducted research in AI/ML on topics such as anomaly detection and sentiment analysis using BERT and other NLP techniques.

He has worked with leading tech companies and research institutions, including a group of researchers at the University of Benin and the Stevens Institute for Artificial Intelligence research lab. He has attended several conferences and workshops and has made several contributions to the data science community.

Theophilus is also a dedicated educator, having served as a teaching assistant at various universities for several years, where he mentored and inspired the next generation of data scientists. He is known for his ability to convey complex concepts in a relatable and engaging manner.

Currently, Theophilus Ijiebor is a Senior Data Scientist at IBM, where he develops cutting-edge machine learning solutions for diverse domains, including healthcare, life science, education, and other domains.

  • Transformative Power: Synthetic Image Generation in Rare Disease Diagnosis
Thomas J. Fan

Thomas J. Fan is a maintainer for scikit-learn, an open-source machine learning library for Python. He led the development of scikit-learn's set_output API, which allows transformers to return pandas DataFrames. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas holds a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.

  • Scikit-learn on GPUs with Array API
Timothy Hewitt

Timothy is the Product Manager leading Anaconda's Python in Excel Initiatives. After graduating with a degree in linguistics, he started his coding and data journey when he had a crazy idea and his boss told him to "go learn Python and do it." He now leads software development products with compassion towards those who are learning and growing in this crazy field.

  • Python in Excel
Tracy Teal

Tracy Teal is the Open Source Program Director at Posit Software, PBC. Previously, she was a co-founder of Data Carpentry and the Executive Director of The Carpentries. She developed open source bioinformatics software as an assistant professor at Michigan State University and holds a PhD in computation and neural systems from California Institute of Technology. Tracy is involved in the open source software and reproducible research communities, and has been working with open source communities, developing curriculum, and teaching people how to work with data and code as a developer, instructor and project leader throughout her career.

  • Build Simple and Scalable Apps with Shiny
Valay Dave

Valay is a software engineer at Outerbounds and one of the core contributors of Metaflow. Before joining Outerbounds, he was pursuing his master's degree in ML/AI at Arizona State University (ASU), where he discovered Metaflow and began contributing to its open-source development.

  • Building Robust Reactive ML Systems In A Multiplayer Setting With Metaflow
Viren Bajaj

Viren Bajaj is a Senior Machine Learning Engineer at CVS Health.

He works on products to improve the productivity and efficacy of data scientists and engineers.

Currently, he is working on Conduit, a Python package that streamlines the process of deploying and managing Kubeflow pipelines on Vertex AI, Google's AI Platform.

Previously, he's worked at NASA Langley Research Center, Logical Systems Lab at Carnegie Mellon University, and the Nuclear & Particle Physics Department at Carnegie Mellon University.

  • Low(er) Code ML Pipelines with Conduit
Wenqi (Summer) Zhai

A senior data scientist at IBM in Data and Technology Transformation - Healthcare and Life Science Practice. In my spare time, I love volunteering and visiting museums.

  • Transformative Power: Synthetic Image Generation in Rare Disease Diagnosis
Will Ayd

Will Ayd has been a maintainer of the pandas project since 2018. As an independent Solutions Architect / Data Engineer, Will has helped countless clients leverage the best of open-source technology to make the most out of their data.

  • Faster SQL with pandas and Apache Arrow
Zachary Blackwood

Once a teacher, then a web developer, now a Data Something. Father of 5 beautiful daughters. Working at Snowflake to make Streamlit even more amazing than it already is.

  • Empowering Data Exploration: Creating Interactive, Animated Reports in Streamlit with Vizzu
Santosh Kumar Radha

Meet Santosh, a theoretical physicist who's as comfortable with fermions as he is with FORTRAN. With a Ph.D. specializing in Condensed Matter Theory, Santosh spent years unraveling the knotty quantum phenomena that arise when subatomic particles decide they're in the mood for complexity. Equally adept in Python, Julia, C, and yes, even FORTRAN, he’s translated his keen insights from the abstract world of physics to the cutting-edge realm of quantum computing. Now serving dual roles as the Head of R&D and Product at Agnostiq Inc., he's at the forefront of developing quantum algorithms and software that are as transformative as they are practical. Santosh's interests are as wide-ranging as his expertise, spanning from the serenade of mathematical equations to the rhythmic complexities of music. In short, if you're looking for someone who bridges the gap between the esoteric and the essential, you've found your guy!

  • Covalent - A New Paradigm for High Compute Cloud Workloads in the Age of LLM and Generative AI