Using Open Source LLM in ETL PyData NYC 2023

Using Open Source LLM in ETL

11-03, 11:45–12:25 (America/New_York), Central Park West (Room 6501)

This session presents a case study of using Llama2-70b to tackle a data transformation friction point in reinsurance underwriting. The final solution is industry agnostic. We will walk through our framework for breaking down a business problem into LLM-able chunks, lay out the solutions we explored and the best-performing method, compare local and at-scale inference, and explain how we evaluated the unstructured LLM responses to prevent hallucination and ambiguity when producing structured output.

There are multiple friction points in accelerated underwriting. Getting information from an applicant's completed form to the machine learning model that makes the decision involves several steps of manual transformation. We aim to use open source LLMs to streamline this manual work in a time-sensitive, privacy-preserving manner. We explored the problem with Llama2-70b, fine-tuning the parameters and using a multi-prompt approach to achieve useful structured results. We also evaluated the results step-wise, using an eval LLM step to confirm the validity of the solution.
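
To make the multi-prompt and eval-step idea concrete, here is a minimal sketch under stated assumptions. It is not the speaker's implementation: the field names, the prompt wording, and the helpers extract_field, is_supported, transform, and fake_llm are all hypothetical. The model is passed in as a plain callable so that no particular inference library is assumed.

```python
from typing import Callable, Dict, Optional

# Hypothetical field names; the abstract does not list the real form fields.
FIELDS = ["occupation", "smoker_status"]

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def extract_field(llm: LLM, form_text: str, field: str) -> str:
    """Ask one narrow question per field instead of one large prompt."""
    prompt = (
        f"From the application text below, give the applicant's {field}.\n"
        "Answer with the value only, or UNKNOWN if it is not stated.\n\n"
        f"{form_text}"
    )
    return llm(prompt).strip()


def is_supported(llm: LLM, form_text: str, field: str, value: str) -> bool:
    """Eval step: a second call checks the answer against the source text."""
    prompt = (
        f"Application text:\n{form_text}\n\n"
        f"Claim: the applicant's {field} is '{value}'.\n"
        "Is the claim supported by the text? Answer YES or NO."
    )
    return llm(prompt).strip().upper().startswith("YES")


def transform(llm: LLM, form_text: str) -> Dict[str, Optional[str]]:
    """Build a structured record; unverified or missing values become None."""
    record: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        value = extract_field(llm, form_text, field)
        if value.upper() == "UNKNOWN" or not is_supported(llm, form_text, field, value):
            record[field] = None
        else:
            record[field] = value
    return record


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model, used only to exercise the pipeline."""
    if prompt.startswith("From the application"):
        first_line = prompt.split("\n")[0]
        return " nurse " if "occupation" in first_line else "UNKNOWN"
    return "YES"  # eval prompts are always approved by this stub


if __name__ == "__main__":
    print(transform(fake_llm, "I work as a nurse."))
```

In practice fake_llm would be replaced by a call to a locally hosted or served model. Storing None for any value the eval step rejects is one way to keep unverified answers out of the structured output.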

Prior Knowledge Expected

No previous knowledge expected

Mei Chen

Mei is a machine learning engineer at MunichRE. She holds a MASc from the University of Waterloo and a BHSc from McMaster University. Mei has 20+ publications at the intersection of machine learning and healthcare, with a focus on brain-computer interfaces, intensive care, and musical mindfulness.
