Building an Expert Question/Answer Bot with Open Source Tools and LLMs
11-01, 13:20–14:50 (America/New_York), Winter Garden (Room 5412)

When applying large language models (LLMs) to the real world, quality is critical. LLMs, particularly foundation models, are trained on vast corpora of data, giving them a general "understanding" of the world that is nothing short of jaw-dropping. But along with this broad coverage, LLMs also inherit an internet-scale bias that is nearly impossible to understand, let alone fully control. This pervasive bias poses a challenge because it does not always align with the expectations and requirements of our unique application domains. As a result, a one-size-fits-all LLM often fails to meet the expectation of providing quality responses for specific applications.

As data-rich as these LLMs are, their real-world applications leave room for improvement; quality, not quantity, becomes the key issue. For business applications, contextual awareness, data privacy, and the ability to control these applications are vital requirements. LLMs, and the applications built on top of them, need continuous fine-tuning to suit specific domains and to align the model with our precise needs. The ability to do this consistently and reliably is becoming integral to vertical-specific LLM applications.

In this workshop, we'll explore how LangChain, Chroma, Gradio, and Label Studio can be employed as tools for continuous improvement, specifically by building a Question-Answering (QA) system that answers questions about an open-source project using domain-specific knowledge drawn from its GitHub documentation.
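
To give a sense of how these pieces fit together, here is a minimal retrieval-augmented QA sketch using LangChain and Chroma. It assumes the project's GitHub documentation has been cloned locally as Markdown files under ./docs and that an OpenAI API key is set in the environment; the path, chunk sizes, and model choice are illustrative placeholders, not the workshop's exact code.

```python
# Minimal retrieval-augmented QA sketch with LangChain + Chroma.
# Assumptions: the project's GitHub docs are cloned locally as Markdown
# under ./docs, and OPENAI_API_KEY is set; names here are illustrative.
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Load the domain-specific knowledge: Markdown docs from the repo checkout.
docs = DirectoryLoader("./docs", glob="**/*.md", loader_cls=TextLoader).load()

# Split long pages into overlapping chunks so retrieval stays focused.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed the chunks and index them in a local Chroma vector store.
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Wire retrieval into an LLM: the most relevant chunks are stuffed into
# the prompt so answers stay grounded in the project's own documentation.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
)

print(qa.run("How do I install the project from source?"))
```

The design is the standard retrieval-augmented generation pattern: embed the documentation once, then at question time retrieve the most relevant chunks and let the LLM answer from them rather than from its general training data.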

Ultimately, we aim for the QA system to serve as a blueprint for continuous enhancement across LLM applications: a system that lets us strategically navigate the continuous cycle of feedback and adaptation while incorporating human understanding along the way. Code for the workshop is open source and will be provided.
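
For the feedback-and-adaptation side, here is one possible sketch of that loop (reusing the qa chain from the example above): a Gradio app serves the bot and queues every question/answer exchange in Label Studio for human review. It assumes a running Label Studio instance and the label-studio-sdk package; the URL, API key variable, and labeling config are assumptions for illustration.

```python
# Sketch of the human-in-the-loop side: serve the QA chain with Gradio
# and push every question/answer pair into Label Studio for review.
# Assumptions: Label Studio runs at localhost:8080 and LS_API_KEY holds
# a valid access token; the labeling config below is illustrative.
import os
import gradio as gr
from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key=os.environ["LS_API_KEY"])
project = ls.start_project(
    title="QA Bot Review",
    label_config="""
    <View>
      <Text name="question" value="$question"/>
      <Text name="answer" value="$answer"/>
      <Choices name="quality" toName="answer">
        <Choice value="Good"/>
        <Choice value="Bad"/>
      </Choices>
    </View>
    """,
)

def answer(question):
    response = qa.run(question)  # the RetrievalQA chain from the sketch above
    # Queue the exchange for human review; reviewed tasks feed refinement.
    project.import_tasks([{"question": question, "answer": response}])
    return response

gr.Interface(fn=answer, inputs="text", outputs="text",
             title="Project QA Bot").launch()
```

Reviewed tasks can then be exported from Label Studio and used to refine prompts, retrieval, or fine-tuning data, closing the continuous-improvement loop the workshop is about.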


Prior Knowledge Expected

No previous knowledge expected

Chris Hoge is the Head of Community for HumanSignal, helping to grow the Label Studio community. He has spent over a decade working in open-source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, where he studied numerical methods for simulating physical systems. He splits his time between the PNW and NYC, where he spends his free time trail running and playing piano.