We have built a Large Language Model (LLM) application for legal professionals based on a Retrieval-Augmented Generation (RAG) architecture. The solution comprises the following key components:
1. Data Ingestion Pipeline for the LegalAI QA System: We built a robust pipeline that ingests legal documents and feeds them into our LegalAI Question Answering (QA) system.
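The core of such an ingestion pipeline is splitting documents into overlapping chunks before indexing. A minimal sketch (the `Chunk` type and window sizes here are illustrative assumptions, not the production values):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    doc_id: str    # source document identifier
    text: str      # the chunk's text window
    position: int  # ordinal position within the document

def chunk_document(doc_id: str, text: str,
                   chunk_size: int = 200, overlap: int = 50) -> List[Chunk]:
    """Split a document into overlapping character windows for indexing.
    Overlap keeps sentences that straddle a boundary retrievable."""
    chunks = []
    step = chunk_size - overlap
    for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
        chunks.append(Chunk(doc_id, text[start:start + chunk_size], i))
    return chunks
```

In a real pipeline the chunker would operate on sentence or paragraph boundaries rather than raw characters, but the indexing flow is the same.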
2. Document Store (Weaviate): Ingested legal documents are stored in Weaviate, which provides a structured, searchable repository with vector search over the chunks.
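Conceptually, the store exposes two operations: upsert a chunk with its embedding, and retrieve the nearest chunks for a query vector. A toy in-memory stand-in illustrating that interface (Weaviate itself handles this at scale; the class and method names below are assumptions for illustration):

```python
import math
from typing import Dict, List, Tuple

class InMemoryDocStore:
    """Toy stand-in for a vector document store such as Weaviate:
    holds chunks with embeddings and answers nearest-neighbour queries."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[str, List[float]]] = {}

    def upsert(self, obj_id: str, text: str, vector: List[float]) -> None:
        self._objects[obj_id] = (text, vector)

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector: List[float], top_k: int = 3) -> List[str]:
        # Rank all stored chunks by cosine similarity to the query vector.
        ranked = sorted(self._objects.items(),
                        key=lambda kv: self._cosine(vector, kv[1][1]),
                        reverse=True)
        return [text for _, (text, _) in ranked[:top_k]]
```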
3. QA System (RAG): We built the QA system on the Retrieval-Augmented Generation (RAG) approach. Through testing across several embedding models and chunk sizes, we identified and integrated the configuration that retrieved the most relevant context.
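The retrieve-then-generate flow can be sketched as a single function: fetch the top-k chunks, assemble them into a grounded prompt, and pass that to the generator. The `retrieve` and `generate` callables stand in for the vector store and the LLM (their names and signatures are illustrative assumptions):

```python
from typing import Callable, List

def answer_question(question: str,
                    retrieve: Callable[[str, int], List[str]],
                    generate: Callable[[str], str],
                    top_k: int = 3) -> str:
    """Retrieval-Augmented Generation: fetch relevant chunks, then
    ground the model's answer in them via the prompt."""
    context = "\n\n".join(retrieve(question, top_k))
    prompt = (
        "Answer the legal question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)
```

Instructing the model to answer only from the supplied context is what keeps the generation grounded in retrieved documents.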
4. Follow-up QA with User State Management: The follow-up QA system maintains each user's chat history, so responses stay contextual and coherent by taking the user's previous inputs into account.
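Follow-up handling boils down to keeping a bounded per-user history and prepending it when a new question arrives. A minimal sketch (class and method names are illustrative assumptions):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class ChatSessionManager:
    """Keeps per-user chat history so follow-up questions can be
    answered with earlier turns as context."""

    def __init__(self, max_turns: int = 10) -> None:
        self.max_turns = max_turns
        self._history: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_turn(self, user_id: str, question: str, answer: str) -> None:
        turns = self._history[user_id]
        turns.append((question, answer))
        del turns[:-self.max_turns]  # keep only the most recent turns

    def contextualize(self, user_id: str, follow_up: str) -> str:
        """Render prior turns plus the new question as one prompt context."""
        lines = [f"Q: {q}\nA: {a}" for q, a in self._history[user_id]]
        return "\n".join(lines + [f"Q: {follow_up}"])
```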
5. Continuous Quality Data Generation Pipeline: During beta testing we introduced a pipeline that continuously turns user interactions into quality evaluation data. This feedback loop is used to refine the RAG pipeline and improve the QA system over time.
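One simple shape for such a loop is to log each answered question with its retrieved chunks and the user's feedback, and keep the helpful examples as an evaluation set. A sketch under that assumption (the record fields are illustrative, not the production schema):

```python
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class FeedbackRecord:
    question: str            # the user's question
    answer: str              # the answer the system produced
    retrieved_ids: List[str] # chunk IDs used for grounding
    helpful: bool            # beta tester's thumbs up/down

def build_eval_set(records: List[FeedbackRecord]) -> List[dict]:
    """Turn beta-tester feedback into a QA evaluation set: answers
    marked helpful become positive examples for tuning the pipeline."""
    return [asdict(r) for r in records if r.helpful]
```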
Key Features:
1. Improved RAG Pipeline for Answer Synthesis: We optimized the retrieval of relevant documents from the vector store. Testing across diverse embedding models and chunk sizes let us pinpoint and integrate the most effective configuration.
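Comparing embedding models and chunk sizes typically comes down to a retrieval metric such as recall@k, computed per configuration and then maximized. A small sketch of that evaluation step (function names are illustrative):

```python
from typing import Dict, List, Set

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the known-relevant chunks found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def best_config(results: Dict[int, float]) -> int:
    """Pick the chunk size with the highest mean recall score."""
    return max(results, key=results.get)
```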
2. Enhanced Reranker Pipeline: A second-stage reranker filters retrieved documents so that only relevant ones reach answer synthesis. This significantly reduces false positives and mitigates the risk of answer hallucination.
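A reranker re-scores each (question, chunk) pair with a stronger model than the first-stage retriever, then keeps only confident matches. A minimal sketch where `score` stands in for a cross-encoder model (the threshold and signature are illustrative assumptions):

```python
from typing import Callable, List, Tuple

def rerank(question: str,
           chunks: List[str],
           score: Callable[[str, str], float],
           threshold: float = 0.5,
           top_k: int = 3) -> List[str]:
    """Second-stage reranking: score each retrieved chunk against the
    question and keep only confident matches. Dropping low-scoring
    chunks is what cuts false positives and hallucinated answers."""
    scored: List[Tuple[float, str]] = sorted(
        ((score(question, c), c) for c in chunks), reverse=True)
    return [c for s, c in scored[:top_k] if s >= threshold]
```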
3. User-Centric Answer Display Enhancements:
• Highlighted Answer Origin: Each answer displays the source chunks it was drawn from, making the answer-generation process transparent.
• Contextual Chunk Highlighting: The specific text segments that contributed to the answer are highlighted, giving users a focused view of the supporting passages.
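Both display features reduce to one operation on the backend: locating each supporting chunk inside the source document and handing character offsets to the UI to highlight. A sketch under that assumption:

```python
from typing import List, Tuple

def highlight_spans(document: str,
                    supporting_chunks: List[str]) -> List[Tuple[int, int]]:
    """Locate each supporting chunk inside the source document and return
    (start, end) character offsets for the UI to highlight."""
    spans = []
    for chunk in supporting_chunks:
        start = document.find(chunk)
        if start != -1:  # skip chunks that were paraphrased out of the source
            spans.append((start, start + len(chunk)))
    return sorted(spans)
```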
4. Document QA: Users can upload documents in real time and interact with them directly. Session management lets a user return to those documents at any point within 30 days and resume where they left off; the user chooses how long, up to that limit, to keep the session alive.
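The session lifetime logic can be sketched as a store that records an expiry timestamp per user, capped at 30 days, and refuses to resume expired sessions (class and field names are illustrative assumptions; `now` is injectable for testing):

```python
import time
from typing import Dict, List, Optional

class DocumentSessionStore:
    """Keeps a user's uploaded documents alive for a configurable number
    of days (capped at 30) so the session can be resumed later."""

    MAX_DAYS = 30

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def save(self, user_id: str, documents: List[str],
             keep_days: int = 30, now: Optional[float] = None) -> None:
        keep_days = min(keep_days, self.MAX_DAYS)  # enforce the 30-day cap
        now = time.time() if now is None else now
        self._sessions[user_id] = {
            "documents": documents,
            "expires_at": now + keep_days * 86400,
        }

    def resume(self, user_id: str,
               now: Optional[float] = None) -> Optional[List[str]]:
        """Return the saved documents, or None if the session expired."""
        now = time.time() if now is None else now
        session = self._sessions.get(user_id)
        if session is None or now > session["expires_at"]:
            self._sessions.pop(user_id, None)  # drop expired state
            return None
        return session["documents"]
```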
5. Timeline Creation: Important events are identified in each uploaded document and assembled into a timeline, so users can see the history of a particular case at a glance.
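At its simplest, timeline construction means finding the sentences that carry dates and ordering them chronologically. A sketch that assumes ISO-formatted dates for illustration (the production system would use a proper date/event extractor):

```python
import re
from datetime import date
from typing import List, Tuple

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # ISO dates only, for the sketch

def build_timeline(sentences: List[str]) -> List[Tuple[date, str]]:
    """Pull sentences containing dates out of a document and order
    them chronologically to form a case timeline."""
    events = []
    for sentence in sentences:
        m = DATE_RE.search(sentence)
        if m:
            y, mo, d = map(int, m.groups())
            events.append((date(y, mo, d), sentence.strip()))
    return sorted(events, key=lambda e: e[0])
```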