up:: [[DBA806 - Applied AI Innovation]]
tags:: #source/course #on/AI #on/ML
people:: [[Praphul Chandra]]

# DBA806 M11 - Prompts to RAGs

[Session 11 - Dr. Praphul Chandra - Prompts to RAGs - YouTube](https://www.youtube.com/watch?v=z2vizsy9YVA)

[PromptsToRags.pdf](https://cdn.upgrad.com/uploads/production/1b001f99-f1e8-4119-af81-0fdfc2ae574a/PromptsToRags.pdf)

The session covers large language models (LLMs) such as GPT (Generative Pre-trained Transformer). It aims to clarify concepts such as prompt engineering, retrieval-augmented generation (RAG), and the benefits of continuously training and retraining models.

Key points covered include:

- **Foundation Models and Large Language Models (LLMs):**
    - Foundation models are AI/ML models trained on extensive and diverse data, often at web scale, enabling broad applicability across tasks.
    - LLMs, a type of foundation model, excel at understanding and generating text because they are trained on vast amounts of textual data.
    - GPT models, a subset of LLMs, are based on the Transformer architecture and demonstrate superhuman text comprehension and generation abilities.
- **Prompt Engineering and RAG:**
    - Prompt engineering involves shaping inputs to LLMs to produce desired outputs, facilitating their use in various applications.
    - Retrieval-augmented generation (RAG) is introduced, with a dedicated session planned for a deeper treatment.
- **Continuous Training and Retraining Models:**
    - Continuous training of models post-deployment is crucial to address performance degradation due to changing data distributions, or to experiment toward improved models.
    - The discussion on retraining pre-trained models explores the benefits and challenges, emphasizing the need to understand the different terminologies and concepts around foundation models.
- **Terminology and Concepts Clarification:**
    - Definitions are provided for terms like foundation models, LLMs, and GPTs, focusing on their data-driven training and broad applicability.
    - Parameters in LLMs are clarified as relating to the number of connections between artificial neurons, influenced by both model size and training data volume.

Overall, the session provides insights into the evolving landscape of foundation models, emphasizing their transformative potential across AI applications while addressing technical nuances and participant questions.

The session then discusses using LLMs like GPT for tasks such as summarization and question answering, and the importance of prompt engineering in maximizing their effectiveness. Key points:

- LLMs like GPT continue to advance, with hardware technology improving roughly every 18 months, leading to more powerful models.
- The physical size of a data farm, where these models are trained, grows by adding more servers to handle increasing computational demands.
- Unstructured data can be fed into LLMs to obtain document summaries, enabling them to answer questions based on the documents; retrieval-augmented generation (RAG) is designed specifically for this purpose.
- Terms like pre-trained, generalized, adaptable, large, and self-supervised each describe an aspect of foundation models and their capabilities.
- Language models work on self-supervised learning principles, predicting the next word in a sequence based on large corpora of data (a minimal sketch follows below).
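To make the self-supervised objective concrete, here is a minimal sketch of next-word prediction using the Hugging Face `transformers` library; the small open `gpt2` checkpoint is an illustrative choice, not a model discussed in the session:

```python
# Minimal next-token prediction sketch with an open checkpoint (gpt2).
# Assumes the transformers and torch packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (batch, seq_len, vocab_size)

next_id = int(logits[0, -1].argmax())    # highest-scoring next token
print(tokenizer.decode(next_id))         # e.g. " Paris"
```

Trained at web scale, this single objective (predict the next token) is what yields the broad capabilities described above.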
On prompt engineering, the key points are:

- Prompt engineering involves structuring language input to LLMs to make it easier for them to understand and generate appropriate responses.
- Prompts serve as instructions or commands to LLMs, describing the tasks they should perform.
- Prompt engineering can be seen as treating English as a programming language: instructing the model through natural language.
- Clear and detailed prompts are crucial for effective communication with LLMs.
- Including specifics in prompts, such as desired formats or instructions, improves model performance.
- Examples of clear and detailed prompts are provided, highlighting the importance of context and clarity.
- Personally identifiable information should not be included in prompts, since prompts are sent to the model owners.
- Delimiters, like triple quotes, help distinguish between instructions and data in prompts.
- The term "engineering" in prompt engineering can be interpreted broadly as designing effective prompts for LLMs.

Overall, the session emphasizes crafting clear and detailed prompts to harness the full potential of large language models across tasks.

The session then covers further techniques and considerations for prompting LLMs like GPT effectively:

- **Nature of Language and Naming**: Language is dynamic and uncontrollable; names and terms are arbitrary.
- **Specifying Inputs**: Angular brackets can be used instead of triple quotes to mark off an article within a command, aiding model comprehension.
- **Controlling Output Size**: Specify the desired output size in words, sentences, or paragraphs.
- **Persona Customization**: Assigning personas to models is effective for tailoring responses, e.g. teacher, interviewer, or chef.
- **Utilizing Cheat Sheets**: Cheat sheets help in crafting effective prompts; examples are available online.
- **Advanced Prompting Techniques**: More sophisticated methods include system instructions, step-by-step guidance within prompts, and examples for the model to follow.
- **Role of Instructions**: As prompting becomes more complex, the boundary between natural language interaction and programming blurs.
- **Chain-of-Thought Prompting**: Advanced techniques like chains or trees of thought guide model responses.
- **Few-Shot Prompting**: Providing a few examples in the prompt shapes model responses, in contrast to zero-shot prompting (see the sketch after this list).
- **Reference Text Usage**: Providing reference text within prompts mitigates model "hallucinations," ensuring more accurate responses.
- **Citing Reference Text**: A stricter approach asks the model to cite specific references in its responses, enhancing accountability and accuracy.
- **Reducing Hallucination**: Hallucination rates can be reduced by leveraging reference texts and domain-specific knowledge.
- **Advanced Prompt Engineering**: Breaking complex tasks down into simpler ones, akin to modular programming in software engineering, guides model interactions effectively.

These points cover a range of techniques for prompting large language models effectively, ensuring accurate and tailored responses.
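The sketch below combines two of these techniques, few-shot examples and triple-quote delimiters, on an intent-classification task (the next topic); the categories and example queries are invented for illustration:

```python
# A minimal few-shot prompt sketch. Triple-quote delimiters separate
# instructions from data; two worked examples shape the output format.
def build_prompt(query: str) -> str:
    return (
        "Classify the user query as one of: billing, technical, other.\n"
        "Respond with the category name only.\n\n"
        'Query: """How do I update my card details?"""\n'
        "Category: billing\n\n"
        'Query: """The app crashes on startup."""\n'
        "Category: technical\n\n"
        f'Query: """{query}"""\n'
        "Category:"
    )

print(build_prompt("Can I get an invoice for last month?"))
```

Dropping the two worked examples would turn this into a zero-shot prompt; the few-shot version trades a longer prompt for a more predictable output format.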
The session then turns to prompt engineering in the context of utilizing foundation models like GPT for real tasks. It emphasizes breaking problems down into smaller tasks and stitching them together in a workflow, with a focus on programming in natural language. The running example is intent classification for chatbots, where the goal is to understand user queries and categorize them accurately using natural language processing.

**Key points:**

- Prompt engineering involves breaking a problem down into smaller tasks and creating a workflow where the output of one task becomes the input to the next.
- Intent classification for chatbots illustrates this, aiming to categorize user queries accurately.
- The foundation model is given commands or instructions based on user queries and their classifications.
- Prompt engineering allows foundation models to be used as software modules, with responses generated in natural language.
- Workflow-stitching tools facilitate the integration of foundation models into larger systems.
- Fine-tuning the model involves providing numerous examples, or training it on domain-specific data, to optimize performance for specific tasks.

**Further details:**

- Instruction fine-tuning focuses on getting the model's output to follow a specific style, format, or tone, usually by providing numerous examples.
- Instruction fine-tuning is useful when what is desired is hard to explain in text but easy to demonstrate through examples.
- Supervised fine-tuning is more involved: it adapts the model to a specific domain by training it on labeled data from that domain.
- Supervised fine-tuning is needed for tasks like creating domain-specific models for professions such as airline pilots or lawyers.
- Supervised fine-tuning requires labeled data and optimizes the model's performance for a particular domain.

Overall, prompt engineering offers a structured approach to utilizing foundation models effectively, with fine-tuning strategies catering to specific requirements, whether for style optimization or domain adaptation.

Supervised learning involves finding a function \( f \) where \( y = f(x) \); in the context of language models, this means adapting the model to a specific domain using labeled data. The process starts by pre-training the model on extensive data sources like Wikipedia and Common Crawl, producing a pre-trained model. Domain-specific labeled data is then fed into the fine-tuning path, resulting in a fine-tuned model tailored to the domain. This process aims to improve the model's responses to user prompts.

**Key points:**

- Supervised learning aims to find a function \( f \) such that \( y = f(x) \), where \( x \) represents input features and \( y \) represents labels.
- Adapting language models to specific domains involves fine-tuning pre-trained models with labeled data from that domain.
- Pre-training on extensive data sources like Wikipedia and Common Crawl yields a pre-trained model.
- Domain-specific labeled data fed into the fine-tuning path yields a fine-tuned model suited to the particular domain (a sketch of such labeled data follows below).
- Fine-tuning lets the model adapt its responses to user prompts, improving response quality.
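To make the \( (x, y) \) pairing concrete, here is an invented sketch of what domain-labeled fine-tuning data might look like for an airline-pilot assistant; the examples and the prompt/completion JSONL layout are illustrative assumptions, not material from the session:

```python
# Hypothetical (x, y) pairs for supervised fine-tuning: x is the prompt,
# y is the desired completion. JSONL (one example per line) is a common
# interchange format for such datasets.
import json

examples = [
    {"prompt": "What does V1 mean during takeoff?",
     "completion": "V1 is the decision speed beyond which the takeoff "
                   "must be continued even after an engine failure."},
    {"prompt": "When must a go-around be flown?",
     "completion": "Whenever the approach becomes unstable or the runway "
                   "is not clear."},
]

with open("domain_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")   # one labeled example per line
```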
Additionally, the discussion covers various questions and clarifications regarding supervised fine-tuning:

- Once fine-tuned with domain-specific data, the model gives better responses to similar prompts.
- Hallucination, i.e. generating incorrect responses, can still occur after fine-tuning.
- Reinforcement learning from human feedback (RLHF) is another approach to improving model responses.
- Fine-tuning changes the weights of the model's parameters, which requires access to the model itself, typically an open-source model.
- The choice of in-domain data for fine-tuning depends on its trustworthiness and relevance to the domain.
- Fine-tuning is done once; it does not need to be executed with each prompt.
- It is important to distinguish yesterday's approach of retraining from scratch from today's supervised fine-tuning of pre-trained models.
- Supervised fine-tuning suits natural language tasks; for tasks involving structured data, new models are preferred.
- Instruction fine-tuning involves providing specific instructions within prompts, while supervised fine-tuning uses labeled domain data.
- Instruction fine-tuning, prompt engineering, and supervised fine-tuning serve different purposes in customizing model behavior.

**Bullet points:**

- Fine-tuning existing models enhances capabilities without rendering them obsolete.
- Cost considerations suggest exploring open-source LLMs for efficiency.
- Without detailed information, it can be hard to identify which fine-tuning methods a customized model used.
- Supervised fine-tuning modifies the model, whereas referencing material in prompts influences responses without altering the model.
- Domain-specific LLMs are available from various sources, including open-source initiatives and startups.
- Supervised fine-tuning with company data remains private, ensuring data protection.
- RAG leverages external data dynamically without modifying the model's parameters.
- RAG separates factual knowledge from the model, accessing external sources to craft responses beyond the model's training dataset.

The discussion then turns to hosting foundation models, particularly the feasibility of hosting such models within an organization's own infrastructure, independent of third-party services. Closed-source foundation models, like those provided by OpenAI and Microsoft, are typically hosted by the respective companies and accessed via API on a pay-per-use model. Conversely, open-source models, such as Falcon, are available as GitHub repositories but must be deployed on cloud infrastructure, often GPU-based, incurring additional costs.

**Key points:**

- Closed-source foundation models are hosted by the respective companies and accessed via API on a pay-per-use basis.
- Open-source models require deployment on cloud infrastructure, often GPU-based, incurring additional costs.

Further questions address the applicability of pre-trained LLMs in enterprise information retrieval. Two approaches are outlined: fine-tuning the model with enterprise data, or employing retrieval-augmented generation (RAG), which gives the model access to external enterprise knowledge without altering its core structure.

**Key points:**

- Two approaches for enterprise information retrieval: fine-tuning LLMs with enterprise data, or employing RAG methods.
- RAG methods allow access to external enterprise knowledge without altering the model's structure (a minimal sketch follows below).
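A minimal RAG sketch, assuming placeholder `embed` and `llm` callables standing in for any embedding model and chat model: documents are embedded, the snippets nearest the query are retrieved, and they are stuffed into the prompt for generation.

```python
# Minimal retrieval-augmented generation sketch. embed(text) -> np.ndarray
# and llm(prompt) -> str are assumed placeholders, not a specific API.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(embed, docs, query, k=2):
    """Return the k document snippets most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(embed(d), q), reverse=True)[:k]

def rag_answer(llm, embed, docs, query):
    context = "\n---\n".join(retrieve(embed, docs, query))   # retrieval stage
    prompt = (                                               # generation stage
        "Answer using only the context below, citing the snippet you used.\n"
        f'Context: """{context}"""\n\n'
        f"Question: {query}"
    )
    return llm(prompt)
```

Note that only the retrieved snippets, not the whole database, reach the model inside the prompt, which is the data-exposure point discussed below.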
Questions arise regarding the necessity of pre-trained LLMs when only domain-specific information is required. While LLMs possess deep language understanding that goes beyond web data, they serve as crucial tools for comprehending user queries and crafting natural language responses based on enterprise knowledge.

**Key points:**

- Pre-trained LLMs are essential for understanding user queries and crafting responses based on enterprise knowledge, not just for web data.

The discussion progresses to comparing RAG and supervised fine-tuning (SFT), highlighting RAG's adaptability, cost-effectiveness, and privacy advantages. RAG enables dynamic access to updated information without costly retraining, making it suitable for real-time data scenarios.

**Key points:**

- RAG offers adaptability and cost-effectiveness compared to SFT.
- RAG allows access to dynamic information without costly retraining.

Finally, a detailed overview of RAG's functionality covers its preparation, retrieval, and generation stages. Data security concerns are addressed by discussing how much of the external database is exposed to the LLM within the prompt.

**Key points:**

- RAG enhances model responses with up-to-date information without retraining.
- It enables transparency and trust by citing document snippets in responses.
- RAG suits enterprise knowledge retrieval systems because it retrieves relevant information efficiently.

The session summarizes the discussed patterns, including using LLMs as black boxes and implementing RAG for real-time data access and enhanced responses in enterprise scenarios.

It also covers the architecture and applications of LLMs, emphasizing knowledge retrieval, article summarization, auto-correction, machine translation, and natural language generation. Key points include:

- **LLM Architecture and Use Cases**:
    - Great for knowledge retrieval, summarizing large articles, auto-correct, machine translation, and natural language generation.
    - The key interaction mechanism is the prompt.
- **App Store Model**:
    - LLM providers enable third-party app development through app stores.
    - Third-party apps built on LLMs offer custom solutions for enterprises.
    - Users can access apps via mobile/web UIs or workflows built on LLMs.
- **Data Ownership and Privacy**:
    - It is important to understand data ownership and privacy implications when using third-party apps.
    - Contracts are between users and app developers, not just the LLM provider.
- **Supervised Fine-Tuning**:
    - Enhances LLM performance for specific domains by retraining with enterprise/domain data.
    - Modifies model weights and layers to embed domain knowledge.
- **Retrieval-Augmented Generation (RAG)**:
    - Provides an alternative to fine-tuning by using separate data sources for responses.
    - Uses both the model and the user's query to generate responses, potentially reducing costs.
- **Multi-Agent LLM Orchestration**:
    - An emerging concept involving workflows that use multiple LLMs for specific tasks (see the sketch after this list).
    - Allows stitching together responses from various LLMs, potentially increasing efficiency.
- **Decision Framework for LLM Adoption**:
    - Organizations need to consider factors like LLM selection, NLP tasks, data sources, fine-tuning vs. RAG, and deployment.
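A hypothetical multi-agent orchestration sketch: one "agent" drafts, a second reviews, and the orchestrator stitches their outputs together. The `draft_llm` and `review_llm` callables stand in for any two (possibly different) LLMs; the draft/review split is an invented example of such a workflow, not one from the session.

```python
# Hypothetical two-agent workflow: draft, review, then revise. Each LLM
# is modeled as a plain prompt-to-text callable.
from typing import Callable

LLM = Callable[[str], str]

def orchestrate(draft_llm: LLM, review_llm: LLM, task: str) -> str:
    draft = draft_llm(f"Draft a response to: {task}")
    review = review_llm(                        # second model critiques
        "Review the draft below for errors and suggest one improvement.\n"
        f'Draft: """{draft}"""'
    )
    return draft_llm(                           # first model revises
        "Revise the draft to address the review.\n"
        f'Draft: """{draft}"""\nReview: """{review}"""'
    )
```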
In the classroom discussion, questions arise about the definition of AI, human involvement in decision-making, and data privacy frameworks for LLM implementations. It is emphasized that while tools like LLMs raise privacy concerns, the responsibility for privacy frameworks lies with the enterprise rather than the tool itself. Adoption strategies are expected to vary across industries, functions, and enterprise data-ownership guidelines. Finally, it is noted that responses from vector databases cannot replace natural language understanding in the RAG model: retrieval surfaces relevant snippets, but the LLM is still needed to interpret the query and compose the answer.