<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://weaviate.io/papers</id>
    <title>Weaviate Blog</title>
    <updated>2024-09-01T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://weaviate.io/papers"/>
    <subtitle>Weaviate Blog</subtitle>
    <icon>https://weaviate.io/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[Distillation Experiments]]></title>
        <id>https://weaviate.io/papers/distillation2</id>
        <link href="https://weaviate.io/papers/distillation2"/>
        <updated>2024-09-01T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Experiments comparing Distillation to Finetuning]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-5ff5c2f45c270ea2c7022077c1af9dbb.png" width="1000" height="654" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="is-it-better-to-distill-or-finetune-language-models">Is it better to distill or finetune language models?<a href="#is-it-better-to-distill-or-finetune-language-models" class="hash-link" aria-label="Direct link to Is it better to distill or finetune language models?" title="Direct link to Is it better to distill or finetune language models?">​</a></h2><p>Arcee did a bunch of experiments comparing model distillation to finetuning, base vs instruct model distillation and more. </p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="main-takeaways">Main Takeaways:<a href="#main-takeaways" class="hash-link" aria-label="Direct link to Main Takeaways:" title="Direct link to Main Takeaways:">​</a></h2><p>♦️ Both logit-based and hidden states-based distillation methods consistently outperform standard SFT across various benchmarks.</p><p>♦️ General-Purpose Performance Gains: Significant improvements across datasets like OpenHermes, WebInstruct-Sub, and FineTome, particularly in MMLU and MMLU-Pro benchmarks, indicating enhanced knowledge absorption.</p><p>♦️ Domain-Specific Performance Gains: Distilling models for domain-specific tasks, especially when using the same training data as the teacher model leads to performance improvements.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="experiments">Experiments<a href="#experiments" class="hash-link" aria-label="Direct link to Experiments" title="Direct link to Experiments">​</a></h2><p>♦️Experiment 1: What's better Supervised-Finetune(SFT) or Distill+SFT?</p><p>Three models—Hermes-Distilled (logit-based), Hermes-Hidden-States, and Hermes Vanilla (SFT-only)—were evaluated, all distilled from Arcee Spark using a subset of the Teknium's OpenHermes-2.5 dataset (200k examples). Both distillation methods were better than SFT-only model across major benchmarks such as BBH, MUSR, and MMLU-PRO. The logit-based approach was better then the hidden-states-based distillation.</p><p>♦️Experiment 2: Effectiveness of Logit-based Distillation in a Generic Domain</p><p>The 1.5B Distilled model, trained on a 200k subset of WebInstruct-Sub, demonstrated performance improvements over the baseline Qwen2-1.5B-Instruct model across all metrics. Its performance was also comparable to the teacher model, Arcee Spark, particularly on MUSR and GPQA benchmarks.</p><p>♦️Experiment 3: Distillation on Instruct vs. Base Student Models</p><p>The 1.5B-Instruct-Distilled model (logit-based), trained on WebInstruct-Sub, showed performance improvements over the vanilla Qwen2-1.5B-Instruct model on the MMLU benchmark, showing benefits of distillation for enhancing knowledge retrieval.</p><p>♦️Experiment 4: Effectiveness of Domain-specific Distillation</p><p>Distilling Arcee Agent, a 7B parameter model specialized in function calling, into Qwen2-1.5B-Instruct using the same dataset that trained the teacher model resulted in performance gains. This approach underscores the potential of using the same training data for both teacher and student models to achieve even greater improvements, particularly in domain-specific tasks.</p><p>More resources:
<a href="https://blog.arcee.ai/announcing-distillkit/" target="_blank" rel="noopener noreferrer">DistillKit by Arcee AI</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2402.13116" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2402.13116" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Language Model Distillation]]></title>
        <id>https://weaviate.io/papers/distillation</id>
        <link href="https://weaviate.io/papers/distillation"/>
        <updated>2024-08-08T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Distilling Large Language models in Small Language Models!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-0d2e9892961dbaa75b52800733619f38.png" width="931" height="522" class="img_ev3q"></p><p><strong>Distillation has become popular recently due to its ability to efficiently compress the knowledge of larger LLMs into smaller ones. Here’s how it works, why it’s useful, and examples of how you can perform distillation</strong></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="what-is-distillation">What is distillation?<a href="#what-is-distillation" class="hash-link" aria-label="Direct link to What is distillation?" title="Direct link to What is distillation?">​</a></h2><p>Distillation is a model compression technique where a smaller "student" model is trained to mimic the behavior of a larger "teacher" model. This is achieved by transferring knowledge from the teacher to the student, usually through methods like logit-based or hidden states-based distillation. These methods are designed to help the student model replicate the teacher's output distribution or internal representations, often leading to a more efficient model with comparable performance.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="when-would-we-use-this">When would we use this?<a href="#when-would-we-use-this" class="hash-link" aria-label="Direct link to When would we use this?" title="Direct link to When would we use this?">​</a></h2><p>Distillation is commonly used when deploying large models is impractical due to resource constraints, such as in real-time applications or edge devices. For instance, a smaller student model can be distilled from a powerful teacher model like Llama3.1 405B, retaining much of the original model’s capability but with significantly lower computational demands. Distillation is also useful when adapting models to specific tasks or domains, as seen in domain-specific distillation cases like "function calling," where specialized knowledge from a teacher model is transferred to a smaller model for specific use cases.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="whats-the-benefit">What’s the benefit?<a href="#whats-the-benefit" class="hash-link" aria-label="Direct link to What’s the benefit?" title="Direct link to What’s the benefit?">​</a></h2><p>Distillation offers a significant reduction in model size and computational requirements while maintaining a high level of performance. This is especially valuable in scenarios where memory and processing power are limited. Moreover, distillation allows for flexibility in model architecture choices; for example, distilling knowledge from a Llama-3.1-70B model into a much smaller StableLM-2-1.6B model. Distillation methods like those provided in Arcee-AI's DistillKit, including logit-based and hidden states-based distillation, can lead to substantial performance gains over traditional training routines without requiring additional data.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="examples-of-distillation-techniques">Examples of Distillation Techniques:<a href="#examples-of-distillation-techniques" class="hash-link" aria-label="Direct link to Examples of Distillation Techniques:" title="Direct link to Examples of Distillation Techniques:">​</a></h2><h3 class="anchor anchorWithStickyNavbar_LWe7" id="1-logit-based-distillation">(1) Logit-based Distillation:<a href="#1-logit-based-distillation" class="hash-link" aria-label="Direct link to (1) Logit-based Distillation:" title="Direct link to (1) Logit-based Distillation:">​</a></h3><p>This method involves transferring knowledge by using both the hard targets (actual labels) and soft targets (teacher logits) to guide the student model. The student is trained to minimize the difference between its output distribution and the teacher’s output, typically using Kullback-Leibler (KL) divergence. This method is particularly effective for maintaining performance close to the teacher model while improving the student’s generalization abilities.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="2-hidden-states-based-distillation">(2) Hidden States-based Distillation:<a href="#2-hidden-states-based-distillation" class="hash-link" aria-label="Direct link to (2) Hidden States-based Distillation:" title="Direct link to (2) Hidden States-based Distillation:">​</a></h3><p>Here, the focus is on aligning the intermediate layer representations of the student with those of the teacher. This layer-wise guidance helps the student model capture similar features and improves its performance and generalization. This method also allows for cross-architecture distillation, enabling knowledge transfer between different model architectures, such as distilling from a Llama-3.1-70B model into a StableLM-2-1.6B model.</p><p>More resources:
<a href="https://arcee-ai-distillkit.my.canva.site/" target="_blank" rel="noopener noreferrer">DistillKit by Arcee AI</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2402.13116" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2402.13116" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[LoRA: Low-Rank Adaptation of Large Language Models]]></title>
        <id>https://weaviate.io/papers/lora</id>
        <link href="https://weaviate.io/papers/lora"/>
        <updated>2024-07-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Tuning LLMs by only learning a fraction of weight updates!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-69fac3dcafdb12cfed9c6c763f12198f.png" width="4144" height="1753" class="img_ev3q"></p><h1>Detailed explanation of Low-Rank Adaptation (LoRA), a method for efficiently fine-tuning pre-trained neural networks.</h1><h2 class="anchor anchorWithStickyNavbar_LWe7" id="the-problem-lora-solves">The Problem LoRA Solves:<a href="#the-problem-lora-solves" class="hash-link" aria-label="Direct link to The Problem LoRA Solves:" title="Direct link to The Problem LoRA Solves:">​</a></h2><ul><li>In early 2021, Microsoft partnered with OpenAI to explore the commercial viability of GPT-3.</li><li>They found that prompting was insufficient for production tasks like natural language to code generation.</li><li>Fine-tuning was necessary but prohibitively expensive due to the large size of model checkpoints.</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How It Works:<a href="#how-it-works" class="hash-link" aria-label="Direct link to How It Works:" title="Direct link to How It Works:">​</a></h2><ul><li>LoRA generalizes full fine-tuning(updating every single parameter) by asking two questions:<ol><li>Do we need to fine-tune all parameters?</li><li>For the weight matrices we fine-tune, how expressive should the updates be in terms of matrix rank?</li></ol></li><li>These questions define a 2D plane where full fine-tuning is the top-right corner(full rank and full parameter updates) and the origin represents the original model.</li><li>Any point in this plane is a valid LoRA configuration.</li></ul><p>The chosen rank of the update matrix controls the expressivity of the finetuning process.</p><ul><li>A d x d matrix can represent any linear transformation in a d-dimensional vector space.</li><li>By first transforming the input to a lower-dimensional space and then back to the original space, we can restrict the kind of linear transformations that can be represented.</li><li>This reduces the number of parameters that need to be stored from (dxd) to (dxr + dxr) where r &lt;&lt; d.</li><li>A point near the origin often performs as well as full fine-tuning. - because often Neural Networks are over-parametrized and thus the weight matrices are full of linearly dependent </li><li>This suggests that we can start with a low-rank configuration and gradually increase the rank if needed.</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="common-practices-when-using-lora">Common practices when using LoRA:<a href="#common-practices-when-using-lora" class="hash-link" aria-label="Direct link to Common practices when using LoRA:" title="Direct link to Common practices when using LoRA:">​</a></h2><ul><li>How to choose the rank R of the update matrix: Start with a low rank and increase it if needed.</li><li>When to use full fine-tuning?: When finetuning on data that is completely new and absent from the pretraining of the base model (for example if you are tuning an English model on Martian then full fine-tuning may be necessary).</li><li>Can I use LoRA for any model architecture?: As long as the model uses matrix multiplication, LoRA can be applied. So basically pretty much every model architecture can use LoRA!</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="benefits-of-lora">Benefits of LoRA:<a href="#benefits-of-lora" class="hash-link" aria-label="Direct link to Benefits of LoRA:" title="Direct link to Benefits of LoRA:">​</a></h2><ul><li>Reduced checkpoint sizes: On GPT-3, checkpoint size was reduced from 1TB to 25MB.</li><li>No additional inference latency: LoRA updates can be merged with the original parameters during inference. W_new = W_old + AxB</li><li>Ability to quickly switch between tasks: LoRA modules can be loaded and unloaded efficiently.(A_frenchxB_french),(A_germanxB_german),(A_spanishxB_spanish)</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="some-interesting-engineering-ideas-enabled-by-lora">Some interesting engineering ideas enabled by LoRA:<a href="#some-interesting-engineering-ideas-enabled-by-lora" class="hash-link" aria-label="Direct link to Some interesting engineering ideas enabled by LoRA:" title="Direct link to Some interesting engineering ideas enabled by LoRA:">​</a></h2><ul><li>Caching LoRA modules in RAM for faster model switching and routing between different finetunes.</li><li>Training multiple LoRA modules in parallel on different batches of the training set.</li><li>Creating a tree of adaptive models where each node is a LoRA module.</li></ul><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2106.09685" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2106.09685" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Prover-verifier Games Improve Legibility of LLM Outputs]]></title>
        <id>https://weaviate.io/papers/prover</id>
        <link href="https://weaviate.io/papers/prover"/>
        <updated>2024-07-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Optimizing to improve legibility of an LLMs output!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-6464c0c35b338b8162ccf32df9d41317.png" width="1194" height="759" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="training-llms-to-write-solutions-such-that-smaller-models-can-better-check-them-this-makes-them-easier-to-check-for-humans-too">Training LLMs to write solutions such that smaller models can better check them. This makes them easier to check for humans, too.<a href="#training-llms-to-write-solutions-such-that-smaller-models-can-better-check-them-this-makes-them-easier-to-check-for-humans-too" class="hash-link" aria-label="Direct link to Training LLMs to write solutions such that smaller models can better check them. This makes them easier to check for humans, too." title="Direct link to Training LLMs to write solutions such that smaller models can better check them. This makes them easier to check for humans, too.">​</a></h2><h2 class="anchor anchorWithStickyNavbar_LWe7" id="key-findings"><strong>Key Findings</strong><a href="#key-findings" class="hash-link" aria-label="Direct link to key-findings" title="Direct link to key-findings">​</a></h2><ol><li><p><strong>Performance vs. Legibility Trade-Off</strong>:    </p><ul><li>A model with only supervised fine-tuning on human-written derivations achieved high legibility but poor accuracy.</li><li>Optimizing solely for correctness led to high accuracy but poor legibility.</li><li>They propose method balanced both, retaining high legibility with moderate accuracy.</li></ul></li><li><p><strong>They Train Three Models</strong>:    </p><ul><li><strong>Robust Verifier</strong>: Accurately distinguishes correct from incorrect solutions, even when misleading.</li><li><strong>Helpful Prover</strong>: Generates human-legible solutions, reducing human error.</li><li><strong>Sneaky Prover</strong>: Produces subtly incorrect solutions, initially confusing human evaluators and highlighting improvement areas.</li></ul></li></ol><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works"><strong>How It Works</strong><a href="#how-it-works" class="hash-link" aria-label="Direct link to how-it-works" title="Direct link to how-it-works">​</a></h2><p>To train the prover models to produce human-legible text, they use a scheme where the prover's solutions must be easily verifiable by a weaker model. </p><p>They use models from the GPT-4 family with significant differences in pretraining compute, they train using a GAN like(generator and discriminator) alternating training process as follows:</p><ol><li>Generate multiple solutions to math problems from the prover, some of which are naturally incorrect.</li><li>Alternate between two training phases:<ul><li>Verifier Training: Train the weak verifier to predict the correctness of the prover’s solutions.</li><li>Prover Training: Instruct the prover to be either "helpful" or "sneaky," rewarding correct solutions in the former role and incorrect solutions that fool the verifier in the latter.</li></ul></li></ol><p>Initially, the sneaky prover easily deceives the verifier. However, as incorrect solutions are added to verifier training, the prover must continuously develop new strategies. Training concludes when the sneaky prover can no longer deceive the verifier.</p><p><strong>Training Framework</strong>: Multi-agent setup with honest prover (correct solutions) and sneaky prover (deceptive solutions), both checked by a smaller model(robust verifier).</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="details">Details:<a href="#details" class="hash-link" aria-label="Direct link to Details:" title="Direct link to Details:">​</a></h2><ul><li>Unique, effective loss function.</li><li>Honest prover's solutions become detailed and reasoned over time.</li><li>Sneaky prover's solutions develop subtler flaws.</li><li>Legibility transfer: Honest prover’s solutions become easier for humans to verify with training.</li><li>Humans perform better (speed and accuracy) with more trained models in time-constrained tasks.</li><li>Traditional reinforcement learning for correctness leads to less legible solutions.</li><li>Approach helps in training models to explain actions comprehensibly, reducing deception.</li><li>"Legibility tax" affects large model performance. - as legibility increases performance decreases!</li></ul><p><a class="btn_VbJ1 btnMain_ywTD" href="https://cdn.openai.com/prover-verifier-games-improve-legibility-of-llm-outputs/legibility.pdf" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://cdn.openai.com/prover-verifier-games-improve-legibility-of-llm-outputs/legibility.pdf" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[RouteLLM: Learning to Route LLMs with Preference Data]]></title>
        <id>https://weaviate.io/papers/routellm</id>
        <link href="https://weaviate.io/papers/routellm"/>
        <updated>2024-07-14T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Route between LLMs to optimize cost and quality!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-ad579ee91c1c05c3964ad05400e6599f.png" width="4391" height="1874" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="you-dont-need-a-2-trillion-parameter-model-to-tell-you-the-capitol-of-france-is-paris">You don't need a 2 trillion parameter model to tell you the capitol of France is Paris.<a href="#you-dont-need-a-2-trillion-parameter-model-to-tell-you-the-capitol-of-france-is-paris" class="hash-link" aria-label="Direct link to You don't need a 2 trillion parameter model to tell you the capitol of France is Paris." title="Direct link to You don't need a 2 trillion parameter model to tell you the capitol of France is Paris.">​</a></h2><p>Be smart and route between a panel of models according to query difficulty and model specialty! </p><p>New paper proposes a framework to train a router that routes queries to the appropriate LLM to optimize the trade-off b/w cost vs. performance.</p><p>Model inference cost varies significantly: Per one million output tokens: Llama-3-70b ($1) vs. GPT-4-0613 ($60), Haiku ($1.25) vs. Opus ($75)</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="overview">Overview:<a href="#overview" class="hash-link" aria-label="Direct link to Overview:" title="Direct link to Overview:">​</a></h2><p>The RouteLLM paper propose a router training framework based on human preference data and augmentation techniques, demonstrating over 2x cost saving on widely used benchmarks.</p><p>They define the problem as having to choose between two classes of models:
(1) strong models - produce high quality responses but at a high cost (GPT-4o, Claude3.5) </p><p>  (2) weak models - relatively lower quality and lower cost (Mixtral8x7B, Llama3-8b)</p><p>A good router requires a deep understanding of the question’s complexity as well as the strengths and weaknesses of the available LLMs.</p><p>Explore different routing approaches:</p><ul><li>Similarity-weighted (SW) ranking</li><li>Matrix factorization</li><li>BERT query classifier</li><li>Causal LLM query classifier</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="neat-ideas-to-build-from">Neat Ideas to Build From:<a href="#neat-ideas-to-build-from" class="hash-link" aria-label="Direct link to Neat Ideas to Build From:" title="Direct link to Neat Ideas to Build From:">​</a></h2><ul><li><p>Users can collect a small amount of in-domain data to improve performance for their specific use cases via dataset augmentation.</p></li><li><p>Can expand this problem from routing between a strong and weak LLM to a multiclass model routing approach where we have specialist models(language vision model, function calling model etc.)</p></li><li><p>Larger framework controlled by a router - imagine a system of 15-20 tuned small models and the router as the n+1'th model responsible for picking the LLM that will handle a particular query at inference time.</p></li><li><p><strong>MoA architectures:</strong> Routing to different architectures of a Mixture of Agents would be a cool idea as well. Depending on the query you decide how many proposers there should be, how many layers in the mixture, what the aggregate models should be etc.</p></li><li><p><strong>Route based caching:</strong> If you get redundant queries that are slightly different then route the query+previous answer to a small model to light rewriting instead of regenerating the answer</p></li></ul><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2406.18665" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2406.18665" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders]]></title>
        <id>https://weaviate.io/papers/axn</id>
        <link href="https://weaviate.io/papers/axn"/>
        <updated>2024-07-07T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[The quality of a reranked retreiver and the speed of a bi-encoder retreiver!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-10a25bd18fd5bbe296569a48669d223a.png" width="3981" height="1664" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-do-you-get-the-retrieval-quality-of-a-cross-encoderre-ranker-and-the-efficiency-of-a-bi-encoder">How do you get the retrieval quality of a cross-encoder/re-ranker and the efficiency of a bi-encoder?<a href="#how-do-you-get-the-retrieval-quality-of-a-cross-encoderre-ranker-and-the-efficiency-of-a-bi-encoder" class="hash-link" aria-label="Direct link to How do you get the retrieval quality of a cross-encoder/re-ranker and the efficiency of a bi-encoder?" title="Direct link to How do you get the retrieval quality of a cross-encoder/re-ranker and the efficiency of a bi-encoder?">​</a></h2><p>Typically people choose to do this with the trusty old retrieve-and-re-rank approach. </p><p>This new paper from DeepMind proposes Adaptive Cross-Encoder Nearest Neighbor Search, an alternative which approximates the re-ranker query-document similarities while still using a bi-encoder setup.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="high-level-overview">High-level Overview:<a href="#high-level-overview" class="hash-link" aria-label="Direct link to High-level Overview:" title="Direct link to High-level Overview:">​</a></h2><p>You can think of this as an efficient way to train an adaptor for the query vector that transforms the query vector in such a way that makes the similarity scores b/w query-documents more like the cross-encoder similarity scores.</p><ul><li>Once you pass the query vector through the trained adopter then you can simply use cosine similarity with the document embeddings</li><li>Can use existing bi-encoder models to initialize the item and query embeddings</li><li>In an offline indexing step -&gt; compute query/item embeddings to index a given set of items from a target domain making sure the similarity scores are similar to cross encoder scores</li><li>At test time -&gt; compute the test query embedding to approximate cross-encoder scores of the given test query for a small set of adaptively-chosen items</li><li>Perform retrieval with the approximate cross-encoder scores</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="details">Details:<a href="#details" class="hash-link" aria-label="Direct link to Details:" title="Direct link to Details:">​</a></h2><p>At test time, they keep item embeddings fixed and perform retrieval over multiple rounds, alternating between:</p><blockquote><blockquote><p>estimating the test query embedding by minimizing error in approximating CE scores of items retrieved thus far, and </p></blockquote></blockquote><blockquote><blockquote><p>using the updated test query embedding for retrieving more items in the next round.</p></blockquote></blockquote><p>In the last round once the test query embedding is fully refined, this test query embedding can now be used to retrieve items using cosine similarity.</p><p>Proposed k-NN search method can achieve up to 5% and 54% improvement in k-NN recall for k = 1 and 100 respectively over the widely-used DE-based retrieve-and-rerank approach.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2405.03651" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2405.03651" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Many-Shot In-Context Learning]]></title>
        <id>https://weaviate.io/papers/manyshoticl</id>
        <link href="https://weaviate.io/papers/manyshoticl"/>
        <updated>2024-07-05T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[All the details around teaching LLMs by giving examples!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-6efa327f047e43183e4bf4a14fa88108.png" width="1315" height="509" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="should-you-finetune-your-llm-or-just-give-relevant-examples-in-the-prompt-how-many-examples-should-you-give-for-best-performance-if-you-give-more-will-it-hurt-perf-does-order-of-the-examples-matter">Should you finetune your LLM or just give relevant examples in the prompt? How many examples should you give for best performance?? If you give more will it hurt perf?? Does order of the examples matter!??<a href="#should-you-finetune-your-llm-or-just-give-relevant-examples-in-the-prompt-how-many-examples-should-you-give-for-best-performance-if-you-give-more-will-it-hurt-perf-does-order-of-the-examples-matter" class="hash-link" aria-label="Direct link to Should you finetune your LLM or just give relevant examples in the prompt? How many examples should you give for best performance?? If you give more will it hurt perf?? Does order of the examples matter!??" title="Direct link to Should you finetune your LLM or just give relevant examples in the prompt? How many examples should you give for best performance?? If you give more will it hurt perf?? Does order of the examples matter!??">​</a></h2><p>New paper from Deepmind answers all these questions and more, so much to take away from this one, lets dig in!</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="main-takeaways">Main Takeaways:<a href="#main-takeaways" class="hash-link" aria-label="Direct link to Main Takeaways:" title="Direct link to Main Takeaways:">​</a></h2><ul><li>Large performance jumps when going from providing very few(1-5) examples(few-shot incontext learning(ICL) to providing many(100s-1000s) examples(many-shot ICL) - The harder the task the more it benefits from more examples in the prompt!</li><li>Propose using synthetically generated examples(as opposed to human labelled ones) and find that works quite well</li><li>Propose providing only questions and no answers, in the examples, and find this also works quite well!!</li><li>Show that many-shot ICL can overcome pre-training biases, perform comparably to supervised fine-tuning, and learn non-NLP prediction tasks.</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="juicy-details">Juicy Details:<a href="#juicy-details" class="hash-link" aria-label="Direct link to Juicy Details:" title="Direct link to Juicy Details:">​</a></h2><ul><li><p>Full supervised/instruction fine-tuning only slightly outperforms many-shot ICL for many tasks</p></li><li><p>They mainly test Gemini 1.5 but also try GPT4 and Claude 3.5 and show that different LLMs have varying degrees of success when using many-shot ICL - not a model agnostic trick</p></li><li><p>They show that if you provide encough examples in the prompt it can adapt to unseen non-lingual tasks and even in domains that might be misaligned with an LLM’s training data</p></li><li><p>Surprisingly, the order of examples in the prompt also influences many-shot performance - would be interesting to see how optimization systems like DSPy can help with this</p></li><li><p>Adding more examples, then optimal, in the prompt can also sometimes degrade performance for certain tasks - <strong>weird finding</strong> - opportunity for DSPy to do its thing here aswell</p></li><li><p>Many-shot ICL achieves comparable or superior performance when using only problems compared to using problems with solutions - signals that providing solutions with many-shot ICL might just be redundant</p></li><li><p>Many shot ICL also shows an improvement in out-of-distribution general problem-solving abilities from in-context learning - Math tasks and etc.</p></li><li><p>Biases instilled in the model during pre-training can also be overcome with many shot ICL - a small number of shots leads to the model giving into the bias but with enough examples this eventually diminishes as task learning takes effect in the many shot regime.</p></li></ul><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2404.11018" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2404.11018" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Token Pooling to Scale Multi-Vector Retrieval Systems]]></title>
        <id>https://weaviate.io/papers/colbertpooling</id>
        <link href="https://weaviate.io/papers/colbertpooling"/>
        <updated>2024-06-30T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Clustering tokens to make ColBERT more efficient and usable with Vector DBs!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-a5b08b51ed431f8240c02023b9622097.png" width="1070" height="636" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="multi-vector-retrieval-approaches-like-colbert-have-great-retrieval-quality-but-vector-count-can-balloon-answerai-propose-a-solution">🏹Multi-vector retrieval approaches, like ColBERT, have great retrieval quality but vector count can balloon, AnswerAI propose a solution!<a href="#multi-vector-retrieval-approaches-like-colbert-have-great-retrieval-quality-but-vector-count-can-balloon-answerai-propose-a-solution" class="hash-link" aria-label="Direct link to 🏹Multi-vector retrieval approaches, like ColBERT, have great retrieval quality but vector count can balloon, AnswerAI propose a solution!" title="Direct link to 🏹Multi-vector retrieval approaches, like ColBERT, have great retrieval quality but vector count can balloon, AnswerAI propose a solution!">​</a></h2><p>Below is an explanation of how ColBERT works and AnswerAI's proposed modification!</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="breakdown-of-different-types-of-encoders">Breakdown of different types of encoders:<a href="#breakdown-of-different-types-of-encoders" class="hash-link" aria-label="Direct link to Breakdown of different types of encoders:" title="Direct link to Breakdown of different types of encoders:">​</a></h2><p><strong>Cross-encoders:</strong></p><ul><li>Document text &amp; query text strings concatenated and passed into a cross-encoder which then outputs a rank/score.</li></ul><p><strong>Bi-encoders:</strong></p><ul><li>Document text passed into an encoder and generates a document embedding</li><li>Query text separately passed into an encoder and generates a query embedding</li><li>Similarity of query and doc embedding calculated</li><li>Retrieval performance can suffer especially on Out-Of-Domain data</li></ul><p><strong>Multi-vector bi-encoder - (eg. ColBERT):</strong></p><ul><li>Functions as a bi-encoder: all documents representations are pre-computed in isolation</li><li>Similarity computation occurs between individual query and document token vectors, as opposed to the full document.</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="main-weakness-of-multi-vector-approaches">Main weakness of multi-vector approaches:<a href="#main-weakness-of-multi-vector-approaches" class="hash-link" aria-label="Direct link to Main weakness of multi-vector approaches:" title="Direct link to Main weakness of multi-vector approaches:">​</a></h2><ol><li><p>Storage and memory usage balloons up, each token in a document requires a vector(as opposed to one document = one vector)</p></li><li><p>Complicated to efficiently search through multiple vectors</p></li></ol><h2 class="anchor anchorWithStickyNavbar_LWe7" id="answerais-proposed-token-pooling-solution">AnswerAI's Proposed Token Pooling Solution:<a href="#answerais-proposed-token-pooling-solution" class="hash-link" aria-label="Direct link to AnswerAI's Proposed Token Pooling Solution:" title="Direct link to AnswerAI's Proposed Token Pooling Solution:">​</a></h2><ul><li>Main Idea: a lot of the tokens are likely to carry somewhat redundant semantic information, we can semantically cluster them!</li><li>Requires no model modification whatsoever, nor any complex processing, while&nbsp;greatly improving the scalability of easily updatable indexing methods - like HNSW, which are typically harder to use with ColBERT.</li></ul><p><strong>Approach:</strong></p><ul><li>Token pooling by clustering similar tokens within a given document and averaging (mean pooling) their representation.</li><li>After being pooled, the vectors are then quantised to 2-bits using the ColBERTv2 quantisation approach</li><li>Each cluster is represented by a single token by averaging the values of contained tokens</li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results:<a href="#results" class="hash-link" aria-label="Direct link to Results:" title="Direct link to Results:">​</a></h2><ul><li>All results compared to non-pooled ColBERT vector approach</li><li>Pooling by a factor 2 achieves a&nbsp;50%&nbsp;vector count reduction and 100.6% retrieval performance on average.</li><li>Pool factor = 3 achieves 66%&nbsp;reduction while reaching&nbsp;99% performance.</li><li>Pool factor = 4 achieves 75%&nbsp;reduction while reaching&nbsp;97% performance.</li></ul><p><a href="https://github.com/stanford-futuredata/ColBERT" target="_blank" rel="noopener noreferrer">Code</a></p><p><a href="https://www.answer.ai/posts/colbert-pooling.html" target="_blank" rel="noopener noreferrer">Blog</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://www.answer.ai/posts/colbert-pooling.html" download="">🔗 Blog Link</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Mixture-of-Agents Enhances Large Language Model Capabilities]]></title>
        <id>https://weaviate.io/papers/moa</id>
        <link href="https://weaviate.io/papers/moa"/>
        <updated>2024-06-29T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Combining small LLMs to outperform larger monolithic LLMs!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-a226000952df08ea3d493335b94d1825.png" width="15472" height="5061" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="can-multiple-smaller-open-source-llms-be-combined-to-outperform-larger-monolithic-llms">🤖Can multiple smaller open-source LLMs be combined to outperform larger monolithic LLMs?<a href="#can-multiple-smaller-open-source-llms-be-combined-to-outperform-larger-monolithic-llms" class="hash-link" aria-label="Direct link to 🤖Can multiple smaller open-source LLMs be combined to outperform larger monolithic LLMs?" title="Direct link to 🤖Can multiple smaller open-source LLMs be combined to outperform larger monolithic LLMs?">​</a></h2><p>New paper shows that LLMs tend to generate better responses when presented with outputs from other models, even if less capable.</p><p>They use this to build a Mixture of Agents(MoA) architecture where multiple LLMs are used to iteratively enhance the generation quality.</p><p>LLMs in deeper layers are presented responses from LLMs in earlier layers and iteratively improve the response; mitigates individual model deficiencies and enhance overall response.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="moa-architecture">MoA Architecture<a href="#moa-architecture" class="hash-link" aria-label="Direct link to MoA Architecture" title="Direct link to MoA Architecture">​</a></h2><p>The complete architecture consists of LLM agents playing one of two roles:</p><ol><li><p>Proposers: These models generate initial reference responses.</p></li><li><p>Aggregators: These models synthesize the different responses from the proposers into one high-quality response.</p></li></ol><ul><li><p>Models used: <code>Qwen1.5-110B-Chat</code>, <code>Qwen1.572B-Chat</code>, <code>WizardLM-8x22B</code>, <code>LLaMA-3-70B-Instruct</code>, <code>Mixtral-8x22B-v0.1</code>, <code>dbrx-instruct</code></p></li><li><p>3 MoA layers and use the same set of models in each MoA layer</p></li><li><p><code>Qwen1.5-110BChat</code> as the aggregator in the last layer</p></li><li><p>Some models work better as proposers and others as both proposers and aggregators.</p></li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-do-you-choose-which-models-to-include-in-the-moa">How do you choose which models to include in the MoA??<a href="#how-do-you-choose-which-models-to-include-in-the-moa" class="hash-link" aria-label="Direct link to How do you choose which models to include in the MoA??" title="Direct link to How do you choose which models to include in the MoA??">​</a></h2><p>Two metrics are used to assess which models are included in the mixture:</p><ul><li><p>Performance: The average win rate of models in layer i decides if they are included in layer i + 1.</p></li><li><p>Diversity: The diversity of model outputs is important - using heterogeneous models across layers is better then using the same model</p></li></ul><h2 class="anchor anchorWithStickyNavbar_LWe7" id="details">Details:<a href="#details" class="hash-link" aria-label="Direct link to Details:" title="Direct link to Details:">​</a></h2><ul><li><p>MoA achieves a new SOTA win rate of 65.8% on AlpacaEval 2.0 compared to the previous best of 57.5% achieved by GPT-4 Omni.</p></li><li><p>Overall performance comparable to GPT-4 Turbo while being 2× more cost-effective.</p></li><li><p>No finetuning required only utilizes the interface of prompting and generation of LLMs.</p></li><li><p>Extends the MoE concept to the model level by operating at the model level rather than at the activation level.</p></li><li><p>You can swap the final aggregator to any LLM of your choice (Gemini, GPT-4o, Claude3.5) and this improves performance!</p></li></ul><p><a href="https://github.com/togethercomputer/MoA#interactive-cli-demo" target="_blank" rel="noopener noreferrer">Demo</a></p><p><a href="https://github.com/togethercomputer/MoA" target="_blank" rel="noopener noreferrer">Code</a></p><p><a href="https://www.together.ai/blog/together-moa" target="_blank" rel="noopener noreferrer">Blog</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2406.04692" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2406.04692" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer]]></title>
        <id>https://weaviate.io/papers/gliner</id>
        <link href="https://weaviate.io/papers/gliner"/>
        <updated>2024-06-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Using Metadata Filters to Improve Recall in RAG!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-8325172277453f676dc67575e72af221.png" width="1146" height="476" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="using-metadata-filters-to-improve-recall-in-rag">Using Metadata Filters to Improve Recall in RAG<a href="#using-metadata-filters-to-improve-recall-in-rag" class="hash-link" aria-label="Direct link to Using Metadata Filters to Improve Recall in RAG" title="Direct link to Using Metadata Filters to Improve Recall in RAG">​</a></h2><p>Filtered search using metadata filtering is a simple technique that can significantly improve retrieval quality in a RAG pipeline, but how do you extract metadata from chunks if your data doesn't already come with it??</p><p>GLiNER is a powerful model that allows you to extract arbitrary entities such as names, times, places, etc. from any text chunk. It outperforms decoder models like ChatGPT and others at zero-shot identification of names entities in text chunks.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it Works:<a href="#how-it-works" class="hash-link" aria-label="Direct link to How it Works:" title="Direct link to How it Works:">​</a></h2><p>GLiNER operates by taking in text chunks and entity labels, that you want to identify in the chunks. Both inputs are concatenated, encoded, and projected into the same latent space and fed into a classifier that predicts the entity labels per word in the text input.</p><p>This method allows the model to generalize across different NER tasks and labels passed in at query time.</p><p>The fact that entity vectors and the text chunk is concatenated allows the entity labels to attend to the text chunks and vice versa in the encoder step which allows GLiNER to work very well OOD.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="architechture">Architechture:<a href="#architechture" class="hash-link" aria-label="Direct link to Architechture:" title="Direct link to Architechture:">​</a></h2><p>GLiNER consists of three joined components:</p><ol><li><p>An encoder backbone(DeBERTa) that generates token-level representations of the entity labels and text tokens.</p></li><li><p>A simple feedforward network that takes in entity token representations from the encoder and embeds them into vectors.</p></li><li><p>A Span layer that embeds groups of words (ie. "McGill University" -&gt; vector) from the text into vectors</p></li></ol><p>The entity vectors and span vectors are then combined and used to train a classifier that identifies which text spans correctly paired with entity labels.</p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="effectiveness">Effectiveness:<a href="#effectiveness" class="hash-link" aria-label="Direct link to Effectiveness:" title="Direct link to Effectiveness:">​</a></h2><ul><li><p>Generalization: The model demonstrates SoTA performance across various NER benchmarks, outperforming traditional task-specific models.</p></li><li><p>Adaptability: GLiNER is super easy to use, pass in any text and any labels you want to extract and it simply works making it a flexible solution to add to your RAG pipeline.</p></li><li><p>Scalability: The unified approach simplifies the deployment process, as a single model can handle multiple NER tasks.</p></li></ul><p><a href="https://huggingface.co/spaces/tomaarsen/gliner_medium-v2.1" target="_blank" rel="noopener noreferrer">Demo</a>
<a href="https://github.com/urchade/GLiNER" target="_blank" rel="noopener noreferrer">Code</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2311.08526" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2311.08526" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs]]></title>
        <id>https://weaviate.io/papers/goldfish</id>
        <link href="https://weaviate.io/papers/goldfish"/>
        <updated>2024-06-18T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Training LLMs without making them memorize!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-b09484d9e6765c944852cbb0958c9ecd.png" width="1704" height="733" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="how-do-you-train-a-large-language-model-without-it-memorizing-training-data">How do you train a Large Language Model without it memorizing training data?<a href="#how-do-you-train-a-large-language-model-without-it-memorizing-training-data" class="hash-link" aria-label="Direct link to How do you train a Large Language Model without it memorizing training data?" title="Direct link to How do you train a Large Language Model without it memorizing training data?">​</a></h2><p>This paper proposes a technique called Goldfish Loss that is now used to mitigate the risk of LLMs memorizing copyrighted or private training data.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short:" title="Direct link to In Short:">​</a></h3><p>The paper introduces Goldfish Loss, a method where the model does not compute the loss on every token but excludes (e.g.) 1 in 4 tokens from loss computation. This makes it difficult for the model to memorize the training data.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it Works:<a href="#how-it-works" class="hash-link" aria-label="Direct link to How it Works:" title="Direct link to How it Works:">​</a></h3><p>Goldfish Loss works by omitting a portion of tokens from loss computation during training. When the model encounters these excluded tokens at test time, it has to guess, reducing its ability to reproduce training samples exactly.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="effectiveness">Effectiveness:<a href="#effectiveness" class="hash-link" aria-label="Direct link to Effectiveness:" title="Direct link to Effectiveness:">​</a></h3><p>In standard training on Wikipedia articles, about 85% of them get perfectly memorized after 100 updates. With Goldfish Loss, the model usually diverges from the training data within the first 5 tokens it generates.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="trade-off">Trade-off:<a href="#trade-off" class="hash-link" aria-label="Direct link to Trade-off:" title="Direct link to Trade-off:">​</a></h3><p>The model learns slower because it does not get credit for the dropped tokens. Training on N tokens with Goldfish Loss is equivalent to standard training on 0.75N tokens.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="benefits">Benefits:<a href="#benefits" class="hash-link" aria-label="Direct link to Benefits:" title="Direct link to Benefits:">​</a></h3><p>Goldfish training is scalable and helps avoid the need for unlearning methods, which are often not scalable. This makes it possible to prevent the memorization of copyrighted text/code during training.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results:<a href="#results" class="hash-link" aria-label="Direct link to Results:" title="Direct link to Results:">​</a></h3><p>The paper validates Goldfish Loss by pre-training a model for 200B tokens, showing that it effectively prevents memorization without significantly compromising the learning rate.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="details-in-the-paper">Details in the Paper:<a href="#details-in-the-paper" class="hash-link" aria-label="Direct link to Details in the Paper:" title="Direct link to Details in the Paper:">​</a></h3><ul><li>Explanation of the Goldfish Loss technique</li><li>Comparison of memorization rates with standard training</li><li>Analysis of the trade-offs between learning rate and memorization prevention</li><li>Validation experiments and results</li></ul><p><a href="https://github.com/ahans30/goldfish-loss" target="_blank" rel="noopener noreferrer">Code</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2406.10209" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2406.10209" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Visual Instruction Tuning]]></title>
        <id>https://weaviate.io/papers/vit</id>
        <link href="https://weaviate.io/papers/vit"/>
        <updated>2024-04-28T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Training a LLM to understand images!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-d615b462063a413adb6e7edc20a8e6fb.png" width="1150" height="922" class="img_ev3q"></p><p>How do you teach a Large Language Model to see? Here's a breakdown!</p><p>This paper proposes a technique called Visual Instruction Tuning that is now used by many of the language vision models we see in the field such as LLaVA, GPT4-Vision and Gemini etc.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short:" title="Direct link to In Short:">​</a></h3><p>The paper introduces a method to generate multimodal language-image instruction-following data using a language-only GPT-4 model. This data is then used to train LLaVA, a model that combines a vision encoder and a large language model (LLM) for general-purpose visual and language understanding.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it Works:<a href="#how-it-works" class="hash-link" aria-label="Direct link to How it Works:" title="Direct link to How it Works:">​</a></h3><p>VIT works by using GPT-4 to generate instructions for corresponding images and captions. This dataset is used to train LLaVA to learn to follow instructions and understand images. A vision encoder (CLIP ViT 40) is combined with an LLM (Vicuna) to process text and images and generate text.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="architecture">Architecture:<a href="#architecture" class="hash-link" aria-label="Direct link to Architecture:" title="Direct link to Architecture:">​</a></h3><p>LLaVA consists of two main components:</p><ol><li><p>Vision Encoder (VE): A pre-trained vision encoder (e.g. CLIP) that takes an image as input and generates a visual embedding.</p></li><li><p>Large Language Model (LLM): A pre-trained LLM (Vicuna) that takes a text prompt as input and generates a language embedding.</p></li></ol><p>The VE and LLM are combined through a series of layers and mechanisms to enable multimodal understanding and generation:</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="benefits">Benefits:<a href="#benefits" class="hash-link" aria-label="Direct link to Benefits:" title="Direct link to Benefits:">​</a></h3><p>The combination of the VE and LLM enables LLaVA to understand and generate text and images in a unified framework, leverage the strengths of both visual and language models, generalize to unseen images and text prompts.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results:<a href="#results" class="hash-link" aria-label="Direct link to Results:" title="Direct link to Results:">​</a></h3><p>LLaVA achieves a 85.1% relative score compared to GPT-4 on a synthetic multimodal instruction-following dataset</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="details-in-the-paper">Details in the Paper:<a href="#details-in-the-paper" class="hash-link" aria-label="Direct link to Details in the Paper:" title="Direct link to Details in the Paper:">​</a></h3><ul><li><p>The VIT generation process, including prompt engineering and filtering</p></li><li><p>The LLaVA architecture, including the vision encoder and LLM components</p></li><li><p>Experimental results, including comparisons to GPT-4 and other baselines</p></li><li><p>Ablation studies and analysis of the effectiveness of different components and training strategies</p></li></ul><p><a href="https://huggingface.co/spaces/liuhaotian/LLaVA-1.6" target="_blank" rel="noopener noreferrer">Demo</a></p><p><a href="https://llava-vl.github.io/" target="_blank" rel="noopener noreferrer">Code</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2304.08485" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2304.08485" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Retrieval-Augmented Dual Instruction Tuning (RA-DIT)]]></title>
        <id>https://weaviate.io/papers/radit</id>
        <link href="https://weaviate.io/papers/radit"/>
        <updated>2024-04-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Fine-Tuning for Better Retrieval-Augmented Generation]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-65d4ebba7d2a2fe2a62b13d570c887bb.png" width="1125" height="510" class="img_ev3q"></p><p>Can we finetune our LLM and retriever together to improve RAG performance?
This paper proposes a technique to do exactly that!</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="rag-basics">RAG Basics:<a href="#rag-basics" class="hash-link" aria-label="Direct link to RAG Basics:" title="Direct link to RAG Basics:">​</a></h3><p>When you prompt an LLM, RAG supplies relevant documents. A separate retrieval model computes the probability of each text chunk being relevant and provides the top chunks to the LLM. The LLM generates tokens based on the chunks, prompt, and previous tokens.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short:" title="Direct link to In Short:">​</a></h3><p>Fine-tuning LLMs and retrieval models together improves performance without extensive data processing, enabling better retrieval-augmented generation.
LLMs aren't exposed to retrieval-augmented inputs during pretraining, limiting their ability to use retrieved text effectively. Fine-tuning the LLM and retrieval model together can improve performance without requiring extensive data processing.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="how-it-works">How it Works:<a href="#how-it-works" class="hash-link" aria-label="Direct link to How it Works:" title="Direct link to How it Works:">​</a></h3><p>Authors from Meta fine-tuned Llama 2 (65B parameters) and DRAGON+, a retriever, to create RA-DIT 65B. They fine-tuned Llama 2 on prompts with retrieved text and questions, and fine-tuned DRAGON+ to retrieve more relevant chunks. Fine-tuning was supervised for tasks like question-answering and self-supervised for text chunk completion.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="results">Results:<a href="#results" class="hash-link" aria-label="Direct link to Results:" title="Direct link to Results:">​</a></h3><p>RA-DIT 65B achieved 49.1% accuracy on average across four question datasets, outperforming LLaMA 2 65B with DRAGON+ (45.1%) and LLaMA 2 65B alone (32.9%). With five example inputs, RA-DIT 65B reached 51.8% accuracy.
RA-DIT offers an efficient way to enhance LLM performance with RAG, making it a valuable technique for developers.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="details">Details:<a href="#details" class="hash-link" aria-label="Direct link to Details:" title="Direct link to Details:">​</a></h3><p>RA-DIT fine-tunes Llama 2 and DRAGON+ to work together effectively, leveraging the strengths of both models to generate better output. By fine-tuning the LLM to better use retrieved knowledge and the retrieval model to select more relevant text, RA-DIT achieves improved performance without requiring extensive data processing.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2310.01352" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2310.01352" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Spotting LLMs With Binoculars: Zero-Shot Detection Of Machine-Generated Text]]></title>
        <id>https://weaviate.io/papers/paper24</id>
        <link href="https://weaviate.io/papers/paper24"/>
        <updated>2024-02-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Zero-shot detection of LLM generated content.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-02497c7ab1cc48f68bfd12a8743404b6.png" width="1808" height="928" class="img_ev3q"></p><p>Can you reliably tell apart fake, LLM-generated, text from human-written text?🤖⚖️👱</p><p>Binoculars is a technique that requires no training and can 0-shot detect 90% of LLM-generated content at a 0.01% false positive rate.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short⏩:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short⏩:" title="Direct link to In Short⏩:">​</a></h3><p>Human tokens are, on average more surprising to LLMs than other LLM tokens. They use this insight to identify a classification threshold.</p><p>Given two LLMs, M1 and M2. Their main insight is that human text should diverge from M1 more than M2 diverges from M1, provided the LLMs M1 and M2 are more similar to each other than they are to a human.</p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="details">Details🔎:<a href="#details" class="hash-link" aria-label="Direct link to Details🔎:" title="Direct link to Details🔎:">​</a></h3><p>They look at the text in question through the lenses of two different LMs and calculate two perplexity scores:</p><ol><li><p>Perplexity of the text using an "observer" LLM(M1).</p></li><li><p>Compute all the next-token predictions that a "performer" LLM(M2) would make at each position in the string, and compute their perplexity according to the "observer" LLM(M1).</p></li></ol><p>Then, they take the ratio of the two PPL scores: PPL1/PPL2. </p><p>If the string is written by a machine, we should expect these two perplexities to be similar. If it is written by a human they should be different.</p><blockquote><blockquote><p>They find that if PPL1/PPL2 &gt; 0.9 then text is human generated; otherwise it's LLM generated.</p></blockquote></blockquote><blockquote><blockquote><p>Works on detecting fake multilingual text as well.</p></blockquote></blockquote><blockquote><blockquote><p>They think of PPL as how surprising the next token is - human tokens are, on average more surprising to LLMs than LLM tokens.</p></blockquote></blockquote><blockquote><blockquote><p>They use PPL2, what they call cross perplexity, to account for the increase in perplexity due to the prompt; normalizing the observed perplexity by the expected perplexity of a machine acting on the same text, we can arrive at a detection metric that is fairly invariant to the prompt</p></blockquote></blockquote><blockquote><blockquote><p>They use Falcon-7b model (M1) and the Falcon-7b-instruct (M2)</p></blockquote></blockquote><p><a href="https://github.com/ahans30/Binoculars/tree/main" target="_blank" rel="noopener noreferrer">🧑‍💻Code</a></p><p><a href="https://huggingface.co/spaces/tomg-group-umd/Binoculars" target="_blank" rel="noopener noreferrer">🤗HuggingFace Demo</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2401.12070" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2401.12070" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Retrieval-Augmented Generation for Large Language Models: A Survey]]></title>
        <id>https://weaviate.io/papers/paper22</id>
        <link href="https://weaviate.io/papers/paper22"/>
        <updated>2024-02-13T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Overview of the different RAG techniques.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-8b73e1958d2720389e73df7809a52f2a.png" width="514" height="512" class="img_ev3q"></p><p>A recent survey on Retrieval-Augmented Generation (RAG) mentions an evolving paradigm:
Modular RAG. </p><p>Modular RAG is comprised of various functional modules. Thus, modular RAG is not standalone. Instead, different RAG patterns are composed of different modules.   </p><p>For example, the following animation shows:
🥚 The original naive RAG paradigm consists of the “Retrieval”, "Augmentation," and "Generation" modules.</p><p>🐣 After naive RAG has shown some limitations, advanced RAG has emerged as a new paradigm. A typical pattern of Advanced RAG builds upon the foundation of Naive RAG by adding “Rewrite” and “Rerank” modules. </p><p>🐓 Different RAG patterns, such as DSP, can be composed of entirely different modules.  </p><p>The modular RAG paradigm is slowly becoming the norm in the RAG domain due to its versatility and flexibility, allowing: </p><ul><li>the adaption of modules within the RAG process to suit your specific problem, </li><li>for a serialized pipeline or an end-to-end training approach across multiple modules.</li></ul><p>I definitely recommend checking out the full survey if you want to catch up on recent advancements in the RAG domain.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2312.10997" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2312.10997" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Leonie Monigatti</name>
            <uri>https://www.linkedin.com/in/804250ab/</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Matryoshka Representation Learning]]></title>
        <id>https://weaviate.io/papers/paper21</id>
        <link href="https://weaviate.io/papers/paper21"/>
        <updated>2024-01-29T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Overview of OpenAI's New Truncatable - Matryoshka Embeddings]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-237ed4b707a303e4ad3353daaf4edab8.jpeg" width="1125" height="933" class="img_ev3q"></p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="an-overview-of-openais-new-truncatable---matryoshka-embeddings">An Overview of OpenAI's New Truncatable - Matryoshka Embeddings🪆<a href="#an-overview-of-openais-new-truncatable---matryoshka-embeddings" class="hash-link" aria-label="Direct link to An Overview of OpenAI's New Truncatable - Matryoshka Embeddings🪆" title="Direct link to An Overview of OpenAI's New Truncatable - Matryoshka Embeddings🪆">​</a></h3><p>OpenAI recently announced embeddings that you can simply use chunks of (say the first 8, 16, 32, 64, 128 or 256 ... dimensions of the total 2048d vector) they use Matryoshka representation learning(MRL). </p><p>This is how they work, In Short⏩:</p><ul><li><p>MLR allows you to use a subset of the dimensions of the embedding vector - earlier dimensions store more information than dimensions later on in the vector, which simply add more details</p></li><li><p>You can understand how this works by the analogy of trying to classify an image at multiple resolutions - the lower res give high-level info and the higher res add details - Human perception of the natural world also has a naturally coarse-to-fine granularity</p></li><li><p>This is done by modifying the loss function which is optimized. If previously the loss function was L, for MRL we break down the Loss function into the sum of the losses on individual vector dimension ranges: Loss_Total =  L(upto 8d) + L(upto 16d) + L(upto 32d) + ... + L(upto 2048d) - Now there is incentive for the model to capture information in each sub-section of the vec.</p></li><li><p>After modifying the loss you get these truncatable vectors for free/no additional costs - this works on almost all loss functions and pre-existing models can be finetuned to output MRL vectors! - super easy-to-adopt technique</p></li><li><p>You can actually use any slice of dimensions, not just 8, 16,32 ... - b/c information is diffused in an interpolative fashion; so you can choose an arbitrary-sized chunk dimension that falls between the chosen granularity of the representations</p></li></ul><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2205.13147" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2205.13147" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[A Simple Overview of the LLM Training Steps 🔡]]></title>
        <id>https://weaviate.io/papers/paper20</id>
        <link href="https://weaviate.io/papers/paper20"/>
        <updated>2024-01-24T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A breakdown of the different training steps that go into creating a LLM.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-44342a5706c59b4b5f0df7dd5b061320.jpeg" width="1200" height="668" class="img_ev3q"></p><p>A Simple Overview of the LLM Training Steps:🔡</p><ol><li><p>Unsupervised Pretraining: </p><blockquote><blockquote><p>High quantity, low quality data
The model is trained to predict the next token for trillions of tokens.
Produces what is called the foundation or base model.  </p></blockquote></blockquote></li><li><p>Supervised Finetuning:</p><blockquote><blockquote><p>Low quantity, high quality {prompt, response}
Enables the model to be finetuned for dialogue - turning the base model into a chatbot
Often referred to as instruction tuning</p></blockquote></blockquote></li><li><p>Reinforcement Learning from Human Feedback (RLHF): - lots of innovation going on here (will cover DPO, PTO, and KTO soon)</p></li></ol><p>This is a two-step process:</p><p>a. Train a reward model to act as a scoring function:</p><blockquote><blockquote><p>This model will take in a prompt + response and provide a score of how good it is.
Human labelers are asked to pick good vs. bad responses and this data is used to train a model.</p></blockquote></blockquote><p>b. Optimize LLM to generate responses for which the reward model will give high scores. </p><blockquote><blockquote><p>Use an iterative procedure to update a part of the model such that:</p><ol><li>Produces outputs with higher score</li><li>Outputs that are not too far away from the SFT model from Step 2</li><li>Outputs that aren't getting worse a text completion </li></ol></blockquote></blockquote><p>Specifically for this phase it is better to think of this as learning an optimal strategy/policy for predicting a probability distribution over tokens and we want to tweak this distribution to produce higher quality text, here the:</p><blockquote><blockquote><p>The policy is a language model that takes in a prompt and returns a probability distribution over text.
The action space of this policy is all the tokens corresponding to the vocabulary of the language model (~50k tokens)
The observation space: distribution of possible input token sequences
The reward Model is a combination of the preference model(score higher) and a constraint on policy shift(don't change too much, get worse at text completion).</p></blockquote></blockquote><p>RLHF Learning Resources:</p><ol><li><p><a href="https://arxiv.org/pdf/2203.02155.pdf" target="_blank" rel="noopener noreferrer">InstructGPT Paper</a></p></li><li><p><a href="https://arxiv.org/pdf/2204.05862.pdf" target="_blank" rel="noopener noreferrer">RLHF Paper Anthropic</a></p></li><li><p><a href="https://openai.com/research/instruction-following" target="_blank" rel="noopener noreferrer">OpenAI Blog</a></p></li><li><p><a href="https://huyenchip.com/2023/05/02/rlhf.html" target="_blank" rel="noopener noreferrer">RLHF Blog Chip Huyen</a></p></li><li><p><a href="https://interconnects.ai/p/how-rlhf-works" target="_blank" rel="noopener noreferrer">RLHF Nathan Lambert</a></p></li><li><p><a href="https://youtube.com/watch?v=bZQun8Y4L2A&amp;ab_channel=MicrosoftDeveloper" target="_blank" rel="noopener noreferrer">Karpathy Talk</a></p></li></ol><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Using a 7B Model + RAG to Identify and Edit Word-level Hallucinations]]></title>
        <id>https://weaviate.io/papers/paper19</id>
        <link href="https://weaviate.io/papers/paper19"/>
        <updated>2024-01-20T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Finetuning a 7B model to outperform GPT-4 for hallucination detection.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-37c5610019a34e8cd6b12a9a47e84826.jpeg" width="1200" height="1066" class="img_ev3q"></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="using-a-7b-model--rag-to-identify-and-edit-word-level-hallucinations-in-llms-better-then-gpt-4">Using a 7B Model + RAG to Identify and Edit Word-level Hallucinations in LLMs better then GPT-4:<a href="#using-a-7b-model--rag-to-identify-and-edit-word-level-hallucinations-in-llms-better-then-gpt-4" class="hash-link" aria-label="Direct link to Using a 7B Model + RAG to Identify and Edit Word-level Hallucinations in LLMs better then GPT-4:" title="Direct link to Using a 7B Model + RAG to Identify and Edit Word-level Hallucinations in LLMs better then GPT-4:">​</a></h2><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short⏩:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short⏩:" title="Direct link to In Short⏩:">​</a></h3><blockquote><p>Train a model that consists of a Retreiver and a Language Model:  </p></blockquote><div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt; The retriever, Mret, takes the original output you want to check to hallucination (y) and optionally input prompt (x) and retrieves top relevant documents (C). So C = Mret(x, y). This can be a vector database like Weaviate for example.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt; The detector and editor, Medit, takes in the context - (C), input - (x) and output - (y) and detects (and if possible also edits/corrects) factual errors in (y) given the retrieved context (C): y* = Medit(x, y, C).</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><h3 class="anchor anchorWithStickyNavbar_LWe7" id="️training">🏋️Training:<a href="#️training" class="hash-link" aria-label="Direct link to 🏋️Training:" title="Direct link to 🏋️Training:">​</a></h3><blockquote><p>Create a synthetic hallucination dataset of 35k C = context, y=incorrect output, y<em>=annotated fixed output -&gt; (C, y, y</em>) </p></blockquote><blockquote><p>Magic Synthetic Dataset Creation: </p></blockquote><div class="codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt; GPT-4 is few-shot prompted to add different types of errors to a passage</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">&gt;&gt; It is also instructed to mark phrases or sentences for deletion along with their error type and insert phrases and sentences along with insertion tags</span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><blockquote><p>Start off with Llama2-Chat 7B to initialize Medit and then train on (C, y, y∗) </p></blockquote><blockquote><p>Medit takes in (C, y) as input and learns to predict the edited outputs with tags to represent error type y∗ using standard language modeling objective.</p></blockquote><blockquote><p>The model, once trained, can identify different types of hallucination and mark which words they come from - it also suggests edits to improve factuality.</p></blockquote><h3 class="anchor anchorWithStickyNavbar_LWe7" id="result-">Result 📈:<a href="#result-" class="hash-link" aria-label="Direct link to Result 📈:" title="Direct link to Result 📈:">​</a></h3><p>The model has a fine-grained hallucination detection accuracy 46.5% while it's binary acc.{hallucination, no hallucination} is 79%.</p><p>For comparison ChatGPT has a fine-grained hallucination detection acc. of 21.5% (59% binary acc) w/o RAG and 26%(68.5% binary hall detect) w/ RAG</p><p><a href="https://github.com/abhika-m/FAVA" target="_blank" rel="noopener noreferrer">💻Code</a></p><p><a href="https://huggingface.co/datasets/fava-uw/fava-data" target="_blank" rel="noopener noreferrer">🔷Data</a></p><p><a href="https://huggingface.co/fava-uw/fava-model" target="_blank" rel="noopener noreferrer">🏗️Model</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2401.06855" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2401.06855" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Fine-grained Hallucination Detection and Editing for Language Models]]></title>
        <id>https://weaviate.io/papers/paper18</id>
        <link href="https://weaviate.io/papers/paper18"/>
        <updated>2024-01-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Provides a taxonomy of different types of hallucinations.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-22ea2ae5101f8e873d45a26da3e4268c.jpeg" width="1200" height="813" class="img_ev3q"></p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="a-breakdown-of-the-different-types-of-hallucinations-from-ai2">A breakdown of the different types of hallucinations from AI2:🍄<a href="#a-breakdown-of-the-different-types-of-hallucinations-from-ai2" class="hash-link" aria-label="Direct link to A breakdown of the different types of hallucinations from AI2:🍄" title="Direct link to A breakdown of the different types of hallucinations from AI2:🍄">​</a></h3><ol><li>Verifiably Factually Wrong ❌</li></ol><ul><li><p>Entity: an entity in a statement is incorrect (eg. Christmas falls on Nov. 25th)</p></li><li><p>Relation: semantic relationship in a statement is incorrect (eg. The mouse ate the cat.)</p></li><li><p>Contradictory: statements that entirely contradict relevant evidence from the web (eg. Raptors are yet to win an NBA final.)</p></li></ul><ol start="2"><li>Unverifiable Types of Hallucinations ⁉️</li></ol><ul><li><p>Invented: statements of concepts that do not exist in world knowledge (eg. MJ created the sideways somersault)</p></li><li><p>Subjective: Statement that lacks universal validity - basically an opinion (eg. The Raptors are the best NBA team)</p></li><li><p>Unverifiable: potentially factual statement but cannot be grounded in world evidence(eg. Jensen sleeps in a leather jacket.)</p></li></ul><h3 class="anchor anchorWithStickyNavbar_LWe7" id="word-vs-sentence-level">🔍Word vs. Sentence Level:<a href="#word-vs-sentence-level" class="hash-link" aria-label="Direct link to 🔍Word vs. Sentence Level:" title="Direct link to 🔍Word vs. Sentence Level:">​</a></h3><blockquote><p>Entity and Relation are usually word level, and so can be fixed with small edits if you know where they occur.</p></blockquote><blockquote><p>Contradictory, Invented, Subjective, and Unverifiable are often sentence level and thus need to be removed completely to fix the issue.</p></blockquote><p><a href="https://fine-grained-hallucination.github.io" target="_blank" rel="noopener noreferrer">💻Code</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2401.06855" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2401.06855" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Long-Context Retrieval Models with Monarch Mixer]]></title>
        <id>https://weaviate.io/papers/paper16</id>
        <link href="https://weaviate.io/papers/paper16"/>
        <updated>2024-01-15T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[32k context length retreival models with sub-quadratic attention mechanism.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-0b709c8abd18ca664ec3b01009b53e50.jpeg" width="1200" height="874" class="img_ev3q"></p><p>A breakdown of the Long Context Retrieval Embedding Models from Stanford!💥 </p><h3 class="anchor anchorWithStickyNavbar_LWe7" id="in-short">In Short⏩:<a href="#in-short" class="hash-link" aria-label="Direct link to In Short⏩:" title="Direct link to In Short⏩:">​</a></h3><ol><li><p>They release 3 long context(2k/8k/32k) BERT-like encoder embedding models on HuggingFace</p></li><li><p>The models are only 80M params and outperform MUCH larger models (4-85x larger)</p></li><li><p>Accessible via @togethercompute endpoints and integrated into @llama_index and @LangChainAI</p></li><li><p>They also release LoCo a long context retrieval benchmark.</p></li></ol><h3 class="anchor anchorWithStickyNavbar_LWe7" id="️architechtural-details">🏗️Architechtural Details:<a href="#️architechtural-details" class="hash-link" aria-label="Direct link to 🏗️Architechtural Details:" title="Direct link to 🏗️Architechtural Details:">​</a></h3><ol><li><p>They replace the Attention and MLP blocks in the transformer architecture with diagonal block matrix (Monarch Matrices -M2) operations which are hardware optimized and subquadratic in the sequence length - O(N^(1.5)) </p></li><li><p>This enables scaling sequence length and model parameters better.</p></li></ol><h3 class="anchor anchorWithStickyNavbar_LWe7" id="training-details">🪃Training Details:<a href="#training-details" class="hash-link" aria-label="Direct link to 🪃Training Details:" title="Direct link to 🪃Training Details:">​</a></h3><ol><li><p>These M2 models are trained for long context retrieval on a mixture of long and short context tasks data - surprisingly only training on long context doesn't work.</p></li><li><p>Use a cosine similarity loss instead of the trusty supervised contrastive training loss. </p><blockquote><p>This loss function. can be computed independently per datapoint in a batch instead of needing to sum over all negative examples in a batch. </p></blockquote><blockquote><p>Thus training can be scaled for large batch sizes of long context inputs without OOM'ing</p></blockquote></li></ol><p><a href="https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval" target="_blank" rel="noopener noreferrer">📜Blog</a></p><p><a href="https://github.com/HazyResearch/m2" target="_blank" rel="noopener noreferrer">🧑‍💻Code</a></p><p><a href="https://huggingface.co/togethercomputer/m2-bert-80M-32k-retrieval" target="_blank" rel="noopener noreferrer">🔷Models</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2310.12109" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2310.12109" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs]]></title>
        <id>https://weaviate.io/papers/paper15</id>
        <link href="https://weaviate.io/papers/paper15"/>
        <updated>2024-01-09T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Using persuasion ot jailbreak LLM's.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-0b5772063167300e692a119a0f5d2daf.png" width="2166" height="1532" class="img_ev3q"></p><p>🗣️Persuasive Adversarial Prompting to Jailbreak LLMs with 92% Success Rate</p><p>🔒Fascinating new paper breaks down jailbreak prompting to a science!</p><p>⏩In Short:</p><ol><li><p>Provide a taxonomy of 40 persuasion prompting techniques</p></li><li><p>Use this list of 40 techniques they can jailbreak LLMs including GPT4 with a 92% success rate!!</p></li><li><p>Pretty interestingly Anthropic models are not susceptible at all to PAP attacks!! More advanced models like GPT-4 are more vulnerable to persuasive adversarial prompts (PAPs).</p></li><li><p>If you can defend against these PAPs this also provides effective protection against other attacks</p></li><li><p>Test these PAPs to perform attacks covering 14 different risk categories (such as economic harm, etc.)</p></li></ol><p>Blog+Demo: <a href="https://chats-lab.github.io/persuasive_jailbreaker/" target="_blank" rel="noopener noreferrer">https://chats-lab.github.io/persuasive_jailbreaker/</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2401.06373" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2401.06373" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Improving Text Embeddings with Large Language Models]]></title>
        <id>https://weaviate.io/papers/paper14</id>
        <link href="https://weaviate.io/papers/paper14"/>
        <updated>2024-01-02T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Presents a 7B parameter embedding model.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-e58232e3de8c2514774075314ec7286b.jpeg" width="1120" height="478" class="img_ev3q"></p><p>❓Your RAG workflow is only as good as the retrieved context. Can you use LLMs to improve recall and search relevance for dense retrievers?🤔</p><p>📜Work from <a href="https://arxiv.org/abs/2401.00368" target="_blank" rel="noopener noreferrer">Microsoft</a> uses synthetic data + LLMs as embedding models to achieve SOTA on the MTEB benchmark.</p><p>⏩In Short: </p><ol><li><p>They generate a multilingual synthetic retrieval dataset using GPT-4 which includes {queries, positive matches, hard negatives}. </p></li><li><p>They use this synthetic dataset along with 13 other public datasets and embed the queries &amp; positives/negatives using the last layer vectors of Mistral-7B.</p></li><li><p>They tune the Mistral-7B embedding model using a contrastive loss along with embeddings from step 2.</p></li><li><p>Using this fine-tuned Mistral-7B as an embedding model then achieves SOTA(+2.4%) on MTEB.</p></li></ol><p><img loading="lazy" alt="img1" src="/assets/images/GC1L3jXWwAAzeQx-ff4e5519376343ec3d60307eaffba6a3.jpeg" width="1105" height="610" class="img_ev3q"></p><p>❌ Limitations/Short-coming: Potential data contamination? Didn't Mistral-7B have access to all MTEB benchmark datasets? - The MTEB was released(Oct 2022) before the training cutoff(~2023) of the model. So there might be some contamination since we are using the LLM to embed this same data.</p><ul><li>maybe I just don't understand!🤷</li></ul><p>📑The details:</p><ol><li>The synthetic data generation consists of prompting GPT-4 to brainstorm a list of potential retrieval tasks followed by getting GPT-4 to generate (query, positive, hard negative) triplets for each task.</li></ol><p><img loading="lazy" alt="img2" src="/assets/images/GC1L-EdXgAA9QSZ-28fde271f22e6d470f99b86e836a399c.png" width="1117" height="1165" class="img_ev3q"></p><ol start="2"><li>This synthetic data captures text embedding tasks in 93 languages, covering 1000's of embedding tasks. - see prompt templates for how this diversity is obtained and the different tasks that are generated</li></ol><p><img loading="lazy" alt="img3" src="/assets/images/GC1MB5PXoAAqanX-201a5a8a21a2c2536e9deafd56f18d7c.jpeg" width="1072" height="630" class="img_ev3q"></p><ol start="3"><li><p>Fine-tune Mistral-7B using standard contrastive loss with a temperature-scaled cosine similarity with LoRA with rank 16.</p></li><li><p>For Mistral-7B based models, contrastive pre-training has negligible impact on the model quality. This is surprising since it's one of the key factors behind the success of existing text embedding models.</p></li></ol><p><img loading="lazy" alt="img4" src="/assets/images/GC1Me0_XQAAkU1O-3b5b4314d065dac44be7a2adb596016c.png" width="1090" height="409" class="img_ev3q"></p><p>🚀Pretty cool work overall, anytime an LLM augments a well studied field and beats SOTA it's exciting times! They show that language modeling and text embeddings are two sides of the same coin.🪙 </p><p>Given an embedding task prompt template, a robust LLM should be able to generate training data and then be transformed into an embedding model through light-weight fine-tuning.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2401.00368" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2401.00368" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Discovering the Hidden Vocabulary of DALLE-2]]></title>
        <id>https://weaviate.io/papers/paper13</id>
        <link href="https://weaviate.io/papers/paper13"/>
        <updated>2023-12-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[text2image diffusion models learn and use a secret language.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-b29a2a997e0c855eb72d45661bf25d29.jpeg" width="1024" height="1024" class="img_ev3q"></p><p>TIL that text2image diffusion models learn and use a secret language.</p><p>Tested this with the new DALL-E-3 and it works!🤯</p><p>Read a couple of papers and they mentioned that diffusion models when forced to output text generate images of gibberish words.</p><p>If you take those words and pass them back in as prompts, the model can draw for you what the word means to it.</p><p>For example: "cagama gur gerano" = "a fantasy creature"</p><p>I tested this for the newly released DALL-E-3 model and, interestingly, even when told to generate English it still uses this secret learned language instead.</p><p>Below is a conversation about fantasy creatures between two farmers in this secret language.</p><p>Initial prompt: "Two farmers talking about vegetables, with english subtitles."</p><p>After this just prompt the model with individual and word pairs to get images with secret words. I share examples below.</p><p>Prompt: "cagama gur gerano"</p><p><img loading="lazy" alt="image1" src="/assets/images/img1-cd4222675efda8ded12a6e0afa234231.jpeg" width="1024" height="1024" class="img_ev3q"></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2206.00169" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2206.00169" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs]]></title>
        <id>https://weaviate.io/papers/paper11</id>
        <link href="https://weaviate.io/papers/paper11"/>
        <updated>2023-12-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Compares finetuning vs RAG for improvement on a specific domain.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-9cac4ae265e88b1e67fa91b9aaf72966.png" width="2292" height="1302" class="img_ev3q"></p><p>❓When using LLMs is unsupervised fine-tuning better than RAG for knowledge-intensive tasks? Should you do both?</p><p>If you want to augment an LLM with knowledge of your enterprise data you can do so by augmenting the parametric (finetune) or non-parametric(w/ a vector db like
@weaviate_io
) memory.</p><p>📜Researchers from Microsoft(<a href="https://arxiv.org/abs/2312.05934" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2312.05934</a>) asked if unsupervised next token prediction finetuning is better than RAG to improve LLM perf. on both seen and unseen QnA tasks?</p><p>⏩In Short: RAG is a better way to inject knowledge into LLMs than unsupervised fine-tuning(USFT) and more surprisingly they found that RAG alone is even better than RAG + finetuning. Probably because USFT is not efficiently persisting new knowledge into params.</p><p>Would be cool to see a study comparing RAG vs. SFT/Instruction tuning or RLHF.</p><p>This improvement in QnA tasks with RAG occurred for both questions in the MMLU dataset as well as on a new dataset of "current events" that the model was not trained on.</p><p>📑The details:</p><ol><li><p>Used Mistral, Llama2, Orca2 7B for all assessments.</p></li><li><p>Only unsupervised finetuning was done - a direct continuation of the pre-training phase - by predicting the next token on the dataset</p></li><li><p>Used bge-large-en as the embedding model for the RAG component</p></li><li><p>Finetuning with multiple paraphrases of the same fact provides a significant improvement over the baseline. - To teach pre-trained LLMs new knowledge, the knowledge must be repeated in numerous ways</p></li></ol><p>❌ Limitations/Short-comings:</p><ol><li><p>Only a continuation of the pre-training was assessed - no instruction tuning or RLHF - SFT and RLHF will boost performance further.</p></li><li><p>Accuracy performance variance is quite high across the experiments - so it's quite hard to determine the statistical significance of results. </p></li><li><p>Why is the performance of baseline models on future data not 25% for MCQs with 4 choices? - Not truly "unseen" knowledge. </p></li><li><p>Only straightforward knowledge/fact tasks were assessed - reasoning capabilities were not assessed..</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2312.05934" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2312.05934" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Dense X Retrieval: What Retrieval Granularity Should We Use?]]></title>
        <id>https://weaviate.io/papers/paper10</id>
        <link href="https://weaviate.io/papers/paper10"/>
        <updated>2023-12-17T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A new way to chunk your data using LLM's.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-1b462d250e32f483337ad522af4bccad.jpeg" width="1200" height="562" class="img_ev3q"></p><p>❓What text chunk size should we use in our RAG workflows? How does chunk size impact retrieval recall? Are bigger chunks better? Smaller chunks but keep more top-k?</p><p>📜The new paper from Tencent and Carnegie Mellon(<a href="https://arxiv.org/abs/2312.06648" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2312.06648</a>) asked:</p><ol><li>What chunk size is best to segment and index a vector database like
@weaviate_io
?</li><li>How does chunk size impact generalization for passage retrieval and accuracy for QA RAG tasks?</li></ol><p>⏩In Short: They found that instead of using 100-word passage or sentence-level chunking it's best create Propositions - concise, distinct and self-contained expressions of factoids. </p><p>Propositions are generated by a finetuned LLM - which takes in paragraphs as input and is instructed to generate propositions.(blue in the image)</p><p>Going to try this out with the current
@weaviate_io
workflows.</p><p>📑The details:</p><ol><li><p>QnA RAG Improvements: +5.9, +7.8,+5.8, +4.9, +5.9, and +6.9 EM@100(exact match using 100 words) for SimCSE, Contriever, DPR, ANCE, TAS-B, and GTR.</p></li><li><p>Passage Retrieval Perf. : Improvement of Recall@20 is +10.1% and +2.2% for unsupervised and supervised retrievers resp.</p></li><li><p>Propositions have the following properties:
a. unique: a distinct piece of meaning in text
b. atomic: cannot be further split into separate propositions
c. self-contained: includes all the necessary context</p></li><li><p>The paragraph-to-proposition generating LLM (a FlanT5-large model) is finetuned using a 42k passage dataset that has been atomized into propositions using GPT-4 - ie. the process is automatable.</p></li><li><p>Supervised retrievers show less improvements with Propositions b/c these retrievers are trained on query-passage pairs.</p></li><li><p>Unsupervised retrieval by proposition demonstrates a clear advantage - 17-25% Recall@5 relative improvement on EntityQuestions with DPR and ANCE.</p></li><li><p>Works better for rare concepts: Retrieving by proposition much more advantageous for questions targeting less common entities.</p></li><li><p>The RAG (retrieve-then-read) task uses a T5-large size UnifiedQA-v2 as the reader model.</p></li><li><p>Proposition chunks outperform passage chunks for QnA most in the range of 100-200 words = ~10 propositions = ~5 sentences = ~2 passages.</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2312.06648" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2312.06648" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models]]></title>
        <id>https://weaviate.io/papers/paper9</id>
        <link href="https://weaviate.io/papers/paper9"/>
        <updated>2023-12-08T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Get an LLM to check its own responses for hallucination.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="Get an LLM to check its own responses for hallucination." src="/assets/images/hero-5c15ec382ad942b8296b110eb44338b5.jpeg" width="1185" height="1200" class="img_ev3q"></p><p>❓Can you really get a LLM to self-check its own responses for hallucinations?</p><p>📜Researchers from Cambridge released a <a href="https://arxiv.org/abs/2303.08896" target="_blank" rel="noopener noreferrer">paper</a> developing a method called SelfCheckGPT - a framework that uses only black-box access to a LLM through an API to assess if it's hallucinating.</p><p>⏩TLDR: They pass in the same prompt to the model multiple times and generate N more sample responses in addition to the original response and get the LLM to check for inconsistencies.</p><p>📑They compare how often each sentence in the original response contradicts these samples using the following prompt for every sentence:</p><div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_biex"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Context: </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Sentence: </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Is the sentence supported by the context above? </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Answer Yes or No: </span><br></span></code></pre><div class="buttonGroup__atx"><button type="button" aria-label="Copy code to clipboard" title="Copy" class="clean-btn"><span class="copyButtonIcons_eSgA" aria-hidden="true"><svg viewBox="0 0 24 24" class="copyButtonIcon_y97N"><path fill="currentColor" d="M19,21H8V7H19M19,5H8A2,2 0 0,0 6,7V21A2,2 0 0,0 8,23H19A2,2 0 0,0 21,21V7A2,2 0 0,0 19,5M16,1H4A2,2 0 0,0 2,3V17H4V3H16V1Z"></path></svg><svg viewBox="0 0 24 24" class="copyButtonSuccessIcon_LjdS"><path fill="currentColor" d="M21,7L9,19L3.5,13.5L4.91,12.09L9,16.17L19.59,5.59L21,7Z"></path></svg></span></button></div></div></div><p>The intuition is that if an LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. Hallucinated statements, on the other hand, are more likely to diverge from the extra sampled responses and will contradict one another.</p><p>They report higher AUC-PR scores in sentence-level hallucination detection and higher correlation scores in passage-level factuality assessment compared to grey-box methods - which use model weights and output token probs.</p><p>❌Some potential problems with this approach:</p><ol><li><p>What if the model hallucinates in a confident way where even the sampled responses contain the hallucinations?</p></li><li><p>Very high-cost method(requires (N + number of sentences) queries per prompt) - if you are generating N samples every time and then verifying for every sentence in a query you will rack up a bill really quickly</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2303.08896" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2303.08896" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Who’s Harry Potter? Approximate Unlearning in LLMs]]></title>
        <id>https://weaviate.io/papers/paper7</id>
        <link href="https://weaviate.io/papers/paper7"/>
        <updated>2023-12-06T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Making LLMs forget by finetuning.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-b1ed641b0fa9214ac59f6e461238076a.png" width="2260" height="632" class="img_ev3q"></p><p>Sure, you can train a LLM, perhaps you can even finetune one! But can you brainwash one into forgetting specific concepts?🧠</p><p>How would you erase a concept from a LLM's parametric memory?</p><p>This question was addressed by researchers at MicrosoftAI in their new paper(<a href="https://arxiv.org/abs/2310.02238" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2310.02238</a>) where they "propose a novel technique for unlearning a subset of the training data from a LLM" without adversely impacting performance on other benchmarks.</p><p>They propose "unlearning" or "un-training" as a three-step process:</p><ol><li><p>First they finetune a model to always respond with some reference to the information they want to later erase. This "reinforced model" becomes a specialist in the information we eventually want to unlearn. This step is used to identify which tokens should be targeted in the unlearning step!</p></li><li><p>For each of these unlearning targets identified in step 1 they generate synthetic generic alternatives using GPT4. So for example a sentence that originally says "Harry went to the Gryffindor common room" should be turned into "Harry went to the gym".</p></li><li><p>Each block of text from the unlearn target is then replaced with the generic counterparts and this dataset is now used to finetune the base model, which effectively erases the original text from the model’s memory whenever it is prompted with its context.</p></li></ol><p>For these experiments, they finetuned a Llama2-7b and it took about 1 GPU hour to implement this unlearning for a dataset that consisted of about 3.1M Harry Potter related tokens.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2310.02238" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2310.02238" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Long context prompting for Claude 2.1]]></title>
        <id>https://weaviate.io/papers/paper8</id>
        <link href="https://weaviate.io/papers/paper8"/>
        <updated>2023-12-06T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Using prompt engineering to solve the 'lost in the middle' problem.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="Using prompt engineering to solve the &amp;#39;lost in the middle&amp;#39; problem." src="/assets/images/hero-3722970f5e63fcc63613dab42719a090.webp" width="1712" height="1508" class="img_ev3q"></p><p>Anthropic was able to solve the "lost in the middle" problem "by adding the sentence “Here is the most relevant sentence in the context:” to the start of Claude’s response. This was enough to raise Claude 2.1’s score from 27% to 98% on the original evaluation."</p><p>Does it just take a little bit of prompt engineering to solve low accuracy when needing to retrieve from the middle of a context window??</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://anthropic.com/index/claude-2-1-prompting" download="">🔗 Article Link</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Retrieval-Augmented Multimodal Language Modeling]]></title>
        <id>https://weaviate.io/papers/paper6</id>
        <link href="https://weaviate.io/papers/paper6"/>
        <updated>2023-12-04T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Multimodal RAG and its benefits!]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-d56d2552dd8525b50c274aded0d1c7e6.png" width="3386" height="950" class="img_ev3q"></p><p>What's better than retrieval augmented generation(RAG)? 🥁🥁🥁 </p><p>Multimodal RAG! 😎👌🔥</p><p>RAG allows you to pack retrieved context into a prompt so that a language model can read relevant information before generating a response - this function is critical and allows us to integrate knowledge in a more scalable and modular way into LLMs. </p><p>But isn't a picture worth a thousand words? So why just stop at retrieving textual context??</p><p>This is where multimodal RAG(MM-RAG) comes into the picture!</p><p>If you have an external knowledge database that can represent and store multimedia like images, audio, and video, just as well as it can text, then you can retrieve these objects and provide them a richer context for Large Multimodal Models to generate with. </p><p>Vector databases provide an ideal store from which multimedia knowledge can be retrieved and can capture the meaning of all of these modalities. </p><p>A paper(<a href="https://arxiv.org/abs/2211.12561" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2211.12561</a>) earlier this year from @michiyasunaga at Stanford presented the first multimodal model that can retrieve and generate both text and images and discusses the advantages of MM-RAG.</p><p>They found that MM-RAG:</p><ol><li><p>Significantly outperforms baseline multimodal models such as DALL-E and CM3 on both image and caption generation tasks</p></li><li><p>Require much less compute for training (&lt;30% of DALLE)</p></li><li><p>MM-RAG capable models also generate images much more faithful to the retrieved context</p></li><li><p>Are capable of multimodal in-context learning (e.g., image generation from demonstrations)</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2211.12561" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2211.12561" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[A Watermark for Large Language Model]]></title>
        <id>https://weaviate.io/papers/paper5</id>
        <link href="https://weaviate.io/papers/paper5"/>
        <updated>2023-11-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Differentiating between human-written language and AI-generated text.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-5fb0a5cca4a15cbe3a766f50d0b77ed7.jpeg" width="1200" height="1133" class="img_ev3q"></p><p>Can you tell the difference between human-written language and AI-generated text?🤔 </p><p>To solve this problem we need watermarks!📃</p><p>Researchers at the University of Maryland(<a href="https://arxiv.org/abs/2301.10226" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2301.10226</a>) created a way for us to modify LLMs such that a watermark would automatically be applied to any content that LLM generates. This allows us to run a test for this watermark to identify synthetic content in the wild.</p><p>A watermark is a hidden pattern in text that is imperceptible to humans, but when the text is statistically analyzed it allows us to identify synthetic content. The watermark they created can be identified in as little as 25 tokens and has negligible impact on text quality.</p><p>The watermark works by selecting a randomized set of secret “green” tokens before a word is generated, and then softly incentivizing the LLM to use those green tokens by slightly nudging the output word probabilities during sampling. The more "green" tokens found in a chunk of text the higher the probability it was generated by an LLM.</p><p>The challenge here is that in order to apply this watermark the company owning the LLM (OpenAI, Cohere, Anthropic etc.) needs to promote the use of these random secret "green" tokens by slightly increasing their probability of being generated.</p><p>Yet, another problem is that the higher this "green" token probability the easier the watermark will be to detect however, this also lowers the quality of the text overall.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2301.10226" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2301.10226" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[The Curse of Recursion: Training on Generated Data Makes Models Forget]]></title>
        <id>https://weaviate.io/papers/paper-4</id>
        <link href="https://weaviate.io/papers/paper-4"/>
        <updated>2023-11-23T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Can we use synthetic data to train future generations of LLM's?]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-ee798184802611d4f468dc79f42a7e9f.jpeg" width="1006" height="418" class="img_ev3q"></p><p>Can we use synthetic, LLM generated, data to train the next generations of bigger and better LLMs? How far will synthetic data take us in the pursuit of AGI?🤔</p><p>A paper(<a href="https://arxiv.org/abs/2305.17493" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2305.17493</a>) from researchers at Oxford and Cambridge addressed these questions earlier this year.</p><p>Synthetic data is quite promising, but to see how far we can push the limit with it, this paper investigates what happens when text produced by a version of GPT forms most of the training dataset of following models. What happens to GPT versions GPT-{n} as generation n increases?</p><p><img loading="lazy" alt="image1" src="/assets/images/img-d6b8b9c20426e5c2114a25fc4ca60214.jpeg" width="2118" height="1471" class="img_ev3q"></p><p>In short, it was found that "model-collapse" occurs – a degenerative process whereby, over time, models forget the true underlying data distribution. Over time, you start losing information about the true distribution, which first starts with tails disappearing, and over the generations learned behaviors start converging to a point estimate with very small variance.</p><p>In other words - using synthetic data works at first but relying on it to successively train better and better models generation after generation seems like a loosing bet. Access to the original data distribution is crucial: in learning where the tails of the underlying distribution matter you need access to real human-produced data.</p><p>Here are some more details:</p><ol><li><p>Two sets of experiments were done: one in which all data is replaced with synthetic data produced by the last generation of LLM and another where only 90% is replaced(10% original human-produced data)</p></li><li><p>They found that preservation of the original data allows for better model fine-tuning and leads to only minor degradation of performance.</p></li><li><p>In early model collapse the model begins losing information about the tails of the distribution</p></li><li><p>In late model collapse model entangles different modes of the original distributions and converges to a distribution that carries little resemblance to the original one, often with very small variance.</p></li><li><p>This collapse occurs due to statistical approximation error(due to the number of samples being finite) and functional approximation error(comes from models being insufficiently expressive). A third source of error can also be computational error coming from floating point arithmetic.</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2305.17493" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2305.17493" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Can Large Language Models Infer Causation from Correlation?]]></title>
        <id>https://weaviate.io/papers/paper-3</id>
        <link href="https://weaviate.io/papers/paper-3"/>
        <updated>2023-11-22T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Identify a key shortcoming of LLMs in terms of their causal inference skills.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-149df4245ecd93223cde9d5255169e7f.jpeg" width="1126" height="484" class="img_ev3q"></p><p>Can large language models infer causation from correlation? </p><p>And if they can't automatically bridge the gap from correlation to causation, then can we at least fine-tune them to improve at this task?</p><p>These two questions were addressed by researchers at the Max Planck Institute(<a href="https://arxiv.org/abs/2306.05836" target="_blank" rel="noopener noreferrer">https://arxiv.org/abs/2306.05836</a>). </p><p>We know that the success of LLM's arises from capturing a vast set of statistical correlations among words, but how well can these correlations be used to infer deeper causal relationships behind the words? </p><p>They showed that this is a very big shortcoming of all of the LLMs they tested as of June 2023.</p><p>Some details:</p><ol><li><p>Using a dataset, they created called Corr2Cause, they assessed an LLM's ability to take multiple correlational statements as input and accurately determine the causal relationships between these same variables.</p></li><li><p>Every model, out of 17 tested, provided a near-random performance on the task and was unable to perform pure causal reasoning. </p></li><li><p>Furthermore, they tested that even if you finetune the model for this task, it still doesn't generalize and only performs well for the variable names and textual expressions found in the training set. When the variable names and text were paraphrased model accuracy in inferring causality dropped. </p></li></ol><p>An important point to make is that It's quite difficult to differentiate and assess actual reasoning from rote training set memorization - even more so now that we have ever-growing training sets and don't know what data the model was trained on and only have API access to the models. </p><p>It'd be interesting to see if newer models since the date of publication can do better on paraphrased and renamed versions of this Corr2Cause dataset of ~400k datapoints.</p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2306.05836" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2306.05836" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Lost in the Middle: How Language Models Use Long Contexts]]></title>
        <id>https://weaviate.io/papers/paper-2</id>
        <link href="https://weaviate.io/papers/paper-2"/>
        <updated>2023-11-19T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Why do large language models attend better over the beginning and end of thier context?]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-e40b5a3da1ceab43bdc65eddad88807f.jpeg" width="1118" height="1012" class="img_ev3q"></p><p>Why do large language models pay more attention to and reason better over the beginning and end of what you tell them in prompts?🤔</p><p>Nelson Liu and Percy Liang's group at Stanford recently published a <a href="https://arxiv.org/abs/2307.03172" target="_blank" rel="noopener noreferrer">paper</a> that discovered this "lost in the middle" effect. </p><p>Greg Kamradt also ran great experiments and posted about how this very same pattern of underperformance exists in the new GPT-4 128K models from OpenAI. </p><p>The point of the paper was to establish "how well LLMs use longer context" and ran experiments conducting QnA and key-value retrieval tasks on models from Mosaic, Anthropic and OpenAI and varied input context size and the position of the relevant information in the context.</p><p>The main discovery was that attention followed a U-shaped pattern where more importance was given to the beginning and end of the context window as opposed to the middle portion.</p><p>This is such a great paper with a wealth of knowledge gems💎- here are some details and reasons why this happens:</p><ol><li><p>Due to Model Architecture: LLMs are transformers that scale poorly to long sequences (O(d^2)). As a result, language models are typically trained with relatively small context windows and thus perform better on these.</p></li><li><p>Tasks during supervised instruction-tuning are commonly placed at the beginning of the input context, which might lead these LLMs to place more weight on the start of the input context.</p></li><li><p>Encoder-decoder models perform better than decoder-only models, by making better use of their context windows because their bidirectional encoder allows processing each document in the context of future documents.</p></li><li><p>You can improve the performance of decoder-only models(can only attend to prior tokens at each timestep) by placing the query before and after the data, enabling query-aware contextualization of documents.</p></li><li><p>Based on the key-value retrieval experiments, let alone attending less to the middle, many models struggle to simply retrieve matching tokens that occur in the middle of their input context.</p></li><li><p>Even base language models (i.e., without instruction fine-tuning) show a U-shaped performance curve.</p></li><li><p>For open-domain QnA tasks, where none or many of the top k documents may contain the answer, models fail to effectively use more than 20 retrieved documents.</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2307.03172" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2307.03172" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
    <entry>
        <title type="html"><![CDATA[Retrieval meets Long Context Large Language Models]]></title>
        <id>https://weaviate.io/papers/paper-1</id>
        <link href="https://weaviate.io/papers/paper-1"/>
        <updated>2023-11-18T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Comparing LLM performance using retreival vs longer context lengths.]]></summary>
        <content type="html"><![CDATA[<p><img loading="lazy" alt="A preview of the paper" src="/assets/images/hero-d127328c2372ffae7efb00f688c17ec4.jpeg" width="1200" height="493" class="img_ev3q"></p><p>Fine-tuned larger language models and longer context lengths eliminate the need for retrieval from external knowledge/vector databases, right? ... Not quite!!</p><p>NVIDIA asked the same question last month! </p><p>They published a <a href="https://arxiv.org/abs/2310.03025" target="_blank" rel="noopener noreferrer">new paper</a> examining how well very large finetuned LLMs with longer context lengths compare to shorter context length RAG supported LLMs. They explore two main questions:</p><ol><li>Retrieval-augmentation versus long context window, which one is better for downstream tasks?</li><li>Can both methods be combined to get the best of both worlds?</li></ol><p>In short, they found:</p><ol><li>RAG outperforms long context alone.</li><li>Yes they perform better together. RAG works better with longer context than with shorter context.</li></ol><p>The main finding presented in the paper was that "retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes". </p><p>Some more details: </p><ol><li><p>RAG is more important than context windows: a LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window.</p></li><li><p>RAG is also faster: Augmenting generation with retrieval not only performs better by requiring significantly less computation and is much faster at generation.</p></li><li><p>RAG works even better as parameter count increases because smaller 6-7B LLMs have relatively worse zero-shot capability to incorporate the retrieved chunked context: Perhaps counter intuitively the benefits of RAG on performance are more pronounced the larger the language model gets, experiments were done for LLMs with 43B and 70B params.</p></li><li><p>RAG works even better as context length increases: Retrieval-augmented long context LLM (e.g., 16K and 32K) can obtain better results than retrieval-augmented 4K context LLM, even when fed with the same top 5 chunks of evidence.</p></li><li><p>Retrieval-augmented LLaMA2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 and non-retrieval LLaMA2-70B-32k baseline for question answering and query-based summarization.</p></li></ol><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/abs/2310.03025" download="">🔗 arXiv Link</a></p><p><a class="btn_VbJ1 btnMain_ywTD" href="https://arxiv.org/pdf/2310.03025" download="">📜 Download paper</a></p><h2 class="anchor anchorWithStickyNavbar_LWe7" id="ready-to-start-building">Ready to start building?<a href="#ready-to-start-building" class="hash-link" aria-label="Direct link to Ready to start building?" title="Direct link to Ready to start building?">​</a></h2><p>Check out the <a href="https://docs.weaviate.io/weaviate/quickstart" target="_blank" rel="noopener noreferrer">Quickstart tutorial</a>, or build amazing apps with a free trial of <a href="https://console.weaviate.cloud/" target="_blank" rel="noopener noreferrer">Weaviate Cloud (WCD)</a>.</p><div class="communityWrapper_ZpuS"><div class="container_sUl4"><div class="wrapper_FyvH"><div class="rightSide_UqS8"><div class="socialBox_W1XR"><a href="https://github.com/weaviate/weaviate" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="github_DEOB"></div><p class="text_g9NY">GitHub</p></a></div><div class="socialBox_W1XR"><a href="https://forum.weaviate.io/" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="forum_pUq6"></div><p class="text_g9NY">Forum</p></a></div><div class="socialBox_W1XR"><a href="https://twitter.com/weaviate_io" target="_blank" rel="noopener noreferrer" class="mobileSocialBox_UAY5"><div class="twitter_ewvw"></div><p class="text_g9NY">X (Twitter)</p></a></div></div><div class="leftSide_WlMC"><h2 class="communityHeader_jLni">Don't want to miss another blog post?</h2><span class="rightText_noBq"><p>Sign up for our bi-weekly newsletter to stay updated!</p> <br>By submitting, I agree to the<!-- --> <a href="/service">Terms of Service </a>and<!-- --> <a href="/privacy">Privacy Policy</a>.</span><div class="communityForm_pedn"><iframe src="https://embeds.beehiiv.com/15b21ebd-decd-433b-ada8-2d405e345f2e?slim=true" data-test-id="beehiiv-embed" frameborder="0" scrolling="no" style="margin:0;border-radius:0px;button-colour:#61BD73;background-color:transparent;width:100%;important:"></iframe></div></div></div></div></div>]]></content>
        <author>
            <name>Zain Hasan</name>
            <uri>https://github.com/zainhas</uri>
        </author>
    </entry>
</feed>