From Few-Shot to Guidelines: A Smarter Way to Prompt AI
A framework that replaces few-shot prompts with task-specific instructions: it uses feedback to build organized guidelines, leading to better results across a range of tasks.
In the world of large language models (LLMs), one of the biggest challenges is how to get the best possible response from the AI.
Traditionally, "shot" methods have been the go-to, where a model is given example questions and answers, as in few-shot learning. This prompts the AI to mimic the reasoning steps in the examples. But the approach has its downsides: choosing the right examples is hard, and even the best examples can miss crucial task-specific knowledge.
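To make the contrast concrete, here is a minimal sketch of how a few-shot prompt is assembled; the worked examples and the `build_few_shot_prompt` helper are illustrative, not taken from the paper.

```python
# A minimal few-shot prompt: the model sees worked examples and is
# asked to answer a new question in the same style. (Illustrative only.)
FEW_SHOT_PROMPT = """\
Q: A shop sells pens at $2 each. How much do 4 pens cost?
A: Each pen costs $2, so 4 pens cost 4 * 2 = $8. The answer is $8.

Q: A train travels 60 km in one hour. How far does it go in 3 hours?
A: It covers 60 km per hour, so in 3 hours it travels 3 * 60 = 180 km. The answer is 180 km.

Q: {question}
A:"""

def build_few_shot_prompt(question: str) -> str:
    """Insert the new question after the demonstration examples."""
    return FEW_SHOT_PROMPT.format(question=question)
```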
Enter the "Guideline" method, a promising alternative that swaps examples for structured guidance: instead of showing the model what good answers look like, it explicitly instructs the model how to think through a problem with a set of clear, task-specific rules.
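For comparison, a guideline-style prompt for the same kind of task might look like the sketch below; the specific rules are hypothetical, written to illustrate the format rather than quoted from the paper.

```python
# A hypothetical guideline-style prompt: explicit, task-specific rules
# replace the worked examples of the few-shot version above.
GUIDELINE_PROMPT = """\
You are solving an arithmetic word problem. Follow these guidelines:
1. Identify the quantities mentioned and their units.
2. Decide which operation (add, subtract, multiply, divide) links them.
3. Compute step by step, keeping units consistent.
4. State the final result on its own line as "The answer is ...".

Q: {question}
A:"""

def build_guideline_prompt(question: str) -> str:
    """Attach the question to the fixed set of task rules."""
    return GUIDELINE_PROMPT.format(question=question)
```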
But the real breakthrough comes from a new framework called FGT (Feedback, Guideline, and Tree-gather), which automates the creation of these guidelines directly from the data.
The FGT Framework: How It Works
It consists of three key agents, one for each part of its name:

1. A Feedback agent, which reviews the model's responses on the training data and collects feedback on where and why they fall short.
2. A Guideline agent, which distills that feedback into clear, task-specific rules.
3. A Tree-gather agent, which merges the individual rules into a single organized set of guidelines.
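Put together, the pipeline might look roughly like the sketch below. The `llm` callable, the prompt wording, and the `run_fgt` helper are assumptions made for illustration; the paper's actual agent prompts and data structures may differ.

```python
# A rough sketch of the FGT pipeline. `llm` is assumed to be any callable
# that maps a prompt string to a completion string.
def run_fgt(llm, training_examples):
    """Derive organized, task-specific guidelines from labeled data."""
    guidelines = []
    for ex in training_examples:
        prediction = llm(f"Q: {ex['question']}\nA:").strip()
        if prediction != ex["answer"]:
            # Feedback agent: diagnose why the model's answer was wrong.
            feedback = llm(
                f"Question: {ex['question']}\n"
                f"Model answer: {prediction}\n"
                f"Correct answer: {ex['answer']}\n"
                "In one sentence, explain the mistake."
            )
            # Guideline agent: turn the diagnosis into a reusable rule.
            guidelines.append(llm(
                "Rewrite this feedback as a general, task-specific guideline:\n"
                + feedback
            ))
    # Tree-gather agent: merge the individual rules into one organized set.
    return llm(
        "Group these guidelines by theme and remove duplicates, producing "
        "a concise, structured list:\n" + "\n".join(guidelines)
    )
```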
This framework also encourages the model to show its work, literally: by prompting the AI to provide its thought process rather than just a final answer, the system ensures that the reasoning aligns with the guidelines, leading to more accurate and reliable results.
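At inference time, that pairing of guidelines and visible reasoning could be prompted as in the sketch below; the `INFERENCE_PROMPT` template and `answer_with_guidelines` helper are assumed names, not the paper's.

```python
# A sketch of an inference prompt that requests the thought process
# before the final answer, so the reasoning can be checked against
# the guidelines it is supposed to follow.
INFERENCE_PROMPT = """\
{guidelines}

Q: {question}
First write your reasoning under "Thought:", then give the final
answer under "Answer:".
Thought:"""

def answer_with_guidelines(llm, guidelines: str, question: str) -> str:
    """Query the model with guidelines prepended to the question."""
    return llm(INFERENCE_PROMPT.format(guidelines=guidelines, question=question))
```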
Performance Gains Across Tasks
To test this approach, the researchers evaluated it on the Big-Bench Hard (BBH) dataset, which includes tasks like math calculations, logical reasoning, and context understanding. The results were impressive.
The FGT framework not only outperformed traditional few-shot methods but also did better than more advanced techniques like Chain-of-Thought (CoT) reasoning.
For logical reasoning tasks, FGT achieved a significant improvement, with accuracy jumping to 93.9%, far outstripping both few-shot and many-shot methods. Even in tasks requiring deep context understanding, where many-shot usually excels, FGT was competitive, proving that well-constructed guidelines can be just as effective as providing a large number of examples.
What makes guidelines so powerful is their ability to generalize: the model can adapt more flexibly to new tasks without needing countless examples.