Prompt Engineering · Prodigy · An annotation tool for AI, Machine Learning & NLP

Prompts as an A/B test

When you're engineering prompts, you're going to get noisy results. To determine which prompt is best, you need a quantifiable method for comparing them. Prodigy offers tools and annotation interfaces for this task, and even provides pre-made recipes that integrate with OpenAI.
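Here is a rough illustration of what "quantifiable" can mean. Assume each annotation records which of two prompts produced the preferred output. An exact two-sided sign test then checks whether the observed split is plausibly 50/50. This is a generic statistics sketch under that assumption, not a description of what Prodigy computes. The function name and the counts below are hypothetical.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial (sign) test of H0: each prompt wins half the time.

    Drop ties and ignored examples before calling this.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no evidence either way
    k = min(wins_a, wins_b)
    # One-tail probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # Double it for a two-sided test; cap at 1.0 (reached for near-even splits).
    return min(1.0, 2 * tail)


# Hypothetical counts from 30 blind A/B annotations.
p = sign_test_p_value(wins_a=21, wins_b=9)
print(f"p = {p:.4f}")
```

A small p-value suggests the preference is unlikely to be noise. A large one means you need more annotations before declaring a winner.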

Example
prodigy ab.openai.prompts haiku input.jsonl templates/ab/input.jinja2 templates/ab/prompt1.jinja2 templates/ab/prompt2.jinja2 -F ./recipes/ab.py

Example of A/B prompt workflow


Compare two prompts

The ab.openai.prompts recipe lets you quickly compare the quality of outputs from two OpenAI prompts in a quantifiable and blind way: the annotator picks the better of two candidates without knowing which prompt produced each one. Given the two prompts and the input data below, the recipe builds an A/B annotation interface whose candidates are generated automatically by OpenAI.

prompt1.jinja2
Write a haiku about {{topic}}.

prompt2.jinja2
Write a hilarious haiku about {{topic}}.

input.jsonl
{"id": 0, "prompt_args": {"topic": "Python"}}
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "maths"}}
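The following sketch makes the data flow concrete. Each line's prompt_args fills the {{topic}} slot in both templates, and the two results are shuffled so that position gives the annotator no hint. It rests on several assumptions. A tiny regex renderer stands in for Jinja2 to keep the sketch dependency-free. The OpenAI call is omitted, so each option's text is the rendered prompt where the recipe would show the model's completion. The task fields, in particular the hypothetical hidden_mapping, are illustrative and not Prodigy's exact task format.

```python
import json
import random
import re

# Stand-in for Jinja2 (assumption): only substitutes {{name}} / {{ name }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, args: dict) -> str:
    """Fill every {{name}} placeholder with args[name]."""
    return PLACEHOLDER.sub(lambda m: str(args[m.group(1)]), template)


PROMPTS = {
    "prompt1.jinja2": "Write a haiku about {{topic}}.",
    "prompt2.jinja2": "Write a hilarious haiku about {{topic}}.",
}

INPUT_JSONL = """\
{"id": 0, "prompt_args": {"topic": "Python"}}
{"id": 0, "prompt_args": {"topic": "star wars"}}
{"id": 0, "prompt_args": {"topic": "maths"}}
"""


def blind_pairs(lines, prompts, rng):
    """Yield one task per input line with the two rendered prompts in random order.

    The annotator only sees "options"; the hypothetical "hidden_mapping" records
    which template produced each option so choices can be un-blinded later.
    """
    for line in lines:
        record = json.loads(line)
        names = list(prompts)
        rng.shuffle(names)  # randomize order so "first" carries no signal
        options = [
            {"id": f"option-{i}", "text": render(prompts[name], record["prompt_args"])}
            for i, name in enumerate(names)
        ]
        yield {
            "input": record["prompt_args"],
            "options": options,
            "hidden_mapping": {opt["id"]: name for opt, name in zip(options, names)},
        }


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
tasks = list(blind_pairs(INPUT_JSONL.splitlines(), PROMPTS, rng))
for task in tasks:
    print([opt["text"] for opt in task["options"]])
```

Keeping the mapping out of what the annotator sees is what makes the comparison blind. Afterwards, each choice can be translated back into a win for prompt1.jinja2 or prompt2.jinja2.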

Tournament of prompts

The ab.openai.tournament recipe lets you quickly set up a tournament between any number of OpenAI prompts. The prompts compete against each other, and as you annotate, the prompts that win move up the ranking.

You can even reuse the built-in tournament classes that Prodigy provides in your own custom recipes.
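The ranking logic isn't described here, so the following is only a hypothetical sketch of how a prompt tournament can work. It is not Prodigy's built-in tournament classes. Each annotation is treated as a head-to-head win. Each pair of prompts gets a Beta posterior on how often one beats the other, under a uniform prior. P(A > B) is the posterior probability that A's win rate exceeds one half. The class name, its methods, the "mean probability" leader score and the "most uncertain pair next" matchup rule are all illustrative assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb


def prob_better(wins: int, losses: int) -> float:
    """P(true win rate > 1/2) for a head-to-head record, under a uniform prior.

    The posterior on the win rate is Beta(1 + wins, 1 + losses). For integer
    parameters, P(X > 1/2) equals P(Binomial(wins + losses + 1, 1/2) <= wins).
    """
    n = wins + losses + 1
    return sum(comb(n, k) for k in range(wins + 1)) / 2**n


class PromptTournament:
    """Hypothetical tournament tracker; assumes at least two prompts."""

    def __init__(self, prompts, seed=0):
        self.prompts = list(prompts)
        self.wins = defaultdict(int)  # (winner, loser) -> number of wins
        self.rng = random.Random(seed)

    def record(self, winner, loser):
        """Store one annotation: the annotator preferred `winner` over `loser`."""
        self.wins[(winner, loser)] += 1

    def p_better(self, a, b):
        """Posterior probability that prompt `a` beats prompt `b` more often than not."""
        return prob_better(self.wins[(a, b)], self.wins[(b, a)])

    def current_winner(self):
        """Prompt with the highest mean P(beats rival) over all other prompts."""
        def score(p):
            rivals = [q for q in self.prompts if q != p]
            return sum(self.p_better(p, q) for q in rivals) / len(rivals)
        return max(self.prompts, key=score)

    def next_matchup(self):
        """Heuristic: show the pair whose outcome is currently most uncertain."""
        pairs = list(combinations(self.prompts, 2))
        self.rng.shuffle(pairs)  # random tie-breaking between equally uncertain pairs
        return min(pairs, key=lambda pair: abs(self.p_better(*pair) - 0.5))


# Hypothetical annotation log: prompt2 wins most of its matchups.
tournament = PromptTournament(["prompt1.jinja2", "prompt2.jinja2", "prompt3.jinja2"])
log = (
    [("prompt2.jinja2", "prompt1.jinja2")] * 4
    + [("prompt2.jinja2", "prompt3.jinja2")] * 3
    + [("prompt3.jinja2", "prompt2.jinja2")] * 1
)
for winner, loser in log:
    tournament.record(winner, loser)

leader = tournament.current_winner()
print(f"Current winner: {leader}")
for rival in tournament.prompts:
    if rival != leader:
        print(f"P({leader} > {rival}) = {tournament.p_better(leader, rival):.4f}")
print("Next matchup:", tournament.next_matchup())
```

Showing the most uncertain pair next spends annotation effort where it can still change the ranking. Pairs that are already settled get fewer comparisons.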

Example
prodigy ab.openai.tournament haiku-tournament input.jsonl title.jinja2 prompt_folder

====================== Current winner: prompt2.jinja2 ======================
desc                                      value
P(prompt2.jinja2 > prompt3.jinja2)        0.667322
P(prompt2.jinja2 > prompt1.jinja2)        0.753021
P(prompt2.jinja2 > prompt4.jinja2)        0.952123

... after more annotations ...

====================== Current winner: prompt2.jinja2 ======================
desc                                      value
P(prompt2.jinja2 > prompt3.jinja2)        0.96732
P(prompt2.jinja2 > prompt1.jinja2)        0.99302
P(prompt2.jinja2 > prompt4.jinja2)        0.99945
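In the report above, the leader stays the same while its probabilities climb toward 1 as annotations accumulate. One way to see why is to assume a uniform Beta(1, 1) prior on p, the long-run rate at which prompt A is preferred over prompt B. This is an illustrative model, not necessarily the one Prodigy uses. Here w counts annotations where A beat B and ℓ counts those where B beat A. The worked numbers hold A's win rate at 75% while quadrupling the number of annotations:

```latex
\begin{aligned}
P(A > B \mid w, \ell)
  &= \Pr\!\left[p > \tfrac{1}{2}\right], \qquad p \sim \operatorname{Beta}(1 + w,\ 1 + \ell) \\
  &= 2^{-(w + \ell + 1)} \sum_{k=0}^{w} \binom{w + \ell + 1}{k} \\[6pt]
w = 3,\ \ell = 1:&\quad P = \tfrac{26}{32} = 0.8125 \\
w = 12,\ \ell = 4:&\quad P = 1 - \tfrac{3214}{131072} \approx 0.975
\end{aligned}
```

Both records have the same win rate; only the amount of evidence differs. That is why a stable leader's probabilities keep rising as more annotations come in.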