Large Language Models · Prodigy · An annotation tool for AI, Machine Learning & NLP

Named Entity Recognition

You can use ner.openai.correct to annotate examples with live suggestions from OpenAI. This recipe marks entity predictions obtained from a large language model and lets you accept them as correct or curate them manually. Alternatively, you can fetch examples ahead of time: the ner.openai.fetch recipe gives you the same suggestions from OpenAI but downloads a large batch of examples upfront. These examples can then be annotated and corrected via the ner.manual recipe.

Both recipes can be used to detect entities that spaCy models aren't trained on, and you're free to adapt them. You can provide examples so that OpenAI does few-shot learning, change the hyperparameters from the command line, or send your own custom prompts.
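The fetch workflow starts from a file of raw texts. As a minimal sketch, assuming the input is newline-delimited JSON with one "text" field per record, such a file could be prepared like this. The sample sentences, the helper name to_jsonl and the output filename are illustrative only.

```python
import json
from pathlib import Path


def to_jsonl(texts):
    """Serialize raw texts as newline-delimited JSON, one {"text": ...} record per line."""
    return "".join(json.dumps({"text": text}) + "\n" for text in texts)


# Hypothetical sample texts, used only to illustrate the file layout.
texts = [
    "Whisk the eggs in a large bowl and add the flour.",
    "Preheat the oven and grease a cast iron skillet.",
]

jsonl = to_jsonl(texts)
# Assumed filename; pass whatever path you hand to the fetch recipe.
Path("examples.jsonl").write_text(jsonl, encoding="utf8")
```

Each line of the resulting file is an independent JSON object, so large inputs can be streamed line by line.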

Example
prodigy ner.openai.fetch examples.jsonl openai-out.jsonl dish,ingredient,equipment -F ./recipes/ner.py

Example of entities OpenAI can pre-highlight

Text Classification

Example
prodigy textcat.openai.fetch examples.jsonl openai-out.jsonl recipe,feedback,question -F ./recipes/textcat.py

Example response from OpenAI with reasoning

The recipe textcat.openai.correct lets you classify texts faster with the help of OpenAI. It also provides a reason why a particular label was chosen. Just like the named entity recipes, you can fetch examples upfront instead via the textcat.openai.fetch recipe.

By fetching the examples upfront, you'll also be able to filter based on the OpenAI predictions. This is especially useful when you're dealing with an imbalanced classification task with a rare label. Instead of going through all the examples manually, you can check only the examples for which OpenAI predicts the label of interest.
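The filtering step can be sketched in a few lines of Python. This is an illustration under a stated assumption, not the recipe's documented output format: it assumes each fetched record stores its predicted labels in an "accept" list. Check the real openai-out.jsonl and adjust the key if it differs. The sample records are made up.

```python
import json


def filter_by_label(lines, label):
    """Yield parsed JSONL records whose predicted labels include `label`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: predictions live in an "accept" list of label names.
        if label in record.get("accept", []):
            yield record


# Hypothetical fetched predictions, for illustration only.
sample = [
    '{"text": "How long do I bake this?", "accept": ["question"]}',
    '{"text": "Mix flour and water.", "accept": ["recipe"]}',
    '{"text": "Loved it, thanks!", "accept": ["feedback"]}',
]

matches = list(filter_by_label(sample, "question"))
print([m["text"] for m in matches])
```

For a real file, pass an open file handle instead of the sample list, and write the matching records to a new JSONL file for review.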

You can also provide extra context to the prompt by adding examples to steer the large language model. Alternatively, you can customise the prompt completely by writing your own Jinja2 templates.

Generate terminology lists from scratch

There are many ways to use a large language model with zero-shot capabilities. You can make predictions to pre-annotate examples, but you can also have it bootstrap terminology lists via the terms.openai.fetch recipe. These terms can be reviewed so they can later be used for named entity recognition, span categorization or weak supervision.

Example
prodigy terms.openai.fetch "skateboard tricks" skateboard-tricks.jsonl -F ./recipes/terms.py

skateboard-tricks.jsonl
{"text": "kickflip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "nose manual", "meta": {"openai_query": "skateboard tricks"}}
{"text": "heelside flip", "meta": {"openai_query": "skateboard tricks"}}
{"text": "ollie", "meta": {"openai_query": "skateboard tricks"}}
{"text": "frontside boardslide", "meta": {"openai_query": "skateboard tricks"}}
{"text": "5050 Grind", "meta": {"openai_query": "skateboard tricks"}}
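One way to reuse a reviewed term list for named entity recognition is to turn each term into a token-based match pattern. The sketch below is an illustration, not part of the recipe: the label SKATEBOARD_TRICK and the helper terms_to_patterns are hypothetical, and splitting on whitespace is a simplification that may not match how a real tokenizer segments every term.

```python
import json


def terms_to_patterns(jsonl_lines, label):
    """Convert term records into case-insensitive token patterns."""
    patterns = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        term = json.loads(line)["text"]
        # Simplification: one pattern token per whitespace-separated word.
        tokens = [{"lower": word.lower()} for word in term.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


# Two records in the same shape as skateboard-tricks.jsonl above.
terms = [
    '{"text": "kickflip", "meta": {"openai_query": "skateboard tricks"}}',
    '{"text": "nose manual", "meta": {"openai_query": "skateboard tricks"}}',
]

# SKATEBOARD_TRICK is a hypothetical label chosen for this example.
patterns = terms_to_patterns(terms, "SKATEBOARD_TRICK")
for pattern in patterns:
    print(json.dumps(pattern))
```

Matching on the lowercase form keeps the patterns robust to capitalization differences such as "5050 Grind" versus "5050 grind".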
