Span Categorization
Extracting longer phrases and nested expressions from documents is a common task in applied Natural Language Processing. Prodigy lets you label training data for span categorization or improve an existing model’s accuracy with ease.
Fast and flexible annotation
Prodigy’s web-based annotation app has been carefully designed to be as efficient as possible. The manual interface lets you label spans by highlighting text by hand. Your annotations snap to token boundaries, and you can mark single-word spans by double-clicking.
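To make the idea of snapping to token boundaries concrete, here is a minimal Python sketch. It is not Prodigy's implementation: the function name `snap_to_tokens` is hypothetical, and it assumes tokens are simply runs of non-whitespace characters, whereas a real tokenizer also splits off punctuation.

```python
import re


def snap_to_tokens(text, start, end):
    """Expand a character selection [start, end) to whole-token boundaries.

    Assumption: a token is a run of non-whitespace characters.
    Returns (start, end), or None if the selection touches no token.
    """
    touched = [
        m.span()
        for m in re.finditer(r"\S+", text)
        if m.start() < end and m.end() > start  # token overlaps the selection
    ]
    if not touched:
        return None
    # Span from the start of the first touched token to the end of the last.
    return touched[0][0], touched[-1][1]


text = "Patients with septic shock were enrolled."
# A sloppy selection that covers only "ptic sho".
start, end = snap_to_tokens(text, 16, 24)
print(text[start:end])  # septic shock
```

Expanding outward, rather than shrinking inward, means a partial selection never silently drops a word the annotator touched.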
patterns.jsonl
{"pattern": "septic shock", "label": "CONDITION"}
{"pattern": [{"like_num": true}, {"orth": "-"}, {"lower": "day"}, {"lower": "mortality"}], "label": "EFFECT"}
Bootstrap with powerful patterns
Prodigy is a fully scriptable annotation tool, letting you automate as much as possible with custom rule-based logic. You don’t want to waste time labeling every instance of common phrases by hand. Instead, give Prodigy rules or a list of examples, review the spans in context and annotate the exceptions.
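If you already have a list of terms, a patterns file can be generated rather than typed out. The following Python sketch builds entries in the same shape as the `patterns.jsonl` example on this page. The helper names `phrase_patterns` and `to_jsonl` and the term list are hypothetical and only for illustration.

```python
import json


def phrase_patterns(phrases, label):
    """Turn plain phrases into exact-string pattern entries for one label."""
    return [{"pattern": phrase, "label": label} for phrase in phrases]


def to_jsonl(entries):
    """Serialize entries as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(entry) for entry in entries) + "\n"


# Hypothetical term list, for illustration only.
conditions = ["septic shock", "sepsis", "organ failure"]
entries = phrase_patterns(conditions, "CONDITION")

# A token-based rule, copied from the patterns.jsonl example on this page.
entries.append({
    "pattern": [
        {"like_num": True},
        {"orth": "-"},
        {"lower": "day"},
        {"lower": "mortality"},
    ],
    "label": "EFFECT",
})

print(to_jsonl(entries), end="")
```

Redirecting the output to a file such as `patterns.jsonl` gives you something to hand to a recipe that accepts patterns. Whether a token-based rule matches a given text depends on how the tokenizer splits it, so it is worth reviewing the first few suggestions in context.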
Immediately train spancat models
Once you have your first annotations, you can have Prodigy train spaCy models for span categorization right away. Point the train command at the datasets of interest and you get a trained machine learning pipeline for the task. You can even train a model that handles multiple tasks, and override settings from the command line.
From here, you can reuse the model to make annotation easier: spans.correct uses it to pre-highlight annotations for you.
Example
prodigy train ./spancat-model --spancat dataset_a,dataset_b --training.max-steps 1000
Example
prodigy spans.correct spans-dataset ./spancat-model examples.jsonl --label condition,effect