Radically efficient data annotation tool
Fully scriptable. Made for machine learning and developers.
pip install ./prodigy.whl
Successfully installed prodigy
prodigy ner.manual reviews_ner en_core_web_sm ./data.jsonl --label PRODUCT,PERSON,ORG
✨ Starting the web server on port 8080...
Open the app in your browser and start annotating!
Train a new AI model in hours
Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration.
Today’s transfer learning technologies mean you can train production-quality models with very few examples. With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection. You'll move faster, be more independent and ship far more successful projects.
The missing piece in your data science workflow
Prodigy brings together state-of-the-art insights from machine learning and user experience. With its continuous active learning system, you're only asked to annotate examples the model does not already know the answer to. The web application is powerful, extensible and follows modern UX principles. The secret is very simple: it's designed to help you focus on one decision at a time and keep you clicking – like Tinder for data.
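To make the "only annotate what the model doesn't already know" idea concrete, here is a minimal sketch of uncertainty-based filtering in plain Python. It is not Prodigy's actual implementation: the function name `prefer_uncertain`, the `margin` parameter, the `toy_score` stand-in and the `score` field are all hypothetical, and a real setup would get its scores from a trained model.

```python
from typing import Callable, Iterable, Iterator


def prefer_uncertain(
    examples: Iterable[dict],
    score: Callable[[dict], float],
    margin: float = 0.2,
) -> Iterator[dict]:
    """Yield only examples whose score lies within `margin` of 0.5.

    A score near 0.5 means the (hypothetical) model is unsure, so a human
    decision on that example is likely to be the most informative.
    """
    for example in examples:
        if abs(score(example) - 0.5) <= margin:
            yield example


def toy_score(example: dict) -> float:
    # Hypothetical stand-in for a model's predicted probability.
    return example["score"]


stream = [
    {"text": "Great phone", "score": 0.95},
    {"text": "It arrived on Tuesday", "score": 0.55},
    {"text": "Terrible battery", "score": 0.05},
    {"text": "Not what I expected", "score": 0.40},
]

# Only the two examples the toy model is unsure about are kept.
selected = [eg["text"] for eg in prefer_uncertain(stream, toy_score)]
print(selected)
```

Because the filter is a generator, it can sit in front of a stream of any length and hand examples to the annotator one at a time.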
Everyone knows data scientists should spend more time looking at their data. When good habits are hard to form, the trick is to remove the friction. Prodigy makes the right thing easy, encouraging you to spend more time understanding your problem and interpreting your results.
Try out new ideas quickly
Annotation is usually the part where projects stall. Instead of having an idea and trying it out, you start scheduling meetings, writing specifications and dealing with quality control. With Prodigy, you can have an idea over breakfast and get your first results by lunch. Once the model is trained, you can export it as a versioned Python package, giving you a smooth path from prototype to production.
Fully scriptable and extensible
Prodigy is fully scriptable, and slots neatly into the rest of your Python-based data science workflow. As the makers of spaCy, a popular library for Natural Language Processing, we understand how to make tools programmers love. The simple secret is this: programmers want to be able to program. Good developer tools need to let you in, not lock you out. That's why Prodigy comes with a rich Python API, elegant command-line integration, and a super productive Jupyter extension. Using custom recipe scripts, you can adapt Prodigy to read and write data however you like, and plug in custom models using any of your favourite frameworks.
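recipe.py

    import prodigy
    from prodigy.components.loaders import JSONL


    @prodigy.recipe("custom")
    def custom_recipe(dataset, source):
        # Return the components Prodigy needs to start the annotation server
        return {
            "dataset": dataset,           # dataset to save annotations to
            "stream": JSONL(source),      # examples loaded from the JSONL file
            "view_id": "classification",  # annotation interface to use
        }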
Command-line usage
    prodigy custom my_dataset ./data.jsonl -F recipe.py
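The command above expects a `./data.jsonl` file. As a rough sketch of what such a file might look like, the snippet below writes and reads newline-delimited JSON using only the standard library. It assumes each line is a JSON object with a `"text"` key; the helper names `write_jsonl` and `read_jsonl` and the sample records are hypothetical and not part of Prodigy.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, records: list) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path: Path):
    """Yield one dict per non-empty line, similar in spirit to a JSONL loader."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Hypothetical sample records, assuming a "text" key per example.
records = [
    {"text": "The new laptop from Acme is surprisingly light."},
    {"text": "Jane Doe reviewed the headphones last week."},
]

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data.jsonl"
    write_jsonl(data_path, records)
    loaded = list(read_jsonl(data_path))

print(len(loaded), "examples loaded")
```

Keeping one example per line means the file can be streamed lazily, so annotation can start before the whole file has been read.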