CleanML is an MLOps tool for data-centric AI. It was built to help ML teams manage the lifecycle of their Named Entity Recognition (NER) projects. To do this efficiently, CleanML lets teams curate, annotate, edit, and add data from a single platform. The annotated data also yields insights and can be easily exported in multiple data formats. The tool was developed by AI startup Astutic AI, a member of the Intel® Liftoff program.
The Challenge
Managing the complete lifecycle of large, complex Named Entity Recognition (NER) projects is challenging. The process involves curating and annotating data properly, then drawing the necessary insights from it. With current software tools and platforms, this is often a complex and almost unmanageable task. This is where AI-based software tools come into play, as they automatically bring together all the related tasks that have to be completed.
The Solution
CleanML is a SaaS application designed to help enterprise machine learning teams find the best models efficiently. It focuses on improving Named Entity Recognition (NER), a key part of Natural Language Processing (NLP). CleanML streamlines the process of analyzing and processing natural language by bringing together essential machine learning workflows into one unified platform.
With CleanML, project managers, data scientists, annotators, and developers can complete the following tasks:
- Model experimentation, training, comparison and lifecycle management
- Data quality assessment and rectification
- Data segmentation for training and evaluation
- Advanced data annotation and re-tagging
CleanML allows teams to identify and correct common data and annotation inaccuracies using data-centric analytics, while managing and experimenting with models to fine-tune performance at the same time. You can also dive deep into model evaluation records to better understand model behavior, and track and compare training iterations to assess how changes in data or code affect overall model metrics.
Who will benefit from using CleanML?
- Project managers can create multiple projects and track their progress independently.
- Data scientists can gain more insight into training and test data and the distribution of annotated entities, and decide how to curate more data for better accuracy.
- Annotators can speed up and improve the annotation process with CleanML's helpful features, all from a single window.
- Software developers can experiment with multiple algorithms using different libraries, regardless of whether they run on GPU, CPU, on-prem, or in the cloud.
With the help of CleanML's data-centric dashboard, data and data-classification issues can be identified and fixed. Moreover, drill-down analytics can be performed on the dataset.
What are the key features?
Data-centric dashboard: Find and resolve data and classification problems, while exploring detailed analytics. Discover insights about how data is grouped, along with missed or unusual classifications.
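To make the kind of check described above concrete, here is a minimal Python sketch of two data-centric analyses: counting how annotated entities are distributed across labels, and flagging surface strings that received more than one label (one possible sign of an unusual classification). The record layout and the function names `label_distribution` and `conflicting_surface_forms` are assumptions made for illustration; they are not CleanML's export format or API.

```python
from collections import Counter, defaultdict

# Hypothetical annotated records: (text, [(start, end, label), ...]).
# This layout is an assumption for illustration, not CleanML's export format.
records = [
    ("Intel opened an office in Berlin.", [(0, 5, "ORG"), (26, 32, "LOC")]),
    ("Berlin hosted the Intel summit.", [(0, 6, "LOC"), (18, 23, "LOC")]),
]


def label_distribution(records):
    """Count how many annotated spans carry each label."""
    return Counter(label for _, spans in records for _, _, label in spans)


def conflicting_surface_forms(records):
    """Return surface strings that were annotated with more than one label."""
    seen = defaultdict(set)
    for text, spans in records:
        for start, end, label in spans:
            seen[text[start:end]].add(label)
    return {form: sorted(labels) for form, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    print(label_distribution(records))
    print(conflicting_surface_forms(records))
```

In this toy data, "Intel" is tagged as an organization in one record and as a location in another, so it is the only surface form the second function reports. A reviewer could then decide which label is the mistake.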
Advanced workbench: The workbench offers helpful features like text annotation, renaming entities across records, in-place content editing, tag and auto-labeling suggestions, access to previous classifications, and the option to add a custom dictionary.
In-built data versioning: CleanML automatically tracks data versions, making it easier to reproduce training results. It also allows you to compare a model’s performance across different versions, algorithms, and even with models currently in production.
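One common way to make training results reproducible is to derive a version identifier from the dataset's content, so that any change to text or annotations produces a new identifier. The sketch below illustrates that general idea in Python. It is an assumption about how such tracking could work, not a description of CleanML's internals, and `dataset_version` and the record layout are hypothetical names.

```python
import hashlib
import json


def dataset_version(records):
    """Return a short, deterministic fingerprint of an annotated dataset."""
    # Canonical JSON (sorted keys, fixed separators) so equal data hashes equally.
    canonical = json.dumps(
        records, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical dataset snapshots; v2 adds one annotation to the same text.
v1 = [{"text": "Intel opened an office in Berlin.", "spans": [[0, 5, "ORG"]]}]
v2 = [
    {
        "text": "Intel opened an office in Berlin.",
        "spans": [[0, 5, "ORG"], [26, 32, "LOC"]],
    }
]

if __name__ == "__main__":
    print("v1:", dataset_version(v1))
    print("v2:", dataset_version(v2))
```

Storing such a fingerprint alongside each training run lets a team tell whether two runs used identical data before comparing their metrics.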
Train, test, compare, repeat: Train and compare models with different algorithms on the same dataset. CleanML tracks versions of training and data, making detailed comparisons easier and boosting productivity.
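Comparing NER models on the same dataset usually comes down to entity-level metrics. The following Python sketch computes exact-match precision, recall, and F1 for two sets of predictions against the same gold annotations. The span tuples, the model names, and the `span_f1` helper are illustrative assumptions, not CleanML's evaluation code.

```python
def span_f1(gold, predicted):
    """Exact-match precision, recall, F1 over (doc_id, start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and predictions from two models.
gold = {(0, 0, 5, "ORG"), (0, 26, 32, "LOC"), (1, 0, 6, "LOC")}
model_a = {(0, 0, 5, "ORG"), (0, 26, 32, "LOC")}  # misses one entity
model_b = {(0, 0, 5, "ORG"), (1, 0, 6, "ORG")}    # one wrong label, one miss

if __name__ == "__main__":
    for name, predictions in [("model_a", model_a), ("model_b", model_b)]:
        p, r, f = span_f1(gold, predictions)
        print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching is the strictest choice: a span with the right boundaries but the wrong label counts as both a false positive and a missed entity. Partial-match scoring is a reasonable alternative when boundary errors should be penalized less.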
Auto labeling suggestions: Receive automated labeling suggestions from trained algorithms to streamline and accelerate data annotation.
CleanML offers Advanced Workbenches which provide useful features including annotating text, entity renaming across records, editing content in-place, and more.
Train and compare models of different algorithms with the same dataset. CleanML versions all the training and helps compare between versions of training and data.
"It was an absolute honor to participate in the Intel® Liftoff AI Hackathon. The energy and innovation were contagious, and I'm incredibly grateful for the opportunity to collaborate with such brilliant minds. We were able to showcase using LLMs to improve the accuracy of NER annotation on Intel Hardware. The straightforward feature/API set of Intel hardware coupled with the performance capabilities of the infrastructure were quite a joy to hack on," says Vimal Menon, Founder of Astutic AI.
Want to advance your startup and gain valuable insights? Join the Intel® Liftoff program to connect with experts, access essential resources, and push your ideas forward. Find out more here: https://www.intel.com/content/www/us/en/developer/tools/oneapi/liftoff.html