
Data

cleanlab.ai, unstructured.io

Data Creation

Generative AI is well suited to creating data that can be used to train or refine new models. The tools below create data for downstream purposes; use them with dual-use concerns in mind.

AutoLabel: A nice Pythonic system for repeatedly generating semantic labels for use in downstream datasets.

Kor: For extracting structured data using LLMs.
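
As a rough illustration of the general pattern behind tools like these, the sketch below prompts a model for a JSON record, validates the reply, and collects the rows into a dataset. It does not use the AutoLabel or Kor APIs. The label set, the `product` field, and every function name here are hypothetical, and `fake_llm` is a stand-in for a real model call.

```python
import json
from typing import Callable, Dict, Iterable, List

# Hypothetical label set, chosen only for this illustration.
ALLOWED_LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Ask for a label and an optional product name as a JSON object."""
    labels = ", ".join(ALLOWED_LABELS)
    return (
        "Classify the text and extract any product name.\n"
        f"Allowed labels: {labels}\n"
        'Reply with JSON only, using the keys "label" and "product".\n'
        f"Text: {text}"
    )


def parse_response(raw: str) -> Dict[str, object]:
    """Parse the model reply and reject anything outside the schema."""
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    if record.get("label") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {record.get('label')!r}")
    product = record.get("product")
    if product is not None and not isinstance(product, str):
        raise ValueError("product must be a string or null")
    return {"label": record["label"], "product": product}


def label_texts(texts: Iterable[str], llm: Callable[[str], str]) -> List[Dict[str, object]]:
    """Run every text through the model and keep only validated rows."""
    rows = []
    for text in texts:
        parsed = parse_response(llm(build_prompt(text)))
        rows.append({"text": text, **parsed})
    return rows


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; applies a trivial keyword rule."""
    text = prompt.rsplit("Text: ", 1)[1]
    label = "positive" if "love" in text.lower() else "neutral"
    return json.dumps({"label": label, "product": None})


if __name__ == "__main__":
    for row in label_texts(["I love this phone", "It arrived on Tuesday"], fake_llm):
        print(row)
```

Validating each reply before it enters the dataset matters because model output is not guaranteed to follow the requested schema. Swapping `fake_llm` for a real client only requires a function that takes a prompt string and returns the reply text.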