Global visual content provider Getty Images has released an open sample dataset on the Hugging Face platform as part of an effort to become a trusted data partner for AI training. The dataset consists of selected images from its creative library and is intended to address common data-quality and legal-compliance problems in AI/ML model training.
The dataset contains 3,750 high-quality images across 15 categories, including abstract and background, architecture, business, concepts, education, healthcare, icons, industry, nature, illustrations, and travel. The images come from Getty Images' wholly owned creative library, which ensures they are commercially safe and legally compliant and helps developers avoid potential legal disputes when they use them later.
The dataset is optimized specifically for machine learning training. It provides high-resolution images with rich structured metadata and excludes adult content, low-resolution images, and images with missing metadata. The goal is to reduce the data-cleaning and enrichment work developers must do and to improve the efficiency and quality of AI model training.
Use of the dataset is nonetheless subject to several conditions. Users may not redistribute the dataset, develop products or services that replicate or generate content from it, build products or services that compete directly with Getty Images, or use the dataset in any way that violates laws or regulations.
Getty Images says the release is meant to demonstrate its ability to supply comprehensive, high-quality, legally compliant content for AI model training. The company also hopes to build closer ties with the developer community and strengthen its reputation and influence in the AI training data market. Going forward, it plans to offer larger-scale licensed data repositories based on developer demand and to keep exploring new models for sharing revenue with creators.
The release is another significant move by Getty Images in the AI field, aimed at supporting the healthy development of AI technology by supplying high-quality training data.