LAION, the German nonprofit research organization behind the dataset used to train Stable Diffusion, has released Re-LAION-5B, a cleaned version of its flagship dataset. The new release scrubs links to suspected child sexual abuse material (CSAM) and other harmful content, making the dataset safer for research and development.
The release follows a December 2023 investigation by the Stanford Internet Observatory, which found that the original LAION-5B dataset contained links to illegal images scraped from social media and adult websites. The investigation also flagged other problematic content, including pornographic imagery and racist slurs, raising concerns about the ethics of training AI models on such data.
Re-LAION-5B reflects LAION's stated commitment to ethical AI: by removing links to illegal content and offering a safer dataset, the organization aims to support responsible AI development and research.
The availability of a cleaned dataset should benefit future AI image generation models, including successors to Stable Diffusion. With training data free of known harmful content, researchers can focus on exploring AI's potential for creative expression and other beneficial applications.
LAION's efforts in providing large-scale datasets for AI research have played a critical role in the advancement of image generation and other AI technologies. The organization's commitment to ethical AI development and data cleaning practices sets a positive example for the broader AI community.