The first wave of major generative AI tools was largely trained on publicly available data scraped from the internet. However, as the hunt for additional data intensifies, data owners are increasingly restricting access and pushing for licensing agreements.
The Dataset Providers Alliance (DPA), a trade group formed this summer, aims to standardize the AI industry and make it fairer. It has released a position paper outlining its stances on major AI-related issues, advocating for an opt-in system in which data can be used only after explicit consent from creators and rights holders.
The DPA sees the opt-in route as more ethical and pragmatic, as it ensures artists and creators are on board and reduces the risk of legal issues. Experts also support the opt-in approach as fundamentally fair to creators.
While the DPA's efforts to source data ethically are admirable, some experts raise concerns about the opt-in standard's potential to limit the availability of data required for modern AI models.
The DPA endorses the use of synthetic data generated by AI for training purposes, arguing that it will constitute the majority of training data in the near future. However, it advocates for proper licensing of the pre-training information used to create synthetic data, transparency in its creation process, and regular evaluation to mitigate biases and ethical issues.
The DPA's push to establish standards for ethical data licensing is commendable, but the real challenge lies in getting major AI companies to adopt these practices.