Summary of A New Group Is Trying to Make AI Data Licensing Ethical

  • wired.com

    Rise of Generative AI and the Need for Ethical Data Sourcing

    The first wave of major generative AI tools was largely trained on publicly available data scraped from the internet. However, as the hunt for additional training data intensifies, data sources are increasingly restricting access and pushing for licensing agreements.

    • Generative AI companies are now facing challenges in obtaining training data.
    • Data sources are restricting access and demanding licensing agreements.
    • New licensing startups have emerged to facilitate the sourcing of data for generative AI models.

    The Dataset Providers Alliance and Its Position on AI Data Licensing

    The Dataset Providers Alliance (DPA), a trade group formed this summer, aims to make the AI industry more standardized and fair. It has released a position paper outlining its stances on major AI-related issues, advocating an opt-in system in which data can be used only after creators and rights holders give explicit consent.

    • The DPA comprises seven AI licensing companies, including Rightsify, Pixta, and Calliope Networks.
    • It advocates for an opt-in system, where data can only be used with explicit consent from creators and rights holders.
    • This approach contrasts with the practices of major AI companies, which often rely on opt-out systems or offer no opt-out at all.

    The Advantages of an Opt-In System for Generative AI Training Data

    The DPA sees the opt-in route as more ethical and pragmatic, as it ensures artists and creators are on board and reduces the risk of legal issues. Experts also support the opt-in approach as fundamentally fair to creators.

    • Opt-in ensures creators are aware of, and consent to, the use of their data.
    • It is seen as a more ethical approach than opt-out systems.
    • Opt-in reduces the risk of legal issues and lawsuits for AI companies.

    Challenges and Concerns Regarding Opt-In Data Licensing

    While the DPA's efforts to source data ethically are admirable, some experts raise concerns about the opt-in standard's potential to limit the availability of data required for modern AI models.

    • An opt-in regime could lead to data scarcity or high costs for AI companies.
    • Only large tech companies may be able to afford licensing all the required data.
    • Standardizing compensation structures could help smooth the road for mainstream adoption.

    The Role of Synthetic Data in Generative AI Training

    The DPA endorses the use of synthetic data generated by AI for training purposes, arguing that it will constitute the majority of training data in the near future. However, it advocates for proper licensing of the pre-training information used to create synthetic data, transparency in its creation process, and regular evaluation to mitigate biases and ethical issues.

    • Synthetic data is expected to be a significant source of training data for generative AI.
    • The DPA supports proper licensing and transparency in synthetic data creation.
    • Regular evaluation is recommended to address biases and ethical concerns.

    The Need for Industry Adoption of Ethical Data Licensing Standards

    While the DPA's efforts to establish standards for ethical data licensing are commendable, the real challenge lies in getting major AI companies to adopt these practices.

    • Not enough AI companies are currently adopting ethical data licensing standards.
    • The AI industry needs to move away from the "Wild West" approach to data sourcing.
    • The DPA's existence demonstrates a shift towards more responsible practices in the rapidly evolving generative AI landscape.
