The author, whose work consists largely of research and analysis, is intrigued by OpenAI's Deep Research and believes it could significantly reduce the manual labor involved in compiling and analyzing data. The author's usual workflow involves extensive manual searching for data, collating it, charting it, and refining the presentation until it clearly explains the issue.
Rather than starting with a new problem, the author tests Deep Research on familiar territory, the smartphone market, by examining the sample report that OpenAI itself published. Because the author already knows this market well, errors are easier to spot, and the approach offers an efficient way to assess the tool before investing significant time and credits.
The data in Deep Research's report is presented attractively and appears to save hours of work. However, checking the figures against their cited sources reveals a major problem with the report's reliability.
The article highlights the inconsistent sources behind the Deep Research report. The report relies on Statista and Statcounter, which raises concerns about its accuracy: Statcounter, for instance, measures share of web traffic rather than the number of devices in use. The model also fails to correctly interpret or use data from more reliable sources such as Kantar Worldpanel, which is a significant limitation.
The author emphasizes that LLMs are not databases: they are not designed for precise, deterministic data retrieval. Their inherently probabilistic nature conflicts with market analysis, which needs accurate, deterministic answers. The article argues that the core problem is not simply an error rate but a fundamental mismatch between what the model does and what the task requires.
The ambiguity of the question itself, specifically the word "adoption", is identified as a key factor. "Adoption" can mean the installed base, usage share, or spending on apps, and these are distinct metrics. The model's failure to distinguish between them, subtle as the differences are, leads to inaccuracies.
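To make the point about "adoption" concrete, here is a minimal Python sketch using invented numbers for a hypothetical two-platform market. The figures, the `HYPOTHETICAL_MARKET` dictionary, and the `platform_share` helper are illustrative assumptions, not data from the article or from any source it cites. The same market yields three different iOS "adoption" shares depending on which metric is chosen:

```python
# Invented figures for a hypothetical two-platform market; NOT real data.
HYPOTHETICAL_MARKET = {
    "iOS": {"devices_in_use_m": 30, "monthly_page_views_m": 900, "app_spend_usd_m": 450},
    "Android": {"devices_in_use_m": 70, "monthly_page_views_m": 1100, "app_spend_usd_m": 150},
}

# Three different things "adoption" could mean, keyed by the field that measures each.
METRICS = {
    "installed base": "devices_in_use_m",
    "usage share (web traffic)": "monthly_page_views_m",
    "app spending": "app_spend_usd_m",
}


def platform_share(market: dict, metric: str, platform: str) -> float:
    """Return one platform's fraction of the market total for a single metric."""
    total = sum(figures[metric] for figures in market.values())
    return market[platform][metric] / total


if __name__ == "__main__":
    for label, metric in METRICS.items():
        share = platform_share(HYPOTHETICAL_MARKET, metric, "iOS")
        # Prints 30%, 45% and 75% respectively for these invented figures.
        print(f"{label}: iOS {share:.0%}")
```

With these toy numbers, a question about "adoption" that does not specify the metric has at least three defensible answers, and a report that mixes sources measuring different things can produce figures that match none of them.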
The author then considers what these findings mean for the potential of AI and for the development of AI-driven products. The article asks whether the error rate will fall significantly in future models and what that would imply for product design: should products be built to accommodate AI inaccuracies, or will models eventually provide entirely accurate information?
The article concludes with a nuanced view of the utility of AI models in market research. While acknowledging the time that tools like OpenAI's Deep Research can save, the author emphasizes the need for critical evaluation and human oversight. The inability of LLMs to guarantee perfectly accurate data retrieval remains a significant challenge that must be addressed before AI can fully automate complex research tasks.
By using OpenAI's own example report, the author demonstrates the limitations of Deep Research: the analysis reveals inconsistencies, questionable sources, and an overall inability of the LLM to provide fully reliable data. The article questions whether current large language models can perform accurate, detailed market analysis.