Summary of turbopuffer: fast search on object storage

  • turbopuffer.com

    Turbopuffer: A New Search Engine on Object Storage

    Turbopuffer is a search engine built directly on object storage, such as Amazon S3 or Google Cloud Storage, rather than on the expensive, memory-heavy infrastructure that traditional search engines require. This architecture addresses the high costs and operational complexity of traditional search engines, offering a cost-effective, scalable solution for large-scale vector and semantic search.

    • Traditional search engines rely on in-memory storage, leading to exorbitant costs, especially for large datasets.
    • Turbopuffer's innovative approach leverages object storage, significantly reducing the cost per gigabyte of data stored.
    • The combination of object storage and smart caching allows turbopuffer to scale effortlessly to billions of vectors and millions of tenants/namespaces.

    The Inspiration for turbopuffer

    The idea for turbopuffer arose from the challenges faced by Readwise, a read-it-later app that wanted to implement article recommendations and semantic search using vector embeddings.

    • Readwise's existing relational database solution was expensive and inadequate for large-scale vector search.
    • The high cost of vector search on relational databases made implementing these features unfeasible.
    • This experience led to the realization that a new approach to search was necessary, one that was cost-effective and efficient.

    Cost Efficiency of Object Storage

    Turbopuffer utilizes object storage to significantly reduce the cost of storing data, compared to the traditional approaches used by existing search engines.

    • Traditional in-memory storage solutions can cost upwards of $2 per GB, while object storage offers costs as low as $0.02 per GB.
    • Turbopuffer's architecture also includes SSD caching for frequently accessed data, striking a balance between cost and performance.
    • This approach results in a 100x cost reduction for cold storage and a 6-20x cost reduction for warm storage, making it a highly cost-effective solution for large-scale search.
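    The cost gap above is simple to check with back-of-envelope arithmetic. The sketch below uses only the per-GB figures quoted in the article (~$2/GB for in-memory storage, ~$0.02/GB for object storage); the 1 TB dataset size is an illustrative assumption, not a vendor quote.

    ```python
    # Back-of-envelope storage cost comparison, using the article's per-GB
    # figures. Prices are illustrative, not actual vendor pricing.

    def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
        """Monthly cost of storing size_gb gigabytes at price_per_gb dollars/GB."""
        return size_gb * price_per_gb

    dataset_gb = 1_000  # a hypothetical 1 TB vector index

    memory_cost = monthly_storage_cost(dataset_gb, 2.00)  # in-memory, ~$2/GB
    object_cost = monthly_storage_cost(dataset_gb, 0.02)  # object storage, ~$0.02/GB

    print(f"in-memory:      ${memory_cost:,.2f}/mo")
    print(f"object storage: ${object_cost:,.2f}/mo")
    print(f"savings ratio:  {memory_cost / object_cost:.0f}x")  # the article's ~100x cold-storage figure
    ```

    The 6-20x warm-storage figure is smaller because warm data also pays for the SSD cache tier in front of object storage.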

    The Five Common Databases and their Limitations

    The article highlights the five common specialized databases in modern infrastructure stacks, each designed for specific use cases:

    Category | Tech | Read Latency | Write Latency | Storage | Use-Cases
    --- | --- | --- | --- | --- | ---
    Caching | Redis, Memcached | <500µs | <500µs | Memory | Cost/performance
    Relational | MySQL, Postgres | <1ms | <1ms | Memory + replicated SSDs | Source of truth, transactions, CRUD
    Search | ElasticSearch, vector DBs | <100ms | <1s | Memory + replicated SSDs | Recommendations, search, feeds, RAG
    Warehouse | BigQuery, Snowflake | >1s | >1s | Object storage | Reports, data analysis
    Streaming | Kafka, Warpstream | <100ms | <100ms | Replicated HDDs or object storage | Logging, moving data between systems, real-time analytics

    The article argues that the storage architecture of current-generation search engines is suboptimal for both cost and performance: these engines typically rely on memory and replicated SSDs, which is overkill for most search workloads.

    Object Storage-Native Database

    Turbopuffer's approach is to build a database specifically for object storage, taking full advantage of its cost efficiency and scalability. This "object storage-native" approach offers several benefits:

    • Cost-effectiveness: By using object storage as the primary storage medium, turbopuffer drastically reduces storage costs compared to traditional search engines.
    • Unparalleled reliability: Object storage offers high durability and reliability, ensuring data integrity and availability.
    • Virtually unlimited scalability: Object storage can scale to accommodate massive amounts of data, making it suitable for growing businesses and complex search scenarios.
    • Optimized for search: turbopuffer's architecture balances cost and performance with SSD caching for frequently accessed data, ensuring fast response times for warm queries.
    • High Availability: turbopuffer utilizes multi-tenancy and sharding, ensuring high availability and resilience against node failures.
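    The interplay of the last two bullets can be sketched as a cache-aside read path: warm queries are served from the local SSD cache, and cold queries fall back to durable object storage and populate the cache on the way back. This is an illustrative sketch of the pattern the article describes, not turbopuffer's actual code; all class and key names are hypothetical.

    ```python
    # Illustrative cache-aside read path over object storage (hypothetical names,
    # not turbopuffer's real implementation). A dict stands in for each tier.

    class ObjectStore:
        """Stand-in for S3/GCS: durable and cheap, but high-latency."""
        def __init__(self) -> None:
            self._blobs: dict[str, bytes] = {}

        def put(self, key: str, data: bytes) -> None:
            self._blobs[key] = data

        def get(self, key: str) -> bytes:
            return self._blobs[key]

    class CachedReader:
        """Cache-aside reader: check the local (SSD) cache, then object storage."""
        def __init__(self, store: ObjectStore) -> None:
            self.store = store
            self.cache: dict[str, bytes] = {}  # stands in for the SSD cache
            self.hits = 0
            self.misses = 0

        def read(self, key: str) -> bytes:
            if key in self.cache:              # warm query: served locally, fast
                self.hits += 1
                return self.cache[key]
            self.misses += 1                   # cold query: fetch, then fill cache
            data = self.store.get(key)
            self.cache[key] = data
            return data

    store = ObjectStore()
    store.put("ns/readwise/segment-0", b"vectors...")  # one namespace's index segment

    reader = CachedReader(store)
    reader.read("ns/readwise/segment-0")  # cold read: miss, fills the cache
    reader.read("ns/readwise/segment-0")  # warm read: cache hit
    print(f"hits={reader.hits} misses={reader.misses}")
    ```

    Because the durable copy lives in object storage, a cache node that fails loses nothing: a replacement node simply starts cold and refills its cache on demand, which is how this design stays highly available without replicated SSDs.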

    Customer Success Stories

    The article highlights several successful implementations of turbopuffer by leading companies, demonstrating its effectiveness in real-world scenarios:

    • Cursor: The AI code editor uses turbopuffer to index billions of vectors across millions of codebases, and saw a 10x cost reduction and improved latency after migrating to turbopuffer.
    • Suno: Suno's radio feature uses turbopuffer for its search capabilities, benefiting from its scalability and cost efficiency.
    • Dot's memory (New Computer): turbopuffer powers the memory function for Dot, taking advantage of its advanced vector search capabilities.
    • Shapes: Shapes uses turbopuffer for its search and information retrieval needs, appreciating its cost-effective and scalable solution.

    turbopuffer: The Future of Search

    turbopuffer represents a paradigm shift in search engine architecture, offering a cost-effective, scalable, and reliable solution for the modern era. By leveraging object storage and smart caching, it allows businesses to unlock the full potential of search, enabling them to implement complex and data-intensive search functionalities without the constraints of high costs and resource limitations. As the adoption of vector databases and semantic search continues to grow, turbopuffer is poised to play a pivotal role in enabling businesses to search everything they have.
