GitHub - deepseek-ai/smallpond: A lightweight data processing framework built on DuckDB and 3FS.

deepseek-ai

2025.03.08

GitHub · by Anonymous

#Data Processing #DuckDB #3FS #Framework #Python

Key Points

1. Smallpond is a lightweight, high-performance data processing framework built on DuckDB and 3FS, designed for PB-scale datasets.
2. It is easy to operate because it requires no long-running services, and its simple API covers common processing steps such as repartitioning and SQL queries.
3. On the GraySort benchmark, the framework sorted 110.5 TiB of data in just over 30 minutes, an average throughput of 3.66 TiB/min.
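
The throughput figure in the third point can be sanity-checked with simple arithmetic. The sketch below uses only the two numbers quoted above; the variable names are illustrative, and the result is the time implied by those rounded figures, not an independently reported measurement.

```python
# Figures quoted in the key points above.
total_tib = 110.5               # data sorted, in TiB
throughput_tib_per_min = 3.66   # reported average throughput, in TiB/min

# Implied wall-clock time: total volume divided by average throughput.
elapsed_min = total_tib / throughput_tib_per_min
whole_min = int(elapsed_min)
seconds = round((elapsed_min - whole_min) * 60)

print(f"Implied sort time: about {whole_min} min {seconds} s")
```

The implied duration lands slightly above 30 minutes, which is consistent with the "just over 30 minutes" wording.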

repartition(n_partitions, hash_by="column_name")
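
The call above suggests hash partitioning: rows that share a value in the named column end up in the same partition. The sketch below illustrates that general idea in plain Python. It is not smallpond's implementation, and `hash_partition`, the sample rows, and the `ticker` column are hypothetical names chosen for illustration.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, n_partitions, hash_by):
    """Group rows into at most n_partitions buckets by hashing one column.

    Illustrative only: a real engine would do this over files or
    columnar batches rather than Python dicts.
    """
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        key = str(row[hash_by]).encode("utf-8")
        partitions[zlib.crc32(key) % n_partitions].append(row)
    return dict(partitions)


# Hypothetical sample data.
rows = [
    {"ticker": "AAA", "price": 10.0},
    {"ticker": "BBB", "price": 20.0},
    {"ticker": "AAA", "price": 11.5},
    {"ticker": "CCC", "price": 7.25},
]

parts = hash_partition(rows, n_partitions=3, hash_by="ticker")
for index, bucket in sorted(parts.items()):
    print(index, [r["ticker"] for r in bucket])
```

Because every row with the same key hashes to the same bucket, a later per-key aggregation (for example a SQL `GROUP BY` on that column) can run on each partition independently.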
