DuckDB is an in-process SQL OLAP database management system designed for efficient analytical queries. It's optimized for fast read-heavy workloads and is ideal for data science applications, especially when working with Pandas and large datasets.
duckdb website: https://duckdb.org/ duckdb documentation: https://duckdb.org/docs/ duckdb windows package GitHub: installer file
No, you do not need both the installer package and the pip module. The installer package is for using DuckDB as a standalone command-line tool, while the pip module allows you to use DuckDB directly within Python scripts. Choose one based on your use case.
pip install duckdb
import duckdb
import pandas as pd
# Example 1: Run a SQL query on a Pandas DataFrame
df = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
result = duckdb.query("SELECT * FROM df WHERE value > 15").df()
print(result)
# Example 2: Simple SELECT query
print(duckdb.query("SELECT 42 AS answer").fetchall())
# Example 3: Create a DuckDB table
duckdb.query("CREATE TABLE items (name VARCHAR, price INTEGER)")
# Example 4: Insert values
duckdb.query("INSERT INTO items VALUES ('Pen', 10), ('Book', 30)")
# Example 5: Select from table
print(duckdb.query("SELECT * FROM items").fetchdf())
# Example 6: Aggregate functions
print(duckdb.query("SELECT AVG(price) FROM items").fetchone())
# Example 7: Join operations
users = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
scores = pd.DataFrame({"id": [1, 2], "score": [90, 85]})
query = "SELECT u.name, s.score FROM users u JOIN scores s ON u.id = s.id"
print(duckdb.query(query).df())
# Example 8: Working with Parquet
# duckdb.query("CREATE TABLE parquet_table AS SELECT * FROM 'data.parquet'")
# Example 9: Export to Pandas
data = duckdb.query("SELECT * FROM items").df()
print(data)
# Example 10: Using SQL functions
print(duckdb.query("SELECT LENGTH('DuckDB')").fetchone())