Quick Start
Get up and running with SQLStream in 5 minutes!
Step 1: Install SQLStream
Step 2: Create Sample Data
Create a sample CSV file:
cat > employees.csv << EOF
id,name,department,salary,hire_date
1,Alice,Engineering,95000,2020-01-15
2,Bob,Sales,75000,2019-06-01
3,Charlie,Engineering,105000,2018-03-20
4,Diana,Marketing,68000,2021-02-14
5,Eve,Sales,82000,2020-09-10
EOF
Step 3: Your First Query
CLI Usage
# Select all rows
$ sqlstream query employees.csv "SELECT * FROM employees"
# Filter by department
$ sqlstream query employees.csv "SELECT name, salary FROM employees WHERE department = 'Engineering'"
# Sort by salary
$ sqlstream query employees.csv "SELECT * FROM employees ORDER BY salary DESC LIMIT 3"
# Query with inline file path (no source argument needed)
$ sqlstream query "SELECT * FROM 'employees.csv' WHERE salary > 80000"
Python API
from sqlstream import query
# Simple query with explicit source
results = query("employees.csv").sql("SELECT * FROM employees WHERE salary > 80000")
# Query with inline source (extracted from SQL)
results = query().sql("SELECT * FROM 'employees.csv' WHERE salary > 80000")
# Print results
for row in results:
print(f"{row['name']}: ${row['salary']:,}")
Output:
Step 4: Advanced Features
Aggregations
# Count employees by department
$ sqlstream query employees.csv "SELECT department, COUNT(*) AS count FROM employees GROUP BY department"
Output:
┌─────────────┬───────┐
│ department │ count │
├─────────────┼───────┤
│ Engineering │ 2 │
│ Sales │ 2 │
│ Marketing │ 1 │
└─────────────┴───────┘
Joins
Create another file for orders:
Join the two files:
$ sqlstream query "SELECT e.name, o.amount FROM 'employees.csv' e JOIN 'orders.csv' o ON e.id = o.employee_id"
Output:
┌─────────┬────────┐
│ name │ amount │
├─────────┼────────┤
│ Alice │ 1500 │
│ Bob │ 2300 │
│ Alice │ 1800 │
│ Charlie │ 2100 │
└─────────┴────────┘
Output Formats
Step 5: Performance Boost
For large files, use the pandas backend:
Performance comparison:
| Rows | Python Backend | Pandas Backend | Speedup |
|---|---|---|---|
| 10K | 0.5s | 0.05s | 10x |
| 100K | 5.2s | 0.15s | 35x |
| 1M | 52s | 0.8s | 65x |
For complex SQL queries, use the DuckDB backend:
$ sqlstream query "
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM 'employees.csv'
)
SELECT * FROM ranked WHERE rank = 1
" --backend duckdb
Step 6: Query Different Data Formats
SQLStream supports multiple file formats beyond CSV:
JSON Files
# Query a simple JSON array
$ sqlstream query "users.json" "SELECT name, email FROM users"
# Query nested JSON with path syntax
$ sqlstream query "api.json#json:data.users" "SELECT name FROM users WHERE age > 25"
HTML Tables
Extract and query tables from HTML documents:
# Query first table in HTML file
$ sqlstream query "report.html" "SELECT * FROM report WHERE revenue > 100000"
# Query specific table by index
$ sqlstream query "report.html#html:1" "SELECT * FROM report"
# Query tables from web pages
$ sqlstream query "https://example.com/data.html" "SELECT * FROM data"
Markdown Tables
Parse tables from Markdown documents:
# Query a table from README
$ sqlstream query "README.md" "SELECT * FROM readme"
# Query specific table by index
$ sqlstream query "docs/api.md#markdown:2" "SELECT * FROM api"
Parquet Files
High-performance columnar format:
Step 7: Interactive Shell
For a full interactive experience, use the shell command:
This launches a powerful TUI (Terminal User Interface) with:
- Query Editor: Multi-line editing with syntax highlighting and multiple tabs (
Ctrl+Tto add,Ctrl+Wto close). - Word Deletion: Use
Ctrl+DeleteandCtrl+Backspacefor fast editing. - Results Viewer: Scrollable table with pagination.
- Sidebar: Toggle between Schema and Files browser (
F2to toggle,Ctrl+Oto open files). - Backend Toggle: Press
F5orCtrl+Bto cycle through backends (auto/duckdb/pandas/python). - Save Progress: Save your work with
Ctrl+S. - Export: Press
Ctrl+Xto export results to CSV/JSON/Parquet. - History: Use
Ctrl+Up/Downto navigate previous queries. - State Persistence: Your open tabs and queries are saved automatically on exit.
-
Incognito Mode: Start fresh without loading previous session or saving on exit:
Common Workflows
Data Exploration
# Check file structure
$ head -5 data.csv
# Count rows
$ sqlstream query data.csv "SELECT COUNT(*) FROM data"
# Show unique values
$ sqlstream query data.csv "SELECT DISTINCT category FROM data"
# Summary statistics
$ sqlstream query data.csv "SELECT MIN(price), MAX(price), AVG(price) FROM data"
Data Cleaning
from sqlstream import query
# Remove duplicates and filter nulls
results = query("messy_data.csv").sql("""
SELECT DISTINCT *
FROM messy_data
WHERE name IS NOT NULL
AND age > 0
ORDER BY id
""")
# Export cleaned data
import csv
with open("clean_data.csv", "w") as f:
writer = csv.DictWriter(f, fieldnames=results.to_list()[0].keys())
writer.writeheader()
writer.writerows(results.to_list())
ETL Pipeline
from sqlstream import query
# Extract
customers = query("customers.csv")
orders = query("orders.csv")
# Transform: Calculate total orders per customer
result = query("customers.csv").sql("""
SELECT c.name, COUNT(o.order_id) as total_orders
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_orders DESC
""")
# Load
for row in result:
# Send to database, API, etc.
print(row)
What's Next?
Now that you're familiar with the basics, explore:
- Core Concepts - Understand how SQLStream works
- SQL Support - Learn all supported SQL features
- CLI Reference - Master the command-line interface
- Python API - Deep dive into the programmatic API
- Examples - More real-world examples