Volcano Model
The default execution model in SQLStream is based on the Volcano Iterator Model (also known as the Open-Next-Close model).
How it Works
Each operator in the query plan (Scan, Filter, Project, Join) implements a standard interface with three methods:
- open(): Initialize the operator.
- next(): Retrieve the next tuple (row).
- close(): Clean up resources.
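As a rough illustration of this three-method interface, the sketch below uses an abstract base class. The names `Operator`, `Row`, and `ListScan` are hypothetical and chosen for this example; they are not taken from SQLStream's code, and the in-memory list stands in for a real file reader.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional

Row = Dict[str, Any]  # hypothetical row representation for this sketch


class Operator(ABC):
    """Open-Next-Close contract that every operator follows."""

    @abstractmethod
    def open(self) -> None:
        """Initialize the operator."""

    @abstractmethod
    def next(self) -> Optional[Row]:
        """Return the next row, or None when there are no more rows."""

    @abstractmethod
    def close(self) -> None:
        """Clean up resources."""


class ListScan(Operator):
    """Leaf operator that reads rows from an in-memory list."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self.rows)

    def next(self) -> Optional[Row]:
        # next(iterator, default) returns the default once the iterator is exhausted.
        return next(self._it, None)

    def close(self) -> None:
        self._it = None
```

Using `None` as the end-of-stream marker matches the convention in the filter example in this section; it assumes a row itself is never `None`.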
Execution Flow
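When a query is executed, the caller repeatedly invokes next() on the top-level operator, which calls next() on its child, which calls next() on its child, and so on, down to the Scan operator, which reads from the file. Each row then travels back up the chain of calls, one row per next() call.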
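# Simplified representation
class FilterOperator:
    def __init__(self, child, predicate):
        self.child = child          # upstream operator that supplies rows
        self.predicate = predicate  # callable: row -> bool

    def next(self):
        # Pull rows from the child until one satisfies the predicate.
        while True:
            row = self.child.next()
            if row is None:
                return None  # child exhausted: signal end of stream
            if self.predicate(row):
                return row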
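To show the whole call chain end to end, here is a minimal, self-contained sketch of a Scan, Filter, Project pipeline driven by a small loop. The class names, the `execute` helper, and the sample data are assumptions made for this example, and the scan reads from a Python list rather than a file.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory list (stand-in for a file)."""

    def __init__(self, rows):
        self.rows = rows
        self._it = None

    def open(self):
        self._it = iter(self.rows)

    def next(self):
        return next(self._it, None)

    def close(self):
        self._it = None


class Filter:
    """Passes through only the rows for which predicate(row) is true."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None:
                return None
            if self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {col: row[col] for col in self.columns}

    def close(self):
        self.child.close()


def execute(root):
    """Drive the plan: open it, pull rows until None, then always close."""
    root.open()
    try:
        while (row := root.next()) is not None:
            yield row
    finally:
        root.close()


data = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 25},
    {"name": "Cy", "age": 41},
]
plan = Project(Filter(Scan(data), lambda r: r["age"] >= 30), ["name"])
result = list(execute(plan))
print(result)  # [{'name': 'Ada'}, {'name': 'Cy'}]
```

Only one row is in flight at any moment: each call to `next()` on `Project` pulls a single row through `Filter` and `Scan` before the next one is requested.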
Benefits
- Low Memory Footprint: Data is processed one row at a time. The entire dataset does not need to be loaded into memory.
- Pipelining: No intermediate results need to be materialized (except for blocking operators like Sort or Aggregate).
- Simplicity: Easy to implement and extend.
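The exception for blocking operators can be made concrete with a hypothetical Sort operator. It cannot return its first row until it has seen its last input row, so it has to buffer the child's entire output. The sketch below does the buffering in `open()`; doing it lazily on the first `next()` call would work equally well. The class names are assumptions for this example, not SQLStream's implementation.

```python
class ListScan:
    """Leaf operator over an in-memory list, used only to feed the sort."""

    def __init__(self, rows):
        self.rows = rows
        self._it = None

    def open(self):
        self._it = iter(self.rows)

    def next(self):
        return next(self._it, None)

    def close(self):
        self._it = None


class Sort:
    """Blocking operator: materializes all child rows before emitting any."""

    def __init__(self, child, key):
        self.child = child
        self.key = key
        self._sorted = None

    def open(self):
        self.child.open()
        buffered = []
        # Drain the child completely; this is the step that breaks pipelining.
        while True:
            row = self.child.next()
            if row is None:
                break
            buffered.append(row)
        buffered.sort(key=self.key)
        self._sorted = iter(buffered)

    def next(self):
        return next(self._sorted, None)

    def close(self):
        self._sorted = None
        self.child.close()


op = Sort(ListScan([{"n": 3}, {"n": 1}, {"n": 2}]), key=lambda r: r["n"])
op.open()
ordered = []
while (row := op.next()) is not None:
    ordered.append(row["n"])
op.close()
print(ordered)  # [1, 2, 3]
```

Memory use for such an operator grows with the size of its input, which is why the low-memory benefit applies only to the non-blocking parts of a plan.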
Trade-offs
- CPU Overhead: Each row passes through a next() call at every operator in the plan, and Python function calls are relatively expensive, so the per-row overhead can be significant.
- Performance: Slower than vectorized execution for large datasets. This is why SQLStream also offers a Pandas backend.
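The per-row cost can be illustrated by counting calls rather than timing them. The sketch below compares a row-at-a-time scan with a hypothetical batch-at-a-time variant that returns slices of rows. Both classes and the `next_batch` method are assumptions for this example; this is not how SQLStream's Pandas backend is implemented, only an illustration of why processing many rows per call reduces overhead.

```python
class CountingRowScan:
    """Row-at-a-time scan that counts how often next() is called."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0
        self._it = None

    def open(self):
        self._it = iter(self.rows)

    def next(self):
        self.calls += 1
        return next(self._it, None)


class CountingBatchScan:
    """Batch-at-a-time scan that counts how often next_batch() is called."""

    def __init__(self, rows, batch_size):
        self.rows = rows
        self.batch_size = batch_size
        self.calls = 0
        self._pos = 0

    def open(self):
        self._pos = 0

    def next_batch(self):
        self.calls += 1
        batch = self.rows[self._pos:self._pos + self.batch_size]
        self._pos += len(batch)
        return batch or None  # an empty slice means the input is exhausted


rows = list(range(1000))

row_scan = CountingRowScan(rows)
row_scan.open()
while row_scan.next() is not None:
    pass

batch_scan = CountingBatchScan(rows, batch_size=256)
batch_scan.open()
while batch_scan.next_batch() is not None:
    pass

print(row_scan.calls)    # 1001: one call per row plus the final None
print(batch_scan.calls)  # 5: four batches plus the final None
```

In a plan with several operators, the row-at-a-time count is multiplied by roughly the depth of the plan, while a batched or vectorized design pays that cost once per batch.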