Operators Reference
Query execution operators.
Operator
Base class for all query operators
Operators form a tree where:

- Leaf operators (e.g., `Scan`) read from data sources
- Internal operators (e.g., `Filter`, `Project`) transform data
- The root operator is pulled by the executor to produce results
The pull-based execution model means:

- Operators are lazy (generators)
- Data flows through the tree on demand
- Memory usage is O(pipeline depth), not O(data size)
Source code in sqlstream/operators/base.py
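The pull-based model can be sketched with plain Python generators. This is a simplified illustration, not the actual sqlstream code; `ListScan` and `Passthrough` are hypothetical stand-ins for real operators:

```python
from typing import Any, Iterator, Optional


class Operator:
    """Base class: every operator is an iterable of row dicts."""

    def __init__(self, child: Optional["Operator"] = None):
        self.child = child

    def __iter__(self) -> Iterator[dict[str, Any]]:
        raise NotImplementedError


class ListScan(Operator):
    """Leaf operator reading from an in-memory list."""

    def __init__(self, rows: list[dict[str, Any]]):
        super().__init__(child=None)
        self.rows = rows

    def __iter__(self):
        yield from self.rows  # lazy: one row at a time


class Passthrough(Operator):
    """Internal operator: pulls from its child only when iterated."""

    def __iter__(self):
        for row in self.child:
            yield row


root = Passthrough(ListScan([{"x": 1}, {"x": 2}]))
result = list(root)  # the executor pulls the root
# -> [{"x": 1}, {"x": 2}]
```

Because every stage is a generator, no row exists in memory before the executor asks for it: iterating the root drives the whole tree one row at a time.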
__init__
Initialize operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `child` | `Optional[Operator]` | Child operator to pull data from (`None` for leaf operators) | `None` |
__iter__
Execute operator and yield results
This is the core method that defines operator behavior. Subclasses must implement this to define how they process data.
Yields:
| Type | Description |
|---|---|
| `dict[str, Any]` | Rows as dictionaries |
Source code in sqlstream/operators/base.py
Filter
Bases: Operator
Filter operator - evaluates WHERE conditions
Pulls rows from child and only yields those that satisfy all conditions (AND logic).
Source code in sqlstream/operators/filter.py
__init__
__iter__
Yield only rows that match all conditions
For each row from the child:

1. Evaluate all conditions
2. If all are `True`, yield the row
3. Otherwise, skip it
Source code in sqlstream/operators/filter.py
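The filtering loop can be sketched in isolation like this, with conditions modeled as plain callables (the real operator evaluates parsed WHERE expressions, so `filter_rows` is only an illustrative helper):

```python
from typing import Any, Callable, Iterator

Row = dict[str, Any]


def filter_rows(
    child: Iterator[Row], conditions: list[Callable[[Row], bool]]
) -> Iterator[Row]:
    """Yield only rows for which every condition holds (AND logic)."""
    for row in child:
        if all(cond(row) for cond in conditions):
            yield row


rows = [
    {"age": 25, "city": "NY"},
    {"age": 17, "city": "NY"},
    {"age": 30, "city": "LA"},
]
adults_in_ny = list(
    filter_rows(iter(rows), [lambda r: r["age"] >= 18, lambda r: r["city"] == "NY"])
)
# -> [{"age": 25, "city": "NY"}]
```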
GroupByOperator
Bases: Operator
GROUP BY operator with aggregation
Uses hash-based aggregation:

1. Scan all input rows
2. Group by the key columns
3. Maintain aggregators for each group
4. Yield one row per group
Note: This operator materializes all data in memory (not lazy). For large datasets, consider external sorting/grouping.
Source code in sqlstream/operators/groupby.py
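The four-step hash aggregation can be illustrated with a COUNT-only sketch (`group_by_count` is a hypothetical helper, not the actual operator, which supports arbitrary aggregate functions):

```python
from typing import Any, Iterator

Row = dict[str, Any]


def group_by_count(rows: Iterator[Row], key_columns: list[str]) -> Iterator[Row]:
    groups: dict[tuple, int] = {}
    for row in rows:                               # 1. scan all input (not lazy)
        key = tuple(row[c] for c in key_columns)   # 2. build the group key
        groups[key] = groups.get(key, 0) + 1       # 3. update the aggregator
    for key, count in groups.items():              # 4. one output row per group
        yield {**dict(zip(key_columns, key)), "count": count}


data = [{"city": "NY"}, {"city": "LA"}, {"city": "NY"}]
counts = list(group_by_count(iter(data), ["city"]))
# -> [{"city": "NY", "count": 2}, {"city": "LA", "count": 1}]
```

Note that nothing is yielded until the entire input has been scanned, which is exactly why this operator is not lazy.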
__init__
__init__(source: Operator, group_by_columns: list[str], aggregates: list[AggregateFunction], select_columns: list[str])
Initialize GroupBy operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Operator` | Source operator | required |
| `group_by_columns` | `list[str]` | List of columns to group by | required |
| `aggregates` | `list[AggregateFunction]` | List of aggregate functions to compute | required |
| `select_columns` | `list[str]` | List of columns in the SELECT clause (for output order) | required |
Source code in sqlstream/operators/groupby.py
__iter__
Execute GROUP BY aggregation
Yields:
| Type | Description |
|---|---|
| `dict[str, Any]` | One row per group, with group columns and aggregated values |
Source code in sqlstream/operators/groupby.py
explain
Generate execution plan explanation
Source code in sqlstream/operators/groupby.py
HashJoinOperator
Bases: Operator
Hash Join operator for equi-joins
Supports:

- INNER JOIN: only matching rows from both tables
- LEFT JOIN: all rows from the left table, matched rows from the right (NULL if no match)
- RIGHT JOIN: all rows from the right table, matched rows from the left (NULL if no match)
Algorithm:

1. Build a hash table from the right table (keyed by the join column)
2. Probe the hash table with rows from the left table
3. Output joined rows according to the join type
Note: This operator materializes the right table in memory. For very large right tables, consider external hash join or other algorithms.
Source code in sqlstream/operators/join.py
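The build/probe algorithm can be sketched as a standalone function. This is a simplified illustration (RIGHT JOIN is omitted for brevity; the real operator handles it as well):

```python
from collections import defaultdict
from typing import Any, Iterator

Row = dict[str, Any]


def hash_join(
    left: Iterator[Row],
    right: Iterator[Row],
    left_key: str,
    right_key: str,
    join_type: str = "INNER",
) -> Iterator[Row]:
    # 1. Build phase: hash the right table by its join key (materialized in memory).
    table: dict[Any, list[Row]] = defaultdict(list)
    right_columns: set[str] = set()
    for r in right:
        table[r[right_key]].append(r)
        right_columns.update(r)
    # 2. Probe phase: stream the left table against the hash table.
    for l in left:
        matches = table.get(l[left_key], [])
        for r in matches:
            yield {**l, **r}
        if not matches and join_type == "LEFT":
            # 3. LEFT JOIN keeps unmatched left rows, NULL-padding right columns.
            yield {**l, **{c: None for c in right_columns if c not in l}}


left_rows = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
right_rows = [{"id": 1, "city": "NY"}]
inner = list(hash_join(iter(left_rows), iter(right_rows), "id", "id"))
# -> [{"id": 1, "name": "ann", "city": "NY"}]
```

Only the right table is materialized; the left table streams through, which is why the smaller input should go on the right when possible.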
__init__
Initialize Hash Join operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `left` | `Operator` | Left table operator | required |
| `right` | `Operator` | Right table operator | required |
| `join_type` | `str` | `'INNER'`, `'LEFT'`, or `'RIGHT'` | required |
| `left_key` | `str` | Column name in the left table for the join condition | required |
| `right_key` | `str` | Column name in the right table for the join condition | required |
Source code in sqlstream/operators/join.py
__iter__
Execute hash join
Yields:
| Type | Description |
|---|---|
| `dict[str, Any]` | Joined rows with columns from both tables |
Source code in sqlstream/operators/join.py
explain
Generate execution plan explanation
Source code in sqlstream/operators/join.py
Limit
Bases: Operator
Limit operator - restricts number of rows (LIMIT clause)
Pulls rows from child and yields only the first N rows. This allows for early termination - we stop pulling from child once we've yielded enough rows.
Source code in sqlstream/operators/limit.py
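Early termination is the whole point of this operator: because the child is a generator, stopping the loop means the child never produces another row. A sketch (hypothetical `limit` helper, not the actual class) makes this observable:

```python
from typing import Any, Iterator


def limit(child: Iterator[dict[str, Any]], n: int) -> Iterator[dict[str, Any]]:
    """Yield at most n rows, then stop pulling from the child entirely."""
    for i, row in enumerate(child):
        if i >= n:
            return  # early termination: the child is never advanced again
        yield row


pulled = [0]  # counts how many rows the source actually produced


def huge_source() -> Iterator[dict[str, Any]]:
    for i in range(1_000_000):
        pulled[0] += 1
        yield {"i": i}


first_three = list(limit(huge_source(), 3))
# only 4 rows were ever pulled from the million-row source
```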
__init__
Initialize limit operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `child` | `Operator` | Child operator to pull rows from | required |
| `limit` | `int` | Maximum number of rows to yield | required |
__iter__
Yield at most limit rows
This is efficient because it stops pulling from child as soon as we've yielded enough rows (early termination).
Source code in sqlstream/operators/limit.py
OrderByOperator
Bases: Operator
ORDER BY operator
Sorts all input rows by specified columns.
Note: This operator materializes all data in memory (not lazy). For large datasets that don't fit in memory, consider external sorting.
Source code in sqlstream/operators/orderby.py
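Multi-column sorting with per-column direction can be built from repeated stable sorts. The sketch below is a hypothetical helper that takes `(column, descending)` pairs rather than the actual `OrderByColumn` type:

```python
from typing import Any

Row = dict[str, Any]


def order_by(rows: list[Row], spec: list[tuple[str, bool]]) -> list[Row]:
    """Sort by (column, descending) pairs using Python's stable sort.

    Sorting by the least-significant column first lets each later pass
    preserve the ordering established by the earlier ones.
    """
    result = list(rows)  # materializes everything: O(n) memory, not lazy
    for column, descending in reversed(spec):
        result.sort(key=lambda r: r[column], reverse=descending)
    return result


people = [
    {"city": "NY", "age": 30},
    {"city": "LA", "age": 30},
    {"city": "NY", "age": 25},
]
# age ascending, then city descending for ties
ordered = order_by(people, [("age", False), ("city", True)])
# -> [{"city": "NY", "age": 25}, {"city": "NY", "age": 30}, {"city": "LA", "age": 30}]
```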
__init__
Initialize OrderBy operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `Operator` | Source operator | required |
| `order_by` | `list[OrderByColumn]` | List of `OrderByColumn` specifications | required |
Source code in sqlstream/operators/orderby.py
__iter__
Execute ORDER BY sorting
Yields:
| Type | Description |
|---|---|
| `dict[str, Any]` | Rows in sorted order |
Source code in sqlstream/operators/orderby.py
explain
Generate execution plan explanation
Source code in sqlstream/operators/orderby.py
ReverseCompare
Wrapper class to reverse comparison order for non-numeric types
Used for DESC sorting of strings and other non-numeric types.
Source code in sqlstream/operators/orderby.py
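For numeric keys, DESC sorting can be done by negating the key, but strings cannot be negated, hence the wrapper. Its idea in a minimal form (a sketch, not the actual class):

```python
import functools


@functools.total_ordering
class ReverseCompare:
    """Wrap a value so comparisons are inverted (larger values sort first)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        return self.value > other.value  # reversed on purpose


words = ["pear", "apple", "banana"]
descending = sorted(words, key=ReverseCompare)
# -> ["pear", "banana", "apple"]
```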
Project
Bases: Operator
Project operator - selects columns (SELECT clause)
Pulls rows from child and yields only the requested columns.
For efficiency, we use dict views rather than copying data.
Source code in sqlstream/operators/project.py
__init__
Initialize project operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `child` | `Operator` | Child operator to pull rows from | required |
| `columns` | `list[str]` | List of column names to select (or `['*']` for all) | required |
Source code in sqlstream/operators/project.py
__iter__
Yield rows with only selected columns
If columns is ['*'], yields all columns unchanged. Otherwise, creates a new dict with only the requested columns.
Source code in sqlstream/operators/project.py
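The projection logic can be sketched as a small generator (a hypothetical `project` helper, not the actual class):

```python
from typing import Any, Iterator

Row = dict[str, Any]


def project(child: Iterator[Row], columns: list[str]) -> Iterator[Row]:
    for row in child:
        if columns == ["*"]:
            yield row  # SELECT *: pass the row through unchanged
        else:
            yield {c: row[c] for c in columns}  # keep only requested columns


rows = [{"a": 1, "b": 2, "c": 3}]
selected = list(project(iter(rows), ["a", "c"]))
# -> [{"a": 1, "c": 3}]
```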
Scan
Bases: Operator
Scan operator - wrapper around a data source reader
This is the leaf of the operator tree. It pulls data from a reader and yields it to parent operators.
Source code in sqlstream/operators/scan.py
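The delegation pattern can be sketched with a hypothetical `CsvReader` standing in for a real `BaseReader` (neither class below is the sqlstream implementation):

```python
import csv
import io
from typing import Any, Iterator


class CsvReader:
    """Hypothetical reader: lazily yields dict rows from CSV text."""

    def __init__(self, text: str):
        self.text = text

    def __iter__(self) -> Iterator[dict[str, Any]]:
        yield from csv.DictReader(io.StringIO(self.text))


class Scan:
    """Leaf operator: delegates iteration straight to the reader."""

    def __init__(self, reader):
        self.reader = reader

    def __iter__(self):
        yield from self.reader


rows = list(Scan(CsvReader("id,name\n1,ann\n2,bob\n")))
# -> [{"id": "1", "name": "ann"}, {"id": "2", "name": "bob"}]
```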
__init__
Initialize scan operator
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `reader` | `BaseReader` | Data source reader to scan | required |
__iter__
Yield all rows from the reader
This delegates directly to the reader's lazy iterator.
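Putting the pieces together: a query like `SELECT name FROM people WHERE age >= 18 LIMIT 2` corresponds to the tree Limit(Project(Filter(Scan))). A sketch with plain generator functions (hypothetical names, not the sqlstream API) shows rows flowing through on demand:

```python
from itertools import islice
from typing import Any, Iterator

Row = dict[str, Any]


def scan(rows: list[Row]) -> Iterator[Row]:            # leaf: the data source
    yield from rows


def where_adult(src: Iterator[Row]) -> Iterator[Row]:  # WHERE age >= 18
    return (r for r in src if r["age"] >= 18)


def select_name(src: Iterator[Row]) -> Iterator[Row]:  # SELECT name
    return ({"name": r["name"]} for r in src)


people = [
    {"name": "ann", "age": 30},
    {"name": "bob", "age": 12},
    {"name": "cat", "age": 45},
    {"name": "dan", "age": 20},
]

# LIMIT 2: islice stops pulling as soon as two rows have been produced,
# so "dan" is never examined by the filter at all.
result = list(islice(select_name(where_adult(scan(people))), 2))
# -> [{"name": "ann"}, {"name": "cat"}]
```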