Trino SQL surface
Cyoda exposes an analytical SQL surface through a Trino connector. Every entity model is projected into a set of virtual SQL tables so that nested JSON/XML data can be queried with ordinary relational SQL — no pre-flattening required.
This page documents the projection rules, the table naming convention, the column categories, the JSON-to-SQL type mapping, and how polymorphic fields are handled. For the conceptual framing of the three Cyoda surfaces (REST, gRPC, Trino SQL), see APIs and surfaces.
Core concepts
Cyoda represents data (JSON/XML) as hierarchical tree structures.
Internally, these trees are decomposed into a collection of Nodes, each
capturing a specific branch or subset of the data. This decomposition
provides a uniform representation for querying and traversal — whether
through API-level path queries or via SQL through the Trino connector.
In the Trino view, each node is exposed as a virtual SQL table, allowing nested structures to be queried using familiar relational syntax without flattening the original hierarchy.
Key principle: each Node corresponds to exactly one SQL table.
Node model
Each Node has:
- path: a string identifying the node’s position in the hierarchy (e.g. $, $.organization.clients, $.agreement_data)
- fields: a map of field names to values, where values can be:
- primitive types (string, number, boolean, date, etc.)
- 1-dimensional arrays of primitives
The collection of these nodes forms the internal data model that underpins
the Trino connector. The TreeNodeEntity is the object type that
encapsulates this node collection when interacting with the Cyoda API, but
it is a byproduct of this broader structural model rather than its
defining feature.
Node creation rules
Nodes are derived directly from the structure of the data (such as JSON/XML):
- Root node: always created with path $, containing top-level fields.
- Array of objects: each array of objects creates a new node.
- Multidimensional arrays: each dimension beyond one creates a further node to preserve structural depth.
This consistent mapping enables Cyoda to represent, navigate, and query arbitrarily nested data structures in a predictable and composable way.
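For illustration, the decomposition rules above can be sketched in Python. This is a simplified, hypothetical model: it derives node paths only, ignores field collection, and does not model the dimension collapsing described later for deeper arrays.

```python
def node_paths(data, path="$"):
    """Return the node paths derived from `data`: the root node `$`,
    one node per array of objects, and one node per array dimension
    beyond the first. 1-D arrays of primitives stay inside their
    parent node as ordinary fields."""
    paths = {path} if path == "$" else set()
    if isinstance(data, dict):
        # Nested objects do not create nodes; their fields just get
        # longer dotted keys within the same node.
        for key, value in data.items():
            paths |= node_paths(value, f"{path}.{key}")
    elif isinstance(data, list):
        if any(isinstance(el, (dict, list)) for el in data):
            node = f"{path}[*]"
            paths.add(node)  # array of objects, or an extra dimension
            for el in data:
                paths |= node_paths(el, node)
    return paths
```

Running this on a document shaped like the example in the next section yields the root node plus one node per array of objects and one per extra array dimension.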
Example: node structure
Given a JSON file with these paths:
- $.organization.name
- $.organization.address[]
- $.organization.clients[].name
- $.organization.clients[].address[]
- $.quarterly_metrics[][]

The system creates the following nodes:
Node 1 (Root):
- Path: $
- Fields: .organization.name, .organization.address[]
Node 2 (Clients array):
- Path: $.organization.clients[*]
- Fields: .name, .address[]
Node 3 (Quarterly metrics — 2D array, detached):
- Path: $.quarterly_metrics[*]
- Fields: [*] (detached array containing the inner array elements)
- This is a detached array because quarterly_metrics is a 2-dimensional array.
For example:

```json
{
  "organization": {
    "name": "Acme Corp",
    "address": ["123 Market Street", "Suite 400", "San Francisco, CA 94105"],
    "clients": [
      { "name": "Client A", "address": ["10 First Ave", "Seattle, WA 98101"] },
      { "name": "Client B", "address": ["200 Second St", "Portland, OR 97204"] }
    ]
  },
  "quarterly_metrics": [
    [1000, 1200, 900, 1100],
    [1300, 1400, 1250, 1500]
  ]
}
```

Tree decomposition
Here is a visual representation of the node structure for the example above, where the corresponding SQL tables are labelled ORGANIZATION, CLIENTS, and METRICS:
Embedded arrays and detached arrays
When JSON/XML contains one-dimensional arrays of primitives within objects, the system does not create a separate node for such arrays. Instead, the array is represented inside its parent node’s table, either as a single column of array type (i.e. field_array — ARRAY[STRING]) or as multiple columns (i.e. field_0 — STRING, field_1 — STRING, …), depending on the flatten_array flag for this field in the Trino schema settings. That setting also works for system fields, such as index.
When JSON/XML contains multidimensional arrays (arrays of arrays), the system creates separate nodes for each dimension after the first. This process is called array detachment.
Understanding detached arrays
A detached array is created when an array contains other arrays as elements. Each additional dimension becomes a separate node with its own table.
The primary motivation behind this approach: if a JSON contains a table-like structure — i.e. a 2-dimensional array of primitives — it should be represented as a table in Trino. This logic was then extended and generalized to work for arrays of any number of dimensions larger than 1.
Example 1: simple 2D array
Consider this JSON with a 2-dimensional array:
```json
{
  "matrix": [
    [1, 2, 3],
    [4, 5, 6]
  ]
}
```

Nodes created:
Node 1 (Root — $):
- Path: $
- Fields: (none — the matrix field is an array of arrays, so it’s not stored here)

Node 2 (First dimension — $.matrix[*]):
- Path: $.matrix[*]
- Fields: [1], [2], [3] (these are detached array fields containing values from the inner arrays, which are represented as rows)
- This node is marked as having a detached array
Generated tables:
Table 1: mydata (from node $)
- Contains only special and root columns (no data fields in this case)
Table 2: mydata_matrix_array (from node $.matrix[*] — detached array)
- Table name breakdown:
  - mydata = entity name
  - _matrix = field name from path
  - _array = suffix indicating this is a detached array table
- Columns:
  - entity_id (UUID)
  - point_time (DATE)
  - Root columns (creation_date, last_update_date, state)
  - index_0 (INTEGER) — position in the outer array (0 or 1 in this example)
  - element_0, element_1, element_2 (INTEGER) — the three elements of each inner array
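To make the row layout concrete, here is a hypothetical sketch of how the data columns of such a detached-array table are populated (system and root columns omitted):

```python
def detached_rows(outer):
    """One row per inner array: index_0 holds the position in the
    outer array, element_N columns hold the inner values."""
    rows = []
    for i, inner in enumerate(outer):
        row = {"index_0": i}
        row.update({f"element_{j}": v for j, v in enumerate(inner)})
        rows.append(row)
    return rows
```

For the matrix above this produces two rows, one per inner array.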
Example 2: 3D array
For a 3-dimensional array:
```json
{
  "cube": [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
  ]
}
```

Nodes created:
Node 1 (Root — $):
- No data fields
Node 2 (Second dimension — $.cube[*][*]; no separate node for the first dimension in this case):
- Fields: [1], [2]
Generated tables:
Table 1: mydata (from node $)
Table 2: mydata_cube_2d_array (from node $.cube[*][*]. Note that 2d here represents the number of collapsed dimensions, while the 3rd dimension is detached.)
- Columns include index_0 and index_1 for positions in both dimensions, and element_0, element_1 for the two primitive values
Example 3: array of objects with nested arrays
```json
{
  "data": [
    { "name": "item1", "values": [10, 20, 30] },
    { "name": "item2", "values": [40, 50] }
  ]
}
```

Nodes created:
Node 1 (Root — $):
- Path: $

Node 2 (Array of objects — $.data[*]):
- Path: $.data[*]
- Fields: .name, .values[]
Generated tables:
Table 1: mydata (from node $)
Table 2: mydata_data (from node $.data[*])
- Columns:
  - Special and root columns
  - index_0 (INTEGER) — position in the data array
  - name (STRING) — from the .name field
  - values_array (ARRAY[INTEGER]) or values_0, values_1, values_2 (INTEGER), depending on the flatten_array flag for this field in the Trino schema settings
Variable-dimension arrays (mixed depths)
The system can handle arrays where elements have different nesting depths — some elements are primitives while others are arrays.
Example:
```json
{
  "data": [
    1,
    [2, 3],
    [4, 5, [6, 7], [8, 9]]
  ]
}
```

This is a polymorphic array containing:
- A primitive value: 1
- A 1-dimensional array: [2, 3]
- A 2-dimensional array: [4, 5, [6, 7], [8, 9]]
Nodes created:
Node 1 (Root — $)
Node 2 (Mixed node — $.data[*])
Node 3 (Mixed node — $.data[*][*])
Generated tables:
Table 1: mydata (from node $)
- Columns:
  - Special and root columns
  - data[*] (ARRAY[INTEGER]) — field for the top-level array elements (one row with array value [1] for the current example)
Table 2: mydata_data_array (from node $.data[*])
- Contains rows for second-level array elements (2 rows for the current example)
- Columns:
  - Special and root columns
  - index_0 (INTEGER) — position in the data array
  - element_0, element_1 — columns for primitive values at the second level (values 2, 3, 4, 5, presented in 2 rows)
Table 3: mydata_data_2d_array (from array part of $.data[*][*])
- Contains the elements from nested arrays
- Columns:
  - Special and root columns
  - index_0, index_1 (INTEGER) — positions in the outer arrays
  - element_0, element_1 — elements from the third-level arrays (values 6, 7, 8, 9, presented in 2 rows, with index values 2,2 and 2,3)
SQL table generation
Table naming convention

Important: each node in your TreeNodeEntity is mapped to exactly one SQL table.
Table names are generated using the following rules:
- Base name: the entity model name (e.g., prizes, companies_details)
- Version suffix (if version > 1): _&lt;version&gt; (e.g., _2, _3)
- Path suffix (for non-root nodes): derived from the node path, with . and # replaced by _
  - Example: $.prizes[*] → _prizes
  - Example: $.prizes[*].laureates[*] → _prizes_laureates
- Multidimensional suffix: _&lt;N&gt;d, where N is the number of ][ sequences in the node path + 1. (Note that this suffix is derived from the node path, not the field path — it represents the number of collapsed dimensions. So for a 3-dimensional array of primitives the suffix is _2d, as the last dimension is expanded into columns.)
  - Example: $.data[*][*] has one ][ → _2d
  - Example: $.cube[*][*][*] has two ][ → _3d
- Detached array suffix: _array, added when the node represents a detached array
  - This happens for multidimensional arrays where inner dimensions are “detached”.
Table naming examples
Section titled “Table naming examples”| Entity Name | Version | Node Path | Is Detached Array? | Table Name | Explanation |
|---|---|---|---|---|---|
| prizes | 1 | $ | No | prizes | Root node |
| prizes | 2 | $ | No | prizes_2 | Root node, version 2 |
| prizes | 1 | $.prizes[*] | No | prizes_prizes | Array of objects |
| prizes | 1 | $.prizes[*].laureates[*] | No | prizes_prizes_laureates | Nested array of objects |
| companies | 1 | $.matrix[*] | Yes | companies_matrix_array | 2D array — detached |
| companies | 1 | $.cube[*][*] | Yes | companies_cube_2d_array | 3D array — has ][ so gets _2d, plus _array |
This is the default schema-generation naming; any of these names can be changed manually in the schema settings.
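The rules can be condensed into a short sketch (a hypothetical helper mirroring the defaults above; the real generator runs inside Cyoda, and any name can be overridden):

```python
def table_name(entity, version, node_path, detached):
    """Default table name for a node, per the naming rules above."""
    name = entity
    if version > 1:
        name += f"_{version}"
    if node_path != "$":
        # "$.a[*].b[*]" -> ".a.b" -> "_a_b"
        suffix = node_path.replace("[*]", "")[1:]
        name += suffix.replace(".", "_").replace("#", "_")
    collapsed = node_path.count("][")  # dimensions beyond the first
    if collapsed:
        name += f"_{collapsed + 1}d"
    if detached:
        name += "_array"
    return name
```

Applied to the rows of the table above, it reproduces each listed table name.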
Special JSON table
In addition to the structured tables, every entity model gets a special JSON table that contains the complete reconstructed JSON for each entity:
- Table name: &lt;entity_name&gt;_json (e.g., prizes_json)
- Purpose: allows you to retrieve the full original JSON document

Good practice:

```sql
SELECT entity FROM prizes_json WHERE entity_id = '<uuid>';
```

Avoid:

```sql
SELECT entity FROM prizes_json; -- This will be slow!
```

Table columns
Every SQL table contains several categories of columns.
1. Special columns
These are system-generated columns available in all tables:
| Column Name | Data Type | Description |
|---|---|---|
| entity_id | UUID | Unique identifier for the entity (the loaded JSON/XML file) |
| point_time | DATE | Allows querying data as it existed at a specific point in time |
The JSON table also includes:
- entity (STRING): the complete reconstructed JSON document.
2. Root columns
These columns provide metadata about the entity:
| Column Name | Source Field | Data Type | Description |
|---|---|---|---|
| creation_date | creationDate | DATE | When the entity was created in the system |
| last_update_date | lastUpdateTime | DATE | When the entity was last modified |
| state | state | STRING | Current workflow state of the entity |
3. Index columns
For tables representing array (object or detached) elements (depth > 0), an index column is provided:
- Column name: index
- Purpose: identifies the position of this row in the array hierarchy
- Structure: can be flattened into individual columns (index_0, index_1, etc.) for multidimensional arrays. Flattened by default.
Example: for $.prizes[*].laureates[*]:
- index_0: position in the prizes array
- index_1: position in the laureates array within that prize
4. Data columns
These are the actual fields from your JSON/XML data.
Primitive fields
Simple fields are mapped directly to columns:
| JSON Path | Field Key | Column Name | Data Type |
|---|---|---|---|
| $.organization.name | .organization.name | organization_name | STRING |
| $.prizes[*].year | .year | year | STRING |
| $.prizes[*].laureates[*].id | .id | id | STRING |
Naming rules:
- Leading . is removed from the field key.
- Reserved field names (like index) are prefixed with _ (e.g., _index).
Array fields
Section titled “Array fields”1-dimensional arrays of primitives are handled in two ways:
Option 1: Array column (default for homogeneous arrays)
- Column name: &lt;field_name&gt;_array
- Data type: ARRAY[&lt;element_type&gt;]
- Example: .addresses[] → addresses_array (ARRAY[STRING])
Option 2: Flattened columns (for multi-type or ZONED_DATE_TIME arrays)
- Multiple columns are created: &lt;field_name&gt;_0, &lt;field_name&gt;_1, etc.
- Each column represents one position in the array.
- Example: .scores[] → scores_0, scores_1, scores_2 (if the array has 3 elements)
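A minimal sketch of the two options, with a hypothetical flatten flag standing in for the flatten_array setting:

```python
def array_columns(field, values, flatten):
    """Map a 1-D primitive array either to one array-typed column
    (flatten=False) or to one column per position (flatten=True)."""
    if flatten:
        return {f"{field}_{i}": v for i, v in enumerate(values)}
    return {f"{field}_array": values}
```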
Supported data types
The system supports the following data types, which are mapped to appropriate SQL types for Trino queries. Understanding how these types are detected from JSON is crucial for working with your data effectively.
Important: All data is stored internally in Cyoda with full
precision. Trino provides a SQL query interface to this data, and some
types (like UNBOUND_DECIMAL and UNBOUND_INTEGER) are represented as
strings in Trino due to Trino’s numeric limitations, even though they
are stored as numbers in Cyoda.
Data type reference table
| System Type | SQL Type | Description | JSON Detection |
|---|---|---|---|
| STRING | VARCHAR | Text data (max 1024 characters) | Text values, or values that can’t be parsed as other types |
| CHAR | CHAR | Single character | Single-character strings |
| BYTE | TINYINT | 8-bit integer (-128 to 127) | Integers in byte range |
| SHORT | SMALLINT | 16-bit integer (-32,768 to 32,767) | Integers in short range |
| INT | INTEGER | 32-bit integer | Integers in int range |
| LONG | BIGINT | 64-bit integer | Integers in long range |
| FLOAT | REAL | Single-precision floating point | Decimals with ≤6 digits precision and scale ≤31 |
| DOUBLE | DOUBLE | Double-precision floating point | Decimals with ≤15 digits precision and scale ≤292 |
| BIG_DECIMAL | DECIMAL(38,18) | High-precision decimal (Trino Int128) | Decimals that fit in Int128 with scale ≤18 |
| UNBOUND_DECIMAL | VARCHAR | Very large/precise decimals (Trino representation) | Decimals exceeding BIG_DECIMAL limits |
| BIG_INTEGER | DECIMAL(38,0) | Large integer (Trino Int128) | Integers that fit in Int128 range |
| UNBOUND_INTEGER | VARCHAR | Very large integers (Trino representation) | Integers exceeding BIG_INTEGER limits |
| BOOLEAN | BOOLEAN | True/false values | JSON boolean values |
| LOCAL_DATE | DATE | Date without time | ISO date strings (e.g., “2024-01-15”) |
| LOCAL_TIME | TIME | Time without date | ISO time strings (e.g., “14:30:00”) |
| LOCAL_DATE_TIME | TIMESTAMP | Date and time without timezone | ISO datetime strings |
| ZONED_DATE_TIME | TIMESTAMP WITH TIME ZONE | Date and time with timezone | ISO datetime with timezone |
| UUID | UUID | Universally unique identifier | Valid UUID strings |
| TIME_UUID | UUID | Time-based UUID (version 1) | UUID version 1 strings |
| BYTE_ARRAY | VARBINARY | Binary data | Base64-encoded strings |
How number types are detected from JSON
When the system parses JSON numbers, it follows a specific detection algorithm to determine the most appropriate type.
Integer detection
For whole numbers (no decimal point), the system checks in this order:
- BYTE — if the value is between -128 and 127
- SHORT — if the value is between -32,768 and 32,767
- INT — if the value is between -2,147,483,648 and 2,147,483,647
- LONG — if the value is between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807
- BIG_INTEGER — if the value fits in Int128 range (see below)
- UNBOUND_INTEGER — for integers larger than Int128
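The detection order maps directly to range checks (a sketch, with the Int128 bounds written as powers of two):

```python
def integer_type(n):
    """Smallest integer type that can hold n, per the order above."""
    if -128 <= n <= 127:
        return "BYTE"
    if -32_768 <= n <= 32_767:
        return "SHORT"
    if -2_147_483_648 <= n <= 2_147_483_647:
        return "INT"
    if -(2**63) <= n <= 2**63 - 1:
        return "LONG"
    if -(2**127) <= n <= 2**127 - 1:
        return "BIG_INTEGER"
    return "UNBOUND_INTEGER"
```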
Example:

```json
{
  "small": 42,                              // → BYTE
  "medium": 1000,                           // → SHORT
  "large": 100000,                          // → INT
  "veryLarge": 10000000000,                 // → LONG
  "huge": 123456789012345678901234567890    // → BIG_INTEGER or UNBOUND_INTEGER
}
```

Decimal detection
For numbers with decimal points, the system checks in this order:
- FLOAT — if precision ≤ 6 digits and scale is between -31 and 31
- DOUBLE — if precision ≤ 15 digits and scale is between -292 and 292
- BIG_DECIMAL — if the value fits in Int128 decimal constraints (see below)
- UNBOUND_DECIMAL — for decimals exceeding BIG_DECIMAL limits
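A sketch of this order using Python's decimal module, where precision is the count of significant digits and scale the count of digits after the decimal point. The strict/loose precision checks described further below are simplified here to a plain Int128 magnitude test.

```python
from decimal import Decimal

def decimal_type(text):
    """Smallest decimal type that can hold the value, per the order above."""
    sign, digits, exponent = Decimal(text).as_tuple()
    precision = len(digits)   # significant digits
    scale = -exponent         # digits after the decimal point
    if precision <= 6 and -31 <= scale <= 31:
        return "FLOAT"
    if precision <= 15 and -292 <= scale <= 292:
        return "DOUBLE"
    if scale <= 18:
        # Magnitude of the value when rescaled to 18 decimal places.
        unscaled = int("".join(map(str, digits))) * 10 ** (18 - scale)
        if unscaled <= 2**127 - 1:
            return "BIG_DECIMAL"
    return "UNBOUND_DECIMAL"
```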
Example:

```json
{
  "price": 19.99,                                        // → FLOAT (6 digits precision)
  "precise": 123.456789012345,                           // → DOUBLE (15 digits precision)
  "veryPrecise": 123456789012345.123456789012345,        // → BIG_DECIMAL
  "extremelyPrecise": 1.23456789012345678901234567890123456789  // → UNBOUND_DECIMAL
}
```

Understanding BIG_DECIMAL and UNBOUND_DECIMAL
BIG_DECIMAL (Trino Int128 decimal)
BIG_DECIMAL is bounded by Trino’s maximum numeric capacity, which uses a 128-bit integer (Int128) with a fixed scale of 18.
Constraints:
- Scale: must be ≤ 18 decimal places
- Precision: must be ≤ 38 total digits (with some complexity — see below)
- Range: approximately ±170,141,183,460,469,231,731.687303715884105727
Detailed precision rules:
The system uses two precision checks:
- Strict check: precision ≤ 38 AND exponent ≤ 20 (where exponent = precision - scale)
- Loose check: precision ≤ 39 AND exponent ≤ 21 AND the value fits when scaled to 18 decimal places
Examples:
```json
{
  "fits": 12345678901234567890.123456789012345678,        // ✓ BIG_DECIMAL (38 digits, scale 18)
  "tooManyDecimals": 123.1234567890123456789,             // ✗ UNBOUND_DECIMAL (scale > 18)
  "tooLarge": 999999999999999999999.999999999999999999    // ✗ UNBOUND_DECIMAL (exceeds Int128)
}
```

UNBOUND_DECIMAL
UNBOUND_DECIMAL is used for decimal values that exceed Trino’s numeric representation limits.
Storage vs representation:
- In Cyoda: stored as full-precision BigDecimal numbers with all numeric operations available in workflows.
- In Trino SQL: represented as VARCHAR (strings) due to Trino’s Int128 limitations.
When a decimal becomes UNBOUND_DECIMAL:
- Scale > 18 decimal places.
- Total value exceeds Int128 range.
- Precision and exponent exceed the limits.
Important for SQL queries: when querying UNBOUND_DECIMAL columns in Trino, treat them as VARCHAR, not as numeric types. Numeric operations are not available in SQL queries for these fields.
```sql
-- Correct: treat as string in Trino SQL
SELECT * FROM mytable WHERE unbound_decimal_field = '123.12345678901234567890123456789';

-- Incorrect: cannot use numeric operations in Trino SQL
SELECT * FROM mytable WHERE unbound_decimal_field > 100; -- This will fail!

-- Note: numeric operations ARE available in Cyoda workflows, just not in Trino SQL queries.
```

Understanding BIG_INTEGER and UNBOUND_INTEGER
BIG_INTEGER (Trino Int128)
BIG_INTEGER is bounded by Trino’s 128-bit integer capacity.
Constraints:
- Range: -170,141,183,460,469,231,731,687,303,715,884,105,728 to 170,141,183,460,469,231,731,687,303,715,884,105,727
- This is 2^127 - 1 for the maximum and -2^127 for the minimum.
Examples:
```json
{
  "fits": 123456789012345678901234567890,                 // ✓ BIG_INTEGER (within Int128)
  "tooLarge": 999999999999999999999999999999999999999     // ✗ UNBOUND_INTEGER (exceeds Int128)
}
```

UNBOUND_INTEGER
UNBOUND_INTEGER is used for integer values that exceed the Int128 range.
Storage vs representation:
- In Cyoda: stored as full-precision BigInteger numbers with all numeric operations available in workflows.
- In Trino SQL: represented as VARCHAR (strings) due to Trino’s Int128 limitations.
Important for SQL queries: when querying UNBOUND_INTEGER columns in Trino, treat them as VARCHAR. Numeric operations are not available in SQL queries for these fields.
```sql
-- Correct: treat as string in Trino SQL
SELECT * FROM mytable WHERE unbound_integer_field = '999999999999999999999999999999999999999';

-- Incorrect: cannot use numeric operations in Trino SQL
SELECT * FROM mytable WHERE unbound_integer_field > 1000; -- This will fail!

-- Note: numeric operations ARE available in Cyoda workflows, just not in Trino SQL queries.
```

Type detection priority
When parsing JSON, the system always tries to use the most specific type that fits the value:
- Smallest integer type that can hold the value (BYTE → SHORT → INT → LONG → BIG_INTEGER → UNBOUND_INTEGER).
- Smallest decimal type that can hold the value (FLOAT → DOUBLE → BIG_DECIMAL → UNBOUND_DECIMAL).
- STRING as a fallback for any value that can’t be parsed as a more specific type.
String parsing
When a JSON value is a string, the system attempts to parse it as other types in this priority order:
- Temporal types (dates, times, datetimes).
- UUID types.
- Boolean (“true” or “false”).
- Numeric types (if the string contains a valid number).
- STRING (if none of the above match).
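The priority chain can be sketched as follows. This is illustrative only: Python's ISO parsers and a generic NUMERIC label stand in for the real detection logic, and only a few representative temporal types are checked.

```python
from datetime import date, datetime, time
from uuid import UUID

def detect_string(s):
    """Try temporal, UUID, boolean, then numeric; fall back to STRING."""
    attempts = [
        (date.fromisoformat, "LOCAL_DATE"),
        (time.fromisoformat, "LOCAL_TIME"),
        (datetime.fromisoformat, "LOCAL_DATE_TIME"),
        (UUID, "UUID"),
    ]
    for parse, name in attempts:
        try:
            parse(s)
            return name
        except ValueError:
            pass
    if s in ("true", "false"):
        return "BOOLEAN"
    try:
        float(s)
        return "NUMERIC"  # stand-in for the numeric detection described above
    except ValueError:
        return "STRING"
```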
Example:

```json
{
  "date": "2024-01-15",                              // → LOCAL_DATE
  "uuid": "550e8400-e29b-41d4-a716-446655440000",    // → UUID
  "bool": "true",                                    // → BOOLEAN
  "text": "hello world"                              // → STRING
}
```

Polymorphic fields
What are polymorphic fields?
A polymorphic field occurs when the same field path has different data types across different elements in your JSON/XML data. This commonly happens in arrays of objects where the same field name contains different types of values.
Example of polymorphic data
```json
{
  "items": [
    { "value": "text string" },
    { "value": 123 },
    { "value": 45.67 }
  ]
}
```

In this example, the field $.items[*].value is polymorphic because it contains:
- A STRING in the first element
- An INTEGER in the second element
- A DOUBLE in the third element
How polymorphic fields are handled
When the system detects polymorphic fields, it automatically determines a common data type that can accommodate all the different types encountered. The logic:
- Check for compatible types: the system first checks if all types are compatible and can be converted to a common type.
- Find the lowest common denominator: it selects the most general type that all values can be converted to.
- Fall back to STRING: if no common numeric or date type exists, the field is stored as STRING.
Type compatibility rules
The system recognizes certain types as compatible and will convert them to a common type using widening conversions. It always chooses the type that can represent all values without loss of information.
Numeric type hierarchy
Integer types (from smallest to largest):
- BYTE → SHORT → INT → LONG → BIG_INTEGER → UNBOUND_INTEGER
Decimal types (from smallest to largest):
- FLOAT → DOUBLE → BIG_DECIMAL → UNBOUND_DECIMAL
Cross-hierarchy conversions:
- Any integer type can widen to any larger integer type or any decimal type.
- Any decimal type can widen to a larger decimal type.
- UNBOUND_DECIMAL is the widest numeric type (can hold any number).
Common type conversion examples
| Types Found | Common Type Used | Explanation |
|---|---|---|
| BYTE, SHORT | SHORT | Wider integer type |
| INT, LONG | LONG | Wider integer type |
| BYTE, DOUBLE | DOUBLE | Integer widens to decimal |
| INT, BIG_DECIMAL | BIG_DECIMAL | Integer widens to decimal |
| LONG, UNBOUND_INTEGER | UNBOUND_INTEGER | Wider integer type |
| FLOAT, DOUBLE | DOUBLE | Wider decimal type |
| DOUBLE, BIG_DECIMAL | UNBOUND_DECIMAL | DOUBLE can’t fit in BIG_DECIMAL’s fixed scale |
| BIG_INTEGER, BIG_DECIMAL | BIG_DECIMAL | Integer widens to decimal |
| BIG_DECIMAL, UNBOUND_DECIMAL | UNBOUND_DECIMAL | Wider decimal type |
| BIG_INTEGER, UNBOUND_INTEGER | UNBOUND_INTEGER | Wider integer type |
| Any numeric, STRING | STRING | Incompatible — falls back to STRING |
| BOOLEAN, INT | STRING | Incompatible — falls back to STRING |
| UUID, STRING | STRING | Incompatible — falls back to STRING |
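A sketch that reproduces the table above. It is a simplification of the real logic: only the two hierarchies, the DOUBLE + BIG_DECIMAL special case, and the STRING fallback are encoded.

```python
INTS = ["BYTE", "SHORT", "INT", "LONG", "BIG_INTEGER", "UNBOUND_INTEGER"]
DECS = ["FLOAT", "DOUBLE", "BIG_DECIMAL", "UNBOUND_DECIMAL"]

def common_type(a, b):
    """Lowest common denominator for a polymorphic pair of types."""
    if a == b:
        return a
    if a in INTS and b in INTS:
        return max(a, b, key=INTS.index)   # wider integer wins
    if a in DECS and b in DECS:
        if {a, b} == {"DOUBLE", "BIG_DECIMAL"}:
            return "UNBOUND_DECIMAL"       # DOUBLE can't fit the fixed scale
        return max(a, b, key=DECS.index)   # wider decimal wins
    if {a, b} <= set(INTS) | set(DECS):
        dec = a if a in DECS else b
        integer = b if a in DECS else a
        # Integers widen into the decimal hierarchy; UNBOUND_INTEGER
        # only fits the widest decimal type.
        return "UNBOUND_DECIMAL" if integer == "UNBOUND_INTEGER" else dec
    return "STRING"                        # incompatible types fall back to STRING
```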
Temporal type conversions
Temporal types have a resolution hierarchy, where lower-resolution types (like YEAR) can be converted to higher-resolution types (like LOCAL_DATE) by adding default values for the missing components.
Resolution hierarchy:
- YEAR → YEAR_MONTH → LOCAL_DATE → LOCAL_DATE_TIME → ZONED_DATE_TIME
- LOCAL_TIME → LOCAL_DATE_TIME → ZONED_DATE_TIME
When polymorphic temporal fields are detected, the system converts all values to the highest-resolution type found.
Upscaling (low resolution → high resolution)
When converting from a lower-resolution type to a higher-resolution type, the system adds default values for the missing components:
| From Type | To Type | Conversion Rule | Example |
|---|---|---|---|
| YEAR | YEAR_MONTH | Add month = 1 (January) | 2024 → 2024-01 |
| YEAR_MONTH | LOCAL_DATE | Add day = 1 (first day of month) | 2024-01 → 2024-01-01 |
| LOCAL_DATE | LOCAL_DATE_TIME | Add time = 00:00:00 (midnight) | 2024-01-01 → 2024-01-01T00:00:00 |
| LOCAL_TIME | LOCAL_DATE_TIME | Add date = 1970-01-01 (epoch) | 14:30:00 → 1970-01-01T14:30:00 |
Multi-step conversions:
The system can perform multi-step conversions by chaining the rules above:
- YEAR → LOCAL_DATE: 2024 → 2024-01 → 2024-01-01
- YEAR → LOCAL_DATE_TIME: 2024 → 2024-01 → 2024-01-01 → 2024-01-01T00:00:00
- YEAR_MONTH → LOCAL_DATE_TIME: 2024-06 → 2024-06-01 → 2024-06-01T00:00:00
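Over ISO-formatted strings, the chained upscaling can be sketched as single-step conversions (illustrative; ZONED_DATE_TIME is omitted and inputs are assumed well-formed):

```python
# Each step names the next-higher resolution and how to fill the gap.
STEPS = {
    "YEAR": ("YEAR_MONTH", lambda v: v + "-01"),                     # month = 01
    "YEAR_MONTH": ("LOCAL_DATE", lambda v: v + "-01"),               # day = 01
    "LOCAL_DATE": ("LOCAL_DATE_TIME", lambda v: v + "T00:00:00"),    # midnight
    "LOCAL_TIME": ("LOCAL_DATE_TIME", lambda v: "1970-01-01T" + v),  # epoch date
}

def upscale(value, kind, target):
    """Chain single-step conversions until `kind` reaches `target`."""
    while kind != target:
        kind, fill = STEPS[kind]
        value = fill(value)
    return value
```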
Downscaling (high resolution → low resolution)
When converting from a higher-resolution type to a lower-resolution type, the system truncates the extra precision:
| From Type | To Type | Conversion Rule | Example |
|---|---|---|---|
| YEAR_MONTH | YEAR | Extract year only | 2024-06 → 2024 |
| LOCAL_DATE | YEAR_MONTH | Extract year and month | 2024-06-15 → 2024-06 |
| LOCAL_DATE_TIME | LOCAL_DATE | Extract date part | 2024-06-15T14:30:00 → 2024-06-15 |
| LOCAL_DATE_TIME | LOCAL_TIME | Extract time part | 2024-06-15T14:30:00 → 14:30:00 |
Downscaling is primarily used internally for query optimization (e.g. when processing a query condition against a [YEAR, DATE] polymorphic field, the query condition is downscaled to YEAR for the YEAR part of the target field). In polymorphic fields represented in Trino, the system always upscales to the highest-resolution type.
Polymorphic temporal field examples
Example 1: Mixed date resolutions
```json
{
  "events": [
    { "date": "2024" },        // YEAR
    { "date": "2024-06" },     // YEAR_MONTH
    { "date": "2024-06-15" }   // LOCAL_DATE
  ]
}
```

The field $.events[*].date is polymorphic with types: YEAR, YEAR_MONTH, LOCAL_DATE.
Common type: LOCAL_DATE (highest resolution).
Trino SQL values after conversion:
- "2024" → 2024-01-01 (January 1st, 2024)
- "2024-06" → 2024-06-01 (June 1st, 2024)
- "2024-06-15" → 2024-06-15 (unchanged)
Example 2: Date and DateTime mix
```json
{
  "timestamps": [
    { "when": "2024-01-15" },           // LOCAL_DATE
    { "when": "2024-01-15T14:30:00" }   // LOCAL_DATE_TIME
  ]
}
```

Common type: LOCAL_DATE_TIME.
Trino SQL values after conversion:
- "2024-01-15" → 2024-01-15T00:00:00 (midnight)
- "2024-01-15T14:30:00" → 2024-01-15T14:30:00 (unchanged)
Example 3: Time and DateTime mix
```json
{
  "schedule": [
    { "time": "14:30:00" },             // LOCAL_TIME
    { "time": "2024-01-15T14:30:00" }   // LOCAL_DATE_TIME
  ]
}
```

Common type: LOCAL_DATE_TIME.
Trino SQL values after conversion:
- "14:30:00" → 1970-01-01T14:30:00 (epoch date + time)
- "2024-01-15T14:30:00" → 2024-01-15T14:30:00 (unchanged)
Important considerations for temporal polymorphism
- Default values matter: when YEAR is converted to LOCAL_DATE, it becomes January 1st. This means:

```sql
-- If the field contains polymorphic YEAR and LOCAL_DATE values
SELECT * FROM events WHERE date = '2024-01-01'
-- This will match both "2024" (converted to 2024-01-01) and "2024-01-01"
```

- Loss of semantic meaning: a YEAR value of "2024" represents the entire year, but when converted to LOCAL_DATE it becomes 2024-01-01, which represents a specific day. The original semantic meaning (the entire year) is lost.

- Query implications: when querying polymorphic temporal fields, be aware of the conversion rules:

```sql
-- Original data: ["2024", "2024-06-15"]
-- Stored as LOCAL_DATE: [2024-01-01, 2024-06-15]
-- This query will NOT match the original "2024" value
SELECT * FROM events WHERE date >= '2024-06-15'
-- Because "2024" was converted to 2024-01-01, which does not include every day of the year
```

- Best practice: if you need to preserve the original resolution, consider using separate fields:

```json
{ "yearOnly": "2024", "exactDate": "2024-06-15" }
```
Polymorphic temporal conversion summary
Common polymorphic combinations:
| Types Found | Common Type | Conversion Applied |
|---|---|---|
| YEAR, YEAR_MONTH | YEAR_MONTH | YEAR → YEAR_MONTH (add month=1) |
| YEAR, LOCAL_DATE | LOCAL_DATE | YEAR → YEAR_MONTH → LOCAL_DATE |
| YEAR_MONTH, LOCAL_DATE | LOCAL_DATE | YEAR_MONTH → LOCAL_DATE (add day=1) |
| LOCAL_DATE, LOCAL_DATE_TIME | LOCAL_DATE_TIME | LOCAL_DATE → LOCAL_DATE_TIME (add time=00:00:00) |
| LOCAL_TIME, LOCAL_DATE_TIME | LOCAL_DATE_TIME | LOCAL_TIME → LOCAL_DATE_TIME (add date=1970-01-01) |
Important notes
Type conversion: when a polymorphic field is stored as a common type (e.g., STRING), all values are converted to that type. This means:
- Numeric values may be stored as strings: "123" instead of 123.
- You may need to cast values in your SQL queries: CAST(value AS INTEGER).
Best practices for polymorphic data
- Consistent typing: when possible, maintain consistent data types for the same field across all array elements.
- Explicit casting: when querying polymorphic fields that were converted to STRING, use explicit CAST operations with caution — some values may not be castable.
- Understand your data: use the schema-inspection API to see which fields are polymorphic and what their common type is.
- Consider restructuring: if you have control over the data structure, consider using different field names for different types.
Complete example
Input JSON saved under model “prizes” version 1
```json
{
  "extraction-date": "2024-01-15",
  "prizes": [
    {
      "year": "2023",
      "category": "Physics",
      "laureates": [
        { "id": "1001", "firstname": "Anne", "surname": "L'Huillier", "motivation": "for experimental methods...", "share": 3 },
        { "id": "1002", "firstname": "Pierre", "surname": "Agostini", "motivation": "for experimental methods...", "share": 3 }
      ]
    }
  ]
}
```

Generated tables
Table 1: prizes (Root node: $)
| Column Name | Data Type | Category | Description |
|---|---|---|---|
| entity_id | UUID | SPECIAL | Entity identifier |
| point_time | DATE | SPECIAL | Query time point |
| creation_date | DATE | ROOT | Entity creation date |
| last_update_date | DATE | ROOT | Entity last update |
| state | STRING | ROOT | Entity state |
| extraction_date | DATE | DATA | From .extraction-date |
Table 2: prizes_prizes (Node: $.prizes[*])
| Column Name | Data Type | Category | Description |
|---|---|---|---|
| entity_id | UUID | SPECIAL | Entity identifier |
| point_time | DATE | SPECIAL | Query time point |
| creation_date | DATE | ROOT | Entity creation date |
| last_update_date | DATE | ROOT | Entity last update |
| state | STRING | ROOT | Entity state |
| index_0 | INTEGER | INDEX | Position in prizes array |
| year | STRING | DATA | From `.year` |
| category | STRING | DATA | From `.category` |
Table 3: prizes_prizes_laureates (Node: $.prizes[*].laureates[*])
| Column Name | Data Type | Category | Description |
|---|---|---|---|
| entity_id | UUID | SPECIAL | Entity identifier |
| point_time | DATE | SPECIAL | Query time point |
| creation_date | DATE | ROOT | Entity creation date |
| last_update_date | DATE | ROOT | Entity last update |
| state | STRING | ROOT | Entity state |
| index_0 | INTEGER | INDEX | Position in prizes array |
| index_1 | INTEGER | INDEX | Position in laureates array |
| id | STRING | DATA | From `.id` |
| firstname | STRING | DATA | From `.firstname` |
| surname | STRING | DATA | From `.surname` |
| motivation | STRING | DATA | From `.motivation` |
| share | TINYINT | DATA | From `.share` |
Table 4: prizes_json (Special JSON table)
| Column Name | Data Type | Category | Description |
|---|---|---|---|
| entity_id | UUID | SPECIAL | Entity identifier |
| point_time | DATE | SPECIAL | Query time point |
| creation_date | DATE | ROOT | Entity creation date |
| last_update_date | DATE | ROOT | Entity last update |
| state | STRING | ROOT | Entity state |
| entity | STRING | SPECIAL | Complete JSON document |
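The table names above follow a visible pattern: the model name plus one underscore-joined segment per nested array in the node path. A small sketch of that inference (an assumption derived from these examples; the connector's exact rules for name collisions or special characters are not specified here):

```python
import re

# Assumed naming rule, inferred from the generated tables above:
# table name = model name + each array segment ("field[*]") of the node path.

def table_name(model: str, node_path: str) -> str:
    segments = re.findall(r"\.([A-Za-z0-9_]+)\[\*\]", node_path)
    return "_".join([model] + segments)

print(table_name("prizes", "$"))                         # prizes
print(table_name("prizes", "$.prizes[*]"))               # prizes_prizes
print(table_name("prizes", "$.prizes[*].laureates[*]"))  # prizes_prizes_laureates
```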
Querying your data
Example queries

1. Get all prizes from 2023:

```sql
SELECT * FROM prizes_prizes WHERE year = '2023';
```

2. Find all laureates with their prize information:

```sql
SELECT p.year, p.category, l.firstname, l.surname, l.motivation
FROM prizes_prizes p
JOIN prizes_prizes_laureates l
  ON p.entity_id = l.entity_id AND p.index_0 = l.index_0;
```

3. Retrieve the full JSON for a specific entity:

```sql
SELECT entity FROM prizes_json WHERE entity_id = '<uuid>';
```

4. Query data as it existed at a specific time:

```sql
SELECT * FROM prizes_prizes
WHERE point_time = TIMESTAMP '2024-01-01 00:00:00';
```

Best practices
- Use index columns for joins: when joining tables from nested arrays, always join on `entity_id` and matching index columns.
- Understand your schema: use the schema-generation API to see exactly what tables and columns are created for your data.
- Leverage the JSON table carefully: for complex queries or when you need the full document, query the `_json` table, but always include `entity_id` in the WHERE clause for performance.
- Filter by `entity_id`: for better performance, include `entity_id` in your WHERE clause when possible, especially when querying the JSON table.
- Use `point_time` wisely: only specify `point_time` when you need historical data; omit it for current data.
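The first practice can be illustrated with a toy in-memory simulation (not Cyoda code, and the row data here is hypothetical): index columns restart at 0 for every entity, so the recommended join keys are `entity_id` plus the shared index columns.

```python
# Toy rows shaped like the generated tables above (hypothetical data).
prizes_rows = [
    {"entity_id": "e1", "index_0": 0, "year": "2023", "category": "Physics"},
    {"entity_id": "e2", "index_0": 0, "year": "2022", "category": "Chemistry"},
]
laureate_rows = [
    {"entity_id": "e1", "index_0": 0, "index_1": 0, "surname": "Ng"},
    {"entity_id": "e1", "index_0": 0, "index_1": 1, "surname": "Ruiz"},
    {"entity_id": "e2", "index_0": 0, "index_1": 0, "surname": "Smith"},
]

# Join on entity_id AND index_0, mirroring the recommended SQL join.
joined = [
    (p["year"], l["surname"])
    for p in prizes_rows
    for l in laureate_rows
    if p["entity_id"] == l["entity_id"] and p["index_0"] == l["index_0"]
]
print(joined)  # [('2023', 'Ng'), ('2023', 'Ruiz'), ('2022', 'Smith')]
```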
Schema management
You can create any number of SQL schemas, each with a different set of tables and columns.
Via the HTTP API
You can manage and inspect your SQL schemas using the REST API:

- Create default schema: `PUT /sql/schema/putDefault/{schemaName}`
- Get schema: `GET /sql/schema/{schemaId}`
- List all schemas: `GET /sql/schema/listAll`
- Generate tables from entity model: `GET /sql/schema/genTables/{entityModelId}`
- Update tables: `POST /sql/schema/updateTables/{entityModelId}`
For full request and response details, see the REST API reference.
Using the Cyoda UI
You can also create SQL schemas in the Cyoda UI. Once logged in, navigate to the Trino/SQL menu to create and configure new schemas, or to edit existing ones.
Connecting via JDBC
The JDBC connection string follows this pattern:

```
jdbc:trino://trino-client-<caas_user_id>.eu.cyoda.net:443/cyoda/<your_schema>
```

where `caas_user_id` is your CAAS user ID and `your_schema` is the schema name you configured.
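As a convenience, the pattern above can be assembled programmatically; this hypothetical helper only formats the string using the host shape and port documented on this page:

```python
# Hypothetical helper: builds the JDBC URL from the documented pattern.
def jdbc_url(caas_user_id: str, schema: str) -> str:
    return (
        f"jdbc:trino://trino-client-{caas_user_id}.eu.cyoda.net:443"
        f"/cyoda/{schema}"
    )

print(jdbc_url("abc123", "my_schema"))
# jdbc:trino://trino-client-abc123.eu.cyoda.net:443/cyoda/my_schema
```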
For authentication credentials and technical-user setup, see Authentication and identity.
Gaps in this reference
The following are intentionally not yet specified here and are tracked as upstream asks:
- Supported SQL dialect scope — which Trino features are guaranteed supported versus best-effort.
- Push-down matrix — which predicates, projections, and aggregates execute in the underlying store versus requiring a full scan.
- Consistency / isolation of long-running queries relative to concurrent transition writes.
- Performance envelope — rows/sec scan rates, partitioning, and per-tenant query limits.
See the cyoda-go issue tracker for progress.