# search — entity search API

*cyoda-go version [0.6.2](https://github.com/Cyoda-platform/cyoda-go/releases/tag/v0.6.2)*

## NAME

search — entity search API: synchronous direct search, asynchronous snapshot search, and entity statistics.

## SYNOPSIS

```
POST   /api/search/direct/{entityName}/{modelVersion}
POST   /api/search/async/{entityName}/{modelVersion}
GET    /api/search/async/{jobId}
GET    /api/search/async/{jobId}/status
PUT    /api/search/async/{jobId}/cancel
```

Context path prefix is `CYODA_CONTEXT_PATH` (default `/api`). All endpoints require `Authorization: Bearer <token>` except when `CYODA_IAM_MODE=mock`.
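As a minimal sketch of how a client might assemble these URLs, the helper below joins a base URL, the context path, and the path parameters. The function names `search_url` and `auth_headers` are hypothetical and not part of cyoda-go; the base URL is whatever host the server is reachable on.

```python
from urllib.parse import quote, urlencode


def search_url(base, mode, entity_name, model_version, context_path="/api", **query):
    """Build a search submission URL; `mode` is "direct" or "async"."""
    if mode not in ("direct", "async"):
        raise ValueError(f"unknown search mode: {mode!r}")
    path = "/".join([
        context_path.rstrip("/"),
        "search",
        mode,
        quote(entity_name, safe=""),
        str(int(model_version)),
    ])
    # Query parameters travel as strings; parameters left as None are omitted.
    params = {k: str(v) for k, v in query.items() if v is not None}
    url = base.rstrip("/") + path
    return url + ("?" + urlencode(params) if params else "")


def auth_headers(token):
    """Headers for a bearer-authenticated JSON request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

For example, `search_url("http://localhost:8080", "direct", "nobel-prize", 1, limit=100)` yields a URL under the default `/api` context path; pass `context_path` explicitly if `CYODA_CONTEXT_PATH` has been changed.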

## DESCRIPTION

Search operates against a specific entity model `(entityName, modelVersion)`. Two modes are supported:

**Synchronous (direct) search**: `POST /api/search/direct/{entityName}/{modelVersion}`. Executes inline within the HTTP request. The response is an NDJSON stream (`application/x-ndjson`) with one entity envelope per line. The default result limit is 1000 entities per request; the maximum is 10000, and larger values are clamped to 10000.

**Asynchronous search**: `POST /api/search/async/{entityName}/{modelVersion}`. Submits a search job and immediately returns a job UUID. The search executes in a background goroutine (or in the plugin's own executor for `SelfExecutingSearchStore` plugins). Results are retrieved by polling the job status and then fetching pages.

Both modes accept the same `Condition` DSL as the request body. When the storage plugin implements `spi.Searcher`, the condition is translated to a plugin-level predicate and pushed down to the backend. When translation fails (unsupported condition type) or an active transaction is present, the service falls back to in-memory filtering after a full `GetAll` scan.

## CONDITION DSL

All search requests accept a `Condition` JSON document as the POST body. Conditions are parsed recursively up to a maximum nesting depth of 50. Body size limit: 10 MiB.

**SimpleCondition** — match a single JSON path against a scalar value:

```json
{
  "type": "simple",
  "jsonPath": "$.category",
  "operatorType": "EQUALS",
  "value": "physics"
}
```

- `type`: `"simple"`
- `jsonPath`: JSONPath string (e.g., `"$.year"`, `"$.laureates[0].firstname"`)
- `operatorType` (also accepted as `operator` or `operation`): operator string (see valid values below)
- `value`: any JSON scalar

**Valid `operatorType` values** (exhaustive):
- `EQUALS` — exact equality; numeric-aware (JSON number vs string representation)
- `NOT_EQUAL` — inequality; inverse of EQUALS
- `GREATER_THAN` — numeric or lexicographic greater-than
- `LESS_THAN` — numeric or lexicographic less-than
- `GREATER_OR_EQUAL` — greater-than or equal
- `LESS_OR_EQUAL` — less-than or equal
- `CONTAINS` — substring or array-element containment
- `NOT_CONTAINS` — inverse of CONTAINS
- `STARTS_WITH` — string prefix match
- `NOT_STARTS_WITH` — inverse of STARTS_WITH
- `ENDS_WITH` — string suffix match
- `NOT_ENDS_WITH` — inverse of ENDS_WITH
- `LIKE` — SQL-style LIKE pattern (`%` = any sequence, `_` = any single char)
- `IS_NULL` — field is absent or JSON null
- `NOT_NULL` — field is present and not JSON null
- `BETWEEN` — range check (exclusive bounds); `value` must be a two-element array `[low, high]`
- `BETWEEN_INCLUSIVE` — range check (inclusive bounds); same `value` shape as BETWEEN
- `MATCHES_PATTERN` — regular expression match
- `IEQUALS` — case-insensitive EQUALS
- `INOT_EQUAL` — case-insensitive NOT_EQUAL
- `ICONTAINS` — case-insensitive CONTAINS
- `INOT_CONTAINS` — case-insensitive NOT_CONTAINS
- `ISTARTS_WITH` — case-insensitive STARTS_WITH
- `INOT_STARTS_WITH` — case-insensitive NOT_STARTS_WITH
- `IENDS_WITH` — case-insensitive ENDS_WITH
- `INOT_ENDS_WITH` — case-insensitive NOT_ENDS_WITH

Operator strings outside this list are rejected with `errors.BAD_REQUEST` at request time; the error detail includes the canonical list.
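The `LIKE` wildcards and the difference between `BETWEEN` and `BETWEEN_INCLUSIVE` can be illustrated with a small client-side sketch. This is not the server's implementation: it only mirrors the semantics listed above, and it assumes no escape character for literal `%` or `_`, which this page does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern: % is any sequence, _ is any single character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, bounds, inclusive=False):
    """BETWEEN excludes both bounds; BETWEEN_INCLUSIVE includes them."""
    low, high = bounds
    if inclusive:
        return low <= value <= high
    return low < value < high
```

With these helpers, `like("physics", "phys%")` holds while `like("physics", "phys")` does not, and a value equal to the lower bound passes only the inclusive variant.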

**LifecycleCondition** — match entity lifecycle metadata:

```json
{
  "type": "lifecycle",
  "field": "state",
  "operatorType": "EQUALS",
  "value": "APPROVED"
}
```

- `type`: `"lifecycle"`
- `field`: `"state"`, `"creationDate"`, or `"previousTransition"`
- `operatorType` (also accepted as `operator` or `operation`): operator string — same valid values as for `SimpleCondition`
- `value`: any JSON scalar
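A client can catch invalid operators and lifecycle fields before sending a request. The sketch below builds `SimpleCondition` and `LifecycleCondition` documents and checks them against the values listed on this page; the helper names are hypothetical, and the server remains the authority on what it accepts.

```python
import json

# Operator names copied from the list of valid operatorType values.
VALID_OPERATORS = frozenset({
    "EQUALS", "NOT_EQUAL", "GREATER_THAN", "LESS_THAN",
    "GREATER_OR_EQUAL", "LESS_OR_EQUAL", "CONTAINS", "NOT_CONTAINS",
    "STARTS_WITH", "NOT_STARTS_WITH", "ENDS_WITH", "NOT_ENDS_WITH",
    "LIKE", "IS_NULL", "NOT_NULL", "BETWEEN", "BETWEEN_INCLUSIVE",
    "MATCHES_PATTERN", "IEQUALS", "INOT_EQUAL", "ICONTAINS",
    "INOT_CONTAINS", "ISTARTS_WITH", "INOT_STARTS_WITH",
    "IENDS_WITH", "INOT_ENDS_WITH",
})
LIFECYCLE_FIELDS = ("state", "creationDate", "previousTransition")


def _check_operator(operator_type):
    if operator_type not in VALID_OPERATORS:
        raise ValueError(f"unknown operatorType: {operator_type!r}")


def simple_condition(json_path, operator_type, value):
    """Build a SimpleCondition document."""
    _check_operator(operator_type)
    return {"type": "simple", "jsonPath": json_path,
            "operatorType": operator_type, "value": value}


def lifecycle_condition(field, operator_type, value):
    """Build a LifecycleCondition document."""
    if field not in LIFECYCLE_FIELDS:
        raise ValueError(f"unknown lifecycle field: {field!r}")
    _check_operator(operator_type)
    return {"type": "lifecycle", "field": field,
            "operatorType": operator_type, "value": value}


# Serialize a condition for use as the POST body.
body = json.dumps(lifecycle_condition("state", "EQUALS", "APPROVED"))
```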

**GroupCondition** — combine conditions with a logical operator:

```json
{
  "type": "group",
  "operator": "AND",
  "conditions": [
    { "type": "simple", "jsonPath": "$.year", "operatorType": "EQUALS", "value": "2024" },
    { "type": "lifecycle", "field": "state", "operatorType": "EQUALS", "value": "NEW" }
  ]
}
```

- `type`: `"group"`
- `operator`: `"AND"` or `"OR"` — these are the only supported values; any other string produces `errors.BAD_REQUEST` at match time ("unknown group operator")
- `conditions`: array of `Condition` objects (recursive; maximum nesting depth 50)

`"NOT"` is not supported. An `AND` group with an empty `conditions` array evaluates to `true` (vacuous conjunction). An `OR` group with an empty `conditions` array evaluates to `false` (vacuous disjunction).

**EMPTY CONDITION**: Submitting an empty body (`{}`) or a body with no `type` field as the top-level search condition is rejected with `errors.BAD_REQUEST` — the parser requires a valid `type` field. Submitting a valid `AND` group with an empty `conditions` array (`{"type":"group","operator":"AND","conditions":[]}`) is accepted and matches all entities — this is the correct way to retrieve all entities without filtering.
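The empty-group behaviour follows from how conjunction and disjunction are defined over an empty list. The toy evaluator below is a sketch for illustration only, not the service's matcher: it handles just `EQUALS` on flat `$.field` paths plus `AND`/`OR` groups, and the `lookup` and `matches` names are hypothetical.

```python
def lookup(entity, json_path):
    """Resolve a flat `$.field` path only; real JSONPath is far richer."""
    if not json_path.startswith("$."):
        raise ValueError(f"unsupported path in this sketch: {json_path!r}")
    return entity.get(json_path[2:])


def matches(condition, entity):
    """Evaluate a condition document against one entity's data."""
    kind = condition.get("type")
    if kind == "simple":
        if condition["operatorType"] != "EQUALS":
            raise NotImplementedError("sketch only covers EQUALS")
        return lookup(entity, condition["jsonPath"]) == condition["value"]
    if kind == "group":
        results = (matches(c, entity) for c in condition["conditions"])
        operator = condition["operator"]
        if operator == "AND":
            return all(results)  # all() of nothing is True
        if operator == "OR":
            return any(results)  # any() of nothing is False
        raise ValueError(f"unknown group operator: {operator!r}")
    raise ValueError("condition requires a valid type field")
```

Evaluating `{"type": "group", "operator": "AND", "conditions": []}` against any entity returns `True`, which is why that document works as a match-all filter, while the `OR` equivalent returns `False`.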

**ArrayCondition** — match positional values in a JSON array:

```json
{
  "type": "array",
  "jsonPath": "$.laureates",
  "values": ["John", null, "Hopfield"]
}
```

- `type`: `"array"`
- `jsonPath`: path to the array field
- `values`: positional values; `null` entries match any value at that index

**FunctionCondition** — server-side function predicate dispatched to a compute member:

```json
{
  "type": "function",
  "function": {
    "name": "my-criteria-fn",
    "config": {
      "calculationNodesTags": "approval-service",
      "attachEntity": true,
      "responseTimeoutMs": 30000
    }
  }
}
```

- `type`: `"function"`
- `function.name`: string — identifies the function; becomes `criteriaId` / `criteriaName` in the dispatch request; required for routing
- `function.config.calculationNodesTags`: string — comma-separated tags used to select a registered compute member; follows the same tag-intersection rules as processor dispatch
- `function.config.attachEntity`: boolean (optional, default `true`) — when `true`, the full entity payload is included in the dispatch request
- `function.config.responseTimeoutMs`: int64 (optional, default `30000`) — timeout in milliseconds

The function is dispatched as `EntityCriteriaCalculationRequest` to the matching compute member — see the `grpc` topic for the request/response shape. `FunctionCondition` cannot be translated to a storage-plugin pushdown filter; it always executes as a post-filter with in-memory entity loading.

## ENDPOINTS

**POST /api/search/direct/{entityName}/{modelVersion}** — Synchronous search

- `entityName` (path): string
- `modelVersion` (path): int32
- `pointInTime` (query, optional): RFC 3339 date-time — search against entity state at this instant
- `limit` (query, optional): string-encoded integer, clamped to maximum 10000; default 1000
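The default and clamping rules for `limit` amount to the following sketch. Rejecting non-positive values is an assumption made here; this page only states that an invalid limit produces `errors.BAD_REQUEST`.

```python
DEFAULT_LIMIT = 1000
MAX_LIMIT = 10000


def effective_limit(raw=None):
    """Resolve the string-encoded `limit` query parameter."""
    if raw is None:
        return DEFAULT_LIMIT
    try:
        limit = int(raw)
    except ValueError:
        raise ValueError(f"invalid limit: {raw!r}") from None
    if limit <= 0:
        # Assumption: non-positive limits are treated as invalid.
        raise ValueError(f"invalid limit: {raw!r}")
    # Values above the maximum are clamped rather than rejected.
    return min(limit, MAX_LIMIT)
```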

Request body: `Condition` JSON document.

Response: `200 OK`, `Content-Type: application/x-ndjson`.

Each line is a complete entity envelope JSON object:

```
{"type":"ENTITY","data":{"category":"physics","year":"2024"},"meta":{"id":"74807f00-ed0d-11ee-a357-ae468cd3ed16","state":"NEW","creationDate":"2025-08-01T10:00:00.000000000Z","lastUpdateTime":"2025-08-01T10:00:00.000000000Z"}}
{"type":"ENTITY","data":{"category":"chemistry","year":"2023"},"meta":{"id":"89abc100-ed0d-11ee-a357-ae468cd3ed16","state":"APPROVED","creationDate":"2025-07-15T09:00:00.000000000Z","lastUpdateTime":"2025-07-20T14:00:00.000000000Z"}}
```

If encoding an entity fails after the response headers have been sent, the status code can no longer change, so the stream is simply truncated. Clients detect truncation through a connection error or an incomplete last line.
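A client-side reader for the NDJSON body might look like the sketch below. It parses one envelope per line and reports an unparseable final line as truncation. The `parse_ndjson` name is hypothetical, and a truncation that happens to fall exactly on a line boundary cannot be detected this way; that case only surfaces as a connection error.

```python
import json


def parse_ndjson(text):
    """Return (entities, truncated) for an NDJSON response body."""
    lines = [line for line in text.splitlines() if line.strip()]
    entities = []
    for index, line in enumerate(lines):
        try:
            entities.append(json.loads(line))
        except json.JSONDecodeError:
            if index == len(lines) - 1:
                # Incomplete last line: keep what was parsed, flag truncation.
                return entities, True
            # A broken line in the middle of the stream is a real error.
            raise
    return entities, False
```

Callers that need all-or-nothing results should discard the partial list when `truncated` is true and retry the search.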

**POST /api/search/async/{entityName}/{modelVersion}** — Submit async search job

- `entityName` (path): string
- `modelVersion` (path): int32
- `pointInTime` (query, optional): RFC 3339 — if not provided, the current time is captured at submission

Request body: `Condition` JSON document.

Response: `200 OK`, `application/json` — bare UUID string (job ID):

```
"a1b2c3d4-e5f6-11ee-9e63-ae468cd3ed16"
```

The job is stored with status `RUNNING`. For non-`SelfExecutingSearchStore` backends, a goroutine begins the search immediately using a background context derived from the submitting user's tenant context.

**GET /api/search/async/{jobId}/status** — Get async job status

- `jobId` (path): UUID

Response: `200 OK`, `application/json`:

```json
{
  "searchJobStatus": "SUCCESSFUL",
  "createTime": "2025-08-01T10:00:00.000000000Z",
  "entitiesCount": 42,
  "calculationTimeMillis": 145,
  "finishTime": "2025-08-01T10:00:00.145000000Z",
  "expirationDate": "2025-08-02T10:00:00.000000000Z"
}
```

- `searchJobStatus`: `"RUNNING"`, `"SUCCESSFUL"`, `"FAILED"`, or `"CANCELLED"`
- `createTime`: RFC 3339 with nanoseconds
- `entitiesCount`: total matching entities (0 while running)
- `calculationTimeMillis`: elapsed search time in milliseconds
- `finishTime`: RFC 3339 with nanoseconds; absent when status is `RUNNING`
- `expirationDate`: `createTime + 24h` — job results expire after this time
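The status fields can be interpreted on the client as sketched below. The timestamps carry nanosecond precision, which Python's `datetime` cannot hold, so this sketch truncates to microseconds. The helper names are hypothetical, and only UTC timestamps with a `Z` suffix, as in the example above, are handled.

```python
import re
from datetime import datetime, timedelta, timezone

TERMINAL_STATUSES = frozenset({"SUCCESSFUL", "FAILED", "CANCELLED"})
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_timestamp(value):
    """Parse an RFC 3339 UTC timestamp, truncating nanoseconds to microseconds."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    base = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = (match.group(2) or "")[:6].ljust(6, "0")
    return base.replace(tzinfo=timezone.utc) + timedelta(microseconds=int(fraction))


def is_terminal(status):
    """True once the job can no longer change state."""
    return status["searchJobStatus"] in TERMINAL_STATUSES


def is_expired(status, now):
    """True when `now` (an aware datetime) is past the job's expirationDate."""
    return now >= parse_timestamp(status["expirationDate"])
```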

**GET /api/search/async/{jobId}** — Retrieve async job results (paginated)

- `jobId` (path): UUID
- `pageSize` (query, optional): string-encoded integer, default `1000`
- `pageNumber` (query, optional): string-encoded integer, default `0`; offset = `pageNumber * pageSize`

The job must be in `SUCCESSFUL` status. Returns `400 BAD_REQUEST` if the job is not yet complete.

Response: `200 OK`, `application/json`:

```json
{
  "content": [
    {
      "type": "ENTITY",
      "data": { "category": "physics", "year": "2024" },
      "meta": {
        "id": "74807f00-ed0d-11ee-a357-ae468cd3ed16",
        "state": "NEW",
        "creationDate": "2025-08-01T10:00:00.000000000Z",
        "lastUpdateTime": "2025-08-01T10:00:00.000000000Z"
      }
    }
  ],
  "page": {
    "number": 0,
    "size": 1000,
    "totalElements": 42,
    "totalPages": 1
  }
}
```

Results are fetched from the stored entity snapshots at the job's `pointInTime`, which defaults to the submission time. Entities deleted or modified after that instant are returned as they existed at `pointInTime`.
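Because results are only available once the job is `SUCCESSFUL`, a client typically polls the status endpoint first. The loop below is a sketch with the HTTP call abstracted behind a hypothetical `get_status` callable that returns the parsed status document; the interval and timeout defaults are arbitrary choices, not server recommendations.

```python
import time


def wait_for_job(get_status, poll_interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is SUCCESSFUL; raise if it fails or times out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        state = status["searchJobStatus"]
        if state == "SUCCESSFUL":
            return status
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"search job ended with status {state}")
        if clock() >= deadline:
            raise TimeoutError("search job still RUNNING after timeout")
        sleep(poll_interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in production code `get_status` would wrap a `GET` on the job's `/status` endpoint.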

**PUT /api/search/async/{jobId}/cancel** — Cancel a running async job

- `jobId` (path): UUID

Cancellation succeeds only when the job status is `RUNNING`. If the job has already reached a terminal state (`SUCCESSFUL`, `FAILED`, or `CANCELLED`), the server returns `400 Bad Request`:

```json
{
  "detail": "snapshot by id=<jobId> is not running. current status=SUCCESSFUL",
  "properties": {
    "currentStatus": "SUCCESSFUL",
    "snapshotId": "<jobId>"
  },
  "status": 400,
  "title": "Bad Request",
  "type": "about:blank"
}
```

On successful cancellation, response: `200 OK`, `application/json`:

```json
{
  "isCancelled": true,
  "cancelled": true,
  "currentSearchJobStatus": "CANCELLED"
}
```
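A caller can fold the two cancel outcomes into one result, as in this sketch. It assumes the response body has already been parsed into a dict; the `interpret_cancel` name is hypothetical, and only the two response shapes shown above are recognized.

```python
def interpret_cancel(status_code, body):
    """Return (cancelled, job_status) from a cancel response."""
    if status_code == 200:
        return bool(body.get("isCancelled")), body.get("currentSearchJobStatus")
    if status_code == 400:
        # The error body reports the terminal status the job had already reached.
        properties = body.get("properties", {})
        return False, properties.get("currentStatus")
    raise RuntimeError(f"unexpected cancel response: HTTP {status_code}")
```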

## PAGINATION

Async search results use page-number pagination: `pageNumber=0` is the first page, `offset = pageNumber * pageSize`. `pageNumber` and `pageSize` are both string-encoded integers in query parameters.
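The page arithmetic can be sketched as follows. The `offset` formula is the one stated above; computing `totalPages` as a ceiling division is an assumption that agrees with the example response (42 elements at page size 1000 give 1 page). `fetch_page` is a hypothetical callable that returns the parsed result document for a given page.

```python
import math


def page_offset(page_number, page_size):
    """Zero-based index of the first element on a page."""
    return page_number * page_size


def total_pages(total_elements, page_size):
    """Assumed rule: ceiling of totalElements / pageSize."""
    return math.ceil(total_elements / page_size)


def iter_results(fetch_page, page_size=1000):
    """Yield every entity envelope of a SUCCESSFUL job, page by page."""
    page_number = 0
    while True:
        page = fetch_page(page_number, page_size)
        yield from page["content"]
        page_number += 1
        if page_number >= page["page"]["totalPages"]:
            break
```

`iter_results` always requests page 0 first and then relies on the `totalPages` value the server reports, so it does not depend on the assumed ceiling rule.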

Synchronous search does not paginate; use the `limit` parameter (max 10000) to bound results. For large datasets, use async search with page retrieval.

## ERRORS

- `errors.SEARCH_JOB_NOT_FOUND` — `404` — async job UUID does not exist
- `errors.SEARCH_JOB_ALREADY_TERMINAL` — `400` — cancel attempted on a job that is already `SUCCESSFUL`, `FAILED`, or `CANCELLED`; error code in response is `BAD_REQUEST`
- `errors.SEARCH_RESULT_LIMIT` — result set exceeds configured limit
- `errors.SEARCH_SHARD_TIMEOUT` — per-shard search timeout exceeded (relevant for distributed backends)
- `errors.BAD_REQUEST` — `400` — malformed condition JSON, invalid limit/pageSize/pageNumber, result retrieval on non-SUCCESSFUL job, unknown async job ID in result retrieval

## EXAMPLES

**Synchronous search — match by field value:**

```
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type":"simple","jsonPath":"$.category","operatorType":"EQUALS","value":"physics"}' \
  "http://localhost:8080/api/search/direct/nobel-prize/1"
```

**Synchronous search — match by lifecycle state:**

```
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type":"lifecycle","field":"state","operatorType":"EQUALS","value":"APPROVED"}' \
  "http://localhost:8080/api/search/direct/nobel-prize/1"
```

**Synchronous search — AND group:**

```
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "group",
    "operator": "AND",
    "conditions": [
      {"type":"simple","jsonPath":"$.year","operatorType":"EQUALS","value":"2024"},
      {"type":"lifecycle","field":"state","operatorType":"EQUALS","value":"NEW"}
    ]
  }' \
  "http://localhost:8080/api/search/direct/nobel-prize/1"
```

**Synchronous search at point in time with limit:**

```
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type":"group","operator":"AND","conditions":[]}' \
  "http://localhost:8080/api/search/direct/nobel-prize/1?pointInTime=2025-08-01T00:00:00Z&limit=100"
```

**Submit async search:**

```
JOB_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"type":"simple","jsonPath":"$.year","operatorType":"EQUALS","value":"2024"}' \
  "http://localhost:8080/api/search/async/nobel-prize/1" | tr -d '"')
```

**Poll async job status:**

```
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8080/api/search/async/$JOB_ID/status"
```

**Retrieve async results (page 0):**

```
curl -s -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8080/api/search/async/$JOB_ID?pageNumber=0&pageSize=500"
```

**Cancel an async job:**

```
curl -s -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8080/api/search/async/$JOB_ID/cancel"
```

## SEE ALSO

- [`cyoda help crud`](/help/crud/) — Entities are instances of models. Each entity has a UUID, a model reference (`entityName`, `modelVersion`), and a lifecycle state managed by the workflow engine. Creating an entity requires the referenced model to be in `LOCKED` state. All write operations run within a Cyoda transaction and return a `transactionId` alongside the affected entity IDs.
- [`cyoda help models`](/help/models/) — A model is a named, versioned schema registered per tenant. Every entity in the system is an instance of exactly one model. Models are identified by `(entityName, modelVersion)`. The model ID is a deterministic UUID v5 derived from that key: `UUID.newSHA1(NameSpaceURL, "{entityName}.{modelVersion}")`.
- [`cyoda help analytics`](/help/analytics/) — Cyoda Cloud exposes entity data as Trino SQL tables through a Trino connector. The connector uses the Schema Management REST API to discover table definitions and the WebSocket (STOMP) messaging API to stream entity rows at query time.
- [`cyoda help errors SEARCH_JOB_NOT_FOUND`](/help/errors/search_job_not_found/) — Polling a search job by ID returns this error when the job ID is unknown or belongs to a different tenant. Jobs are tenant-scoped; a valid job ID from one tenant is not visible to another.
- [`cyoda help errors SEARCH_JOB_ALREADY_TERMINAL`](/help/errors/search_job_already_terminal/) — Search jobs are long-running asynchronous operations. Once a job reaches a terminal state it cannot be cancelled, resumed, or otherwise modified. This error is returned when such an operation is attempted on a finished job.
- [`cyoda help errors SEARCH_RESULT_LIMIT`](/help/errors/search_result_limit/) — The server imposes an upper bound on the number of results returned per page and per job to protect cluster resources. Returned when the request exceeds this limit — either by requesting too large a page size or by the matched result count exceeding the cap.
- [`cyoda help errors SEARCH_SHARD_TIMEOUT`](/help/errors/search_shard_timeout/) — Distributed search fans out to multiple shards in parallel. If any shard does not return results before the search timeout expires, the job is marked failed and this error is returned. Occurs under high load, during partial cluster degradation, or with expensive queries.
- [`cyoda help openapi`](/help/openapi/) — cyoda-go generates its OpenAPI 3.1 specification from the embedded `api/openapi.yaml` file compiled into the binary at build time. The spec is served at `/openapi.json` with runtime-patched server URLs. The Scalar API Reference UI is served at `/docs` and loads the spec from `/openapi.json`.

## Raw formats

- [`/help/search.json`](/help/search.json) — full descriptor (matches `GET /help/{topic}` envelope)
- [`/help/search.md`](/help/search.md) — body only