Some questions are slow on a relational database no matter how you index them. The one I hit most: Top-K spenders by loyalty tier, bucketed by average spend, across hundreds of millions of rows of transactions spanning a decade. OpenSearch answers that in near real time with almost no special configuration, where a relational database needs crafted indexes and still strains. That, more than the full-text search it is known for, is why it earns a place in the platform. In the 2020 edition this layer was the ELK stack (Elasticsearch, Logstash, and Kibana); it is OpenSearch now, running on the storage and behind the gateway the platform already has.
This series rebuilds my 2020 Apress book, Advanced Platform Development with Kubernetes, for 2026. The approach behind it comes from building and running data platforms in production for more than twenty years.
§Why OpenSearch and Not Elasticsearch
The reason is licensing, and it is the cleanest example of why this whole series insists on liberal open source. In 2021 Elastic relicensed Elasticsearch and Kibana away from Apache 2.0 to the SSPL and the Elastic License, source-available terms that are not open source. AWS and the community forked the last Apache-2.0 release into OpenSearch, which carries on under Apache 2.0 with the engine and a Kibana fork called OpenSearch Dashboards. The project has since moved under the OpenSearch Software Foundation at the Linux Foundation, which is the governance maturity you want: a neutral foundation rather than a single company that can change the terms.
I moved to OpenSearch and have not looked back. The deciding factor was the governance, not any single feature: the terms cannot be revoked. The same story repeats across this series with MinIO and a few others. Choosing the liberally licensed fork is risk management. You do not build a platform’s search layer on a license someone else can revoke.
§Aggregation at Scale, Not Just Search
OpenSearch is famous for full-text search, and it is good at it, but that is not why it is in this platform. The reason is aggregation over enormous datasets, and it is worth understanding why it wins there, because it shapes when you should reach for it.
A relational database stores rows and answers a GROUP BY over hundreds of millions of them by scanning and sorting, which you make tolerable with an index built for that exact query shape. Change the question, the grouping, the filter, the time window, and you often need a different index. OpenSearch stores data the other way around. Its inverted index and column-oriented doc values let it narrow to the relevant documents quickly and then roll them up, fanning the work out across shards on every node and merging the result at a coordinating node. Ask for average spend per loyalty tier across a decade of transactions and it answers in near real time, then ask a completely different aggregation of the same data and it answers that one too, without a new index built for it. That flexibility, answering analytical questions that change shape, fast, over huge volumes without bespoke index engineering, is the capability a relational database struggles to match at scale.
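The fan-out and merge described above can be sketched in a few lines of Python. This is an illustration of the scatter-gather idea only, not OpenSearch's implementation: the function names and the two-shard layout are hypothetical. Each shard reduces its own documents to a sum and a count per tier, and the coordinator merges those partials before dividing once.

```python
from collections import defaultdict


def shard_partial(docs):
    """Per-shard step: reduce local docs to {tier: (amount_sum, doc_count)}."""
    partial = defaultdict(lambda: [0.0, 0])
    for doc in docs:
        acc = partial[doc["loyalty"]]
        acc[0] += doc["amount"]
        acc[1] += 1
    return {tier: (total, count) for tier, (total, count) in partial.items()}


def coordinate(partials):
    """Coordinator step: merge per-shard sums and counts, then divide once."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for tier, (total, count) in partial.items():
            merged[tier][0] += total
            merged[tier][1] += count
    return {tier: total / count for tier, (total, count) in merged.items()}


# Hypothetical layout: four transactions spread over two shards.
shards = [
    [{"loyalty": "gold", "amount": 120.50}, {"loyalty": "silver", "amount": 42.25}],
    [{"loyalty": "gold", "amount": 98.00}, {"loyalty": "bronze", "amount": 15.00}],
]
averages = coordinate(shard_partial(docs) for docs in shards)
print(averages)
```

Shipping sums and counts instead of per-shard averages is what keeps the merged result exact: an average of averages would be wrong whenever shards hold different numbers of documents for a tier.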
So the rule of thumb for this platform: Postgres is the system of record, OpenSearch is where you put data you need to slice and aggregate every which way at volume, and the full-text search comes along for free. A note on ingestion before building it: where the old stack used Logstash, the modern OpenSearch path is Data Prepper or Fluent Bit, and the natural pattern here is pulling straight from the Kafka topics into an index, which I wire up below.
§Install the OpenSearch Operator
This series is manifest-first, and I avoid Helm except where a chart is genuinely the canonical install. The OpenSearch Kubernetes operator is one of those places. Running OpenSearch by hand means wiring up its security plugin, node certificates, internal users, and discovery, exactly the error-prone work you do not want to hand-roll. The operator handles all of it and exposes a clean OpenSearchCluster resource, and its documented install is the Helm chart. So Helm installs the operator, once, and everything after is plain manifests.
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
helm install opensearch-operator opensearch-operator/opensearch-operator \
  --namespace opensearch-operator-system --create-namespace
§Declare the Cluster
Give search its own namespace and declare an OpenSearchCluster: three nodes acting as cluster managers and data nodes, each on a rook-ceph-block volume, with Dashboards enabled. (Field names follow the operator’s user guide; check it for the full spec and a current 3.x version.)
kubectl create namespace search
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: platform-os
  namespace: search
spec:
  general:
    serviceName: platform-os
    version: "3.1.0" # a current OpenSearch 3.x release
  dashboards:
    enable: true
    version: "3.1.0"
    replicas: 1
  nodePools:
    - component: nodes
      replicas: 3
      diskSize: "50Gi"
      roles:
        - cluster_manager
        - data
      persistence:
        pvc:
          storageClass: rook-ceph-block # the operator's field is storageClass, not storageClassName
          accessModes:
            - ReadWriteOnce
kubectl apply -f opensearch-cluster.yaml
The operator provisions the nodes, generates the TLS and security configuration, forms the cluster, and brings up a Dashboards pod alongside it.
§Verify
kubectl -n search get pods
NAME                         READY   STATUS    RESTARTS   AGE
platform-os-nodes-0          1/1     Running   0          4m
platform-os-nodes-1          1/1     Running   0          4m
platform-os-nodes-2          1/1     Running   0          4m
platform-os-dashboards-...   1/1     Running   0          3m
The operator stores the generated admin password in a secret. Pull it, then ask the cluster for its health from a temporary pod on the cluster network.
kubectl -n search get secret platform-os-admin-password \
  -o jsonpath='{.data.password}' | base64 -d
kubectl -n search run curl -ti --rm --restart=Never --image=curlimages/curl -- \
  curl -sk -u admin:<password> "https://platform-os:9200/_cluster/health?pretty"
{
  "cluster_name": "platform-os",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 1,
  "active_shards": 2
}
green across three nodes means the search layer is live.
§Index and Aggregate
OpenSearch is a REST API. Index a few transactions, then run the kind of aggregation that is the reason it is here: average spend per loyalty tier, ranked, in a single query.
# index a handful of transactions
curl -sk -u admin:<password> -X POST https://platform-os:9200/transactions/_bulk \
  -H 'Content-Type: application/x-ndjson' --data-binary '
{"index":{}}
{"loyalty":"gold","amount":120.50}
{"index":{}}
{"loyalty":"gold","amount":98.00}
{"index":{}}
{"loyalty":"silver","amount":42.25}
{"index":{}}
{"loyalty":"bronze","amount":15.00}
'
# top loyalty tiers by average spend
curl -sk -u admin:<password> "https://platform-os:9200/transactions/_search?pretty" \
  -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "by_loyalty": {
      "terms": { "field": "loyalty.keyword", "size": 10, "order": { "avg_spend": "desc" } },
      "aggs": { "avg_spend": { "avg": { "field": "amount" } } }
    }
  }
}'
The response comes back with a bucket per loyalty tier, each carrying its average spend, sorted highest first:
"by_loyalty": {
  "buckets": [
    { "key": "gold", "doc_count": 2, "avg_spend": { "value": 109.25 } },
    { "key": "silver", "doc_count": 1, "avg_spend": { "value": 42.25 } },
    { "key": "bronze", "doc_count": 1, "avg_spend": { "value": 15.00 } }
  ]
}
This is trivial at four rows. The point is that the same query answers the same way over hundreds of millions of rows in near real time, with no purpose-built index behind it. That is the capability a relational database struggles to match at scale.
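Because the query is just a JSON document, changing the question means changing two field names. A minimal sketch of that, assuming a hypothetical helper called avg_by_terms (the bucket and metric names by_group and avg_value are also my own), builds the same request body for any grouping field and numeric field:

```python
import json


def avg_by_terms(group_field, value_field, size=10):
    """Build a terms aggregation whose buckets are ranked by average value."""
    return {
        "size": 0,  # return aggregation results only, no matching documents
        "aggs": {
            "by_group": {
                "terms": {
                    "field": group_field,
                    "size": size,
                    "order": {"avg_value": "desc"},
                },
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


# Same shape as the loyalty query, expressed through the helper.
body = avg_by_terms("loyalty.keyword", "amount")
print(json.dumps(body, indent=2))
```

Pointing group_field at another keyword field produces a different rollup of the same index, and nothing on the cluster has to be rebuilt for it.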
§Ingest From Kafka
Hand-indexing with curl proves the API; a platform feeds the index from its event backbone. Data Prepper is the OpenSearch project’s ingestion engine, the Logstash replacement, and it reads from a Kafka topic and writes to an index with a small pipeline definition. Run it as a Deployment with this pipeline in a ConfigMap, pointed at the events topic from the Kafka post and the cluster here.
# data-prepper pipeline
events-pipeline:
  source:
    kafka:
      bootstrap_servers: ["platform-kafka-kafka-bootstrap.kafka:9092"]
      topics:
        - name: events
          group_id: data-prepper
  sink:
    - opensearch:
        hosts: ["https://platform-os.search:9200"]
        username: admin
        password: ${OS_PASSWORD}
        index: "events-%{yyyy.MM.dd}"
Every event produced to Kafka now lands in a daily events-* index, queryable and aggregatable the moment it arrives. The event backbone and the search layer are wired together, each on storage you own.
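The date pattern in the sink means each day gets its own index, which is handy when a query only needs a known window. A small sketch of the naming, assuming the pattern resolves against UTC at write time (check the Data Prepper documentation for your version) and using hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix, when):
    """Name of the daily index a timezone-aware timestamp falls into."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return f"{prefix}-{when.astimezone(timezone.utc):%Y.%m.%d}"


def indices_for_range(prefix, start, end):
    """Every daily index name between two timestamps, inclusive."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(f"{prefix}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


start = datetime(2026, 1, 30, tzinfo=timezone.utc)
end = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
print(indices_for_range("events", start, end))
```

Joining those names with commas in the request path limits a search to the days that matter instead of every events-* index.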
§Expose Dashboards Through the Gateway
OpenSearch Dashboards is a web UI on port 5601. Put it behind the Gateway with an HTTPRoute, the same way any service gets a hostname and TLS. The Gateway’s listeners already allow routes from any namespace, so this route in search attaches cleanly.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: opensearch-dashboards
  namespace: search
spec:
  parentRefs:
    - name: platform-gateway
      namespace: gateway
  hostnames:
    - "search.apk8s.dev"
  rules:
    - backendRefs:
        - name: platform-os-dashboards
          port: 5601
With DNS pointed at the Gateway, https://search.apk8s.dev serves Dashboards over the auto-renewing certificate cert-manager issued, and you log in with the admin credentials from the secret above.
§Operating the Search Layer
A managed search service quietly ages out old data, scales the cluster, snapshots it, and manages who can read what. Own it and OpenSearch does all of that, much of it declaratively through the operator’s resources.
Age data out automatically. A search or log index grows without bound unless something retires old data, and doing it by hand is how you wake up to a full disk. OpenSearch handles it with Index State Management: a policy that rolls an index over at a size or age, moves it through hot and warm stages, and deletes it after a retention window. For the daily events-* indices, a policy that deletes anything older than thirty days keeps the cluster bounded with no cron job and no babysitting. The ism_template block attaches the policy to each new index matching events-* as it is created.
{
  "policy": {
    "description": "delete daily events indices after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "transitions": [{ "state_name": "delete", "conditions": { "min_index_age": "30d" } }]
      },
      { "name": "delete", "actions": [{ "delete": {} }] }
    ],
    "ism_template": { "index_patterns": ["events-*"], "priority": 100 }
  }
}
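To reason about what a thirty-day policy will remove before it runs, the age check can be modeled offline. This is a rough sketch, not how ISM is implemented: the function name and sample data are hypothetical, and it assumes age is measured from each index's creation time, which is what min_index_age compares against.

```python
from datetime import datetime, timedelta, timezone


def expired_indices(creation_dates, now, max_age=timedelta(days=30)):
    """Names of indices whose age has reached max_age, oldest first."""
    expired = [
        (created, name)
        for name, created in creation_dates.items()
        if now - created >= max_age
    ]
    return [name for _, name in sorted(expired)]


# Hypothetical creation times for three daily indices.
created = {
    "events-2026.01.20": datetime(2026, 1, 20, tzinfo=timezone.utc),
    "events-2026.01.30": datetime(2026, 1, 30, tzinfo=timezone.utc),
    "events-2026.02.15": datetime(2026, 2, 15, tzinfo=timezone.utc),
}
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(expired_indices(created, now))
```

ISM evaluates policies on a periodic job rather than continuously, so a real index is deleted some time after it crosses the threshold, not at the exact instant.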
The operator also exposes OpenSearchISMPolicy, OpensearchUser, and OpensearchRole as custom resources (the capitalization differs between kinds, so copy it from the operator’s user guide), so retention policies, users, and their permissions are manifests in version control rather than API calls against a live cluster, the same declarative posture as topics and users in Kafka.
Scale it. When you run low on capacity, raise replicas in the node pool. New data nodes join and OpenSearch rebalances shards onto them automatically, with no manual reassignment, because shard movement is what the engine does natively.
Snapshot it. OpenSearch snapshots to an S3-compatible repository, which is the SeaweedFS object store from the next post. You register the repository once and schedule snapshots, and a restore rebuilds an index or the whole cluster. It is the same own-your-backups discipline as etcd, Postgres, and Ceph: the data lives on your infrastructure, so the backups do too.
See it. The operator publishes Prometheus metrics for cluster health, query latency, and indexing rate, scraped by the monitoring stack later in the series. Cluster status going from green to yellow is the kind of thing you want a dashboard to catch before a user does.
§When Something Is Wrong
A node crashes on startup with a vm.max_map_count error. OpenSearch needs the host kernel setting vm.max_map_count=262144, higher than the default, for its memory-mapped files. The operator sets it with a privileged init container by default; if that is disabled on your nodes, set it in the node sysctl the same way the cluster build configured the other kernel parameters. This is the single most common first-run failure.
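A quick way to confirm the setting on a node is to read it from /proc. The sketch below assumes a Linux host; the helper names are hypothetical. Because vm.max_map_count is a kernel-wide setting rather than a per-container one, running this inside a pod reports the value of the node underneath it.

```python
from pathlib import Path

REQUIRED = 262144  # the value OpenSearch expects
SYSCTL_PATH = Path("/proc/sys/vm/max_map_count")


def meets_requirement(current, required=REQUIRED):
    """True when the kernel setting is at least what OpenSearch needs."""
    return current >= required


def read_max_map_count(path=SYSCTL_PATH):
    """Read the current vm.max_map_count value from /proc."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    if SYSCTL_PATH.exists():
        current = read_max_map_count()
        verdict = "ok" if meets_requirement(current) else "too low"
        print(f"vm.max_map_count={current} ({verdict}, need {REQUIRED})")
    else:
        print("no /proc entry found; run this on the Linux node itself")
```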
The cluster is yellow. Replica shards are unassigned, usually because a node is down or because there are too few nodes to place each replica on a different node than its primary. With three nodes and default replication this should be green; yellow after a node loss is OpenSearch protecting you, and it recovers when the node returns.
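The placement rule behind a yellow status is simple arithmetic: copies of the same shard never share a node, so an index with R replicas needs at least R + 1 data nodes. A simplified model, with hypothetical function names and ignoring allocation filters and disk watermarks:

```python
def placeable_copies(data_nodes, replicas):
    """How many copies of a shard (primary plus replicas) can be placed."""
    return min(1 + replicas, data_nodes)


def index_status(data_nodes, replicas):
    """Health an index would report under the one-copy-per-node rule."""
    if data_nodes < 1:
        return "red"  # not even the primary has a home
    if placeable_copies(data_nodes, replicas) < 1 + replicas:
        return "yellow"  # primary assigned, at least one replica is not
    return "green"


for nodes in (3, 2, 1):
    print(nodes, "data nodes, 1 replica:", index_status(nodes, 1))
```

With one replica per shard, losing one of three nodes still leaves room for every copy once recovery finishes; the cluster only stays yellow when a single data node remains.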
Indices suddenly go read-only and reject writes. The disk-based flood-stage watermark. When a node’s disk passes about 95 percent, OpenSearch blocks writes to protect the cluster, setting index.blocks.read_only_allow_delete. Free space, or grow the Ceph volume, then clear the block. The real fix is the ISM retention policy above, so disks never get there.
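The flood stage is the last of three disk watermarks. A small sketch of how usage maps onto them, assuming the stock defaults of 85, 90, and 95 percent (only the 95 percent figure comes from the text above; the other two are the defaults as I understand them and can be changed in cluster settings):

```python
import json

# (threshold percent, stage name), checked from most to least severe.
WATERMARKS = [(95.0, "flood_stage"), (90.0, "high"), (85.0, "low")]


def disk_stage(used_bytes, total_bytes):
    """Classify a data node's disk usage against the watermarks."""
    used_pct = 100.0 * used_bytes / total_bytes
    for threshold, name in WATERMARKS:
        if used_pct >= threshold:
            return name
    return "ok"


# Settings body that removes the write block once space has been freed.
CLEAR_BLOCK = {"index.blocks.read_only_allow_delete": None}

print(disk_stage(48, 50))       # 96 percent used
print(json.dumps(CLEAR_BLOCK))  # None serializes to JSON null
```

Sending that body with PUT to the affected index's _settings endpoint clears the block; setting the value to null resets it to its default.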
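A node is killed by the OOM killer. The JVM heap is too large for the pod’s memory limit, leaving no room for the off-heap memory OpenSearch also uses (a heap that is too small fails differently, with a Java OutOfMemoryError inside the JVM). OpenSearch wants its heap at roughly half the container memory and no more than about 32 GiB; set the node pool’s resources and heap together so they agree.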
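The sizing rule is easy to encode so the heap and the memory limit are derived from one number. A minimal sketch with hypothetical helper names; the 31 GiB cap is my own conservative reading of "about 32 GiB", chosen to stay under the point where the JVM stops using compressed object pointers:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def heap_bytes(container_memory_bytes, cap_bytes=31 * GIB):
    """Half the container memory, capped below the ~32 GiB threshold."""
    return min(container_memory_bytes // 2, cap_bytes)


def jvm_flags(container_memory_bytes):
    """Matching -Xms/-Xmx flags so the heap never resizes at runtime."""
    heap_mib = heap_bytes(container_memory_bytes) // MIB
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


print(jvm_flags(16 * GIB))   # heap for a 16 GiB memory limit
print(jvm_flags(128 * GIB))  # large pods hit the cap instead of half
```

Whatever value comes out goes into the node pool's JVM options next to its memory request and limit; the exact field names are in the operator's user guide.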
§What You Have
A three-node OpenSearch cluster with Dashboards, fed from Kafka, on your own replicated storage, behind your own gateway with TLS, managed by an operator so the security and certificate machinery is correct without you hand-writing it. Old data ages itself out, the cluster scales by a number, and it snapshots to storage you own. It is the index from the platform’s toolkit, strongest at the aggregation-at-scale work that pushed me to it, on a fork no vendor can relicense out from under me.
Next I give the platform an object store for the data lake and for backups, SeaweedFS, after MinIO took the same license turn Elasticsearch did.