Also at Deasil Works · txn2 · Plexara
Profiles GitHub · X · LinkedIn
Theme Light · Auto · Dark
Professional notes by Craig Johnston
long-form, short-form, working drafts · since 2008
VOL. XIX · MMXXVI
119 NOTES IN PRINT
FOLIO CXIX 2026-07-05 · 9 MIN · LONG-FORM

Your Own Snowflake: A Trino and Iceberg Lakehouse

SQL analytics over open table formats on the object storage you own

Diagram · folio cxix
sequenceDiagram
  participant C as SQL client / BI
  participant T as Trino
  participant J as Iceberg JDBC catalog (Postgres)
  participant S as SeaweedFS S3 (bucket lake)
  C->>T: SQL query
  T->>J: resolve table to metadata location
  J-->>T: Iceberg metadata pointer
  T->>S: read Iceberg metadata + Parquet
  S-->>T: data files
  T-->>C: result set

A platform’s analytical questions, the roll-ups, joins, and slices over years of data, want a warehouse. The cloud answer is Snowflake or BigQuery: rent the storage and the compute, hand over your data, and pay on both meters forever. The open answer is a lakehouse, and it does the same job on infrastructure you own. SQL engine on top, open table format in the middle, your object storage underneath, with the catalog living in the Postgres the platform already runs.

This series rebuilds my 2020 Apress book, Advanced Platform Development with Kubernetes, for 2026. The approach behind it comes from building and running data platforms in production for more than twenty years.

§What a Lakehouse Is, and Why It Beats a Warehouse Vendor

The 2020 edition built its warehouse on Hive and Presto. Presto became Trino, Hive tables became Apache Iceberg, and the pattern got a name: the lakehouse.

A data warehouse vendor couples storage and compute and keeps your data in a format only it can read. That coupling is the business model: your data goes in, and getting it out, or querying it with anything but their engine, ranges from hard to impossible. You pay on the storage meter and the compute meter, forever, and the switching cost climbs with every table you add. A lakehouse takes those apart. Apache Iceberg is an open table format that gives you the things a warehouse provides while the actual data sits as ordinary Parquet files in object storage that you own.

Iceberg’s guarantees are the ones that used to require a warehouse vendor. ACID transactions, so concurrent writers do not corrupt a table and a reader always sees a consistent snapshot. Schema evolution, so you add, drop, or rename a column without rewriting the data. Hidden partitioning, so the table partitions itself and queries do not need to know how. And time travel, so every write creates a snapshot you can query or roll back to. Trino is the SQL engine on top. Because the table format is open, the engine is swappable; Spark, Flink, or another Trino reads the same tables. You run one out of parts you control, and storage and compute scale independently, which is the trick that made Snowflake worth its valuation.

§The Pieces

Three things make the lakehouse, and you already have two. The object store is SeaweedFS, holding the Parquet data and Iceberg metadata in the lake bucket. The catalog, the registry of which tables exist and where their metadata lives, runs in Postgres through Iceberg’s JDBC catalog, so there is no separate catalog service to operate. Trino is the only new component: the query engine, configured to use both.

Worth noting what is absent. The 2020 warehouse needed a Hive Metastore running on its own MySQL, and Hadoop libraries underneath it, just to track where tables lived, a whole subsystem with its own operational weight. Iceberg’s JDBC catalog replaces all of that with a few tables in the Postgres you already run. No Hive, no Hadoop, no separate metastore service. The catalog is just another consumer of the database spine, and it gets backed up with everything else in it.

§Configure Trino

Trino reads plain properties files. Two ConfigMaps hold them: the server config, and the Iceberg catalog definition.

apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-config
  namespace: data
data:
  node.properties: |
    node.environment=platform
    node.id=trino-coordinator
    node.data-dir=/data/trino
  config.properties: |
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    discovery.uri=http://localhost:8080
  jvm.config: |
    -server
    -Xmx2G
    -XX:+UseG1GC
    -XX:+ExitOnOutOfMemoryError

The catalog file points Iceberg’s JDBC catalog at Postgres and its file system at SeaweedFS. Passwords come from the environment, not the file, so no secret lands in a ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: trino-catalog
  namespace: data
data:
  iceberg.properties: |
    connector.name=iceberg
    iceberg.catalog.type=jdbc
    iceberg.jdbc-catalog.catalog-name=platform
    iceberg.jdbc-catalog.driver-class=org.postgresql.Driver
    iceberg.jdbc-catalog.connection-url=jdbc:postgresql://platform-pg-rw.data:5432/app
    iceberg.jdbc-catalog.connection-user=app
    iceberg.jdbc-catalog.connection-password=${ENV:PG_PASSWORD}
    iceberg.jdbc-catalog.default-warehouse-dir=s3://lake/warehouse
    fs.native-s3.enabled=true
    s3.endpoint=http://seaweedfs.storage:8333
    s3.region=us-east-1
    s3.path-style-access=true
    s3.aws-access-key=platform
    s3.aws-secret-key=${ENV:S3_SECRET_KEY}

Put the SeaweedFS secret key where Trino can read it as an environment variable.

kubectl -n data create secret generic trino-s3 \
  --from-literal=secret-key=REPLACE_WITH_SEAWEEDFS_SECRET

The Deployment mounts the config files by name (so the image’s other defaults survive) and injects the two passwords from secrets, the Postgres password from the cluster’s app secret, the S3 key from the one above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino
  namespace: data
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trino
  template:
    metadata:
      labels:
        app: trino
    spec:
      containers:
        - name: trino
          image: trinodb/trino:471   # pin a current version
          ports:
            - containerPort: 8080
          env:
            - name: PG_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: platform-pg-app
                  key: password
            - name: S3_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: trino-s3
                  key: secret-key
          volumeMounts:
            - { name: config, mountPath: /etc/trino/node.properties, subPath: node.properties }
            - { name: config, mountPath: /etc/trino/config.properties, subPath: config.properties }
            - { name: config, mountPath: /etc/trino/jvm.config, subPath: jvm.config }
            - { name: catalog, mountPath: /etc/trino/catalog/iceberg.properties, subPath: iceberg.properties }
            - { name: data, mountPath: /data/trino }
      volumes:
        - name: config
          configMap: { name: trino-config }
        - name: catalog
          configMap: { name: trino-catalog }
        - name: data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: data
spec:
  selector:
    app: trino
  ports:
    - { name: http, port: 8080, targetPort: 8080 }
kubectl apply -f trino-config.yaml -f trino-catalog.yaml -f trino.yaml
kubectl -n data rollout status deployment/trino

This is a single-node Trino, coordinator and worker in one, which is right for development. Add worker replicas with coordinator=false to scale compute; the storage underneath does not change.

§Query the Lakehouse

The Trino image bundles its CLI. Exec in and create a schema, a table, and some data. The table is Iceberg, its files Parquet, all landing in the lake bucket.

kubectl -n data exec -it deploy/trino -- trino
CREATE SCHEMA iceberg.analytics
  WITH (location = 's3://lake/warehouse/analytics');

CREATE TABLE iceberg.analytics.events (
  id      BIGINT,
  level   VARCHAR,
  message VARCHAR,
  ts      TIMESTAMP(6)
) WITH (format = 'PARQUET');

INSERT INTO iceberg.analytics.events VALUES
  (1, 'info', 'platform online', current_timestamp),
  (2, 'warn', 'disk at 80 percent', current_timestamp);

SELECT level, count(*) AS n
FROM iceberg.analytics.events
GROUP BY level;
 level | n
-------+---
 info  | 1
 warn  | 1
(2 rows)

That SELECT ran SQL over Parquet files in your object store, with Iceberg tracking the table and Postgres holding the catalog. List the bucket and the data is right there:

aws --endpoint-url http://seaweedfs.storage:8333 s3 ls s3://lake/warehouse/analytics/ --recursive
... analytics/events/data/<uuid>.parquet
... analytics/events/metadata/00000-<uuid>.metadata.json

§The Features a Warehouse Charges For

The lakehouse has the warehouse capabilities, and they are worth seeing because they are why people pay Snowflake. Every write to an Iceberg table creates a snapshot, and Trino can query the table as of any of them. Insert more rows, then travel back to before you did.

-- the snapshots Iceberg has recorded for this table
SELECT committed_at, snapshot_id FROM iceberg.analytics."events$snapshots";

-- query the table as it was at a past moment
SELECT count(*) FROM iceberg.analytics.events
  FOR TIMESTAMP AS OF TIMESTAMP '2026-07-05 18:00:00 UTC';

Schema evolution is just as direct, and it does not rewrite the data. Add a column and old files keep working, reading the new column as null until something writes it.

ALTER TABLE iceberg.analytics.events ADD COLUMN source VARCHAR;

Time travel turns a bad load into an undo instead of a restore-from-backup, and schema changes stop being migration projects. These are the warehouse features, on your tables, in your object storage.

§Federate Across the Platform

Here is the capability that made the original book’s warehouse chapter end on a high note, and Trino still does it better than most paid warehouses: one SQL query across different systems. Trino is not limited to Iceberg. Each data source is a catalog, and a query can join across them. Add a catalog for the operational Postgres alongside the Iceberg one.

data:
  postgresql.properties: |
    connector.name=postgresql
    connection-url=jdbc:postgresql://platform-pg-rw.data:5432/app
    connection-user=app
    connection-password=${ENV:PG_PASSWORD}

Now a single statement joins analytical data in Iceberg against live rows in Postgres, no ETL to move one into the other first.

SELECT e.level, count(*) AS events, max(s.created) AS last_signup
FROM iceberg.analytics.events e
JOIN postgresql.public.signups s ON s.id = e.id
GROUP BY e.level;

The same pattern adds the OpenSearch cluster, or the Cassandra cluster from a later post, as further catalogs, so Trino becomes the single SQL surface over the whole platform: the lakehouse for history, Postgres for operational state, OpenSearch for the big aggregations, joined in one query. That federation, querying everything in place rather than copying it all into one vendor’s warehouse first, is a genuine advantage over the cloud product.

§Operating the Lakehouse

A cloud warehouse hides its housekeeping inside the bill. Own the lakehouse and the housekeeping is yours, and Iceberg exposes it as plain SQL.

Scale compute by adding workers. Trino’s coordinator plans queries and workers execute them, and workers are stateless: add replicas with coordinator=false and queries get faster, with no change to the storage underneath. That independent scaling of compute against fixed storage is the Snowflake trick, running on your own nodes.

Maintain the tables. Streaming and frequent writes leave Iceberg tables with many small files and old snapshots, which slow queries and waste space. Three maintenance procedures, run on a schedule, keep them healthy: compact small files, expire old snapshots, and delete orphaned files left by failed writes.

ALTER TABLE iceberg.analytics.events EXECUTE optimize;
ALTER TABLE iceberg.analytics.events EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE iceberg.analytics.events EXECUTE remove_orphan_files(retention_threshold => '7d');

Back it up for free. The lakehouse’s state is the data in SeaweedFS and the catalog in Postgres, and both are already backed up by the object-store and database backups from earlier posts. There is no separate warehouse to snapshot; protecting the spine protects the lakehouse.

Survive big queries. For long analytical jobs that should not die on a single worker failure, Trino’s fault-tolerant execution (retry-policy=TASK) spools intermediate results to object storage and retries failed tasks, trading some speed for resilience on the queries that run for many minutes.

§When Something Is Wrong

Trino starts but the Iceberg catalog fails. It cannot reach Postgres, or the credentials are wrong. Confirm platform-pg-rw.data:5432 resolves and the PG_PASSWORD env matches the cluster’s app secret; the catalog tables live in that database.

Queries fail reading or writing S3. Almost always endpoint or addressing. SeaweedFS needs s3.path-style-access=true and the in-cluster endpoint; a wrong endpoint or virtual-host addressing produces access or not-found errors that look like permissions but are routing.

A worker is killed by the OOM killer. The JVM -Xmx is too large for the pod’s memory limit, or a query is too big for the cluster. Set the heap and the pod limit to agree, and add workers for genuinely large queries rather than a bigger single one.

Table metadata not found after a write elsewhere. The JDBC catalog is the source of truth; an engine that wrote without using the same catalog left tables Trino does not know about. Keep every engine pointed at the one catalog.

§What You Have

A working lakehouse: Trino querying open Iceberg tables stored as Parquet in object storage you own, cataloged in your own Postgres, all in a handful of manifests. It does what Snowflake does: separates storage from compute, runs standard SQL, scales the two independently, without the vendor, the bill, or the data held in a format you cannot read without them. Point any BI tool at the Trino service and you have analytics.

For multi-engine setups where Spark and Trino share the same tables, swap the JDBC catalog for an Iceberg REST catalog; the storage and table format stay exactly the same. With the data backend now built end to end, the next posts turn to the tools people actually work in, and the first thing those tools need is a way to log in. Next I stand up single sign-on with Keycloak, one identity for everything that follows.

← back to all notes