Professional notes by Craig Johnston
long-form, short-form, working drafts · since 2008
VOL. XIX · MMXXVI
118 NOTES IN PRINT
FOLIO CXVIII 2026-07-03 · 10 MIN · LONG-FORM

Self-Hosted S3 with SeaweedFS

S3-compatible object storage for the data lake, backups, and the lakehouse

Diagram · folio cxviii
flowchart LR
  APP["applications"] -->|S3| SW["SeaweedFS<br>S3 gateway"]
  BK["Postgres and Kafka backups"] -->|S3| SW
  LH["Trino + Iceberg lakehouse"] -->|S3| SW
  SW --> V[("PVC on rook-ceph-block")]

A data platform needs object storage: the S3-compatible bucket layer that holds a data lake’s raw files, the destination for database and stream backups, and the storage under the lakehouse tables coming next. The 2020 edition built this on MinIO. I would not start there today. Over 2025 MinIO took its own open-source edition apart, piece by piece, so this platform uses SeaweedFS, and it runs on the block storage the cluster already provides.

This series rebuilds my 2020 Apress book, Advanced Platform Development with Kubernetes, for 2026. The approach behind it comes from building and running data platforms in production for more than twenty years.

§Why Not MinIO

I want to be specific about this. MinIO did not get acquired or shut down. It deliberately hollowed out the free version to push people toward its paid product.

In June 2025 it stripped almost all management features out of the Community Edition’s web console, around 110,000 lines of code, and moved them, along with LDAP and OIDC login, into the commercial AIStor offering. In October 2025 it stopped publishing Docker images and prebuilt binaries for the community edition at all, so when a CVE landed that same month, community users could not pull a fix; they had to build from source. By December 2025 the open-source project was in maintenance mode: no new features, no pull-request reviews, no guaranteed security patches.

The AGPL is the part that actually burned me. I had a client self-hosting an AGPL MinIO who wanted me to run some of my own software in their stack. Read the license literally and it is not clear my code stays mine: the network-copyleft clause can be read to reach anything that talks to the server, meaning not just code that links MinIO’s libraries but also code that merely calls its API over the wire. I asked MinIO directly for a plain clarification. Instead of answering, they told me to consult my own attorney. That is exactly the mess I will not step into, and it is trivially avoided by not running their software.

What bothers me is not a company charging for software. Charge whatever you want. What is rotten is using “open source” as a marketing channel and then pulling the rug. A FOSS label recruits two things no commercial pitch can buy. It recruits contributors who send patches because they believe they are improving a shared project, not doing unpaid work for a private company’s product. And it recruits a whole ecosystem of advocates who write about your project, deploy it, and recommend it for free, precisely because it is open. Harvest both on the way up and then close the gate, and you have monetized goodwill you did not earn. That is why I treat the license as part of the architecture.

§SeaweedFS, and the Other Options

SeaweedFS is what I landed on, and it has outrun MinIO for my workloads by a wide margin, especially the ones that punish object stores: millions of small files. Its design gives roughly O(1) disk access per file no matter how many you have, which matters at scale. It is Apache 2.0, a single Go binary, and it speaks the S3 API, so anything built for MinIO or AWS S3 talks to it unchanged.

It is not the only liberally licensed choice, and you should pick for your workload:

  • SeaweedFS (Apache 2.0): my default, exceptional with huge numbers of small files.
  • Apache Ozone (Apache 2.0): built for Hadoop-scale analytics, billions of objects.
  • RustFS (Apache 2.0): a newer Rust implementation aimed squarely at the MinIO gap.
  • Ceph RGW (LGPL): you are already running Ceph for block storage in this platform, so its RADOS Gateway gives you S3 on the same cluster.

One worth flagging by counterexample: Garage is a common MinIO alternative, but it is AGPL, the same license that started this whole problem. “Alternative to MinIO” does not automatically mean you have escaped the licensing question, so read the license before adopting one.

§How SeaweedFS Is Built, and Why It Is Fast

The performance difference comes from the design, and understanding it tells you when SeaweedFS is the right call. It has four roles. The master tracks where data lives and assigns writes to volumes. The volume servers hold the actual bytes. The filer keeps the directory tree and file metadata, and serves the POSIX and S3 views. The S3 gateway translates S3 API calls onto the filer.

The trick is in how volume servers store files. Most object stores put each object somewhere on disk as its own file, and the filesystem’s own per-file overhead, the inode lookups, the directory entries, becomes the bottleneck once you have millions of them. SeaweedFS borrows Facebook’s Haystack design instead: it packs many small files as “needles” inside a handful of large volume files, and keeps the needle locations in memory. A read becomes a single disk seek to a known offset, roughly O(1) no matter how many files you store. That is why it outran MinIO for me on the workloads with millions of small objects, where a per-file-on-disk store spends most of its time in filesystem metadata. If your objects are few and large, most stores are fine; if they are many and small, this design matters.
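To make the needle idea concrete, here is a toy sketch in Python. It is not SeaweedFS's on-disk format or code: the NeedleVolume class and its method names are hypothetical, and it leaves out headers, checksums, deletion, and index recovery. It only shows the shape of the idea, which is to append blobs to one large file and keep each blob's offset and size in an in-memory map.

```python
import os
import tempfile


class NeedleVolume:
    """Toy Haystack-style volume: many small blobs in one file, index in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, size), held in memory
        open(path, "ab").close()  # make sure the volume file exists

    def put(self, key, data):
        # Append the blob to the end of the volume and remember where it landed.
        with open(self.path, "ab") as volume:
            offset = volume.seek(0, os.SEEK_END)
            volume.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key):
        # One dictionary lookup, one seek, one read: no per-object file to open.
        offset, size = self.index[key]
        with open(self.path, "rb") as volume:
            volume.seek(offset)
            return volume.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        vol = NeedleVolume(os.path.join(tmp, "1.dat"))
        vol.put("a.txt", b"hello")
        vol.put("b.txt", b"lake")
        print(vol.get("b.txt"))
```

However many keys go in, the filesystem only ever sees one file, so the per-file metadata cost that hurts a file-per-object store does not grow with the object count.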

§Deploy SeaweedFS

For the development platform here, the single weed server process runs the master, a volume server, the filer, and the S3 gateway together, which keeps this to a few plain manifests. Scale it out to dedicated masters and volume servers when you outgrow one node, which the operating section covers.

Give it a namespace and S3 credentials. The credentials go in a Secret SeaweedFS reads as its S3 identity config.

kubectl create namespace storage

apiVersion: v1
kind: Secret
metadata:
  name: seaweedfs-s3-config
  namespace: storage
stringData:
  s3.json: |
    {
      "identities": [
        {
          "name": "platform",
          "credentials": [
            { "accessKey": "platform", "secretKey": "REPLACE_WITH_A_STRONG_SECRET" }
          ],
          "actions": ["Admin", "Read", "Write"]
        }
      ]
    }

A headless Service governs the StatefulSet and gives clients a stable name to reach the S3 port.

apiVersion: v1
kind: Service
metadata:
  name: seaweedfs
  namespace: storage
spec:
  clusterIP: None
  selector:
    app: seaweedfs
  ports:
    - { name: s3, port: 8333 }
    - { name: filer, port: 8888 }
    - { name: master, port: 9333 }

The StatefulSet runs weed server with the S3 gateway on, its data on a rook-ceph-block volume, and the credentials mounted in.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seaweedfs
  namespace: storage
spec:
  serviceName: seaweedfs
  replicas: 1
  selector:
    matchLabels:
      app: seaweedfs
  template:
    metadata:
      labels:
        app: seaweedfs
    spec:
      containers:
        - name: seaweedfs
          image: chrislusf/seaweedfs:3.80   # pin a current version
          args:
            - server
            - -dir=/data
            - -s3
            - -s3.config=/etc/seaweedfs/s3.json
          ports:
            - { name: s3, containerPort: 8333 }
            - { name: filer, containerPort: 8888 }
            - { name: master, containerPort: 9333 }
            - { name: volume, containerPort: 8080 }
          volumeMounts:
            - { name: data, mountPath: /data }
            - { name: s3-config, mountPath: /etc/seaweedfs, readOnly: true }
      volumes:
        - name: s3-config
          secret:
            secretName: seaweedfs-s3-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: rook-ceph-block
        resources:
          requests:
            storage: 100Gi

kubectl apply -f seaweedfs-secret.yaml -f seaweedfs-service.yaml -f seaweedfs-statefulset.yaml
kubectl -n storage rollout status statefulset/seaweedfs

§Use It Like S3

Because the API is S3, the ordinary AWS CLI is the client. Set the credentials and point it at the in-cluster endpoint (or port-forward 8333 and use http://localhost:8333).

export AWS_ACCESS_KEY_ID=platform
export AWS_SECRET_ACCESS_KEY=REPLACE_WITH_A_STRONG_SECRET
export S3=http://seaweedfs.storage:8333

aws --endpoint-url $S3 s3 mb s3://lake
echo "hello lake" | aws --endpoint-url $S3 s3 cp - s3://lake/hello.txt
aws --endpoint-url $S3 s3 ls s3://lake

make_bucket: lake
upload: <stdin> to s3://lake/hello.txt
2026-07-03 18:20:41         11 hello.txt

That creates a bucket, writes an object, and lists it through the standard S3 API. Anything in your stack that already knows how to talk to S3, and nearly everything does, now has somewhere to write.
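The same round trip from application code might look like the Python sketch below, which uses boto3 (a third-party package that must be installed separately). The helper names make_client and write_and_list are hypothetical. The path-style addressing setting and the placeholder region are assumptions that are commonly needed with self-hosted S3 endpoints, so verify them against your own deployment.

```python
import os


def make_client(endpoint="http://seaweedfs.storage:8333"):
    """Build a boto3 S3 client aimed at the in-cluster gateway."""
    # Imported lazily so write_and_list() can be exercised without boto3.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region_name="us-east-1",  # assumption: placeholder so signing has a region
        config=Config(s3={"addressing_style": "path"}),  # assumption: path-style URLs
    )


def write_and_list(client, bucket, key, body):
    """Create a bucket, write one object, and return the keys now in the bucket."""
    client.create_bucket(Bucket=bucket)
    client.put_object(Bucket=bucket, Key=key, Body=body)
    response = client.list_objects_v2(Bucket=bucket)
    return [item["Key"] for item in response.get("Contents", [])]
```

With the two credential variables exported as above, calling write_and_list(make_client(), "lake", "hello.txt", b"hello lake\n") performs the same three steps as the CLI session.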

§Wire Up the Platform’s Backups

Earlier posts pointed their backups at “the object store I stand up later.” This is later. Make the target real by creating a backups bucket and the credentials secret the Postgres cluster’s barmanObjectStore referenced.

aws --endpoint-url $S3 s3 mb s3://platform-backups

kubectl -n data create secret generic backup-creds \
  --from-literal=ACCESS_KEY_ID=platform \
  --from-literal=ACCESS_SECRET_KEY=REPLACE_WITH_A_STRONG_SECRET

With the bucket and secret in place, the Postgres ScheduledBackup from that post now runs, writing base backups and a continuous stream of WAL into s3://platform-backups/postgres, which is what makes point-in-time recovery possible. The OpenSearch snapshot repository registers against the same endpoint, so its indices snapshot here too.

curl -sk -u 'admin:<password>' -X PUT "https://platform-os.search:9200/_snapshot/platform" \
  -H 'Content-Type: application/json' -d '{
  "type": "s3",
  "settings": {
    "endpoint": "seaweedfs.storage:8333",
    "protocol": "http",
    "bucket": "platform-backups",
    "base_path": "opensearch"
  }
}'

The backup discipline that ran through the cluster, storage, and database posts now has a single home: etcd snapshots, Postgres WAL, and OpenSearch indices, all landing in object storage you own, on the block storage you own. Nothing has to leave your infrastructure to be recoverable.

§Operating the Object Store

A managed object store such as S3 itself gives you durability, scale, identities, and lifecycle rules without a thought. Own the store and those become settings you control.

Durability through replication. The single-server development setup keeps one copy of each object, which is fine for a lab and not for anything you cannot lose. SeaweedFS expresses replication as a three-digit code: the digits count extra copies in other data centers, on other racks in the same data center, and on other servers in the same rack, in that order. So 001 keeps a second copy on a different server in the same rack, and 010 keeps it on a different rack. Replication needs more than one volume server, which is the first reason to scale out.
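As a rough illustration of how to read a replication code, here is a small Python sketch. The Replication type and parse_replication function are hypothetical helpers, not part of SeaweedFS. They assume the digit order described above: other data centers, then other racks, then other servers.

```python
from typing import NamedTuple


class Replication(NamedTuple):
    other_data_centers: int  # extra copies placed in other data centers
    other_racks: int         # extra copies on other racks, same data center
    other_servers: int       # extra copies on other servers, same rack

    @property
    def total_copies(self):
        # The original plus every extra copy the code asks for.
        return 1 + sum(self)


def parse_replication(code):
    """Split a three-digit replication code such as '001' into its parts."""
    if len(code) != 3 or any(ch not in "0123456789" for ch in code):
        raise ValueError(f"replication code must be three digits, got {code!r}")
    return Replication(*(int(ch) for ch in code))


if __name__ == "__main__":
    for code in ("000", "001", "010"):
        plan = parse_replication(code)
        print(code, plan, "copies:", plan.total_copies)
```

The total_copies figure is also the minimum number of volume servers the code needs, which is why a single-server lab can only satisfy 000.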

Scale by splitting the tiers. The combined weed server is one process for convenience. To grow, you run the roles separately: a master (or a small odd-numbered group of them), several volume servers across nodes for capacity and replication, and filers for metadata throughput. You add volume servers as you add data, and the master spreads writes across them. Nothing about the S3 endpoint your applications use changes when you do this; they keep talking to the gateway.

Identities and buckets. Access is governed by the S3 identity config, the same s3.json the deployment mounted. You add an identity per consumer, the lakehouse, the backup user, an application, each with its own key pair and scoped actions, rather than sharing one admin key everywhere. Buckets are created on demand with aws s3 mb, and map to collections in the filer underneath.
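One way to keep per-consumer identities manageable is to generate s3.json rather than edit it by hand. The Python sketch below mirrors the structure of the Secret shown earlier. The consumer names, the bucket-scoped action strings such as "Read:lake", and the helper functions are illustrative assumptions, so check the action syntax against the SeaweedFS documentation for the version you pin.

```python
import json
import secrets


def identity(name, actions):
    """One s3.json identity with a freshly generated secret key."""
    return {
        "name": name,
        "credentials": [
            {"accessKey": name, "secretKey": secrets.token_urlsafe(32)}
        ],
        "actions": actions,
    }


def build_s3_config(consumers):
    """Turn a {consumer: [actions]} mapping into the identities document."""
    return {"identities": [identity(n, a) for n, a in consumers.items()]}


# Hypothetical consumers; the bucket-scoped actions are an assumption to verify.
CONSUMERS = {
    "lakehouse": ["Read:lake", "Write:lake", "List:lake"],
    "backup": [
        "Read:platform-backups",
        "Write:platform-backups",
        "List:platform-backups",
    ],
}

if __name__ == "__main__":
    print(json.dumps(build_s3_config(CONSUMERS), indent=2))
```

The printed JSON would replace the s3.json value in the Secret. Because each run generates new secret keys, store the output once and distribute each key only to its own consumer.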

Metadata at scale, and cold tiering. The filer’s metadata store defaults to an embedded database, and at scale you point it at an external one. The platform already runs Postgres, and SeaweedFS can use it as the filer store. For data that is rarely read, SeaweedFS can tier cold volumes out to a remote S3 and keep hot data local, so capacity is not all on your own disks. And it exposes Prometheus metrics, so the monitoring stack later in the series watches it like everything else.

§When Something Is Wrong

S3 calls return access denied. The credentials do not match the identity config. Confirm the access and secret keys you exported match an identity in s3.json, and that the identity grants the action you are attempting; a read-only identity cannot create a bucket.

Writes fail with no free volumes. The volume servers are full, or there is only one and the requested replication needs more. Add a volume server, or lower the replication setting for a single-server lab. The master log says plainly when it cannot place a write.

Replication does nothing. A replication code above 000 requires enough distinct volume servers to satisfy it. Asking for 001 with a single server leaves objects unreplicated; the setting is not ignored so much as unsatisfiable until you scale out.

The filer is slow under many small files. The embedded metadata store is the bottleneck, not the volume storage. Move the filer’s store to Postgres, where it belongs once the object count climbs.

§What It Is For

The rest of the platform depends on this bucket layer. The Postgres cluster’s continuous backups and the Kafka topic archives land here. It is the raw landing zone for a data lake, where files arrive before they are structured. And it is the storage under the open lakehouse in the next post, where Trino queries Apache Iceberg tables that live as files in these very buckets.

You now own the object store too, with no console behind a paywall, no AGPL license to interpret, and no vendor deciding your community edition is finished. Next I put it to work as an analytical warehouse, a Trino and Iceberg lakehouse that does what Snowflake does, on the storage you just stood up.
