How to Install Apache Iceberg on Docker: A Complete Guide

August 11, 2025

In this short article, we take a close look at installing Apache Iceberg on Docker. I hope it will be useful for anyone looking for information about Apache Iceberg.

1. Introduction to Apache Iceberg

Apache Iceberg is an open-source table format designed for large-scale analytics datasets. Originally created by Netflix, Iceberg allows you to manage massive data lakes with schema evolution, time travel queries, hidden partitioning, and ACID transactions.
By running Iceberg inside Docker, you can quickly test and develop without setting up complex infrastructure.

2. Why Use Docker for Apache Iceberg

Running an Iceberg stack in Docker gives you a consistent, portable environment that simplifies both setup and maintenance. The query engine, the object storage, and their dependencies are packaged as containers, so the same stack behaves the same way on any machine that runs Docker.

Docker provides a lightweight, isolated environment that allows you to:

Deploy Apache Iceberg quickly without manual installation.
Run multiple services like Spark, Trino, or Flink together for testing.
Avoid dependency conflicts with your host machine.
Easily reset or scale your setup.

3. Prerequisites

Before installing Apache Iceberg in Docker, make sure you have:

Docker installed (version 20+ recommended); Docker installation on Ubuntu 24.04 LTS is covered in a separate article.
Docker Compose installed (v2+ preferred)
At least 4 GB of free RAM for container services
Basic knowledge of the command line
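
The list above can be checked from a terminal before going further. The following is a minimal sketch; the function names have_cmd and check_prereqs are made up for this example, and it assumes the Compose v2 plugin form (docker compose).

```shell
#!/bin/sh
# Minimal prerequisite check (a sketch; function names are hypothetical).

# have_cmd NAME: succeeds if NAME is an executable found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_prereqs: print the Docker and Compose versions,
# or return non-zero if Docker itself is missing.
check_prereqs() {
    if have_cmd docker; then
        echo "docker: $(docker --version)"
        # 'docker compose version' is the Compose v2 plugin form.
        docker compose version 2>/dev/null || echo "docker compose: plugin not found"
    else
        echo "docker: not found"
        return 1
    fi
}
```

Calling check_prereqs prints the installed versions so they can be compared against the recommendations above.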

4. Installing Apache Iceberg on Docker

4.1 Pull Required Docker Images

Apache Iceberg itself does not run as a standalone service; it integrates with compute engines such as Apache Spark or Trino.
You can pull the images for Spark with Iceberg support:

docker pull tabulario/spark-iceberg

Or for Trino with Iceberg connector:

docker pull trinodb/trino

4.2 Configure Docker Compose

Here’s an example docker-compose.yml file to run Trino and MinIO (S3-compatible storage) for Iceberg:

version: "3.8"
services:
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=minio
      - MINIO_ROOT_PASSWORD=minio123
    ports:
      - "9000:9000"
      - "9001:9001"
    command: server /data --console-address ":9001"

  trino:
    image: trinodb/trino
    container_name: trino
    ports:
      - "8080:8080"
    volumes:
      - ./etc:/etc/trino

This setup provides:

MinIO as object storage
Trino for querying Iceberg tables
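
Because ./etc is mounted over /etc/trino, that directory has to hold Trino's whole configuration (node.properties, jvm.config, config.properties) plus a catalog file for Iceberg. The sketch below writes only the catalog file. It rests on assumptions that are not part of the setup above: a recent Trino release with the native S3 file system, and an Iceberg REST catalog reachable at the hypothetical address http://iceberg-rest:8181. Adjust iceberg.catalog.type and the URI to whichever catalog you actually run.

```shell
#!/bin/sh
# Sketch: write a Trino catalog file for Iceberg on MinIO.
# Assumptions (not from the article): a REST catalog service at
# http://iceberg-rest:8181, and MinIO reachable from the Trino
# container as http://minio:9000 (the Compose service name).
CONF_DIR="${CONF_DIR:-./etc}"
mkdir -p "$CONF_DIR/catalog"

# The credentials match MINIO_ROOT_USER / MINIO_ROOT_PASSWORD in docker-compose.yml.
cat > "$CONF_DIR/catalog/iceberg.properties" <<'EOF'
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://iceberg-rest:8181
fs.native-s3.enabled=true
s3.endpoint=http://minio:9000
s3.region=us-east-1
s3.path-style-access=true
s3.aws-access-key=minio
s3.aws-secret-key=minio123
EOF
```

The file name determines the catalog name, so iceberg.properties is what makes the iceberg catalog available in queries.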

4.3 Start Apache Iceberg Services

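Run the following from the directory that contains docker-compose.yml (with the legacy Compose v1 binary, the command is docker-compose up -d):

docker compose up -d

Once the containers are running, Trino will be available at http://localhost:8080 and the MinIO console at http://localhost:9001.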
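
Containers can take a little while to become usable after they start. One way to wait for them is a small retry helper, sketched below. The helper names are made up for this example; the checks assume that Trino's /v1/info endpoint returns compact JSON containing "starting":false once it is ready, and that MinIO answers on its /minio/health/live endpoint.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY COMMAND...: rerun COMMAND until it succeeds.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# trino_ready: succeeds once Trino reports that startup has finished.
trino_ready() {
    curl -fsS http://localhost:8080/v1/info | grep -q '"starting":false'
}

# minio_ready: succeeds once MinIO's liveness endpoint answers.
minio_ready() {
    curl -fsS http://localhost:9000/minio/health/live
}
```

For example, wait_until 30 2 minio_ready && wait_until 60 2 trino_ready retries every two seconds and stops as soon as both services respond.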

5. Verifying the Installation

To verify that Iceberg works, open the Trino CLI inside the container:

docker exec -it trino trino

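Run a test query. Trino addresses tables as catalog.schema.table, so create a schema first (the name demo is only an example, and some catalog types also need a location property on the schema):

CREATE SCHEMA IF NOT EXISTS iceberg.demo;

CREATE TABLE iceberg.demo.test_table (
    id BIGINT,
    name VARCHAR
) WITH (
    format = 'PARQUET',
    partitioning = ARRAY['id']
);

If this runs successfully, Iceberg is correctly installed.

6. Basic Apache Iceberg Operations

Here are some example commands in Trino:

INSERT INTO iceberg.demo.test_table VALUES (1, 'Alice');
SELECT * FROM iceberg.demo.test_table;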

Iceberg will handle metadata, partitioning, and snapshots automatically.
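
The snapshots Iceberg creates can be inspected through the $snapshots metadata table that Trino exposes for each Iceberg table. The sketch below wraps the Trino CLI's --execute option so queries can run without an interactive session. The function names are hypothetical, and the schema name passed in (demo in the usage note) is an assumption; use whichever schema holds your table.

```shell
#!/bin/sh
# TRINO_CLI is the command prefix used to reach the Trino CLI in the container.
TRINO_CLI="${TRINO_CLI:-docker exec trino trino}"

# run_sql STATEMENT: execute one SQL statement non-interactively.
run_sql() {
    $TRINO_CLI --execute "$1"
}

# snapshots_sql SCHEMA TABLE: build a query against the table's
# $snapshots metadata table in the iceberg catalog.
snapshots_sql() {
    printf 'SELECT snapshot_id, committed_at, operation FROM iceberg.%s."%s$snapshots" ORDER BY committed_at' "$1" "$2"
}
```

Running run_sql "$(snapshots_sql demo test_table)" after an INSERT should list one snapshot per committed write.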

7. Common Troubleshooting Tips

Port conflicts: Change exposed ports in docker-compose.yml.
Low memory errors: Increase Docker’s allocated memory in settings.
Authentication errors in MinIO: Double-check MINIO_ROOT_USER and MINIO_ROOT_PASSWORD.
Connector not found: Ensure Trino has the Iceberg connector configured in /etc/trino/catalog/iceberg.properties.
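
Most of these tips start with the same few facts: which containers are up, which ports they publish, what Trino logged, and which catalog files it can see. A small helper can gather them in one go. This is a sketch; the function name diagnose is made up, and it assumes the container name trino from the Compose file above.

```shell
#!/bin/sh
# DOCKER can be overridden, for example to use a different container CLI.
DOCKER="${DOCKER:-docker}"

# diagnose: collect the information the troubleshooting tips rely on.
diagnose() {
    # Running containers, their status, and the host ports they publish.
    $DOCKER ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'
    # Recent Trino log lines; connector and memory errors show up here.
    $DOCKER logs --tail 50 trino
    # Catalog files visible to Trino inside the container.
    $DOCKER exec trino ls /etc/trino/catalog
}
```

If iceberg.properties is missing from the last listing, the volume mount or the local etc directory is the first place to look.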

8. Best Practices for Running Apache Iceberg in Docker

Keep the following in mind when running Apache Iceberg in Docker:

Use persistent volumes to avoid losing data on container restarts.
Keep Docker images updated for security patches.
Use resource limits to prevent container overuse.
For production, consider Kubernetes or cloud-native deployments.
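
The first and third points can be applied without editing the main Compose file, by placing a docker-compose.override.yml next to it; Compose v2 merges that file automatically. The sketch below writes such a file. The volume name minio-data and the memory limits are illustrative assumptions, not tuned values, and the script overwrites any existing override file in the current directory.

```shell
#!/bin/sh
# Sketch: persistent MinIO storage and memory caps via a Compose override file.
# The volume name and the limits are illustrative assumptions.
cat > docker-compose.override.yml <<'EOF'
services:
  minio:
    volumes:
      # Named volume, so objects survive container removal.
      - minio-data:/data
    mem_limit: 512m
  trino:
    mem_limit: 2g
volumes:
  minio-data:
EOF
```

After writing the file, rerunning the Compose up command recreates the containers with the volume and limits applied.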

9. Conclusion

Installing Apache Iceberg in Docker allows you to experiment, develop, and test without heavy infrastructure setup. By combining Trino or Spark with Iceberg and object storage like MinIO, you can create a fully functional data lakehouse on your local machine in minutes.

I hope this article is helpful for anyone looking for Apache Iceberg references.
