How to Install Apache Iceberg on Docker: A Complete Guide

In this short article we will take a closer look at installing Apache Iceberg on Docker. I hope it will be useful for anyone looking for information about Apache Iceberg.
Table of Contents
- Introduction to Apache Iceberg
- Why Use Docker for Apache Iceberg
- Prerequisites
- Installing Apache Iceberg on Docker
- Verifying the Installation
- Basic Apache Iceberg Operations
- Common Troubleshooting Tips
- Best Practices for Running Apache Iceberg in Docker
- Conclusion
1. Introduction to Apache Iceberg
Apache Iceberg is an open-source table format designed for large-scale analytics datasets. Originally created by Netflix, Iceberg allows you to manage massive data lakes with schema evolution, time travel queries, hidden partitioning, and ACID transactions.
By running Iceberg inside Docker, you can quickly test and develop without setting up complex infrastructure.
2. Why Use Docker for Apache Iceberg
Using Docker gives you a consistent, portable environment for trying out Apache Iceberg. The query engine and storage services that Iceberg depends on are packaged as containers, so the same setup can be reproduced on any machine that runs Docker, without manual configuration of each component.
Docker provides a lightweight, isolated environment that allows you to:
- Deploy Apache Iceberg quickly without manual installation.
- Run multiple services like Spark, Trino, or Flink together for testing.
- Avoid dependency conflicts with your host machine.
- Easily reset or scale your setup.
3. Prerequisites
Before installing Apache Iceberg in Docker, make sure you have:
- Docker installed (version 20+ recommended). Instructions for installing Docker on Ubuntu 24.04 LTS can be found in this article.
- Docker Compose installed (v2+ preferred)
- At least 4 GB of free RAM for container services
- Basic knowledge of the command line
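The Docker version requirement above can be checked with a small script. The following is a minimal sketch, not an official tool: the function names are made up for this article, and it only assumes that docker --version prints a line such as "Docker version 24.0.7, build afdd53b".

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text like 'Docker version 24.0.7, build afdd53b'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def docker_major_version() -> Optional[int]:
    """Return the installed Docker major version, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=False
    )
    return parse_major_version(result.stdout)


def meets_prerequisite(major: Optional[int], minimum: int = 20) -> bool:
    """True when a version was found and it is at least the recommended minimum."""
    return major is not None and major >= minimum
```

Calling meets_prerequisite(docker_major_version()) returns False both when Docker is missing and when it is older than version 20, so it is enough for a quick go/no-go check.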
4. Installing Apache Iceberg on Docker
4.1 Pull Required Docker Images
Apache Iceberg does not run as a standalone service. It is a table format that integrates with compute engines such as Apache Spark or Trino.
You can pull an image for Spark with Iceberg support:
docker pull tabulario/spark-iceberg
Or for Trino, whose image ships with the Iceberg connector:
docker pull trinodb/trino
4.2 Configure Docker Compose
Here’s an example docker-compose.yml file to run Trino and MinIO (S3-compatible storage) for Iceberg:
version: "3.8"
services:
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=minio
      - MINIO_ROOT_PASSWORD=minio123
    ports:
      - "9000:9000"
      - "9001:9001"
    command: server /data --console-address ":9001"
  trino:
    image: trinodb/trino
    container_name: trino
    ports:
      - "8080:8080"
    volumes:
      - ./etc:/etc/trino
This setup provides:
- MinIO as object storage
- Trino for querying Iceberg tables
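Because the compose file mounts ./etc over /etc/trino, that directory has to hold the Trino configuration, including a catalog file for Iceberg. The sketch below writes such a file under several assumptions that are not part of the setup above: an Iceberg REST catalog is used (the compose file does not start one, and the service name iceberg-rest is hypothetical), and the property names follow the Iceberg connector and S3 file system documentation of recent Trino releases, so check them against your Trino version. The other Trino files (config.properties, jvm.config, node.properties) are not generated here.

```python
from pathlib import Path


def render_iceberg_catalog(rest_uri: str, s3_endpoint: str,
                           access_key: str, secret_key: str) -> str:
    """Return the text of a Trino catalog properties file for Iceberg."""
    props = {
        "connector.name": "iceberg",
        # Assumption: an Iceberg REST catalog, which the compose file does not start.
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": rest_uri,
        # Assumption: Trino's native S3 file system pointed at MinIO.
        "fs.native-s3.enabled": "true",
        "s3.endpoint": s3_endpoint,
        "s3.region": "us-east-1",
        "s3.path-style-access": "true",
        "s3.aws-access-key": access_key,
        "s3.aws-secret-key": secret_key,
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_catalog(etc_dir: Path, text: str) -> Path:
    """Write the catalog file where the mounted ./etc directory expects it."""
    target = etc_dir / "catalog" / "iceberg.properties"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

For example, write_catalog(Path("etc"), render_iceberg_catalog("http://iceberg-rest:8181", "http://minio:9000", "minio", "minio123")) creates etc/catalog/iceberg.properties with the MinIO credentials from the compose file. Inside the Docker network MinIO is reached by its service name, minio, rather than localhost.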
4.3 Start Apache Iceberg Services
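Run the following from the directory that contains docker-compose.yml. If you installed Compose v2 as a Docker plugin, the equivalent command is docker compose up -d:
docker-compose up -d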
Once the containers are running, Trino is available at http://localhost:8080 and the MinIO console at http://localhost:9001.
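Trino usually needs some time to start after the container is up. The following sketch polls Trino's /v1/info endpoint, whose JSON response contains a starting flag, until the server reports that it has finished starting. The function names, retry count, and delay are illustrative choices, not part of Trino.

```python
import json
import time
import urllib.request


def is_trino_ready(info: dict) -> bool:
    """The /v1/info response has a 'starting' flag that becomes false once Trino is up."""
    return info.get("starting") is False


def fetch_info(base_url: str, timeout: float = 2.0) -> dict:
    """Fetch and decode the /v1/info document from a Trino coordinator."""
    with urllib.request.urlopen(f"{base_url}/v1/info", timeout=timeout) as response:
        return json.load(response)


def wait_for_trino(base_url: str = "http://localhost:8080", attempts: int = 30,
                   delay: float = 2.0, fetch=fetch_info) -> bool:
    """Poll until Trino reports ready; return False if it never does."""
    for _ in range(attempts):
        try:
            if is_trino_ready(fetch(base_url)):
                return True
        except (OSError, ValueError):
            pass  # not reachable yet, or the response was not valid JSON
        time.sleep(delay)
    return False
```

Run wait_for_trino() after starting the containers. The fetch parameter exists only so the polling loop can be exercised without a running server.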
5. Verifying the Installation
To verify that Iceberg works, open the Trino CLI inside the container:
docker exec -it trino trino
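Run a test query. Trino addresses tables as catalog.schema.table, so create a schema first (the schema name demo is only an example; depending on your catalog you may also need to give the schema a storage location):
CREATE SCHEMA IF NOT EXISTS iceberg.demo;
CREATE TABLE iceberg.demo.test_table (
  id BIGINT,
  name VARCHAR
) WITH (
  format = 'PARQUET',
  partitioning = ARRAY['id']
);
If this runs successfully, Iceberg is correctly installed.
6. Basic Apache Iceberg Operations
Here are some example commands in Trino:
INSERT INTO iceberg.demo.test_table VALUES (1, 'Alice');
SELECT * FROM iceberg.demo.test_table;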
Iceberg will handle metadata, partitioning, and snapshots automatically.
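To see the snapshots that Iceberg records, Trino exposes a $snapshots metadata table for each Iceberg table and supports FOR VERSION AS OF for time travel. The sketch below only builds the SQL text for both queries. The helper names are made up for this article, and the catalog, schema, and table names passed in are whatever you created. Paste the resulting statements into the Trino CLI.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def snapshots_query(catalog: str, schema: str, table: str) -> str:
    """List the snapshots of a table via its $snapshots metadata table."""
    meta_table = quote_ident(f"{table}$snapshots")
    return (
        "SELECT snapshot_id, committed_at, operation "
        f"FROM {quote_ident(catalog)}.{quote_ident(schema)}.{meta_table} "
        "ORDER BY committed_at"
    )


def time_travel_query(catalog: str, schema: str, table: str, snapshot_id: int) -> str:
    """Read the table as it was at the given snapshot."""
    target = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"SELECT * FROM {target} FOR VERSION AS OF {int(snapshot_id)}"
```

Each INSERT creates a new snapshot, so after a few inserts the first query returns several rows, and any snapshot_id from it can be passed to the second one.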
7. Common Troubleshooting Tips
- Port conflicts: Change exposed ports in docker-compose.yml.
- Low memory errors: Increase Docker’s allocated memory in settings.
- Authentication errors in MinIO: Double-check MINIO_ROOT_USER and MINIO_ROOT_PASSWORD.
- Connector not found: Ensure Trino has the Iceberg connector configured in /etc/trino/catalog/iceberg.properties.
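Port conflicts are easy to detect before starting the containers. This minimal sketch tries to connect to each port published by the compose file (8080, 9000, 9001) on the local machine and reports the ones that already accept connections. The function names are illustrative.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=(8080, 9000, 9001)) -> list:
    """Ports from the compose file that are already taken on this machine."""
    return [port for port in ports if port_in_use(port)]
```

Run busy_ports() while the stack is stopped. Once the containers are up, the same ports are reported as busy because Trino and MinIO are listening on them.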
8. Best Practices for Running Apache Iceberg in Docker
Keep the following in mind when running Apache Iceberg in Docker:
- Use persistent volumes to avoid losing data on container restarts.
- Keep Docker images updated for security patches.
- Use resource limits to prevent container overuse.
- For production, consider Kubernetes or cloud-native deployments.
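The first and third points can be tried out with a Compose override file. The sketch below renders one that adds a named volume for MinIO's /data directory and memory limits for both services. It assumes Docker Compose v2, which follows the Compose Specification and accepts mem_limit. The volume name and the limit values are illustrative, not tuned recommendations.

```python
def render_override(volume: str = "minio-data", minio_mem: str = "1g",
                    trino_mem: str = "2g") -> str:
    """Render a docker-compose.override.yml with a named volume and memory limits."""
    return (
        "services:\n"
        "  minio:\n"
        "    volumes:\n"
        f"      - {volume}:/data\n"      # keep MinIO data across container restarts
        f"    mem_limit: {minio_mem}\n"  # illustrative limit
        "  trino:\n"
        f"    mem_limit: {trino_mem}\n"  # illustrative limit
        "volumes:\n"
        f"  {volume}: {{}}\n"            # declare the named volume
    )
```

Save the output of render_override() as docker-compose.override.yml next to docker-compose.yml. Compose merges the two files the next time the stack is started.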
9. Conclusion
Installing Apache Iceberg in Docker allows you to experiment, develop, and test without heavy infrastructure setup. By combining Trino or Spark with Iceberg and object storage like MinIO, you can create a fully functional data lakehouse on your local machine in minutes.
I hope this article is helpful for anyone looking for Apache Iceberg references.