How To Install Apache Flink On CentOS 8

In this short tutorial we will discuss how to install Apache Flink version 1.13.2 on the CentOS 8 operating system and run a simple example to see Apache Flink in action.

Introduction

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. It executes arbitrary dataflow programs in a data-parallel and pipelined (hence task-parallel) manner, and its pipelined runtime system enables the execution of both bulk/batch and stream processing programs. Flink is written in Java and Scala. In this article we will learn how to install Apache Flink version 1.13.2 on the CentOS 8 operating system.

Apache Flink Installation On CentOS 8

In this tutorial we will install Apache Flink as a local (single-machine) installation. The installation on CentOS 8 consists of several steps, namely:

  • Download Apache Flink Binary File
  • Unpack Apache Flink Binary File
  • Start a Local Flink Cluster
  • Submit The Job
  • Stop the Local Cluster

Prerequisites

Before we install Apache Flink on the CentOS 8 operating system, we have to prepare the environment as described below:

  • CentOS 8 System with updated repository
  • root or ordinary account with sudo privilege
  • Sufficient disk space and good internet access

Apache Flink 1.13 also requires Java 8 or 11. We can verify whether Java is already installed on the system by running the following command:

$ java -version

The output :

[ramans@diginetapp01 ~]$ java -version
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)

Based on the information above, we have Java version 8 running on the system.
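
The same check can be scripted. The following is a minimal sketch, assuming bash; the function names java_major_version and is_supported_java are hypothetical helpers written for this tutorial, not part of Flink or the JDK. It converts the version string printed by java -version into a major version and tests it against the two versions named above.

```shell
#!/usr/bin/env bash
# Map a Java version string to its major version:
# "1.8.0_302" -> 8 (legacy 1.x scheme), "11.0.12" -> 11.
java_major_version() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # drop the legacy "1." prefix
    esac
    echo "${version%%[._-]*}"             # keep only the leading number
}

# Succeeds only for the Java versions this tutorial names (8 or 11).
is_supported_java() {
    local major
    major="$(java_major_version "$1")"
    [ "$major" = "8" ] || [ "$major" = "11" ]
}

# Only query the JVM if one is installed; `java -version` prints to stderr.
if command -v java >/dev/null 2>&1; then
    installed="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
    if is_supported_java "$installed"; then
        echo "Java $installed is supported"
    else
        echo "Java $installed is not Java 8 or 11" >&2
    fi
fi
```

Parsing the quoted version string rather than the first word of the output keeps the check independent of the vendor name (openjdk, java, and so on).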

Download Apache Flink Binary File

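The first step is to download the Apache Flink binary distribution from the official Apache download site. We will use the wget command, as shown below. Older releases are eventually moved from downloads.apache.org to archive.apache.org/dist/flink/, so use the archive host if the link below returns a 404 error: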

$ wget https://downloads.apache.org/flink/flink-1.13.2/flink-1.13.2-bin-scala_2.12.tgz

The output :

[ramans@diginetapp01 ~]$ wget https://downloads.apache.org/flink/flink-1.13.2/flink-1.13.2-bin-scala_2.12.tgz
--2021-09-10 17:36:52-- https://downloads.apache.org/flink/flink-1.13.2/flink-1.13.2-bin-scala_2.12.tgz
Resolving downloads.apache.org (downloads.apache.org)... 88.99.95.219, 135.181.214.104, 135.181.209.10, ...
Connecting to downloads.apache.org (downloads.apache.org)|88.99.95.219|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 304614530 (291M) [application/x-gzip]
Saving to: ‘flink-1.13.2-bin-scala_2.12.tgz’

flink-1.13.2-bin-scala_2.12.t 100%[===============================================>] 290.50M 826KB/s in 6m 5s

2021-09-10 17:42:58 (815 KB/s) - ‘flink-1.13.2-bin-scala_2.12.tgz’ saved [304614530/304614530]
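
Since the archive is about 291 MB and was fetched over the network, it is worth checking its integrity before unpacking. Apache publishes a SHA-512 checksum file next to each release artifact. The sketch below assumes that file has been downloaded as flink-1.13.2-bin-scala_2.12.tgz.sha512 into the same directory and that its first field is the hex digest; verify_sha512 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-512 digest with an expected hex digest.
# Returns 0 on a match, non-zero otherwise (including a missing file).
verify_sha512() {
    local file="$1" expected="$2" actual
    actual="$(sha512sum "$file" 2>/dev/null | awk '{print $1}')"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

archive="flink-1.13.2-bin-scala_2.12.tgz"
# Assumption: the .sha512 file sits next to the archive and its first
# whitespace-separated field is the hex digest.
if [ -f "$archive" ] && [ -f "$archive.sha512" ]; then
    expected="$(awk '{print $1; exit}' "$archive.sha512")"
    if verify_sha512 "$archive" "$expected"; then
        echo "Checksum OK"
    else
        echo "Checksum mismatch, download the archive again" >&2
    fi
fi
```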

Unpack Apache Flink Binary File

At this stage, we will unpack the Apache Flink binary archive by using the tar command:

$ tar -xzf flink-1.13.2-bin-scala_2.12.tgz

Then we change into the bin subdirectory, which contains the .sh scripts used in the following steps.

$ cd flink-1.13.2/bin/
$ ls -ltr *.sh

The output :

[ramans@diginetapp01 bin]$ ls -ltr *.sh
-rwxr-xr-x. 1 ramans ramans 2405 May 30 19:46 zookeeper.sh
-rwxr-xr-x. 1 ramans ramans 1725 May 30 19:46 yarn-session.sh
-rwxr-xr-x. 1 ramans ramans 2960 May 30 19:46 taskmanager.sh
-rwxr-xr-x. 1 ramans ramans 1845 May 30 19:46 stop-zookeeper-quorum.sh
-rwxr-xr-x. 1 ramans ramans 1617 May 30 19:46 stop-cluster.sh
-rwxr-xr-x. 1 ramans ramans 1854 May 30 19:46 start-zookeeper-quorum.sh
-rwxr-xr-x. 1 ramans ramans 1837 May 30 19:46 start-cluster.sh
-rwxr-xr-x. 1 ramans ramans 2006 May 30 19:46 standalone-job.sh
-rwxr-xr-x. 1 ramans ramans 1770 May 30 19:46 kubernetes-taskmanager.sh
-rwxr-xr-x. 1 ramans ramans 1717 May 30 19:46 kubernetes-session.sh
-rwxr-xr-x. 1 ramans ramans 1650 May 30 19:46 kubernetes-jobmanager.sh
-rwxr-xr-x. 1 ramans ramans 2295 May 30 19:46 jobmanager.sh
-rwxr-xr-x. 1 ramans ramans 1564 May 30 19:46 historyserver.sh
-rwxr-xr-x. 1 ramans ramans 6571 May 30 19:46 flink-daemon.sh
-rwxr-xr-x. 1 ramans ramans 1318 May 30 19:46 find-flink-home.sh
-rwxr-xr-x. 1 ramans ramans 20562 May 30 19:46 config.sh
-rwxr-xr-x. 1 ramans ramans 2994 May 30 19:46 pyflink-shell.sh
-rwxr-xr-x. 1 ramans ramans 3742 May 30 19:46 sql-client.sh
-rwxr-xr-x. 1 ramans ramans 1891 Jul 2 00:31 mesos-taskmanager.sh
-rwxr-xr-x. 1 ramans ramans 1958 Jul 2 00:31 mesos-jobmanager.sh
-rwxr-xr-x. 1 ramans ramans 1137 Jul 2 00:31 mesos-appmaster.sh
-rwxr-xr-x. 1 ramans ramans 1133 Jul 2 00:31 mesos-appmaster-job.sh
-rwxr-xr-x. 1 ramans ramans 4137 Jul 2 00:31 flink-console.sh
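
Before moving on, a quick sanity check can confirm that the unpacked directory contains the scripts the remaining steps depend on. This is a small sketch, assuming the archive was unpacked in the home directory; is_flink_home and the FLINK_DIR variable are hypothetical names used only for convenience in this shell session.

```shell
#!/usr/bin/env bash
# Return 0 if the directory contains the executables this tutorial uses.
is_flink_home() {
    local dir="$1" script
    for script in bin/flink bin/start-cluster.sh bin/stop-cluster.sh; do
        [ -x "$dir/$script" ] || return 1
    done
    return 0
}

# Hypothetical location: adjust to wherever the archive was unpacked.
candidate="$HOME/flink-1.13.2"
if is_flink_home "$candidate"; then
    FLINK_DIR="$candidate"
    echo "Flink found in $FLINK_DIR"
fi
```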

Start a Local Flink Cluster

In this tutorial, we will start a local Flink cluster and submit a job to see how Apache Flink runs on the system.

To start a local Flink cluster, we will use the following command:

$ ./start-cluster.sh

The output :

[ramans@diginetapp01 bin]$ ./start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host diginetapp01.
Starting taskexecutor daemon on host diginetapp01.
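
The start script returns as soon as the daemons are launched, so the cluster may need a few seconds before it accepts jobs. One way to wait for it is to poll the REST endpoint that also serves the web dashboard. The sketch below assumes the default configuration, where Flink listens on port 8081 of the local host, and assumes curl is installed; wait_for_http is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Poll a URL until it answers with a successful HTTP status.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_http() {
    local url="$1" attempts="${2:-5}" delay="${3:-1}" i
    for ((i = 1; i <= attempts; i++)); do
        if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if (( i < attempts )); then sleep "$delay"; fi
    done
    return 1
}

# Assumption: the cluster uses Flink's default REST port 8081 on this host.
if wait_for_http "http://localhost:8081/overview" 5 1; then
    echo "Flink REST endpoint is up"
else
    echo "Flink REST endpoint did not respond" >&2
fi
```

If the port was changed in the Flink configuration, the URL has to be adjusted accordingly.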

Submitting A Job

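At this stage, we will submit a job to the cluster. The WordCount.jar example shipped with Flink counts the words in a built-in sample text. The command uses paths relative to the Flink home directory, so we first move up one level from bin/:

$ cd ..
$ ./bin/flink run examples/streaming/WordCount.jar

The output, followed by the word counts that the job writes to the task executor's .out log file: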

[ramans@diginetapp01 flink-1.13.2]$ ./bin/flink run examples/streaming/WordCount.jar
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
Job has been submitted with JobID d5210027f89d443e4b62fdad2a5713a8
Program execution finished
Job with JobID d5210027f89d443e4b62fdad2a5713a8 has finished.
Job Runtime: 2360 ms
[ramans@diginetapp01 flink-1.13.2]$ tail log/flink-*-taskexecutor-*.out
(nymph,1)
(in,3)
(thy,1)
(orisons,1)
(be,4)
(all,2)
(my,1)
(sins,1)
(remember,1)
(d,4)
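
Because this is the streaming version of WordCount, the log contains an updated tuple every time a word is seen, so the same word appears several times with a growing count. A minimal sketch for reducing the log to one final count per word is shown below; final_counts is a hypothetical helper, and the log path assumes the current directory is the Flink home directory.

```shell
#!/usr/bin/env bash
# Read "(word,count)" lines on stdin and print the last count seen for
# each word as "word count", sorted by word.
final_counts() {
    awk -F'[(),]' 'NF >= 3 && $3 ~ /^[0-9]+$/ { last[$2] = $3 }
         END { for (w in last) print w, last[w] }' | sort
}

# Run against the task executor output only if it exists.
# Assumption: the current directory is the Flink home directory.
for out in log/flink-*-taskexecutor-*.out; do
    if [ -f "$out" ]; then
        final_counts < "$out"
    fi
done
```

For example, the input lines (be,1), (to,1), (be,2) are reduced to "be 2" and "to 1".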

Stop the Local Cluster

To stop the Flink cluster, we will use the following command:

$ ./bin/stop-cluster.sh

The output :

[ramans@diginetapp01 flink-1.13.2]$ ./bin/stop-cluster.sh
Stopping taskexecutor daemon (pid: 5396) on host diginetapp01.
Stopping standalonesession daemon (pid: 5129) on host diginetapp01.
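
To confirm that both daemons are really gone, the list of running JVMs can be inspected with jps. This is a sketch under two assumptions: jps is on the PATH (it ships with the JDK, and on CentOS it may require the OpenJDK development package rather than the runtime-only package), and the daemons run under Flink's standalone entry point classes named in the pattern below. flink_jvms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Filter `jps -l`-style output on stdin down to Flink's standalone
# JobManager and TaskManager entry points.
flink_jvms() {
    grep -E 'StandaloneSessionClusterEntrypoint|TaskManagerRunner'
}

# Skip the check if jps is not available on this system.
if command -v jps >/dev/null 2>&1; then
    if jps -l | flink_jvms; then
        echo "Flink daemons are still running" >&2
    else
        echo "No Flink daemons found"
    fi
fi
```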

Conclusion

In this short tutorial we have installed Apache Flink on CentOS 8, started a local cluster, submitted a job successfully, and stopped the cluster again. I hope this article will be helpful for anyone who needs it.
