In this article, we will walk through how to install Apache Airflow 2.1 on the Ubuntu 20.04 LTS operating system.
Introduction
Apache Airflow is an open-source workflow management platform for building data pipelines. Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of “configuration as code”. Airflow was started at Airbnb in October 2014 as a solution to manage the company’s increasingly complex workflows. From the beginning, the project was open source, becoming an Apache Incubator project in March 2016 and a Top-Level Apache Software Foundation project in January 2019.
Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python, and Airflow then manages the scheduling and execution. In this article, we will discuss the procedure for installing Apache Airflow version 2.1.x on the Ubuntu 20.04 LTS operating system.
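To make the DAG idea concrete, here is a minimal sketch in plain Python (it does not use Airflow's API) of how a set of tasks and their dependencies determines a valid execution order. The task names extract, transform, load and report and the helper execution_order are made up for illustration; Airflow's real scheduler is far more elaborate.

```python
from collections import deque


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency.

    `dependencies` maps each task to the list of tasks it depends on.
    This is Kahn's algorithm for topological sorting, used here only to
    illustrate the idea of a DAG; it is not Airflow code.
    """
    # Collect every task mentioned, either as a key or as an upstream task.
    tasks = set(dependencies)
    for upstream in dependencies.values():
        tasks.update(upstream)

    # Count unmet upstream tasks and record which tasks wait on which.
    pending = {task: 0 for task in tasks}
    downstream = {task: [] for task in tasks}
    for task, upstream in dependencies.items():
        for parent in upstream:
            pending[task] += 1
            downstream[parent].append(task)

    # Start with tasks that depend on nothing; sort for a stable result.
    ready = deque(sorted(t for t in tasks if pending[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in sorted(downstream[task]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(tasks):
        raise ValueError("dependencies contain a cycle, so this is not a DAG")
    return order


# Hypothetical pipeline: each task lists the tasks it must wait for.
pipeline = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["load"],
}
print(execution_order(pipeline))
```

The cycle check is what the word "acyclic" stands for: if two tasks wait on each other, no valid order exists and the sketch raises an error.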
Apache Airflow Installation on Ubuntu 20.04 LTS
Before starting the installation, we need to know the prerequisites for installing Apache Airflow on Ubuntu 20.04. They are:
- An Ubuntu 20.04 LTS Server with sufficient disk space
- An account with sudo or root access to run privileged commands.
- Python 3.6, 3.7, or 3.8 installed (3.9 is not supported)
- A database installed on the server: PostgreSQL (version 9.6, 10, 11, 12, 13), MySQL (version 5.7, 8), or SQLite (version 3.15.0+)
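The Python and SQLite requirements above can be checked with a short script. This is a sketch under the assumption that the version numbers in the list are the ones to enforce; the helper names python_supported and sqlite_supported are made up for illustration. Note that it reports the SQLite library linked into Python, which may differ from a separately installed sqlite3 command-line tool.

```python
import sqlite3
import sys

# Versions taken from the prerequisite list above.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8)}
MIN_SQLITE = (3, 15, 0)


def python_supported(version_info):
    """True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def sqlite_supported(version_info):
    """True if the SQLite library version is at least 3.15.0."""
    return tuple(version_info) >= MIN_SQLITE


# Report on the interpreter that runs this script.
print("Python", ".".join(map(str, sys.version_info[:3])),
      "supported:", python_supported(sys.version_info))
print("SQLite", sqlite3.sqlite_version,
      "supported:", sqlite_supported(sqlite3.sqlite_version_info))
```

Run it with python3 so that the check applies to the same interpreter that pip3 will install Airflow into.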
With Apache Airflow, we can easily visualize a data pipeline’s dependencies, progress, logs, code, and success status, and trigger tasks. The Apache Airflow installation process consists of the following steps:
- Install Required Packages
- Update the packages
- Install the python3-pip
- Install Apache Airflow required dependencies
- Install the Apache-Airflow on Ubuntu 20.04
- Set Apache-Airflow Login credentials for airflow web interface
- Start the Apache-Airflow web interface
Each step is explained in more detail in the following sections.
1. Install Required Packages
The first stage in installing Apache Airflow is to install the required packages, as shown below :
$ sudo apt-get install software-properties-common
$ sudo apt-add-repository universe
Output :
ramans@otodiginet:~$ sudo apt-get install software-properties-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  python3-software-properties software-properties-gtk
The following packages will be upgraded:
  python3-software-properties software-properties-common software-properties-gtk
3 upgraded, 0 newly installed, 0 to remove and 541 not upgraded.
Need to get 0 B/99.7 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 142625 files and directories currently installed.)
Preparing to unpack .../software-properties-common_0.98.9.5_all.deb ...
Unpacking software-properties-common (0.98.9.5) over (0.98.9) ...
Preparing to unpack .../software-properties-gtk_0.98.9.5_all.deb ...
Unpacking software-properties-gtk (0.98.9.5) over (0.98.9) ...
Preparing to unpack .../python3-software-properties_0.98.9.5_all.deb ...
Unpacking python3-software-properties (0.98.9.5) over (0.98.9) ...
Setting up python3-software-properties (0.98.9.5) ...
Setting up software-properties-common (0.98.9.5) ...
Setting up software-properties-gtk (0.98.9.5) ...
Processing triggers for dbus (1.12.16-2ubuntu2) ...
Processing triggers for shared-mime-info (1.15-1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu2) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for libglib2.0-0:amd64 (2.64.2-1~fakesync1) ...
Processing triggers for man-db (2.9.1-1) ...
ramans@otodiginet:~$ sudo apt-add-repository universe
'universe' distribution component is already enabled for all sources.
2. Update The Packages
Next, we refresh the package lists so that the repository enabled in the previous step is taken into account. To update the package lists on our Ubuntu system, we use the following command :
$ sudo apt-get update
Output :
ramans@otodiginet:~$ sudo apt-get update
Hit:1 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists... Done
3. Install Python 3 pip
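pip is the standard package manager for Python. It allows us to install and manage packages that are not part of the Python standard library. Apache Airflow needs pip for its installation. At this stage, we will install pip with the commands below. Note that on Ubuntu 20.04 the python-setuptools package pulls in Python 2.7, as the output further down shows; the Python 3 equivalent, python3-setuptools, is installed automatically together with python3-pip.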
$ sudo apt-get install python-setuptools
$ sudo apt install python3-pip
Output :
ramans@otodiginet:~$ sudo apt-get install python-setuptools
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libpython2-stdlib libpython2.7-minimal libpython2.7-stdlib
  python-pkg-resources python2 python2-minimal python2.7 python2.7-minimal
Suggested packages:
  python-setuptools-doc python2-doc python-tk python2.7-doc binutils
  binfmt-support
The following NEW packages will be installed:
  libpython2-stdlib libpython2.7-minimal libpython2.7-stdlib
  python-pkg-resources python-setuptools python2 python2-minimal python2.7
  python2.7-minimal
0 upgraded, 9 newly installed, 0 to remove and 541 not upgraded.
Need to get 4,275 kB of archives.
After this operation, 18.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython2.7-minimal amd64 2.7.18-1~20.04.1 [335 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python2.7-minimal amd64 2.7.18-1~20.04.1 [1,285 kB]
. . .
Setting up python-pkg-resources (44.0.0-2) ...
Setting up python-setuptools (44.0.0-2) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu2) ...
ramans@otodiginet:~$ sudo apt install python3-pip
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp-9
  dpkg-dev fakeroot g++ g++-9 gcc gcc-10-base gcc-9 gcc-9-base
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl
  libasan5 libatomic1 libbinutils libc-dev-bin libc6 libc6-dbg libc6-dev
  libcc1-0 libcrypt-dev libctf-nobfd0 libctf0 libexpat1-dev libfakeroot
  libgcc-9-dev libgcc-s1 libgomp1 libitm1 liblsan0 libpython3-dev libpython3.8
  libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib libquadmath0
  libstdc++-9-dev libstdc++6 libtsan0 libubsan1 linux-libc-dev make
  manpages-dev python-pip-whl python3-dev python3-distutils python3-lib2to3
  python3-setuptools python3-wheel python3.8 python3.8-dev python3.8-minimal
  zlib1g zlib1g-dev
Suggested packages:
  binutils-doc gcc-9-locales debian-keyring g++-multilib g++-9-multilib
  gcc-9-doc gcc-multilib autoconf automake libtool flex bison gcc-doc
  gcc-9-multilib glibc-doc libstdc++-9-doc make-doc python-setuptools-doc
  python3.8-venv python3.8-doc binfmt-support
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential dpkg-dev
  fakeroot g++ g++-9 gcc gcc-9 libalgorithm-diff-perl
  libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan5 libatomic1
  libbinutils libc-dev-bin libc6-dev libcrypt-dev libctf-nobfd0 libctf0
  libexpat1-dev libfakeroot libgcc-9-dev libitm1 liblsan0 libpython3-dev
  libpython3.8-dev libquadmath0 libstdc++-9-dev libtsan0 libubsan1
  linux-libc-dev make manpages-dev python-pip-whl python3-dev
  python3-distutils python3-pip python3-setuptools python3-wheel
  python3.8-dev zlib1g-dev
The following packages will be upgraded:
  cpp-9 gcc-10-base gcc-9-base libc6 libc6-dbg libcc1-0 libgcc-s1 libgomp1
  libpython3.8 libpython3.8-minimal libpython3.8-stdlib libstdc++6
  python3-lib2to3 python3.8 python3.8-minimal zlib1g
16 upgraded, 43 newly installed, 0 to remove and 525 not upgraded.
Need to get 48.8 MB/69.6 MB of archives.
After this operation, 217 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libgomp1 amd64 10.3.0-1ubuntu1~20.04 [102 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 gcc-10-base amd64 10.3.0-1ubuntu1~20.04 [20.2 kB]
. . .
Setting up build-essential (12.8ubuntu1.1) ...
Setting up python3-dev (3.8.2-0ubuntu2) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for libc-bin (2.31-0ubuntu9) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu2) ...
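After the installation finishes, it can be useful to confirm that pip is really available to Python 3. The following is a small sketch, not part of the original procedure; the helper name pip_status is made up for illustration.

```python
import importlib.util
import shutil


def pip_status():
    """Report whether pip is importable and where pip3 lives on PATH."""
    return {
        # True when this interpreter can import the pip package.
        "pip_module": importlib.util.find_spec("pip") is not None,
        # Full path of the pip3 executable, or None when it is not on PATH.
        "pip3_command": shutil.which("pip3"),
    }


status = pip_status()
print("pip importable by this interpreter:", status["pip_module"])
print("pip3 executable:", status["pip3_command"] or "not found on PATH")
```

If pip is importable but the pip3 command is missing, python3 -m pip can be used in place of pip3.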
4. Install Apache Airflow required dependencies
At this stage, we will install the packages that Airflow depends on, so that the Apache Airflow application runs properly. The dependency packages are installed with the following commands :
$ sudo apt-get install libmysqlclient-dev
$ sudo apt-get install libssl-dev
$ sudo apt-get install libkrb5-dev
Output :
ramans@otodiginet:~$ sudo apt-get install libmysqlclient-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libmysqlclient21 libssl-dev libssl1.1
Suggested packages:
  libssl-doc
The following NEW packages will be installed:
  libmysqlclient-dev libssl-dev
The following packages will be upgraded:
  libmysqlclient21 libssl1.1
2 upgraded, 2 newly installed, 0 to remove and 523 not upgraded.
Need to get 4,348 kB/5,668 kB of archives.
After this operation, 17.9 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libmysqlclient21 amd64 8.0.26-0ubuntu0.20.04.2 [1,228 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libssl-dev amd64 1.1.1f-1ubuntu2.4 [1,583 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libmysqlclient-dev amd64 8.0.26-0ubuntu0.20.04.2 [1,538 kB]
Fetched 4,348 kB in 23s (188 kB/s)
Preconfiguring packages ...
(Reading database ... 149739 files and directories currently installed.)
Preparing to unpack .../libssl1.1_1.1.1f-1ubuntu2.4_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.1f-1ubuntu2.4) over (1.1.1f-1ubuntu2) ...
Preparing to unpack .../libmysqlclient21_8.0.26-0ubuntu0.20.04.2_amd64.deb ...
Unpacking libmysqlclient21:amd64 (8.0.26-0ubuntu0.20.04.2) over (8.0.19-0ubuntu5) ...
Selecting previously unselected package libssl-dev:amd64.
Preparing to unpack .../libssl-dev_1.1.1f-1ubuntu2.4_amd64.deb ...
Unpacking libssl-dev:amd64 (1.1.1f-1ubuntu2.4) ...
Selecting previously unselected package libmysqlclient-dev.
Preparing to unpack .../libmysqlclient-dev_8.0.26-0ubuntu0.20.04.2_amd64.deb ...
Unpacking libmysqlclient-dev (8.0.26-0ubuntu0.20.04.2) ...
Setting up libssl1.1:amd64 (1.1.1f-1ubuntu2.4) ...
Setting up libssl-dev:amd64 (1.1.1f-1ubuntu2.4) ...
Setting up libmysqlclient21:amd64 (8.0.26-0ubuntu0.20.04.2) ...
Setting up libmysqlclient-dev (8.0.26-0ubuntu0.20.04.2) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for libc-bin (2.31-0ubuntu9) ...
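As a rough sanity check, Python can be asked whether the shared libraries behind these packages are visible on the system. This is only a sketch: the short names mysqlclient, ssl and krb5 are assumed to correspond to the three packages above, and the helper locate_libraries is made up for illustration. It confirms that the runtime libraries can be found, not that the development headers from the -dev packages are present.

```python
from ctypes.util import find_library

# Assumed short library names for libmysqlclient-dev, libssl-dev
# and libkrb5-dev respectively.
LIBRARIES = ["mysqlclient", "ssl", "krb5"]


def locate_libraries(names):
    """Map each short library name to the file name found, or None."""
    return {name: find_library(name) for name in names}


for name, found in locate_libraries(LIBRARIES).items():
    print(f"lib{name}: {found or 'not found'}")
```

A "not found" result suggests that the corresponding apt-get install command did not complete.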
5. Install the Apache Airflow on Ubuntu 20.04 System
At this stage, we will install Apache Airflow on the Ubuntu 20.04 operating system. Apache Airflow requires a home directory where it stores all its settings and configuration. To set it, we run the following command :
$ export AIRFLOW_HOME=~/airflow
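The export above only lasts for the current shell session, so it is worth checking which directory will actually be used. The sketch below mirrors the lookup in plain Python: it reads AIRFLOW_HOME and falls back to ~/airflow, Airflow's default location. The helper name airflow_home is made up for illustration.

```python
import os


def airflow_home(environ):
    """Return the Airflow home directory for a given environment mapping.

    Falls back to ~/airflow, the default used when AIRFLOW_HOME is unset.
    """
    return os.path.expanduser(environ.get("AIRFLOW_HOME", "~/airflow"))


home = airflow_home(os.environ)
print("Airflow home:", home)
print("Directory exists yet:", os.path.isdir(home))
```

To keep the setting across sessions, the export line can be added to the shell's startup file, for example ~/.bashrc.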
Then we install Apache Airflow with the following command :
$ pip3 install apache-airflow
The output will be as shown below :
ramans@otodiginet:~$ export AIRFLOW_HOME=~/airflow
ramans@otodiginet:~$ pip3 install apache-airflow
Collecting apache-airflow
  Downloading apache_airflow-2.1.2-py3-none-any.whl (5.2 MB)
     |████████████████████████████████| 5.2 MB 2.1 MB/s
Requirement already satisfied: pyjwt<2 in /usr/lib/python3/dist-packages (from apache-airflow) (1.7.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/lib/python3/dist-packages (from apache-airflow) (5.3.1)
Collecting flask-wtf<0.15,>=0.14.3
  Downloading Flask_WTF-0.14.3-py2.py3-none-any.whl (13 kB)
Collecting sqlalchemy<1.4,>=1.3.18
  Downloading SQLAlchemy-1.3.24-cp38-cp38-manylinux2010_x86_64.whl (1.3 MB)
     |████████████████████████████████| 1.3 MB 3.1 MB/s
Collecting lazy-object-proxy
  Downloading lazy_object_proxy-1.6.0-cp38-cp38-manylinux1_x86_64.whl (58 kB)
     |████████████████████████████████| 58 kB 1.6 MB/s
Collecting argcomplete~=1.10
  Downloading argcomplete-1.12.3-py2.py3-none-any.whl (38 kB)
Collecting unicodecsv>=0.14.1
  Downloading unicodecsv-0.14.1.tar.gz (10 kB)
Requirement already satisfied: python-dateutil<3,>=2.3 in /usr/lib/python3/dist-packages (from apache-airflow) (2.7.3)
. . .
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed Babel-2.9.1 Flask-Babel-1.0.0 Flask-JWT-Extended-3.25.1 Flask-OpenID-1.2.5 Flask-SQLAlchemy-2.5.1 WTForms-2.3.3 alembic-1.6.5 anyio-3.3.0 apache-airflow-2.1.2 apache-airflow-providers-ftp-2.0.0 apache-airflow-providers-imap-2.0.0 apache-airflow-providers-sqlite-2.0.0 apispec-3.3.2 argcomplete-1.12.3 attrs-20.3.0 cattrs-1.5.0 click-8.0.1 clickclick-20.10.2 colorama-0.4.4 colorlog-5.0.1 commonmark-0.9.1 croniter-1.0.15 defusedxml-0.7.1 dill-0.3.4 dnspython-2.1.0 docutils-0.16 email-validator-1.1.3 flask-1.1.4 flask-appbuilder-3.3.2 flask-caching-1.10.1 flask-login-0.4.1 flask-wtf-0.14.3 graphviz-0.17 gunicorn-20.1.0 h11-0.12.0 httpcore-0.13.6 httpx-0.18.2 importlib-metadata-4.6.1 importlib-resources-1.5.0 inflection-0.5.1 iso8601-0.1.16 isodate-0.6.0 itsdangerous-1.1.0 jinja2-2.11.3 jsonschema-3.2.0 lazy-object-proxy-1.6.0 markdown-3.3.4 markupsafe-1.1.1 marshmallow-3.13.0 marshmallow-enum-1.5.1 marshmallow-oneofschema-3.0.1 marshmallow-sqlalchemy-0.23.1 numpy-1.21.1 openapi-schema-validator-0.1.5 openapi-spec-validator-0.3.1 pandas-1.3.1 pendulum-2.1.2 prison-0.1.3 psutil-5.8.0 pygments-2.9.0 pyrsistent-0.18.0 python-daemon-2.3.0 python-editor-1.0.4 python-nvd3-0.15.0 python-slugify-4.0.1 python3-openid-3.2.0 pytzdata-2020.1 rfc3986-1.5.0 rich-10.6.0 setproctitle-1.2.2 sniffio-1.2.0 sqlalchemy-1.3.24 sqlalchemy-jsonfield-1.0.0 sqlalchemy-utils-0.37.8 swagger-ui-bundle-0.0.8 tabulate-0.8.9 tenacity-6.2.0 termcolor-1.1.0 text-unidecode-1.3 unicodecsv-0.14.1 werkzeug-1.0.1 zipp-3.5.0
Next, we install the python3-virtualenv package, which provides the virtualenv command used in the following step :
ramans@otodiginet:~$ sudo apt install python3-virtualenv
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata
  python3-more-itertools python3-zipp
The following NEW packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata
  python3-more-itertools python3-virtualenv python3-zipp
0 upgraded, 7 newly installed, 0 to remove and 523 not upgraded.
Need to get 252 kB of archives.
After this operation, 1,304 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal/main amd64 python3-appdirs all 1.4.3-2.1 [10.8 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal/universe amd64 python3-distlib all 0.3.0-1 [116 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu focal/universe amd64 python3-filelock all 3.0.12-2 [7,948 B]
Get:4 http://us.archive.ubuntu.com/ubuntu focal/main amd64 python3-more-itertools all 4.2.0-1build1 [39.4 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu focal/main amd64 python3-zipp all 1.0.0-1 [5,312 B]
Get:6 http://us.archive.ubuntu.com/ubuntu focal/main amd64 python3-importlib-metadata all 1.5.0-1 [9,992 B]
Get:7 http://us.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3-virtualenv all 20.0.17-1ubuntu0.4 [62.7 kB]
Fetched 252 kB in 3s (93.1 kB/s)
Selecting previously unselected package python3-appdirs.
(Reading database ... 149886 files and directories currently installed.)
Preparing to unpack .../0-python3-appdirs_1.4.3-2.1_all.deb ...
Unpacking python3-appdirs (1.4.3-2.1) ...
Selecting previously unselected package python3-distlib.
Preparing to unpack .../1-python3-distlib_0.3.0-1_all.deb ...
Unpacking python3-distlib (0.3.0-1) ...
Selecting previously unselected package python3-filelock.
Preparing to unpack .../2-python3-filelock_3.0.12-2_all.deb ...
Unpacking python3-filelock (3.0.12-2) ...
Selecting previously unselected package python3-more-itertools.
Preparing to unpack .../3-python3-more-itertools_4.2.0-1build1_all.deb ...
Unpacking python3-more-itertools (4.2.0-1build1) ...
Selecting previously unselected package python3-zipp.
Preparing to unpack .../4-python3-zipp_1.0.0-1_all.deb ...
Unpacking python3-zipp (1.0.0-1) ...
Selecting previously unselected package python3-importlib-metadata.
Preparing to unpack .../5-python3-importlib-metadata_1.5.0-1_all.deb ...
Unpacking python3-importlib-metadata (1.5.0-1) ...
Selecting previously unselected package python3-virtualenv.
Preparing to unpack .../6-python3-virtualenv_20.0.17-1ubuntu0.4_all.deb ...
Unpacking python3-virtualenv (20.0.17-1ubuntu0.4) ...
Setting up python3-more-itertools (4.2.0-1build1) ...
Setting up python3-filelock (3.0.12-2) ...
Setting up python3-distlib (0.3.0-1) ...
Setting up python3-zipp (1.0.0-1) ...
Setting up python3-appdirs (1.4.3-2.1) ...
Setting up python3-importlib-metadata (1.5.0-1) ...
Setting up python3-virtualenv (20.0.17-1ubuntu0.4) ...
Processing triggers for man-db (2.9.1-1) ...
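The plain pip3 install apache-airflow used above lets pip pick whatever dependency versions are current at install time. The Airflow project also publishes constraint files that pin a tested set of dependencies for each release and Python version. As a hedged sketch, the following snippet builds such an install command; the version 2.1.2 is taken from the pip output above, and the helper names constraint_url and pip_command are made up for illustration. Nothing is executed; the command is only printed.

```python
import sys

# Assumed values for illustration: the Airflow version that pip resolved in
# the output above, and the version of the interpreter running this script.
AIRFLOW_VERSION = "2.1.2"
PYTHON_VERSION = f"{sys.version_info.major}.{sys.version_info.minor}"


def constraint_url(airflow_version, python_version):
    """Build the URL of the published constraint file for a release."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_command(airflow_version, python_version):
    """Return the pip invocation as an argument list (nothing is run)."""
    return [
        "pip3", "install",
        f"apache-airflow=={airflow_version}",
        "--constraint", constraint_url(airflow_version, python_version),
    ]


print(" ".join(pip_command(AIRFLOW_VERSION, PYTHON_VERSION)))
```

Pinning the version this way makes the installation reproducible, at the cost of having to update the version number by hand when upgrading.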
Create a new virtual environment (in this tutorial it is named airflow_otodigi) with the virtualenv command :
ramans@otodiginet:~$ virtualenv airflow_otodigi
created virtual environment CPython3.8.10.final.0-64 in 567ms
  creator CPython3Posix(dest=/home/ramans/airflow_otodigi, clear=False, global=False)
  seeder FromAppData(download=False, pip=latest, setuptools=latest, wheel=latest, pkg_resources=latest, via=copy, app_data_dir=/home/ramans/.local/share/virtualenv/seed-app-data/v1.0.1.debian.1)
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
Find the newly created directory, then go into it and into its bin subdirectory :
ramans@otodiginet:~$ ls -lt
total 36
drwxrwxr-x 4 ramans ramans 4096 Jul 29 02:44 airflow_otodigi
drwxr-xr-x 3 ramans ramans 4096 Jul 29 00:58 Pictures
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Desktop
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Documents
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Downloads
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Music
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Public
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Templates
drwxr-xr-x 2 ramans ramans 4096 May 26 23:33 Videos
ramans@otodiginet:~$ cd airflow_otodigi/
ramans@otodiginet:~/airflow_otodigi$ ls -l
total 12
drwxrwxr-x 2 ramans ramans 4096 Jul 29 02:44 bin
drwxrwxr-x 3 ramans ramans 4096 Jul 29 02:44 lib
-rw-rw-r-- 1 ramans ramans  203 Jul 29 02:44 pyvenv.cfg
ramans@otodiginet:~/airflow_otodigi$ cd bin
ramans@otodiginet:~/airflow_otodigi/bin$ ls
activate       activate.ps1      easy_install      pip      pip3.8   python3.8  wheel-3.8
activate.csh   activate_this.py  easy_install3     pip3     python   wheel
activate.fish  activate.xsh      easy_install-3.8  pip-3.8  python3  wheel3
Then activate the environment from within the bin directory with the following command :
$ source activate
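Once the environment is activated, the shell prompt is prefixed with its name, as in (airflow_otodigi). The same can be confirmed from Python itself. The snippet below is a minimal sketch; the helper name in_virtual_environment is made up for illustration.

```python
import sys


def in_virtual_environment(prefix, base_prefix):
    """A virtual environment is active when the two prefixes differ."""
    return prefix != base_prefix


# sys.prefix points at the environment; sys.base_prefix at the base install.
active = in_virtual_environment(sys.prefix, sys.base_prefix)
print("Interpreter:", sys.executable)
print("Virtual environment active:", active)
```

Run with the python command of the activated environment, the interpreter path should point inside the airflow_otodigi directory.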
5.1. Install Airflow Extension
At this stage, we will install the typing_extensions package with the following command :
$ pip3 install typing_extensions
Output :
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ pip3 install typing_extensions
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/typing-extensions/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/typing-extensions/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/typing-extensions/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/typing-extensions/
Collecting typing_extensions
  Using cached typing_extensions-3.10.0.0-py3-none-any.whl (26 kB)
Installing collected packages: typing-extensions
Successfully installed typing-extensions-3.10.0.0
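5.2. Initialize the Airflow Database
The next step is to initialize the Apache Airflow database. The airflow command must be available inside the activated virtual environment; the output further down loads its packages from the airflow_otodigi directory, so if the command is not found, run pip3 install apache-airflow again with the environment active. We will use the following command :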
$ airflow db init
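With the default configuration, this command creates a SQLite file named airflow.db inside the Airflow home directory. Once the command has finished, the tables it created can be listed with Python's built-in sqlite3 module. This is only a sketch: the helper name list_tables is made up for illustration, and the path assumes the default SQLite backend.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the sorted table names stored in a SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Assumed location: airflow.db inside AIRFLOW_HOME (default ~/airflow).
db_file = os.path.join(
    os.path.expanduser(os.environ.get("AIRFLOW_HOME", "~/airflow")),
    "airflow.db",
)
if os.path.exists(db_file):
    print(list_tables(db_file))
else:
    print("No database found at", db_file)
```

An empty list or a missing file means the initialization did not run against this home directory.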
Output of the airflow db init command :
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow db init
DB: sqlite:////home/ramans/airflow/airflow.db
[2021-07-29 02:57:04,096] {db.py:692} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> e3a246e0dc1, current schema
INFO [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
/home/ramans/airflow_otodigi/lib/python3.8/site-packages/alembic/ddl/sqlite.py:43 UserWarning: Skipping unsupported ALTER for creation of implicit constraintPlease refer to the batch mode feature which allows for SQLite migrations using a copy-and-move strategy.
INFO [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
INFO [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
WARNI [unusual_prefix_4b1ffe5d701ea116e7b4c0dcdd50964f852ae761_example_kubernetes_executor_config] Could not import DAGs in example_kubernetes_executor_config.py: No module named 'kubernetes'
WARNI [unusual_prefix_4b1ffe5d701ea116e7b4c0dcdd50964f852ae761_example_kubernetes_executor_config] Install kubernetes dependencies with: pip install apache-airflow['cncf.kubernetes']
INFO [alembic.runtime.migration] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary
INFO [alembic.runtime.migration] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index
INFO [alembic.runtime.migration] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)
INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness
INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing
INFO [alembic.runtime.migration] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing
INFO [alembic.runtime.migration] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness
INFO [alembic.runtime.migration] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads
INFO [alembic.runtime.migration] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint
INFO [alembic.runtime.migration] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key
INFO [alembic.runtime.migration] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail
INFO [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> dd25f486b8ea, add idx_log_dag
INFO [alembic.runtime.migration] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance
INFO [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table
INFO [alembic.runtime.migration] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2
INFO [alembic.runtime.migration] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field
INFO [alembic.runtime.migration] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag
INFO [alembic.runtime.migration] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag
INFO [alembic.runtime.migration] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete
INFO [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make TaskInstance.pool not nullable
INFO [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> d38e04c12aa2, add serialized_dag table
INFO [alembic.runtime.migration] Running upgrade d38e04c12aa2 -> b3b105409875, add root_dag_id to DAG
INFO [alembic.runtime.migration] Running upgrade 6e96a59344a4 -> 74effc47d867, change datetime to datetime2(6) on MSSQL tables
INFO [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit
INFO [alembic.runtime.migration] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table
INFO [alembic.runtime.migration] Running upgrade a56c9515abdc, 004c1210f153, 74effc47d867, b3b105409875 -> 08364691d074, Merge the four heads back together
INFO [alembic.runtime.migration] Running upgrade 08364691d074 -> fe461863935f, increase_length_for_connection_password
INFO [alembic.runtime.migration] Running upgrade fe461863935f -> 7939bcff74ba, Add DagTags table
INFO [alembic.runtime.migration] Running upgrade 7939bcff74ba -> a4c2fd67d16b, add pool_slots field to task_instance
INFO [alembic.runtime.migration] Running upgrade a4c2fd67d16b -> 852ae6c715af, Add RenderedTaskInstanceFields table
INFO [alembic.runtime.migration] Running upgrade 852ae6c715af -> 952da73b5eff, add dag_code table
INFO [alembic.runtime.migration] Running upgrade 952da73b5eff -> a66efa278eea, Add Precision to execution_date in RenderedTaskInstanceFields table
INFO [alembic.runtime.migration] Running upgrade a66efa278eea -> da3f683c3a5a, Add dag_hash Column to serialized_dag table
INFO [alembic.runtime.migration] Running upgrade da3f683c3a5a -> 92c57b58940d, Create FAB Tables
INFO [alembic.runtime.migration] Running upgrade 92c57b58940d -> 03afc6b6f902, Increase length of FAB ab_view_menu.name column
INFO [alembic.runtime.migration] Running upgrade 03afc6b6f902 -> cf5dc11e79ad, drop_user_and_chart
INFO [alembic.runtime.migration] Running upgrade cf5dc11e79ad -> bbf4a7ad0465, Remove id column from xcom
INFO [alembic.runtime.migration] Running upgrade bbf4a7ad0465 -> b25a55525161, Increase length of pool name
INFO [alembic.runtime.migration] Running upgrade b25a55525161 -> 3c20cacc0044, Add DagRun run_type
INFO [alembic.runtime.migration] Running upgrade 3c20cacc0044 -> 8f966b9c467a, Set conn_type as non-nullable
INFO [alembic.runtime.migration] Running upgrade 8f966b9c467a -> 8d48763f6d53, add unique constraint to conn_id
INFO [alembic.runtime.migration] Running upgrade 8d48763f6d53 -> e38be357a868, Add sensor_instance table
INFO [alembic.runtime.migration] Running upgrade e38be357a868 -> b247b1e3d1ed, Add queued by Job ID to TI
INFO [alembic.runtime.migration] Running upgrade b247b1e3d1ed -> e1a11ece99cc, Add external executor ID to TI
INFO [alembic.runtime.migration] Running upgrade e1a11ece99cc -> bef4f3d11e8b, Drop KubeResourceVersion and KubeWorkerId
INFO [alembic.runtime.migration] Running upgrade bef4f3d11e8b -> 98271e7606e2, Add scheduling_decision to DagRun and DAG
INFO [alembic.runtime.migration] Running upgrade 98271e7606e2 -> 52d53670a240, fix_mssql_exec_date_rendered_task_instance_fields_for_MSSQL
INFO [alembic.runtime.migration] Running upgrade 52d53670a240 -> 364159666cbd, Add creating_job_id to DagRun table
INFO [alembic.runtime.migration] Running upgrade 364159666cbd -> 45ba3f1493b9, add-k8s-yaml-to-rendered-templates
INFO [alembic.runtime.migration] Running upgrade 45ba3f1493b9 -> 849da589634d, Prefix DAG permissions.
INFO [alembic.runtime.migration] Running upgrade 849da589634d -> 2c6edca13270, Resource based permissions.
[2021-07-29 02:57:07,558] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
INFO [alembic.runtime.migration] Running upgrade 2c6edca13270 -> 61ec73d9401f, Add description field to connection
INFO [alembic.runtime.migration] Running upgrade 61ec73d9401f -> 64a7d6477aae, fix description field in connection to be text
INFO [alembic.runtime.migration] Running upgrade 64a7d6477aae -> e959f08ac86c, Change field in DagCode to MEDIUMTEXT for MySql
INFO [alembic.runtime.migration] Running upgrade e959f08ac86c -> 82b7c48c147f, Remove can_read permission on config resource for User and Viewer role
[2021-07-29 02:57:12,985] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
INFO [alembic.runtime.migration] Running upgrade 82b7c48c147f -> 449b4072c2da, Increase size of connection.extra field to handle multiple RSA keys
INFO [alembic.runtime.migration] Running upgrade 449b4072c2da -> 8646922c8a04, Change default pool_slots to 1
INFO [alembic.runtime.migration] Running upgrade 8646922c8a04 -> 2e42bb497a22, rename last_scheduler_run column
INFO [alembic.runtime.migration] Running upgrade 2e42bb497a22 -> 90d1635d7b86, Increase pool name size in TaskInstance
INFO [alembic.runtime.migration] Running upgrade 90d1635d7b86 -> e165e7455d70, add description field to variable
INFO [alembic.runtime.migration] Running upgrade e165e7455d70 -> a13f7613ad25, Resource based permissions for default FAB views.
[2021-07-29 02:57:15,225] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
WARNI [airflow.models.crypto] empty cryptography key - values will not be stored encrypted.
INFO [airflow.models.dagbag.DagBag] Filling up the DagBag from /home/ramans/airflow/dags
WARNI [unusual_prefix_4b1ffe5d701ea116e7b4c0dcdd50964f852ae761_example_kubernetes_executor_config] Could not import DAGs in example_kubernetes_executor_config.py: No module named 'kubernetes'
WARNI [unusual_prefix_4b1ffe5d701ea116e7b4c0dcdd50964f852ae761_example_kubernetes_executor_config] Install kubernetes dependencies with: pip install apache-airflow['cncf.kubernetes']
INFO [airflow.models.dag] Sync 33 DAGs
INFO [airflow.models.dag] Creating ORM DAG for example_xcom
INFO [airflow.models.dag] Creating ORM DAG for tutorial_etl_dag
INFO [airflow.models.dag] Creating ORM DAG for example_branch_operator
INFO [airflow.models.dag] Creating ORM DAG for example_short_circuit_operator
INFO [airflow.models.dag] Creating ORM DAG for latest_only_with_trigger
INFO [airflow.models.dag] Creating ORM DAG for example_xcom_args_with_operators
INFO [airflow.models.dag] Creating ORM DAG for example_bash_operator
INFO [airflow.models.dag] Creating ORM DAG for example_nested_branch_dag
INFO [airflow.models.dag] Creating ORM DAG for example_branch_dop_operator_v3
INFO [airflow.models.dag] Creating ORM DAG for example_dag_decorator
INFO [airflow.models.dag] Creating ORM DAG for tutorial
INFO [airflow.models.dag] Creating ORM DAG for example_passing_params_via_test_command
INFO [airflow.models.dag] Creating ORM DAG for example_trigger_target_dag
INFO [airflow.models.dag] Creating ORM DAG for tutorial_taskflow_api_etl
INFO [airflow.models.dag] Creating ORM DAG for example_branch_labels
INFO [airflow.models.dag] Creating ORM DAG for example_complex
INFO [airflow.models.dag] Creating ORM DAG for example_weekday_branch_operator
INFO [airflow.models.dag] Creating ORM DAG for example_subdag_operator.section-1
INFO [airflow.models.dag] Creating ORM DAG for example_python_operator
INFO [airflow.models.dag] Creating ORM DAG for example_kubernetes_executor
INFO [airflow.models.dag] Creating ORM DAG for example_branch_datetime_operator_2
INFO [airflow.models.dag] Creating ORM DAG for example_external_task_marker_parent
INFO [airflow.models.dag] Creating ORM DAG for latest_only
INFO [airflow.models.dag] Creating ORM DAG for test_utils
INFO [airflow.models.dag] Creating ORM DAG for example_subdag_operator
INFO [airflow.models.dag] Creating ORM DAG for example_trigger_controller_dag
INFO [airflow.models.dag] Creating ORM DAG for example_skip_dag
INFO [airflow.models.dag] Creating ORM DAG for example_external_task_marker_child
INFO [airflow.models.dag] Creating ORM DAG for example_task_group_decorator
INFO [airflow.models.dag] Creating ORM DAG for example_xcom_args
INFO [airflow.models.dag] Creating ORM DAG for tutorial_taskflow_api_etl_virtualenv
INFO [airflow.models.dag] Creating ORM DAG for example_task_group
INFO [airflow.models.dag] Creating ORM DAG for example_subdag_operator.section-2
INFO [airflow.models.dag] Setting next_dagrun for example_bash_operator to 2021-07-27 00:00:00+00:00
INFO [airflow.models.dag] Setting next_dagrun for 
example_branch_datetime_operator_2 to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_branch_dop_operator_v3 to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_branch_labels to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_branch_operator to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_complex to None INFO [airflow.models.dag] Setting next_dagrun for example_dag_decorator to None INFO [airflow.models.dag] Setting next_dagrun for example_external_task_marker_child to None INFO [airflow.models.dag] Setting next_dagrun for example_external_task_marker_parent to None INFO [airflow.models.dag] Setting next_dagrun for example_kubernetes_executor to None INFO [airflow.models.dag] Setting next_dagrun for example_nested_branch_dag to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_passing_params_via_test_command to 2021-07-28 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_python_operator to None INFO [airflow.models.dag] Setting next_dagrun for example_short_circuit_operator to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_skip_dag to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_subdag_operator to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_subdag_operator.section-1 to None INFO [airflow.models.dag] Setting next_dagrun for example_subdag_operator.section-2 to None INFO [airflow.models.dag] Setting next_dagrun for example_task_group to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_task_group_decorator to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_trigger_controller_dag to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_trigger_target_dag to 
None INFO [airflow.models.dag] Setting next_dagrun for example_weekday_branch_operator to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_xcom to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for example_xcom_args to None INFO [airflow.models.dag] Setting next_dagrun for example_xcom_args_with_operators to None INFO [airflow.models.dag] Setting next_dagrun for latest_only to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for latest_only_with_trigger to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for test_utils to None INFO [airflow.models.dag] Setting next_dagrun for tutorial to 2021-07-27 00:00:00+00:00 INFO [airflow.models.dag] Setting next_dagrun for tutorial_etl_dag to None INFO [airflow.models.dag] Setting next_dagrun for tutorial_taskflow_api_etl to None INFO [airflow.models.dag] Setting next_dagrun for tutorial_taskflow_api_etl_virtualenv to None INFO [airflow.models.dag] Sync 2 DAGs INFO [airflow.models.dag] Setting next_dagrun for example_subdag_operator.section-1 to None INFO [airflow.models.dag] Setting next_dagrun for example_subdag_operator.section-2 to None Initialization done
5.3 Set the Apache-Airflow Login Credentials
The next step is to create the admin login credentials for the Airflow web interface. The user is created from the command line, and the command prompts for a password:
$ airflow users create \
    --username admin \
    --firstname Admin \
    --lastname Otodiginet \
    --role Admin \
    --email admin@otodiginet.com
Output :
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow users create \
> --username admin \
> --firstname Admin \
> --lastname Otodiginet \
> --role Admin \
> --email admin@otodiginet.com
[2021-07-29 03:44:05,603] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
Password:
Repeat for confirmation:
Admin user admin created
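When the user has to be created from a script rather than an interactive terminal, the same command can be assembled in Python. The following is only a minimal sketch: the helper names build_create_user_cmd and create_user are hypothetical, and it assumes the airflow executable from the active virtual environment is on PATH. If a password is supplied, it is passed with the --password option of airflow users create, so the CLI does not prompt for it.

```python
import subprocess


def build_create_user_cmd(username, firstname, lastname, role, email, password=None):
    """Return the argument list for `airflow users create` (hypothetical helper)."""
    cmd = [
        "airflow", "users", "create",
        "--username", username,
        "--firstname", firstname,
        "--lastname", lastname,
        "--role", role,
        "--email", email,
    ]
    if password is not None:
        # With --password given, the CLI does not ask for it interactively.
        cmd += ["--password", password]
    return cmd


def create_user(**kwargs):
    """Run the command; assumes `airflow` is on PATH (virtualenv activated)."""
    subprocess.run(build_create_user_cmd(**kwargs), check=True)
```

Calling create_user(username="admin", firstname="Admin", lastname="Otodiginet", role="Admin", email="admin@otodiginet.com") without a password behaves like the command above and prompts on the terminal. Passing a password on the command line makes it visible in the process list, so reading it from a protected source is preferable.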
5.4 Start the Apache-Airflow Web Interface
The next step is to start the Airflow web server on port 8080 with the following command:
$ airflow webserver -p 8080
Output :
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow webserver -p 8080
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-07-29 03:45:07,949] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat:
=================================================================
[2021-07-29 03:45:12 -0700] [33487] [INFO] Starting gunicorn 20.1.0
[2021-07-29 03:45:12 -0700] [33487] [INFO] Listening at: http://0.0.0.0:8080 (33487)
[2021-07-29 03:45:12 -0700] [33487] [INFO] Using worker: sync
[2021-07-29 03:45:12 -0700] [33489] [INFO] Booting worker with pid: 33489
[2021-07-29 03:45:12 -0700] [33490] [INFO] Booting worker with pid: 33490
[2021-07-29 03:45:12 -0700] [33491] [INFO] Booting worker with pid: 33491
[2021-07-29 03:45:12 -0700] [33492] [INFO] Booting worker with pid: 33492
[2021-07-29 03:45:13 -0700] [33487] [INFO] Handling signal: winch
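Once Gunicorn reports that it is listening on port 8080, the web server can also be checked from a script through its /health endpoint. The sketch below is an illustration, not output captured for this article: fetch_health and summarize_health are hypothetical helper names, the base URL assumes the check runs on the same machine as the web server, and the sample payload only mirrors the general shape of the health response.

```python
import json
import urllib.request


def summarize_health(payload):
    """Map each component in a /health response to its status string."""
    return {name: info.get("status", "unknown") for name, info in payload.items()}


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Fetch and decode /health; the base URL is an assumption for a local check."""
    with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Illustrative payload only (not captured from the server in this article).
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}
print(summarize_health(sample))
```

On a live installation, summarize_health(fetch_health()) gives the same kind of summary. A scheduler status other than healthy is expected for as long as the airflow scheduler process has not been started.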
6. Accessing Airflow Web Interface
With the web server listening on port 8080, open http://<server-ip>:8080 in a browser and sign in with the admin user created in step 5.3.
Conclusion
In this short article, we discussed how to install Apache Airflow on the Ubuntu 20.04 LTS operating system.