
Airflow for Beginners — II

In the previous article, Airflow for Beginners — I, we learned what Airflow is and how to install it locally. In this article, we will learn about DAGs and how to use the Airflow web UI to schedule and manage work.

Let’s start with the DAG and then move on to the web UI.

DAG (Directed Acyclic Graph)


A DAG is a directed graph with no cycles. It has a finite number of nodes and edges, and the edge directions define an ordering: for any two connected nodes, we know which comes first. If we think of each node as a piece of work you want to execute, then a DAG represents the sequence in which that work should run, and that is why the DAG is an essential part of Airflow.
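To make the ordering idea concrete, here is a small framework-free Python sketch (the task names are made up for illustration) that stores a DAG as a mapping from each task to the tasks it depends on, then recovers a valid execution order with a topological sort from the standard library:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the tasks in its set: those must run first.
deps = {
    "clean_data": {"download"},
    "train": {"clean_data"},
    "report": {"train"},
}

# static_order() yields the nodes in a dependency-respecting sequence.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['download', 'clean_data', 'train', 'report']
```

Airflow does essentially this on a larger scale: it reads your task dependencies and schedules each task only after everything upstream of it has finished.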

THE AIRFLOW WEB UI

If you open

localhost:8080 

in your browser, you will see the Airflow web UI.


The page lists the example DAGs that come pre-loaded with Airflow. To add your own script (DAG), just follow the simple steps below:

  1. Go to the airflow directory in a terminal (it will be in your home directory) and create a new folder named “dags”, because the config file points to that path as the place Airflow loads DAGs from.
  2. All that is left is to create the DAG itself. We will go through this with an example that executes two files one after another, because in practice you usually have to run some files at a scheduled interval.

2.a. Create the two files to execute; in my case I created two Python scripts.
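The scripts themselves can do anything; as a hypothetical placeholder for goal.py (matching the path used in the tasks below), something as simple as this is enough to see the scheduling work:

```python
# goal.py - a hypothetical placeholder task script
from datetime import datetime

if __name__ == "__main__":
    # Print a timestamp so each scheduled run is visible in the task log.
    print(f"goal.py ran at {datetime.now().isoformat()}")
```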

2.b. Now write a Python script that defines the DAG.

So let's go through it in detail.

  1. First import the libraries we need
import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

2. Now define the default arguments we need to pass and initialize the DAG.

def_arg = {
    'owner': 'RD_TYGA',
    'start_date': airflow.utils.dates.days_ago(2),
}
dag = DAG(
    dag_id='rahul_dhawan_dag',
    default_args=def_arg,
    schedule_interval='0 0 * * *',  # cron expression: run once a day at midnight
)

3. Now create the tasks you want to execute.

t1 = BashOperator(
    task_id='python_task_1',
    bash_command='python /Users/rahul/Desktop/goal.py ',
    dag=dag,
)
t2 = BashOperator(
    task_id='python_task_2',
    bash_command='python /Users/rahul/Desktop/goal2.py ',
    dag=dag,
)

4. Now set the order in which the tasks run.

t1 >> t2
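The >> syntax is plain Python operator overloading: each task implements __rshift__ so that t1 >> t2 records t2 as downstream of t1 (in Airflow it is equivalent to t1.set_downstream(t2)). A stripped-down toy sketch of the idea, not Airflow's actual implementation:

```python
class Task:
    """A toy stand-in for an Airflow operator, for illustration only."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must run after this one

    def __rshift__(self, other):
        # t1 >> t2 records that t2 depends on t1.
        self.downstream.append(other)
        return other  # returning `other` allows chaining: t1 >> t2 >> t3


t1 = Task("python_task_1")
t2 = Task("python_task_2")
t1 >> t2
print([t.task_id for t in t1.downstream])  # ['python_task_2']
```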

Now save the file in the dags folder and run this command to check for errors:

python your_file_name.py

After a minute or two, you will see the DAG appear in the webserver.

That's it: your DAG is scheduled. Now just watch how it runs.

Thank you for your support.

Analytics Vidhya
