
Airflow for Beginners — II

In the previous article, Part I of Airflow for Beginners, we learned what Airflow is and how to install it locally. In this article, we will learn about the DAG and how to use the Airflow web interface to schedule and manage work.

Let's start with the DAG and then move on to the web interface.

DAG (Directed Acyclic Graph)


A DAG is a directed graph with no cycles in it. It has a finite number of nodes and edges, and because there are no cycles, the nodes can always be arranged in a sequence in which every node comes after the nodes it depends on. Think of each node as a piece of work that you want to execute: the DAG then represents the order in which the work must run, and that is why the DAG is an essential part of Airflow.
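To make the idea of ordering concrete, here is a minimal sketch using Python's standard graphlib module (Python 3.9 or newer). The job names are hypothetical and this is not how Airflow itself is implemented; it only shows that a graph without cycles gives a valid run order, while a graph with a cycle does not.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical jobs: each key is a job, each value is the set of jobs
# that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields the jobs so that each one comes after its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)

# A graph with a cycle has no valid order, so it is not a DAG.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```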

WEB AIRFLOW

When you open

localhost:8080

in your browser, you will see the Airflow web interface.

The DAGs listed there are the example DAGs that are already included. To add your own script (DAG), follow the simple steps below:

  1. In a terminal, go to the airflow directory (by default it is created in your home directory as ~/airflow) and make a new folder named “dags”. The config file points to this folder as the place to load DAGs from.
  2. All that is left is to create the DAG. We will walk through an example that executes two files one after the other, because in practice you often have to run some files at a scheduled interval.

2.a. Create the two files to execute. In my case, I created two Python scripts.

2.b. Now write a Python script that creates the DAG.
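Before writing the DAG script, the dags folder from step 1 has to exist. Here is a minimal sketch that creates it from Python instead of the terminal. It assumes the default layout, where Airflow's home is taken from the AIRFLOW_HOME environment variable (falling back to ~/airflow) and DAGs are loaded from its dags subfolder; the helper name ensure_dags_folder is hypothetical. If the dags_folder setting in your config file points somewhere else, use that path instead.

```python
import os
from pathlib import Path


def ensure_dags_folder(airflow_home=None):
    """Create <airflow_home>/dags if it is missing and return its path."""
    # Assumption: default layout, AIRFLOW_HOME or ~/airflow with a dags subfolder.
    home = Path(airflow_home or os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    dags = home / "dags"
    dags.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return dags
```

Calling ensure_dags_folder() with no argument uses the default location; the DAG script is then saved inside the returned folder.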

Let's go through the DAG script in detail.

1. First, import the libraries we need.

import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

2. Now define the default arguments that we need to pass and initialize the DAG.

def_arg = {
    'owner': 'RD_TYGA',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
    dag_id='rahul_dhawan_dag',
    default_args=def_arg,
    schedule_interval='0 0 * * *',  # cron expression: run once a day at midnight
)
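The schedule_interval value is a cron expression with five fields: minute, hour, day of month, month, and day of week. The sketch below only splits an expression into those named fields so it is easier to read; describe_cron is a hypothetical helper, not part of Airflow. It shows that '0 0 * * *' means minute 0 of hour 0 (once a day at midnight), while running every minute would be '* * * * *'.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into a dict of named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


daily = describe_cron("0 0 * * *")         # minute 0, hour 0, every day
every_minute = describe_cron("* * * * *")  # every minute of every hour
print(daily)
print(every_minute)
```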

3. Now create the tasks you want to execute.

t1 = BashOperator(
    task_id='python_task_1',
    bash_command='python /Users/rahul/Desktop/goal.py ',
    dag=dag,
)

t2 = BashOperator(
    task_id='python_task_2',
    bash_command='python /Users/rahul/Desktop/goal2.py ',
    dag=dag,
)
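A BashOperator task runs its bash_command in a subprocess and fails the task when the command exits with a non-zero code. The sketch below imitates that idea with the standard subprocess module; it is a simplified stand-in, not Airflow's code, and run_command is a hypothetical helper. A one-line Python program stands in for the goal.py script.

```python
import subprocess
import sys


def run_command(command):
    """Run a shell command; raise CalledProcessError on a non-zero exit code."""
    result = subprocess.run(
        command, shell=True, check=True, capture_output=True, text=True
    )
    return result.stdout


# Stand-in for 'python /Users/rahul/Desktop/goal.py'.
output = run_command(f'"{sys.executable}" -c "print(42)"')
print(output.strip())

# A command that exits with code 1 is treated as a failure.
try:
    run_command(f'"{sys.executable}" -c "import sys; sys.exit(1)"')
    failed = False
except subprocess.CalledProcessError:
    failed = True
print(failed)
```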

4. Now set the order of the run. Here t1 runs first and t2 runs after it.

t1 >> t2
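In Airflow, t1 >> t2 has the same effect as t1.set_downstream(t2). To show how an operator like >> can record a dependency, here is a toy sketch with a hypothetical Task class. It is only an illustration of the idea, not Airflow's implementation.

```python
class Task:
    """Toy task that records which tasks must run after it."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # 'self >> other' means: run 'other' after 'self'.
        self.downstream.append(other)
        return other  # returning 'other' lets chains like a >> b >> c work


t1 = Task("python_task_1")
t2 = Task("python_task_2")
t1 >> t2

print([task.task_id for task in t1.downstream])
```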

Now save the file in the dags folder and run this command:

python your_file_name.py

After a minute or two, you will see the DAG in the webserver.

Your DAG is now scheduled, so open the web interface and watch how it runs.

Thank you for your support.

Analytics Vidhya

