Data Engineering

Azure Data Factory – Modern ETL On Cloud – Data Migration Use Case | Azure Data Factory

Introduction: ETL is one of the major tasks for any data engineer, and many on-premises and cloud solutions are available on the market to implement it. In Microsoft Azure, Azure Data Factory is the ETL solution for building data pipelines that use data from cloud sources or on-premises sources, …

Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform

In this article, we will analyze the COVID-19 dataset using the community edition of the Databricks unified analytics platform, which is completely free and can serve as your playground for testing Apache Spark applications in Python or R, depending on your preferred development API. The dataset that will be used in this …

Detailed Guide for String Wrangling in SQL | MySQL | SQL Analysis

Extracting information from string columns is a recurring necessity in the day-to-day tasks of data engineers, data scientists, and business analysts. The task can be done with a programming language such as Python or with SQL, depending on your application and the task required. In this tutorial, we will discover together how …

Setup Talend Open Studio on Linux

Introduction: Talend is an open-source data integration platform. It provides different solutions and services for data integration, data quality, cloud storage, and Big Data. According to the latest Gartner report, Talend was named in the Leaders quadrant among data integration solutions. In this article, we will show you step by step how to install and configure Talend …

How to choose your ETL solution | Data Integration

ETL, which stands for Extract, Transform, Load, is a common concept in data engineering. As the name implies, it consists of three types of operations: Extract, which is the process of extracting data from the source information system; Transform, which is the process of manipulating the data …

Create a Kafka Pipeline using Java Application | Apache Kafka

Introduction: This article is about programming an Apache Kafka producer and consumer in Java. As we’ll see, with Java we’ll be able to reproduce what the CLI does and even more. Prerequisites: the Kafka installation and configuration article (to set up the cluster that will be used in this article); any Java programming editor, e.g. (NetBeans – IntelliJ …

Setup Apache Flink Environment Standalone on Windows | Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. For an introduction to Apache Flink components, please check our previous article. In this article, we will learn together how to set up and run Apache Flink in standalone mode. Run Apache Flink Standalone: Flink has been designed to …
