data engineering

Building a data pipeline using Dataflow | GCP Dataflow

Data uncovers deep insights, supports informed decisions, and makes processes more efficient. But when data comes from various sources, in varying formats, and is stored across different infrastructures, data pipelines are the first step toward centralizing it for reliable business intelligence, operational insights, and analytics. By contrast, the data pipeline is a …

Introduction to Impala .. Architecture and Components | Impala

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar …

Dimensional Modeling … Design Methodology for Analytics Oriented Data Warehouse | Data Warehouse

Data warehouses have been around since the 1980s. Over the years, they have proven their ability to support decision making and business analysis. Data warehouses allow integrating many source systems such as databases, spreadsheets, and flat files. Cleansing and transformation can be applied to the data after integration, and the data is then organized in a way that …

Getting Started with Containers & Dockers | Dockers

Introduction: Containerization has revolutionized software development and become a common building block in today’s architectures. Applications, big data environments, and data engineering applications can be developed and deployed inside containers. In this article, we will learn more about containers and their advantages, and we will discuss Docker, which is a container image that packages all …

Aggregation Queries in Apache Hive | Apache Hive

Introduction: Data aggregation is the process of gathering data and expressing it in summary form to get more information about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports advanced aggregation using functions such as variance and standard deviation, as well as different types of window functions. …

Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform

In this article, we will analyze a COVID-19 dataset on the Databricks unified analytics platform, using the platform's Community Edition, which is completely free and which you can use as a playground to test Apache Spark applications in Python or R, depending on your preferred API. The dataset used in this …

Detailed Guide for String Wrangling in SQL | MySQL | SQL Analysis

Extracting information from string columns is a recurring necessity in the day-to-day work of data engineers, data scientists, and business analysts. It can be done with a programming language such as Python or with SQL, depending on your application and the task at hand. In this tutorial, we will discover together how …
