Data Engineering

Data Lake Concept and Solutions on GCP using Cloud Storage | GCP Cloud Storage.

Introduction to Data Lakes: Let’s start with a discussion about what data lakes are, and then where they fit in as a critical component of your overall data engineering ecosystem. So what is a data lake? Well, it’s a fairly broad term, but it generally describes a place where you can securely store various types …

NoSQL Databases in GCP

NoSQL Database Services | Cloud Datastore, Cloud Firestore, and Cloud Bigtable

Introduction: The relational database (RDBMS) model completely dominated database technology for over 20 years. Today this “one size fits all” stability has been disrupted by a relatively recent explosion of new database technologies. These paradigm-busting technologies are powering the “Big Data” and “NoSQL” revolutions, as well as forcing fundamental changes in databases across the board. …

BigQuery

Building a data warehouse solution using BigQuery | GCP BigQuery

An enterprise data warehouse brings data together and makes it available for querying and data processing, so it should consolidate data from many sources. All data in a data warehouse should be available for querying, and it’s important to ensure that those queries are quick. Another reason to consolidate all of your data besides standardizing …

Building a data pipeline using Dataflow | GCP Dataflow

Data uncovers deep insights, supports informed decisions, and makes processes more efficient. But when data comes from various sources, in varying formats, and is stored across different infrastructures, data pipelines are the first step toward centralizing it for reliable business intelligence, operational insights, and analytics. By contrast, the data pipeline is a …

Introduction to Impala .. Architecture and Components | Impala

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar …

Dimensional Modeling … Design Methodology for Analytics Oriented Data Warehouse | Data Warehouse

Data warehouses have been around since the 1980s. Over the years, they have proven their ability to support decision making and business analysis. Data warehouses allow integrating many source systems, such as databases, spreadsheets, and flat files. Cleansing and transformation can then be applied to the integrated data, organizing it in a way that …

Your Guide to NoSQL Databases | Data Engineering

One of the major reasons the era of big data began was the increase in the number of data sources and the variety of data types that each organization now has. Almost every organization has different types of data: not only structured data but also unstructured or semi-structured data, and each type …

Getting Started with Containers & Docker | Docker

Introduction: Containerization has revolutionized software development and become a common building block in today’s architectures. Applications, big data environments, and data engineering applications can be developed and deployed inside containers. In this article, we will learn more about containers and their advantages, and we will discuss Docker, whose container images package all …

Aggregation Queries in Apache Hive | Apache Hive

Introduction: Data aggregation is the process of gathering data and expressing it in summary form to learn more about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports advanced aggregation using functions such as variance and standard deviation, as well as different types of window functions. …
