big data


Building a data warehouse solution using BigQuery | GCP BigQuery

An enterprise data warehouse brings data together and makes it available for querying and data processing; it should consolidate data from many sources. All data in a data warehouse should be available for querying, and it’s important to ensure that those queries are quick. Another reason to consolidate all of your data besides standardizing …


Introduction to Impala .. Architecture and Components | Impala

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar …


Aggregation Queries in Apache Hive | Apache Hive

Introduction: Data aggregation is the process of gathering data and expressing it in summary form to learn more about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports advanced aggregation using functions such as variance and standard deviation, as well as different types of window functions. …


Create a Kafka Pipeline using Java Application | Apache Kafka

Introduction: This article is about programming an Apache Kafka producer and consumer in Java. As we’ll see, with Java we can reproduce what the CLI does and even more. Prerequisites: the Kafka installation and configuration article (the cluster set up there will be used in this article); any Java programming editor, e.g. (NetBeans – IntelliJ …


Setup Apache Kafka Environment | Apache Kafka

Introduction: This article is about configuring and starting an Apache Kafka server on Windows and Linux. This guide also provides instructions to set up Java and Apache ZooKeeper; after the setup, we will create a simple pipeline to test our installation. Kafka on Windows: Make sure you have the following prerequisites …


Setup Apache Spark environment on Windows | Apache Spark

Apache Spark is an easy-to-use, unified platform for big data processing, equipped with a rich set of APIs for different application needs, such as Spark DataFrame and Spark SQL for structured data processing, Spark Streaming and Structured Streaming for streaming applications, Spark MLlib for machine learning applications, Spark GraphX for graph analytics …


Apache Spark Application Execution Mode | Apache Spark

Apache Spark is a powerful processing platform for big data applications that supports different types of big data processing. In this article we will explore how an Apache Spark application can be executed in multiple modes, depending on the environment architecture and the application requirements. Before going into details, if you would like to set up …
