Big Data

Introduction to Impala .. Architecture and Components | Impala

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Cloudera Impala query UI in Hue) as Apache Hive. This provides a familiar …

Introduction to Impala .. Architecture and Components | Impala Read More »

Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform

In this article, we will analyze COVID-19 Dataset using Databricks unified analytics platform using the community edition of the platform, which is totally for free and you can use it as your playground to test Apache Spark applications in Python or R depends on your favorite API of development. Dataset will be used in this …

Analyze COVID-19 Dataset with Databricks | Databricks Unified Analytics Platform Read More »

Apache Kafka and Apache Spark Integration | Apache Kafka | Apache Spark

Introduction Apache Kafka is a scalable, high performance, low latency platform that allows reading and writing streams of data like a messaging system. We can start writing Kafka applications using Java fairly easily, check our previous article on how to design a Kafka pipeline in Java. If you research the variety of real-world use-cases for Kafka, you will very …

Apache Kafka and Apache Spark Integration | Apache Kafka | Apache Spark Read More »

Create a Kafka Pipeline using Java Application | Apache Kafka

Introduction This Article is about Programming Apache Kafka producer and consumer using Java language, as we’ll see, using Java we’ll be able to reproduce what the CLI does and even more. Prerequisites Kafka Installation and configuration article ( To setup cluster will be used in this article) Any java programming editor Ex. (Netbeans – IntelliJ …

Create a Kafka Pipeline using Java Application | Apache Kafka Read More »

Setup Apache Flink Environment Standalone on Windows | Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams, for introduction about Apache Flink components please check our previous article In this article we will learn together how to setup and run Apache Flink in Standalone mode. Run Apache Flink Standalone Flink has been designed to …

Setup Apache Flink Environment Standalone on Windows | Apache Flink Read More »

Introduction to Apache Flink | Apache Flink

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale. Apache Flink is powerful open source engine which provides: Batch Processing Interactive Processing Real-time (Streaming) Processing Graph …

Introduction to Apache Flink | Apache Flink Read More »

Setup Apache Kafka Environment | Apache Kafka

Introduction This article is about configuring and starting an Apache Kafka server on a Windows OS and Linux. This guide will also provide instructions to set up Java and Apache Zookeeper, and after the setup we will create a simple pipeline to test our installation. Kafka on windows Make sure you have the following prerequisites …

Setup Apache Kafka Environment | Apache Kafka Read More »