Month: August 2020

Dimensional Modeling … Design Methodology for Analytics Oriented Data Warehouse | Data Warehouse

Data warehouses have been around since the 1980s. Over these years, they have proven their ability to support decision-making and business analysis. Data warehouses allow you to integrate many source systems, such as databases, spreadsheets, and flat files. After integration, the data can be cleansed and transformed, and then organized in a way that …

Your Guide to NoSQL Databases | Data Engineering

One of the major reasons the era of big data began was the growth in the number of data sources and in the variety of data types that organizations now have. Almost every organization holds different types of data. It has structured data, and it can also have unstructured or semi-structured data, and each type …

Getting Started with Containers & Docker | Docker

Containerization has revolutionized software development and has become a common building block in today's architectures. Applications, big data environments, and data engineering applications can all be developed and deployed inside containers. In this article, we will learn more about containers and their advantages, and we will discuss Docker, a tool for building and running containers from images that package all …

Ultimate Guide to choose the best Chart for your Dashboard | Business Intelligence

Everyone views data differently: one person may extract certain insights from the data, while another extracts different insights from the same data. Different audiences also have different informational needs, so when you're building your dashboard, ask the decision makers: "What are we trying to extract and learn from this analysis to …

Aggregation Queries in Apache Hive | Apache Hive

Data aggregation is the process of gathering data and expressing it in summary form to learn more about particular groups based on specific conditions. HiveQL offers several built-in aggregate functions, such as max, min, and avg. It also supports advanced aggregation, such as variance and standard deviation, as well as different types of window functions. …
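
The aggregations named above can be illustrated outside Hive. The following is a minimal Python sketch, not HiveQL. It assumes a hypothetical table of (region, month, amount) rows. `group_aggregates` mimics a GROUP BY with max, min, avg, population variance and population standard deviation. `running_total` mimics a window function of the form SUM(amount) OVER (PARTITION BY region ORDER BY month). Both helper names and the data are made up for illustration.

```python
from collections import defaultdict
from itertools import accumulate
from statistics import mean, pstdev, pvariance

# Hypothetical sales rows: (region, month, amount)
rows = [
    ("east", 1, 100.0),
    ("east", 2, 150.0),
    ("east", 3, 50.0),
    ("west", 1, 200.0),
    ("west", 2, 100.0),
]


def group_aggregates(rows):
    """Rough equivalent of a GROUP BY region with max, min, avg,
    population variance, and population standard deviation."""
    groups = defaultdict(list)
    for region, _, amount in rows:
        groups[region].append(amount)
    return {
        region: {
            "max": max(values),
            "min": min(values),
            "avg": mean(values),
            "var_pop": pvariance(values),   # population variance
            "stddev_pop": pstdev(values),   # population standard deviation
        }
        for region, values in groups.items()
    }


def running_total(rows):
    """Rough equivalent of SUM(amount) OVER (PARTITION BY region ORDER BY month),
    assuming each month appears at most once per region (no ties)."""
    partitions = defaultdict(list)
    for region, month, amount in rows:
        partitions[region].append((month, amount))
    result = []
    for region, items in partitions.items():
        items.sort()  # ORDER BY month within the partition
        months = [month for month, _ in items]
        totals = accumulate(amount for _, amount in items)
        result.extend((region, m, t) for m, t in zip(months, totals))
    return result


for region, stats in group_aggregates(rows).items():
    print(region, stats)
print(running_total(rows))
```

For comparison, the grouped part corresponds roughly to a HiveQL query such as `SELECT region, max(amount), min(amount), avg(amount), var_pop(amount), stddev_pop(amount) FROM sales GROUP BY region`, assuming a hypothetical table named `sales`.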

Azure Data Factory – Modern ETL On Cloud – Data Migration Use Case | Azure Data Factory

ETL is one of the major tasks for any data engineer, and many on-premises and cloud solutions are available on the market to implement it. In Microsoft Azure, Azure Data Factory is the ETL service for building data pipelines that use data from cloud or on-premises sources, …

Scikit-learn Advanced Features | Data Science

Neither the Titanic dataset nor scikit-learn is new to any data scientist. However, scikit-learn has some important features that make model pre-processing and tuning easier. Specifically, this notebook covers the following concepts: ColumnTransformer, Pipeline, SimpleImputer, StandardScaler, OneHotEncoder, OrdinalEncoder, and GridSearchCV. The dataset used in this article can be found …
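
The sketch below shows how these pieces can fit together. It assumes scikit-learn 1.x, pandas, and NumPy. The Titanic data is not included here, so the DataFrame is synthetic: its Titanic-like columns (age, fare, sex, embarked, pclass) and its target are made up. LogisticRegression and the grid over `C` are illustrative choices, not necessarily what the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Synthetic, hypothetical Titanic-like data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(30, 12, n),
    "fare": rng.exponential(30, n),
    "sex": rng.choice(["male", "female"], n),
    "embarked": rng.choice(["S", "C", "Q"], n),
    "pclass": rng.choice(["1st", "2nd", "3rd"], n),
})
# Inject missing values so the imputers have something to do.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "embarked"] = np.nan
# Made-up target for illustration only.
y = ((df["sex"] == "female") | (df["pclass"] == "1st")).astype(int)

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Nominal columns: fill gaps with the most frequent value, then one-hot encode.
nominal = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
# Ordered category: encode with an explicit order (3rd < 2nd < 1st).
ordinal = OrdinalEncoder(categories=[["3rd", "2nd", "1st"]])

# Route each group of columns to its own transformer.
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "fare"]),
    ("nom", nominal, ["sex", "embarked"]),
    ("ord", ordinal, ["pclass"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the classifier's regularization strength; "clf__C" addresses step "clf".
search = GridSearchCV(model, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(df, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because the preprocessing lives inside the Pipeline, GridSearchCV refits the imputers, scaler and encoders on each training fold. This keeps statistics from the validation fold out of pre-processing.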