Apache Kylin

Overview:

Kylin is designed to reduce query latency on Hadoop for extremely large datasets (10+ billion rows).

Multi-Dimensional OLAP Engine – Users can define a data model and pre-build it in Kylin over the whole dataset with the required dimensions and measures.

Kylin builds a pre-aggregated cube for every combination of the provided dimensions (1D, 2D, 3D, etc.). Dimensions can be configured during the cube build phase according to the queries that need to be run to get the measures.
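To make "every combination of dimensions" concrete, here is a minimal sketch that enumerates the combinations for a small model. The dimension names are hypothetical, and the function is an illustration only, not Kylin's own cuboid planner.

```python
from itertools import combinations

def enumerate_cuboids(dimensions):
    """Return every non-empty combination of dimensions, smallest first."""
    cuboids = []
    for size in range(1, len(dimensions) + 1):
        cuboids.extend(combinations(dimensions, size))
    return cuboids

# Hypothetical dimension names, chosen only for illustration.
dims = ["country", "category", "day"]
cuboids = enumerate_cuboids(dims)

for cuboid in cuboids:
    print(f"{len(cuboid)}D: {cuboid}")
```

With N dimensions there are 2^N - 1 non-empty combinations, so the number of pre-aggregations grows quickly. This is why it pays to configure only the dimensions that the queries actually need.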

Problem Statement:

Existing SQL-on-Hadoop engines need to scan part or all of the data set to answer a user query. Moreover, table joins may trigger data transfer across hosts. Because of the large scan range and the network traffic, many queries are very slow (latency of a minute or more).

Kylin – ROLAP/MOLAP:

ROLAP (Relational OLAP) performs aggregation at query time. ROLAP is flexible but much slower. Existing SQL-on-Hadoop engines are essentially ROLAP.

MOLAP (Multi-dimensional OLAP) pre-computes data for the required dimensions and stores the resulting values in a cube. MOLAP is much faster. Kylin is closer to MOLAP.
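The difference between the two approaches can be sketched with a few in-memory rows. The table contents and function names below are made up for illustration: the ROLAP-style function scans all rows on every query, while the MOLAP-style function aggregates once and then answers queries with a lookup.

```python
from collections import defaultdict

# Hypothetical fact rows, used only for this sketch.
rows = [
    {"country": "US", "category": "books", "amount": 10},
    {"country": "US", "category": "toys", "amount": 5},
    {"country": "DE", "category": "books", "amount": 7},
    {"country": "US", "category": "books", "amount": 3},
]

def rolap_sum(rows, country):
    """Runtime aggregation: every query scans all rows."""
    return sum(r["amount"] for r in rows if r["country"] == country)

def build_cuboid(rows, dims):
    """Pre-aggregation: compute SUM(amount) once per combination of dimension values."""
    cuboid = defaultdict(int)
    for r in rows:
        cuboid[tuple(r[d] for d in dims)] += r["amount"]
    return dict(cuboid)

# Built once, ahead of query time.
by_country = build_cuboid(rows, ["country"])

print(rolap_sum(rows, "US"))   # scans the four rows
print(by_country[("US",)])     # single dictionary lookup
```

Both calls return the same total. The pre-aggregated version trades build time and storage for a query cost that no longer depends on the number of raw rows.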

Kylin builds a data cube (MOLAP) from a Hive table (ROLAP) according to the metadata definition. If a query can be fulfilled by the data cube, Kylin routes it to the cube (MOLAP). If it can't, Kylin routes the query to the Hive table (ROLAP).
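The routing decision described above can be sketched as a coverage check: the cube can answer a query only if it contains every dimension and measure the query asks for. This is a simplified assumption about how such a check might look, not Kylin's actual query planner, and all names are hypothetical.

```python
def route_query(query_dims, query_measures, cube_dims, cube_measures):
    """Return "cube" when the cube covers the query, otherwise "hive"."""
    covered = (
        set(query_dims) <= set(cube_dims)
        and set(query_measures) <= set(cube_measures)
    )
    return "cube" if covered else "hive"

# Hypothetical cube definition.
cube_dims = ["country", "category", "day"]
cube_measures = ["sum_amount", "count_orders"]

# Covered by the cube: answered from pre-aggregated data.
print(route_query(["country"], ["sum_amount"], cube_dims, cube_measures))

# "seller_id" is not a cube dimension: falls back to the source table.
print(route_query(["seller_id"], ["sum_amount"], cube_dims, cube_measures))
```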

Kylin – Highlights:

  • REST API – Build Cube, Job Monitoring
  • SQL interface in Kylin
  • Incremental refresh of cubes
  • Integration with BI tools over a JDBC connection
  • Measures – Sum, Count, Max, Min, Avg, Distinct count
  • Storage – HBase with pre-join and pre-aggregation
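The listed measures fit well with incremental refresh because partial results from different builds can be combined without rescanning the raw data. The sketch below assumes a simple dictionary layout for one pre-aggregated cell and uses a plain Python set for distinct values; it illustrates the idea and is not Kylin's storage format.

```python
def merge_aggregates(a, b):
    """Combine two partial aggregates that share the same dimension key."""
    return {
        "sum": a["sum"] + b["sum"],
        "count": a["count"] + b["count"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        # Distinct values cannot simply be added, so keep the set and union it.
        "distinct": a["distinct"] | b["distinct"],
    }

def finalize(agg):
    """Derive the query-facing measures; Avg comes from Sum and Count."""
    return {
        "sum": agg["sum"],
        "count": agg["count"],
        "min": agg["min"],
        "max": agg["max"],
        "avg": agg["sum"] / agg["count"],
        "distinct_count": len(agg["distinct"]),
    }

# Hypothetical partial results from two incremental builds.
day1 = {"sum": 18, "count": 3, "min": 3, "max": 10, "distinct": {"u1", "u2"}}
day2 = {"sum": 12, "count": 2, "min": 4, "max": 8, "distinct": {"u2", "u3"}}

result = finalize(merge_aggregates(day1, day2))
print(result)
```

Note that Avg is not stored directly: averaging two averages would give the wrong answer, so the sketch keeps Sum and Count and divides at the end.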

Architecture:

kylin1.jpg

Job Flow:

kylin2.jpg

  • Create Intermediate Flat Hive Table
  • Extract Fact Table Distinct Dimension Columns
  • Create HTable
  • Build Cube with Spark
  • Load the HFile into HBase Table
  • Hive Clean-up
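The first two steps of the job flow can be sketched with in-memory stand-ins: the flat table is the fact table pre-joined with its lookup tables, and the distinct-column step collects the unique values of each dimension. Table contents and function names are hypothetical; the real job runs these steps on Hive rather than on Python lists.

```python
# Hypothetical fact and lookup tables.
fact = [
    {"order_id": 1, "country_id": 10, "amount": 10},
    {"order_id": 2, "country_id": 20, "amount": 7},
    {"order_id": 3, "country_id": 10, "amount": 3},
]
countries = {10: "US", 20: "DE"}

def build_flat_table(fact, lookup):
    """Stand-in for the flat table step: pre-join fact rows with the lookup table."""
    return [
        {"country": lookup[r["country_id"]], "amount": r["amount"]}
        for r in fact
    ]

def distinct_values(flat, dims):
    """Stand-in for the distinct-columns step: unique values per dimension."""
    return {d: sorted({r[d] for r in flat}) for d in dims}

flat = build_flat_table(fact, countries)
distinct = distinct_values(flat, ["country"])

print(flat)
print(distinct)
```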

Metrics – Kylin vs Hive:

The metrics below were provided by the Kylin team.

kylin3.jpg

Kylin Resources:

Web Site – http://kylin.apache.org/

Google Groups – https://groups.google.com/forum/#!forum/kylin-olap

Source Code – https://github.com/KylinOLAP/Kylin
