HCIA - Big Data Learning Path

Online courses

This course consists of two parts. The first part describes what big data is and the opportunities and challenges of the big data era. The second part describes the Huawei Kunpeng Big Data solution, including Kunpeng servers based on the Kunpeng chipset and HUAWEI CLOUD Kunpeng cloud services.

Big Data Development Trend and Kunpeng Big Data Solution

This course describes HDFS, the distributed storage system for big data, and ZooKeeper, a distributed service framework that resolves frequently encountered data management problems in distributed services.

HDFS and ZooKeeper

The Apache Hive data warehouse software helps read, write, and manage large data sets that reside in distributed storage by using SQL. Structure can be projected onto data that is already stored. A command-line tool and a JDBC driver are provided to connect users to Hive.

Hive - Distributed Data Warehouse
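
To illustrate what projecting structure onto stored data can mean, here is a minimal Python sketch of schema-on-read: the raw delimited text stays untouched, and column names and types are applied only when rows are read. The sample data, the `SCHEMA` list, and the `read_rows` helper are hypothetical names chosen for illustration; this is a toy model of the idea, not Hive's API.

```python
import io

# Hypothetical raw data as it might sit in distributed storage: plain delimited text.
RAW = "1,alice,34.5\n2,bob,12.0\n3,carol,99.9\n"

# Hypothetical schema, declared separately from the data: (column name, type converter).
SCHEMA = [("id", int), ("name", str), ("amount", float)]

def read_rows(stream, schema, sep=","):
    """Apply the schema to each raw line at read time (schema-on-read)."""
    for line in stream:
        fields = line.rstrip("\n").split(sep)
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}

rows = list(read_rows(io.StringIO(RAW), SCHEMA))

# Rough equivalent of: SELECT name FROM t WHERE amount > 20
big_spenders = [r["name"] for r in rows if r["amount"] > 20]
print(big_spenders)
```

Because the schema lives outside the data, the same stored text could be read with a different schema without rewriting it.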

This course describes HBase, the non-relational distributed database from the Hadoop open-source community, which can meet the requirements of large-scale, real-time data processing applications.

HBase Technical Principles

This course describes MapReduce and YARN. MapReduce is one of the best-known computing frameworks for batch and offline processing in the big data field. YARN is the component responsible for unified resource management and scheduling in a Hadoop cluster.

MapReduce and YARN Technical Principles
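
As a rough illustration of the MapReduce programming model, the following single-process Python sketch counts words in three steps: map, shuffle, and reduce. The function names are hypothetical, and nothing here is distributed or uses the Hadoop API; it only shows how the phases fit together.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big ideas", "data at scale"])
print(counts)
```

In a real cluster the map and reduce calls would run on many nodes, with YARN allocating the resources for them.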

This course describes the basic concepts of Spark and the similarities and differences between its Resilient Distributed Dataset (RDD), Dataset, and DataFrame data structures.

Spark - An In-Memory Distributed Computing Engine

This course describes Flink's core technologies and architecture, its time and window mechanisms, and its fault tolerance mechanism.

Flink, Stream and Batch Processing in a Single Engine
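
To give a feel for the time and window mechanisms, here is a minimal Python sketch of tumbling event-time windows: each event is assigned to a fixed-size window based on its own timestamp, and values are summed per window and key. The event tuples and function names are hypothetical, and the sketch deliberately omits watermarks and window triggering, which a real stream processor needs in order to decide when a window is complete. It does not use the Flink API.

```python
from collections import defaultdict

def window_start(timestamp_ms, size_ms):
    """Return the start of the tumbling window that contains the timestamp."""
    return timestamp_ms - (timestamp_ms % size_ms)

def tumbling_sum(events, size_ms):
    """Sum values per (window start, key), using each event's own timestamp."""
    totals = defaultdict(int)
    for timestamp_ms, key, value in events:
        totals[(window_start(timestamp_ms, size_ms), key)] += value
    return dict(totals)

# Hypothetical events: (event time in ms, key, value). They arrive out of order,
# but each one still lands in the window its timestamp belongs to.
events = [(1_000, "a", 1), (7_000, "a", 2), (4_500, "a", 3), (5_000, "b", 4)]
result = tumbling_sum(events, size_ms=5_000)
print(result)
```

The late-arriving event at 4,500 ms is counted in the first window rather than the second, which is the point of windowing by event time instead of arrival time.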

Flume is an open-source, distributed, reliable, and highly available system for aggregating massive volumes of logs. It supports custom data senders for collecting data, performs simple processing on the data, and writes it to data receivers.

Flume - Massive Log Aggregation
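
The collect, process, and deliver flow described for Flume can be sketched as a tiny in-process pipeline. The sketch below assumes the usual source, channel, and sink roles; the function names, the event dictionaries, and the log-level tagging step are hypothetical and stand in for whatever simple processing a deployment would configure. It is not Flume's API or configuration format.

```python
from collections import deque

def source(lines):
    """Source: turn raw log lines into events (dicts with a body)."""
    for line in lines:
        yield {"body": line}

def light_processing(event):
    """Simple in-flight processing: trim whitespace and tag the log level."""
    body = event["body"].strip()
    level = body.split(" ", 1)[0] if body else "UNKNOWN"
    return {"body": body, "level": level}

def run_pipeline(lines, sink):
    """Move events from the source through a buffering channel into the sink."""
    channel = deque()  # Channel: buffers events between source and sink.
    for event in source(lines):
        channel.append(light_processing(event))
    while channel:
        sink.append(channel.popleft())  # Sink: deliver events in arrival order.

received = []
run_pipeline(["ERROR disk full \n", "INFO job started\n"], received)
print(received)
```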

Loader is used for efficient data import and export between the big data platform and structured data storage (such as relational databases). Loader is based on the open-source Sqoop 1.99.x, with enhanced functions.

Loader Data Conversion

This chapter describes the basic concepts, architecture, and functions of Kafka. It is important to know how Kafka ensures reliability for data storage and transmission and how historical data is processed.

Kafka - Distributed Publish-Subscribe Messaging System

The in-depth development of big data open-source technologies cannot be achieved without the support of underlying platform technologies such as Hadoop. To manage access control permissions for data and resources in the cluster, the Huawei big data platform implements a highly reliable cluster security mode based on LDAP and Kerberos and provides integrated security authentication.

LDAP and Kerberos

In recent years, Elasticsearch has developed rapidly and grown beyond its original role as a search engine, adding data aggregation, analysis, and visualization features. If you need to locate content by keyword across millions of documents, Elasticsearch is a strong choice.

Elasticsearch - Distributed Search Engine
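
Keyword lookup across many documents is typically built on an inverted index, which maps each term to the documents that contain it. The following Python sketch shows that idea at toy scale; the `build_index` and `search` helpers and the sample documents are hypothetical, and the whitespace tokenizer is far simpler than what a real search engine uses. It does not call the Elasticsearch API.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Distributed search engine",
    2: "In-memory key-value database",
    3: "Distributed database",
}
index = build_index(docs)
print(sorted(search(index, "distributed database")))
```

A lookup touches only the entries for the query terms, so its cost depends on how many documents match rather than on how many documents exist.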

Redis is a network-based, high-performance, in-memory key-value database that is frequently used in different scenarios. This course covers the architecture and application scenarios of Redis.

Redis In-Memory Database

This course covers the Huawei Big Data solution. The solution implements seamless cross-cloud synchronization of advanced service capabilities and multi-scenario collaboration, and supports Huawei Kunpeng and Ascend computing capabilities to help governments and enterprises achieve refined resource control, cross-cloud hybrid orchestration, and collaboration across multiple scenarios.

Huawei Big Data Solution

HCIA - Big Data V3.0 Learning Path

Online courses

HDFS and ZooKeeper

Hive - Distributed Data Warehouse

HBase Technical Principles

MapReduce and YARN Technical Principles

Spark - An In-Memory Distributed Computing Engine

Flink, Stream and Batch Processing in a Single Engine

Flume - Massive Log Aggregation

Loader Data Conversion

Kafka - Distributed Publish-Subscribe Messaging System

Elasticsearch - Distributed Search Engine

Redis In-Memory Database