Untangling Apache Hadoop YARN, Part 3

Part 3: Scheduler Concepts, by Ray Chiang and Dennis Dawson. In Parts 1 and 2, we covered the basics of YARN resource allocation. In this installment, we'll provide an overview of cluster scheduling and introduce the Fair Scheduler, one of the scheduler choices available in YARN. A standalone computer can have several CPU cores, each running […]

Untangling Apache Hadoop YARN, Part 2

Part 2: Global Configuration Basics, by Ray Chiang and Dennis Dawson. A new installment in the series about the tangled ball of thread that is YARN. In Part 1 of this series, we covered the fundamentals of YARN clusters. In Part 2, you'll learn about other components that can run on a cluster and how […]

Untangling Apache Hadoop YARN

Part 1: Cluster and YARN Basics. Ray Chiang is a Software Engineer at Cloudera. Dennis Dawson is a Senior Technical Writer at Cloudera. In this multipart series, we'll fully explore the tangled ball of thread that is YARN. YARN (Yet Another Resource Negotiator) is the resource management layer for the Apache Hadoop […]

APACHE ZEPPELIN

APACHE ZEPPELIN: THE ROAD AHEAD, by Vinay Shukla & Guest Author. This post was co-authored by Vinay Shukla (Hortonworks), Moon So Lee (Apache Zeppelin PMC & NFLabs), and Prabhjyot Singh (Apache Zeppelin PMC & Hortonworks). Recently the Apache Software Foundation (ASF) announced Apache Zeppelin as a top-level project. This was a great milestone for both […]

Using Apache Hive on Docker

Apache Hive is a data warehouse framework for storing, managing, and querying large data sets. The Hive query language, HiveQL, is a SQL-like language. Hive stores data in HDFS by default, and a Hive table may be used to define structure on the data. Hive supports two kinds of tables: managed tables and external tables. A managed table is […]
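The practical difference between the two table kinds shows up when a table is dropped. Hive removes both the metadata and the underlying files for a managed table, but removes only the metadata for an external table and leaves the files in place. The sketch below is a hypothetical, in-memory toy model of that behavior. `ToyMetastore`, `create_table`, and `drop_table` are illustrative names, not Hive's API, and a plain dict stands in for HDFS. The paths are examples only; `/user/hive/warehouse` mirrors Hive's usual default warehouse directory.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    location: str   # directory holding the table's data files
    external: bool  # True for an external table, False for a managed one


@dataclass
class ToyMetastore:
    """Hypothetical toy model of Hive DROP semantics; not Hive's real API."""
    tables: dict = field(default_factory=dict)  # table name -> Table (metadata)
    files: dict = field(default_factory=dict)   # path -> rows (stands in for HDFS)

    def create_table(self, name, location, external=False):
        self.tables[name] = Table(name, location, external)
        self.files.setdefault(location, [])

    def drop_table(self, name):
        table = self.tables.pop(name)  # metadata is always removed
        if not table.external:
            # Managed table: Hive owns the data, so the files are deleted too.
            self.files.pop(table.location, None)
        # External table: the files are left untouched.


ms = ToyMetastore()
ms.create_table("managed_logs", "/user/hive/warehouse/managed_logs")
ms.create_table("raw_logs", "/data/raw_logs", external=True)
ms.files["/data/raw_logs"].append("row1")

ms.drop_table("managed_logs")
ms.drop_table("raw_logs")
print(sorted(ms.files))  # ['/data/raw_logs']
```

After both drops, no table metadata remains, but the external table's data directory survives. That is why external tables are the usual choice when data is shared with tools outside Hive.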