A complete, hands-on guide to building and maintaining large Apache Hadoop clusters using Cloudera Manager and CDH5
- Understand the CDH architecture and its components and successfully set up a Hadoop cluster
- Maintain, troubleshoot, and secure your cluster using Cloudera Manager
- Easy-to-follow administrator’s guide with step-by-step explanations to help you master Apache Hadoop
Apache Hadoop is an open source distributed computing technology that helps users process large volumes of data with relative ease and draw insights from that data. Cloudera, with its open source distribution of Hadoop, has made big data analytics accessible to anyone interested.
This book prepares you to be a Hadoop administrator, with special emphasis on Cloudera's CDH. It provides step-by-step instructions for setting up and managing a robust Hadoop cluster running CDH5. It also equips you with an understanding of tools such as Cloudera Manager, which many companies use to manage Hadoop clusters with hundreds of nodes. You will learn how to set up security using Kerberos, and you will use Cloudera Manager to set up alerts and events that help you monitor and troubleshoot cluster issues.
What you will learn from this book
- Understand the Apache Hadoop architecture and the future of distributed processing frameworks
- Use HDFS for file-related operations and MapReduce for processing the data stored in it
- Install and configure CDH to bring up an Apache Hadoop cluster
- Configure HDFS High Availability and HDFS Federation to prevent single points of failure
- Install and configure Cloudera Manager to perform administrator operations
- Implement security by installing and configuring Kerberos for all services in the cluster
- Add, remove, and rebalance nodes in a cluster using cluster management tools
- Understand and configure the different backup options to back up your HDFS
An easy-to-follow Apache Hadoop administrator’s guide filled with practical screenshots and explanations for each step and configuration.
Who this book is written for
This book is for administrators interested in setting up and managing a large Hadoop cluster. If you are, or want to become, an administrator and are ready to build and maintain a production-level cluster running CDH5, this book is for you.
Table of Contents
Chapter 1. Getting Started with Apache Hadoop
Chapter 2. HDFS and MapReduce
Chapter 3. Cloudera's Distribution Including Apache Hadoop
Chapter 4. Exploring HDFS Federation and Its High Availability
Chapter 5. Using Cloudera Manager
Chapter 6. Implementing Security Using Kerberos
Chapter 7. Managing an Apache Hadoop Cluster
Chapter 8. Cluster Monitoring Using Events and Alerts
Chapter 9. Configuring Backups