Until recently, Hadoop deployments ran on hardware owned and operated by the organizations themselves, often alongside legacy "big-iron" hardware. Today, cloud service providers let customers effectively rent hardware and the associated network connectivity, along with other features such as databases and bulk storage.
But installing a Hadoop cluster on a public cloud service is not as straightforward as it may appear. This practical book shows you how to install these clusters in a way that harmonizes with public cloud service features, and examines ways to use and manage them efficiently.
You’ll learn how to architect clusters in a way that works with the features of the provider, not only to avoid potential pitfalls, but also to take full advantage of what the services can do. A cluster installed in a suboptimal fashion will run slower and cost more than expected, which can defeat the goals of moving to the service in the first place.
Table of Contents
Chapter 1 Why Hadoop in the Cloud?
Chapter 2 Overview and Comparison of Cloud Providers
Chapter 3 Instances
Chapter 4 Networking and Security
Chapter 5 Storage
Chapter 6 Setting Up in AWS
Chapter 7 Setting Up in GCP
Chapter 8 Setting Up in Azure
Chapter 9 Standing Up a Cluster
Chapter 10 High Availability
Chapter 11 Relational Data with Apache Hive
Chapter 12 Complex Analytics with Spark
Chapter 13 Pricing and Performance
Chapter 14 Network Topologies
Chapter 15 Patterns for Managing Clusters
Chapter 16 Backup and Restoration