The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems, Volume 2

Book Description

“There’s an incredible amount of depth and thinking in the practices described here, and it’s impressive to see it all in one place.”

–Win Treese, coauthor of Designing Systems for Internet Commerce

The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.

Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:

Designing and building modern web and distributed systems

  • Fundamentals of large
  • Understand the new implications of cloud administration
  • Make systems that are resilient to failure and grow and scale dynamically
  • Implement DevOps principles and cultural changes
  • IaaS/PaaS/SaaS and virtual platform selection

Operating and running systems using the latest DevOps/SRE strategies

  • Upgrade production systems with zero downtime
  • What and how to automate; how to decide what not to automate
  • On-call best practices that improve uptime
  • Why distributed systems require fundamentally different system administration techniques
  • Identify and resolve resiliency problems before they surprise you

Assessing and evaluating your team’s operational effectiveness

  • Manage the process of continuous improvement
  • A forty-page, pain-free assessment system you can start using today

Table of Contents

Part I Design: Building It
Chapter 1 Designing in a Distributed World
Chapter 2 Designing for Operations
Chapter 3 Selecting a Service Platform
Chapter 4 Application Architectures
Chapter 5 Design Patterns for Scaling
Chapter 6 Design Patterns for Resiliency

Part II Operations: Running It
Chapter 7 Operations in a Distributed World
Chapter 8 DevOps Culture
Chapter 9 Service Delivery: The Build Phase
Chapter 10 Service Delivery: The Deployment Phase
Chapter 11 Upgrading Live Services
Chapter 12 Automation
Chapter 13 Design Documents
Chapter 14 Oncall
Chapter 15 Disaster Preparedness
Chapter 16 Monitoring Fundamentals
Chapter 17 Monitoring Architecture and Practice
Chapter 18 Capacity Planning
Chapter 19 Creating KPIs
Chapter 20 Operational Excellence

Part III Appendices
Appendix A Assessments
Appendix B The Origins and Future of Distributed Computing and Clouds
Appendix C Scaling Terminology and Concepts
Appendix D Templates and Examples
Appendix E Recommended Reading

Book Details