Beginning Apache Pig: Big Data Processing Made Easy

  • Length: 274 pages
  • Edition: 1st ed.
  • Publisher:
  • Publication Date: 2017-01-08
  • ISBN-10: 1484223365
  • ISBN-13: 9781484223369
  • Sales Rank: #1557250

Description

Learn to use Apache Pig to develop lightweight big data applications easily and quickly. This book shows you many optimization techniques and covers the contexts in which Pig is used in big data analytics. Beginning Apache Pig shows that Pig is easy to learn and that big data applications take relatively little time to develop with it.

The book is divided into four parts: the complete features of Apache Pig; integration with other tools; how to solve complex business problems; and optimization of tools.

You’ll discover topics such as MapReduce and why it cannot meet every business need; the features of Pig Latin, such as data types, load, store, joins, groups, and ordering; how Pig workflows can be created; submitting Pig jobs using Hue; and working with Oozie. You’ll also see how to extend the framework by writing UDFs and custom load, store, and filter functions. Finally, you’ll cover different optimization techniques, such as gathering statistics about a Pig script, joining strategies, parallelism, and the role of data formats in good performance.

What You Will Learn

  • Use all the features of Apache Pig
  • Integrate Apache Pig with other tools
  • Extend Apache Pig
  • Optimize Pig Latin code
  • Solve different use cases for Pig Latin

Who This Book Is For

All levels of IT professionals: architects, big data enthusiasts, engineers, developers, and big data administrators

Table of Contents

Chapter 1: MapReduce and Its Abstractions
Chapter 2: Data Types
Chapter 3: Grunt
Chapter 4: Pig Latin Fundamentals
Chapter 5: Joins and Functions
Chapter 6: Creating and Scheduling Workflows Using Apache Oozie
Chapter 7: HCatalog
Chapter 8: Pig Latin in Hue
Chapter 9: Pig Latin Scripts in Apache Falcon
Chapter 10: Macros
Chapter 11: User-Defined Functions
Chapter 12: Writing Eval Functions
Chapter 13: Writing Load and Store Functions
Chapter 14: Troubleshooting
Chapter 15: Data Formats
Chapter 16: Optimization
Chapter 17: Hadoop Ecosystem Tools
Appendix A: Built-in Functions
Appendix B: Apache Pig in Apache Ambari
Appendix C: HBaseStorage and ORCStorage Options
