Advanced Analytics with Spark: Patterns for Learning from Data at Scale

by Josh Wills, Sandy Ryza, Sean Owen, Uri Laserson

Length: 200 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2015-04-25
ISBN-10: 1491912766
ISBN-13: 9781491912768
Sales Rank: #440404

Description

Apache Spark is emerging as one of the most popular technologies for performing analytics on huge datasets, and this practical guide shows you how to harness Spark’s power for approaching a variety of analytics problems. You’ll learn how to apply common techniques, such as classification, clustering, collaborative filtering, anomaly detection, dimensionality reduction, and Monte Carlo simulation to fields such as genomics, security, and finance.

Advanced Analytics with Spark supplies complete implementations that analyze large public datasets, and acts as an introduction to using these techniques and other best practices in Spark programming.

Become familiar with the Spark programming model and ecosystem
Learn general approaches in data science
Discover which machine learning tools make sense for particular problems
Acquire code from GitHub that can be adapted to many uses

This book will interest data science professionals and aspiring data scientists, students studying learning techniques for analyzing large datasets, and scientists interested in using Spark as a research tool.

Chapter 1. Analyzing Big Data
Chapter 2. Introduction to Data Analysis with Scala and Spark
Chapter 3. Recommending Music and the Audioscrobbler Data Set
Chapter 4. Predicting Forest Cover with Decision Trees
Chapter 5. Anomaly Detection in Network Traffic with K-means Clustering
Chapter 6. Understanding Wikipedia with Latent Semantic Analysis
Chapter 7. Analyzing Co-occurrence Networks with GraphX
Chapter 8. Geospatial and Temporal Data Analysis on the New York City Taxicab Data
Chapter 9. Financial Risk through Monte Carlo Simulation
Chapter 10. Analyzing Genomics Data and the BDG Project
Chapter 11. Analyzing Neuroimaging Data with PySpark and Thunder
Appendix A. Deeper into Spark
Appendix B. Upcoming MLlib Pipelines API
