Big Data Processing Using Spark in Cloud

  • Length: 264 pages
  • Edition: 1st ed. 2019
  • Publisher:
  • Publication Date: 2018-07-15
  • ISBN-10: 9811305498
  • ISBN-13: 9789811305498
  • Sales Rank: #1639209
Description

The book describes the emergence of big data technologies and the role of Spark in the big data stack. It compares Spark and Hadoop and identifies the shortcomings of Hadoop that Spark overcomes. The book focuses mainly on the architecture of Spark in depth and on Spark RDDs: how RDDs complement the immutable nature of big data through lazy evaluation, caching, and type inference. It also addresses advanced topics in Spark, starting with the basics of Scala and the core Spark framework, and exploring Spark DataFrames, machine learning using MLlib, graph analytics using GraphX, and real-time processing with Apache Kafka, AWS Kinesis, and Azure Event Hubs. It then goes on to investigate Spark using PySpark and R. Focusing on the current big data stack, the book examines how Spark interacts with current big data tools, with Spark as the core processing layer for all types of data.

The book is intended for data engineers and scientists working on massive datasets and big data technologies in the cloud. In addition to industry professionals, it is helpful for aspiring data processing professionals and students working in big data processing and cloud computing environments.

Table of Contents

Chapter 1. A Survey on Big Data—Its Challenges and Solution from Vendors
Chapter 2. Big Data Streaming with Spark
Chapter 3. Big Data Analysis in Cloud and Machine Learning
Chapter 4. Cloud Computing Based Knowledge Mapping Between Existing and Possible Academic Innovations—An Indian Techno-Educational Context
Chapter 5. Data Processing Framework Using Apache and Spark Technologies in Big Data
Chapter 6. Implementing Big Data Analytics Through Network Analysis Software Applications in Strategizing Higher Learning Institutions
Chapter 7. Machine Learning on Big Data: A Developmental Approach on Societal Applications
Chapter 8. Personalized Diabetes Analysis Using Correlation-Based Incremental Clustering Algorithm
Chapter 9. Processing Using Spark—A Potent of BD Technology
Chapter 10. Recent Developments in Big Data Analysis Tools and Apache Spark
Chapter 11. SCSI: Real-Time Data Analysis with Cassandra and Spark
