This book covers the basics of visualizing data in R and summarizing high-dimensional data with multivariate statistical techniques. It places less emphasis on formal statistical inference, since inference is typically not the focus of exploratory data analysis (EDA). Rather, the goal is to show the data, summarize the evidence, and identify interesting patterns while eliminating ideas that likely won’t pan out.
Throughout the book, we will focus on the R statistical programming language. We will cover the various plotting systems in R and how to use them effectively. We will also discuss how to implement dimension reduction techniques like clustering and the singular value decomposition. All of these techniques will help you visualize your data and make key decisions in any data analysis.
Table of Contents
- Getting Started with R
- Managing Data Frames with the dplyr package
- Exploratory Data Analysis Checklist
- Principles of Analytic Graphics
- Exploratory Graphs
- Plotting Systems
- Graphics Devices
- The Base Plotting System
- Plotting and Color in R
- Hierarchical Clustering
- K-Means Clustering
- Dimension Reduction
- The ggplot2 Plotting System: Part 1
- The ggplot2 Plotting System: Part 2
- Data Analysis Case Study: Changes in Fine Particle Air Pollution in the U.S.