- An easy-to-follow guide that takes you through every step of the data wrangling process
- Work with different types of datasets, and reshape the layout of your data to make it easier to analyze
- Simple examples and real-life data wrangling solutions for data pre-processing
By a commonly cited estimate, around 80% of the time in data analysis is spent cleaning and preparing data. This is, however, an important task and a prerequisite for the rest of the data analysis workflow, including visualization, analysis, and reporting. Python and R are popular choices for data analysis, and both have packages well suited to manipulating different kinds of data. This book shows you a range of data wrangling techniques and how to use Python and R packages to implement them.
You will start by understanding the data wrangling process and building a solid foundation for working with different types of data. You will work with different data structures, and acquire and parse data from various sources. The book also shows you how to reshape the layout of data, and how to manipulate, summarize, and join datasets. Finally, the book includes a quick primer on accessing and processing data from databases, conducting data exploration, and storing and retrieving data quickly using databases.
The book includes practical examples of each of these topics, using both simple and real-world datasets for easier understanding. By the end of the book, you will have a thorough understanding of the key data wrangling concepts and how to implement them.
What you will learn
- Read a CSV file into Python and R, and print out some statistics on the data.
- Gain knowledge of the data formats and programming structures involved in retrieving API data.
- Make effective use of regular expressions in the data wrangling process.
- Explore the tools and packages available for preparing numerical data for analysis.
- Gain better control over manipulating the structure of your data.
- Develop dexterity in programmatically reading, auditing, correcting, and shaping data.
- Write complete programs for taking in, formatting, and outputting datasets.
Table of Contents
Chapter 1. Programming With Data
Chapter 2. Introduction To Programming In Python
Chapter 3. Reading, Exploring, And Modifying Data - Part I
Chapter 4. Reading, Exploring, And Modifying Data - Part II
Chapter 5. Manipulating Text Data - An Introduction To Regular Expressions
Chapter 6. Cleaning Numerical Data - An Introduction To R And RStudio
Chapter 7. Simplifying Data Manipulation With dplyr
Chapter 8. Getting Data From The Web
Chapter 9. Working With Large Datasets