About This Book
- Break text down into its component parts for spelling correction, feature extraction, and phrase transformation
- Learn how to do custom sentiment analysis and named entity recognition
- Work through natural language processing concepts with simple, easy-to-follow programming recipes
Who This Book Is For
This book is intended for Python programmers interested in learning how to do natural language processing. Maybe you've learned the limits of regular expressions the hard way, or you've realized that human language cannot be deterministically parsed like a computer language. Perhaps you have more text than you know what to do with, and need automated ways to analyze and structure that text. This cookbook will show you how to train and use statistical language models to process text in ways that are practically impossible with standard programming tools. A basic knowledge of Python and of basic text processing concepts is expected. Some experience with regular expressions will also be helpful.
This book will show you the essential techniques of text and language processing. Starting with tokenization, stemming, and the WordNet dictionary, you'll progress to part-of-speech tagging, phrase chunking, and named entity recognition. You'll learn how various text corpora are organized, as well as how to create your own custom corpus. Then, you'll move on to text classification with a focus on sentiment analysis. And because NLP can be computationally expensive on large bodies of text, you'll try a few methods for distributed text processing. Finally, you'll be introduced to a number of other small but complementary Python libraries for text analysis, cleaning, and parsing.
This cookbook provides simple, straightforward examples so you can quickly learn text processing with Python and NLTK.
Table of Contents
Chapter 1: Tokenizing Text and WordNet Basics
Chapter 2: Replacing and Correcting Words
Chapter 3: Creating Custom Corpora
Chapter 4: Part-of-speech Tagging
Chapter 5: Extracting Chunks
Chapter 6: Transforming Chunks and Trees
Chapter 7: Text Classification
Chapter 8: Distributed Processing and Handling Large Datasets
Chapter 9: Parsing Specific Data Types
Appendix: Penn Treebank Part-of-speech Tags