php|architect’s Guide to Web Scraping

Book Description

Despite all the advancements in web APIs and interoperability, it's inevitable that, at some point in your career, you will have to "scrape" content from a website that was not built with web services in mind. And, despite its sometimes less-than-stellar reputation, web scraping is usually an entirely legitimate activity, for example, capturing data from an old version of a website for insertion into a modern CMS. This book, written by scraping expert Matthew Turland, covers web scraping techniques and topics that range from the simple to the exotic, using a variety of technologies and frameworks:

  • Understanding HTTP requests
  • The PHP HTTP streams wrapper
  • cURL
  • pecl_http
  • PEAR:HTTP
  • Zend_Http_Client
  • Building your own scraping library
  • Using Tidy
  • Analyzing markup with the DOM, SimpleXML and XMLReader extensions
  • CSS selector libraries
  • PCRE pattern matching
  • Tips and Tricks
  • Multiprocessing / parallel processing

Book Details

  • Title: php|architect’s Guide to Web Scraping
  • Author: Matthew Turland
  • Length: 192 pages
  • Edition: 1
  • Language: English
  • Publisher:
  • Publication Date: 2010-09-01
  • ISBN-10: 0981034519
  • ISBN-13: 9780981034515