Practical Web Scraping for Data Science: Best Practices and Examples with Python

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist's arsenal, as many data science projects start by obtaining an appropriate data set. Starting with a brief overview of scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy sites and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases.

What You'll Learn
Who This Book Is For

A data science-oriented audience that is probably already familiar with Python or another programming language or analytical toolkit (R, SAS, SPSS, etc.). Students and instructors in university courses may also benefit. Readers unfamiliar with Python will appreciate the quick Python primer in Chapter 1, which covers the basics and provides pointers to other guides.
Common terms and phrases
argument attribute Baesens base_url Beautiful Soup BeautifulSoup(r.text browser bs4 import BeautifulSoup CAPTCHA checks Chrome contours cookies crawler crawling CRISP-DM data science database dataset developer tools element encoding example fetch Figure find and find_all format fragment identifier full_url function Google headers html_soup html.parser import requests url instance JavaScript JSON KHTML link_url LinkedIn loc.host login matching matplotlib method multiple news_url Note numpy OpenCV parse perform pip install POST request print(r.text programming protocol proxy server Python quotes redirect regular expression requests and Beautiful requests.get(url requests.post(url retrieve scraping script selector Selenium Seppe server session simply SQLite status code string submit take a look there’s URL parameters urljoin User-Agent vanden Broucke we’ll web browser web crawlers web scraping web server webdriver XPath you’ll