02 Nov
The PDF file contains the assignment. Some notes that may be helpful:

Please read: Data Science from Scratch, chapters 11 and 12.
Please read: Python for Data Analysis, chapter 10.

Week 10 Core Reading

Please read: Getting Started with Beautiful Soup, chapters 1 through 4.

The two primary ways to get information from the web are through web scraping and by using web APIs. The first three minutes (of this four-minute video) compare web scraping to working with APIs. The remainder of the video talks about R-specific technologies. In general, web scraping is easier with Python and BeautifulSoup than it is with R!

One interesting description of the difference between a data analyst and a data scientist is that a data analyst is good at working with structured data (e.g. SQL and CSV formats), while a data scientist is comfortable with both structured and unstructured data. For the remainder of this course, we’ll be working with less structured data.

Regular Expressions

Regular expressions provide advanced text processing capabilities. Often, you can choose between using Python’s string manipulation functions and ‘regex’ functions. For messy text-based source data, like text scraped from HTML-based web pages, regular expressions are often the best way forward.

Regular expressions are implemented somewhat differently in different programming languages, so you’ll need to carefully test any regex code that you bring from another environment into Python.

The best introduction that I’ve found to regular expressions was put together by Software Carpentry for its (open source) Python course. Please also watch the video below:

Source: Software Carpentry, Inc. https://www.youtube.com/watch?v=c-Ov1JUMDv4

Optional. If you’re interested in learning more about regular expressions, here are links to the other Software Carpentry videos on regular expressions, with example code in Python.

Here is a 5 minute video that I put together that shows how to use SelectorGadget and R to make scraping web pages easier.
In Python, we would use SelectorGadget, then BeautifulSoup to accomplish the same work!

If you want to follow along with the video, you should first install the SelectorGadget extension into Google Chrome, from SelectorGadget.com.

[Optional, for interested students: The R code is in the attached R Markdown file; it can also be viewed on-line here: http://rpubs.com/catlin/rvest]

Learning more about Selenium

Here is the first of seven short videos in a series on getting started using Selenium with Python to scrape web pages:

Source: “1. Selenium Webdriver with Python Tutorial – Installing Firefox Plugins,” Gabiste Akoua, https://youtu.be/Ssp6dMWIocY?list=PL4tmZ2wr68XWgqJFzwWrw-JFd_vBEonsT. Sep 28, 2014. [0 h, 2 m]

This week’s assignment provides an opportunity to practice some of the material that we learned in week 10. You may also find it helpful to read the material on ‘Using APIs’ in chapter 9 of Data Science from Scratch.
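
As one small way to practice the regular-expression material described earlier, here is a minimal sketch using only Python's built-in re module. The sample text, the patterns, and the helper function names are all made up for illustration; they are not taken from the assignment or from the readings, and real scraped text will likely need different patterns.

```python
import re

# Hypothetical messy text, as it might look after being pulled out of a web page.
raw_text = """
  Widget A   -  $19.99   (updated 2014-09-28)
  Widget B - $5      (updated 2014-10-02)
  Contact:   sales@example.com
"""

# A price: a dollar sign, one or more digits, and an optional two-digit cents part.
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")

# An ISO-style date: four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_prices(text):
    """Return every price found in the text as a float."""
    return [float(match) for match in PRICE_PATTERN.findall(text)]


def extract_dates(text):
    """Return every date found in the text as a (year, month, day) tuple of ints."""
    return [tuple(int(part) for part in match) for match in DATE_PATTERN.findall(text)]


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(extract_prices(raw_text))
    print(extract_dates(raw_text))
    print(normalize_whitespace("  Widget A   -  $19.99  "))
```

The same whitespace cleanup could be done with Python's string methods (for example, " ".join(text.split())), which is the kind of choice between string functions and regex functions mentioned above. The regex versions tend to pay off once the pattern is more than a fixed substring, as with the prices and dates here.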