CU-ITSS / Web Data Scraping S2019

Web Data Scraping

Spring 2019 ITSS Mini-Course — ARSC 5040
Brian C. Keegan, Ph.D.
Assistant Professor, Department of Information Science
University of Colorado Boulder

Course description

This is a five-week, one-credit "mini-course" on retrieving ("scraping") data from the web. The course is intended for researchers in the social sciences and humanities with computational instincts but limited or no prior programming experience. Each class will be 2.5 hours long: we'll take a break midway for biological input and output. Classes will combine lecture-by-notebook with hands-on exercises. The end of each class will have links to resources and additional take-home exercises. Students will have the option of presenting their solutions to the take-home exercises at the beginning of the next class.

Although many programming languages offer libraries for web information retrieval and analysis, we will focus on the Python data analysis ecosystem given its popularity and capabilities. I strongly recommend that students download the latest Anaconda distribution (Python 3.7 or above), which includes the Jupyter Notebook environment we're currently in, most of the data libraries we will use, and other conveniences.

Learning objectives

Students will:

Be able to navigate and access structured web data like HTML, XML, and JSON
Develop strategies for identifying relevant structures in semi-structured data using browser console tools
Utilize Python-based libraries to make requests and parse web data
Retrieve data from platforms' application programming interfaces (APIs)
Critically reflect on the technological and ethical constraints of web scraping
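
To give a flavor of the first objective, here is a minimal sketch of pulling the same two fields out of JSON, XML, and HTML. The sample strings and the function names (`from_json`, `from_xml`, `from_html`, `CellCollector`) are made up for illustration, and the sketch uses only Python's standard library so it runs without extra installs; in class we would more likely reach for BeautifulSoup for the HTML case.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data: the same record expressed in three formats.
JSON_DOC = '{"course": {"code": "ARSC 5040", "weeks": 5}}'
XML_DOC = "<course><code>ARSC 5040</code><weeks>5</weeks></course>"
HTML_DOC = (
    '<table><tr><td class="code">ARSC 5040</td>'
    '<td class="weeks">5</td></tr></table>'
)


def from_json(text):
    """JSON maps directly onto nested dictionaries and lists."""
    record = json.loads(text)["course"]
    return record["code"], int(record["weeks"])


def from_xml(text):
    """XML is a tree of elements; look up child elements by tag name."""
    root = ET.fromstring(text)
    return root.findtext("code"), int(root.findtext("weeks"))


class CellCollector(HTMLParser):
    """Collect the text of each <td>, keyed by its class attribute.

    Cells without a class attribute are skipped.
    """

    def __init__(self):
        super().__init__()
        self.cells = {}
        self._current = None  # class of the <td> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current is not None:
            self.cells[self._current] = self.cells.get(self._current, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None


def from_html(text):
    """HTML mixes data with presentation, so we walk the tags ourselves."""
    parser = CellCollector()
    parser.feed(text)
    parser.close()
    return parser.cells["code"], int(parser.cells["weeks"])


for extract, doc in [(from_json, JSON_DOC), (from_xml, XML_DOC), (from_html, HTML_DOC)]:
    print(extract.__name__, extract(doc))
```

All three functions return the same tuple, which is the point: the data is identical and only the structure wrapped around it changes.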

Class outline

Week 1: Introduction to Jupyter, browser console, structured data, ethical considerations
Week 2: Scraping HTML with requests and BeautifulSoup
Week 3: Scraping an API with requests and json, Wikipedia and Reddit
Week 4: Scraping web data with Selenium, ethics of screen-scraping
Week 5: Scraping Twitter
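
As a small preview of the ethical considerations in Weeks 1 and 4, one common first step before scraping a site is to check its robots.txt file. The sketch below is an illustration rather than course material: the robots.txt content, the example.com URLs, and the helper names (`build_parser`, `may_fetch`) are assumptions, and it parses a hard-coded string instead of downloading a real file so that it runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real one would be downloaded from
# the site's /robots.txt path before scraping.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent="*"):
    """Return True if the rules allow this user agent to request the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_parser(SAMPLE_ROBOTS_TXT)
for url in [
    "https://example.com/syllabus.html",
    "https://example.com/private/grades.html",
]:
    print(url, may_fetch(rules, url))
```

A permissive robots.txt does not settle whether scraping a site is appropriate. It is only one of the constraints we will discuss.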

Evaluation

To be determined based on enrollment, the distribution of skills, etc., but evaluation will primarily involve regular attendance, participation, and an upward trajectory in skill and confidence.

Acknowledgements

This course will draw on resources that Allison Morgan and I built for the 2018 Summer Institute for Computational Social Science, which were in turn derived from other resources developed by Simon Munzert and Chris Bail.

Thank you also to Professors Bhuvana Narasimhan and Stefanie Mollborn for coordinating the ITSS seminars.
