
mattmurray / Juno_crawler

Scrapy crawler to collect data on the back catalog of songs listed for sale.

Programming Languages

python

Projects that are alternatives to, or similar to, Juno crawler

Scrapy Training
Scrapy Training companion code
Stars: ✭ 157 (+4.67%)
Mutual labels:  scrapy, web-scraping
IMDB-Scraper
Scrapy project for scraping data from IMDB, with a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-75.33%)
Mutual labels:  web-scraping, scrapy
City Scrapers
Scrape, standardize and share public meetings from local government websites
Stars: ✭ 220 (+46.67%)
Mutual labels:  scrapy, web-scraping
Netflix Clone
Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture
Stars: ✭ 156 (+4%)
Mutual labels:  scrapy, web-scraping
Scrapy Fake Useragent
Random User-Agent middleware based on fake-useragent
Stars: ✭ 520 (+246.67%)
Mutual labels:  scrapy, web-scraping
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (-38.67%)
Mutual labels:  web-scraping, scrapy
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recently posted ads for a requested product and dumps them into a NoSQL MongoDB database.
Stars: ✭ 15 (-90%)
Mutual labels:  web-scraping, scrapy
scraping-ebay
Scraping eBay's products using the Scrapy web crawling framework
Stars: ✭ 79 (-47.33%)
Mutual labels:  web-scraping, scrapy
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+209.33%)
Mutual labels:  scrapy, web-scraping
restaurant-finder-featureReviews
Build a Flask web application to help users retrieve key restaurant information and feature-based reviews (generated by applying a market-basket model, the Apriori algorithm, and NLP to user reviews).
Stars: ✭ 21 (-86%)
Mutual labels:  web-scraping, scrapy
Scrapy Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
Stars: ✭ 54 (-64%)
Mutual labels:  scrapy, web-scraping
Faster Than Requests
Faster requests on Python 3
Stars: ✭ 639 (+326%)
Mutual labels:  scrapy, web-scraping
Scrapyd Cluster On Heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Stars: ✭ 106 (-29.33%)
Mutual labels:  scrapy, web-scraping
Crawlab Lite
Lite version of Crawlab, a lightweight crawler management platform.
Stars: ✭ 122 (-18.67%)
Mutual labels:  scrapy
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (-6%)
Mutual labels:  web-scraping
Python Tutorial
🏃 A collection of Python tutorials: "Python Study Notes"
Stars: ✭ 122 (-18.67%)
Mutual labels:  scrapy
Qqmusicspider
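A Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments and more. It also shares a music corpus of 490,000+ entries from the top 6,400 mainland Chinese, Hong Kong and Taiwanese artists on QQ Music.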
Stars: ✭ 120 (-20%)
Mutual labels:  scrapy
Taobaoscrapy
😩 Tool for Taobao/Tmall | A childhood toy, now outdated
Stars: ✭ 146 (-2.67%)
Mutual labels:  scrapy
Pigat
pigat (Passive Intelligence Gathering Aggregation Tool), a tool for aggregating passively gathered information
Stars: ✭ 140 (-6.67%)
Mutual labels:  scrapy
30 Days Of Python
Learn Python for the next 30 (or so) Days.
Stars: ✭ 1,748 (+1065.33%)
Mutual labels:  web-scraping

Juno Download Crawler

Crawls Juno Download and collects data on the entire back catalogue of music singles.

Fields collected:

  • Artist
  • Title
  • Record label
  • Catalog number
  • Release date
  • Music genre
  • Individual track names
  • mp3 sample urls
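The crawler's own item definition is not shown here. The `_type` key in the example output names it `JunoCrawlerItem`. As a minimal sketch, the fields above can be modelled as one record. The sketch assumes the JSON keys in the example output are the field names. `JunoRelease` is a hypothetical name. The sketch uses a plain dataclass, which Scrapy 2.2+ accepts as an item type, so it runs without Scrapy installed.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class JunoRelease:  # hypothetical stand-in for the project's JunoCrawlerItem
    """One release record. Field names mirror the keys in the example output."""

    artist: str
    title: str
    label: str  # record label
    catalog_number: str
    release_date: str  # raw site string, e.g. "10 Sep 08"
    genre: str
    # Track names and MP3 sample URLs are stored together as [name, url] pairs.
    tracks: List[List[str]] = field(default_factory=list)


# Usage: build a record as a spider callback might, then export it as a dict.
release = JunoRelease(
    artist="CLEAR VIEW feat JESSICA",
    title="Tell Me",
    label="Songbird Holland",
    catalog_number="SB 215-0",
    release_date="10 Sep 08",
    genre="Progressive House",
    tracks=[["Tell Me - (6:43)",
             "http://www.junodownload.com/MP3/SF1354749-02-01-01.mp3"]],
)
print(asdict(release)["catalog_number"])  # SB 215-0
```

Each track name is stored in the same pair as its sample URL. This keeps the two aligned, which appears to be why the example output nests them as `[name, url]` lists.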

Example output:

[
  {
    "_type": "JunoCrawlerItem",
    "catalog_number": "SB 215-0",
    "title": "Tell Me",
    "release_date": "10 Sep 08",
    "artist": "CLEAR VIEW feat JESSICA",
    "label": "Songbird Holland",
    "tracks": [
      [
        "Tell Me - (6:43)",
        "http://www.junodownload.com/MP3/SF1354749-02-01-01.mp3"
      ],
      [
        "Tell Me (Max Graham remix) - (8:49)",
        "http://www.junodownload.com/MP3/SF1354749-02-01-02.mp3"
      ]
    ],
    "genre": "Progressive House"
  }
]
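In the output above, each track's length is embedded in its name and the release date is a short "DD Mon YY" string. The following is a minimal post-processing sketch that turns such output into one row per track, with ISO dates and lengths in seconds. It makes these assumptions:

* The " - (M:SS)" suffix and the "10 Sep 08" date format are inferred from this single example and may not hold for every listing.
* `parse_track`, `parse_release_date` and `flatten` are hypothetical helper names, not part of the project.

```python
import json
import re
from datetime import date, datetime

# Assumed track-label shape, inferred from the example: "<name> - (M:SS)".
TRACK_RE = re.compile(r"^(?P<name>.*) - \((?P<min>\d+):(?P<sec>\d{2})\)$")


def parse_track(label: str) -> tuple:
    """Split 'Tell Me - (6:43)' into ('Tell Me', 403); length is None if unparsed."""
    m = TRACK_RE.match(label)
    if m is None:
        return label, None
    return m.group("name"), int(m.group("min")) * 60 + int(m.group("sec"))


def parse_release_date(raw: str) -> date:
    """Convert the site's 'DD Mon YY' string (e.g. '10 Sep 08') to a date."""
    # %b expects English month abbreviations (the default C locale).
    return datetime.strptime(raw, "%d %b %y").date()


def flatten(items: list) -> list:
    """Produce one row per track, repeating the release-level fields."""
    rows = []
    for item in items:
        released = parse_release_date(item["release_date"])
        for label, mp3_url in item["tracks"]:
            name, seconds = parse_track(label)
            rows.append({
                "artist": item["artist"],
                "title": item["title"],
                "catalog_number": item["catalog_number"],
                "release_date": released.isoformat(),
                "track": name,
                "length_seconds": seconds,
                "sample_url": mp3_url,
            })
    return rows


# The example output, inlined here; a real run would json.load() the exported file.
sample = json.loads("""[{"_type": "JunoCrawlerItem", "catalog_number": "SB 215-0",
  "title": "Tell Me", "release_date": "10 Sep 08", "artist": "CLEAR VIEW feat JESSICA",
  "label": "Songbird Holland", "genre": "Progressive House",
  "tracks": [["Tell Me - (6:43)", "http://www.junodownload.com/MP3/SF1354749-02-01-01.mp3"],
             ["Tell Me (Max Graham remix) - (8:49)", "http://www.junodownload.com/MP3/SF1354749-02-01-02.mp3"]]}]""")

for row in flatten(sample):
    print(row["release_date"], row["track"], row["length_seconds"])
# 2008-09-10 Tell Me 403
# 2008-09-10 Tell Me (Max Graham remix) 529
```

Unrecognised track labels are kept whole with a `None` length rather than raising an error. As a result, a single unusual listing does not stop the whole export from being flattened.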