
AnthonyBloomer / Daftlistings

License: MIT


Projects that are alternatives to or similar to Daftlistings

Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+439.53%)
Mutual labels:  web-scraping, web-scraper, beautifulsoup
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-53.49%)
Mutual labels:  web-scraper, web-scraping
Cascadia
Go cascadia package command line CSS selector
Stars: ✭ 67 (-22.09%)
Mutual labels:  web-scraping, web-scraper
Detect Cms
PHP Library for detecting CMS
Stars: ✭ 78 (-9.3%)
Mutual labels:  web-scraping, web-scraper
grailer
web scraping tool for grailed.com
Stars: ✭ 30 (-65.12%)
Mutual labels:  web-scraping, beautifulsoup
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (-51.16%)
Mutual labels:  web-scraper, web-scraping
Php Curl Class
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+3275.58%)
Mutual labels:  web-scraping, web-scraper
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+177.91%)
Mutual labels:  web-scraping, web-scraper
Social Media Profile Scrapers
Fetch user's data across social media
Stars: ✭ 60 (-30.23%)
Mutual labels:  web-scraping, web-scraper
Faster Than Requests
Faster requests on Python 3
Stars: ✭ 639 (+643.02%)
Mutual labels:  web-scraping, web-scraper
Project Tauro
A Router WiFi key recovery/cracking tool with a twist.
Stars: ✭ 52 (-39.53%)
Mutual labels:  web-scraping, web-scraper
Arachnid
Powerful web scraping framework for Crystal
Stars: ✭ 68 (-20.93%)
Mutual labels:  web-scraping, web-scraper
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+4.65%)
Mutual labels:  web-scraping, beautifulsoup
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads for the requested product and dumps them to NoSQL MongoDB.
Stars: ✭ 15 (-82.56%)
Mutual labels:  web-scraper, web-scraping
BookingScraper
🌎 🏨 Scrape Booking.com 🏨 🌎
Stars: ✭ 68 (-20.93%)
Mutual labels:  web-scraping, beautifulsoup
MediumScraper
Scraping Medium articles and providing audio versions 📑 to 🔊 using Django
Stars: ✭ 12 (-86.05%)
Mutual labels:  web-scraper, beautifulsoup
Web Scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Stars: ✭ 153 (+77.91%)
Mutual labels:  web-scraping, web-scraper
Bet On Sibyl
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Stars: ✭ 190 (+120.93%)
Mutual labels:  web-scraping, beautifulsoup
Basketball reference web scraper
NBA Stats API via Basketball Reference
Stars: ✭ 279 (+224.42%)
Mutual labels:  web-scraping, web-scraper
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+662.79%)
Mutual labels:  web-scraping, web-scraper

Daftlistings


A library that enables programmatic interaction with Daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.

Installation

Daftlistings is available on the Python Package Index (PyPI). You can install daftlistings using pip:

virtualenv env
source env/bin/activate
pip install daftlistings
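
To verify the installation, you can inspect the installed package metadata:

pip show daftlistings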

To install the development version, run:

pip install https://github.com/AnthonyBloomer/daftlistings/archive/dev.zip

Temporary map visualization function fix

A major Daft.ie website update broke this repo. A minimal working subset is available in the temporary-map-visualization-fix folder:

cd temporary-map-visualization-fix

Inspect main.py and tweak the search parameters. You can deduce the parameters from a search URL such as https://www.daft.ie/property-for-sale/dublin-city?numBeds_from=2&numBeds_to=5, e.g. {"numBeds_from": "5"}. Add the desired parameters to line 9 in temporary-map-visualization-fix/main.py.
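
For illustration, line 9 might end up looking something like this (a hypothetical sketch; the exact variable name depends on the contents of main.py):

params = {"numBeds_from": "2", "numBeds_to": "5"}  # hypothetical name for the search parameters on line 9

Then run: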

python main.py

The search results will be written to temporary-map-visualization-fix/result.txt.

python map.py

Run map.py to visualize the results.

Usage

from daftlistings import Daft

daft = Daft()
listings = daft.search()

for listing in listings:
    print(listing.formalised_address)
    print(listing.daft_link)
    print(listing.price)

By default, the Daft search function iterates over each page of results and appends each Listing object to the list that is returned. If you wish to disable this behaviour, you can set fetch_all to False:

daft.search(fetch_all=False)
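
With fetch_all disabled, you can page through the results yourself using set_offset. A minimal sketch, assuming 20 results per page (the actual page size depends on what daft.ie returns):

from daftlistings import Daft

daft = Daft()
daft.set_county("Dublin City")

offset = 0
page_size = 20  # assumed results per page; adjust to match what daft.ie returns
while True:
    daft.set_offset(offset)
    page = daft.search(fetch_all=False)
    if not page:
        break
    for listing in page:
        print(listing.formalised_address)
    offset += page_size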

Examples

Get apartments to let in Dublin City that are between €1000 and €1500 per month, and contact the advertiser of each listing.

from daftlistings import Daft, RentType

daft = Daft()

daft.set_county("Dublin City")
daft.set_listing_type(RentType.APARTMENTS)
daft.set_min_price(1000)
daft.set_max_price(1500)

listings = daft.search()

for listing in listings:

    contact = listing.contact_advertiser(
        name="Jane Doe",
        contact_number="019202222",
        email="[email protected]",
        message="Hi, I seen your listing on daft.ie and I would like to schedule a viewing."
    )
    
    if contact:
        print("Advertiser contacted")

You can sort the listings by price, distance, upcoming viewing or date using the SortType object. The SortOrder object allows you to sort the listings in ascending or descending order.

from daftlistings import Daft, SortOrder, SortType, RentType

daft = Daft()

daft.set_county("Dublin City")
daft.set_listing_type(RentType.ANY)
daft.set_sort_order(SortOrder.ASCENDING)
daft.set_sort_by(SortType.PRICE)
daft.set_max_price(2500)

listings = daft.search()

for listing in listings:
    print(listing.formalised_address)
    print(listing.daft_link)
    print(listing.price)
    features = listing.features
    if features is not None:
        print('Features: ')
        for feature in features:
            print(feature)
    print("")

Parse listing data from a given search result URL.

from daftlistings import Daft

daft = Daft()
daft.set_result_url("https://www.daft.ie/dublin/apartments-for-rent?")
listings = daft.search()

for listing in listings:
    print(listing.formalised_address)
    print(listing.price)
    print(' ')


Find student accommodation near UCD that is between €850 and €1000 per month.

from daftlistings import Daft, SortOrder, SortType, RentType, University, StudentAccommodationType

daft = Daft()
daft.set_listing_type(RentType.STUDENT_ACCOMMODATION)
daft.set_university(University.UCD)
daft.set_student_accommodation_type(StudentAccommodationType.ROOMS_TO_SHARE)
daft.set_min_price(850)
daft.set_max_price(1000)
daft.set_sort_by(SortType.PRICE)
daft.set_sort_order(SortOrder.ASCENDING)
daft.set_offset(0)  # pagination offset; 0 starts from the first page of results
listings = daft.search()

for listing in listings:
    print(listing.price)
    print(listing.formalised_address)
    print(listing.daft_link)

Map the 2-bed rental properties in Dublin, color-coding them by price, and save the map to an HTML file.

from daftlistings import Daft, SortOrder, SortType, RentType, MapVisualization
import pandas as pd

daft = Daft()
daft.set_county("Dublin City")
daft.set_listing_type(RentType.ANY)
daft.set_sort_order(SortOrder.ASCENDING)
daft.set_sort_by(SortType.PRICE)
# must sort by price in ascending order; the MapVisualization class will take care of the weekly/monthly value mess
daft.set_max_price(2400)
daft.set_min_beds(2)
daft.set_max_beds(2)

listings = daft.search()
properties = []
print("Translating {} listing object into json, it will take a few minutes".format(str(len(listings))))
print("Ignore the error message")
for listing in listings:
    try:
        if listing.search_type != 'rental':
            continue
        properties.append(listing.as_dict_for_mapping())
    except Exception:
        # skip listings that fail to translate
        continue


df = pd.DataFrame(properties)
print(df)

dublin_map = MapVisualization(df)
dublin_map.add_markers()
dublin_map.add_colorbar()
dublin_map.save("dublin_apartment_to_rent_2_bed_price_map.html")
print("Done, please checkout the html file")

For more examples, check the Examples folder.

Parallel as_dict()

listing.as_dict() is relatively slow for a large volume of listings. Below is an example script that uses the joblib library's thread backend to speed up this process:

from daftlistings import Daft, RentType
from joblib import Parallel, delayed
import time

def translate_listing_to_json(listing):
    try:
        if listing.search_type != 'rental':
            return None
        return listing.as_dict_for_mapping()
    except Exception:
        return None

daft = Daft()
daft.set_county("Dublin City")
daft.set_listing_type(RentType.ANY)
daft.set_max_price(2000)
daft.set_min_beds(2)
daft.set_max_beds(2)

listings = daft.search()
properties = []
print("Translating {} listing object into json, it will take a few minutes".format(str(len(listings))))
print("Ignore the error message")

# time the translation
start = time.time()
properties = Parallel(n_jobs=6, prefer="threads")(
    delayed(translate_listing_to_json)(listing) for listing in listings
)
properties = [p for p in properties if p is not None]  # drop listings that failed to translate
end = time.time()
print("Time for json translations {}s".format(end-start))

Performance speedup for 501 listings:

Threads | Time (s) | Speedup
------- | -------- | -------
1 | 178 | 1.0
2 | 101 | 1.8
3 | 72 | 2.5
4 | 61 | 2.9
6 | 54 | 3.3

The scaling suggests the translation time is dominated by network I/O rather than CPU-bound work.

Tests

The Python unittest module contains its own test discovery function, which you can run from the command line:

 python -m unittest discover tests/
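
For reference, a minimal sketch of a test that discovery would pick up (the file name, class, and assertion are illustrative, not the project's actual tests):

# tests/test_search.py (hypothetical file)
import unittest

from daftlistings import Daft


class DaftSearchTest(unittest.TestCase):
    def test_search_returns_a_list(self):
        daft = Daft()
        daft.set_county("Dublin City")
        # fetch a single page to keep the test quick
        listings = daft.search(fetch_all=False)
        self.assertIsInstance(listings, list)


if __name__ == "__main__":
    unittest.main()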

Contributing

  • Fork the project and clone locally.
  • Create a new branch for what you're going to work on.
  • Push to your origin repository.
  • Create a new pull request in GitHub.