rfussien / Leboncoin Crawler

License: MIT
Crawler for leboncoin.fr

Projects that are alternatives of or similar to Leboncoin Crawler

Scrapit
Scraping scripts for various websites.
Stars: ✭ 25 (-21.87%)
Mutual labels:  crawler
Ccrawl
Simple CORPORA list crawler
Stars: ✭ 11 (-65.62%)
Mutual labels:  crawler
Papercrawler
Crawler used to crawl papers
Stars: ✭ 20 (-37.5%)
Mutual labels:  crawler
Sqliv
massive SQL injection vulnerability scanner
Stars: ✭ 840 (+2525%)
Mutual labels:  crawler
Goods Crawling
Crawls product details from amazon/bestbuy/costco/6pm
Stars: ✭ 9 (-71.87%)
Mutual labels:  crawler
Axegrinder
Crawl websites for accessibility issues from the command line.
Stars: ✭ 12 (-62.5%)
Mutual labels:  crawler
Tumblthree
A Tumblr Blog Backup Application
Stars: ✭ 923 (+2784.38%)
Mutual labels:  crawler
Autocrawler
Google, Naver multiprocess image web crawler (Selenium)
Stars: ✭ 957 (+2890.63%)
Mutual labels:  crawler
Disec
Distributed Image Search Engine Crawler
Stars: ✭ 11 (-65.62%)
Mutual labels:  crawler
Scrapy Azuresearch Crawler Samples
Scrapy as a Web Crawler for Azure Search Samples
Stars: ✭ 20 (-37.5%)
Mutual labels:  crawler
Pic Gather
[ Closed ] 🎨 image collector, which supports custom acquisition source configuration and is compatible with MacOS and Windows operating systems.
Stars: ✭ 842 (+2531.25%)
Mutual labels:  crawler
Beian Domain
Crawler that fetches the latest list of domains eligible for ICP filing
Stars: ✭ 9 (-71.87%)
Mutual labels:  crawler
Pypergrabber
Fetches PubMed article IDs (PMIDs) from email inbox, then crawls PubMed, Google Scholar and Sci-Hub for respective PDF files.
Stars: ✭ 14 (-56.25%)
Mutual labels:  crawler
Appcrawler
Web crawler for Android app markets
Stars: ✭ 25 (-21.87%)
Mutual labels:  crawler
Toutiaocrawler
Example crawler for Toutiao accounts
Stars: ✭ 30 (-6.25%)
Mutual labels:  crawler
Appcrawler
Appium-based tool for automatically traversing apps
Stars: ✭ 925 (+2790.63%)
Mutual labels:  crawler
Sina Stock Crawler
Sina stock options crawler with CSV output (Sina SSE ETF options data)
Stars: ✭ 12 (-62.5%)
Mutual labels:  crawler
Vw Crawler
🐞 A simple, lightweight Java crawler framework: with a little knowledge of regular expressions and CSS selectors, you can easily collect data.
Stars: ✭ 32 (+0%)
Mutual labels:  crawler
Universityrecruitment Ssurvey
Uses serious data to answer the question "what kinds of companies recruit at what kinds of universities?"
Stars: ✭ 30 (-6.25%)
Mutual labels:  crawler
Onion Crawler
Tor website crawler (specific for Alphabay at the time)
Stars: ✭ 15 (-53.12%)
Mutual labels:  crawler

Crawler for leboncoin.fr

This is a small crawler package for the site leboncoin.fr.

Why?

leboncoin.fr is one of the most popular classified ads websites in France. Whatever you're looking for, it is probably there. It has lots of ads and is very fast and simple to use.

However, the first problem comes when you need to use the search results in a way the site doesn't support. The search results page is pretty poor in terms of data. For example, it'd be so cool to get the mileage when I'm looking for a car, or the surface area when I'm looking for a flat.

The second problem is that saving a search is really a pain on the current site. All the searches you save end up in one single result page. That's pretty dumb, but anyway.

And the third and last problem is that you are forced to pick from preset values for some criteria. For example, when I was looking for a motorcycle, I wanted one with an engine bigger than 1200cc. Because the biggest value available in the input is 1000cc, and because there are tons of ads for 1000cc motorcycles, the search became much more complicated. I did send an email to ask for an additional value, but I didn't get any answer (which I didn't expect anyway). So I had to change the value in the query string on every single request... What a waste of time...

So for all those little reasons, I decided to write my good old web scraper to be able to extract the data from the site to anywhere (a DB, an array, JSON, an API, who knows...).

Requirements

  • PHP 7
  • [optional] PHPUnit to execute the test suite

Install

$ composer require rfussien/leboncoin-crawler

Usage

Super easy!!!

Get the structured data from a search result page

(new Lbc\GetFrom)->search('<search_result_url>');
// or with detailed ads
(new Lbc\GetFrom)->search('<search_result_url>', true);

example of output:

[
  'page' => 2,
  'links' => [
    'current' => 'https://www.leboncoin.fr/ventes_immobilieres/offres/basse_normandie/?o=2&sqs=12&ret=1&location=Caen%2014000',
    'previous' => 'https://www.leboncoin.fr/ventes_immobilieres/offres/basse_normandie/?o=1&sqs=12&ret=1&location=Caen%2014000',
    'next' => 'https://www.leboncoin.fr/ventes_immobilieres/offres/basse_normandie/?o=3&sqs=12&ret=1&location=Caen%2014000',
  ],
  'total_ads' => 466,
  'total_page' => 14,
  'ads_per_page' => 35,
  'category' => 'ventes_immobilieres',
  'location' => 'Caen 14000',
  'search_area' => 'basse_normandie',
  'sort_by' => 'date',
  'type' => 'all',
  'ads' => [
    1117890265 => [
      'id' => '1117890265',
      'titre' => 'Maison 7 pièces 243 m²',
      'is_pro' => true,
      'prix' => 490000,
      'url' => 'https://www.leboncoin.fr/ventes_immobilieres/1117890265.htm',
      'created_at' => '2017-04-06',
      'images_thumbs' => 'https://img1.leboncoin.fr/ad-thumb/fdf29ab66506b52f5768c509cbd4c9940035b220.jpg',
      'nb_image' => '10',
      'placement' => 'Caen / Calvados',
    ],
    [...],
    1116940130 => [
      'id' => '1116940130',
      'titre' => 'Maison de ville 5 pièces 121 m²',
      'is_pro' => true,
      'prix' => 338000,
      'url' => 'https://www.leboncoin.fr/ventes_immobilieres/1116940130.htm',
      'created_at' => '2017-04-04',
      'images_thumbs' => 'https://img2.leboncoin.fr/ad-thumb/2bb09136b010d9009f0d5542c8699ede3f6bedfd.jpg',
      'nb_image' => '4',
      'placement' => 'Caen / Calvados',
    ],
  ],
]
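
Since each result page carries its own 'links' and 'ads' entries, walking several pages and filtering the merged ads is mostly plain array handling. The following is only a minimal sketch: collectAds() and filterByMaxPrice() are hypothetical helper names, not part of this package, and the code assumes the last page has an empty or missing 'next' link. The fetcher is passed in as a callable so the sketch runs offline; with the package it could be a closure that returns (new Lbc\GetFrom)->search($url).

```php
<?php
declare(strict_types=1);

/**
 * Follow 'links' => 'next' from page to page and merge the 'ads' arrays.
 * $fetchPage is any callable that takes a URL and returns an array shaped
 * like the example output above (hypothetical helper, not a package API).
 * Assumption: the last page has no 'next' link, or an empty one.
 */
function collectAds(callable $fetchPage, string $startUrl, int $maxPages = 5): array
{
    $ads = [];
    $url = $startUrl;

    for ($i = 0; $i < $maxPages && $url !== ''; $i++) {
        $page = $fetchPage($url);
        // "+" keeps the numeric ad ids as keys (array_merge would renumber them).
        $ads += $page['ads'] ?? [];
        $url = (string) ($page['links']['next'] ?? '');
    }

    return $ads;
}

/** Keep only the ads whose 'prix' is at most $maxPrice (hypothetical helper). */
function filterByMaxPrice(array $ads, int $maxPrice): array
{
    return array_filter($ads, function (array $ad) use ($maxPrice): bool {
        return isset($ad['prix']) && $ad['prix'] <= $maxPrice;
    });
}

// Offline stand-in for the crawler, so the sketch runs without network access.
$fakePages = [
    'page-1' => [
        'links' => ['next' => 'page-2'],
        'ads'   => [1117890265 => ['id' => '1117890265', 'prix' => 490000]],
    ],
    'page-2' => [
        'links' => ['next' => ''],
        'ads'   => [1116940130 => ['id' => '1116940130', 'prix' => 338000]],
    ],
];

$fetch = function (string $url) use ($fakePages): array {
    return $fakePages[$url];
};

$ads   = collectAds($fetch, 'page-1');
$cheap = filterByMaxPrice($ads, 400000);

echo json_encode(array_keys($cheap)), PHP_EOL; // prints [1116940130]
```

The $maxPages cap is there so that a sketch like this never hammers the site by accident; raise it deliberately if you need more pages.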

Get the structured data from an ad

(new Lbc\GetFrom)->ad('<ad_url>');
// or
(new Lbc\GetFrom)->ad('<ad_id>', '<ad_category>');

example of output:

[
    'id'            => '1072097995',
    'category'      => 'ventes_immobilieres',
    'images_thumbs' => [
        0 => 'https://img0.leboncoin.fr/ad-thumb/6c3962c95d1be2367d8b30f8cc1c04317be61cae.jpg',
        1 => 'https://img5.leboncoin.fr/ad-thumb/9346546557dc1cf9eafc0249c8f80e27530ec36f.jpg',
        2 => 'https://img6.leboncoin.fr/ad-thumb/f0e61ab47f008ae101c0ed03e3023d34ee37df5f.jpg',
        3 => 'https://img4.leboncoin.fr/ad-thumb/60a4a187064407bc792b421189e66f87e1a2425c.jpg',
        4 => 'https://img5.leboncoin.fr/ad-thumb/d34a4ef9545e60ae88169acbe4858608ba01e8a9.jpg',
    ],
    'images'        => [
        0 => 'https://img0.leboncoin.fr/ad-image/6c3962c95d1be2367d8b30f8cc1c04317be61cae.jpg',
        1 => 'https://img5.leboncoin.fr/ad-image/9346546557dc1cf9eafc0249c8f80e27530ec36f.jpg',
        2 => 'https://img6.leboncoin.fr/ad-large/f0e61ab47f008ae101c0ed03e3023d34ee37df5f.jpg',
        3 => 'https://img4.leboncoin.fr/ad-image/60a4a187064407bc792b421189e66f87e1a2425c.jpg',
        4 => 'https://img5.leboncoin.fr/ad-image/d34a4ef9545e60ae88169acbe4858608ba01e8a9.jpg',
    ],
    'properties'    => [
        'titre'          => 'Maison 11 pièces 450 m²',
        'created_at'     => '2017-02-18',
        'is_pro'         => 1,
        'prix'           => 1185000,
        'ville'          => 'Bayeux',
        'cp'             => '14400',
        'type_de_bien'   => 'Maison',
        'pieces'         => 11,
        'surface'        => 450,
        'reference'      => '394348',
        'ges'            => 'C (de 11 à 20)',
        'classe_energie' => 'C (de 91 à 150)',
    ],
    'description'   => 'Vente Maison/villa 11 piè[email protected] France - [...]3562178Référence annonce : 394348',
]
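
Once the properties are available as typed values, derived figures that the site does not show become one-liners. As a small illustration, here is a sketch that computes the price per square metre from the 'prix' and 'surface' properties shown above. pricePerSquareMetre() is a hypothetical helper, not part of this package, and it assumes both properties are numeric when present.

```php
<?php
declare(strict_types=1);

/**
 * Price per square metre for an ad array shaped like the example above.
 * Returns a float rounded to 2 decimals, or null when 'prix' or 'surface'
 * is missing, not numeric, or when the surface is not positive.
 */
function pricePerSquareMetre(array $ad)
{
    $price   = $ad['properties']['prix'] ?? null;
    $surface = $ad['properties']['surface'] ?? null;

    if (!is_numeric($price) || !is_numeric($surface) || (float) $surface <= 0.0) {
        return null;
    }

    return round((float) $price / (float) $surface, 2);
}

// Values copied from the example output above.
$ad = [
    'id'         => '1072097995',
    'properties' => ['prix' => 1185000, 'surface' => 450],
];

echo pricePerSquareMetre($ad), PHP_EOL; // prints 2633.33
```

Returning null instead of throwing keeps the helper usable inside array_map() over a whole result set, where some ads may not list a surface.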

There are a bunch of other features if you dig a bit into the sources.

Testing

$ composer test

Contributing

Please see CONTRIBUTING and CONDUCT for details.

Security

If you discover any security related issues, please email me ([email protected]) instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.