
yannham / Mechaml

Licence: lgpl-3.0
OCaml functional web scraping library

Projects that are alternatives of or similar to Mechaml

Oj
Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.
Stars: ✭ 517 (+761.67%)
Mutual labels:  scraping
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Stars: ✭ 789 (+1215%)
Mutual labels:  scraping
Pge Outages
Tracking PG&E outages
Stars: ✭ 43 (-28.33%)
Mutual labels:  scraping
Headless Chrome Crawler
Distributed crawler powered by Headless Chrome
Stars: ✭ 5,129 (+8448.33%)
Mutual labels:  scraping
Parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Stars: ✭ 628 (+946.67%)
Mutual labels:  scraping
Instagram Scraper
Scrape the Instagram frontend. Inspired from twitter-scraper by @kennethreitz.
Stars: ✭ 903 (+1405%)
Mutual labels:  scraping
Facebook Scraper
Scrape Facebook public pages without an API key
Stars: ✭ 499 (+731.67%)
Mutual labels:  scraping
Mtnt
Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-20%)
Mutual labels:  scraping
Imagescraper
✂️ High performance, multi-threaded image scraper
Stars: ✭ 630 (+950%)
Mutual labels:  scraping
Configs
Public, free to use, repository with diggers configs for scraping / extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-38.33%)
Mutual labels:  scraping
Tabula
Tabula is a tool for liberating data tables trapped inside PDF files
Stars: ✭ 5,420 (+8933.33%)
Mutual labels:  scraping
Newcrawler
Free Web Scraping Tool with Java
Stars: ✭ 589 (+881.67%)
Mutual labels:  scraping
Scrapy Cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
Stars: ✭ 921 (+1435%)
Mutual labels:  scraping
Gazpacho
🥫 The simple, fast, and modern web scraping library
Stars: ✭ 525 (+775%)
Mutual labels:  scraping
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+1606.67%)
Mutual labels:  scraping
Facebook data analyzer
Analyze facebook copy of your data with ruby language. Download zip file from facebook and get info about friends ranking by message, vocabulary, contacts, friends added statistics and more
Stars: ✭ 515 (+758.33%)
Mutual labels:  scraping
Webhere
HTML scraping for Objective-C.
Stars: ✭ 16 (-73.33%)
Mutual labels:  scraping
Awesome Python Primer
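An index of high-quality Chinese-language resources for teaching yourself Python, including books / documentation / videos, geared toward web scraping / web development / data analysis / machine learning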
Stars: ✭ 57 (-5%)
Mutual labels:  scraping
Artoo
artoo.js - the client-side scraping companion.
Stars: ✭ 1,029 (+1615%)
Mutual labels:  scraping
Pypatent
Search for and retrieve US Patent and Trademark Office Patent Data
Stars: ✭ 31 (-48.33%)
Mutual labels:  scraping

Mechaml

Description

Mechaml is a functional web scraping library that allows you to:

  • Fetch web content
  • Analyze, fill and submit HTML forms
  • Handle cookies, headers and redirections
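Submitting a form ultimately means serializing its fields into a request body. As a rough, stdlib-only illustration, assuming the form uses the default application/x-www-form-urlencoded encoding, the fields could be serialized as below. The helpers urlencode and encode_form are hypothetical and are not part of Mechaml's API; the escaping follows the WHATWG URL Standard's urlencoded serializer (alphanumerics and *-._ are kept, a space becomes +, every other byte becomes an uppercase %XX escape).

```ocaml
(* Hypothetical helper (not Mechaml's API): percent-encode one field name
   or value following the application/x-www-form-urlencoded rules. *)
let urlencode (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      (* Characters left untouched by the urlencoded serializer *)
      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '*' | '-' | '.' | '_' ->
          Buffer.add_char buf c
      (* Spaces are encoded as '+' in form bodies *)
      | ' ' -> Buffer.add_char buf '+'
      (* Any other byte becomes %XX with uppercase hex digits *)
      | c -> Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Hypothetical helper: join (name, value) pairs into "n1=v1&n2=v2". *)
let encode_form (fields : (string * string) list) : string =
  fields
  |> List.map (fun (name, value) -> urlencode name ^ "=" ^ urlencode value)
  |> String.concat "&"

let () =
  print_endline (encode_form [ ("username", "mynick"); ("password", "@xlz43") ])
```

Running it prints username=mynick&password=%40xlz43, since @ is byte 0x40.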

Mechaml is built on top of existing libraries that provide low-level features: Cohttp and Lwt for asynchronous I/O and HTTP handling, and Lambdasoup to parse HTML. It provides an interface that handles the interactions between these libraries and adds a few other features.

Overview

The library is divided into three main modules:

  • Agent: user-agent features. Perform requests and get back content, headers, status codes, etc.
  • Cookiejar: cookie handling
  • Page: HTML parsing and form handling
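To give an idea of what a cookie jar is responsible for, here is a deliberately simplified, stdlib-only sketch. The jar type and the add_set_cookie and cookie_header functions are hypothetical and do not reflect Mechaml's Cookiejar API. The sketch records the name=value pair from each Set-Cookie response header and replays the stored pairs in a Cookie request header; real cookie handling (RFC 6265) also tracks domain, path, expiry and flags such as Secure, all of which are ignored here.

```ocaml
(* Hypothetical, simplified cookie jar mapping cookie names to values.
   Domain, path, expiry and flags are deliberately ignored. *)
module StringMap = Map.Make (String)

type jar = string StringMap.t

let empty : jar = StringMap.empty

(* Keep only the leading "name=value" part of a Set-Cookie header value,
   dropping attributes such as "; Path=/; HttpOnly". A cookie with an
   existing name overwrites the previous value. *)
let add_set_cookie (header : string) (j : jar) : jar =
  let pair =
    match String.index_opt header ';' with
    | Some i -> String.sub header 0 i
    | None -> header
  in
  match String.index_opt pair '=' with
  | Some i ->
      let name = String.trim (String.sub pair 0 i) in
      let value =
        String.trim (String.sub pair (i + 1) (String.length pair - i - 1))
      in
      StringMap.add name value j
  | None -> j (* malformed header without '=': ignored *)

(* Build the value of the Cookie request header, "n1=v1; n2=v2",
   with cookies listed in increasing name order. *)
let cookie_header (j : jar) : string =
  StringMap.bindings j
  |> List.map (fun (name, value) -> name ^ "=" ^ value)
  |> String.concat "; "

let () =
  let j =
    empty
    |> add_set_cookie "session=abc123; Path=/; HttpOnly"
    |> add_set_cookie "lang=en"
    |> add_set_cookie "session=def456"
  in
  print_endline (cookie_header j)
```

This prints lang=en; session=def456: the second session cookie replaces the first, and Map.bindings returns names in sorted order.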

The Format module provides helpers to manage formatted values in forms, such as dates, colors, etc. For more details, see the documentation.
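As an illustration of the kind of conversion such helpers deal with, HTML date inputs expect values of the form yyyy-mm-dd and color inputs a lowercase #rrggbb hex string, so typed values have to be rendered into those strings before a form is submitted. The date_value and color_value functions below are hypothetical stdlib-only stand-ins written for this sketch, not the actual API of Mechaml's Format module.

```ocaml
(* Hypothetical helpers (not Mechaml's Format API): render typed values
   as strings accepted by HTML form inputs. *)

(* <input type="date"> expects "yyyy-mm-dd". Only basic range checks are
   done; the day is not checked against the length of the month. *)
let date_value ~year ~month ~day : string option =
  if year >= 1 && month >= 1 && month <= 12 && day >= 1 && day <= 31 then
    Some (Printf.sprintf "%04d-%02d-%02d" year month day)
  else None

(* <input type="color"> expects a lowercase "#rrggbb" hex string. *)
let color_value ~red ~green ~blue : string option =
  let ok c = c >= 0 && c <= 255 in
  if ok red && ok green && ok blue then
    Some (Printf.sprintf "#%02x%02x%02x" red green blue)
  else None

let () =
  (match date_value ~year:2024 ~month:3 ~day:7 with
   | Some s -> print_endline s
   | None -> print_endline "invalid date");
  match color_value ~red:255 ~green:128 ~blue:0 with
  | Some s -> print_endline s
  | None -> print_endline "invalid color"
```

This prints 2024-03-07 and then #ff8000. Returning an option rather than raising keeps invalid input visible at the call site.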

Installation

From opam

opam install mechaml

From source

Mechaml uses the dune build system, which can be installed through opam. Then, just run

dune build

to build the library.

Use dune build @doc to generate the documentation, dune runtest to build and execute tests, and dune build examples/XXX.exe to compile example XXX.

Usage

Here is a code sample that fetches a web page, fills in a login form and submits it, written in the monadic style:

open Mechaml
module M = Agent.Monad
open M.Infix

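(* Unwrap an option, failing with [msg] if it is None *)
let require msg = function
  | Some a -> a
  | None -> failwith msg

let action_login =
  (* Fetch the page and extract the parsed HTML from the response *)
  Agent.get "http://www.somewebsite.com"
  >|= Agent.HttpResponse.page
  >|= (function page ->
    (* Find the login form, failing if it is absent, then fill in its fields *)
    page
    |> Page.form_with "[name=login]"
    |> require "Can't find the login form !"
    |> Page.Form.set "username" "mynick"
    |> Page.Form.set "password" "@xlz43")
  (* Submit the filled-in form *)
  >>= Agent.submit

let _ =
  (* Run the whole action starting from a fresh agent *)
  M.run (Agent.init ()) action_login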
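The monadic style threads the agent, and the state it carries, from one action to the next without passing it around by hand. The stdlib-only sketch below shows the idea with a plain synchronous state monad. It is a simplification under stated assumptions, not Mechaml's implementation (the real Agent.Monad also sits on top of Lwt for asynchronous I/O), and the agent record, return, get, run and the two operators defined here are hypothetical stand-ins whose "requests" only log the visited URL and return a fake page body.

```ocaml
(* Hypothetical, synchronous state monad: an action takes the current
   agent and returns the updated agent together with a result. *)
type agent = { history : string list } (* visited URLs, newest first *)

type 'a m = agent -> agent * 'a

let return (x : 'a) : 'a m = fun a -> (a, x)

(* Bind: run [m], then pass its result and the updated agent to [f]. *)
let ( >>= ) (m : 'a m) (f : 'a -> 'b m) : 'b m =
 fun a ->
  let a', x = m a in
  f x a'

(* Map: apply a pure function to the result of [m]. *)
let ( >|= ) (m : 'a m) (f : 'a -> 'b) : 'b m = m >>= fun x -> return (f x)

(* Stand-in for a request: records the URL and returns a fake page body. *)
let get (url : string) : string m =
 fun a -> ({ history = url :: a.history }, "<html>" ^ url ^ "</html>")

(* Run an action from an initial agent, returning final agent and result. *)
let run (a : agent) (m : 'a m) : agent * 'a = m a

(* Fetch a page, keep its length, fetch a second page, return the length. *)
let action =
  get "http://www.somewebsite.com"
  >|= String.length
  >>= fun len -> get "http://www.somewebsite.com/login" >|= fun _ -> len

let () =
  let final_agent, len = run { history = [] } action in
  Printf.printf "first page length: %d, pages visited: %d\n" len
    (List.length final_agent.history)
```

Each >>= hands the updated agent to the next step, so the second request sees the history left by the first; this state threading is what lets later requests reuse what earlier ones gathered.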

More examples are available in the examples folder of the repository.

License

GNU LGPL v3
