staffanm / ferenda

License: BSD-2-Clause
Transform unstructured document collections to structured Linked Data

Programming languages: HTML, Python, XSLT, CSS, JavaScript, Shell

Ferenda is a Python library and framework for transforming unstructured document collections into structured Linked Data. It helps with downloading documents, parsing them to add explicit semantic structure and RDF-based metadata, finding relationships between documents, and publishing the results, including through a REST-based HTTP API.

Quick start

This example uses ferenda's project framework to download the 50 latest RFCs and W3C standards, parse them into structured, RDF-enabled XHTML documents, load all RDF metadata into a triplestore, and generate a web site of static HTML5 files that are usable offline:

pip install ferenda
ferenda-setup myproject
cd myproject
./ferenda-build.py ferenda.sources.tech.RFC enable
./ferenda-build.py ferenda.sources.tech.W3Standards enable
./ferenda-build.py all all --downloadmax=50 --staticsite --fulltextindex=False
open data/index.html
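
If you would rather drive the enable and build steps above from a script, a minimal sketch follows. It only assembles and runs the same command lines shown above; it does not use ferenda's own Python API. The helper names build_commands and run_quickstart are hypothetical, and the sketch assumes a POSIX shell environment where ferenda-setup has already created the myproject directory.

```python
import subprocess

# Build script created by ferenda-setup inside the project directory.
BUILD_SCRIPT = "./ferenda-build.py"


def build_commands(sources, downloadmax=50):
    """Return the quick-start command lines as argv lists (nothing is executed)."""
    # One "enable" command per document source.
    commands = [[BUILD_SCRIPT, source, "enable"] for source in sources]
    # A single build command covering all enabled sources and all actions.
    commands.append([
        BUILD_SCRIPT, "all", "all",
        "--downloadmax=%d" % downloadmax,
        "--staticsite",
        "--fulltextindex=False",
    ])
    return commands


def run_quickstart(sources, project_dir="myproject", downloadmax=50, dry_run=True):
    """Print each command; execute it inside project_dir unless dry_run is set."""
    for argv in build_commands(sources, downloadmax):
        print(" ".join(argv))
        if not dry_run:
            # The relative script path is resolved against project_dir (POSIX).
            subprocess.run(argv, cwd=project_dir, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands from the quick start.
    run_quickstart(["ferenda.sources.tech.RFC", "ferenda.sources.tech.W3Standards"])
```

Keeping the dry run as the default makes it easy to review the commands before anything is downloaded; pass dry_run=False to actually execute them.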

The same functionality can also be accessed through a Python API if you want to use ferenda as part of a larger system. It's also possible to use only the parts of ferenda that you need (e.g. only the downloading and parsing features).

More information

See http://ferenda.readthedocs.org/ for in-depth documentation.

Copyright and license

Most of the code was written by Staffan Malmgren and is licensed under the 2-clause BSD license.

Some bundled code was written by other authors and is included in accordance with their respective licenses.