
critocrito / sugarcube

License: GPL-3.0
Monoidal data processes.

Programming Languages

javascript
PLpgSQL
shell

Projects that are alternatives to or similar to sugarcube

Statistical Learning
Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Stars: ✭ 223 (+596.88%)
Mutual labels:  data-mining
Suod
(MLSys' 21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Stars: ✭ 245 (+665.63%)
Mutual labels:  data-mining
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (+12.5%)
Mutual labels:  data-mining
Chirp
Interface to manage and centralize Google Alert information
Stars: ✭ 227 (+609.38%)
Mutual labels:  data-mining
Data Mining Conferences
Ranking, acceptance rate, deadline, and publication tips
Stars: ✭ 236 (+637.5%)
Mutual labels:  data-mining
Tweetfeels
Real-time sentiment analysis in Python using Twitter's streaming API
Stars: ✭ 249 (+678.13%)
Mutual labels:  data-mining
Prefixspan Py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Stars: ✭ 214 (+568.75%)
Mutual labels:  data-mining
scikit-hubness
A Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (+28.13%)
Mutual labels:  data-mining
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+650%)
Mutual labels:  data-mining
Awesome Datascience
📝 An awesome Data Science repository to learn and apply for real world problems.
Stars: ✭ 17,520 (+54650%)
Mutual labels:  data-mining
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (+625%)
Mutual labels:  data-mining
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+9434.38%)
Mutual labels:  data-mining
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+9750%)
Mutual labels:  data-mining
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Stars: ✭ 223 (+596.88%)
Mutual labels:  data-mining
Rule Extraction from Trees
A toolkit for extracting comprehensible rules from tree-based algorithms
Stars: ✭ 34 (+6.25%)
Mutual labels:  data-mining
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+581.25%)
Mutual labels:  data-mining
Python Projects
some python projects
Stars: ✭ 247 (+671.88%)
Mutual labels:  data-mining
Semantic-Bus
object flow treatment, data transformation
Stars: ✭ 49 (+53.13%)
Mutual labels:  data-mining
software-analytics
A repository with my data analysis results of software artifacts
Stars: ✭ 37 (+15.63%)
Mutual labels:  data-mining
Matminer
Data mining for materials science
Stars: ✭ 251 (+684.38%)
Mutual labels:  data-mining

Sugarcube

Sugarcube - Data pipelines for human rights


Synopsis

Sugarcube is a framework to fetch, transform and export data. Data processes are described using plugins, which are chained in sequence to model complex data processes.
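The idea of chaining plugins in sequence can be illustrated with a minimal conceptual sketch. Everything in it is an assumption made for illustration: the names `chain`, `importUrls`, `addHost` and `makeExporter`, and the shape of the envelope object, are hypothetical and are not Sugarcube's actual plugin API. It only shows how fetch, transform and export steps compose into one process, and that an empty chain leaves the data unchanged.

```javascript
// Conceptual sketch only: all names here are hypothetical, NOT Sugarcube's API.
// A "plugin" is modelled as a function from an envelope to a new envelope.

// Chain plugins in sequence; an empty chain returns the envelope unchanged.
const chain = (...plugins) => (envelope) =>
  plugins.reduce((acc, plugin) => plugin(acc), envelope);

// Hypothetical fetch step: turn each query into a unit of data.
const importUrls = (envelope) => ({
  ...envelope,
  data: envelope.queries.map((url) => ({ url })),
});

// Hypothetical transform step: annotate each unit with the host of its URL.
const addHost = (envelope) => ({
  ...envelope,
  data: envelope.data.map((unit) => ({ ...unit, host: new URL(unit.url).host })),
});

// Hypothetical export step: collect units in an in-memory store.
const makeExporter = (store) => (envelope) => {
  store.push(...envelope.data);
  return envelope;
};

const store = [];
const pipeline = chain(importUrls, addHost, makeExporter(store));
const result = pipeline({ queries: ["https://example.org/report"], data: [] });
console.log(result.data);
```

Because each step has the same input and output shape, steps can be reordered, dropped or added without changing the others.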

It is a tool designed to support journalists, non-profits, academic researchers, human rights organisations and others with investigations using online, publicly available sources (e.g. tweets, videos, public databases, websites, online databases).

Learn how to use Sugarcube on your own project.

This code is licensed under the GPL 3.

Documentation

All documentation can be found on the website.

Examples

There are more examples and explanations on the website. Here is one to get you started.

sugarcube -p http_import,media_warc,media_screenshot,elastic_export \
          -c config.json \
          -Q http_url:'https://mwatana.org/en/airstrike-on-detention-center/'

This example fetches and extracts the contents and metadata of an online article, archives the website as a Web ARChive (WARC), takes a screenshot of the website and stores the data in an Elasticsearch database.

Data processes like the one in the example above can be codified so that they can be repeated. Once a data process has been defined, Sugarcube lets you scale and automate its operation.
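One possible way to repeat the example command for several URLs is a small Node.js wrapper. This is a sketch under stated assumptions, not a documented Sugarcube workflow: the helpers `buildArgs` and `archiveAll` are hypothetical, the plugin list and `config.json` are copied from the example command above, and it assumes the `sugarcube` executable is available on the PATH.

```javascript
const { execFileSync } = require("child_process");

// Plugin chain and config file, copied from the example command above.
const PLUGINS = ["http_import", "media_warc", "media_screenshot", "elastic_export"];
const CONFIG = "config.json";

// Hypothetical helper: build the argument list for a single URL.
function buildArgs(url) {
  return ["-p", PLUGINS.join(","), "-c", CONFIG, "-Q", `http_url:${url}`];
}

// Hypothetical helper: run the same data process once per URL.
// Assumes `sugarcube` is on the PATH; `run` can be swapped for a dry run.
function archiveAll(urls, run = execFileSync) {
  for (const url of urls) {
    run("sugarcube", buildArgs(url), { stdio: "inherit" });
  }
}
```

Calling `archiveAll(["https://mwatana.org/en/airstrike-on-detention-center/"])` would invoke the command from the example once. Passing a logging function as the second argument prints the commands instead of running them.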

Testimony

  • Syrian Archive uses Sugarcube to archive video evidence of human rights violations in Syria. Further, Sugarcube is used to monitor human rights documentation that is taken down by social media companies. The systems and workflows developed with Syrian Archive are now being expanded to do similar work in Yemen, Sudan and other areas.

  • Built using Sugarcube, the Data Scores investigation tool provided evidence and insights for research into how data analytics and data-driven "scoring" were being used in the public sector of the UK to make decisions. This research was conducted by the Data Justice Lab.

License

Sugarcube is licensed under the GPL 3.0.
