
DigitalPebble / Behemoth

Licence: other
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Behemoth

Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-67.83%)
Mutual labels:  hadoop, mapreduce
gomrjob
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (-86.36%)
Mutual labels:  hadoop, mapreduce
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+3743.01%)
Mutual labels:  hadoop, mapreduce
Bigdata Interview
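🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.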
Stars: ✭ 857 (+199.65%)
Mutual labels:  hadoop, mapreduce
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-93.71%)
Mutual labels:  hadoop, mapreduce
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+231.82%)
Mutual labels:  hadoop, mapreduce
Asakusafw
Asakusa Framework
Stars: ✭ 114 (-60.14%)
Mutual labels:  hadoop, mapreduce
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-61.54%)
Mutual labels:  hadoop, mapreduce
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-93.71%)
Mutual labels:  hadoop, mapreduce
learning-hadoop-and-spark
Companion to Learning Hadoop and Learning Spark courses on Linked In Learning
Stars: ✭ 146 (-48.95%)
Mutual labels:  hadoop, mapreduce
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (+70.63%)
Mutual labels:  hadoop, mapreduce
GooglePlay-Web-Crawler
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-93.71%)
Mutual labels:  hadoop, mapreduce
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+7609.09%)
Mutual labels:  hadoop, mapreduce
Src
A light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-76.57%)
Mutual labels:  hadoop, mapreduce
Cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+11.19%)
Mutual labels:  hadoop, mapreduce
bigdata-doc
Big data study notes, learning roadmaps, and a collection of technical case studies.
Stars: ✭ 37 (-87.06%)
Mutual labels:  hadoop, mapreduce
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-95.1%)
Mutual labels:  hadoop, mapreduce
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-88.11%)
Mutual labels:  hadoop, mapreduce
dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Stars: ✭ 112 (-60.84%)
Mutual labels:  mapreduce
hadoop-docker-lite
Docker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-91.61%)
Mutual labels:  hadoop

Behemoth is an open source platform for large scale document processing based on Apache Hadoop.

It consists of a simple annotation-based document implementation and a number of modules that operate on these documents. Behemoth's main goals are to simplify the large-scale deployment of document analysers and to provide reusable modules for:

  • ingesting from common data sources (WARC, Nutch, etc.)
  • text processing (Tika, UIMA, GATE, Language Identification)
  • generating output for external tools (SOLR, Mahout)

Its modular architecture simplifies the development of custom annotators based on MapReduce.
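
To make the idea of an annotation-based document more concrete, here is a minimal, self-contained sketch in plain Java. It is not Behemoth's actual API: the names `AnnotationSketch`, `Doc`, `Annotation` and `tokenAnnotator` are hypothetical and chosen only for illustration, and the regex-based tokeniser stands in for a real analyser such as a UIMA or GATE pipeline. The sketch assumes annotations are stand-off spans (a type plus start and end offsets) kept alongside the unmodified text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: these types are NOT Behemoth's classes.
class AnnotationSketch {

    /** A stand-off annotation: a typed span [start, end) over the document text. */
    static final class Annotation {
        final String type;
        final int start;
        final int end;

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** A document: its URL, its raw text, and the annotations added so far. */
    static final class Doc {
        final String url;
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Doc(String url, String text) {
            this.url = url;
            this.text = text;
        }

        /** Returns the slice of text covered by an annotation. */
        String covered(Annotation a) {
            return text.substring(a.start, a.end);
        }
    }

    /** Toy annotator: marks every run of letters or digits as a "Token". */
    static Doc tokenAnnotator(Doc doc) {
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(doc.text);
        while (m.find()) {
            doc.annotations.add(new Annotation("Token", m.start(), m.end()));
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = tokenAnnotator(new Doc("http://example.com/a", "Hadoop scales out."));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " [" + a.start + "," + a.end + ") " + doc.covered(a));
        }
    }
}
```

In a MapReduce setting, a function shaped like `tokenAnnotator` would presumably be called once per document inside a map task, so that each annotator stays independent of how the documents are distributed across the cluster. That wiring is left out here to keep the sketch free of Hadoop dependencies.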

Note that Behemoth does not implement any NLP or Machine Learning components itself; it serves as 'large-scale glueware' for existing resources. Being Hadoop-based, it inherits Hadoop's scalability and fault tolerance and, most notably, the backing of a thriving open source community.

Wiki: https://github.com/DigitalPebble/behemoth/wiki

Mailing list: http://groups.google.com/group/digitalpebble

StackOverflow: http://stackoverflow.com/questions/tagged/behemoth
