openaire / iis

License: Apache-2.0
Information Inference Service of the OpenAIRE system

Programming Languages

Java, HTML, PigLatin, Python, Scala, Shell, HiveQL

About

Information Inference Service (IIS) is a flexible data processing system for handling big data, based on Apache Hadoop technologies. It is a subsystem of the OpenAIRE system, whose public web front-end is www.openaire.eu; see Fig. 1 for a high-level overview.

Fig. 1: At the center of the OpenAIRE system is the Information Space, which stores all information available in the system. IIS ingests data from the Information Space, runs processing workflows, and produces inferred data which, in turn, is ingested by the Information Space.

The goal of OpenAIRE is to provide an infrastructure for gathering, processing (including de-duplication), and providing unified access to research-related data (papers, datasets, researchers, projects, etc.). The goal of IIS is to provide data and text mining functionality for the OpenAIRE system. In practice, IIS defines data processing workflows that connect various modules, each with well-defined input and output. A high-level overview of IIS can be found in the paper "Information Inference in Scholarly Communication Infrastructures: The OpenAIREplus Project Experience", Procedia Computer Science, vol. 38, 2014, 92-99.
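The idea of modules with well-defined input and output, connected into a workflow, can be illustrated with a small in-memory sketch. This is only a conceptual analogy: the `Module` interface, the `then` method, and the two toy modules are hypothetical names invented for this example and are not part of the IIS code base, whose workflows run on Apache Hadoop technologies rather than in a single JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class WorkflowSketch {
    /** Hypothetical module abstraction: one typed input, one typed output. */
    interface Module<I, O> {
        O process(I input);

        /** Chains this module with the next one; the compiler checks that the types line up. */
        default <R> Module<I, R> then(Module<O, R> next) {
            return input -> next.process(this.process(input));
        }
    }

    /** Toy module: splits raw text into lower-case tokens. */
    static final Module<String, List<String>> TOKENIZE =
            text -> Arrays.stream(text.toLowerCase().split("\\s+"))
                    .filter(token -> !token.isEmpty())
                    .collect(Collectors.toList());

    /** Toy module: counts the tokens it receives. */
    static final Module<List<String>, Integer> COUNT = List::size;

    /** A "workflow" is just the composition of modules whose output and input types match. */
    static final Module<String, Integer> WORKFLOW = TOKENIZE.then(COUNT);

    public static void main(String[] args) {
        System.out.println(WORKFLOW.process("Information Inference Service")); // prints 3
    }
}
```

Because each module declares its input and output types, connecting two modules whose types do not match is rejected at compile time, which mirrors the benefit of well-defined module interfaces described above.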

IIS was initially developed during the OpenAIREplus project and has been further extended during the OpenAIRE2020 project.

The original code was migrated to GitHub from the D-NET SVN repository. The public read-only interface of that repository is available at https://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/, which is where you can find the history of the code base before the migration (the IIS-related Maven projects are the ones matching the glob pattern *-iis-*).
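As a minimal sketch of how the *-iis-* glob could be applied to a list of module directory names, the example below uses the standard `java.nio.file.PathMatcher`. The class name `IisModuleFilter` and the sample directory names are hypothetical and chosen only for illustration; they are not taken from the SVN repository.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IisModuleFilter {
    // The glob pattern quoted in the text above.
    private static final PathMatcher IIS_GLOB =
            FileSystems.getDefault().getPathMatcher("glob:*-iis-*");

    /** Returns the names that match the *-iis-* glob, in their original order. */
    public static List<String> filterIisModules(List<String> names) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (IIS_GLOB.matches(Paths.get(name))) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical directory names, for illustration only.
        List<String> names = Arrays.asList(
                "dnet-iis-example", "dnet-other-module", "iis-standalone");
        System.out.println(filterIisModules(names)); // prints [dnet-iis-example]
    }
}
```

Note that the pattern requires a hyphen on both sides of `iis`, so a name that merely starts with `iis-` does not match.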

Content of the most important subdirectories and files

  • docs - basic documentation
  • iis-core - generic common utilities used by other projects
  • iis-common - OpenAIRE-related common utilities
  • iis-wf - definitions of workflows used in the system
  • CONTRIBUTORS.markdown - list of contributors to the project

License

The code is licensed under the Apache License, version 2.0. We also use third-party code from other projects whose licenses are compatible with it. This third-party code can be found in directories with names starting with iis-3rdparty-; each directory corresponds to a different source project.
