elki-project / Elki

Licence: agpl-3.0
ELKI Data Mining Toolkit

ELKI

Environment for Developing KDD-Applications Supported by Index-Structures

arXiv:1902-03616 | DBLP:journals/corr/abs-1902-03616 | License: AGPL-3.0

Quick Summary

ELKI is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. To achieve high performance and scalability, ELKI offers many data index structures, such as the R*-tree, that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions, in particular of new methods. ELKI aims to provide a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms.

Background

Data mining research leads to many algorithms for similar tasks. A fair and useful comparison of these algorithms is difficult for several reasons:

  • Implementations of comparison partners are not at hand.
  • If implementations of different authors are provided, an evaluation in terms of efficiency is biased to evaluate the efforts of different authors in efficient programming instead of evaluating algorithmic merits.

On the other hand, efficient data management tools such as index structures can have considerable impact on data mining tasks and are therefore useful for a broad variety of algorithms.

In ELKI, data mining algorithms and data management tasks are separated, which allows them to be evaluated independently. This separation makes ELKI unique among data mining frameworks like Weka or Rapidminer and frameworks for index structures like GiST. At the same time, ELKI is open to arbitrary data types, distance or similarity measures, and file formats. The fundamental approach is the independence of file parsers or database connections, data types, distances, distance functions, and data mining algorithms. Helper classes, e.g. for algebraic or analytic computations, are available for all algorithms on equal terms.
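The independence of data types, distance functions and algorithms described above can be illustrated with a minimal Java sketch. It does not use ELKI's actual API: the names `Distance`, `EUCLIDEAN`, `HAMMING` and `kNearest` are hypothetical and chosen only for this example. The point is that the neighbor search is written once and works with any data type for which a distance is supplied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PluggableDistanceSketch {
  /** Hypothetical distance abstraction for this sketch; NOT ELKI's interface. */
  public interface Distance<T> {
    double distance(T a, T b);
  }

  /** Euclidean distance on dense vectors of equal length. */
  public static final Distance<double[]> EUCLIDEAN = (a, b) -> {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  };

  /** Hamming-style distance on strings: differing positions plus length difference. */
  public static final Distance<String> HAMMING = (a, b) -> {
    int n = Math.min(a.length(), b.length());
    int diff = Math.abs(a.length() - b.length());
    for (int i = 0; i < n; i++) {
      if (a.charAt(i) != b.charAt(i)) {
        diff++;
      }
    }
    return diff;
  };

  /** k nearest neighbors by linear scan; independent of the data type. */
  public static <T> List<T> kNearest(List<T> data, T query, int k, Distance<T> dist) {
    List<T> sorted = new ArrayList<>(data);
    // List.sort is stable, so ties keep their input order.
    sorted.sort(Comparator.comparingDouble(o -> dist.distance(query, o)));
    return sorted.subList(0, Math.min(k, sorted.size()));
  }

  public static void main(String[] args) {
    List<double[]> points =
        List.of(new double[] {0, 0}, new double[] {1, 1}, new double[] {5, 5});
    double[] nearest = kNearest(points, new double[] {0.9, 0.9}, 1, EUCLIDEAN).get(0);
    System.out.println(nearest[0] + ", " + nearest[1]);

    List<String> words = List.of("kitten", "mitten", "banana");
    System.out.println(kNearest(words, "bitten", 2, HAMMING));
  }
}
```

The same `kNearest` method serves both the vector and the string data set; only the distance passed in changes.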

With the development and publication of ELKI, we humbly hope to serve the data mining and database research community. The framework is free for scientific usage ("free" as in "open source", see License for details). If you use ELKI in scientific publications, we would appreciate credit in the form of a citation of the appropriate publication (see our list of publications), that is, the publication related to the release of ELKI you were using.

The people behind ELKI are documented on the Team page.

The ELKI wiki: Tutorials, HowTos, Documentation

Beginners may want to start with the HowTo documents, Examples and Tutorials, which help with difficult configuration scenarios and with getting started in ELKI development.

This website serves as the community development hub and task tracker for bug reports, Tutorials, FAQ, general issues and development tasks.

The most important documentation pages are: Tutorial, JavaDoc, FAQ, InputFormat, DataTypes, DistanceFunctions, DataSets, Development, Parameterization, Visualization, Benchmarking, and the list of Algorithms and RelatedPublications.

Getting ELKI: Download and Citation Policy

You can download ELKI including source code on the Releases page.
ELKI uses the AGPLv3 License, a well-known open source license.

There is a list of Publications that accompany the ELKI releases. When using ELKI in your scientific work, you should cite the publication corresponding to the ELKI release you are using, to give credit. This also helps to improve the repeatability of your experiments. We would also appreciate it if you contributed your algorithm to ELKI to allow others to reproduce your results and compare with your algorithm (which in turn will likely get you citations). We try to document every publication used for implementing ELKI: the page RelatedPublications is generated from the source code annotations.

Efficiency Benchmarking with ELKI

ELKI is quite fast (see some of our benchmark results), but the focus lies on broad coverage of algorithms and variations. We discourage cross-platform benchmarking, because it is easy to produce misleading results by comparing apples and oranges. For a fair comparison, you should implement all algorithms within ELKI and use the same APIs. We have also observed that the Java (JDK) version has a large impact on runtime performance. To make your results reproducible, please cite the version you have been using. See also Benchmarking.
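Because the Java version affects runtimes, it helps to record the runtime environment next to every measurement. The following is a minimal sketch using only standard JDK calls. It is not part of ELKI, the class and method names are hypothetical, and sorting an array merely stands in for the algorithm under test.

```java
import java.util.Arrays;
import java.util.Random;

public class BenchmarkSketch {
  /** Describe the runtime so that timings can be reported together with it. */
  public static String runtimeDescription() {
    return System.getProperty("java.vendor") + " " + System.getProperty("java.version")
        + " (" + System.getProperty("java.vm.name") + "), "
        + System.getProperty("os.name") + " " + System.getProperty("os.arch");
  }

  /** Median wall-clock time in milliseconds over `runs` runs (runs >= 1), after warm-up. */
  public static double medianMillis(Runnable task, int warmups, int runs) {
    for (int i = 0; i < warmups; i++) {
      task.run(); // give the JIT compiler a chance to optimize hot code
    }
    double[] times = new double[runs];
    for (int i = 0; i < runs; i++) {
      long start = System.nanoTime();
      task.run();
      times[i] = (System.nanoTime() - start) / 1e6;
    }
    Arrays.sort(times);
    return runs % 2 == 1 ? times[runs / 2] : (times[runs / 2 - 1] + times[runs / 2]) / 2;
  }

  public static void main(String[] args) {
    // Fixed seed, so every run measures the same input.
    double[] data = new Random(42).doubles(1_000_000).toArray();
    double ms = medianMillis(() -> Arrays.sort(data.clone()), 5, 11);
    System.out.println(runtimeDescription());
    System.out.printf("median sort time: %.2f ms%n", ms);
  }
}
```

Reporting the median of several runs after a warm-up phase reduces the influence of JIT compilation and outliers. For serious measurements, a dedicated harness is the more robust choice.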

Bug Reports and Contact

You can browse the open bug reports or create a new bug report.

We also appreciate any comments, suggestions and code contributions.
You can contact the core development team by e-mail: elki () dbs ifi lmu de

You can also subscribe to the ELKI user mailing list to exchange questions and ideas with other users or to get announcements (e.g., new releases, major changes) from the ELKI team.

Our primary "support" medium is this community mailing list. We appreciate it if you share experiences and success stories there that might help other users. This project makes a lot of progress, and information can become outdated rather quickly. If you prefer a web forum, you can try asking at StackOverflow, but you should understand that this is a general (and third-party operated) programming community.

Design Goals

  • Extensibility - ELKI has a very modular design. We want to allow arbitrary combinations of data types, distance functions, algorithms, input formats, index structures and evaluation methods.
  • Contributions - ELKI grows only as fast as people contribute. By having a modular design that allows small contributions such as single distance functions and single algorithms, we can have students and external contributors participate in the progress of ELKI.
  • Completeness - For an exhaustive comparison of methods, we aim to cover as much published and credited work as we can.
  • Fairness - It is easy to do an unfair comparison by badly implementing a competitor. We try to implement every method as well as we can, and by publishing the source code we allow for external improvements. We try to add all proposed improvements, such as index structures for faster range and kNN queries.
  • Performance - The modular architecture of ELKI allows optimized versions of algorithms and index structures for acceleration.
  • Progress - ELKI is changing with every release. To accommodate new features and enhance performance, API breakages are unavoidable. We hope to get a stable API with the 1.0 release, but we are not there yet.
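The idea of accelerating an unchanged algorithm with an index structure can be sketched in a few lines of Java. This is an illustration under simplifying assumptions, not ELKI code: the data is one-dimensional, a sorted array with binary search stands in for a real index such as the R*-tree, and the names `RangeQuery`, `linearScan`, `sortedIndex` and `sparseValues` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexAccelerationSketch {
  /** Hypothetical query abstraction: all values v with q - r <= v <= q + r. */
  public interface RangeQuery {
    List<Double> range(double query, double radius);
  }

  /** Baseline: examine every value. */
  public static RangeQuery linearScan(double[] data) {
    return (q, r) -> {
      List<Double> out = new ArrayList<>();
      for (double v : data) {
        if (v >= q - r && v <= q + r) {
          out.add(v);
        }
      }
      return out;
    };
  }

  /** Stand-in "index": a sorted copy that is searched with binary search. */
  public static RangeQuery sortedIndex(double[] data) {
    double[] sorted = data.clone();
    Arrays.sort(sorted);
    return (q, r) -> {
      List<Double> out = new ArrayList<>();
      for (int i = lowerBound(sorted, q - r); i < sorted.length && sorted[i] <= q + r; i++) {
        out.add(sorted[i]);
      }
      return out;
    };
  }

  /** First position whose value is >= key in a sorted array. */
  static int lowerBound(double[] a, double key) {
    int lo = 0, hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** The "algorithm": values with fewer than minNeighbors values (itself included) in range. */
  public static List<Double> sparseValues(double[] data, RangeQuery query, double radius,
      int minNeighbors) {
    List<Double> out = new ArrayList<>();
    for (double v : data) {
      if (query.range(v, radius).size() < minNeighbors) {
        out.add(v);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[] data = {1.0, 1.1, 1.2, 5.0, 9.0, 9.1};
    System.out.println(sparseValues(data, linearScan(data), 0.5, 2));
    System.out.println(sparseValues(data, sortedIndex(data), 0.5, 2));
  }
}
```

Both calls flag the same isolated value, because `sparseValues` depends only on the `RangeQuery` interface. Swapping the query implementation changes the running time but not the result, which is what makes a comparison between algorithms fair.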

Building ELKI

ELKI is built using the Gradle wrapper:

./gradlew shadowJar

This produces a single executable jar file named elki-bundle-<VERSION>.jar.

Individual jar files can be built using:

./gradlew jar

A complete build, including tests and JavaDoc, takes a few minutes and can be triggered with:

./gradlew build

Eclipse can also build ELKI. The easiest way is to use elki-bundle as the classpath, because it includes all modules.