Datumbox Machine Learning Framework

The Datumbox Machine Learning Framework is an open-source framework written in Java which allows the rapid development of Machine Learning and Statistical applications. The main focus of the framework is to include a large number of machine learning algorithms and statistical methods and to be able to handle large datasets.

Copyright & License

Copyright (C) 2013-2020 Vasilis Vryniotis.

The code is licensed under the Apache License, Version 2.0.

Installation & Versioning

Datumbox Framework is available on the Maven Central Repository.

The latest stable version of the framework is 0.8.2 (Build 20200805). To use it, add the following snippet to your pom.xml:

    <dependency>
        <groupId>com.datumbox</groupId>
        <artifactId>datumbox-framework-lib</artifactId>
        <version>0.8.2</version>
    </dependency>

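The latest snapshot version of the framework is 0.8.3-SNAPSHOT (Build 20201014). To test it, update your pom.xml as follows, placing the repository element inside <repositories> and the dependency element inside <dependencies>: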

    <repository>
       <id>sonatype-snapshots</id>
       <name>sonatype snapshots repo</name>
       <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>

    <dependency>
        <groupId>com.datumbox</groupId>
        <artifactId>datumbox-framework-lib</artifactId>
        <version>0.8.3-SNAPSHOT</version>
    </dependency>

The develop branch is the development branch (the default GitHub branch), while the master branch contains the latest stable version of the framework. All the stable releases are marked with tags.

The releases of the framework follow the Semantic Versioning approach. For detailed information about the various releases, check out the Changelog.
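As a rough illustration of how the version numbers above are ordered, here is a minimal sketch in plain Java. It is not part of the Datumbox API: the class name VersionOrder and its compare method are hypothetical, and it assumes versions of the form MAJOR.MINOR.PATCH with an optional qualifier such as SNAPSHOT. Under Semantic Versioning a qualified version is a pre-release and sorts before the plain release with the same numbers. Maven's own ordering rules are more elaborate than this sketch.

```java
public class VersionOrder {

    /**
     * Compares two versions of the form MAJOR.MINOR.PATCH[-QUALIFIER].
     * Returns a negative number, zero, or a positive number when the first
     * version is older than, equal to, or newer than the second.
     */
    public static int compare(String a, String b) {
        String[] partsA = a.split("-", 2);
        String[] partsB = b.split("-", 2);
        int[] coreA = parseCore(partsA[0]);
        int[] coreB = parseCore(partsB[0]);

        // Compare MAJOR, then MINOR, then PATCH numerically.
        for (int i = 0; i < 3; i++) {
            if (coreA[i] != coreB[i]) {
                return Integer.compare(coreA[i], coreB[i]);
            }
        }

        // Same numbers: a qualified (pre-release) version sorts before the release.
        boolean preA = partsA.length == 2;
        boolean preB = partsB.length == 2;
        if (preA != preB) {
            return preA ? -1 : 1;
        }

        // Simplification (assumption): compare two qualifiers as plain strings.
        return preA ? partsA[1].compareTo(partsB[1]) : 0;
    }

    private static int[] parseCore(String core) {
        String[] fields = core.split("\\.");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + core);
        }
        int[] numbers = new int[3];
        for (int i = 0; i < 3; i++) {
            numbers[i] = Integer.parseInt(fields[i]);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // The stable release is older than the current snapshot.
        System.out.println(compare("0.8.2", "0.8.3-SNAPSHOT") < 0);
        // The snapshot precedes the eventual 0.8.3 release.
        System.out.println(compare("0.8.3-SNAPSHOT", "0.8.3") < 0);
    }
}
```

Both lines in main print true: 0.8.2 precedes 0.8.3-SNAPSHOT, which in turn precedes a future 0.8.3 release.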

Documentation and Code Examples

All the public methods and classes of the Framework are documented with Javadoc comments. Moreover, for every model there is a JUnit test which shows how to train and use it. Finally, for more examples on how to use the framework, check out the Code Examples or the official Blog.

Pre-trained Models

Datumbox comes with a large number of pre-trained models which allow you to perform Sentiment Analysis (Document & Twitter), Subjectivity Analysis, Topic Classification, Spam Detection, Adult Content Detection, Language Detection, Commercial Detection, Educational Detection and Gender Detection. To get the binary models check out the Datumbox Zoo.

Which methods/algorithms are supported?

The Framework currently supports performing multiple parametric and non-parametric statistical tests, calculating descriptive statistics on censored and uncensored data, performing ANOVA, Cluster Analysis, Dimension Reduction, Regression Analysis, Timeseries Analysis, Sampling and calculation of probabilities from the most common discrete and continuous distributions. In addition, it provides several implemented algorithms including Max Entropy, Naive Bayes, SVM, Bootstrap Aggregating, Adaboost, Kmeans, Hierarchical Clustering, Dirichlet Process Mixture Models, Softmax Regression, Ordinal Regression, Linear Regression, Stepwise Regression, PCA and several other techniques that can be used for feature selection, ensemble learning, linear programming solving and recommender systems.

Bug Reports

Although parts of the Framework have been used in commercial applications, not all classes are equally used or tested. The framework is currently in Alpha, so you should expect some changes to the public APIs in future versions. If you spot a bug, please submit it as an Issue on the official GitHub repository.

Contributing

The Framework can be improved in many ways, so any contribution is welcome. By far the most important feature missing from the Framework is the ability to use it from the command line or from other languages such as Python. Other important enhancements include improving the documentation, the test coverage and the examples, improving the architecture of the framework and supporting more Machine Learning and Statistical Models. If you make any useful changes to the code, please consider contributing them by sending a pull request.

Acknowledgements

Many thanks to Eleftherios Bampaletakis for his invaluable input on improving the architecture of the Framework. Also many thanks to ej-technologies GmbH for providing a license for their Java Profiler and to JetBrains for providing a license for their Java IDE.
