w4k2 / stream-learn

License: GPL-3.0
stream-learn is an open-source Python library for difficult data stream analysis.

Programming Languages

Python
139335 projects - #7 most used programming language
TeX
3793 projects

Projects that are alternatives to or similar to stream-learn

Awesome Linux Software
A list of awesome applications, software, tools and other materials for Linux distros.
Stars: ✭ 16,943 (+36732.61%)
Mutual labels:  software
malleable.systems
Website for the malleable systems and software community
Stars: ✭ 49 (+6.52%)
Mutual labels:  software
FRC-Java-Tutorial
A tutorial on how to program a robot for use in the FIRST Robotics Competition
Stars: ✭ 52 (+13.04%)
Mutual labels:  software
Marabu
Music Synthesiser
Stars: ✭ 440 (+856.52%)
Mutual labels:  software
Black-Tool
Install the tools and start attacking, black-tool v5.0! ⬛
Stars: ✭ 239 (+419.57%)
Mutual labels:  software
trener
A simple programming challenge for implementing a train station app
Stars: ✭ 28 (-39.13%)
Mutual labels:  software
Udemy Course Grabber
A script/software for automatically enrolling/joining 100% discounted Udemy courses for free. Get Paid Udemy courses for free with just a few clicks.
Stars: ✭ 230 (+400%)
Mutual labels:  software
RoboVision
Attempting to create a program capable of combining stereo video input with motors and other sensors on a PC running Linux; the target is embedded Linux for use in a robot!
Stars: ✭ 21 (-54.35%)
Mutual labels:  software
jet
A Fast C and Python like Programming Language that puts the Developer first. WIP
Stars: ✭ 41 (-10.87%)
Mutual labels:  software
swGL
A multithreaded software implementation of OpenGL 1.3 in C++.
Stars: ✭ 50 (+8.7%)
Mutual labels:  software
Sudoku-Solver
🎯 This Python-based Sudoku Solver utilizes the PyGame Library and Backtracking Algorithm to visualize and solve Sudoku puzzles efficiently. With its intuitive interface, users can input and interact with the Sudoku board, allowing for a seamless solving experience.
Stars: ✭ 51 (+10.87%)
Mutual labels:  software
ideas-for-projects-people-would-use
Every time I have an idea, I write it down. These are a collection of my top software ideas -- problems I think enough people have that don't have solutions. I expect you can reach a decent userbase if marketed correctly, as I am surely not the only one with these problems.
Stars: ✭ 646 (+1304.35%)
Mutual labels:  software
APC
Arduino Pinball Controller
Stars: ✭ 27 (-41.3%)
Mutual labels:  software
rab
Rusty Armor Builds - Monster Hunter Rise Armor Set Creation Tool
Stars: ✭ 29 (-36.96%)
Mutual labels:  software
awesome-macos-commandline
A curated list of awesome command-line software for macOS.
Stars: ✭ 167 (+263.04%)
Mutual labels:  software
Freelearningresourcesforsoftwaretesters
A New Project to create a set of links to free Online Learning Resources for New and Experienced Software Testers.
Stars: ✭ 247 (+436.96%)
Mutual labels:  software
Gisola
Gisola: A High Performance Computing application for real-time Moment Tensor inversion
Stars: ✭ 35 (-23.91%)
Mutual labels:  software
PingoMeter
PingoMeter is a small portable program that shows your ping in the Windows system tray
Stars: ✭ 91 (+97.83%)
Mutual labels:  software
SoftUni-Software-Engineering
SoftUni - Software Engineering
Stars: ✭ 47 (+2.17%)
Mutual labels:  software
react-native-text-area
Simple and easy to use TextArea for React Native.
Stars: ✭ 20 (-56.52%)
Mutual labels:  software

stream-learn

[Badges: CircleCI | codecov | Documentation Status | PyPI version]

The stream-learn module is a set of tools necessary for processing data streams using scikit-learn estimators. The batch processing approach is used here, where the dataset is passed to the classifier in smaller, consecutive subsets called chunks. The module consists of five sub-modules:

  • streams - containing a data stream generator that produces both stationary and drifting distributions under various types of concept drift (including drift of a priori class probability, i.e. dynamically imbalanced data), as well as a parser for the standard ARFF file format.
  • evaluators - containing classes for running experiments on stream data according to the Test-Then-Train and Prequential methodologies.
  • classifiers - containing sample stream classifiers.
  • ensembles - containing standard ensemble models for data stream classification.
  • metrics - containing typical classification quality metrics for data streams.
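
To get oriented, the sketch below pulls one name from each sub-module. It is illustrative only: the specific classes (e.g. SEA, ARFFParser, AccumulatedSamplesClassifier) are taken from the stream-learn documentation and may differ between versions.

from strlearn.streams import StreamGenerator, ARFFParser       # synthetic and ARFF-based streams
from strlearn.evaluators import TestThenTrain, Prequential     # experiment protocols
from strlearn.classifiers import AccumulatedSamplesClassifier  # a sample stream classifier
from strlearn.ensembles import SEA                             # a classic chunk-based ensemble
from strlearn.metrics import balanced_accuracy_score           # a stream-oriented metric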

You can read more about each sub-module on the documentation page.

Citation policy

If you use stream-learn in a scientific publication, we would appreciate a citation of the following paper:

@article{Ksieniewicz2022,
  doi = {10.1016/j.neucom.2021.10.120},
  url = {https://doi.org/10.1016/j.neucom.2021.10.120},
  year = {2022},
  month = jan,
  publisher = {Elsevier {BV}},
  author = {P. Ksieniewicz and P. Zyblewski},
  title = {stream-learn {\textemdash} open-source Python library for difficult data stream batch analysis},
  journal = {Neurocomputing}
}

Quick start guide

Installation

To use the stream-learn package, it will be absolutely useful to install it. Fortunately, it is available in the PyPI repository, so you may install it using pip:

pip3 install -U stream-learn

stream-learn is also available via conda:

conda install stream-learn -c w4k2 -c conda-forge

You can also install the module from a clone of the GitHub repository, using the setup.py file, if you have a strange, but perhaps legitimate need:

git clone https://github.com/w4k2/stream-learn.git
cd stream-learn
make install
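
Whichever route you choose, a quick smoke test is to import the package (the __version__ attribute is an assumption on our part; most packages expose it):

import strlearn
print(strlearn.__version__)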

Preparing experiments

1. Classifier

In order to conduct experiments, four elements must be declared. The first is the estimator, which must be compatible with the scikit-learn API and, in addition, implement the partial_fit() method, allowing an already built model to be re-fitted. For example, we'll use the standard Gaussian Naive Bayes algorithm:

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
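
Any custom model following the same contract can be used instead. Below is a toy sketch of our own (not part of stream-learn or scikit-learn): a majority-class classifier implementing the scikit-learn API together with partial_fit():

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    # Toy incremental model: always predicts the most frequent class seen so far.
    def partial_fit(self, X, y, classes=None):
        if not hasattr(self, "counts_"):
            self.classes_ = np.array(classes) if classes is not None else np.unique(y)
            self.counts_ = np.zeros(len(self.classes_))
        for i, c in enumerate(self.classes_):
            self.counts_[i] += np.sum(y == c)
        return self

    def fit(self, X, y):
        # Delegate to partial_fit so the model also works in batch mode.
        return self.partial_fit(X, y)

    def predict(self, X):
        return np.full(len(X), self.classes_[np.argmax(self.counts_)])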

2. Data Stream

The next element is the data stream that we aim to process. In this example we will use a synthetic stream consisting of a shocking number of 100 chunks, containing precisely one concept drift. We will prepare it using the StreamGenerator() class of the stream-learn module:

from strlearn.streams import StreamGenerator
stream = StreamGenerator(n_chunks=100, n_drifts=1)
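
StreamGenerator accepts more parameters than the two shown above. As a hedged sketch based on the stream-learn documentation (the values are illustrative), the weights parameter sets class priors, allowing imbalanced streams to be generated (static here; the documentation also describes a dynamic variant):

from strlearn.streams import StreamGenerator

imbalanced_stream = StreamGenerator(
    n_chunks=100,
    chunk_size=250,        # samples per chunk
    n_drifts=1,
    weights=[0.9, 0.1],    # 90/10 class priors -> statically imbalanced stream
    random_state=1410,     # reproducibility
)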

3. Metrics

The third requirement of the experiment is to specify the metrics used in evaluating the methods. In this example, we will use the accuracy_score metric available in scikit-learn and the precision metric from the stream-learn module:

from sklearn.metrics import accuracy_score
from strlearn.metrics import precision
metrics = [accuracy_score, precision]
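
Any function with the (y_true, y_pred) -> float signature works here, so custom metrics can sit alongside the built-in ones. A minimal sketch (error_rate is our own illustrative name, not part of either library):

import numpy as np

def error_rate(y_true, y_pred):
    # Fraction of misclassified samples in the evaluated chunk.
    return np.mean(y_true != y_pred)

custom_metrics = [accuracy_score, precision, error_rate]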

4. Evaluator

The last necessary element of processing is the evaluator, i.e. the method of conducting the experiment. For example, we will choose the Test-Then-Train paradigm, described in more detail in the User Guide. It is important to note that we need to provide the metrics at the point of initializing the evaluator. If no metrics are given, it will use the default pair of accuracy and balanced accuracy scores:

from strlearn.evaluators import TestThenTrain
evaluator = TestThenTrain(metrics)
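
Should you prefer the Prequential methodology mentioned earlier, the evaluator is swapped in the same way. A hedged sketch; the interval argument follows the stream-learn documentation and may differ between versions:

from strlearn.evaluators import Prequential

prequential_evaluator = Prequential(metrics)
# prequential_evaluator.process(stream, clf, interval=100)  # evaluate every 100 samples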

Processing and understanding results

Once all processing requirements have been met, we can proceed with the evaluation. To start processing, call the evaluator's process() method, feeding it the stream and the classifier:

evaluator.process(stream, clf)

The results obtained are stored in the scores attribute of the evaluator. If we print it, we can see that it is a three-dimensional numpy array with dimensions (1, 99, 2).

  • The first dimension is the index of a classifier submitted for processing. In the example above, we used only one model, but it is also possible to pass a tuple or list of classifiers that will be processed in parallel (see the User Guide and the sketch after this list).
  • The second dimension specifies the instance of evaluation, which in the case of the Test-Then-Train methodology directly means the index of the processed chunk; a 100-chunk stream yields 99 evaluation points, since the first chunk is only used for training.
  • The third dimension indicates the metric used in the processing.
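
To illustrate the first dimension, the same experiment can be run with two models at once. A sketch under two assumptions: MLPClassifier is merely an arbitrary second incremental estimator, and a fresh stream is generated because a stream object is consumed during processing:

from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from strlearn.streams import StreamGenerator
from strlearn.evaluators import TestThenTrain

clfs = (GaussianNB(), MLPClassifier(hidden_layer_sizes=(10,)))
stream = StreamGenerator(n_chunks=100, n_drifts=1)
evaluator = TestThenTrain(metrics)
evaluator.process(stream, clfs)
print(evaluator.scores.shape)  # expected: (2, 99, 2)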

Using this knowledge, we may finally try to illustrate the results of our simple experiment in the form of a plot:

import matplotlib.pyplot as plt

plt.figure(figsize=(6,3))

for m, metric in enumerate(metrics):
    plt.plot(evaluator.scores[0, :, m], label=metric.__name__)

plt.title("Basic example of stream processing")
plt.ylim(0, 1)
plt.ylabel('Quality')
plt.xlabel('Chunk')

plt.legend()
plt.show()  # display the figure when running as a script
