All Projects → bhrnjica → daany

bhrnjica / daany

Licence: MIT license
Daany - .NET DAta ANalYtics .NET library with the implementation of DataFrame, Time series decompositions and Linear Algebra routines BLASS and LAPACK.

Programming Languages

C#
18002 projects
powershell
5483 projects

Projects that are alternatives of or similar to daany

dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+2.04%)
Mutual labels:  data-frame, series, dataframe
Datatable
A go in-memory table
Stars: ✭ 215 (+338.78%)
Mutual labels:  series, dataframe
Tablesaw
Java dataframe and visualization library
Stars: ✭ 2,785 (+5583.67%)
Mutual labels:  data-frame, dataframe
pssa
Singular Spectrum Analysis for time series forecasting in Python
Stars: ✭ 119 (+142.86%)
Mutual labels:  ssa, series
Qframe
Immutable data frame for Go
Stars: ✭ 282 (+475.51%)
Mutual labels:  data-frame, dataframe
DataFrame
DataFrame Library for Java
Stars: ✭ 51 (+4.08%)
Mutual labels:  data-frame, dataframe
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-59.18%)
Mutual labels:  data-frame, dataframe
Dataframe Js
A javascript library providing a new data structure for datascientists and developpers
Stars: ✭ 376 (+667.35%)
Mutual labels:  data-frame, dataframe
Styleframe
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
Stars: ✭ 252 (+414.29%)
Mutual labels:  data-frame, dataframe
tablexplore
Table analysis and plotting application written in PySide2/PyQt5
Stars: ✭ 89 (+81.63%)
Mutual labels:  dataframe
ST-ASS
ASS/SSA subtitles syntax highlight for Sublime Text.
Stars: ✭ 23 (-53.06%)
Mutual labels:  ssa
server-benchmarks
🚀 Cross-platform transparent benchmarks for HTTP/2 Web Servers at 2020-2023
Stars: ✭ 78 (+59.18%)
Mutual labels:  iris
demo-chatroom
go+iris+jwt+mysql+xorm+viper,iris项目实战简易聊天室,登录、注册、私聊、群聊。
Stars: ✭ 47 (-4.08%)
Mutual labels:  iris
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-42.86%)
Mutual labels:  dataframe
SABRE.js
Substation Alpha suBtitles REnderer -- A Gpu Accelerated Javascript Advanced SubStation (ASS) Alpha Subtitles Renderer. Renders .ass and .ssa files.
Stars: ✭ 58 (+18.37%)
Mutual labels:  ssa
php-serializer
Serialize PHP variables, including objects, in any format. Support to unserialize it too.
Stars: ✭ 47 (-4.08%)
Mutual labels:  data-transformation
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-30.61%)
Mutual labels:  data-transformation
clojure-dsl-resources
A curated list of Clojure resources for dealing with domain-specific languages.
Stars: ✭ 99 (+102.04%)
Mutual labels:  data-transformation
bamboolib binder template
bamboolib - template for creating your own binder notebook
Stars: ✭ 19 (-61.22%)
Mutual labels:  data-transformation
scalarr
A command-line client for Sonarr, written in Scala
Stars: ✭ 33 (-32.65%)
Mutual labels:  series

Daany Logo

Daany - .NET DAta ANalYtics .NET library with the implementation of DataFrame, Time series decompositions and Linear Algebra LAPACK and BLASS routines.

License NuGet Status

Daany Developer Guide - complete guide for developers.

Software Requirements

The latest version of the library is built on .NET Standard2.1 and supports .NET 3 and above.

Unfortunately the library is not available for .NET Framework 4.8 and below. In case you want to use it on .NET Framework and Standard 2.0, use older version v0.3.*, or try to build the version from the source code.

The main components with separate NuGet package of Daany library are:

  • Daany.DataFrame - data frame implementation in pure C#.
  • Daany.DataFrame.Ext - data frame extensions for additional implementation about plotting, data scaling and encoding and similar.
  • Daany.Stat - time series decompositions e.g. SSA, STL, ....
  • Daany.LinA - .NET wrapper of the Intel MKL LAPACK and BLASS routines.

Data Frame (Daany.DataFrame) and extensions (Daany.DataFrame.Ext)

Daany.DataFrame implementation follows the .NET coding paradigm rather than Pandas look and feel. The DataFrame implementation tries to fill the gap in ML.NET data preparation phase, and it can be easily passed to ML.NET pipeline. The DataFrame does not require any class type implementation prior to data loading and transformation.

Once the DataFrame completes the data transformation, the extension methods provide the easy way to pass the data into MLContex object.

The following example shows Daany.DataFrame in action:

Data Loading

We are going to use iris data file, which can be found on many places on the internet. The basic structure of the file is that it contains 5 tab separated columns: sepal_length, sepal_width, petal_length, petal_width, and species. The Daany.DataFrame class has predefined static methods to load data from txt or csv file. The following code loads the data and create DataFrame object:

//read the iris data and create DataFrame object. 
var df = DataFrame.FromCsv(orgdataPath,sep:'\t');

Now that we have data frame, we can perform one of many supported data transformations. For this example we are going to create two new calculated columns:

//calculate two new columns into dataset
df.AddCalculatedColumns(new string[] { "SepalArea", "PetalArea" }, 
        (r, i) =>
        {
            var aRow = new object[2];
            aRow[0]=Convert.ToSingle(r["sepal_width"]) * Convert.ToSingle(r["sepal_length"]);
            aRow[1] = Convert.ToSingle(r["petal_width"]) * Convert.ToSingle(r["petal_length"]);
            return aRow;
        });

Now the df object has two new columns:SepalArea and PetalArea.

As the next step we are going to create a new Data Frame containing only three columns: SepalArea, PetalArea and Species:

//create new data-frame by selecting only three columns
var derivedDF = df["SepalArea","PetalArea","species"];

For this purpose, we may use Create method by passing tuples of the old and new column name. In our case, we simply use indexer with column names to get a new Data Frame.

Building a model by using ML.NET

We transformed the data and created final data frame, which will be passed to the ML.NET. Since the data is already in the memory, we should use mlContext.Data.LoadFromEnumerable ML.NET method. Here we need to provide the type for the loaded data.

So let's create the Iris class with only three properties since we want to use only two columns as the features and one as the label.

class Iris
{
    public float PetalArea { get; set; }
    public float SepalArea { get; set; }
    public string Species { get; set; }
}

Once we have the class type implemented we can load the data frame into ML.NET:

//Load Data Frame into Ml.NET data pipeline
IDataView dataView = mlContext.Data.LoadFromEnumerable<Iris>(derivedDF.GetEnumerator<Iris>((oRow) =>
 {
     //convert row object array into Iris row               
     var prRow = new Iris();
     prRow.SepalArea = Convert.ToSingle(oRow["SepalArea"]);
     prRow.PetalArea = Convert.ToSingle(oRow["PetalArea"]);
     prRow.Species = Convert.ToString(oRow["species"]);
     //
     return prRow;
 }));

The whole data has been loaded into the ML.NET pipeline, so we have to split the data into Train and Test set:

//Split dataset in two parts: TrainingDataset (80%) and TestDataset (20%)
var trainTestData = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.1);
var trainData = trainTestData.TrainSet;
var testData = trainTestData.TestSet;

Create the pipeline to prepare the train data for machine learning:

//prepare data for ML
//one encoding output category column by defining KeyValues for each category
var dataPipeline = mlContext.Transforms.Conversion.MapValueToKey(
                            outputColumnName: "Label", 
                            inputColumnName: nameof(Iris.Species))
                
//define features columns
.Append(mlContext.Transforms.Concatenate("Features",nameof(Iris.SepalArea), nameof(Iris.PetalArea)));

Use data pipeline and trainSet to train and build the model.

//train and build the model

//create Trainer
var lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm();

//train the ML model
var model = transformationPipeline.Append(lightGbm).Fit(preparedData);

Model Evaluation

Once we have trained model, we can evaluate how it predicts the Iris flower from the testSet:

//evaluate test set
var testPrediction = model.Transform(testData);
var metricsTest = mlContext.MulticlassClassification.Evaluate(testPrediction);
ConsoleHelper.PrintMultiClassClassificationMetrics("TEST Iris Dataset", metricsTest);
ConsoleHelper.ConsoleWriteHeader("Test Iris DataSet Confusion Matrix ");
ConsoleHelper.ConsolePrintConfusionMatrix(metricsTest.ConfusionMatrix);

Once the program is run the output shows that we have 100% accurate prediction of Iris model using testSet: Iris Model Evaluation

Daany Statistics (Daany.Stat)

Besides the Daany.DataFrame the library contains set of implementation with working on time series data. The following list contains some of them:

  • Conversion time series into Daany.DataFrame and Series
  • Seasonal and Trend decomposition using Loess -STL time series decomposition,
  • Singular Spectrum Analysis SSA time series decomposition,
  • Set of Time Series operations like moving average, etc....

Singular Spectrum Analysis, SSA

With SSA, you can decompose the time series into any number of components (signals). The following code loads the famous AirPassengerstime series data:

var strPath = $"{root}/AirPassengers.csv";
var mlDF = DataFrame.FromCsv(strPath, sep: ',');
var ts = mlDF["#Passengers"].Select(f => Convert.ToDouble(f));//create time series from data frame

Now that we have AirPasanger time series object ts, we can create SSA object by passing thets into it:

//create Singular Spectrum Analysis object
var ssa = new SSA(ts);

//perform analysis
ssa.Fit(36);

So we created the ssa object by passing the number of components that we are going to create. Once the ssa object has been created we can call the Fit method to start with time series SSA analysis.

Once we have analyzed the time series, we can plot its components. The following plot shows the first 4 components:

Iris Model Evaluation

The following plot shows how previous 4 components approximate the actual AirPassengers data:

Iris Model Evaluation

At the end we can plot ssa predicted and actual values of the time series:

Iris Model Evaluation

Daany Linear Algebra (Daany.LinA)

The Daany.LinA provides the ability to use Intel MKL a native and super fast math library to perform linear algebra calculations. With the combination of the previous packages (DataFrame and Daany.Stat) you are able to transform and analyze very complex data, solve system of linear equations, find eigen values and vectors, use least square method etc.

For more information how to use any of the implemented methods please see the Daany Developer Guide, test application implemented in the library or you can use unit test methods which cover almost all implementation in the library.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].