All Projects → nRo → DataFrame

nRo / DataFrame

Licence: MIT license
DataFrame Library for Java

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to DataFrame

dflib
In-memory Java DataFrame library
Stars: ✭ 50 (-1.96%)
Mutual labels:  data-frame, dataframe, dataframe-library
Nghandsontable
Official AngularJS directive for Handsontable
Stars: ✭ 438 (+758.82%)
Mutual labels:  table, data-table
Arquero
Query processing and transformation of array-backed data tables.
Stars: ✭ 384 (+652.94%)
Mutual labels:  table, dataframe
Hot Table
Handsontable - Best Data Grid Web Component with Spreadsheet Look and Feel.
Stars: ✭ 114 (+123.53%)
Mutual labels:  table, data-table
polars
Fast multi-threaded DataFrame library in Rust | Python | Node.js
Stars: ✭ 6,368 (+12386.27%)
Mutual labels:  dataframe, dataframe-library
Datatablesbundle
This Bundle integrates the jQuery DataTables plugin into your Symfony application.
Stars: ✭ 334 (+554.9%)
Mutual labels:  table, data-table
Reactabular
A framework for building the React table you need (MIT)
Stars: ✭ 903 (+1670.59%)
Mutual labels:  table, data-table
tablex
virtualized and extendable tables and datagrids for React
Stars: ✭ 39 (-23.53%)
Mutual labels:  table, data-table
Angular Handsontable
Angular Data Grid with Spreadsheet Look & Feel. Official Angular wrapper for Handsontable.
Stars: ✭ 175 (+243.14%)
Mutual labels:  table, data-table
react-bolivianite-grid
React grid component for virtualized rendering large tabular data.
Stars: ✭ 95 (+86.27%)
Mutual labels:  table, data-table
daany
Daany - .NET DAta ANalYtics .NET library with the implementation of DataFrame, Time series decompositions and Linear Algebra routines BLASS and LAPACK.
Stars: ✭ 49 (-3.92%)
Mutual labels:  data-frame, dataframe
Pandasvault
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (+519.61%)
Mutual labels:  table, dataframe
heidi
heidi : tidy data in Haskell
Stars: ✭ 24 (-52.94%)
Mutual labels:  dataframe, dataframe-library
Android Extensions
An Android library with modules to quickly bootstrap an Android application.
Stars: ✭ 356 (+598.04%)
Mutual labels:  table, data-table
Reactgrid
Add spreadsheet-like behavior to your React app
Stars: ✭ 289 (+466.67%)
Mutual labels:  table, data-table
React Handsontable
React Data Grid with Spreadsheet Look & Feel. Official React wrapper for Handsontable.
Stars: ✭ 511 (+901.96%)
Mutual labels:  table, data-table
Tablesaw
Java dataframe and visualization library
Stars: ✭ 2,785 (+5360.78%)
Mutual labels:  data-frame, dataframe
Styleframe
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
Stars: ✭ 252 (+394.12%)
Mutual labels:  data-frame, dataframe
Window Table
Windowing Table for React based on React Window
Stars: ✭ 160 (+213.73%)
Mutual labels:  table, data-table
bow
Go data analysis / manipulation library built on top of Apache Arrow
Stars: ✭ 20 (-60.78%)
Mutual labels:  data-frame, dataframe

Java DataFrame

An easy-to-use DataFrame Library for Java.

travis codecov Codacy Badge

Documentation

Javadocs

Install

Maven Central

Add this to you pom.xml

<dependencies>
...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe</artifactId>
        <version>0.7.6</version>
    </dependency>
...
</dependencies>

Build

To build the library from sources:

  1. Clone github repository

    $ git clone https://github.com/nRo/DataFrame.git

  2. Change to the created folder and run mvn install

    $ cd DataFrame

    $ mvn install

  3. Include it by adding the following to your project's pom.xml:

<dependencies>
...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe</artifactId>
        <version>0.7.6-SNAPSHOT</version>
    </dependency>
...
</dependencies>

Version 0.7.5

  • direct value access for DataRow object.

    DataRows now directly access the respective values from the columns.
    This improves runtime and memory footprint for most DataFrame operations. DataRow objects are invalidated once the source DataFrame is changed.
    Accessing an invalidated row results in an exception

  • Row collections are now return as DataRows object.
    DataRows can be converted to a new DataFrame

  • improved 'groupBy' method

Version 0.7

  • The read and write functions have been rewritten from scratch for this version. Some existing methods have been removed.

  • Data grouping has been refactored and aggregation functions can now be applied to data groupings. In general, data groupings can now be used like normal DataFrames.

  • Java 8 is now required.

  • Empty DataFrame instances are now created using DataFrame.create()

Examples

Select all users called Meier or Schmitt from Germany, group by age and add column that contains the number of users with the respective age. Then sort by age and print

URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/users.csv");

DataFrame users = DataFrame.load(csvUrl, FileFormat.CSV);

users.select("(name == 'Schmitt' || name == 'Meier') && country == 'Germany'")
        .groupBy("age").agg("count",Aggregate.count())
        .sort("age")
        .print();

/*
    age count
    20   1
    24   2
    30   2
 */

Load a csv file, set a unique column as primary key and add an index for two other columns. Select rows using the previously created index, change the values in their NAME column and join them with the original DataFrame.

URL csvUrl = new URL("https://raw.githubusercontent.com/nRo/DataFrame/master/src/test/resources/data_index.csv");

DataFrame dataFrame = DataFrame.load(csvUrl, FileFormat.CSV);

dataFrame.setPrimaryKey("UID");
dataFrame.addIndex("id_name_idx","ID","NAME");

DataRow row = dataFrame.selectByPrimaryKey(1);
System.out.println(row);
//1;A;1

DataFrame idxExample = dataFrame.selectByIndex("id_name_idx",3,"A");

idxExample.print();
/*
    ID	NAME	UID
    3	A	4
    3	A	8
 */
idxExample.getStringColumn("NAME").map((value -> value + "_idx_example"));
idxExample.print();
/*
    ID	NAME	UID
    3	A_idx_example	4
    3	A_idx_example	8
 */

dataFrame.joinInner(idxExample,"UID").print();
/*
    ID.A    NAME.A	UID	ID.B	NAME.B
    3   A   4	3   A_idx_example
    3   A   8	3   A_idx_example
 */

Usage

Load DataFrame from a CSV file. Column types will be detected automatically. (String, Double, Integer, Boolean)

File file = new File("person.csv");
DataFrame users = DataFrame.fromCSV(file, ';', true);

Load a DataFrame with custom options and predefined column types.

File file = new File("person.csv");

CSVReader csvReader = CSVReaderBuilder.create()
                .containsHeader(true)
                .withHeaderPrefix("#")
                .withSeparator(';')
                .setColumnType("person_id", Integer.class)
                .setColumnType("first_name", String.class)
                .setColumnType("last_name", String.class)
                .setColumnType("age", Integer.class).build();

DataFrame users = DataFrame.load(file,csvReader);
        
System.out.println(users.getHeader());
for(DataRow row : users)
{
    System.out.println(row);
}

DataFrames can be written using default formats (CSV or TSV). Additionally it is possible to set different options when writing DataFrames.

dataFrame.write(file); // TSV per default

dataFrame.write(file, FileFormat.CSV);

dataFrame.writeCSV(file, ';',true); // use ';' as separator and include the header


CSVWriter csvWriter = CSVWriterBuilder.create()
                .withHeader(true)
                .withSeparator('\t')
                .useGzip(true).build();
dataFrame.write(file, csvWriter);

If a meta file is written for a DataFrame, it can simply be loaded by pointing at the DataFrame file. The meta has the same path as the DataFrame file with '.dfm' extension

File file = new File("dataFrame.csv");
dataFrame.write(file);
DataFrame loadedDataFrame = DataFrame.load(file);

Values within a DataFrame are accessed using DataRow objects. If the source DataFrame changes after a DataRow object is created, the DataRow is invalidated and can no longer be accessed.

for(DataRow row : dataFrame){
    ... = row.getInteger("id");
}


DataRows rows = dataFrame.getRows();

//returns the value within the id column in the first row
rows.get(0).getInteger("id");

dataFrame.sort("name");

//The DataFrame was sorted after the DataRows were obtained.
//The first row can now differ. 
//To avoid these effects, a RuntimeException is thrown
//if a row that was created before the DataFrame is altered is accessed

rows.get(0).getInteger("id"); //throws exception

rows = dataFrame.getRows();

//rows is now valid again and rows can be accessed
rows.get(0).getInteger("id");


//DataRows can be converted to a new independent DataFrame.
//changes to the original DataFrame have no effect on the new DataFrame.
DataFrame dataFrame2 = rows.toDataFrame();

dataFrame.sort("id");

dataFrame.getRow(0).getInteger("id"); // no exception

DataRows can be used to change values within a DataFrame

DataRows rows = dataFrame.getRows();

//sets the value in the second row in the name column to 'A'
rows.get(1).set("name","A");

//sets the value in the second row in the first column to 'A'
rows.get(1).set(0,"A");

Use indices for fast row access.

//set the primary key of a data frame
users.setPrimaryKey("person_id");
DataRow firstUser = users.selectByPrimaryKey(1)

//add a multi-column index

users.addIndex("name-address","last_name","address");

//returns rows containing all users with the last name Smith in the Example-Street 15
DataRows user = users.selectRowsByIndex("name-address","Smith","Example-Street 15")

It is possible to define and use other index types. The following example shows interval indices. This index type requires two number columns, start and end. The index can then be used to find rows where start and end value overlap with a region specified by two number values. It is also possible to find rows where the region defined by start and end contains a certain value.

 DataFrame dataFrame = DataFrame.create()
                .addStringColumn("name")
                .addIntegerColumn("start")
                .addIntegerColumn("end");
dataFrame.append("A",1,3);
dataFrame.append("B",2,3);
dataFrame.append("C",4,5);
dataFrame.append("D",6,7);
IntervalIndex index = new IntervalIndex("idx",
    dataFrame.getNumberColumn("start"),
    dataFrame.getNumberColumn("end"));
dataFrame.addIndex(index);

//returns a new dataframe containing all rows where (start,end) overlaps with (1,3)
// -> A, B
DataFrame df = dataFrame.selectByIndex("idx",1,3);

//rows where (start,end) overlaps with (4,5)
// -> C
dataFrame.selectByIndex("idx",4,5);

//rows where (start,end) contains 2.5
// -> A, B
dataFrame.selectByIndex("idx",2.5);

Perform operations on columns.

//max value of column "person_id"
users.getIntegerColumn("age").max();

DoubleColumn dc1 = ...;
DoubleColumn dc2 = ...;
//add one column to another
dc1.add(dc2);

//multiply each value with 2
dc1.multiply(2);

//Use MapFunction to convert all values in a row
dataFrame.getIntegerColumn("age").map(value -> value + 2);

Filter and select rows using predicates.

//keep users with age between 18 and 60
users.filter(FilterPredicate.btwn("age",18,60));

//find all users with age > 18 an first_name == "Max"
DataFrame foundUsers = users.select(
      FilterPredicate.and(
              FilterPredicate.gt("age",18),
              FilterPredicate.eq("first_name","Max")
));

Create and compile predicates from strings
Available value comparison operations:
==, !=, <, <=, >, >=, ~= (regex)
Available predicate operations:
&&, ||, NOR, XOR, !(predicate) (negates the predicate)

//find all users that are younger than 18 or older users with first_name == "Max"
DataFrame foundUsers = users.select("(age > 18 && first_name == 'Max') OR (age < 18)");

//Boolean column filter
//find all users that are older than 18 or the selected column is true
DataFrame foundUsers = users.select("(age > 18) OR selected");

//find all users the selected column is false
DataFrame notSelected = users.select("!selected");

//compare tow columns
//returns all rows where col1 equals col2
FilterPredicate.eqColumn("col1","col2")

//Get all users where first name does not equal the last name
//Column comparisons require '.' as prefix
DataFrame dataframe = users.select(".first_name != .last_name");


// regex filter
//find all users where the street begins with A, B or C followed by lowercase characters
DataFrame foundUsers = users.select("street ~= /[ABC][a-z]+/");

Sort rows by one or more columns.

//sort by column "person_id" (ascending)
users.sort("person_id", SortColumn.Direction.Ascending);

//sort by "last_name" und "first_name"
users.sort(
   new SortColumn("last_name", SortColumn.Direction.Descending),
   new SortColumn("first_name", SortColumn.Direction.Descending)
);

Group dataframes using one or more columns.

//group by "age" and "first_name"
DataGrouping grouping = users.groupBy("age","first_name");


//iterate through all found groups
for(DataRow row : grouping){
    DataGroup group = grouping.getGroup(row.getIndex());
    //print the group description (group values)
    System.out.println(group.getGroupDescription());
   
    //iterate through all rows from the respective groups
    for(DataRow groupRow : group){
            System.out.println(row);
     }
}

Direct access to groups in a grouping.

//group by "age" and "first_name"
DataGrouping grouping = users.groupBy("age","first_name");

//Get all users that are called John and are 18 years old
DataGroup group = grouping.findByGroupValues(18, "John");

It is possible to apply aggregation function to data groups. In this example, a column "max_age" is added to the grouping DataFrame. This column contains the maximum value of the "age" column of the respective rows in the original DataFrame. The resulting grouping DataFrame contains two columns: "first_name" and "max_age"

DataGrouping grouping = users.groupBy("first_name").agg("max_age", Aggregate.max("age"));

//some other aggregate functions
grouping.agg("mean_age", Aggregate.mean("age"));

grouping.agg("median_age", Aggregate.median("age"));

grouping.agg("25quantile_age", Aggregate.quantile("age", 0.25));

grouping.agg("older_30_count", Aggregate.filterCount("age > 30"));

//custom aggregate function
grouping.agg("org_percentage", group -> group.size() / users.size());

Join two dataframes using one or more columns.

//join DataFrames users and visits by columns users.person_id == visits.person_id
DataFrame visitors = users.joinLeft(visits,"person_id");

//join DataFrames users and orders by columns users.person_id == orders.customer_id
DataFrame userOrders = users.joinInner(orders,new JoinColumn("person_id","customer_id"));
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].