DataHaskell / dh-core

Licence: other
Functional data science

DataHaskell core project monorepo

Aims

This project aims to provide a native, end-to-end data science toolkit in Haskell. Many kinds of experience are valuable for this: engineers, scientists, programmers, visualization experts and data journalists are all welcome to join the discussions and contribute. Not only should this be a working piece of software, it should also be intuitive and pleasant to use. All contributions, big or small, are very welcome and will be acknowledged.

Architecture

A single repository lets us experiment with interfaces and move code around much more freely than many single-purpose repositories would. It also makes it more convenient to track and visualize progress.

This is the directory structure of the project; the main project lives in the dh-core subdirectory:

dh-core/
  dh-core/              
  dh-core-accelerate/
  ....

Contributed packages

A number of authors and maintainers agreed to move ownership of their repositories under the dh-core umbrella. In some cases, these packages were already published on Hackage and cannot simply disappear from there, nor can this new line of development break downstream packages.

For this reason, contributed packages will appear as subdirectories of the main dh-core project, and will need to retain their original .cabal file.

The stack tool supports multi-package projects. By default, the packages stanza in the stack.yaml file contains only the project's own directory, but it can list paths to other Cabal projects; in our case it could look like:

packages:
- .
- analyze/
- datasets/

Packages that are already listed on Hackage must be added here as distinct subdirectories. Once the migration is complete (PRs merged etc.), add the project to this table:

Package               Description                                   Original author(s)                  First version after merge
--------------------  --------------------------------------------  ----------------------------------  -------------------------
analyze               Data analysis and manipulation library        Eric Conlon                         0.2.0
datasets              A collection of ready-to-use datasets         Tom Nielsen                         0.2.6
dense-linear-algebra  Fast, native dense linear algebra primitives  Bryan O'Sullivan, Alexey Khudyakov  0.1.0 (a)

(a) : To be updated

NB: Remember to bump version numbers and change web links accordingly when moving in contributed packages.

Contributing

  1. Open an issue (https://github.com/DataHaskell/dh-core/issues) describing what you want to work on, if one is not already open
  2. Assign yourself to the issue, or add yourself to its contributors
  3. Pull from dh-core:master, start a git branch, add code
  4. Add tests
  5. Update the changelog, briefly describing your changes and their possible effects
     • If you're working on a contributed package (see the Contributed packages section above), increase the version number in its Cabal file accordingly
     • If you bumped version numbers, make sure they are updated accordingly in the Travis CI .yaml file
  6. Send a pull request with your branch, referencing the issue
  7. dh-core admins: merge only after another admin has reviewed and approved the PR

GHC and Stackage compatibility

Tested against:

  • Stackage nightly-2019-02-27 (GHC 8.6.3)

Development information and guidelines

Dependencies

We use the stack build tool.

Some systems might need binaries and headers for these additional libraries:

  • zlib
  • curl

(however if you're unsure, first try building with your current configuration).

Nix users should set nix.enable to true (that is, enable: true under the nix key) in the dh-core/dh-core/stack.yaml file.

Building instructions

In the dh-core/dh-core subdirectory, run

$ stack build

This rebuilds the main project and the contributed packages.

While developing, this stack command can come in handy: it triggers a rebuild and runs the tests every time a file in the project is modified:

$ stack build --test --ghc-options -Wall --file-watch

Testing

Example:

$ stack test core:doctest core:spec

The <project>:<test_suite> pairs determine which tests will be run.
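
For contributors adding a new suite, the following is a minimal sketch of what a test-suite executable can look like. It assumes a Cabal test suite of type exitcode-stdio-1.0, where the process exit code tells stack whether the suite passed. It uses only base; the mean function and the check names are hypothetical and are not taken from dh-core, whose own spec and doctest suites may well be built on dedicated test frameworks instead.

```haskell
-- Hypothetical, base-only test suite; not code from dh-core.
module Main (main) where

import Control.Monad (unless)
import System.Exit (exitFailure)

-- | Hypothetical function under test: arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Each check pairs a human-readable label with its boolean outcome.
checks :: [(String, Bool)]
checks =
  [ ("mean of a singleton is the element", mean [3] == 3)
  , ("mean of [1,2,3] is 2",               mean [1, 2, 3] == 2)
  ]

main :: IO ()
main = do
  let failures = [name | (name, ok) <- checks, not ok]
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  -- A non-zero exit code is how the build tool learns that the suite failed.
  unless (null failures) exitFailure
  putStrLn "all checks passed"
```

If such a file were registered as a test-suite stanza named, say, sanity in a package's Cabal file, it could be selected with the same <project>:<test_suite> syntax as above; the stanza name is likewise only an illustration.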

Continuous Integration (TravisCI)

Travis builds dh-core and its hosted projects every time a commit is pushed to GitHub. Currently the dh-core/.travis.yml script uses the following command to install the GHC compiler, build the project and subprojects with stack, run the tests and build the Haddock HTML documentation:

- stack $ARGS --no-terminal --install-ghc test core:spec core:doctest dense-linear-algebra:spec --haddock

Visualizing the dependency tree of a package

stack can produce a .dot file with the dependency graph of a Haskell project, which can then be rendered by the dot tool (from the graphviz suite). For example, the following command pipes the output of stack dot into dot, which produces an SVG file called deps.svg:

stack dot --external --no-include-base --prune rts,ghc-prim,ghc-boot-th,template-haskell,transformers,containers,deepseq,bytestring,time,primitive,vector,text,hashable | dot -Tsvg > deps.svg
