mratsim / Arch-Data-Science

Licence: other
Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision

Programming Languages

shell

Projects that are alternatives of or similar to Arch-Data-Science

Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+2408.7%)
Mutual labels:  scikit-learn, pandas, xgboost, lightgbm
ai-deployment
Focuses on taking AI models to production and on model deployment
Stars: ✭ 149 (+61.96%)
Mutual labels:  mxnet, scikit-learn, xgboost, lightgbm
datascienv
datascienv is a package that helps you set up your environment in a single line of code with all dependencies; it also includes pyforest, which provides a single-line import of all required ML libraries
Stars: ✭ 53 (-42.39%)
Mutual labels:  scikit-learn, pandas, xgboost, lightgbm
mloperator
Machine Learning Operator & Controller for Kubernetes
Stars: ✭ 85 (-7.61%)
Mutual labels:  mxnet, scikit-learn, xgboost
Openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (+482.61%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (+604.35%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Auto ml
[UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (+1594.57%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Nyoka
Nyoka is a Python library to export ML/DL models into PMML (PMML 4.4.1 Standard).
Stars: ✭ 127 (+38.04%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+944.57%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+1930.43%)
Mutual labels:  scikit-learn, pandas, spacy
Eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Stars: ✭ 2,477 (+2592.39%)
Mutual labels:  scikit-learn, xgboost, lightgbm
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+155.43%)
Mutual labels:  scikit-learn, pandas, lightgbm
Adam qas
ADAM - A Question Answering System. Inspired from IBM Watson
Stars: ✭ 330 (+258.7%)
Mutual labels:  scikit-learn, pandas, spacy
Lambda Packs
Precompiled packages for AWS Lambda
Stars: ✭ 997 (+983.7%)
Mutual labels:  pandas, spacy, lightgbm
Machine Learning Alpine
Alpine Container for Machine Learning
Stars: ✭ 30 (-67.39%)
Mutual labels:  scikit-learn, pandas, xgboost
M2cgen
Transform ML models into a native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
Stars: ✭ 1,962 (+2032.61%)
Mutual labels:  scikit-learn, xgboost, lightgbm
AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-44.57%)
Mutual labels:  scikit-learn, xgboost, lightgbm
monolish
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
Stars: ✭ 166 (+80.43%)
Mutual labels:  cuda, mkl
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+59.78%)
Mutual labels:  scikit-learn, spacy
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (-64.13%)
Mutual labels:  scikit-learn, pandas

Data Science packages for Archlinux

Welcome to my repo for building Data Science, Machine Learning, Computer Vision, Natural Language Processing and Deep Learning packages from source.

Performance considerations

My aim is to squeeze the maximum performance out of my current configuration (Skylake-X i9-9980XE + 2x RTX 2080Ti), so:

  • All packages are built with -O3 -march=native if the package ignores the /etc/makepkg.conf config.
  • I do not use fast-math unless it is the upstream default (for example, OpenCV). You might want to enable it for GCC and NVCC (the Nvidia compiler).
  • All CUDA packages are built with CUDA 10.1, cuDNN 7.6 and compute capability 7.5 (Turing).
  • PyTorch is built
    • with MAGMA support. MAGMA is a linear algebra library for heterogeneous computing (CPU + GPU hybridization)
    • with MKLDNN support. MKLDNN is an optimized x86 backend for deep learning.
  • BLAS library is MKL except for Tensorflow (Eigen).
  • Parallel library is Intel OpenMP except for Tensorflow (Eigen), PyTorch (because linking is buggy) and OpenCV (Intel TBB, because linking is buggy as well)
  • OpenCV is further optimized with Intel IPP (Integrated Performance Primitives)
  • Nvidia libraries (CuBLAS, CuFFT, CuSPARSE ...) are used wherever possible
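
As a rough illustration of the flags mentioned above, the settings could look like the following in /etc/makepkg.conf, which makepkg reads as a shell file. This is a minimal sketch, not the exact configuration used for these packages: the `-pipe` flag, the `MAKEFLAGS` value and the use of the `TORCH_CUDA_ARCH_LIST` environment variable (read by the PyTorch build) are assumptions added for illustration.

```shell
# /etc/makepkg.conf (excerpt), illustrative values only.

# Optimize for the build machine's own CPU at -O3.
CFLAGS="-march=native -O3 -pipe"
CXXFLAGS="${CFLAGS}"

# Assumption: use every available core for compilation.
MAKEFLAGS="-j$(nproc)"

# Assumption: restrict PyTorch CUDA kernels to compute capability 7.5 (Turing).
# This would normally be exported in the PKGBUILD or the build shell,
# not in makepkg.conf itself.
export TORCH_CUDA_ARCH_LIST="7.5"
```

Binaries built with -march=native may not run on a CPU older than the build machine, so such packages are best kept to the machine that built them.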

When running in an LXC container, Bazel (required to build Tensorflow) must be built with its auto-sandboxing disabled.

Caveats

Please note that the current mxnet and lightgbm packages work but need improvement: they put their libraries in /usr/mxnet and /usr/lightgbm.

Packages included are those not available by default in the Archlinux AUR or that needed substantial modifications. So check the Archlinux AUR for standard packages like Numpy or Pandas.

Suggestions

Beyond the packages provided here, these tools are useful:

  • CSV manipulation from command-line
    • xsv - The fastest, multi-processing CSV library. Written in Rust.
  • Geographical data (combine them with a clustering algorithm)
    • Geopy
    • Shapely
  • GPU computation
  • Monitoring
    • htop - Monitor CPU, RAM, load, kill programs
    • nvtop - Monitor Nvidia GPU
    • nvidia-smi - Monitor Nvidia GPU (included with nvidia driver)
      1. nvidia-smi -q -g 0 -d TEMPERATURE,POWER,CLOCK,MEMORY -l #Flags can be UTILIZATION, PERFORMANCE (on Tesla) ...
      2. nvidia-smi dmon
      3. nvidia-smi -l 1
  • Rapid prototyping, Research
    • Jupyter - Code Python, R, Haskell, Julia with direct feedback in your browser
    • jupyter_contrib_nbextensions - Extensions for jupyter (commenting code, ...)
  • Text
    • gensim - word2vec
  • Time data
    • Workalendar - Business calendar for multiple countries
  • Video
    • Vapoursynth - Frameserver for video pre-processing
  • Visualization
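
To keep a record of GPU readings during a long training run, the nvidia-smi monitoring commands listed under Monitoring could be wrapped in a small logging helper. The sketch below is only an illustration: the function name `log_gpu_stats`, its defaults and the choice of queried fields are hypothetical, while `--query-gpu`, `--format=csv` and `-l` are standard nvidia-smi options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append GPU readings to a CSV file at a fixed interval.
# Usage: log_gpu_stats [interval_seconds] [output_file]
log_gpu_stats() {
    local interval="${1:-5}"             # seconds between samples
    local outfile="${2:-gpu_stats.csv}"  # file the CSV rows are appended to

    # Bail out early when the Nvidia driver tools are not installed.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the nvidia driver installed?" >&2
        return 1
    fi

    # Loops until interrupted (Ctrl-C), writing one row per GPU per sample.
    nvidia-smi \
        --query-gpu=timestamp,index,temperature.gpu,power.draw,utilization.gpu,memory.used \
        --format=csv \
        -l "$interval" >> "$outfile"
}
```

For example, `log_gpu_stats 10 run1_gpu.csv &` would sample every 10 seconds in the background; the resulting file can then be inspected with a CSV tool such as xsv.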