Licence: bsd-2-clause
Intel® Scalable Dataframe Compiler for Pandas*

**********************************
Intel® Scalable Dataframe Compiler
**********************************


.. image:: https://travis-ci.com/IntelPython/sdc.svg?branch=master
    :target: https://travis-ci.com/IntelPython/sdc
    :alt: Travis CI

.. image:: https://dev.azure.com/IntelPython/HPAT/_apis/build/status/IntelPython.sdc?branchName=master
    :target: https://dev.azure.com/IntelPython/HPAT/_build/latest?definitionId=2&branchName=master
    :alt: Azure Pipelines

.. _Numba*: https://numba.pydata.org/
.. _Pandas*: https://pandas.pydata.org/
.. _Sphinx*: https://www.sphinx-doc.org/

Numba* Extension For Pandas* Operations Compilation
###################################################

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of `Numba*`_ that enables compilation of `Pandas*`_ operations. It automatically vectorizes and parallelizes the code by leveraging modern hardware instructions and by utilizing all available cores.

Intel® SDC documentation can be found `here <https://intelpython.github.io/sdc-doc/>`_.

.. note:: For maximum performance and stability, please use numba from intel/label/beta channel.

Installing Binary Packages (conda and wheel)
--------------------------------------------

Intel® SDC is available on the Anaconda Cloud ``intel/label/beta`` channel. The distribution includes Intel® SDC for Python 3.6 and Python 3.7 on Windows and Linux.

The Intel® SDC conda package can be installed using the steps below::

    > conda create -n sdc-env python=<3.7 or 3.6> pyarrow=2.0.0 pandas=1.2.0 -c anaconda -c conda-forge
    > conda activate sdc-env
    > conda install sdc -c intel/label/beta -c intel -c defaults -c conda-forge --override-channels

The Intel® SDC wheel package can be installed using the steps below::

    > conda create -n sdc-env python=<3.7 or 3.6> pip pyarrow=2.0.0 pandas=1.2.0 -c anaconda -c conda-forge
    > conda activate sdc-env
    > pip install --index-url https://pypi.anaconda.org/intel/label/beta/simple --extra-index-url https://pypi.anaconda.org/intel/simple --extra-index-url https://pypi.org/simple sdc
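
After either set of steps, it can help to confirm that the active environment really holds the pinned versions. The sketch below is not part of Intel® SDC: ``PINNED``, ``installed_version`` and ``check_pins`` are hypothetical names, and it assumes each package exposes a ``__version__`` attribute, as pandas and pyarrow do::

    import importlib

    # Versions pinned by the conda commands above.
    PINNED = {"pandas": "1.2.0", "pyarrow": "2.0.0"}


    def installed_version(module_name):
        """Return module.__version__, or None if the module cannot be imported."""
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return None
        return getattr(module, "__version__", None)


    def check_pins(pins, lookup=installed_version):
        """Return {name: (expected, found)} for every package that is off its pin."""
        mismatches = {}
        for name, expected in pins.items():
            found = lookup(name)
            if found != expected:
                mismatches[name] = (expected, found)
        return mismatches

Calling ``check_pins(PINNED)`` inside the activated ``sdc-env`` returns an empty dictionary when both pins match. The ``lookup`` parameter exists only so the comparison can be exercised without the packages installed.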

Building Intel® SDC from Source on Linux
----------------------------------------

We use the `Anaconda <https://www.anaconda.com/download/>`_ distribution of Python to set up the Intel® SDC build environment.

If you do not have conda, we recommend using Miniconda3::

    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
    chmod +x miniconda.sh
    ./miniconda.sh -b
    export PATH=$HOME/miniconda3/bin:$PATH

.. note:: For maximum performance and stability, please use numba from intel/label/beta channel.

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the cases below to install Intel® SDC and its dependencies on Linux.

Building on Linux with conda-build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    PYVER=<3.6 or 3.7>
    NUMPYVER=<1.16 or 1.17>
    conda create -n conda-build-env python=$PYVER conda-build
    source activate conda-build-env
    git clone https://github.com/IntelPython/sdc.git
    cd sdc
    conda build --python $PYVER --numpy $NUMPYVER --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe
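
For scripted builds, the final ``conda build`` call above can be assembled programmatically. This is a minimal sketch rather than a tool shipped with Intel® SDC: the function name ``conda_build_command`` and the constants are hypothetical, and the supported versions and channel order are simply copied from the commands in this section::

    import shlex

    # Values taken from the build instructions above.
    SUPPORTED_PYTHON = ("3.6", "3.7")
    SUPPORTED_NUMPY = ("1.16", "1.17")
    CHANNELS = ("intel/label/beta", "defaults", "intel", "conda-forge")


    def conda_build_command(pyver, numpyver, output_folder):
        """Return the conda build invocation as an argument list."""
        if pyver not in SUPPORTED_PYTHON:
            raise ValueError(f"unsupported Python version: {pyver}")
        if numpyver not in SUPPORTED_NUMPY:
            raise ValueError(f"unsupported NumPy version: {numpyver}")
        command = ["conda", "build", "--python", pyver, "--numpy", numpyver,
                   f"--output-folder={output_folder}"]
        for channel in CHANNELS:
            command += ["-c", channel]
        # Ignore channels from .condarc and point at the recipe directory.
        command += ["--override-channels", "conda-recipe"]
        return command


    # Print the command in a form that can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in conda_build_command("3.7", "1.17", "build_out")))

The returned list can be passed to ``subprocess.run(..., check=True)`` from inside the cloned ``sdc`` directory. ``build_out`` is a placeholder for ``<output_folder>``.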

Building on Linux with setuptools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::

    export PYVER=<3.6 or 3.7>
    export NUMPYVER=<1.16 or 1.17>
    conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER tbb-devel tbb4py numba=0.52 pandas=1.2.0 pyarrow=2.0.0 gcc_linux-64 gxx_linux-64
    source activate sdc-env
    git clone https://github.com/IntelPython/sdc.git
    cd sdc
    python setup.py install

In case of issues, reinstalling in a new conda environment is recommended.

Building Intel® SDC from Source on Windows
------------------------------------------

Building Intel® SDC on Windows requires Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)):

* Install `Build Tools for Visual Studio 2019 (with component MSVC v140 - VS 2015 C++ build tools (v14.00)) <https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2019>`_.
* Install `Miniconda for Windows <https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe>`_.
* Start the Anaconda Prompt.

It is possible to build Intel® SDC via conda-build or setuptools. Follow one of the
cases below to install Intel® SDC and its dependencies on Windows.

Building on Windows with conda-build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    set PYVER=<3.6 or 3.7>
    set NUMPYVER=<1.16 or 1.17>
    conda create -n conda-build-env -q -y python=%PYVER% conda-build conda-verify vc vs2015_runtime vs2015_win-64
    conda activate conda-build-env
    git clone https://github.com/IntelPython/sdc.git
    cd sdc
    conda build --python %PYVER% --numpy %NUMPYVER% --output-folder=<output_folder> -c intel/label/beta -c defaults -c intel -c conda-forge --override-channels conda-recipe

Building on Windows with setuptools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    set PYVER=<3.6 or 3.7>
    set NUMPYVER=<1.16 or 1.17>
    conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% tbb-devel tbb4py numba=0.52 pandas=1.2.0 pyarrow=2.0.0
    conda activate sdc-env
    set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
    set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
    git clone https://github.com/IntelPython/sdc.git
    cd sdc
    python setup.py install

.. "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64

Troubleshooting Windows Build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* If the ``cl`` compiler throws the error ``fatal error LNK1158: cannot run 'rc.exe'``,
  add Windows Kits to your PATH (e.g. ``C:\Program Files (x86)\Windows Kits\8.0\bin\x86``).
* Some errors can be mitigated by ``set DISTUTILS_USE_SDK=1``.
* To set up Visual Studio, you might need to open the registry at
  ``HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\VisualStudio\SxS\VS7``
  and add a string value named ``14.0`` whose data is ``C:\Program Files (x86)\Microsoft Visual Studio 14.0\``.
* If the conda or Visual Studio version in use is not the latest,
  building Intel® SDC can fail with a vague error about a keyword used in a file.
  Make sure you are using the latest versions of both.
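
The first item above can be checked before starting a build. The following sketch uses only the Python standard library to see whether ``rc.exe`` is reachable. ``find_on_path`` is a hypothetical helper, and the Windows Kits directory is the example path quoted above, which may differ on your machine::

    import os
    import shutil


    def find_on_path(tool, extra_dirs=()):
        """Return the full path to `tool`, or None if it is not found.

        Searches the directories in PATH followed by `extra_dirs`.
        """
        dirs = os.environ.get("PATH", "").split(os.pathsep) + list(extra_dirs)
        search_path = os.pathsep.join(d for d in dirs if d)
        return shutil.which(tool, path=search_path)


    # Example directory from the troubleshooting note; adjust as needed.
    WINDOWS_KITS_BIN = r"C:\Program Files (x86)\Windows Kits\8.0\bin\x86"

    if find_on_path("rc.exe") is None:
        if find_on_path("rc.exe", [WINDOWS_KITS_BIN]):
            print("rc.exe found in Windows Kits; add that folder to PATH.")
        else:
            print("rc.exe not found; check the Windows Kits installation.")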

Building documentation
----------------------

Building the Intel® SDC User's Guide requires a pre-installed Intel® SDC package
along with a compatible `Pandas*`_ version, as well as `Sphinx*`_ 2.2.1 or later.

The documentation includes the output of Intel® SDC examples, which is pasted into the function descriptions in the API Reference.

Use ``pip`` to install `Sphinx*`_ and extensions:
::

    pip install sphinx sphinxcontrib-programoutput

Currently the build procedure is based on ``make`` and is run from the ``./sdc/docs/`` folder.
While it is not generally required, we recommend that you clean up any previous documentation build by running:
::

    make clean

To build HTML documentation you will need to run:
::

    make html

The built documentation will be located in the ``./sdc/docs/build/html`` directory.
To preview the documentation, open the ``index.html`` file.


More information about building and adding documentation can be found `here <docs/README.rst>`_.


Running unit tests
------------------
::

    python sdc/tests/gen_test_data.py
    python -m unittest
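
The two commands are meant to run in order. That the tests depend on the generated data is an assumption based on the script name ``gen_test_data.py``. A small driver like the one below stops at the first failing step; ``run_steps`` and ``STEPS`` are hypothetical names, not part of the Intel® SDC test suite::

    import subprocess
    import sys

    # The two documented steps, run with the current interpreter.
    STEPS = [
        [sys.executable, "sdc/tests/gen_test_data.py"],
        [sys.executable, "-m", "unittest"],
    ]


    def run_steps(steps):
        """Run each command in order; return the first non-zero exit code, else 0."""
        for command in steps:
            returncode = subprocess.run(command).returncode
            if returncode != 0:
                return returncode
        return 0

Calling ``sys.exit(run_steps(STEPS))`` from the directory where the commands above are run reproduces them and propagates a failure to the caller, which is convenient in CI scripts.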

References
##########

Intel® SDC follows the ideas and initial code base of the High-Performance Analytics Toolkit (HPAT). These academic papers describe the ideas and methods behind HPAT:

- `HPAT paper at ICS'17 <http://dl.acm.org/citation.cfm?id=3079099>`_
- `HPAT at HotOS'17 <http://dl.acm.org/citation.cfm?id=3103004>`_
- `HiFrames on arxiv <https://arxiv.org/abs/1704.02341>`_