zero323 / pyspark-stubs

Licence: Apache-2.0
Apache (Py)Spark type annotations (stub files).

PySpark Stubs

|Build Status| |PyPI version| |Conda Forge version|

A collection of Apache Spark `stub files <https://www.python.org/dev/peps/pep-0484/#stub-files>`__. These files were generated by `stubgen <https://github.com/python/mypy/blob/master/mypy/stubgen.py>`__ and manually edited to include accurate type hints.
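
For illustration, a hand-edited stub entry might look like the following. This is a hypothetical, heavily simplified sketch, not the actual annotations shipped by this package:

.. code:: python

    # Hypothetical, simplified sketch of stub-file style annotations;
    # the real pyspark-stubs signatures are far more detailed.
    from typing import Callable, Generic, List, TypeVar

    T = TypeVar("T")
    U = TypeVar("U")

    class RDD(Generic[T]):
        # In a real .pyi file only signatures appear; bodies are "..."
        def map(self, f: Callable[[T], U]) -> "RDD[U]": ...
        def filter(self, f: Callable[[T], bool]) -> "RDD[T]": ...
        def collect(self) -> List[T]: ...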

Tests and configuration files were originally contributed to the `Typeshed project <https://github.com/python/typeshed/>`__. Please refer to its `contributors list <https://github.com/python/typeshed/graphs/contributors>`__ and `license <https://github.com/python/typeshed/blob/master/LICENSE>`__ for details.

Important

This project `has been merged <https://github.com/apache/spark/commit/31a16fbb405a19dc3eb732347e0e1f873b16971d#diff-23eeeb4347bdd26bfc6b7ee9a3b755dd>`__ with the main Apache Spark repository (`SPARK-32714 <https://issues.apache.org/jira/browse/SPARK-32714>`__). All further development for Spark 3.1 and onwards will continue there.

For Spark 2.4 and 3.0, development of this package will continue until their official deprecation.

  • If your problem is specific to Spark 2.4 or 3.0, feel free to create an issue or open a pull request here.
  • Otherwise, please check the official `Spark JIRA <https://issues.apache.org/jira/projects/SPARK/issues/>`__ and `contributing guidelines <https://spark.apache.org/contributing.html>`__. If you create a JIRA ticket or Spark PR related to type hints, please ping me with `[~zero323] <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zero323>`__ or `@zero323 <https://github.com/zero323>`__ respectively. Thanks in advance.

Motivation

  • Static error detection (see SPARK-20631 <https://issues.apache.org/jira/browse/SPARK-20631>__)

    |SPARK-20631|

  • Improved autocompletion.

    |Syntax completion|
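
As a minimal sketch of what static error detection buys you (using a hypothetical stand-in function rather than the real Spark API, since the exact signatures are beside the point):

.. code:: python

    # Hypothetical stand-in for an annotated Spark method; Python itself
    # does not enforce the annotations at runtime.
    def sample(fraction: float, seed: int = 42) -> float:
        return fraction

    # A type checker reading the annotations flags this call before the
    # job ever runs: Argument 1 has incompatible type "str"; expected "float".
    result = sample("0.5")  # executes anyway, silently carrying the bad value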

Installation and usage

Please note that the guidelines for distributing type information are still a work in progress (`PEP 561 - Distributing and Packaging Type Information <https://www.python.org/dev/peps/pep-0561/>`__). Currently, the installation script overlays existing Spark installations (``pyi`` stub files are copied next to their ``py`` counterparts in the PySpark installation directory). If this approach is not acceptable, you can add the stub files to the search path manually.
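
The overlay step can be pictured with the following sketch (the helper and paths are illustrative, not the package's actual installation script):

.. code:: python

    # Sketch of the "overlay" idea: copy each .pyi stub next to its .py
    # counterpart inside an installed package, mirroring the layout.
    import shutil
    from pathlib import Path

    def overlay_stubs(stubs_dir: Path, install_dir: Path) -> int:
        """Copy every stub file into the installation tree; return count."""
        copied = 0
        for stub in stubs_dir.rglob("*.pyi"):
            target = install_dir / stub.relative_to(stubs_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(stub, target)  # lands next to the matching .py
            copied += 1
        return copied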

According to `PEP 484 <https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files>`__:

    Third-party stub packages can use any location for stub storage.
    Type checkers should search for them using PYTHONPATH.

Moreover:

    A default fallback directory that is always checked is
    ``shared/typehints/python3.5/`` (or 3.6, etc.)
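
In other words, a checker picks up stubs either from the interpreter-versioned fallback directory or from a path you export yourself. A hedged sketch (the stub directory below is a made-up example):

.. code:: python

    # Sketch: where a PEP 484-style checker may look for third-party stubs.
    import sys

    # Versioned fallback directory described by PEP 484.
    fallback = f"shared/typehints/python{sys.version_info.major}.{sys.version_info.minor}/"
    print(fallback)

    # Alternatively, point the checker at a stub directory explicitly, e.g.:
    #   MYPYPATH=/opt/stubs mypy my_spark_job.py   # /opt/stubs is hypothetical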

Please check usage before proceeding.

The package is available on `PyPI <https://pypi.org/project/pyspark-stubs/>`__:

.. code:: bash

    pip install pyspark-stubs

and `conda-forge <https://anaconda.org/conda-forge/pyspark-stubs>`__:

.. code:: bash

    conda install -c conda-forge pyspark-stubs

Depending on your environment, you might also need a type checker, like Mypy_ or Pytype_ [#f1]_, and an autocompletion tool, like Jedi_.

+-----------------------------------------+---------------+----------------+--------------------------+
| Editor                                  | Type checking | Autocompletion | Notes                    |
+=========================================+===============+================+==========================+
| Atom_                                   | ✔ [#f2]_      | ✔ [#f3]_       | Through plugins.         |
+-----------------------------------------+---------------+----------------+--------------------------+
| IPython_ / `Jupyter Notebook`_          | ✘ [#f4]_      | ✔              |                          |
+-----------------------------------------+---------------+----------------+--------------------------+
| PyCharm_                                | ✔             | ✔              |                          |
+-----------------------------------------+---------------+----------------+--------------------------+
| PyDev_                                  | ✔ [#f5]_      | ?              |                          |
+-----------------------------------------+---------------+----------------+--------------------------+
| VIM_ / Neovim_                          | ✔ [#f6]_      | ✔ [#f7]_       | Through plugins.         |
+-----------------------------------------+---------------+----------------+--------------------------+
| `Visual Studio Code`_                   | ✔ [#f8]_      | ✔ [#f9]_       | Completion with plugin.  |
+-----------------------------------------+---------------+----------------+--------------------------+
| Environment independent / other editors | ✔ [#f10]_     | ✔ [#f11]_      | Through Mypy_ and Jedi_. |
+-----------------------------------------+---------------+----------------+--------------------------+

This package is tested against the MyPy development branch and, in rare cases (primarily when it depends on important upstream bugfixes), may not be compatible with the preceding MyPy release.

PySpark Version Compatibility

Package versions follow PySpark versions, with the exception of maintenance releases; i.e. ``pyspark-stubs==2.3.0`` should be compatible with ``pyspark>=2.3.0,<2.4.0``. Maintenance releases (``post1``, ``post2``, ..., ``postN``) are reserved for internal annotation updates.
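
The convention can be expressed as a small helper (purely illustrative; not part of the package):

.. code:: python

    def compatible_pyspark_range(stubs_version: str) -> str:
        """Map a pyspark-stubs version to the pyspark range it targets,
        ignoring any .postN maintenance suffix (illustrative only)."""
        base = stubs_version.split(".post")[0]  # "2.3.0.post2" -> "2.3.0"
        major, minor, _ = (int(p) for p in base.split("."))
        return f"pyspark>={major}.{minor}.0,<{major}.{minor + 1}.0"

    print(compatible_pyspark_range("2.3.0"))        # pyspark>=2.3.0,<2.4.0
    print(compatible_pyspark_range("2.3.0.post2"))  # pyspark>=2.3.0,<2.4.0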

API Coverage

As of release 2.4.0, most of the public API is covered. For details, please check the `API coverage document <https://github.com/zero323/pyspark-stubs/blob/master/doc/api-coverage.rst>`__.

See also

  • `SPARK-17333 <https://issues.apache.org/jira/browse/SPARK-17333>`__ - Make pyspark interface friendly with static analysis.
  • `PySpark typing hints <http://apache-spark-developers-list.1001551.n3.nabble.com/PYTHON-PySpark-typing-hints-td21560.html>`__ and `Revisiting PySpark type annotations <http://apache-spark-developers-list.1001551.n3.nabble.com/Re-PySpark-Revisiting-PySpark-type-annotations-td26232.html>`__ on the `Apache Spark Developers List <http://apache-spark-developers-list.1001551.n3.nabble.com/>`__.

Disclaimer

Apache Spark, Spark, PySpark, Apache, and the Spark logo are `trademarks <https://www.apache.org/foundation/marks/>`__ of `The Apache Software Foundation <http://www.apache.org/>`__. This project is not owned, endorsed, or sponsored by The Apache Software Foundation.

Footnotes

.. [#f1] Not supported or tested.
.. [#f2] Requires `atom-mypy <https://atom.io/packages/atom-mypy>`__ or equivalent.
.. [#f3] Requires `autocomplete-python-jedi <https://atom.io/packages/autocomplete-python-jedi>`__ or equivalent.
.. [#f4] It is `possible <https://web.archive.org/web/20190126155957/http://journalpanic.com/post/spice-up-thy-jupyter-notebooks-with-mypy/>`__ to use magics to type check directly in the notebook. In general though, you'll have to export the whole notebook to a ``.py`` file and run the type checker on the result.
.. [#f5] Requires PyDev 7.0.3 or later.
.. [#f6] Using `vim-mypy <https://github.com/Integralist/vim-mypy>`__, `syntastic <https://github.com/vim-syntastic/syntastic>`__ or `Neomake <https://github.com/neomake/neomake>`__.
.. [#f7] With `jedi-vim <https://github.com/davidhalter/jedi-vim>`__.
.. [#f8] With the `Mypy linter <https://code.visualstudio.com/docs/python/linting#_specific-linters>`__.
.. [#f9] With the `Python extension for Visual Studio Code <https://marketplace.visualstudio.com/items?itemName=ms-python.python>`__.
.. [#f10] Just use your favorite checker directly, optionally combined with a tool like `entr <http://eradman.com/entrproject/>`__.
.. [#f11] See the `Jedi editor plugins list <https://jedi.readthedocs.io/en/latest/docs/usage.html#editor-plugins>`__.

.. |Build Status| image:: https://travis-ci.org/zero323/pyspark-stubs.svg?branch=master
   :target: https://travis-ci.org/zero323/pyspark-stubs
.. |PyPI version| image:: https://img.shields.io/pypi/v/pyspark-stubs.svg
   :target: https://pypi.org/project/pyspark-stubs/
.. |Conda Forge version| image:: https://img.shields.io/conda/vn/conda-forge/pyspark-stubs.svg
   :target: https://anaconda.org/conda-forge/pyspark-stubs
.. |SPARK-20631| image:: https://i.imgur.com/GfDCGjv.gif
   :alt: SPARK-20631
.. |Syntax completion| image:: https://i.imgur.com/qvkLTAp.gif
   :alt: Syntax completion

.. _Atom: https://atom.io/
.. _IPython: https://ipython.org/
.. _Jedi: https://github.com/davidhalter/jedi
.. _Jupyter Notebook: https://jupyter.org/
.. _Mypy: http://mypy-lang.org/
.. _Neovim: https://neovim.io/
.. _PyCharm: https://www.jetbrains.com/pycharm/
.. _PyDev: https://www.pydev.org/
.. _Pytype: https://github.com/google/pytype
.. _VIM: https://www.vim.org/
.. _Visual Studio Code: https://code.visualstudio.com/
