Sqooba / snorkel

Licence: Apache-2.0

Snorkel - Bootstrap your DataScience

Snorkel is a local, ready-in-30-seconds DataScience workbench for small to medium-sized data problems.

It is based on Apache Zeppelin and is easy to start and stop. It lets you persist your workspace locally and update your Python or JavaScript dependencies without interrupting your work. It is best suited to early-stage data exploration and prototyping, and comes preloaded with common Python and JavaScript data science libraries.

How to launch it

On Linux and macOS

  1. ./build-images.sh

    Run once to build the Docker image and install the Python and JavaScript dependencies.

  2. ./zeppelin.sh --start

    Starts the Zeppelin container.

    By default, the Zeppelin UI is served on port 8080 (http://localhost:8080) and the Spark UI on port 4040 (http://localhost:4040). The Spark UI only becomes available once the first Spark job has been started.

  3. ./zeppelin.sh --stop

    Stops the Zeppelin container.

On Windows

Windows scripts are available (.cmd extension). You can run them from the Command Prompt or PowerShell, or simply double-click them in File Explorer (or right-click > Run).

  1. build-images.cmd

    Run once to build the Docker image and install the Python and JavaScript dependencies.

  2. start-zeppelin.cmd

    Starts the Zeppelin container. A command prompt window will appear; press any key to close it.

    By default, the Zeppelin UI is served on port 8080 (http://localhost:8080) and the Spark UI on port 4040 (http://localhost:4040). The Spark UI only becomes available once the first Spark job has been started.

  3. stop-zeppelin.cmd

    Stops the Zeppelin container. Once again, press any key to close the window.

Custom configuration

Workspace persistence

On first start, the following volumes will be created on the host at the specified default locations and shared with the container:

Host                              Container                  Description
snorkel/zeppelin/data             /zeppelin/data             Your data stored here are available in Zeppelin
snorkel/zeppelin/logs             /zeppelin/logs             Logs
snorkel/zeppelin/notebooks        /zeppelin/notebooks        Notebooks git repo, i.e. your work
snorkel/zeppelin/spark-warehouse  /zeppelin/spark-warehouse  Storage for temporary Spark tables

You can override the location of these volumes by setting the environment variable ZEPPELIN_ROOT_DIR to your preferred location before running ./zeppelin.sh --start.
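
As a minimal sketch of this override, assuming a POSIX shell opened in the repository root: the directory name below is a hypothetical example, and the check around zeppelin.sh is only there so the snippet does nothing harmful when run from another directory.

```shell
# Hypothetical workspace location; any directory you can write to should do.
export ZEPPELIN_ROOT_DIR="$HOME/snorkel-workspace"

# Start Zeppelin only when the launcher script is present in the current directory.
if [ -x ./zeppelin.sh ]; then
  ./zeppelin.sh --start
else
  echo "zeppelin.sh not found; run this from the repository root" >&2
fi
```

Because the variable is exported in the current shell session only, it has to be set again (or added to your shell profile) before the next start.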

Zeppelin interpreter memory

By default, half of the total available memory is allocated to the Zeppelin interpreters on start. You can override this value by setting the environment variable ZEPPELIN_MEMORY. The value is the size in GB, e.g. export ZEPPELIN_MEMORY=8 for 8 GB of memory.

UI ports

By default, the Zeppelin UI runs on port 8080 and the Spark UI on port 4040. You can override these values by setting the environment variables ZEPPELIN_PORT and SPARK_UI_PORT, respectively.
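
A sketch of the port override, assuming ports 8081 and 4041 are free on your machine (both numbers are arbitrary examples, not project defaults) and that the shell is opened in the repository root:

```shell
# Example values only; choose any free ports.
export ZEPPELIN_PORT=8081
export SPARK_UI_PORT=4041

# Start Zeppelin only when the launcher script is present in the current directory.
if [ -x ./zeppelin.sh ]; then
  ./zeppelin.sh --start
fi

# Print the addresses at which the UIs should then be reachable.
echo "Zeppelin UI: http://localhost:${ZEPPELIN_PORT}"
echo "Spark UI:    http://localhost:${SPARK_UI_PORT}"
```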

Add Python and JS dependencies on-the-fly

snorkel/zeppelin/bootstrap/python/requirements.txt lets you define Python pip dependencies.

zeppelin/bootstrap/js and zeppelin/bootstrap/css let you deploy JS and CSS libraries inside Zeppelin.

On Linux and macOS, call ./zeppelin.sh --refresh to refresh your container without restarting it!

Examples

Python dependency

Say you're missing the Python web microframework Flask. Just add the following line to snorkel/zeppelin/bootstrap/python/requirements.txt:

Flask==0.12.2

Then execute ./zeppelin.sh --refresh. Voilà! Flask is available in your Zeppelin notebook, no restart needed.
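
If you script your environment setup, the edit to requirements.txt can be made idempotent. The helper below is a hypothetical sketch (add_requirement is not part of snorkel): it appends a pinned requirement only when that exact line is not already in the file, and it assumes the file, if it exists, ends with a newline.

```shell
# add_requirement FILE SPEC
# Appends SPEC to FILE unless an identical line is already present.
add_requirement() {
  req_file=$1
  req_spec=$2
  # -q: quiet, -x: match whole lines only, -F: treat SPEC as a fixed string
  if ! grep -qxF -e "$req_spec" "$req_file" 2>/dev/null; then
    printf '%s\n' "$req_spec" >> "$req_file"
  fi
}
```

For the Flask example, calling add_requirement snorkel/zeppelin/bootstrap/python/requirements.txt 'Flask==0.12.2' and then ./zeppelin.sh --refresh should have the same effect as editing the file by hand, and running it twice does not duplicate the line.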

JS libraries

Let's imagine you want to add the mobx library to your dependencies.

There are two ways to add JavaScript dependencies to your Zeppelin notebook:

  1. By using unpkg, a fast, global content delivery network for everything on npm:

    Add the following script tag to the code in your notebook snippet: <script src="https://unpkg.com/mobx"></script>. This injects the static (non-minified) source code of the library into your browser.

  2. By using the zeppelin.sh script:

    • Download the source code of the library from any CDN
    • Add the js file to the bootstrap/js folder
    • Execute ./zeppelin.sh --refresh. This copies the library into the container, to a location from which Zeppelin can serve it to your browser.

Scala/Java dependency

You can use Zeppelin's built-in dependency interpreter to pull dependencies without leaving your notebook.

For example, if you need the Scala plotting library Vegas, just add the following line in a snippet at the very beginning of your notebook:

%spark.dep
z.load("org.vegas-viz:vegas_2.11:0.3.11")

Do not forget to specify the spark.dep interpreter!

Execute the snippet before running any code (or restart your interpreter and execute the snippet). You can now use the library normally:

import vegas._
...

Dependencies table

The table below lists all the dependencies included in the container.

Library                Version      Licence
matplotlib             2.0.2        PSF
NumPy                  1.13.1       BSD
pandas                 0.20.3       BSD
python-igraph          0.7.1.post6  GPL 2
cairocffi              0.8.0        BSD-3-Clause
scikit-learn           0.19.0       BSD-3-Clause
SciPy                  0.19.1       BSD
Seaborn                0.8.1        BSD-3-Clause
sklearn                0.0          BSD
d3js                   4.10.2       BSD-3-Clause
leaflet                1.2.0        BSD 2-clause
Leaflet.markercluster  1.1.0        MIT
Zeppelin               0.7.3        Apache-2.0
Docker Compose         3.3          Apache-2.0