
DataThirstLtd / databricksConnectDocker

Licence: other
Docker Images with Databricks Connect Ready to go

Programming Languages

Dockerfile

Projects that are alternatives of or similar to databricksConnectDocker

terraform-provider-databricks
Terraform Databricks provider
Stars: ✭ 16 (-15.79%)
Mutual labels:  databricks
stowage
Bloat-free, no BS cloud storage SDK.
Stars: ✭ 85 (+347.37%)
Mutual labels:  databricks
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+17557.89%)
Mutual labels:  databricks
azure.databricks.cicd.tools
Tools for Deploying Databricks Solutions in Azure
Stars: ✭ 87 (+357.89%)
Mutual labels:  databricks
architect big data solutions with spark
code, labs and lectures for the course
Stars: ✭ 40 (+110.53%)
Mutual labels:  databricks
mlops-platforms
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...
Stars: ✭ 293 (+1442.11%)
Mutual labels:  databricks
databricks-dbapi
DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
Stars: ✭ 21 (+10.53%)
Mutual labels:  databricks
nutter
Testing framework for Databricks notebooks
Stars: ✭ 152 (+700%)
Mutual labels:  databricks
mlflow-tracking-server
MLFLow Tracking Server based on Docker and AWS S3
Stars: ✭ 59 (+210.53%)
Mutual labels:  databricks
dbt-databricks
A dbt adapter for Databricks.
Stars: ✭ 115 (+505.26%)
Mutual labels:  databricks
StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (+26.32%)
Mutual labels:  databricks
blackbricks
Black for Databricks notebooks
Stars: ✭ 40 (+110.53%)
Mutual labels:  databricks
databricks-notebooks
Collection of Sample Databricks Spark Notebooks ( mostly for Azure Databricks )
Stars: ✭ 57 (+200%)
Mutual labels:  databricks
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+15157.89%)
Mutual labels:  databricks
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+8957.89%)
Mutual labels:  databricks
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+105936.84%)
Mutual labels:  databricks

Databricks-Connect Container

This container is designed for developing PySpark applications in VS Code using Databricks-Connect. Technically it should also work for Scala, Java & R, though I haven't tried those.

It can be used as a local Docker container or as a cloud-hosted container using GitHub Codespaces.

Requirements

macOS

Windows 10

Alternative

For a more advanced option, you can use any OS with a remote Docker host.

This may be useful if you cannot run containers locally and do not want to use a cloud container.

Getting Started

  • Open an empty folder in VS Code
  • Create a directory called .devcontainer
  • Create an empty file in the directory called devcontainer.json
  • Paste this code into the file:
{
	"context": "..",
	"image": "datathirstltd/dbconnect:7.1.0",

	"settings": {
		"python.pythonPath": "/opt/conda/envs/dbconnect/bin/python",
		"python.venvPath": "/opt/conda/envs/dbconnect/lib/python3.7/site-packages/pyspark/jars"
	},

	//  Optional command - could add your own environment.yml file here (you must keep --name the same)
	// "postCreateCommand": "conda env update --file environment.yml --name dbconnect",
	
	// Rather than storing/committing your bearer token here, we recommend using a local environment variable and passing it through: "DATABRICKS_API_TOKEN": "${localEnv:DatabricksToken}",
	// You can manually set these as environment variables if you prefer
	"containerEnv": {
		"DATABRICKS_ADDRESS": "https://westeurope.azuredatabricks.net/",
		"DATABRICKS_API_TOKEN": "dapia12345678901234567890",
		"DATABRICKS_CLUSTER_ID": "0000-11111-hello123",
		"DATABRICKS_ORG_ID": "1234567890",
		"DATABRICKS_PORT": "8787"
	},
	"extensions": [
		"ms-python.python"
	]
}  

IMPORTANT: Change the image tag to match the Databricks Runtime version your cluster is running. Currently only Databricks Runtime 6+ is supported.

  • Update the Databricks Variables for your environment
  • Optionally add any additional extensions you want to the extensions block.
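Matching the image tag to your cluster's runtime can be sketched as below. This is a hedged illustration, not part of the project: `image_tag_for_runtime` is a hypothetical helper, and it assumes the `datathirstltd/dbconnect` tags follow the `major.minor.0` pattern seen in the config above (e.g. `7.1.0` for a 7.1 cluster).

```python
import re

def image_tag_for_runtime(spark_version: str) -> str:
    """Hypothetical helper: derive a dbconnect image reference from a
    Databricks runtime version string such as '7.1.x-scala2.12'.
    Assumes image tags follow 'major.minor.0', as in the example config."""
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if not match:
        raise ValueError(f"unrecognised runtime version: {spark_version}")
    major, minor = match.groups()
    return f"datathirstltd/dbconnect:{major}.{minor}.0"

print(image_tag_for_runtime("7.1.x-scala2.12"))  # datathirstltd/dbconnect:7.1.0
```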

IMPORTANT: Changing any setting in devcontainer.json after the container has been built requires a container rebuild for the change to take effect

To open using Docker locally:

  • Click the green icon in the bottom left of VS Code and select "Reopen in Container"

To open in a Codespace:

  • Commit your folder to a repo first
  • Open the Remote Explorer (left hand toolbar)
  • Ensure CodeSpaces is selected in the top drop down
  • Click + (Create new CodeSpace)
  • Follow the prompts

The first pull can be a little slow as the image is quite big, but once it is cached, rebuilding the container should take just a few seconds.

Test it out

First, from a command prompt, check that your Databricks Connect installation can connect:

databricks-connect test
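If the test fails, a common cause is an unset or empty connection variable. A quick sanity check (a sketch of my own, using the variable names from the containerEnv block above) is:

```python
import os

# Variable names match the containerEnv block in devcontainer.json;
# an empty result means everything Databricks Connect expects is present.
REQUIRED = [
    "DATABRICKS_ADDRESS",
    "DATABRICKS_API_TOKEN",
    "DATABRICKS_CLUSTER_ID",
    "DATABRICKS_ORG_ID",
    "DATABRICKS_PORT",
]

def missing_vars(environ=os.environ):
    """Return the required connection variables that are unset or blank."""
    return [name for name in REQUIRED if not environ.get(name)]

print("missing:", missing_vars() or "none")
```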

Next, create a test.py file and paste in this code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print("Testing simple count")

# The Spark code will execute on the Databricks cluster, not locally.
print(spark.range(100).count())

Press F5 and select the Python debugger.

Why?

Because setting up Databricks-Connect (particularly on Windows) is a pain. This allows:

  • A common setup between team members
  • Multiple side by side versions
  • Ability to reset your environment
  • Even run the whole thing from a browser!

GitHub Repository

Issues & Contributions

https://github.com/DataThirstLtd/databricksConnectDocker

Docker Hub

https://hub.docker.com/r/datathirstltd/dbconnect

More Information

About Data Thirst: https://datathirst.net

About VS Code and containers: https://code.visualstudio.com/docs/remote/containers
