ngrunwald / Datasplash
Licence: epl-1.0
Clojure API for a more dynamic Google Dataflow
Stars: ✭ 111
Programming Languages
clojure
4091 projects
Projects that are alternatives of or similar to Datasplash
Google Actions Starter
A Node.js server for Google Assistant (and Google Home).
Stars: ✭ 85 (-23.42%)
Mutual labels: google-cloud
Boinc
Open-source software for volunteer computing and grid computing.
Stars: ✭ 1,320 (+1089.19%)
Mutual labels: distributed-computing
Parapet
A purely functional library to build distributed and event-driven systems
Stars: ✭ 106 (-4.5%)
Mutual labels: distributed-computing
Hands Detection
Hands video tracker using the Tensorflow Object Detection API and Faster RCNN model. The data used is the Hand Dataset from University of Oxford.
Stars: ✭ 87 (-21.62%)
Mutual labels: google-cloud
Floops.jl
Fast sequential, threaded, and distributed for-loops for Julia—fold for humans™
Stars: ✭ 96 (-13.51%)
Mutual labels: distributed-computing
Google Cloud Visualstudio
Google Cloud Tools for Visual Studio
Stars: ✭ 80 (-27.93%)
Mutual labels: google-cloud
Microservices Demo
Sample cloud-native application with 10 microservices showcasing Kubernetes, Istio, gRPC and OpenCensus.
Stars: ✭ 11,369 (+10142.34%)
Mutual labels: google-cloud
Terraform Provider Google
Terraform Google Cloud Platform provider
Stars: ✭ 1,318 (+1087.39%)
Mutual labels: google-cloud
Typhoon
Minimal and free Kubernetes distribution with Terraform
Stars: ✭ 1,397 (+1158.56%)
Mutual labels: google-cloud
Funnel
Funnel is a toolkit for distributed task execution via a simple, standard API.
Stars: ✭ 87 (-21.62%)
Mutual labels: google-cloud
Bank Vaults
A Vault swiss-army knife: a K8s operator, Go client with automatic token renewal, automatic configuration, multiple unseal options and more. A CLI tool to init, unseal and configure Vault (auth methods, secret engines). Direct secret injection into Pods.
Stars: ✭ 1,316 (+1085.59%)
Mutual labels: google-cloud
Iap Desktop
IAP Desktop is a Windows application that provides zero-trust Remote Desktop and SSH access to Linux and Windows VMs on Google Cloud.
Stars: ✭ 96 (-13.51%)
Mutual labels: google-cloud
Kubeformation
Create declarative cluster specifications for your managed Kubernetes vendor (GKE, AKS)
Stars: ✭ 86 (-22.52%)
Mutual labels: google-cloud
Continuous Deployment On Kubernetes
Get up and running with Jenkins on Google Kubernetes Engine
Stars: ✭ 1,494 (+1245.95%)
Mutual labels: google-cloud
Unity Solutions
Use Firebase tools to incorporate common features into your games!
Stars: ✭ 95 (-14.41%)
Mutual labels: google-cloud
Elephas
Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+1270.27%)
Mutual labels: distributed-computing
Keras Cloud Ml Engine
Adventures using keras on Google's Cloud ML Engine
Stars: ✭ 106 (-4.5%)
Mutual labels: google-cloud
Jhipster Microservices Example
JHipster Microservices Example using Spring Cloud, Spring Boot, Angular, Docker, and Kubernetes
Stars: ✭ 100 (-9.91%)
Mutual labels: google-cloud
Datasplash
Clojure API for a more dynamic Google Cloud Dataflow and (not really battle tested) any other Apache Beam backend.
Usage
You can also see ports of the official Dataflow examples in the
datasplash.examples
namespace.
Here is the classic word count:
(ns datasplash.examples
(:require [clojure.string :as str]
[clojure.tools.logging :as log]
[datasplash
[api :as ds]
[bq :as bq]
[datastore :as dts]
[pubsub :as ps]
[options :as options :refer [defoptions]]]
[clojure.edn :as edn])
(:import [java.util UUID]
[com.google.datastore.v1 Query PropertyFilter$Operator]
[com.google.datastore.v1.client DatastoreHelper]
[org.apache.beam.sdk.options PipelineOptionsFactory])
(:gen-class))
(defn tokenize
[l]
(remove empty? (.split (str/trim l) "[^a-zA-Z']+")))
(defn count-words
[p]
(ds/->> :count-words p
(ds/mapcat tokenize {:name :tokenize})
(ds/frequencies)))
(defn format-count
[[k v]]
(format "%s: %d" k v))
(defoptions WordCountOptions
{:input {:default "gs://dataflow-samples/shakespeare/kinglear.txt"
:type String}
:output {:default "kinglear-freqs.txt"
:type String}
:numShards {:default 0
:type Long}})
(defn -main
[& str-args]
(let [p (ds/make-pipeline WordCountOptions str-args)
{:keys [input output numShards]} (ds/get-pipeline-options p)]
(->> p
(ds/read-text-file input {:name "King-Lear"})
(count-words)
(ds/map format-count {:name :format-count})
(ds/write-text-file output {:num-shards numShards})
(ds/run-pipeline))))
Run it from the repl with:
(in-ns 'datasplash.examples)
(compile 'datasplash.examples)
(-main "--input=in.txt" "--output=out.txt")
Note that you will need to run (compile 'datasplash.examples)
every time you
make a change.
Run an example from the examples namespace locally with:
lein run example-name --input=in.txt --output=out.txt
Run in on Google Cloud (if you have done a gcloud init
on this machine):
lein run example-name --input=gs://dataflow-samples/shakespeare/kinglear.txt --output=gs://my-project-tmp/results.txt --runner=DataflowRunner --project=my-project --stagingLocation=gs://my-project-staging
Caveats
- Due to the way the code is loaded when running in distributed mode, you may
get some exceptions about unbound vars, especially when using instances with
a high number of cpu. They will not however cause the job to fail and are of
no consequences. They are caused by the need to prep the Clojure runtime when
loading the class files in remote instances and some tricky business with
locks and
require
. - If you have to write your own low-level
ParDo
objects (you shouldn't), wrap all your code in thesafe-exec
macro to avoid issues with unbound vars. Any good idea about finding a better way to do this would be greatly appreciated! - If some of the
UserCodeException
as seen in the cloud UI are mangled and missing the relevant part of the Clojure source code, this is due to a bug with the way the sdk mangles stacktraces in Clojure. In this case look for ClojureRuntimeException in the logs to find the original unaltered stacktrace. - Beware of using Clojure 1.9:
proxy
results are notSerializable
anymore, so you cannot use anywhere in your pipeline Clojure code that uses proxy. Use Java shim for these objects instead. - If you see something like
java.lang.ClassNotFoundException: Options
you probably forgot to compile your namespace. - Whenever you need to check some spec in user code, you will have to first require
those specs because they may not be loaded in your Clojure runtime. But don't
use
(require)
because it's not thread safe. See [this issue] for a workaround. - If you see a
java.io.IOException: No such file or directory
when invokingcompile
, make sure there is a directory in your project root that matches the value of*compile-path*
(defaultclasses
).
License
Copyright © 2015-2019 Oscaro.com
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.
If you wish to add or remove any projects, please contact us at [email protected].