miraisolutions / sparkbq

License: GPL-3.0
Sparklyr extension package to connect to Google BigQuery

Programming Languages

R

Projects that are alternatives of or similar to sparkbq

mleap
R Interface to MLeap
Stars: ✭ 24 (+50%)
Mutual labels:  sparklyr
sparklygraphs
Old repo for R interface for GraphFrames
Stars: ✭ 13 (-18.75%)
Mutual labels:  sparklyr
logica
Logica is a logic programming language that compiles to StandardSQL and runs on Google BigQuery.
Stars: ✭ 1,469 (+9081.25%)
Mutual labels:  bigquery
tag-manager
Website analytics, JavaScript error tracking + analytics, tag manager, data ingest endpoint creation (tracking pixels). GDPR + CCPA compliant.
Stars: ✭ 279 (+1643.75%)
Mutual labels:  bigquery
dekart
GIS Visualisation for Amazon Athena and BigQuery
Stars: ✭ 131 (+718.75%)
Mutual labels:  bigquery
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (+0%)
Mutual labels:  bigquery
hive_compared_bq
hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (+68.75%)
Mutual labels:  bigquery
DataflowTemplates
Convenient Dataflow pipelines for transforming data between cloud data sources
Stars: ✭ 22 (+37.5%)
Mutual labels:  bigquery
spark.sas7bdat
Read in SAS data in parallel into Apache Spark
Stars: ✭ 25 (+56.25%)
Mutual labels:  sparklyr
objectiv-analytics
Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+2393.75%)
Mutual labels:  bigquery
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+600%)
Mutual labels:  bigquery
scalikejdbc-bigquery
ScalikeJDBC extension for Google BigQuery
Stars: ✭ 18 (+12.5%)
Mutual labels:  bigquery
bigflow
A Python framework for data processing on GCP.
Stars: ✭ 96 (+500%)
Mutual labels:  bigquery
firestore-to-bigquery-export
NPM package for copying and converting Cloud Firestore data to BigQuery.
Stars: ✭ 26 (+62.5%)
Mutual labels:  bigquery
amplitude-bigquery
Export your events from Amplitude to Google BigQuery/Google Cloud Storage
Stars: ✭ 28 (+75%)
Mutual labels:  bigquery
managed_ml_systems_and_iot
Managed Machine Learning Systems and Internet of Things Live Lesson
Stars: ✭ 35 (+118.75%)
Mutual labels:  bigquery
spark-on-k8s-gcp-examples
Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
Stars: ✭ 36 (+125%)
Mutual labels:  bigquery
graphframes
R Interface for GraphFrames
Stars: ✭ 36 (+125%)
Mutual labels:  sparklyr
starlake
Starlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Stars: ✭ 16 (+0%)
Mutual labels:  bigquery
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+231.25%)
Mutual labels:  bigquery

sparkbq: Google BigQuery Support for sparklyr

sparkbq is a sparklyr extension package providing integration with Google BigQuery. It builds on top of spark-bigquery, which provides a Google BigQuery data source for Apache Spark.

Version Information

You can install the released version of sparkbq from CRAN via

install.packages("sparkbq")

or the latest development version through

devtools::install_github("miraisolutions/sparkbq", ref = "develop")

The following table provides an overview of the supported versions of Apache Spark, Scala, and Google Dataproc:

sparkbq  spark-bigquery  Apache Spark     Scala  Google Dataproc
0.1.x    0.1.0           2.2.x and 2.3.x  2.11   1.2.x and 1.3.x
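
To run locally, install and connect against a supported Spark version via sparklyr. A minimal sketch, assuming a local Spark 2.3.x installation (the exact patch version is illustrative):

library(sparklyr)

# Install a Spark version supported by sparkbq 0.1.x (see the table above)
spark_install(version = "2.3.2")

# Connect to that specific local Spark installation
sc <- spark_connect(master = "local[*]", version = "2.3.2")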

sparkbq is based on the Spark package spark-bigquery, which is available in a separate GitHub repository.

Example Usage

library(sparklyr)
library(sparkbq)
library(dplyr)

# Use the default Spark configuration and connect to a local Spark instance
config <- spark_config()

sc <- spark_connect(master = "local[*]", config = config)

# Set Google BigQuery default settings
bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct"
)

# Reading the public shakespeare data table
# https://cloud.google.com/bigquery/public-data/
# https://cloud.google.com/bigquery/sample-tables
hamlet <- 
  spark_read_bigquery(
    sc,
    name = "hamlet",
    projectId = "bigquery-public-data",
    datasetId = "samples",
    tableId = "shakespeare") %>%
  filter(corpus == "hamlet") # NOTE: predicate pushdown to BigQuery!
  
# Retrieve results into a local tibble
hamlet %>% collect()

# Write result into "mysamples" dataset in our BigQuery (billing) project
spark_write_bigquery(
  hamlet,
  datasetId = "mysamples",
  tableId = "hamlet",
  mode = "overwrite")
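
When finished, close the Spark connection as usual with sparklyr:

spark_disconnect(sc)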

Authentication

When running outside of Google Cloud, you need to specify a service account JSON key file. The key file can be passed as the parameter serviceAccountKeyFile to bigquery_defaults, or directly to spark_read_bigquery and spark_write_bigquery.
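
For example, to pass the key file directly when reading (the path is a placeholder):

spark_read_bigquery(
  sc,
  name = "shakespeare",
  projectId = "bigquery-public-data",
  datasetId = "samples",
  tableId = "shakespeare",
  serviceAccountKeyFile = "/path/to/your/service_account_keyfile.json"
)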

Alternatively, you can set the environment variable GOOGLE_APPLICATION_CREDENTIALS, e.g. export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json (see https://cloud.google.com/docs/authentication/getting-started for more information). Make sure the variable is set before starting the R session.
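
One way to ensure the variable is set before the R session starts is to define it in ~/.Renviron, which R reads at startup (the path is a placeholder):

# in ~/.Renviron
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service_account_keyfile.json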

When running on Google Cloud, e.g. on Google Cloud Dataproc, application default credentials (ADC) may be used, in which case it is not necessary to specify a service account key file.

Further Information

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].