Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+95%)

Mutual labels: big-data, apache-spark

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (+505%)

Mutual labels: big-data, apache-spark

contact map

Contact map analysis for biomolecules; based on MDTraj

Stars: ✭ 27 (+35%)

Mutual labels: protein-protein-interaction, protein-ligand-interactions

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+14395%)

Mutual labels: big-data, apache-spark

plmc

Inference of couplings in proteins and RNAs from sequence variation

Stars: ✭ 85 (+325%)

Mutual labels: protein-structure, protein-sequences

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (+1415%)

Mutual labels: big-data, apache-spark

gan deeplearning4j

Automatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-5%)

Mutual labels: big-data, apache-spark

deepblast

Neural Networks for Protein Sequence Alignment

Stars: ✭ 29 (+45%)

Mutual labels: protein-structure, protein-sequences

pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Stars: ✭ 115 (+475%)

Mutual labels: big-data, apache-spark

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (+1280%)

Mutual labels: big-data, apache-spark

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (+15%)

Mutual labels: apache-spark, scientific-computing

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+540%)

Mutual labels: big-data, apache-spark

lightdock

Protein-protein, protein-peptide and protein-DNA docking framework based on the GSO algorithm

Stars: ✭ 110 (+450%)

Mutual labels: protein-structure, protein-protein-interaction

Detecting-Malicious-URL-Machine-Learning

No description or website provided.

Stars: ✭ 47 (+135%)

Mutual labels: big-data, apache-spark

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+16675%)

Mutual labels: big-data, apache-spark

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+185%)

Mutual labels: big-data, apache-spark

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+975%)

Mutual labels: big-data, apache-spark

leaflet heatmap

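A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline. After computing the data in parallel with Apache Spark, the heatmap is also drawn with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The drawing is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .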

Stars: ✭ 13 (-35%)

Mutual labels: big-data, apache-spark

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+620%)

Mutual labels: big-data, apache-spark

Parquetviewer

Simple windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+625%)

Mutual labels: big-data, apache-spark

parapred

Paratope Prediction using Deep Learning

Stars: ✭ 49 (+145%)

Mutual labels: protein-structure, protein-sequences

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+785%)

Mutual labels: big-data, apache-spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (+585%)

Mutual labels: big-data, apache-spark

tape-neurips2019

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (DEPRECATED)

Stars: ✭ 117 (+485%)

Mutual labels: protein-structure, protein-sequences

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+1135%)

Mutual labels: big-data, apache-spark

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (+235%)

Mutual labels: big-data, apache-spark

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (+60%)

Mutual labels: big-data, apache-spark

awesome-tools

curated list of awesome tools and libraries for specific domains

Stars: ✭ 31 (+55%)

Mutual labels: big-data, apache-spark

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (+1445%)

Mutual labels: big-data, apache-spark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+650%)

Mutual labels: big-data, apache-spark

pytorch-rgn

Recurrent Geometric Network in Pytorch

Stars: ✭ 28 (+40%)

Mutual labels: protein-structure, protein-sequences

gis4wrf

QGIS toolkit 🧰 for pre- and post-processing 🔨, visualizing 🔍, and running simulations 💻 in the Weather Research and Forecasting (WRF) model 🌀

Stars: ✭ 137 (+585%)

Mutual labels: scientific-computing

accumulo-testing

Apache Accumulo Testing

Stars: ✭ 14 (-30%)

Mutual labels: big-data

Uni-Fold

An open-source platform for developing protein models beyond AlphaFold.

Stars: ✭ 227 (+1035%)

Mutual labels: protein-structure

seamless

Seamless is a framework to set up reproducible computations (and visualizations) that respond to changes in cells. Cells contain the input data as well as the source code of the computations, and all cells can be edited interactively.

Stars: ✭ 19 (-5%)

Mutual labels: scientific-computing

fink-broker

Astronomy Broker based on Apache Spark

Stars: ✭ 18 (-10%)

Mutual labels: apache-spark

incubator-tez

Mirror of Apache Tez (Incubating)

Stars: ✭ 60 (+200%)

Mutual labels: big-data

spinmob

Rapid and flexible acquisition, analysis, fitting, and plotting in Python. Designed for scientific laboratories.

Stars: ✭ 34 (+70%)

Mutual labels: scientific-computing

Clickhouse

ClickHouse® is a free analytics DBMS for big data

Stars: ✭ 21,089 (+105345%)

Mutual labels: big-data

PyCannyEdge

Educational Python implementation of the Canny Edge Detector

Stars: ✭ 31 (+55%)

Mutual labels: scientific-computing

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+15120%)

Mutual labels: big-data

Vue Virtual Scroll List

⚡️ A Vue component that supports large data lists with high rendering performance and efficiency.

Stars: ✭ 3,201 (+15905%)

Mutual labels: big-data

paccmann kinase binding residues

Comparison of active site and full kinase sequences for drug-target affinity prediction and molecular generation. Full paper: https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889

Stars: ✭ 29 (+45%)

Mutual labels: protein-ligand-interactions

sequencework

programs and scripts, mainly python, for analyses related to nucleic or protein sequences

Stars: ✭ 22 (+10%)

Mutual labels: protein-sequences

isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Stars: ✭ 28 (+40%)

Mutual labels: apache-spark

Cboard

An easy to use, self-service open BI reporting and BI dashboard platform.