Lopq: Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
Stars: ✭ 530 (+178.95%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+1122.63%)
Aztk: AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-20%)
Nd4j: Fast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+816.84%)
Powderkeg: Live-coding the cluster!
Stars: ✭ 152 (-20%)
Datacompy: Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-22.63%)
Spark: Firely's open source FHIR server
Stars: ✭ 174 (-8.42%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-26.32%)
Elastiknn: Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
Stars: ✭ 139 (-26.84%)
Quill: Compile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+951.58%)
Cc Pyspark: Process Common Crawl data with Python and Spark
Stars: ✭ 147 (-22.63%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-13.68%)
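Technology Talk: A compilation of knowledge on commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, and reflections.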
Stars: ✭ 12,136 (+6287.37%)
Rasterframes: Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-25.26%)
Vald: A Highly Scalable Distributed Vector Search Engine
Stars: ✭ 158 (-16.84%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-26.84%)
Faiss tips: Some useful tips for faiss
Stars: ✭ 170 (-10.53%)
Pgann: Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
Stars: ✭ 156 (-17.89%)
Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+6185.79%)
Sparkmonitor: Monitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-18.95%)
Spark.jl: Julia binding for Apache Spark
Stars: ✭ 153 (-19.47%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-7.37%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-20.53%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-12.11%)
Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+865.79%)
Roaringbitmap: A better compressed bitset in Java
Stars: ✭ 2,460 (+1194.74%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-21.05%)
Big Whale: Scheduling of offline (batch) jobs such as Spark and Flink, and monitoring of real-time jobs
Stars: ✭ 163 (-14.21%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-7.89%)
Nanopq: Pure python implementation of product quantization for nearest neighbor search
Stars: ✭ 145 (-23.68%)
Azuredatabricksbestpractices: Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (-2.11%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-25.79%)
Vue Info Card: Simple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-16.32%)
Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-26.84%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1225.26%)
Glow: An open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-16.32%)
Isolation Forest: A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-26.84%)
Tarsoslsh: A Java library implementing a practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It implements Locality-sensitive Hashing (LSH) and multi-index hashing for Hamming space.
Stars: ✭ 179 (-5.79%)
Quicksql: A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+858.42%)
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-16.84%)
Apache Spark Node: Node.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-28.42%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+6361.58%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-20%)
Js Spark: Realtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-1.58%)
Kotlin Spark Api: This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-3.68%)
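Sparkstreaming: 💥 🚀 Wraps sparkstreaming to adjust the batch time dynamically (computation runs as soon as data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps sparkstreaming 1.6 - kafka 010 to support SSL.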
Stars: ✭ 179 (-5.79%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+996.84%)