phrase-at-scale - Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English.
Stars: ✭ 115 (+0%)
OSCI - Open Source Contributor Index
Stars: ✭ 107 (-6.96%)
dislib - The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-66.09%)
siembol - An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+33.04%)
classifai - 🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-13.91%)
streamsx.kafka - Repository for integration with Apache Kafka
Stars: ✭ 13 (-88.7%)
soda-spark - Soda Spark is a PySpark library that helps you test your data in Spark DataFrames.
Stars: ✭ 58 (-49.57%)
spark-utils - Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-78.26%)
beam-site - Apache Beam Site
Stars: ✭ 28 (-75.65%)
DataEngineering - This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-59.13%)
arrow-datafusion - Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+1952.17%)
FIW KRT - Families In the Wild: A Kinship Recognition Toolbox.
Stars: ✭ 18 (-84.35%)
ceja - PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-79.13%)
fink-broker - Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-84.35%)
bullet-core - Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-68.7%)
scarf - Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-53.04%)
predictionio - PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+10778.26%)
LoL-Match-Prediction - Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-70.43%)
shifting - A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (-73.04%)
IoT-system-PLC-data-to-InfluxDB - This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. InfluxDB is used as the data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-77.39%)
bftkv - A distributed key-value store that is tolerant of Byzantine faults.
Stars: ✭ 27 (-76.52%)
spark-connector - A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-88.7%)
DaFlow - Apache Spark based data flow (ETL) framework that supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-79.13%)
yildiz - 🦄🌟 Graph Database layer on top of Google Bigtable
Stars: ✭ 24 (-79.13%)
spark-root - Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-75.65%)
Koalas - Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+2546.96%)
nebula - A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+7026.96%)
ByteSlice - "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-79.13%)
HadoopDedup - 🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
Stars: ✭ 27 (-76.52%)
Hyperspace - An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+113.91%)
Trafodion - Apache Trafodion
Stars: ✭ 242 (+110.43%)
falcon - Mirror of Apache Falcon
Stars: ✭ 95 (-17.39%)
Selinon - An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+106.09%)
Books - A collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Stars: ✭ 222 (+93.04%)
big-data-lite - Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (-64.35%)
spark-operator - Operator for managing Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+12.17%)
data-viz-utils - Functions for easily making publication-quality figures with matplotlib.
Stars: ✭ 16 (-86.09%)
SGDLibrary - MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+43.48%)
geospark - Bring sf to Spark in production
Stars: ✭ 53 (-53.91%)