HadoopDedup: 🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
Stars: ✭ 27 (-82.35%)
merkle-db: High-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (-71.24%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-74.51%)
FIW KRT: Families In the Wild: A Kinship Recognition Toolbox.
Stars: ✭ 18 (-88.24%)
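cdp-service: A CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted marketing personalized to each individual.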
Stars: ✭ 30 (-80.39%)
RemoteShuffleService: Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+71.24%)
dislib: The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-74.51%)
lubeck: High level linear algebra library for Dlang
Stars: ✭ 57 (-62.75%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-89.54%)
mmtf-spark: Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-86.93%)
nebula: A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+5256.86%)
shifting: A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (-79.74%)
scarf: Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-64.71%)
CS Book: 🔥 Latest computer science e-books. Offers downloads of the latest technical e-books. "I simply want to out-compete all of you, or be out-competed by you!"
Stars: ✭ 40 (-73.86%)
metriql: The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+48.37%)
automile-php: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-81.7%)
Quantitative-Big-Imaging-2018: (Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (-67.32%)
ngm: swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-84.97%)
big-data-upf: RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-86.27%)
ytpriv: YT metadata exporter
Stars: ✭ 28 (-81.7%)
dxram: A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-83.66%)
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-56.21%)
img2dataset: Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+666.67%)
yildiz: 🦄🌟 Graph Database layer on top of Google Bigtable
Stars: ✭ 24 (-84.31%)
data-viz-utils: Functions for easily making publication-quality figures with matplotlib.
Stars: ✭ 16 (-89.54%)
GDLibrary: Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-67.32%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-52.94%)
lidbox: End-to-end spoken language identification out of the box.
Stars: ✭ 39 (-74.51%)
lcbo-api: A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (-0.65%)
awesome-tools: Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-79.74%)
gan deeplearning4j: Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-87.58%)
leetspeek: Open and collaborative content from leet hackers!
Stars: ✭ 11 (-92.81%)
IoT-system-PLC-data-to-InfluxDB: This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. InfluxDB is used for data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-83.01%)
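awesome-coder-resources: A refueling station on your programming journey! [Continuously updated; stars welcome, and do come back often] [Contents: programming/learning/reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]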
Stars: ✭ 54 (-64.71%)
FlameStream: Distributed stream processing model and its implementation
Stars: ✭ 14 (-90.85%)
couchdb-pkg: Apache CouchDB Packaging support files
Stars: ✭ 24 (-84.31%)
SWELF: Simple Windows Event Log Forwarder (SWELF). An easy-to-use log forwarder and EVTX parser that simply works. Almost in full release at https://github.com/ceramicskate0/SWELF/releases/latest.
Stars: ✭ 23 (-84.97%)
sgd: An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (-64.05%)
nifi: Deploy a secured, clustered, auto-scaling NiFi service in AWS.
Stars: ✭ 37 (-75.82%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-90.2%)
beam-site: Apache Beam Site
Stars: ✭ 28 (-81.7%)
spark-root: Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-81.7%)
automile-net: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 24 (-84.31%)