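cdp-service: CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, individually personalized marketing.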
Stars: ✭ 30 (-99.76%)
Grouparoo: 🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (-97.33%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-86.87%)
Tez: Apache Tez
Stars: ✭ 313 (-97.5%)
dxram: A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-99.8%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (-68.8%)
Tajo: Mirror of Apache Tajo
Stars: ✭ 128 (-98.98%)
Fluid: Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud
Stars: ✭ 265 (-97.88%)
sgd: An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (-99.56%)
Morpheus: Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-97.58%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (-79.41%)
classifai: 🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-99.21%)
Smooks: An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (-97.66%)
Richdem: High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-98.98%)
Flink: Apache Flink is an open source project of The Apache Software Foundation (ASF). The Apache Flink project originated from the Stratosphere research project.
Stars: ✭ 17,781 (+42.13%)
ytpriv: YT metadata exporter
Stars: ✭ 28 (-99.78%)
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-63.38%)
img2dataset: Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (-90.62%)
Datahub: The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (-66.17%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (-76.83%)
scikit-learn-intelex: Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Stars: ✭ 887 (-92.91%)
bigstatsr: R package for statistical tools with big matrices stored on disk.
Stars: ✭ 139 (-98.89%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-99.06%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-99.89%)
Cmak: CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (-15.72%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-99.11%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-99.09%)
ibmpairs: open source tools for interaction with IBM PAIRS:
Stars: ✭ 23 (-99.82%)
GDLibrary: Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-99.6%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-99.27%)
Pythondata: repo for code published on pythondata.com
Stars: ✭ 113 (-99.1%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (-93.47%)
bullet-core: Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-99.71%)
vxquery: Mirror of Apache VXQuery
Stars: ✭ 19 (-99.85%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (-87.66%)
ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-99.81%)
Keyvi: Keyvi - a key value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-98.63%)
hotmap: WebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-99.9%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-99.13%)
egis: Egis - a handy Ruby interface for AWS Athena
Stars: ✭ 38 (-99.7%)
incubator-tez: Mirror of Apache Tez (Incubating)
Stars: ✭ 60 (-99.52%)
big-sorter: Java library that sorts very large files of records by splitting into smaller sorted files and merging
Stars: ✭ 49 (-99.61%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-99.14%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-99.73%)
lcbo-api: A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (-98.78%)
Maha: A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-99.19%)
clusterdock: clusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-99.79%)
opendc: Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (-99.68%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (-73.18%)
xcast: A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-99.78%)
beam-site: Apache Beam Site
Stars: ✭ 28 (-99.78%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-99.42%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-93.71%)