Winutils: winutils.exe, hadoop.dll and hdfs.dll binaries for Hadoop on Windows
Stars: ✭ 657 (-59.42%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-17.36%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-95.86%)
Data Science Career: Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-61.09%)
Tony: TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 626 (-61.33%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-74.43%)
Scala Db Codegen: Scala code/boilerplate generator from a db schema
Stars: ✭ 49 (-96.97%)
Src: A lightweight distributed stream computing framework for Golang
Stars: ✭ 67 (-95.86%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-61.52%)
Mockneat: MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (-74.68%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-97.04%)
Javapdf: 🍣 100 Java e-books and technical books in PDF (take pride in downloading and reading them; be ashamed of merely liking and bookmarking)
Stars: ✭ 609 (-62.38%)
Php Thrift Sql: A PHP library for connecting to Hive or Impala over Thrift
Stars: ✭ 107 (-93.39%)
Treeviz: Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-94.13%)
Scalikejdbc: A tidy SQL-based DB access library for Scala developers. This library naturally wraps JDBC APIs and provides you easy-to-use APIs.
Stars: ✭ 1,139 (-29.65%)
Kafka Streams: equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-62.14%)
Lychee: The most complete and powerful data-binding library and persistence infra for Kotlin 1.3, Android & Splitties Views DSL, JavaFX & TornadoFX, JSON, JDBC & SQLite, SharedPreferences.
Stars: ✭ 102 (-93.7%)
Node Parquet: NodeJS module to access Apache Parquet format files
Stars: ✭ 46 (-97.16%)
Bigdata File Viewer: A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats such as Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-94.69%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-95.99%)
Dist Keras: Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (-62.14%)
Schemacrawler: Free database schema discovery and comprehension tool
Stars: ✭ 1,021 (-36.94%)
Bitalarm: An app to keep track of different cryptocurrencies, written in Dart + Flutter
Stars: ✭ 94 (-94.19%)
Jumbune: Jumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-96.05%)
Oozie: Mirror of Apache Oozie
Stars: ✭ 602 (-62.82%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-77.7%)
Quilt: Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (-37.8%)
Bigtop: Mirror of Apache Bigtop
Stars: ✭ 356 (-78.01%)
Elasticsearch Jdbc: An Elasticsearch-specific SQL interface for Java; no need to tweak your ES instance.
Stars: ✭ 41 (-97.47%)
Devops Roadmap: DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (-78.44%)
Egads: A Java package to automatically detect anomalies in large scale time-series data
Stars: ✭ 997 (-38.42%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+240.52%)
Cloud Volume: Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-96.11%)
Hibernate Springboot: Collection of best practices for Java persistence performance in Spring Boot applications
Stars: ✭ 589 (-63.62%)
Esper Tv: Esper instance for TV news analysis
Stars: ✭ 37 (-97.71%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-93.39%)
Jailer: Database Subsetting and Relational Data Browsing Tool.
Stars: ✭ 576 (-64.42%)
Alluxio: Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+232.24%)
Camus: Mirror of LinkedIn's Camus
Stars: ✭ 81 (-95%)
Datafaker: Datafaker is a large-scale test data and flow test data generation tool. It fakes data and inserts it into varied data sources.
Stars: ✭ 327 (-79.8%)
Fastsql: Database rapid development framework for Java.
Stars: ✭ 100 (-93.82%)
Java Spring Cloud: Distributed tracing for Spring Boot, Cloud and other Spring projects
Stars: ✭ 326 (-79.86%)
Metrics: Measure behavior of Java applications
Stars: ✭ 35 (-97.84%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-95.06%)
Giraph: Mirror of Apache Giraph
Stars: ✭ 569 (-64.85%)
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-96.17%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (-64.85%)
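Hadoop study: Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus areas, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)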
Stars: ✭ 567 (-64.98%)
Nabhash: An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-96.17%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+227.67%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-65.6%)