Flink Boot懒松鼠Flink-Boot 脚手架让Flink全面拥抱Spring生态体系,使得开发者可以以Java WEB开发模式开发出分布式运行的流处理程序,懒松鼠让跨界变得更加简单。懒松鼠旨在让开发者以更底上手成本(不需要理解分布式计算的理论知识和Flink框架的细节)便可以快速编写业务代码实现。为了进一步提升开发者使用懒松鼠脚手架开发大型项目的敏捷的度,该脚手架默认集成Spring框架进行Bean管理,同时将微服务以及WEB开发领域中经常用到的框架集成进来,进一步提升开发速度。比如集成Mybatis ORM框架,Hibernate Validator校验框架,Spring Retry重试框架等,具体见下面的脚手架特性。
Stars: ✭ 209 (+464.86%)
FlinkxBased on Apache Flink. support data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+7064.86%)
learning-sparkTidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-24.32%)
ros hadoopHadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Stars: ✭ 92 (+148.65%)
flokkrDocumentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-18.92%)
Hdfs ShellHDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+216.22%)
DynamometerA tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (+229.73%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (+340.54%)
HiveApache Hive
Stars: ✭ 4,031 (+10794.59%)
Awesome Learning实践源码库:https://github.com/jast90/bigdata 。 微信搜索Jast关注公众号,获取最新技术分享😯。
Stars: ✭ 197 (+432.43%)
ShifuAn end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (+459.46%)
Hive Jdbc Uber JarHive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+408.11%)
Hops ExamplesExamples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (+127.03%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+12281.08%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+59489.19%)
BehemothBehemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+672.97%)
Jsr203 HadoopA Java NIO file system provider for HDFS
Stars: ✭ 35 (-5.41%)
Data Algorithms Book MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+2464.86%)
logparserEasy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Stars: ✭ 139 (+275.68%)
DrillApache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+4275.68%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+4916.22%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+481.08%)
AddaxAddax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+1562.16%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+305.41%)
IbisA pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+4305.41%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+905.41%)
AsakusafwAsakusa Framework
Stars: ✭ 114 (+208.11%)
DatafakerDatafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts to varied data sources. 测试数据生成工具
Stars: ✭ 327 (+783.78%)
smart-data-lakeSmart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+113.51%)
Hadoop Attack LibraryA collection of pentest tools and resources targeting Hadoop environments
Stars: ✭ 228 (+516.22%)
hive-jdbc-driverAn alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-16.22%)
PrestoThe official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+34918.92%)
swordfishOpen-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-5.41%)
Movie recommend基于Spark的电影推荐系统,包含爬虫项目、web网站、后台管理系统以及spark推荐系统
Stars: ✭ 2,092 (+5554.05%)
liquibase-impalaLiquibase extension to add Impala Database support
Stars: ✭ 23 (-37.84%)
hadoop-etl-udfsThe Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-54.05%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-35.14%)
TILToday I Learned
Stars: ✭ 43 (+16.22%)
cobra-policytoolManage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-56.76%)
litemall-dw基于开源Litemall电商项目的大数据项目,包含前端埋点(openresty+lua)、后端埋点;数据仓库(五层)、实时计算和用户画像。大数据平台采用CDH6.3.2(已使用vagrant+ansible脚本化),同时也包含了Azkaban的workflow。
Stars: ✭ 36 (-2.7%)
xxhadoopData Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (+0%)
cloud云计算之hadoop、hive、hue、oozie、sqoop、hbase、zookeeper环境搭建及配置文件
Stars: ✭ 48 (+29.73%)
TitanDataOperationSystem最好的大数据项目。《Titan数据运营系统》,本项目是一个全栈闭环系统,我们有用作数据可视化的web系统,然后用flume-kafaka-flume进行日志的读取,在hive设计数仓,编写spark代码进行数仓表之间的转化以及ads层表到mysql的迁移,使用azkaban进行定时任务的调度,使用技术:Java/Scala语言,Hadoop、Spark、Hive、Kafka、Flume、Azkaban、SpringBoot,Bootstrap, Echart等;
Stars: ✭ 62 (+67.57%)
Bigdata File ViewerA cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.
Stars: ✭ 86 (+132.43%)
ETL-Starter-Kit📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
Stars: ✭ 21 (-43.24%)
Eel SdkBig Data Toolkit for the JVM
Stars: ✭ 140 (+278.38%)
QuicksqlA Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+4821.62%)