Camus: Mirror of LinkedIn's Camus
Stars: ✭ 81 (-79.75%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-77%)
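leaflet heatmap: A simple visualization of phone call data from Huzhou. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation and analysis. Apache Spark processes the data in parallel and is then also used to render the heatmap; leafletjs then loads an OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than a single-machine run, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .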
Stars: ✭ 13 (-96.75%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-70.75%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-55.75%)
Kafka Ui: Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (-42.5%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-98.75%)
God Of Bigdata: Focused on big data learning and interview preparation; start your road to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+1402%)
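Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions I gathered online, along with summaries of my own answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.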
Stars: ✭ 857 (+114.25%)
Kafka Connect Jdbc: Kafka Connect connector for JDBC-compatible databases
Stars: ✭ 698 (+74.5%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-62.5%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-96.5%)
Streamx: kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-76%)
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+28.25%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (-65%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-38.25%)
Kafkactl: Command Line Tool for managing Apache Kafka
Stars: ✭ 177 (-55.75%)
Cp All In One: docker-compose.yml files for cp-all-in-one, cp-all-in-one-community, cp-all-in-one-cloud
Stars: ✭ 239 (-40.25%)
Kafka Sprout: 🚀 Web GUI for Kafka Cluster Management
Stars: ✭ 388 (-3%)
kafkaESK: An event-driven monitoring tool that can consume messages from Apache Kafka clusters and display the aggregated data on a dashboard for analysis and maintenance.
Stars: ✭ 79 (-80.25%)
docker-hadoop: Docker image for main Apache Hadoop components (YARN/HDFS)
Stars: ✭ 59 (-85.25%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+906.75%)
Kop: Kafka-on-Pulsar - A protocol handler that brings native Kafka protocol to Apache Pulsar
Stars: ✭ 159 (-60.25%)
Azkarra Streams: 🚀 Azkarra is a lightweight Java framework that makes it easy to develop, deploy, and manage cloud-native streaming microservices based on Apache Kafka Streams.
Stars: ✭ 146 (-63.5%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-88%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (-68%)
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+853.25%)
Oryx: Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
Stars: ✭ 1,785 (+346.25%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-90.25%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-96%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-96.25%)
hive to es: A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-94.75%)
datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-95.5%)
wasp: WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-95.25%)
aaocp: A big data project for analyzing user behavior logs
Stars: ✭ 53 (-86.75%)
fsbrowser: Fast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-93.25%)
sparkucx: A high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-92%)
clusterdock: clusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-93.5%)
big-data-lite: Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (-89.75%)
Hive: Apache Hive
Stars: ✭ 4,031 (+907.75%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-95%)
py-hdfs-mount: Mount HDFS with FUSE, works with Kerberos!
Stars: ✭ 13 (-96.75%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-91.5%)
Wedatasphere: WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-7%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-72.25%)
firehose: Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
Stars: ✭ 213 (-46.75%)