Isolation Forest
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (+9.45%)
Archivespark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-12.6%)
Linkedrw
A simple CLI to create your resume and personal website based on your LinkedIn profile or a JSON file
Stars: ✭ 104 (-18.11%)
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-23.62%)
Logigsk
A Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-15.75%)
Spring Shiro Spark
Spring-Shiro-Spark is an attempt at integrating Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-10.24%)
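Flink Learning
flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and others, plus case studies of large Flink projects in production (PVUV, log storage, real-time deduplication of ten-billion-scale data, monitoring and alerting). Support for my column "Big Data Real-Time Computing Engine Flink: Practice and Performance Optimization" is welcome.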
Stars: ✭ 11,378 (+8859.06%)
Truvisory
This project provides resources for users who want to find good LinkedIn posts that contain material for learning any technology, design, self-branding, motivation, etc. You can visit the project by:
Stars: ✭ 116 (-8.66%)
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-17.32%)
Zparkio
Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-4.72%)
Spark Ffm
FFM (Field-aware Factorization Machine) on Spark
Stars: ✭ 101 (-20.47%)
Xlearning Xdml
Extremely distributed machine learning
Stars: ✭ 113 (-11.02%)
Avro2tf
Avro2TF is designed to fill the gap of making users' training data ready to be consumed by deep learning training frameworks.
Stars: ✭ 125 (-1.57%)
Spark Py Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+953.54%)
Lambda Arch
Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-12.6%)
Kinesis Sql
Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-5.51%)
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-14.17%)
Ammonite Spark
Run Spark calculations from Ammonite
Stars: ✭ 88 (-30.71%)
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-14.96%)
Cube.js
📊 Cube: Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+9335.43%)
Seldon Server
Machine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+1029.92%)
Deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+1490.55%)
Spark On K8s Operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+1301.57%)
Spark Lucenerdd
Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-10.24%)
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-17.32%)
Scala Samples
Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with it
Stars: ✭ 125 (-1.57%)
Almond
A Scala kernel for Jupyter
Stars: ✭ 1,354 (+966.14%)
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-11.81%)
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-23.62%)
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-0.79%)
Elephas
Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+1097.64%)
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-27.56%)
Teddy
Spark Streaming monitoring platform that supports job deployment, alerting, and automatic startup
Stars: ✭ 120 (-5.51%)
Big Data
🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-26.77%)
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+1361.42%)
Daily Coding Problem
Series of problems 💯 and solutions ✅ asked by the Daily Coding Problem 👨🎓 website.
Stars: ✭ 90 (-29.13%)
Spark Infotheoretic Feature Selection
This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-3.15%)
Linkedin Api Php Client
LinkedIn API PHP SDK with OAuth 2 support. Can be used for social sign-in or sharing on LinkedIn. Has good usage examples.
Stars: ✭ 88 (-30.71%)
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-13.39%)
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-30.71%)
Elassandra
Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+1167.72%)
Dex Test Parser
Find all test methods in an Android instrumentation APK
Stars: ✭ 87 (-31.5%)
Cape Python
Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-1.57%)
Spark Bigquery Connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-0.79%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-3.94%)
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+1183.46%)