Azuredatalake
Samples and Docs for Azure Data Lake Store and Analytics
Stars: 128 (-92.86%)
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: 97 (-94.59%)
Haystack
Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search, and summarization for a wide range of applications.
Stars: 3,409 (+90.13%)
Spark Py Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: 1,338 (-25.38%)
Ndx
Full-text indexing and searching library
Stars: 94 (-94.76%)
Wikiman
Wikiman is an offline search engine for manual pages, Arch Wiki, Gentoo Wiki and other documentation.
Stars: 117 (-93.47%)
Feast
Feature Store for Machine Learning
Stars: 2,576 (+43.67%)
Xinahn Client
An open-source, high-privacy, self-hosted metasearch engine. https://xinahn.com
Stars: 116 (-93.53%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: 137 (-92.36%)
Parquet Mr
Apache Parquet
Stars: 1,278 (-28.72%)
Tinysearch
Tiny, full-text search engine for static websites built with Rust and Wasm
Stars: 1,705 (-4.91%)
Imageclassification
Deep Learning: Image classification, feature visualization and transfer learning with Keras
Stars: 83 (-95.37%)
Richdem
High-performance Terrain and Hydrology Analysis
Stars: 127 (-92.92%)
Dig Etl Engine
Download DIG to run on your laptop or server.
Stars: 81 (-95.48%)
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Stars: 115 (-93.59%)
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: 80 (-95.54%)
Iotdb
Apache IoTDB
Stars: 1,221 (-31.9%)
Just Dashboard
Dashboards using YAML or JSON files
Stars: 1,511 (-15.73%)
Setl
A simple Spark-powered ETL framework that just works
Stars: 79 (-95.59%)
Mobydq
Tool to automate data quality checks on data pipelines
Stars: 123 (-93.14%)
Ambari
Mirror of Apache Ambari
Stars: 1,576 (-12.1%)
Cosmos Search
A next-generation, unbiased, real-time, privacy- and user-focused code search engine for everyone. Join us at https://discourse.opengenus.org/
Stars: 137 (-92.36%)
Genie
Distributed Big Data Orchestration Service
Stars: 1,544 (-13.89%)
Searx
Privacy-respecting metasearch engine
Stars: 10,074 (+461.85%)
100projectsofcode
A list of practical knowledge-building projects.
Stars: 1,183 (-34.02%)
Search Plugins
Search plugins for the search feature
Stars: 1,860 (+3.74%)
Labs
Research on distributed systems
Stars: 73 (-95.93%)
Instantsearch Android
A library of widgets and helpers to build instant-search applications on Android.
Stars: 129 (-92.81%)
Bookkeeper
Apache BookKeeper
Stars: 1,178 (-34.3%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: 109 (-93.92%)
Querqy
Query preprocessor for Java-based search engines (Querqy Core and Solr implementation)
Stars: 122 (-93.2%)
Appdocs
Application Performance Optimization Summary
Stars: 1,169 (-34.8%)
Countly Sdk Cordova
Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: 69 (-96.15%)
Rated Ranking Evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Stars: 134 (-92.53%)
Carbondata
Mirror of Apache CarbonData
Stars: 1,158 (-35.42%)
Flink Shaded
Apache Flink shaded artifacts repository
Stars: 67 (-96.26%)
Dato.rss
The best RSS Search experience you can find
Stars: 122 (-93.2%)
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: 101 (-94.37%)
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Stars: 63 (-96.49%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: 62 (-96.54%)
Vizuka
Explore high-dimensional datasets and how your algorithm handles specific regions.
Stars: 100 (-94.42%)
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: 99 (-94.48%)
Simpleaudioindexer
Search for the times (in seconds) at which words, phrases, or arbitrary regex patterns occur within audio files
Stars: 100 (-94.42%)
Datasets
3,000,000+ Unsplash images made available for research and machine learning
Stars: 1,805 (+0.67%)