Repo 2019BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (-88%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (-88.63%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-90.25%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-90.25%)
Pyspark StubsApache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (-91.16%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+20.76%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin
Stars: ✭ 91 (-91.79%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-94.22%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-94.31%)
Vscode Data PreviewData Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (-77.89%)
Awkward 0.xManipulate arrays of complex data structures as easily as Numpy.
Stars: ✭ 216 (-80.51%)
Parquetjsfully asynchronous, pure JavaScript implementation of the Parquet file format
Stars: ✭ 200 (-81.95%)
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-84.03%)
ParquetviewerSimple windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (-86.91%)
Parquet RsApache Parquet implementation in Rust
Stars: ✭ 144 (-87%)
KartothekA consistent table management library in python
Stars: ✭ 144 (-87%)
Eel SdkBig Data Toolkit for the JVM
Stars: ✭ 140 (-87.36%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+48.19%)
Parquet4sRead and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (-88.72%)
Amazon S3 Find And ForgetAmazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-89.62%)
Parquet GoGo package to read and write parquet files. parquet is a file format to store nested data structures in a flat columnar data format. It can be used in the Hadoop ecosystem and with tools such as Presto and AWS Athena.
Stars: ✭ 114 (-89.71%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-90.16%)
KglabGraph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (-91.16%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-91.25%)
Bigdata File ViewerA cross-platform (Windows, MAC, Linux) desktop application to view common bigdata binary format like Parquet, ORC, AVRO, etc. Support local file system, HDFS, AWS S3, Azure Blob Storage ,etc.
Stars: ✭ 86 (-92.24%)
DrillApache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+46.12%)