A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-40.69%)

Mutual labels: parquet

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (-86.9%)

Mutual labels: big-data

Nsoup

NSoup is a .NET port of the jsoup (http://jsoup.org) HTML parser and sanitizer originally written in Java

Stars: ✭ 145 (+0%)

Mutual labels: dot-net

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (+185.52%)

Mutual labels: big-data

OpenHSP

Hot Soup Processor (HSP3)

Stars: ✭ 120 (-17.24%)

Mutual labels: windows-desktop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-96.55%)

Mutual labels: big-data

PysparkCheatsheet

PySpark Cheatsheet

Stars: ✭ 25 (-82.76%)

Mutual labels: apache-spark

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-42.76%)

Mutual labels: apache-spark

couchdb-mango

Mirror of Apache CouchDB Mango

Stars: ✭ 34 (-76.55%)

Mutual labels: big-data

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (+463.45%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (+7171.72%)

Mutual labels: big-data

Spark Doc Zh

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (+676.55%)

Mutual labels: big-data

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+184.83%)

Mutual labels: apache-spark

Spark-for-data-engineers

Apache Spark for data engineers

Stars: ✭ 22 (-84.83%)

Mutual labels: apache-spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+446.9%)

Mutual labels: apache-spark

parquet-usql

A custom extractor designed to read parquet for Azure Data Lake Analytics

Stars: ✭ 13 (-91.03%)

Mutual labels: parquet

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-44.83%)

Mutual labels: big-data

spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now at https://github.com/apache/spark/

Stars: ✭ 609 (+320%)

Mutual labels: apache-spark

Aspia

Remote desktop and file transfer tool.

Stars: ✭ 784 (+440.69%)

Mutual labels: windows-desktop

Open Source Handbook

⭐️ Open source projects for all skill levels

Stars: ✭ 131 (-9.66%)

Mutual labels: big-data

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (-31.72%)

Mutual labels: big-data

Rakam Api

📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)

Stars: ✭ 772 (+432.41%)

Mutual labels: big-data

spark-utils

Basic framework utilities to quickly start writing production ready Apache Spark applications

Stars: ✭ 25 (-82.76%)

Mutual labels: apache-spark

Iotdb

Apache IoTDB

Stars: ✭ 1,221 (+742.07%)

Mutual labels: big-data

Opendata.cern.ch

Source code for the CERN Open Data portal

Stars: ✭ 411 (+183.45%)

Mutual labels: big-data

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+413.79%)

Mutual labels: big-data

OnlineStatsBase.jl

Base types for OnlineStats.

Stars: ✭ 26 (-82.07%)

Mutual labels: big-data

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-21.38%)

Mutual labels: big-data

parquet-dotnet

🐬 Apache Parquet for modern .Net

Stars: ✭ 199 (+37.24%)

Mutual labels: apache-spark

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+4443.45%)

Mutual labels: big-data

Attic Predictionio Template Recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 78 (-46.21%)

Mutual labels: big-data

Couchdb Documentation

Apache CouchDB Documentation

Stars: ✭ 128 (-11.72%)

Mutual labels: big-data

Mysql perf analyzer

MySQL performance monitoring and analysis.

Stars: ✭ 1,423 (+881.38%)

Mutual labels: big-data

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-57.24%)

Mutual labels: big-data

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+180%)

Mutual labels: parquet

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-86.21%)

Mutual labels: big-data

Sakura

SAKURA Editor (Japanese text editor for MS Windows)

Stars: ✭ 689 (+375.17%)

Mutual labels: windows-desktop

Big-Data-Demo

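A data visualization showcase project based on Vue, three.js, and echarts, with features including 3D model import and interaction and 3D model annotation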

Stars: ✭ 146 (+0.69%)

Mutual labels: big-data

Belajarpython.com

Open Source Indonesian Python Programming Tutorial Site

Stars: ✭ 141 (-2.76%)

Mutual labels: big-data

Parquet.jl

Julia implementation of Parquet columnar file format reader

Stars: ✭ 93 (-35.86%)

Mutual labels: parquet

Fo Dicom

Fellow Oak DICOM for .NET, .NET Core, Universal Windows, Android, iOS, Mono and Unity

Stars: ✭ 674 (+364.83%)

Mutual labels: dot-net

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (-58.62%)

Mutual labels: big-data

Spark Website

Apache Spark Website

Stars: ✭ 75 (-48.28%)

Mutual labels: big-data

Cogcomp Nlp

CogComp's Natural Language Processing libraries and demos

Stars: ✭ 410 (+182.76%)

Mutual labels: big-data

Nabhash

An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data

Stars: ✭ 62 (-57.24%)

Mutual labels: big-data

Mockneat

MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.

Stars: ✭ 410 (+182.76%)

Mutual labels: big-data

Decentralized Internet

An SDK/library for decentralized web and distributed computing projects

Stars: ✭ 406 (+180%)

Mutual labels: big-data

Prig

Prig is a lightweight framework for test indirections in .NET Framework.

Stars: ✭ 106 (-26.9%)

Mutual labels: dot-net

Petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.

Stars: ✭ 1,108 (+664.14%)

Mutual labels: parquet

361-420 of 640 similar projects
