mongodb-labs / big-data-exploration

Licence: other
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product

Programming Languages

JavaScript, CSS, CoffeeScript, Python, Java, HTML

This Repository is NOT a supported MongoDB product

MongoDB Big-Data-Exploration Project

This project explores how to investigate and answer questions about big datasets while using MongoDB for storage and computation. Carried out as a summer internship project, it shows how to answer such questions for data stored in MongoDB using MongoDB's frameworks and connector. Both MongoDB's native aggregation framework and Hadoop were used to explore the data.

The data for this project comes from two major sources: the Flights dataset and the Twitter-Memes dataset.

Roadmap

This project is divided into three sections, each with an in-depth wiki page describing our steps and observations:

  • Basic-Flights - Basic analysis of the Flights dataset using the MongoDB Aggregation Framework
  • PageRank-Flights - Computing PageRank over the Flights dataset using the MongoDB MapReduce Framework
  • Twitter-Memes - Computing PageRank over the Twitter-Memes dataset using Hadoop and associated frameworks and languages (such as Apache Pig and Amazon EMR)
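
To give a feel for the PageRank computation mentioned in the last two sections, here is a minimal in-memory sketch written in a map/reduce style. It is not the project's code: the project ran PageRank with MongoDB MapReduce and with Hadoop, whereas this version is plain Python with no database. The airport pairs in `FLIGHTS`, the function names, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
# Hypothetical toy data: (origin, destination) airport pairs, not from the project.
FLIGHTS = [
    ("JFK", "LAX"),
    ("JFK", "ORD"),
    ("LAX", "JFK"),
    ("ORD", "LAX"),
    ("ORD", "JFK"),
]


def build_graph(edges):
    """Map every airport to the sorted list of airports it flies to."""
    graph = {}
    for origin, dest in edges:
        graph.setdefault(origin, set()).add(dest)
        graph.setdefault(dest, set())  # make sure destination-only airports exist
    return {node: sorted(dests) for node, dests in graph.items()}


def map_phase(graph, ranks):
    """Emit (airport, contribution) pairs, one per outgoing route."""
    for node, dests in graph.items():
        yield node, 0.0  # keep airports that receive no contributions
        if dests:
            share = ranks[node] / len(dests)
            for dest in dests:
                yield dest, share
        else:
            # Airport with no outgoing routes: spread its rank over all airports.
            share = ranks[node] / len(graph)
            for other in graph:
                yield other, share


def reduce_phase(pairs, node_count, damping):
    """Sum the contributions per airport and apply the damping factor."""
    totals = {}
    for node, contribution in pairs:
        totals[node] = totals.get(node, 0.0) + contribution
    base = (1.0 - damping) / node_count
    return {node: base + damping * total for node, total in totals.items()}


def pagerank(edges, damping=0.85, iterations=50):
    """Run a fixed number of map/reduce rounds and return airport -> rank."""
    graph = build_graph(edges)
    node_count = len(graph)
    ranks = {node: 1.0 / node_count for node in graph}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), node_count, damping)
    return ranks
```

Calling `pagerank(FLIGHTS)` returns a dictionary of ranks that sum to 1. Each iteration corresponds to one map/reduce round, which is why a distributed version typically chains many jobs, one per iteration.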

Contributors
