All Projects → microsoft → Mobius

microsoft / Mobius

Licence: mit
C# and F# language binding and extensions to Apache Spark

Programming Languages

csharp
926 projects
fsharp
127 projects

Projects that are alternatives of or similar to Mobius

Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-84.93%)
Mutual labels:  spark, bigdata, apache-spark, spark-streaming, streaming
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+85.25%)
Mutual labels:  spark, bigdata, apache-spark, spark-streaming, streaming
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (-45.21%)
Mutual labels:  dataset, spark, spark-streaming, mapreduce
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-73.41%)
Mutual labels:  spark, apache-spark, spark-streaming, streaming
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-91.07%)
Mutual labels:  spark, apache-spark, spark-streaming
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-88.7%)
Mutual labels:  spark, bigdata, apache-spark
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-44.78%)
Mutual labels:  spark, spark-streaming, streaming
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-76.86%)
Mutual labels:  spark, bigdata, apache-spark
Dpark
Python clone of Spark, a MapReduce alike framework in Python
Stars: ✭ 2,668 (+187.19%)
Mutual labels:  spark, bigdata, mapreduce
Streaming Readings
Streaming System 相关的论文读物
Stars: ✭ 554 (-40.37%)
Mutual labels:  apache-spark, spark-streaming, streaming
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-83.85%)
Mutual labels:  dataframe, spark, apache-spark
Bigdata Notebook
Stars: ✭ 100 (-89.24%)
Mutual labels:  spark, bigdata, streaming
Bigdata Notes
大数据入门指南 ⭐
Stars: ✭ 10,991 (+1083.1%)
Mutual labels:  spark, bigdata, mapreduce
Whylogs Java
Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-82.35%)
Mutual labels:  dataset, spark, apache-spark
qs-hadoop
大数据生态圈学习
Stars: ✭ 18 (-98.06%)
Mutual labels:  bigdata, spark-streaming, mapreduce
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-92.36%)
Mutual labels:  spark, bigdata, mapreduce
Bigdata Interview
🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (-7.75%)
Mutual labels:  spark, bigdata, mapreduce
Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-96.02%)
Mutual labels:  spark, apache-spark, spark-streaming
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-88.05%)
Mutual labels:  spark, apache-spark, dataframe
leaflet heatmap
简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-98.6%)
Mutual labels:  spark, apache-spark, bigdata

Mobius development is deprecated and has been superseded by a more recent version '.NET for Apache Spark' from Microsoft (Website | GitHub) that runs on Azure HDInsight Spark, Amazon EMR Spark, Azure & AWS Databricks.

Mobius logo

Mobius: C# API for Spark

Mobius provides C# language binding to Apache Spark enabling the implementation of Spark driver program and data processing operations in the languages supported in the .NET framework like C# or F#.

For example, the word count sample in Apache Spark can be implemented in C# as follows :

var lines = sparkContext.TextFile(@"hdfs://path/to/input.txt");  
var words = lines.FlatMap(s => s.Split(' '));
var wordCounts = words.Map(w => new Tuple<string, int>(w.Trim(), 1))  
                      .ReduceByKey((x, y) => x + y);  
var wordCountCollection = wordCounts.Collect();  
wordCounts.SaveAsTextFile(@"hdfs://path/to/wordcount.txt");  

A simple DataFrame application using TempTable may look like the following:

var reqDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv");
var metricDataFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv");
reqDataFrame.RegisterTempTable("requests");
metricDataFrame.RegisterTempTable("metrics");
// C0 - guid in requests DataFrame, C3 - guid in metrics DataFrame  
var joinDataFrame = GetSqlContext().Sql(  
    "SELECT joinedtable.datacenter" +
         ", MAX(joinedtable.latency) maxlatency" +
         ", AVG(joinedtable.latency) avglatency " +
    "FROM (" +
       "SELECT a.C1 as datacenter, b.C6 as latency " +  
       "FROM requests a JOIN metrics b ON a.C0  = b.C3) joinedtable " +   
    "GROUP BY datacenter");
joinDataFrame.ShowSchema();
joinDataFrame.Show();

A simple DataFrame application using DataFrame DSL may look like the following:

// C0 - guid, C1 - datacenter
var reqDataFrame = sqlContext.TextFile(@"hdfs://path/to/requests.csv")  
                             .Select("C0", "C1");    
// C3 - guid, C6 - latency   
var metricDataFrame = sqlContext.TextFile(@"hdfs://path/to/metrics.csv", ",", false, true)
                                .Select("C3", "C6"); //override delimiter, hasHeader & inferSchema
var joinDataFrame = reqDataFrame.Join(metricDataFrame, reqDataFrame["C0"] == metricDataFrame["C3"])
                                .GroupBy("C1");
var maxLatencyByDcDataFrame = joinDataFrame.Agg(new Dictionary<string, string> { { "C6", "max" } });
maxLatencyByDcDataFrame.ShowSchema();
maxLatencyByDcDataFrame.Show();

A simple Spark Streaming application that processes messages from Kafka using C# may be implemented using the following code:

StreamingContext sparkStreamingContext = StreamingContext.GetOrCreate(checkpointPath, () =>
    {
      var ssc = new StreamingContext(sparkContext, slideDurationInMillis);
      ssc.Checkpoint(checkpointPath);
      var stream = KafkaUtils.CreateDirectStream(ssc, topicList, kafkaParams, perTopicPartitionKafkaOffsets);
      //message format: [timestamp],[loglevel],[logmessage]
      var countByLogLevelAndTime = stream
                                    .Map(kvp => Encoding.UTF8.GetString(kvp.Value))
                                    .Filter(line => line.Contains(","))
                                    .Map(line => line.Split(','))
                                    .Map(columns => new Tuple<string, int>(
                                                          string.Format("{0},{1}", columns[0], columns[1]), 1))
                                    .ReduceByKeyAndWindow((x, y) => x + y, (x, y) => x - y,
                                                          windowDurationInSecs, slideDurationInSecs, 3)
                                    .Map(logLevelCountPair => string.Format("{0},{1}",
                                                          logLevelCountPair.Key, logLevelCountPair.Value));
      countByLogLevelAndTime.ForeachRDD(countByLogLevel =>
      {
          foreach (var logCount in countByLogLevel.Collect())
              Console.WriteLine(logCount);
      });
      return ssc;
    });
sparkStreamingContext.Start();
sparkStreamingContext.AwaitTermination();

For more code samples, refer to Mobius\examples directory or Mobius\csharp\Samples directory.

API Documentation

Refer to Mobius C# API documentation for the list of Spark's data processing operations supported in Mobius.

API Usage

Mobius API usage samples are available at:

  • Examples folder which contains standalone C# and F# projects that can be used as templates to start developing Mobius applications

  • Samples project which uses a comprehensive set of Mobius APIs to implement samples that are also used for functional validation of APIs

  • Mobius performance test scenarios implemented in C# and Scala for side by side comparison of Spark driver code

Documents

Refer to the docs folder for design overview and other info on Mobius

Build Status

Ubuntu 14.04.3 LTS Windows Unit test coverage
Build status Build status codecov.io

Getting Started

Windows Linux
Build & run unit tests Build in Windows Build in Linux
Run samples (functional tests) in local mode Samples in Windows Samples in Linux
Run examples in local mode Examples in Windows Examples in Linux
Run Mobius app
Run Mobius Shell Not supported yet

Useful Links

Supported Spark Versions

Mobius is built and tested with Apache Spark 1.4.1, 1.5.2, 1.6.* and 2.0.

Releases

Mobius releases are available at https://github.com/Microsoft/Mobius/releases. References needed to build C# Spark driver applicaiton using Mobius are also available in NuGet

NuGet Badge

Refer to mobius-release-info.md for the details on versioning policy and the contents of the release.

License

License

Mobius is licensed under the MIT license. See LICENSE file for full license information.

Community

Issue Stats Issue Stats Join the chat at https://gitter.im/Microsoft/Mobius Twitter

Code of Conduct

This project has adopted the [email protected] with any additional questions or comments.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].