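Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with the author's own answer summaries. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.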
Stars: ✭ 857 (+82.73%)
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-95.52%)
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (-47.76%)
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+98.08%)
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+468.87%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-92.75%)
zdh web
Big data collection and extraction platform
Stars: ✭ 292 (-37.74%)
Panther
Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+88.7%)
Vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualize and explore big tabular data at a billion rows per second 🚀
Stars: ✭ 6,793 (+1348.4%)
gan deeplearning4j
Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-95.95%)
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-95.74%)
Arvados
An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (-41.58%)
Dotnext
Next generation API for .NET
Stars: ✭ 379 (-19.19%)
Openpbs
An HPC workload manager and job scheduler for desktops, clusters, and clouds.
Stars: ✭ 427 (-8.96%)
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (-20.04%)
Tensorflowonspark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+699.15%)
Onepanel
The open and extensible integrated development environment (IDE) for computer vision with built-in modules for model building, automated labeling, data processing, model training, hyperparameter tuning and workflow orchestration.
Stars: ✭ 428 (-8.74%)
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-20.68%)
Articles
A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (-25.37%)
Syn
A global Process Registry and Process Group manager for Erlang and Elixir.
Stars: ✭ 412 (-12.15%)
Nfx
C# Server UNISTACK framework [MOVED]
Stars: ✭ 379 (-19.19%)
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (-7.04%)
Swarmlet
A self-hosted, open-source Platform as a Service that enables easy swarm deployments, load balancing, automatic SSL, metrics, analytics and more.
Stars: ✭ 373 (-20.47%)
Pglogical
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (-2.99%)
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (-20.68%)
Minimesos
The experimentation and testing tool for Apache Mesos - NO LONGER MAINTAINED!
Stars: ✭ 429 (-8.53%)
Aistore
AIStore: scalable storage for AI applications
Stars: ✭ 367 (-21.75%)
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (-1.92%)
Sidekick
High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (-21.96%)
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-9.17%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-23.03%)
Diplomat
An HTTP Ruby API for Consul
Stars: ✭ 358 (-23.67%)
Victoriametrics
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
Stars: ✭ 5,558 (+1085.07%)
Jigsaw
Jigsaw七巧板 provides a set of web components based on Angular5/8/9+. The main purpose of Jigsaw is to help application developers build complex, interaction-intensive, user-friendly web pages. Jigsaw supports the development of all applications of the Big Data Product of ZTE.
Stars: ✭ 354 (-24.52%)
Nodejsstarterkit
Starter Kit for Node.js v14.x, minimum dependencies 🚀
Stars: ✭ 348 (-25.8%)
Smartcode
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Stars: ✭ 464 (-1.07%)
Smudge
A lightweight library that provides group member discovery, status dissemination, and failure detection using the SWIM epidemic protocol.
Stars: ✭ 458 (-2.35%)
Minikube
Run Kubernetes locally
Stars: ✭ 22,673 (+4734.33%)
Actionai
Custom human activity recognition modules by pose estimation and cascaded inference using the sklearn API
Stars: ✭ 404 (-13.86%)
Datawave
DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
Stars: ✭ 347 (-26.01%)
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (-27.08%)
K8s Multicluster Ingress
kubemci: Command line tool to configure L7 load balancers using multiple kubernetes clusters
Stars: ✭ 345 (-26.44%)
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-26.44%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+4601.07%)
Akka.net
Port of Akka actors for .NET
Stars: ✭ 4,024 (+758%)
Api.rss
RSS as RESTful. This service allows you to transform an RSS feed into an awesome API.
Stars: ✭ 340 (-27.51%)
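Webkettle
A professional-edition browser/server (B/S) architecture tool, built on the web version of Kettle, for distributed integrated scheduling, management, and ETL development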
Stars: ✭ 334 (-28.78%)
Kube Spawn
A tool for creating multi-node Kubernetes clusters on a Linux machine using kubeadm & systemd-nspawn. Brought to you by the Kinvolk team.
Stars: ✭ 392 (-16.42%)
Kontraktor
Distributed Actors for Java 8 / JavaScript
Stars: ✭ 333 (-29%)
Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (-2.35%)