Pydis
A simple longslit spectroscopy pipeline in Python
Stars: ✭ 37 (-82.13%)
Winutils
winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
Stars: ✭ 657 (+217.39%)
Core
The safe post-production pipeline - https://getavalon.github.io/2.0
Stars: ✭ 162 (-21.74%)
Argo Cd
Declarative continuous deployment for Kubernetes.
Stars: ✭ 7,887 (+3710.14%)
Supra
SUPRA: Software Defined Ultrasound Processing for Real-Time Applications - An Open Source 2D and 3D Pipeline from Beamforming to B-Mode
Stars: ✭ 96 (-53.62%)
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+197.1%)
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-32.37%)
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-53.62%)
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (+185.02%)
Jenkinsdocs
Jenkins practice documentation. Latest site address: http://www.idevops.site
Stars: ✭ 200 (-3.38%)
Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+545.89%)
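Hadoop study
Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus areas, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated.)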
Stars: ✭ 567 (+173.91%)
Go spider
[Crawler framework (golang)] An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into a customized crawler, or you can use only the default crawl components.
Stars: ✭ 1,745 (+743%)
Deep Forest
An Efficient, Scalable and Optimized Python Framework for Deep Forest (2021.2.1)
Stars: ✭ 547 (+164.25%)
Vistrails
VisTrails is an open-source data analysis and visualization tool. It provides a comprehensive provenance infrastructure that maintains detailed history information about the steps followed and data derived in the course of an exploratory task: VisTrails maintains provenance of data products, of the computational processes that derive these products and their executions.
Stars: ✭ 94 (-54.59%)
Ttyplot
a realtime plotting utility for terminal/console with data input from stdin
Stars: ✭ 532 (+157%)
Machine Learning Models
Decision Trees, Random Forest, Dynamic Time Warping, Naive Bayes, KNN, Linear Regression, Logistic Regression, Mixture Of Gaussian, Neural Network, PCA, SVD, Gaussian Naive Bayes, Fitting Data to Gaussian, K-Means
Stars: ✭ 160 (-22.71%)
Wifi
A big data query and analysis system based on information captured over Wi-Fi
Stars: ✭ 93 (-55.07%)
Machinelearnjs
Machine Learning library for the web and Node.
Stars: ✭ 498 (+140.58%)
Xlearning
AI on Hadoop
Stars: ✭ 1,709 (+725.6%)
Gis Tools For Hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (+134.3%)
Mnemonic
Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Stars: ✭ 91 (-56.04%)
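Pdf
Programming e-books, e-books, and programming books, including C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithm series, computer science, design patterns, software testing, refactoring and optimization, and more categories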
Stars: ✭ 12,009 (+5701.45%)
Chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and Adaboost w/categorical features support for Python
Stars: ✭ 176 (-14.98%)
Gaia
Build powerful pipelines in any programming language.
Stars: ✭ 4,534 (+2090.34%)
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+528.5%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+10551.21%)
The App
Sample application and CD Pipeline for DevOps Dojo
Stars: ✭ 88 (-57.49%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+376.33%)
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (+110.63%)
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+105.8%)
Biglasso
biglasso: Extending Lasso Model Fitting to Big Data in R
Stars: ✭ 87 (-57.97%)
Rush
A cross-platform command-line tool for executing jobs in parallel
Stars: ✭ 421 (+103.38%)
Karton
Distributed malware processing framework based on Python, Redis and MinIO.
Stars: ✭ 134 (-35.27%)
Marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+100%)
Serving
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
Stars: ✭ 403 (+94.69%)
Pex Context
Modern WebGL state wrapper for PEX: allocate GPU resources (textures, buffers), setup state pipelines and passes, and combine them into commands.
Stars: ✭ 117 (-43.48%)
Bio embeddings
Get protein embeddings from protein sequences
Stars: ✭ 86 (-58.45%)
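Pytorch classification
A complete codebase for image classification with PyTorch: training, prediction, TTA, model fusion, model deployment, CNN feature extraction, classification with SVM or random forest, and model distillation.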
Stars: ✭ 395 (+90.82%)
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+789.37%)
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+89.86%)
Clusterflow
A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
Stars: ✭ 85 (-58.94%)
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (+87.92%)
Spacy Wordnet
spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface
Stars: ✭ 156 (-24.64%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+2276.33%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-27.54%)
Lastbackend
System for containerized apps management. From build to scaling.
Stars: ✭ 1,536 (+642.03%)
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+374.4%)
Jsr203 Hadoop
A Java NIO file system provider for HDFS
Stars: ✭ 35 (-83.09%)
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-43.96%)
Cimonitor
Displays CI statuses on a dashboard and triggers fun modules representing the status!
Stars: ✭ 34 (-83.57%)