Alphapy: Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
Stars: ✭ 564 (-65.4%)
Data Science Portfolio: Portfolio of data science projects completed by me for academic, self-learning, and hobby purposes.
Stars: ✭ 559 (-65.71%)
Logigsk: A Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-93.44%)
Baby Names Analysis: Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Stars: ✭ 557 (-65.83%)
Justenoughscalaforspark: A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (-66.99%)
Bigdata File Viewer: A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats like Parquet, ORC, Avro, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-94.72%)
Utils4s: Test cases and related reference material collected while using Scala and Spark.
Stars: ✭ 1,070 (-34.36%)
Magellan: Geospatial Data Analytics on Spark
Stars: ✭ 507 (-68.9%)
Dat8: General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (-6.99%)
Yfinance: Download market data from Yahoo! Finance's API
Stars: ✭ 6,148 (+277.18%)
Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-94.85%)
Mgflappy Bird: Flappy Bird, a small game in which a flying bird scores by passing obstacles, and Panda, a panda-themed game in which you play a quick, agile panda.
Stars: ✭ 20 (-98.77%)
Gis Tools For Hadoop: The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.
Stars: ✭ 485 (-70.25%)
Gait Recognition: Distance Recognition of a Human Being with Deep CNNs
Stars: ✭ 51 (-96.87%)
Sparktutorial: Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-93.56%)
Lpa Detector: Optimize and improve the label propagation algorithm
Stars: ✭ 75 (-95.4%)
Crime Analysis: Association Rule Mining from Spatial Data for Crime Analysis
Stars: ✭ 20 (-98.77%)
Pandapy: PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster (by @firmai)
Stars: ✭ 474 (-70.92%)
Skoot: A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.
Stars: ✭ 50 (-96.93%)
Spark: Cross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (-71.1%)
Weightedcalcs: Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
Stars: ✭ 83 (-94.91%)
Jqdatasdk: A simple, easy-to-use quantitative finance data package (easy utility for getting financial market data of China)
Stars: ✭ 457 (-71.96%)
Base: https://www.researchgate.net/profile/Rajah_Iyer
Stars: ✭ 48 (-97.06%)
Stock Market Analysis And Prediction: A project on technical analysis, visualization, and prediction using data provided by Google Finance.
Stars: ✭ 112 (-93.13%)
Dupandas: 📊 Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames
Stars: ✭ 20 (-98.77%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (-94.91%)
Yanagishima: Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (-73.99%)
Xyzpy: Efficiently generate and analyse high-dimensional data.
Stars: ✭ 45 (-97.24%)
100 Pandas Puzzles: 100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
Stars: ✭ 1,382 (-15.21%)
Kglab: Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.
Stars: ✭ 98 (-93.99%)
Kodiak: Enhance your feature engineering workflow with Kodiak
Stars: ✭ 20 (-98.77%)
Featran: A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-74.23%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (-74.29%)
Seaborn Tutorial: This repository is my attempt to help Data Science aspirants gain the data visualization skills required to progress in their careers. It includes all the types of plot offered by Seaborn, applied to random datasets.
Stars: ✭ 114 (-93.01%)
Delta Architecture: Streaming data changes to a Data Lake with a Debezium and Delta Lake pipeline
Stars: ✭ 43 (-97.36%)
Tutorial: A summary of the Java full-stack knowledge system
Stars: ✭ 407 (-75.03%)
Yelp dataset challenge: Play around with the Yelp dataset in Python (in progress and very messy repo)
Stars: ✭ 15 (-99.08%)
Labs: Research on distributed systems
Stars: ✭ 73 (-95.52%)
Numsharp: High Performance Computation for N-D Tensors in .NET, similar API to NumPy.
Stars: ✭ 882 (-45.89%)
Df2gspread: Manage Google Spreadsheets as Pandas DataFrames with Python
Stars: ✭ 114 (-93.01%)
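Baidu poi search: A PyQt5-based GUI tool for collecting Baidu Maps points of interest (POIs). It searches for POIs in a specified area by keyword and exports them to an Excel file.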
Stars: ✭ 113 (-93.07%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (-46.13%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-94.05%)
Locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-95.52%)
Tedsds: Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-99.14%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies. Accompanied by a Medium article.
Stars: ✭ 14 (-99.14%)
Kamu Cli: Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-95.77%)
Pandas Profiling: Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+410.98%)