flink-extended / clink

Licence: other
Clink is a library that provides APIs and infrastructure to facilitate the development of parallelizable feature engineering operators that can be used in both C++ and Java runtimes.

Clink

Clink is a library that provides infrastructure to do the following:

  • Define C++ functions that can be parallelized by the TFRT thread pool.
  • Execute a graph (in the MLIR format) of these C++ functions in parallel.
  • Make C++ functions executable as Java functions using JNA.
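
As a rough illustration of the second point, the sketch below runs two independent functions concurrently and joins their results in a third step, which is the basic shape of a dataflow graph with parallel nodes. It uses std::async from the C++ standard library as a stand-in for the TFRT thread pool; it does not use TFRT, MLIR, or any Clink API, and the names Square, AddOne, and RunGraph are hypothetical.

```cpp
#include <future>

// Hypothetical feature functions; the names are illustrative, not Clink APIs.
double Square(double x) { return x * x; }
double AddOne(double x) { return x + 1.0; }

// Runs two independent nodes concurrently and joins them in a final node.
// std::async stands in for a real thread pool such as the one TFRT provides.
double RunGraph(double input) {
  std::future<double> squared =
      std::async(std::launch::async, Square, input);
  std::future<double> incremented =
      std::async(std::launch::async, AddOne, input);
  // The join node waits for both upstream results before combining them.
  return squared.get() + incremented.get();
}
```

Because Square and AddOne do not depend on each other, the runtime is free to schedule them on different threads; only the final addition has to wait for both.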

Furthermore, Clink provides an off-the-shelf library of reusable Feature Processing functions that can be executed as Java and C++ functions.

Clink is useful when users want to do online feature processing in C++ with low (sub-millisecond) latency, apply the same logic to offline feature processing in Java, and implement this logic only once (in C++).
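
To make the "implement once in C++" idea concrete, here is a minimal sketch of a feature function exported with C linkage. A function exported this way keeps an unmangled symbol name, so a Java binding such as JNA can look it up in a shared library by that name. The function example_min_max_scale and its signature are assumptions made for illustration; they are not part of Clink's function library.

```cpp
#include <cstddef>

// Hypothetical feature function (not a Clink API): rescales each input value
// relative to a known minimum and maximum. extern "C" disables C++ name
// mangling so the symbol can be resolved by name from a shared library.
extern "C" void example_min_max_scale(const double* input, std::size_t size,
                                      double min, double max,
                                      double* output) {
  const double range = max - min;
  for (std::size_t i = 0; i < size; ++i) {
    // Guard against a zero range to avoid dividing by zero.
    output[i] = (range == 0.0) ? 0.0 : (input[i] - min) / range;
  }
}
```

The same compiled function can be called directly from C++ in an online serving path and, once packaged as a shared library, from Java in an offline job, so the scaling logic exists in one place only.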

Getting Started

Prerequisites

Clink uses TFRT as the underlying execution engine and therefore follows TFRT's operating system and installation requirements.

Currently supported operating systems are as follows:

  • Ubuntu 16.04
  • CentOS 7.7.1908

Here are the prerequisites to build and install Clink:

  • Bazel 4.0.0
  • Clang 11.1.0
  • libstdc++8 or greater
  • openjdk-8

Clink provides Dockerfiles and pre-built Docker images that satisfy the installation requirements listed above. Use one of the following commands to build the Docker image for the operating system you intend to use.

$ docker build -t ubuntu:16.04_clink -f docker/Dockerfile_ubuntu_1604 .
$ docker build -t centos:centos7.7.1908_clink -f docker/Dockerfile_centos_77 .

Or you can use one of the following commands to pull the pre-built Docker image from Docker Hub.

$ docker pull docker.io/flinkextended/clink:ubuntu16.04
$ docker pull docker.io/flinkextended/clink:centos7.7.1908

If you plan to set up the Clink environment without the Docker images provided above, please check the TFRT README for more detailed instructions on installing, configuring, and verifying Bazel, Clang, and libstdc++8.

Initializing Submodules before building Clink from Source

After setting up the environment according to the instructions above and cloning the Clink repository, use the following command to initialize submodules such as TFRT before building any Clink target from source.

$ git submodule update --init --recursive

Executing Examples

Users can execute an example graph of Clink C++ functions in parallel using the following command.

$ bazel run //:executor -- `pwd`/mlir_test/executor/basic.mlir --work_queue_type=mstd --host_allocator_type=malloc

Developer Guidelines

Running All Tests

Developers can run the following command to build all targets and to run all tests.

$ bazel test $(bazel query //...) -c dbg

Code Formatting

Changes to Clink C++ code should conform to the Google C++ Style Guide.

Clink uses ClangFormat to check C++ code, diffplug/spotless to check Java code, and Buildifier to check Bazel code.

Please run the following command to format the code before submitting PRs for review.

$ ./tools/format-code.sh

View & Edit Java Code with IDE

Clink provides a Maven configuration that allows users to view or edit the Java code with IDEs such as IntelliJ IDEA. Before an IDE can compile the Java project correctly, users need to set up the Clink repository and then run the following commands to build the required jar and copy it into java-lib/lib/.

$ bazel build //:clink_java_proto
$ cp bazel-bin/libclink_proto-speed.jar java-lib/lib/

Users can then open the java-lib directory with their IDE.
