howl-anderson / Hanzi_char_featurizer

Licence: apache-2.0

A Chinese character feature extractor (featurizer) that extracts pronunciation and glyph features from Chinese characters for use as deep learning features.

Labels

feature-engineering

Projects that are alternatives to, or similar to, Hanzi_char_featurizer

Home Credit Default Risk

Default risk prediction for Home Credit competition - Fast, scalable and maintainable SQL-based feature engineering pipeline

Stars: ✭ 68 (-63.64%)

Mutual labels: feature-engineering

Feature Store for Machine Learning

Stars: ✭ 2,576 (+1277.54%)

Mutual labels: feature-engineering

R package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.

Stars: ✭ 159 (-14.97%)

Mutual labels: feature-engineering

Data transformations for the ML era

Stars: ✭ 96 (-48.66%)

Mutual labels: feature-engineering

A Python library for easy data analysis, visualization, exploration and modeling

Stars: ✭ 123 (-34.22%)

Mutual labels: feature-engineering

Ppdai risk evaluation

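Risk-control model for PPDai from the "Magic Mirror Cup" (魔镜杯) risk-control algorithm competition, scoring close to the winning entry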

Stars: ✭ 144 (-22.99%)

Mutual labels: feature-engineering

Awesome Feature Engineering

A curated list of feature engineering techniques for image and text machine learning

Stars: ✭ 45 (-75.94%)

Mutual labels: feature-engineering

Feature Engineering Handbook

A practical feature engineering handbook

Stars: ✭ 181 (-3.21%)

Mutual labels: feature-engineering

The Data Science Workshop

A New, Interactive Approach to Learning Data Science

Stars: ✭ 126 (-32.62%)

Mutual labels: feature-engineering

Machine Learning Workflow With Python

A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, evaluation

Stars: ✭ 157 (-16.04%)

Mutual labels: feature-engineering

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Stars: ✭ 10,698 (+5620.86%)

Mutual labels: feature-engineering

[UNMAINTAINED] Automated machine learning for analytics & production

Stars: ✭ 1,559 (+733.69%)

Mutual labels: feature-engineering

EvalML is an AutoML library written in python.

Stars: ✭ 145 (-22.46%)

Mutual labels: feature-engineering

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.

Stars: ✭ 86 (-54.01%)

Mutual labels: feature-engineering

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+1014.44%)

Mutual labels: feature-engineering

A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

Stars: ✭ 8,378 (+4380.21%)

Mutual labels: feature-engineering

Complete Life Cycle Of A Data Science Project

Complete-Life-Cycle-of-a-Data-Science-Project

Stars: ✭ 140 (-25.13%)

Mutual labels: feature-engineering

A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.

Stars: ✭ 182 (-2.67%)

Mutual labels: feature-engineering

Linear Prediction Model with Automated Feature Engineering and Selection Capabilities

Stars: ✭ 178 (-4.81%)

Mutual labels: feature-engineering

A recommender system for discovering GitHub repos, built with Apache Spark

Stars: ✭ 149 (-20.32%)

Mutual labels: feature-engineering

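Chinese character featurizer

Many deep learning tasks need features extracted from Chinese characters (pronunciation features, glyph features). This project provides a general-purpose character feature extraction framework with built-in features for pinyin, glyph shape (four-corner code), and radical decomposition.

Feature extractors

Pinyin extractor: uses a character's pinyin as its feature, so characters with similar pronunciation should have similar encodings. Example: 胡 -> hú, 福 -> fú
Glyph (four-corner code) extractor: uses a character's visual shape as its feature, so similar-looking characters should have close encodings. Example: 门 -> 37001, 闩 -> 37101
Radical decomposition extractor: uses a character's breakdown into radicals and components as its feature, so similar-looking characters should have close encodings. Example: 闩 -> ['门', '一'], 闫 -> ['门', '三']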
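
To make the "similar characters should have close encodings" idea concrete, here is a minimal sketch that compares the example values listed above. The helper names `hamming_distance` and `jaccard_similarity` are hypothetical and are not part of hanzi_char_featurizer; the codes and decompositions are copied from the examples, not looked up through the library.

```python
def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError('codes must have the same length')
    return sum(a != b for a, b in zip(code_a, code_b))


def jaccard_similarity(parts_a, parts_b):
    """Share of components two decompositions have in common."""
    set_a, set_b = set(parts_a), set(parts_b)
    return len(set_a & set_b) / len(set_a | set_b)


# Example values taken from the extractor descriptions above.
four_corner = {'门': '37001', '闩': '37101'}
components = {'闩': ['门', '一'], '闫': ['门', '三']}

# The two four-corner codes differ in a single digit.
code_distance = hamming_distance(four_corner['门'], four_corner['闩'])
# The two decompositions share one of three distinct components.
component_overlap = jaccard_similarity(components['闩'], components['闫'])

print(code_distance)
print(component_overlap)
```

A downstream model would normally learn such similarities from the encoded features rather than compute them explicitly; the distances here only illustrate why the encodings carry that signal.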

Usage

from hanzi_char_featurizer import Featurizor

# Create a featurizer and extract features for a two-character string.
featurizor = Featurizor()
result = featurizor.featurize('明天')
print(result)

Output

([['m'], ['t']], [['ing'], ['ian']], [['2'], ['1']], ('6', '1'), ('7', '0'), ('0', '8'), ('2', '0'), ('0', '4'))

Output structure
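
The printed tuple groups values by feature rather than by character. Judging only from the example for '明天', the first three elements look like pinyin initials, finals, and tones (one entry per character), and the remaining five tuples appear to hold, position by position, the digits of each character's four-corner code. That reading is an inference from this single output, not documented behaviour. Under that assumption, the sketch below regroups the values per character; `regroup_by_character` is a hypothetical helper, not a library function.

```python
# The tuple printed for featurize('明天'), copied verbatim from the output.
result = ([['m'], ['t']], [['ing'], ['ian']], [['2'], ['1']],
          ('6', '1'), ('7', '0'), ('0', '8'), ('2', '0'), ('0', '4'))


def regroup_by_character(result):
    """Turn the per-feature layout into one record per character.

    Assumes the layout inferred from the example: initials, finals,
    tones, then one tuple per four-corner digit position.
    """
    initials, finals, tones, *corner_digits = result
    records = []
    for i in range(len(initials)):
        records.append({
            'initial': initials[i][0],
            'final': finals[i][0],
            'tone': tones[i][0],
            # Join the i-th digit of every position into one code string.
            'four_corner': ''.join(position[i] for position in corner_digits),
        })
    return records


records = regroup_by_character(result)
for record in records:
    print(record)
```

Read this way, 明 comes out as m + ing, tone 2, code 67020, and 天 as t + ian, tone 1, code 10804.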

Output to TensorFlow as a Tensor

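import tensorflow as tf

import hanzi_char_featurizer

# Build a feature tensor from the characters in a text file.
feature = hanzi_char_featurizer.featurize_as_tensor('./usage/data.txt')

# tf.Session is the TensorFlow 1.x graph-mode API.
with tf.Session() as sess:
    # Initialize lookup tables before evaluating the feature tensor.
    sess.run(tf.initializers.tables_initializer())
    for _ in range(1):
        print('+' * 20)
        data = sess.run(feature)
        print(data)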

Output

++++++++++++++++++++
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
  1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.
  0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
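
The vector above contains eight 1s among 106 slots, which would be consistent with one one-hot segment per feature (initial, final, tone, and five four-corner digits) concatenated into a single row. That interpretation is a guess from the printed values. The sketch below shows the general technique in plain Python; the vocabularies are deliberately tiny and hypothetical, so the resulting length does not match the library's 106.

```python
# Hypothetical vocabularies for illustration only; the library's real
# vocabularies and their ordering are not shown in this document.
INITIALS = ['b', 'm', 't']
FINALS = ['ian', 'ing']
TONES = ['1', '2', '3', '4', '5']
DIGITS = [str(d) for d in range(10)]


def one_hot(value, vocabulary):
    """Return a vector with a single 1.0 at the value's vocabulary index."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1.0
    return vector


def encode_character(initial, final, tone, four_corner):
    """Concatenate one one-hot segment per feature into a single row."""
    vector = one_hot(initial, INITIALS) + one_hot(final, FINALS) + one_hot(tone, TONES)
    for digit in four_corner:
        vector += one_hot(digit, DIGITS)
    return vector


# Feature values for 明 as they appear in the featurize('明天') output.
vector = encode_character('m', 'ing', '2', '67020')
print(len(vector), sum(vector))
```

Each feature contributes exactly one active slot, so the number of 1s equals the number of features regardless of vocabulary size.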

Companies using hanzi_char_featurizer

TODO

Add the Unicode IDS (Ideographic Description Sequence) representation, taken from iQIYI's FASPell model

Stars: ✭ 187