
TensorflowASR


End-to-end speech recognition models implemented in TensorFlow 2, with an RTF (real-time factor) of around 0.1.

Three Mandarin structures are currently integrated: CTC, Transducer, and LAS.

The project is still under active development.

You are welcome to use it and report bugs.

English|中文版

Mel Layer

Modeled on the librosa library, the speech spectrogram feature-extraction layer is implemented with TF2 ops, which makes cross-platform deployment much easier.

Usage:

  • am_data.yml
    use_mel_layer: True
    mel_layer_type: Melspectrogram # or: Spectrogram
    trainable_kernel: True # supports training the kernel; not recommended
    
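For intuition, here is a minimal sketch of how such a layer can be built from raw TF2 signal ops so that feature extraction ships inside the model graph; this is not the project's actual layer, and all parameter defaults below are assumptions:

import tensorflow as tf

class MelSpectrogram(tf.keras.layers.Layer):
    """Waveform [batch, samples] -> log-mel features [batch, frames, n_mels]."""

    def __init__(self, sample_rate=16000, n_mels=80,
                 frame_length=400, frame_step=160, fft_length=512, **kwargs):
        super().__init__(**kwargs)
        self.frame_length = frame_length
        self.frame_step = frame_step
        self.fft_length = fft_length
        # Linear-frequency -> mel-frequency projection matrix.
        self.mel_matrix = tf.signal.linear_to_mel_weight_matrix(
            num_mel_bins=n_mels,
            num_spectrogram_bins=fft_length // 2 + 1,
            sample_rate=sample_rate)

    def call(self, signals):
        stft = tf.signal.stft(signals, self.frame_length,
                              self.frame_step, self.fft_length)
        power = tf.abs(stft) ** 2                  # power spectrogram
        mel = tf.matmul(power, self.mel_matrix)    # project onto mel bins
        return tf.math.log(mel + 1e-6)             # log compression

Making the projection matrix a trainable tf.Variable is what an option like trainable_kernel: True would correspond to.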

Cpp Inference

A C++ demo is provided.

Tested with the TensorFlow C library, version 2.3.0.

See the cppinference directory for details.

Pretrained Model

All results are measured on the AISHELL test set.

RTF (real-time factor) is measured on a single-core CPU decoding task.

AM:

| Model Name | Mel layer (use/train) | Link | Code | Train data | Phoneme CER (%) | Params size | RTF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ConformerCTC(M) | True/False | pan.baidu.com/s/1NPk17DUr0-lBgwCkC5dFuQ | 7qmd | aishell-1 (20 epochs) | 6.2/5.1 | 32M | 0.114 |
| ConformerCTC(S) | True/False | pan.baidu.com/s/1mHR2RryT7Rw0D4I9caY0QQ | 7g3n | aishell-1 (20 epochs) | 9.1/8.7 | 10M | 0.056 |

LM:

| Model Name | O2O (Decoder) | Link | Code | Train data | Txt CER (%) | Model size | Params size | RTF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransformerO2OE | True (False) | pan.baidu.com/s/1X11OE_sk7yNTjtDpU7sfvA | sxrw | aishell-1 text (30 epochs) | 4.4 | 43M | 10M | 0.06 |
| TransformerO2OED | True (True) | pan.baidu.com/s/1acvCRpS2j16dxLoCyToB6A | jrfi | aishell2 text (10k steps) | 6.2 | 217M | 61M | 0.13 |
| Transformer | True (True) | - | - | aishell2 text (10k steps) | 8.6 | 233M | 61M | 0.31 |

Quick start:

Download a pretrained model and edit the output-directory setting in am_data.yml/lm_data.yml (the outdir option under running_config), then create a checkpoints directory inside that outdir.

Put the model_xx.h5 file (xx is a number) into the corresponding checkpoints directory.

Point run-test.py at your config files (am_data.yml, model.yml), then run run-test.py.
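A sketch of the expected layout, assuming outdir under running_config were set to a hypothetical ./am_out:

import os

outdir = './am_out'  # hypothetical outdir from am_data.yml's running_config
os.makedirs(os.path.join(outdir, 'checkpoints'), exist_ok=True)
# Place the downloaded weights at e.g. ./am_out/checkpoints/model_20.h5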

Community

You are welcome to join us to discuss and share issues.

What's New?

  • Optimized some internal logic
  • Changed the RNN-T prediction path to support C++
  • Added a C++ inference demo; see the cppinference directory for details

Supported Structure

  • CTC
  • Transducer
  • LAS
  • MultiTaskCTC

Supported Models

  • Conformer
  • ESPNet: Efficient Spatial Pyramid of Dilated Convolutions
  • DeepSpeech2
  • Transformer (pinyin -> Chinese characters)
    • O2O-Encoder-Decoder: the full transformer structure, with a one-to-one mapping between pinyin and characters, e.g.: pin4 yin4 -> 拼音
    • O2O-Encoder: the same structure without the decoder part
    • Encoder-Decoder: the classic transformer structure

Requirements

  • Python 3.6+
  • TensorFlow 2.2+: pip install tensorflow
  • librosa
  • pypinyin, if you use the default phonemes
  • keras-bert
  • tensorflow-addons, for the LAS structure: pip install tensorflow-addons
  • tqdm
  • jieba
  • wrap_rnnt_loss (optional; provided in ./externals)
  • wrap_ctc_decoders (optional; provided in ./externals)

Usage

  1. Prepare the train lists (a helper sketch for generating them follows this list).

    am_train_list format, where '\t' is a tab character:

    file_path1 \t text1
    file_path2 \t text2
    ……
    

    lm_train_list format:

    text1
    text2
    ……
    
  2. Download the pretrained BERT model, used to assist LM training; skip this step if you don't need the LM:

     https://pan.baidu.com/s/1_HDAhfGZfNhXS-cYoLQucA extraction code: 4hsa
    
  3. Edit the config file am_data.yml (in ./configs) to set the training options, and set the name option in the model yaml (e.g. ./configs/conformer.yml) to choose the model structure.

  4. Then run:

    python train_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml
    
  5. To test, refer to the demo in run-test.py; you can of course adapt the predict method to your own needs:

    from utils.user_config import UserConfig
    from AMmodel.model import AM
    from LMmodel.trm_lm import LM

    am_config = UserConfig(r'./configs/am_data.yml', r'./configs/conformer.yml')
    lm_config = UserConfig(r'./configs/lm_data.yml', r'./configs/transformer.yml')

    am = AM(am_config)
    am.load_model(training=False)

    lm = LM(lm_config)
    lm.load_model(training=False)

    wav_path = './test.wav'  # placeholder: path to the audio file to recognize
    am_result = am.predict(wav_path)
    if am.model_type == 'Transducer':
        am_result = am.decode(am_result[1:-1])
        lm_result = lm.predict(am_result)
        lm_result = lm.decode(lm_result[0].numpy(), lm.word_featurizer)
    else:
        am_result = am.decode(am_result[0])
        lm_result = lm.predict(am_result)
        lm_result = lm.decode(lm_result[0].numpy(), lm.word_featurizer)
    
    
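A hypothetical helper for step 1, generating both train lists from (audio path, transcript) pairs; the paths and transcripts below are made up:

pairs = [
    ('/data/wavs/utt001.wav', '今天天气很好'),
    ('/data/wavs/utt002.wav', '欢迎反馈bug'),
]

with open('am_train_list', 'w', encoding='utf-8') as am_f, \
     open('lm_train_list', 'w', encoding='utf-8') as lm_f:
    for wav_path, text in pairs:
        am_f.write(f'{wav_path}\t{text}\n')  # file_path \t text
        lm_f.write(text + '\n')              # text only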

You can also use the Tester to evaluate your model on a large batch of data:

First, set eval_list in am_data.yml/lm_data.yml; its format is the same as train_list.

Then run:

python eval_am.py --data_config ./configs/am_data.yml --model_config ./configs/conformer.yml

The script reports the SER/CER/DEL/INS/SUB metrics (see the sketch below for how CER is conventionally computed).
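CER is conventionally the Levenshtein (edit) distance between the reference and hypothesis character sequences, normalized by the reference length; DEL/INS/SUB count the individual edit operations. A minimal sketch, independent of the project's own Tester:

def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[-1][-1]

def cer(ref, hyp):
    return edit_distance(ref, hyp) / max(len(ref), 1)

print(cer('今天天气很好', '今天气很不好'))  # 2 edits / 6 chars ≈ 0.33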

Your Model

If you want to add your own model, put it in the ./AMmodel directory; the procedure is the same for acoustic and language models, with language models going into ./LMmodel.

import tensorflow as tf

from AMmodel.transducer_wrap import Transducer
from AMmodel.ctc_wrap import CtcModel
from AMmodel.las_wrap import LAS, LASConfig
class YourModel(tf.keras.Model):
    def __init__(self,……):
        super(YourModel, self).__init__(……)
        ……
    
    def call(self, inputs, training=False, **kwargs):
       
        ……
        return decoded_feature
        
#To CTC
class YourModelCTC(CtcModel):
    def __init__(self,
                ……
                 **kwargs):
        super(YourModelCTC, self).__init__(
        encoder=YourModel(……),num_classes=vocabulary_size,name=name,
        )
        self.time_reduction_factor = reduction_factor  # if you never use a downsampling layer, set this to 1

#To Transducer
class YourModelTransducer(Transducer):
    def __init__(self,
                ……
                 **kwargs):
        super(YourModelTransducer, self).__init__(
            encoder=YourModel(……),
            vocabulary_size=vocabulary_size,
            embed_dim=embed_dim,
            embed_dropout=embed_dropout,
            num_lstms=num_lstms,
            lstm_units=lstm_units,
            joint_dim=joint_dim,
            name=name, **kwargs
        )
        self.time_reduction_factor = reduction_factor  # if you never use a downsampling layer, set this to 1

#To LAS
class YourModelLAS(LAS):
    def __init__(self,
                ……,
                config,  # the config dict in the model yml
                training,
                 **kwargs):
        config['LAS_decoder'].update({'encoder_dim': encoder_dim})  # encoder_dim is your encoder's last output dimension
        decoder_config=LASConfig(**config['LAS_decoder'])

        super(YourModelLAS, self).__init__(
        encoder=YourModel(……),
        config=decoder_config,
        training=training,
        )
        self.time_reduction_factor = reduction_factor  # if you never use a downsampling layer, set this to 1

Then add your model to ./AMmodel/model.py and modify the load_model method so it can import your model.

Convert to pb

The procedure is the same for both AM and LM:

from utils.user_config import UserConfig
from AMmodel.model import AM

am_config = UserConfig('...', '...')
am = AM(am_config)
am.load_model(False)
am.convert_to_pb(export_path)  # export_path: directory to write the exported model to
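The exported model can then be consumed with plain TensorFlow. A small sketch, assuming export_path points at the directory convert_to_pb wrote; which serving signatures exist depends on the model:

import tensorflow as tf

loaded = tf.saved_model.load(export_path)  # load the exported SavedModel
print(list(loaded.signatures.keys()))      # inspect the available signatures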

Tips

If you want to use your own phonemes, adapt the conversion method in am_dataloader.py/lm_dataloader.py accordingly:

def init_text_to_vocab(self):  # keep this method name

    def text_to_vocab_func(txt):
        return your_convert_function

    self.text_to_vocab = text_to_vocab_func  # assign the function itself; do not call it here
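For example, a hypothetical text_to_vocab_func built on pypinyin (already a dependency for the default phonemes), where Style.TONE3 appends the tone digit to each syllable:

from pypinyin import Style, lazy_pinyin

def init_text_to_vocab(self):  # keep this method name

    def text_to_vocab_func(txt):
        # e.g. '拼音' -> ['pin1', 'yin1']
        return lazy_pinyin(txt, style=Style.TONE3)

    self.text_to_vocab = text_to_vocab_func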

Don't forget that your phoneme list must start with S and /S, e.g.:

    S
    /S
    de
    shì
    ……

References

Thanks to the following projects:

https://github.com/usimarit/TiramisuASR (this project is modified from it)

https://github.com/noahchalifour/warp-transducer

https://github.com/PaddlePaddle/DeepSpeech

https://github.com/baidu-research/warp-ctc

Licence

Overall, almost all models here are licensed under Apache 2.0 for all countries in the world.

You are welcome to use this project for academic research and commercial product development alike; however, trading the project itself as a commodity is prohibited.
