
bert4torch


Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch | bert4vector

Table of Contents

1. Installation

Install the stable release

pip install bert4torch

Install the latest version

pip install git+https://github.com/Tongjilibo/bert4torch
  • Note: PyPI releases lag behind the development version on git; when you git clone, mind the import paths and check whether the weights need conversion
  • Test examples: git clone https://github.com/Tongjilibo/bert4torch, then edit the pretrained-model and data paths in the examples to run the scripts
  • Training on your own data: modify the corresponding data-processing code blocks
  • Development environment: originally developed on torch==1.10, now developed on torch 2.0; feedback is welcome if other versions turn out to be incompatible (a quick install check follows this list)
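
A quick sanity check that the install worked, as a minimal sketch; it assumes the package exposes __version__, as recent releases do:

    import bert4torch
    print(bert4torch.__version__)  # e.g. 0.5.3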

2. Features

  • LLM models: load open-source LLM weights such as chatglm, llama, baichuan, ziya, and bloom for inference and fine-tuning; deploy an LLM with a single command line

  • Core capability: load pretrained weights for bert, roberta, albert, xlnet, nezha, bart, RoFormer, RoFormer_V2, ELECTRA, GPT, GPT2, T5, GAU-alpha, ERNIE, etc. for further finetuning, and flexibly define your own model on top of bert

  • Rich examples: solutions for llm, pretrain, sentence_classfication, sentence_embedding, sequence_labeling, relation_extraction, seq2seq, serving, and more

  • Experimentally verified: validated on public datasets; see the experiment metrics for the examples datasets

  • Plug-and-play tricks: common training tricks integrated, ready to use

  • Other features: use models from the transformers library alongside bert4torch; concise and efficient calling conventions; a live training progress bar; parameter counts via torchinfo; default Logger and Tensorboard for easy training logging; a customizable fit loop for advanced needs (see the training sketch after the comparison table below)

  • Training process (animated demo)

| Feature | bert4torch | transformers | Notes |
| --- | --- | --- | --- |
| Training progress bar | ✅ | ✅ | progress bar prints loss and user-defined metrics |
| Distributed training dp/ddp | ✅ | ✅ | torch's built-in dp/ddp |
| Various callbacks | ✅ | ✅ | logging/tensorboard/earlystop/wandb, etc. |
| LLM inference with stream/batch output | ✅ | ✅ | shared across models, no per-model scripts to maintain |
| LLM fine-tuning | ✅ | ✅ | lora relies on the peft library; ptuning-v2 is built in |
| Rich tricks | ✅ | ❌ | adversarial training and other tricks, plug and play |
| Concise, readable code with room for customization | ✅ | ❌ | high code reuse, keras-style training code |
| Repo maintenance capacity/influence/usage/compatibility | ❌ | ✅ | this repo is currently maintained by one person |
| One-command LLM deployment | ✅ | ❌ | |
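
The keras-style training flow mentioned above comes from torch4keras: models inheriting BaseModel get compile/fit. Below is a minimal sketch in the spirit of the repo's sentence-classification examples; the paths, label count, and train_dataloader are placeholders, and the exact compile/fit signatures may differ slightly across versions:

    import torch.nn as nn
    import torch.optim as optim
    from bert4torch.models import build_transformer_model, BaseModel

    class Model(BaseModel):
        def __init__(self):
            super().__init__()
            # with_pool=True also returns the pooled [CLS] output
            self.bert = build_transformer_model('./model/bert4torch_config.json',
                                                './model/pytorch_model.bin', with_pool=True)
            self.dense = nn.Linear(768, 2)  # binary classification head (placeholder)

        def forward(self, token_ids, segment_ids):
            hidden_states, pooled_output = self.bert([token_ids, segment_ids])
            return self.dense(pooled_output)

    model = Model()
    # keras-style: compile once, then fit on a dataloader yielding (inputs, labels)
    model.compile(loss=nn.CrossEntropyLoss(), optimizer=optim.Adam(model.parameters(), lr=2e-5))
    model.fit(train_dataloader, epochs=3)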

3. Quick Start

3.1 Tutorials

3.2 Deploy an LLM service from the command line

  • Local / online loading
    # download all files from the hub
    bert4torch-llm-server --checkpoint_path Qwen2-0.5B-Instruct
    
    # load a local model and download bert4torch_config.json from the hub
    bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --config_path Qwen/Qwen2-0.5B-Instruct
    
    # load a local model with bert4torch_config.json already downloaded into the same directory
    bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct
  • Command line / gradio web page / openai_api (a client sketch follows this list)
    # command line
    bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode cli
    
    # gradio web page
    bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode gradio
    
    # openai_api
    bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode openai
  • CLI chat example (animated demo)
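
With --mode openai the service speaks the OpenAI chat-completions protocol, so the standard openai client can talk to it. A minimal sketch, assuming the server listens on localhost:8000 with a /v1 base path (the host, port, and model name here are assumptions; check the server's startup log):

    from openai import OpenAI
    
    # the local server ignores the API key, but the client requires one
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="Qwen2-0.5B-Instruct",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)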

4. Versions and Update History

4.1 Release History

| Date | bert4torch | torch4keras | Release notes |
| --- | --- | --- | --- |
| 20240814 | 0.5.3 | 0.2.6 | [New] add llama3.1/Yi1.5; automatically choose to download from hf-mirror; add the bert4torch-llm-server command-line entry |
| 20240801 | 0.5.2 | 0.2.5 | [New] function call support for the chatglm/qwen series, add the internlm2 series; [Minor] simplify the chat demo invocation in pipeline, allow generate's stop tokens to be lists, unify the rope_scaling argument name, add rope variants; [Bugfix] fix a flash_attn2 inference bug and a bart tie_word_embedding bug |
| 20240619 | 0.5.1 | 0.2.4 | add Qwen1.5, Qwen2, glm4; add SWA/convert_lm_logits_dtype; adjust the trainers (notably DPOTrainer); segment_ids in generation; repetition_penalty now requires the query; fix a dtype-cast bug in RMSNorm |

More versions

4.2 Update History

More history

5. Pretrained Weights

  • Pretrained models can be loaded in several ways (an inference sketch follows the block below)

    from bert4torch.models import build_transformer_model
    
    # 1. config_path only: initialize the model structure from scratch, without loading pretrained weights
    model = build_transformer_model('./model/bert4torch_config.json')
    
    # 2. checkpoint_path only:
    ## 2.1 directory path: automatically finds the *.bin/*.safetensors weight files in it; bert4torch_config.json must be downloaded into that directory
    model = build_transformer_model(checkpoint_path='./model')
    
    ## 2.2 file path/list: the path(s) of the weight file(s); bert4torch_config.json is looked up in the same directory
    model = build_transformer_model(checkpoint_path='./pytorch_model.bin')
    
    ## 2.3 model_name: the name of pretrained weights on HF; the weights and the bert4torch_config.json file are downloaded automatically
    model = build_transformer_model(checkpoint_path='bert-base-chinese')
    
    # 3. both config_path and checkpoint_path (any combination of local paths and model_names):
    #    local paths load locally; a pretrained_model_name downloads from the hub
    config_path = './model/bert4torch_config.json'  # or 'bert-base-chinese'
    checkpoint_path = './model/pytorch_model.bin'  # or 'bert-base-chinese'
    model = build_transformer_model(config_path, checkpoint_path)
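
Once built, the model runs like an ordinary torch module. A minimal inference sketch using the bert4keras-style Tokenizer from bert4torch.tokenizers (the paths are placeholders, and the output shape assumes the default configuration without pooling):

    import torch
    from bert4torch.models import build_transformer_model
    from bert4torch.tokenizers import Tokenizer
    
    tokenizer = Tokenizer('./model/vocab.txt', do_lower_case=True)
    model = build_transformer_model(checkpoint_path='./model')
    
    # encode returns token ids and segment ids, bert4keras-style
    token_ids, segment_ids = tokenizer.encode('语言模型')
    with torch.no_grad():
        hidden_states = model([torch.tensor([token_ids]), torch.tensor([segment_ids])])
    print(hidden_states.shape)  # [1, seq_len, hidden_size]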
  • Pretrained weight links and bert4torch_config.json

| Category | Model | Source | Weight link / checkpoint_path | config_path |
| --- | --- | --- | --- | --- |
| bert | bert-base-chinese | google-bert | bert-base-chinese | bert-base-chinese |
| | chinese_L-12_H-768_A-12 | Google | TF weights<br>Tongjilibo/bert-chinese_L-12_H-768_A-12 | |
| | chinese-bert-wwm-ext | HFL | hfl/chinese-bert-wwm-ext | chinese-bert-wwm-ext |
| | bert-base-multilingual-cased | google-bert | bert-base-multilingual-cased | bert-base-multilingual-cased |
| | MacBERT | HFL | hfl/chinese-macbert-base<br>hfl/chinese-macbert-large | chinese-macbert-base<br>chinese-macbert-large |
| | WoBERT | Zhuiyi Technology | junnyu/wobert_chinese_base<br>junnyu/wobert_chinese_plus_base | wobert_chinese_base<br>wobert_chinese_plus_base |
| roberta | chinese-roberta-wwm-ext | HFL | hfl/chinese-roberta-wwm-ext<br>hfl/chinese-roberta-wwm-ext-large (the large model's MLM weights are randomly initialized) | chinese-roberta-wwm-ext<br>chinese-roberta-wwm-ext-large |
| | roberta-small/tiny | Zhuiyi Technology | Tongjilibo/chinese_roberta_L-4_H-312_A-12<br>Tongjilibo/chinese_roberta_L-6_H-384_A-12 | |
| | roberta-base | FacebookAI | roberta-base | roberta-base |
| | guwenbert | ethanyt | ethanyt/guwenbert-base | guwenbert-base |
| albert | albert_zh<br>albert_pytorch | brightmart | voidful/albert_chinese_tiny<br>voidful/albert_chinese_small<br>voidful/albert_chinese_base<br>voidful/albert_chinese_large<br>voidful/albert_chinese_xlarge<br>voidful/albert_chinese_xxlarge | albert_chinese_tiny<br>albert_chinese_small<br>albert_chinese_base<br>albert_chinese_large<br>albert_chinese_xlarge<br>albert_chinese_xxlarge |
| nezha | NEZHA<br>NeZha_Chinese_PyTorch | huawei_noah | sijunhe/nezha-cn-base<br>sijunhe/nezha-cn-large<br>sijunhe/nezha-base-wwm<br>sijunhe/nezha-large-wwm | nezha-cn-base<br>nezha-cn-large<br>nezha-base-wwm<br>nezha-large-wwm |
| | nezha_gpt_dialog | bojone | Tongjilibo/nezha_gpt_dialog | |
| xlnet | Chinese-XLNet | HFL | hfl/chinese-xlnet-base | chinese-xlnet-base |
| | transformer_xl | huggingface | transfo-xl/transfo-xl-wt103 | transfo-xl-wt103 |
| deberta | Erlangshen-DeBERTa-v2 | IDEA | IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese<br>IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese | Erlangshen-DeBERTa-v2-97M-Chinese<br>Erlangshen-DeBERTa-v2-320M-Chinese<br>Erlangshen-DeBERTa-v2-710M-Chinese |
| electra | Chinese-ELECTRA | HFL | hfl/chinese-electra-base-discriminator | chinese-electra-base-discriminator |
| ernie | ernie | Baidu Wenxin | nghuyong/ernie-1.0-base-zh<br>nghuyong/ernie-3.0-base-zh | ernie-1.0-base-zh<br>ernie-3.0-base-zh |
| roformer | roformer | Zhuiyi Technology | junnyu/roformer_chinese_base | roformer_chinese_base |
| | roformer_v2 | Zhuiyi Technology | junnyu/roformer_v2_chinese_char_base | roformer_v2_chinese_char_base |
| simbert | simbert | Zhuiyi Technology | Tongjilibo/simbert-chinese-base<br>Tongjilibo/simbert-chinese-small<br>Tongjilibo/simbert-chinese-tiny | |
| | simbert_v2/roformer-sim | Zhuiyi Technology | junnyu/roformer_chinese_sim_char_base<br>junnyu/roformer_chinese_sim_char_ft_base<br>junnyu/roformer_chinese_sim_char_small<br>junnyu/roformer_chinese_sim_char_ft_small | roformer_chinese_sim_char_base<br>roformer_chinese_sim_char_ft_base<br>roformer_chinese_sim_char_small<br>roformer_chinese_sim_char_ft_small |
| gau | GAU-alpha | Zhuiyi Technology | Tongjilibo/chinese_GAU-alpha-char_L-24_H-768 | |
| uie | uie<br>uie_pytorch | Baidu | Tongjilibo/uie-base | |
| gpt | CDial-GPT | thu-coai | thu-coai/CDial-GPT_LCCC-base<br>thu-coai/CDial-GPT_LCCC-large | CDial-GPT_LCCC-base<br>CDial-GPT_LCCC-large |
| | cpm_lm (2.6B) | Tsinghua | TsinghuaAI/CPM-Generate | CPM-Generate |
| | nezha_gen | huawei_noah | Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12 | |
| | gpt2-chinese-cluecorpussmall | UER | uer/gpt2-chinese-cluecorpussmall | gpt2-chinese-cluecorpussmall |
| | gpt2-ml | imcaspar | torch<br>BaiduYun(84dh) | gpt2-ml_15g_corpus<br>gpt2-ml_30g_corpus |
| bart | bart_base_chinese | Fudan fnlp | fnlp/bart-base-chinese<br>v1.0 | bart-base-chinese<br>bart-base-chinese-v1.0 |
| t5 | t5 | UER | uer/t5-small-chinese-cluecorpussmall<br>uer/t5-base-chinese-cluecorpussmall | t5-base-chinese-cluecorpussmall<br>t5-small-chinese-cluecorpussmall |
| | mt5 | Google | google/mt5-base | mt5-base |
| | t5_pegasus | Zhuiyi Technology | Tongjilibo/chinese_t5_pegasus_small<br>Tongjilibo/chinese_t5_pegasus_base | |
| | chatyuan | clue-ai | ClueAI/ChatYuan-large-v1<br>ClueAI/ChatYuan-large-v2 | ChatYuan-large-v1<br>ChatYuan-large-v2 |
| | PromptCLUE | clue-ai | ClueAI/PromptCLUE-base | PromptCLUE-base |
| chatglm | chatglm-6b | THUDM | THUDM/chatglm-6b<br>THUDM/chatglm-6b-int8<br>THUDM/chatglm-6b-int4<br>v0.1.0 | chatglm-6b<br>chatglm-6b-int8<br>chatglm-6b-int4<br>chatglm-6b-v0.1.0 |
| | chatglm2-6b | THUDM | THUDM/chatglm2-6b<br>THUDM/chatglm2-6b-int4<br>THUDM/chatglm2-6b-32k | chatglm2-6b<br>chatglm2-6b-int4<br>chatglm2-6b-32k |
| | chatglm3-6b | THUDM | THUDM/chatglm3-6b<br>THUDM/chatglm3-6b-32k | chatglm3-6b<br>chatglm3-6b-32k |
| | glm4-9b | THUDM | THUDM/glm-4-9b<br>THUDM/glm-4-9b-chat<br>THUDM/glm-4-9b-chat-1m | glm-4-9b<br>glm-4-9b-chat<br>glm-4-9b-chat-1m |
| llama | llama | meta | | llama-7b<br>llama-13b |
| | llama-2 | meta | meta-llama/Llama-2-7b-hf<br>meta-llama/Llama-2-7b-chat-hf<br>meta-llama/Llama-2-13b-hf<br>meta-llama/Llama-2-13b-chat-hf | Llama-2-7b-hf<br>Llama-2-7b-chat-hf<br>Llama-2-13b-hf<br>Llama-2-13b-chat-hf |
| | llama-3 | meta | meta-llama/Meta-Llama-3-8B<br>meta-llama/Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B<br>Meta-Llama-3-8B-Instruct |
| | llama-3.1 | meta | meta-llama/Meta-Llama-3.1-8B<br>meta-llama/Meta-Llama-3.1-8B-Instruct | Meta-Llama-3.1-8B<br>Meta-Llama-3.1-8B-Instruct |
| | Chinese-LLaMA-Alpaca | HFL | | chinese_alpaca_plus_7b<br>chinese_llama_plus_7b |
| | Chinese-LLaMA-Alpaca-2 | HFL | to be added | |
| | Chinese-LLaMA-Alpaca-3 | HFL | to be added | |
| | Belle_llama | LianjiaTech | BelleGroup/BELLE-LLaMA-7B-2M-enc<br>(merge instructions) | BELLE-LLaMA-7B-2M-enc |
| | Ziya | IDEA-CCNL | IDEA-CCNL/Ziya-LLaMA-13B-v1<br>IDEA-CCNL/Ziya-LLaMA-13B-v1.1<br>IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1 | Ziya-LLaMA-13B-v1<br>Ziya-LLaMA-13B-v1.1 |
| | Baichuan | baichuan-inc | baichuan-inc/Baichuan-7B<br>baichuan-inc/Baichuan-13B-Base<br>baichuan-inc/Baichuan-13B-Chat | Baichuan-7B<br>Baichuan-13B-Base<br>Baichuan-13B-Chat |
| | Baichuan2 | baichuan-inc | baichuan-inc/Baichuan2-7B-Base<br>baichuan-inc/Baichuan2-7B-Chat<br>baichuan-inc/Baichuan2-13B-Base<br>baichuan-inc/Baichuan2-13B-Chat | Baichuan2-7B-Base<br>Baichuan2-7B-Chat<br>Baichuan2-13B-Base<br>Baichuan2-13B-Chat |
| | vicuna | lmsys | lmsys/vicuna-7b-v1.5 | vicuna-7b-v1.5 |
| | Yi | 01-ai | 01-ai/Yi-6B<br>01-ai/Yi-6B-200K<br>01-ai/Yi-9B<br>01-ai/Yi-9B-200K | Yi-6B<br>Yi-6B-200K<br>Yi-9B<br>Yi-9B-200K |
| | Yi-1.5 | 01-ai | 01-ai/Yi-1.5-6B<br>01-ai/Yi-1.5-6B-Chat<br>01-ai/Yi-1.5-9B<br>01-ai/Yi-1.5-9B-32K<br>01-ai/Yi-1.5-9B-Chat<br>01-ai/Yi-1.5-9B-Chat-16K | Yi-1.5-6B<br>Yi-1.5-6B-Chat<br>Yi-1.5-9B<br>Yi-1.5-9B-32K<br>Yi-1.5-9B-Chat<br>Yi-1.5-9B-Chat-16K |
| bloom | bloom | bigscience | bigscience/bloom-560m<br>bigscience/bloomz-560m | bloom-560m<br>bloomz-560m |
| Qwen | Qwen | Alibaba Cloud | Qwen/Qwen-1_8B<br>Qwen/Qwen-1_8B-Chat<br>Qwen/Qwen-7B<br>Qwen/Qwen-7B-Chat<br>Qwen/Qwen-14B<br>Qwen/Qwen-14B-Chat | Qwen-1_8B<br>Qwen-1_8B-Chat<br>Qwen-7B<br>Qwen-7B-Chat<br>Qwen-14B<br>Qwen-14B-Chat |
| | Qwen1.5 | Alibaba Cloud | Qwen/Qwen1.5-0.5B<br>Qwen/Qwen1.5-0.5B-Chat<br>Qwen/Qwen1.5-1.8B<br>Qwen/Qwen1.5-1.8B-Chat<br>Qwen/Qwen1.5-7B<br>Qwen/Qwen1.5-7B-Chat<br>Qwen/Qwen1.5-14B<br>Qwen/Qwen1.5-14B-Chat | Qwen1.5-0.5B<br>Qwen1.5-0.5B-Chat<br>Qwen1.5-1.8B<br>Qwen1.5-1.8B-Chat<br>Qwen1.5-7B<br>Qwen1.5-7B-Chat<br>Qwen1.5-14B<br>Qwen1.5-14B-Chat |
| | Qwen2 | Alibaba Cloud | Qwen/Qwen2-0.5B<br>Qwen/Qwen2-0.5B-Instruct<br>Qwen/Qwen2-1.5B<br>Qwen/Qwen2-1.5B-Instruct<br>Qwen/Qwen2-7B<br>Qwen/Qwen2-7B-Instruct | Qwen2-0.5B<br>Qwen2-0.5B-Instruct<br>Qwen2-1.5B<br>Qwen2-1.5B-Instruct<br>Qwen2-7B<br>Qwen2-7B-Instruct |
| InternLM | InternLM | Shanghai AI Laboratory | internlm/internlm-chat-7b<br>internlm/internlm-7b | internlm-7b<br>internlm-chat-7b |
| | InternLM2 | Shanghai AI Laboratory | internlm/internlm2-1_8b<br>internlm/internlm2-chat-1_8b<br>internlm/internlm2-7b<br>internlm/internlm2-chat-7b<br>internlm/internlm2-20b<br>internlm/internlm2-chat-20b | internlm2-1_8b<br>internlm2-chat-1_8b<br>internlm2-7b<br>internlm2-chat-7b |
| | InternLM2.5 | Shanghai AI Laboratory | internlm/internlm2_5-7b<br>internlm/internlm2_5-7b-chat<br>internlm/internlm2_5-7b-chat-1m | internlm2_5-7b<br>internlm2_5-7b-chat<br>internlm2_5-7b-chat-1m |
| Falcon | Falcon | tiiuae | tiiuae/falcon-rw-1b<br>tiiuae/falcon-7b<br>tiiuae/falcon-7b-instruct | falcon-rw-1b<br>falcon-7b<br>falcon-7b-instruct |
| DeepSeek | DeepSeek-MoE | DeepSeek (High-Flyer) | deepseek-ai/deepseek-moe-16b-base<br>deepseek-ai/deepseek-moe-16b-chat | deepseek-moe-16b-base<br>deepseek-moe-16b-chat |
| | DeepSeek-LLM | DeepSeek (High-Flyer) | deepseek-ai/deepseek-llm-7b-base<br>deepseek-ai/deepseek-llm-7b-chat | deepseek-llm-7b-base<br>deepseek-llm-7b-chat |
| | DeepSeek-V2 | DeepSeek (High-Flyer) | deepseek-ai/DeepSeek-V2-Lite<br>deepseek-ai/DeepSeek-V2-Lite-Chat | |
| | DeepSeek-Coder | DeepSeek (High-Flyer) | to be added | |
| | DeepSeek-Coder-V2 | DeepSeek (High-Flyer) | to be added | |
| MiniCPM | MiniCPM | OpenBMB | openbmb/MiniCPM-2B-sft-bf16<br>openbmb/MiniCPM-2B-dpo-bf16<br>openbmb/MiniCPM-2B-128k<br>openbmb/MiniCPM-1B-sft-bf16 | MiniCPM-2B-sft-bf16<br>MiniCPM-2B-dpo-bf16<br>MiniCPM-2B-128k<br>MiniCPM-1B-sft-bf16 |
| | MiniCPM-V | OpenBMB | to be added | |
| embedding | text2vec-base-chinese | shibing624 | shibing624/text2vec-base-chinese | text2vec-base-chinese |
| | m3e | moka-ai | moka-ai/m3e-base | m3e-base |
| | bge | BAAI | BAAI/bge-large-en-v1.5<br>BAAI/bge-large-zh-v1.5<br>BAAI/bge-base-en-v1.5<br>BAAI/bge-base-zh-v1.5<br>BAAI/bge-small-en-v1.5<br>BAAI/bge-small-zh-v1.5 | bge-large-en-v1.5<br>bge-large-zh-v1.5<br>bge-base-en-v1.5<br>bge-base-zh-v1.5<br>bge-small-en-v1.5<br>bge-small-zh-v1.5 |
| | gte | thenlper | thenlper/gte-large-zh<br>thenlper/gte-base-zh | gte-base-zh<br>gte-large-zh |

*Notes:

  1. Entries shown in highlighted format (e.g. bert-base-chinese) can be downloaded online directly by build_transformer_model()
  2. Speed up downloads through a mirror site (for users in mainland China):
    • HF_ENDPOINT=https://hf-mirror.com python your_script.py
    • run export HF_ENDPOINT=https://hf-mirror.com before executing the Python code
    • or set it at the top of your Python code:
    import os
    os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

6. Acknowledgements

  • Thanks to Su Jianlin (苏神) for bert4keras; this implementation draws on the bert4keras source in many places, and I sincerely thank him for his selfless contribution;
  • Thanks also to the bert4pytorch project, which gave me the idea and the approach for reimplementing bert4keras in pytorch.

7. Citation

@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}

8. Miscellaneous

  • WeChat & Star History Chart
  • The WeChat group has more than 200 members (invitation limits apply); add my personal WeChat to be pulled into the group

(images: WeChat ID, WeChat group, Star History Chart)