https://github.com/FlagAlpha/Llama2-Chinese
https://huggingface.co/FlagAlpha
https://hub.docker.com/r/longerhuya/llama2-chinese-7b
https://hub.docker.com/r/ninthkat/jupyterlab-pytorch-cuda
# Installing the NVIDIA Container Toolkit
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum install -y nvidia-container-toolkit
-Configuring Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
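A quick way to confirm Docker actually picked up the nvidia runtime after the restart (a small check of my own that just shells out to docker):
import subprocess

# `docker info` should list the nvidia runtime once the daemon has restarted
out = subprocess.run(
    ["docker", "info", "--format", "{{.Runtimes}}"],
    capture_output=True, text=True,
).stdout
print("nvidia runtime registered:", "nvidia" in out)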
-Starting the container (--gpus all is required so the container can see the GPU):
docker run -d --restart=always --name llama2 \
  --gpus all \
  -p 9999:8888 -p 15550:15550 -p 15551:15551 \
  -v /data/site/docker/data/llama2:/home/jovyan \
  -v /etc/localtime:/etc/localtime:ro \
  -e TZ='Asia/Shanghai' \
  --shm-size 12G \
  ninthkat/jupyterlab-pytorch-cuda:latest
http://g.htmltoo.com:9999
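JupyterLab listens on 8888 inside the container and is mapped to 9999 on the host; a quick reachability check (host name taken from the URL above):
import urllib.request

# A 200 response means JupyterLab is answering behind the 9999->8888 mapping
with urllib.request.urlopen("http://g.htmltoo.com:9999", timeout=5) as resp:
    print(resp.status)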
-Alternatives, if the defaults above don't fit:
bitnami/pytorch
nvcr.io/nvidia/pytorch:21.08-py3
--shm-size 16G
docker exec -it llama2 /bin/bash
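Once inside, it is worth confirming the container actually sees the GPU before installing anything (the image ships with PyTorch, judging by its name); a minimal check in python3:
import torch

# True plus a device name means --gpus all took effect
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))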
-Installing conda
cd /notebooks
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
export PATH="/root/miniconda3/bin:$PATH"
-Creating the conda environment:
python --version
conda --version
conda create -n llama2 python=3.10.11
-Activating the environment (conda activate only works after conda init, so initialize, reopen the shell, then activate again):
conda activate llama2
conda init
docker exec -it llama2 env LANG=C.UTF-8 /bin/bash
cd /home/jovyan
conda activate llama2
git clone https://github.com/facebookresearch/llama.git
cd llama
-Installing dependencies:
pip install -e .
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
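The terminal test further down loads the model with load_in_8bit=True, which also needs bitsandbytes (my assumption is that requirements.txt pulls it in; if not, install it the same way). A quick import check:
import torch
import transformers

print("torch", torch.__version__, "| CUDA build:", torch.version.cuda)
print("transformers", transformers.__version__)
try:
    import bitsandbytes  # required by load_in_8bit=True in the test below
    print("bitsandbytes", bitsandbytes.__version__)
except ImportError:
    print("bitsandbytes missing: pip install bitsandbytes")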
-Pulling the code and model weights
git clone https://github.com/FlagAlpha/Llama2-Chinese.git
-Pulling the Llama2-Chinese-13b-Chat model weights and code
cd Llama2-Chinese
git clone https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat
-Checking the download size:
du -sh Llama2-Chinese-13b-Chat
-Output:
25G Llama2-Chinese-13b-Chat
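If du reports only a few hundred MB instead of ~25G, git cloned the LFS pointer files but never fetched the actual weight shards; a quick way to spot that (directory name from the clone above):
import os

total = 0
for root, _, files in os.walk("Llama2-Chinese-13b-Chat"):
    for name in files:
        total += os.path.getsize(os.path.join(root, name))
# Should be roughly 25 GB; a tiny number means `git lfs pull` is still needed
print(f"{total / 1e9:.1f} GB")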
---Terminal test
-Enter the Python REPL:
python3
-Enter the following code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the 13B model in 8-bit across available GPUs (needs bitsandbytes + accelerate)
model = AutoModelForCausalLM.from_pretrained(
    'Llama2-Chinese-13b-Chat',
    device_map='auto',
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained('Llama2-Chinese-13b-Chat', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

# The chat models expect the <s>Human: ...\n</s><s>Assistant: prompt format
input_ids = tokenizer(
    ['<s>Human: 介绍一下深圳\n</s><s>Assistant: '],  # "Introduce Shenzhen"
    return_tensors="pt",
    add_special_tokens=False,
).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,
    "do_sample": True,
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
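The <s>Human: ...\n</s><s>Assistant: string is the prompt format these chat models were trained on. A small helper (my own sketch, not part of the repo) that builds a multi-turn prompt in the same format:
def build_prompt(turns):
    """turns: list of (human, assistant) pairs; assistant may be None for the last turn."""
    prompt = ""
    for human, assistant in turns:
        prompt += f"<s>Human: {human}\n</s><s>Assistant: "
        if assistant is not None:
            prompt += f"{assistant}\n</s>"
    return prompt

# Reproduces the single-turn prompt used above
print(build_prompt([("介绍一下深圳", None)]))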
---Web page test
-Building the page with Gradio
pip3 install gradio -i https://pypi.tuna.tsinghua.edu.cn/simple
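Before putting a 25 GB model behind Gradio, a trivial app on the second mapped port confirms the install and the port mapping work (a throwaway test of my own, not part of the repo):
import gradio as gr

# Minimal app: echoes the input reversed; served on the 15551 mapping
demo = gr.Interface(fn=lambda s: s[::-1], inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", server_port=15551)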
---Loading the model and starting the service
vi /notebooks/Llama2-Chinese/examples/chat_gradio.py
Go to line 94:
demo.queue().launch(share=False, debug=True, server_name="0.0.0.0")
Change it to:
demo.queue().launch(share=False, debug=True, server_name="0.0.0.0", server_port=15550)
Start the script:
python3 examples/chat_gradio.py --model_name_or_path Llama2-Chinese-13b-Chat
-If the following error appears:
File "/notebooks/Llama2-Chinese/examples/chat_gradio.py", line 94
demo.queue().launch(share=False, debug=True,server_name="0.0.0.0")
^
SyntaxError: invalid character ',' (U+FF0C)
-Then fix the code as follows:
vi /notebooks/Llama2-Chinese/examples/chat_gradio.py
:94
Replace the full-width Chinese comma (,) with an ASCII comma (,):
94 demo.queue().launch(share=False, debug=True,server_name="0.0.0.0")
=>
94 demo.queue().launch(share=False, debug=True, server_name="0.0.0.0")
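Full-width punctuation sneaks in easily when code is copied from Chinese blog pages; a quick scan (a small script of my own) that locates any such characters in the file:
# Report full-width punctuation that breaks Python syntax
SUSPECTS = ",。:;()“”"
with open("examples/chat_gradio.py", encoding="utf-8") as fh:
    for lineno, line in enumerate(fh, 1):
        for ch in line:
            if ch in SUSPECTS:
                print(f"line {lineno}: {ch!r} (U+{ord(ch):04X})")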
---Testing
http://g.htmltoo.com:15550
Downloading the official Llama 2 weights requires an application on Meta's site; reaching the application page may require a proxy, but the download itself does not.
Application page:
https://ai.meta.com/resources/models-and-libraries/llama-downloads/
cd /opt
git clone https://github.com/facebookresearch/llama.git
cd llama
pip3 install -e .
pip3 install --upgrade torch torchvision fastNLP -i https://pypi.tuna.tsinghua.edu.cn/simple
Llama 2 comes in three sizes, 7B, 13B, and 70B, i.e. 7, 13, and 70 billion parameters; the more parameters, the higher the hardware requirements (see the rough memory arithmetic after the list below).
Each size has a base (fine-tunable) version and a chat version; I picked the first one, the 7B.
Llama-2-7b
Llama-2-7b-chat
Llama-2-13b
Llama-2-13b-chat
Llama-2-70b
Llama-2-70b-chat
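A rough lower bound on the memory needed just for the weights is parameter count times bytes per weight (2 bytes in fp16, 1 in int8), ignoring activations and the KV cache:
# ~1 billion parameters ≈ 1 GB per byte of precision (weights only)
for name, billions in [("7b", 7), ("13b", 13), ("70b", 70)]:
    print(f"Llama-2-{name}: ~{billions * 2} GB in fp16, ~{billions} GB in int8")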
bash download.sh
Follow the prompts: paste the download URL provided in the email, choose which versions to fetch, and wait. The 7B version alone is about 13 GB and takes quite a while to download; the other versions are larger.
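download.sh also fetches a checklist.chk per model and verifies the files with md5sum; to re-verify later from Python, a sketch (assuming the standard llama-2-7b/checklist.chk layout the script creates):
import hashlib
import os

MODEL_DIR = "llama-2-7b"  # layout assumed from download.sh

def md5(path, chunk=1 << 20):
    # Hash in chunks so the 13 GB weight file is never loaded whole
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

with open(os.path.join(MODEL_DIR, "checklist.chk")) as fh:
    for line in fh:
        if not line.strip():
            continue
        expected, name = line.split()
        status = "OK" if md5(os.path.join(MODEL_DIR, name)) == expected else "MISMATCH"
        print(name, status)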