

Deploy an AI Voice Assistant: A Local Siri


This is a project to deploy a local AI voice assistant. It combines FunASR, Ollama, and CosyVoice.


Preparation

FunASR

You need to make sure that WSL and the Windows host share the same Docker daemon; this can be enabled by turning on WSL integration in Docker Desktop.


Follow this tutorial to deploy FunASR: FunASR Realtime Transcribe Service.

Download the workspace and run the local ASR server:

  $ curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/funasr-runtime-deploy-online-cpu-zh.sh
  $ sudo bash funasr-runtime-deploy-online-cpu-zh.sh install --workspace ./funasr-runtime-resources
  # Restart the container
  $ sudo bash funasr-runtime-deploy-online-cpu-zh.sh restart

Inside the container, the deploy script downloads the models from ModelScope. After that you should see:

  $ docker ps -a
  $ docker exec -it <container ID> bash
  # In container:
  $ watch -n 0.1 "cat FunASR/runtime/log.txt | tail -n 10"
  I20240915 21:57:20.544512 56 ct-transformer-online.cpp:21] Successfully load model from /workspace/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx/model_quant.onnx
  I20240915 21:57:20.647238 56 itn-processor.cpp:33] Successfully load model from /workspace/models/thuduj12/fst_itn_zh/zh_itn_tagger.fst
  I20240915 21:57:20.648470 56 itn-processor.cpp:35] Successfully load model from /workspace/models/thuduj12/fst_itn_zh/zh_itn_verbalizer.fst
  I20240915 21:57:20.648491 56 websocket-server-2pass.cpp:580] initAsr run check_and_clean_connection
  I20240915 21:57:20.648558 56 websocket-server-2pass.cpp:583] initAsr run check_and_clean_connection finished
  I20240915 21:57:20.648564 56 funasr-wss-server-2pass.cpp:565] decoder-thread-num: 16
  I20240915 21:57:20.648567 56 funasr-wss-server-2pass.cpp:566] io-thread-num: 4
  I20240915 21:57:20.648571 56 funasr-wss-server-2pass.cpp:567] model-thread-num: 2
  I20240915 21:57:20.648572 56 funasr-wss-server-2pass.cpp:568] asr model init finished. listen on port:10095
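Before wiring up the full assistant, you can do a quick connectivity check against the ASR WebSocket endpoint. This is only a rough sketch I added for convenience: it opens the TLS WebSocket with certificate verification disabled, exactly like the full client script in the Deploy section, and prints a message on success; no audio is streamed.

  # Quick connectivity check for the FunASR WebSocket server (sketch, not part of the original post)
  import asyncio
  import ssl

  import websockets


  async def check_funasr(host: str = "127.0.0.1", port: int = 10095) -> None:
      ssl_context = ssl.SSLContext()
      ssl_context.check_hostname = False
      ssl_context.verify_mode = ssl.CERT_NONE   # the server uses a self-signed certificate
      uri = "wss://{}:{}".format(host, port)
      async with websockets.connect(uri, subprotocols=["binary"],
                                    ping_interval=None, ssl=ssl_context):
          print("FunASR server is reachable at", uri)


  asyncio.run(check_funasr())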

VAD

We need to detect when the microphone input is no longer active so we can stop recording; the technique used here is called VAD (voice activity detection).

We also need to judge when the user's input has ended, not from the microphone itself but from the count of results returned by the FunASR server. The number of sentences (messages) sent by the ASR server keeps increasing while the user is speaking; once the user stops, the count stops increasing. I set the latency to 2 s: when no new sentences have arrived from the server within that window, the wait_end_and_send_to_ollama coroutine sends the accumulated text to the Ollama server and blocks until the reply comes back (a minimal sketch of this check follows below).

FIX: Note that while the assistant is speaking, anything the user says at the same time is treated as the next input. An important feature is that the user can interrupt the assistant.
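Here is a minimal sketch of that counter-based end-of-utterance check. It is my own simplification, not the exact code; the full wait_end_and_send_to_ollama coroutine in the Deploy section implements the same idea and then forwards the text to Ollama.

  import asyncio

  # msg_cnt is incremented by the ASR receive loop for every message from FunASR;
  # text_print accumulates the recognized text (see the full script below).
  msg_cnt = 0
  text_print = ""


  async def wait_for_end_of_utterance(latency_s: float = 2.0) -> str:
      """Return the accumulated text once no new ASR message has arrived for latency_s."""
      global text_print
      while True:
          cur_cnt = msg_cnt
          await asyncio.sleep(latency_s)
          if msg_cnt == cur_cnt and text_print:
              prompt, text_print = text_print, ""  # take the utterance, reset the buffer
              return prompt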

Ollama

Download and install Ollama on Windows. After that, run

  ollama run llama3.1

or

  ollama run qwen:7b

in PowerShell to download the model. Then start the Ollama server:

  $ ollama serve
  ...
  time=2024-09-15T18:25:35.068+08:00 level=INFO source=images.go:753 msg="total blobs: 5"
  time=2024-09-15T18:25:35.069+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
  time=2024-09-15T18:25:35.070+08:00 level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11434 (version 0.3.10)"
  time=2024-09-15T18:25:35.070+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12 rocm_v6.1]"
  time=2024-09-15T18:25:35.070+08:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
  time=2024-09-15T18:25:35.251+08:00 level=INFO source=gpu.go:292 msg="detected OS VRAM overhead" id=GPU-e2aae3a3-4bf5-4f72-0920-b864cb97001c library=cuda compute=8.9 driver=12.6 name="NVIDIA GeForce RTX 4060" overhead="124.9 MiB"
  time=2024-09-15T18:25:35.259+08:00 level=INFO source=types.go:107 msg="inference compute" id=GPU-e2aae3a3-4bf5-4f72-0920-b864cb97001c library=cuda variant=v12 compute=8.9 driver=12.6 name="NVIDIA GeForce RTX 4060" total="8.0 GiB" available="6.9 GiB"

You can follow the Ollama PyPI tutorial to use the Ollama Python API.

  import ollama

  # Response streaming can be enabled by setting stream=True, modifying function
  # calls to return a Python generator where each part is an object in the stream.
  stream = ollama.chat(
      model='llama3.1',
      messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
      stream=True,
  )
  for chunk in stream:
      print(chunk['message']['content'], end='', flush=True)
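The assistant keeps multi-turn context by appending both the user prompt and the assistant's streamed reply back to the messages list before the next request. Below is my own condensed sketch of that pattern; the host and model match the defaults used later, so adjust them to your setup.

  from ollama import Client

  client = Client(host='http://localhost:11434')
  messages = []


  def ask(prompt: str) -> str:
      # Append the user turn, stream the reply, then append the assistant turn
      # so the next call still sees the whole conversation.
      messages.append({'role': 'user', 'content': prompt})
      reply = ""
      for chunk in client.chat(model='llama3.1', messages=messages, stream=True):
          reply += chunk['message']['content']
          print(chunk['message']['content'], end='', flush=True)
      messages.append({'role': 'assistant', 'content': reply})
      return reply


  ask("Why is the sky blue?")
  ask("Summarize that in one sentence.")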

CosyVoice

Follow the tutorial: CosyVoice

  • CUDA 11.8 torch and torchaudio:

    $ pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118

  • If you want to clone a voice, convert the audio file recorded with the Windows recorder first. This is a very fun feature provided by CosyVoice; for example, you can clone Trump's voice. Voice cloning has existed for years, but here it supports Chinese and uses an LLM. Note that you must follow the law and respect privacy; this is very important.

    $ ffmpeg -i input.m4a output.wav

  • ONNX Runtime issue: onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.10: cannot open shared object file: No such file or directory

    $ pip3 install onnxruntime-gpu==1.18.1 -i https://mirrors.aliyun.com/pypi/simple/
    $ pip3 install onnxruntime==1.18.1 -i https://mirrors.aliyun.com/pypi/simple/

  • Glibc issue: version GLIBCXX_3.4.29 not found

    $ find ~ -name "libstdc++.so.6*"
    $ strings .conda/envs/cosyvoice/lib/libstdc++.so.6 | grep -i "glibcxx"
    $ sudo cp .conda/envs/cosyvoice/lib/libstdc++.so.6.0.33 /lib/x86_64-linux-gnu
    $ sudo rm /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    $ sudo ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.33 /usr/lib/x86_64-linux-gnu/libstdc++.so.6

Server and client

  # Install dependencies
  $ cd runtime/python/grpc && python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. cosyvoice.proto
  # Send a text request to the server; it returns the audio file demo.wav
  $ python3 runtime/python/grpc/server.py --port 50000 --max_conc 4 --model_dir pretrained_models/CosyVoice-300M && sleep infinity
  $ python3 runtime/python/grpc/client.py --port 50000 --mode sft
  # FastAPI
  $ python3 runtime/python/fastapi/server.py --port 50000 --model_dir pretrained_models/CosyVoice-300M && sleep infinity
  $ python3 runtime/python/fastapi/client.py --port 50000 --mode sft
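If you prefer to call the FastAPI TTS server from your own code instead of the bundled client.py, something like the sketch below can save the reply as demo.wav. The /inference_sft route, the tts_text/spk_id parameters, and the 22050 Hz 16-bit mono PCM response format are my assumptions here, not confirmed by this tutorial; check runtime/python/fastapi/client.py for the exact interface of your CosyVoice version.

  import wave

  import requests

  # Assumed endpoint and parameters -- verify against runtime/python/fastapi/client.py.
  COSYVOICE_URL = "http://127.0.0.1:50000/inference_sft"


  def tts_to_wav(text: str, out_path: str = "demo.wav") -> None:
      # Stream the synthesized audio and collect the raw PCM bytes.
      resp = requests.get(COSYVOICE_URL, params={"tts_text": text, "spk_id": "中文女"}, stream=True)
      resp.raise_for_status()
      pcm = b"".join(resp.iter_content(chunk_size=16000))
      # Wrap the PCM in a WAV container (assumed: mono, 16-bit, 22050 Hz).
      with wave.open(out_path, "wb") as f:
          f.setnchannels(1)
          f.setsampwidth(2)
          f.setframerate(22050)
          f.writeframes(pcm)


  tts_to_wav("今天天气怎么样?")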

Play the audio in Python

After getting the .wav file from the server, you can use the simpleaudio Python library to play it.

  import simpleaudio as sa

  # Load audio file
  filename = 'demo.wav'
  wave_obj = sa.WaveObject.from_wave_file(filename)

  # Play the audio
  play_obj = wave_obj.play()
  play_obj.wait_done()
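Since the user should be able to interrupt the assistant (see the VAD section), you may want to play without blocking and stop playback when new speech arrives. This is a rough sketch I added; the interrupt hook is only illustrative and is not wired into the script below.

  import simpleaudio as sa

  wave_obj = sa.WaveObject.from_wave_file('demo.wav')
  play_obj = wave_obj.play()   # returns immediately, audio plays in the background


  def stop_if_interrupted(user_is_speaking: bool) -> None:
      # Illustrative: call this from the ASR loop when new user speech is detected.
      if user_is_speaking and play_obj.is_playing():
          play_obj.stop()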

TODO

Streaming…

Deploy

Script:

  # -*- encoding: utf-8 -*-
  import os
  import websockets, ssl
  import asyncio
  import argparse
  import json
  from ollama import Client
  import logging
  from multiprocessing import Process

  logging.basicConfig(level=logging.ERROR)

  parser = argparse.ArgumentParser()
  parser.add_argument("--host", type=str, default="localhost", required=False, help="host ip, localhost, 0.0.0.0")
  parser.add_argument("--port", type=int, default=10095, required=False, help="grpc server port")
  parser.add_argument("--chunk_size", type=str, default="5, 10, 5", help="chunk")
  parser.add_argument("--chunk_interval", type=int, default=10, help="chunk")
  parser.add_argument("--hotword", type=str, default="", help="hotword file path, one hotword per line (e.g.: 阿里巴巴 20)")
  parser.add_argument("--words_max_print", type=int, default=10000, help="chunk")
  parser.add_argument("--use_itn", type=int, default=1, help="1 for using itn, 0 for not itn")
  parser.add_argument("--powershell", type=int, default=0, help="work under powershell")
  parser.add_argument("--llamahost", type=str, default="0.0.0.0:11434", help="Ollama server")
  parser.add_argument("--llm_model", type=str, default="llama3.1", help="Ollama model")
  args = parser.parse_args()
  args.chunk_size = [int(x) for x in args.chunk_size.split(",")]

  msg_cnt = 0
  msg_end = False
  text_print = ""
  text_print_2pass_online = ""
  text_print_2pass_offline = ""
  messages = []


  async def record_microphone():
      is_finished = False
      import pyaudio
      FORMAT = pyaudio.paInt16
      CHANNELS = 1
      RATE = 16000
      chunk_size = 60 * args.chunk_size[1] / args.chunk_interval
      CHUNK = int(RATE / 1000 * chunk_size)
      p = pyaudio.PyAudio()
      stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)

      # hotwords
      fst_dict = {}
      hotword_msg = ""
      if args.hotword.strip() != "":
          f_scp = open(args.hotword, encoding='utf-8')
          hot_lines = f_scp.readlines()
          for line in hot_lines:
              words = line.strip().split(" ")
              if len(words) < 2:
                  print("Please checkout format of hotwords")
                  continue
              try:
                  fst_dict[" ".join(words[:-1])] = int(words[-1])
              except ValueError:
                  print("Please checkout format of hotwords")
          hotword_msg = json.dumps(fst_dict)

      use_itn = True
      if args.use_itn == 0:
          use_itn = False

      message = json.dumps({"mode": "2pass", "chunk_size": args.chunk_size,
                            "chunk_interval": args.chunk_interval, "wav_name": "microphone",
                            "is_speaking": True, "hotwords": hotword_msg, "itn": use_itn})
      await websocket.send(message)

      if args.powershell:
          os.system('powershell -command "Clear-Host"')
      else:
          os.system('clear')
      print('---------------------------------------------------------------')
      print('Welcome to use local AI assistant. You can say: Tell me a joke!')
      print('---------------------------------------------------------------')
      print("User: ")

      while True:
          data = stream.read(CHUNK)
          message = data
          await websocket.send(message)
          await asyncio.sleep(0.005)


  async def wait_end_and_send_to_ollama():
      while True:
          global msg_cnt, msg_end, text_print, text_print_2pass_online, text_print_2pass_offline
          cur_cnt = msg_cnt
          await asyncio.sleep(2)
          if (msg_cnt == cur_cnt and text_print):
              prompt = text_print
              assistant_log = ""
              # Clear ASR texts
              text_print_2pass_online = ""
              text_print_2pass_offline = ""
              text_print = ""
              print("\n\nAssistant: ")
              messages.append({'role': 'user', 'content': prompt})
              client = Client(host='http://' + args.llamahost)
              stream = client.chat(model=args.llm_model, messages=messages, stream=True)
              for chunk in stream:
                  assistant_log += chunk['message']['content']
                  print(chunk['message']['content'], end='', flush=True)
              messages.append({'role': 'assistant', 'content': assistant_log})
              assistant_log = ""
              print("\n-------------------------------------------------------------")
              print("User: ")


  async def message(id):
      global websocket
      try:
          while True:
              global msg_cnt, msg_end, text_print, text_print_2pass_online, text_print_2pass_offline
              meg = await websocket.recv()
              msg_cnt += 1
              meg = json.loads(meg)
              text = meg["text"]
              if 'mode' not in meg:
                  continue
              else:
                  if meg["mode"] == "2pass-online":
                      text_print_2pass_online += "{}".format(text)
                      text_print = text_print_2pass_offline + text_print_2pass_online
                  else:
                      text_print_2pass_online = ""
                      text_print = text_print_2pass_offline + "{}".format(text)
                      text_print_2pass_offline += "{}".format(text)
                  text_print = text_print[-args.words_max_print:]
                  # Fix: drop the punctuation mark carried over at the start of the text
                  if (text_print[0] in ["。", ",", "?", "!"]):
                      text_print = text_print[1:]
                  print("\r" + text_print, end='')
      except Exception as e:
          print("Exception:", e)


  async def ws_client(id, chunk_begin, chunk_size):
      chunk_begin = 0
      chunk_size = 1
      global websocket
      for i in range(chunk_begin, chunk_begin + chunk_size):
          ssl_context = ssl.SSLContext()
          ssl_context.check_hostname = False
          ssl_context.verify_mode = ssl.CERT_NONE
          uri = "wss://{}:{}".format(args.host, args.port)
          print("connect to", uri)
          async with websockets.connect(uri, subprotocols=["binary"], ping_interval=None, ssl=ssl_context) as websocket:
              task1 = asyncio.create_task(record_microphone())
              task2 = asyncio.create_task(message(str(id) + "_" + str(i)))  # process id + file id
              task3 = asyncio.create_task(wait_end_and_send_to_ollama())
              await asyncio.gather(task1, task2, task3)
      exit(0)


  def one_thread(id, chunk_begin, chunk_size):
      asyncio.get_event_loop().run_until_complete(ws_client(id, chunk_begin, chunk_size))
      asyncio.get_event_loop().run_forever()


  if __name__ == '__main__':
      p = Process(target=one_thread, args=(0, 0, 0))
      p.start()
      p.join()
      print('end')

Windows

Run the script:

  pip3 install websockets pyaudio ollama
  python3 funasr_client.py --host "127.0.0.1" --port 10095 --hotword hotword.txt --powershell 1 --llm_model llama3.1 --llamahost "localhost:11434"


WSL

  • If Ollama runs on the Windows host, you should allow WSL to access it over the LAN (for other devices, this should also be enabled). In PowerShell:

    $ [Environment]::SetEnvironmentVariable('OLLAMA_HOST', '0.0.0.0:11434', 'Process')
    $ [Environment]::SetEnvironmentVariable('OLLAMA_ORIGINS', '*', 'Process')
    $ ollama serve

  • Run ipconfig in PowerShell to get the IPv4 address of the host, for example: 172.20.10.2.
  • The audio may not work because of the sound card. One way to solve the problem:

    $ sudo apt-get install python3-pyaudio pulseaudio portaudio19-dev

  • Run the scripts:

    $ pip3 install websockets pyaudio ollama
    $ python3 funasr_client.py --host "127.0.0.1" --port 10095 --hotword hotword.txt --llamahost "172.20.10.2:11434" --llm_model "qwen:7b"

References

  • Ollama PyPI
  • Ollama API
  • version GLIBCXX_3.4.29 not found

Reposted from: https://blog.csdn.net/JackSparrow_sjl/article/details/142327865. Copyright belongs to the original author, Yanjing-233.
