

Episode 46: ESP32 AI Large-Model Chat Project, Hardware and Software Now Open Source

Sange's (三哥) AI large-model chat project is now open source

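The project starts from the pipeline_baidu_speech_mp3 example in esp-adf. On top of it, the author added Baidu speech-to-text, chat with the MiniMax large language model, and Baidu text-to-speech.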

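The code draws on the 力创实战派 ESP32-C3 board's program. The hardware is Jianmin's (建民) board, which is compatible with Espressif's official esp32_s3_korvo2_v3 board apart from minor changes. Many thanks to everyone for their support!
The display code is adapted from the esp-adf\examples\display\music_player example and merged into the ai-toys project. LVGL screen output and the GT911 touch driver both work.
There are still bugs, and some features are not done yet, such as voice wake-up and on-screen display. Sange will keep updating the project.
For technical questions, contact Sange on WeChat (robot3g) to be added to the developer alliance WeChat group, or join the QQ group 174742054 (developer alliance) to discuss.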

Introduction

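The example uses ADF to record speech and send it to Baidu for VTT (voice to text). The recognized text is then sent to MiniMax for a conversation with the large language model. Baidu's online text-to-speech (TTS) service turns the reply text into audio, which is sent over I2S to the audio codec for playback. The example uses Chinese text by default but also supports some other languages. For more technical details, see the Baidu speech synthesis documentation page.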
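Baidu's TTS REST API receives the text in a `tex` parameter. Chinese text is UTF-8, so every non-ASCII byte has to be percent-encoded before it can go into a query string or form body. The sketch below is a minimal RFC 3986 style encoder for that job. The name `url_encode` is hypothetical and does not come from the project. Baidu's documentation also recommends URL-encoding `tex` twice, which you can do by running the encoder on its own output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the project): percent-encode a UTF-8 string
 * for use as a URL query or form parameter. RFC 3986 unreserved characters
 * (A-Z a-z 0-9 - _ . ~) are copied as-is; every other byte, including each
 * byte of a multi-byte Chinese character, becomes %XX.
 * Returns 0 on success, -1 if `out` is too small. */
int url_encode(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t pos = 0;

    if (out_size == 0) {
        return -1;
    }
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        int unreserved = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                         (*p >= '0' && *p <= '9') ||
                         *p == '-' || *p == '_' || *p == '.' || *p == '~';
        size_t need = unreserved ? 1 : 3;

        if (pos + need >= out_size) {      /* keep one byte for the NUL */
            return -1;
        }
        if (unreserved) {
            out[pos++] = (char)*p;
        } else {
            out[pos++] = '%';
            out[pos++] = hex[*p >> 4];
            out[pos++] = hex[*p & 0x0F];
        }
    }
    out[pos] = '\0';
    return 0;
}
```

To encode twice, call `url_encode(text, once, sizeof once)` and then `url_encode(once, twice, sizeof twice)`. The second pass turns every `%` into `%25`.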

The Baidu online speech-to-text pipeline and the pipeline that fetches and plays the synthesized MP3 audio are:

[codec_chip]---> [i2s_reader] ---> [http_stream_writer] ---> [baidu_vtt_server] 

[baidu_tts_server] ---> http_stream ---> mp3_decoder ---> i2s_stream ---> [codec_chip]
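The serial log in the usage section shows one button driving both pipelines. A press (`msg.cmd=1`) stops any TTS playback and resumes the recording/VTT pipeline. The release (`msg.cmd=2`) stops recording; the recognized text then goes to MiniMax, and the reply goes to the TTS pipeline. The host-side sketch below models this push-to-talk sequence as a small state machine. The state and function names are hypothetical. Only the event codes 1 and 2 and the order of the steps come from the log, and the real firmware drives ESP-ADF pipelines where this sketch only prints.

```c
#include <stdio.h>

/* Button event codes as printed in the log: msg.cmd=1 on press,
 * msg.cmd=2 on release. */
enum { BTN_PRESSED = 1, BTN_RELEASED = 2 };

/* Hypothetical states of one push-to-talk conversation turn. */
typedef enum {
    CHAT_IDLE,       /* waiting for a button press                 */
    CHAT_RECORDING,  /* VTT pipeline: codec -> i2s -> http -> Baidu */
    CHAT_THINKING,   /* recognized text sent to MiniMax             */
    CHAT_SPEAKING    /* TTS pipeline: Baidu -> http -> mp3 -> i2s   */
} chat_state_t;

/* Advance on a button event. The firmware would resume/stop the ADF
 * pipelines here; this sketch only prints what would happen. */
chat_state_t on_button(chat_state_t s, int cmd)
{
    if (cmd == BTN_PRESSED && (s == CHAT_IDLE || s == CHAT_SPEAKING)) {
        /* the log shows the TTS elements being stopped on every press */
        puts("stop TTS pipeline, resume VTT pipeline (start recording)");
        return CHAT_RECORDING;
    }
    if (cmd == BTN_RELEASED && s == CHAT_RECORDING) {
        puts("stop VTT pipeline, send recognized text to MiniMax");
        return CHAT_THINKING;
    }
    return s;   /* ignore events that do not fit the current state */
}

/* MiniMax reply received: start TTS playback. */
chat_state_t on_reply(chat_state_t s)
{
    return s == CHAT_THINKING ? CHAT_SPEAKING : s;
}

/* TTS pipeline finished (the log prints "TTS Finish" at this point). */
chat_state_t on_tts_finished(chat_state_t s)
{
    return s == CHAT_SPEAKING ? CHAT_IDLE : s;
}
```

Keeping all transitions in a few small functions makes it easy to ignore stray events, such as a press that arrives while the MiniMax reply is still being fetched.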

Environment setup

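For environment setup, see Sange's CSDN note: Episode 36 【Must-read for beginners】 The ultimate guide to setting up an ESP32 development environment in VS Code
https://blog.csdn.net/phlr5/article/details/141598455
Companion video on Bilibili:
https://www.bilibili.com/video/BV1NksAeuEAb
The companion videos are also on Douyin, Bilibili, Xiaohongshu, WeChat Channels and Kuaishou: search for 柔贝特三哥 and watch the short video with the same title.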
【Software and hardware download】
https://gitee.com/robot3g/ai_toys_publish
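Main board: ai_toys_publish\hardware\01-RTC_LCD_V3.0.eprj
Screen board: ai_toys_publish\hardware\01-RTC_LEB18P_V2.0.eprj
If you need the boards, Sange will list them on Taobao, the Bilibili Workshop and the Douyin Shop once they are ready.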
【Taobao link】
https://item.taobao.com/item.htm?id=843727341146&skuId=5788969659288

Hardware requirements

Build and flash

Default IDF branch: release/v5.2.2
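If you want the build to fail early when someone compiles with an ESP-IDF older than the release/v5.2.2 branch used here, you can put a compile-time guard like the one below into one of the project's C files. `ESP_IDF_VERSION` and `ESP_IDF_VERSION_VAL` come from ESP-IDF's `esp_idf_version.h`. The v5.2.0 threshold and the helper `idf_version_at_least` are assumptions for illustration. The `#else` branch exists only so the snippet also compiles on a PC.

```c
/* In an ESP-IDF build (ESP_PLATFORM defined), esp_idf_version.h provides
 * both macros. The #else branch is a host-only stand-in that uses the
 * same major<<16 | minor<<8 | patch encoding. */
#if defined(ESP_PLATFORM)
#include "esp_idf_version.h"
#else
#define ESP_IDF_VERSION_VAL(major, minor, patch) (((major) << 16) | ((minor) << 8) | (patch))
#define ESP_IDF_VERSION ESP_IDF_VERSION_VAL(5, 2, 2)   /* pretend: release/v5.2.2 */
#endif

/* Assumed minimum: refuse to build on anything older than v5.2. */
#if ESP_IDF_VERSION < ESP_IDF_VERSION_VAL(5, 2, 0)
#error "ai_toys was developed on ESP-IDF release/v5.2.2; please upgrade"
#endif

/* Helper for code paths that differ between IDF versions. */
int idf_version_at_least(int major, int minor, int patch)
{
    return ESP_IDF_VERSION >= ESP_IDF_VERSION_VAL(major, minor, patch);
}
```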

Configuration

The project's default configuration is in the sdkconfig.defaults and sdkconfig.defaults.esp32s3 files in the source tree.

The board selected by default is CONFIG_AUDIO_BOARD_CUSTOM. To run the example on another board, select that board's configuration in menuconfig. For example, for ESP32-Lyrat-Mini V1.1:

menuconfig > Audio HAL > ESP32-Lyrat-Mini V1.1

This example needs a Wi-Fi connection. Run menuconfig to enter the Wi-Fi credentials and the Baidu keys:

 menuconfig > Example Configuration > `WiFi SSID`, `WiFi Password`, `Baidu speech access key ID` and `Baidu speech access secret`
 

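Create a speech synthesis application on Baidu's online speech synthesis page. Then enter the API Key and Secret Key it gives you into the corresponding menuconfig options; they are used to authenticate with Baidu's TTS server.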
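The API Key and Secret Key are not sent to the speech servers directly. The firmware first exchanges them for an access token, which the log prints as `BAIDU_AUTH: Access token=24.xxxx023`. The sketch below builds the token request URL using Baidu AI's OAuth 2.0 client-credentials endpoint `https://aip.baidubce.com/oauth/2.0/token`. Treat the endpoint, the `CONFIG_BAIDU_*` macro names and the helper name `baidu_token_url` as assumptions, and compare them with the project's access-token code before relying on them.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks so the sketch compiles on a PC. In the firmware these strings
 * come from menuconfig; the macro names are assumed from the upstream
 * ESP-ADF pipeline_baidu_speech_mp3 Kconfig and may differ in this project. */
#ifndef CONFIG_BAIDU_ACCESS_KEY
#define CONFIG_BAIDU_ACCESS_KEY "your_api_key"
#endif
#ifndef CONFIG_BAIDU_SECRET_KEY
#define CONFIG_BAIDU_SECRET_KEY "your_secret_key"
#endif

/* Build Baidu's client-credentials token URL from an API Key and Secret Key.
 * Keys are inserted unencoded, so characters that would break the query
 * string are rejected. Returns 0 on success, -1 on error or truncation. */
int baidu_token_url(const char *api_key, const char *secret_key,
                    char *out, size_t out_size)
{
    if (strpbrk(api_key, "&=?# ") != NULL || strpbrk(secret_key, "&=?# ") != NULL) {
        return -1;
    }
    int n = snprintf(out, out_size,
                     "https://aip.baidubce.com/oauth/2.0/token"
                     "?grant_type=client_credentials&client_id=%s&client_secret=%s",
                     api_key, secret_key);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

On the device, the URL would be fetched with an HTTP GET (for example with esp_http_client), and the `access_token` field would be read from the JSON response.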
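For MiniMax, apply for a GroupId and an API key on the MiniMax platform.
For details, see Sange's CSDN notes "Episodes 21-22: Building and running the robot chat project with ESP32-IDF (MCU Embedded AI Development Notes)"
https://editor.csdn.net/md/?articleId=140510897
and "Episode 22: How to get the MiniMax key and GroupId (MCU Embedded AI Development Notes)"
https://editor.csdn.net/md/?articleId=140598712
as well as the videos with the same episode numbers.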

Set your GroupId in minimax_chat.c, line 62:
.url = "https://api.minimax.chat/v1/text/chatcompletion_pro?GroupId=xxx", // replace xxx with your own GroupId
Set your key in ai_toys.c, line 85:
// Replace the text after "Bearer " with your own token key
const char * minimax_key = "Bearer eyJhbGciOiJSxxx";
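These two changes put your GroupId into the request URL and your key into the `Authorization` header; `minimax_key` already includes the `Bearer ` prefix. The host-side sketch below shows one way such a request could be assembled. The helper names, the `tokens_to_generate` value, the bot name and the prompt are assumptions. The body follows MiniMax's chatcompletion_pro format (`messages`, `bot_setting`, `reply_constraints`) and the model name seen in the log; it is not copied from minimax_chat.c. The user text is JSON-escaped because it is inserted verbatim into the body.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Escape a UTF-8 string for use inside a JSON string literal.
 * Returns 0 on success, -1 if `out` is too small. */
int json_escape(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t pos = 0;

    if (out_size == 0) {
        return -1;
    }
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; ++p) {
        char esc[6];
        size_t len;

        if (*p == '"' || *p == '\\') {
            esc[0] = '\\';
            esc[1] = (char)*p;
            len = 2;
        } else if (*p < 0x20) {
            /* control characters such as '\n' become \u00XX */
            esc[0] = '\\'; esc[1] = 'u'; esc[2] = '0'; esc[3] = '0';
            esc[4] = hex[*p >> 4];
            esc[5] = hex[*p & 0x0F];
            len = 6;
        } else {
            esc[0] = (char)*p;   /* ASCII and UTF-8 bytes pass through */
            len = 1;
        }
        if (pos + len >= out_size) {
            return -1;
        }
        memcpy(out + pos, esc, len);
        pos += len;
    }
    out[pos] = '\0';
    return 0;
}

/* Hypothetical helper: build the chatcompletion_pro URL and a request body.
 * Bot name and prompt are placeholders. Returns 0 on success, -1 on error. */
int minimax_build_request(const char *group_id, const char *user_text,
                          char *url, size_t url_size,
                          char *body, size_t body_size)
{
    char text[512];
    int n;

    if (group_id[0] == '\0' || strcmp(group_id, "xxx") == 0) {
        return -1;   /* GroupId placeholder was not replaced */
    }
    if (json_escape(user_text, text, sizeof text) != 0) {
        return -1;
    }
    n = snprintf(url, url_size,
                 "https://api.minimax.chat/v1/text/chatcompletion_pro?GroupId=%s",
                 group_id);
    if (n < 0 || (size_t)n >= url_size) {
        return -1;
    }
    n = snprintf(body, body_size,
                 "{\"model\":\"abab5.5s-chat\",\"tokens_to_generate\":256,"
                 "\"reply_constraints\":{\"sender_type\":\"BOT\",\"sender_name\":\"Sange\"},"
                 "\"bot_setting\":[{\"bot_name\":\"Sange\","
                 "\"content\":\"You are a friendly voice assistant.\"}],"
                 "\"messages\":[{\"sender_type\":\"USER\",\"sender_name\":\"user\","
                 "\"text\":\"%s\"}]}",
                 text);
    return (n < 0 || (size_t)n >= body_size) ? -1 : 0;
}
```

The body is then POSTed with `Authorization` set to `minimax_key` and `Content-Type: application/json`. In ESP-IDF this is typically done with `esp_http_client_set_header()`.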


How to use the example

Features and usage

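  • When the example starts, it first connects to the configured Wi-Fi network. After that, press and hold the button to speak, then release it to get a reply. In the session below, the question is 你叫什么名字? ("What is your name?") and MiniMax answers 我叫三哥,你呢?你叫什么名字? ("I'm Sange. And you? What is your name?").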
PS D:\workspace\esp-idf\ai_toys> & set IDF_PATH='D:\Espressif\v5.2\esp-idf'
PS D:\workspace\esp-idf\ai_toys> & 'D:\Espressif\tools\v5.2\python_env\idf5.2_py3.11_env\Scripts\python.exe' 'D:\Espressif\v5.2\esp-idf\tools\idf_monitor.py' -p COM4 -b 115200 --toolchain-prefix xtensa-esp32s3-elf- --target esp32s3 'd:\workspace\esp-idf\ai_toys\build\ai_toys.elf'
--- WARNING: GDB cannot open serial ports accessed as COMx
--- Using \\.\COM4 instead...
--- esp-idf-monitor 1.4.0 on \\.\COM4 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x178c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xcbc
load:0x403cc700,len:0x2d9c
entry 0x403c9914
I(27) boot: ESP-IDF v5.2.2-639-g43098fc4de-dirty 2nd stage bootloader
I(27) boot: compile time Oct  6 2024 17:31:43
I(28) boot: Multicore bootloader
I(32) boot: chip revision: v0.2
I(36) boot.esp32s3: Boot SPI Speed : 80MHz
I(40) boot.esp32s3: SPI Mode       : DIO
I(45) boot.esp32s3: SPI Flash Size : 16MB
I(50) boot: Enabling RNG early entropy source...
I(55) boot: Partition Table:
I(59) boot: ## Label            Usage          Type ST Offset   Length
I(66) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I(74) boot:  1 phy_init         RF data          01 01 0000d000 00001000
I(81) boot:  2 factory          factory app      00 00 00010000 00600000
I(89) boot: End of partition table
I(93) esp_image: segment 0: paddr=00010020 vaddr=3c100020 size=326424h(3302436) map
I(694) esp_image: segment 1: paddr=0033644c vaddr=3fc9d100 size=04d50h(19792) load
I(698) esp_image: segment 2: paddr=0033b1a4 vaddr=40374000 size=04e74h(20084) load   
I(704) esp_image: segment 3: paddr=00340020 vaddr=42000020 size=f3400h(996352) map
I(886) esp_image: segment 4: paddr=00433428 vaddr=40378e74 size=141f8h(82424) load
I(915) boot: Loaded app from partition at offset 0x10000
I(915) boot: Disabling RNG early entropy source...
I(927) cpu_start: Multicore app
I(936) cpu_start: Pro cpu start user code
I(937) cpu_start: cpu freq:160000000 Hz
I(937) cpu_start: Application information:
I(940) cpu_start: Project name:     ai_toys
I(944) cpu_start: App version:20241005-lcdtpgt911andadcbutton
I(952) cpu_start: Compile time:     Oct 11 2024 06:47:46
I(958) cpu_start: ELF file SHA256:  728e4ed9d...
I(963) cpu_start: ESP-IDF:          v5.2.2-639-g43098fc4de-dirty
I(970) cpu_start: Min chip rev:     v0.0
I(974) cpu_start: Max chip rev:     v0.99
I(979) cpu_start: Chip rev:         v0.2
I(984) heap_init: Initializing. RAM available for dynamic allocation:
I(991) heap_init: At 3FCB9D28 len 0002F9E8(190 KiB): RAM
I(997) heap_init: At 3FCE9710 len 00005724(21 KiB): RAM
I(1003) heap_init: At 3FCF0000 len 00008000(32 KiB): DRAM
I(1010) heap_init: At 600FE010 len 00001FD8(7 KiB): RTCRAM
I(1017) spi_flash: detected chip: generic
I(1021) spi_flash: flash io: dio
W(1025) i2c: This driver is an old driver, please migrate your application code to adapt `driver/i2c_master.h`
W(1036) ADC: legacy driver is deprecated, please migrate to `esp_adc/adc_oneshot.h`
I(1044) sleep: Configure to isolate all GPIO pins in sleep state
I(1051) sleep: Enable automatic switching of GPIO sleep configuration
I(1059) main_task: Started on CPU0
I(1069) main_task: Calling app_main()
I(1069) BAIDU_SPEECH_EXAMPLE: nvs_flash_init start
I(1079) BAIDU_SPEECH_EXAMPLE:[0] Start and wait for Wi-Fi network
I(1079) TCA9554: Detected IO expander device at 0x70, name is: TCA9554A
I(1089) AUDIO_BOARD: tca9554_init done
E(1099) gpio: GPIO_PIN mask error 
I(1119) gpio: GPIO[2]| InputEn:0| OutputEn:1| OpenDrain:0| Pullup:0| Pulldown:0| Intr:0
I(1519) AUDIO_BOARD: lcd init done
I(1519) AUDIO_THREAD: The esp_periph task allocate stack on internal memory
W(1519) I2C_BUS: I2C bus has been already created,[port:0]
I(1519) GT911: TouchPad_ID:0x39,0x31,0x31
I(1529) GT911: TouchPad_Config_Version:65
I(1539) BAIDU_SPEECH_EXAMPLE: lv_port_init done
I(1559) BAIDU_SPEECH_EXAMPLE: lv_demo_music done
I(1559) BAIDU_SPEECH_EXAMPLE: ai_chat_task
I(1559) BAIDU_SPEECH_EXAMPLE:[0] Start and wait for Wi-Fi network
I(1569) BAIDU_SPEECH_EXAMPLE:[ key_start ] Initialize Button peripheral with board init
I(1579) BAIDU_SPEECH_EXAMPLE:[ key_start ] Create and start input key service
I(1579) AUDIO_THREAD: The input_key_service task allocate stack on internal memory
I(1589) AUDIO_THREAD: The button_task task allocate stack on internal memory
W(1599) BAIDU_SPEECH_EXAMPLE:[4] Waiting for a button to be pressed ...
I(1609) BAIDU_SPEECH_EXAMPLE: audio_key_start done
I(1619) pp: pp rom version: e7ae62f
I(1619) net80211: net80211 rom version: e7ae62f
I(1629) wifi:wifi driver task:3fcd82f8, prio:23, stack:6656, core=0
I(1629) wifi:wifi firmware version: c2ae8d1
I(1629) wifi:wifi certification version: v7.0
I(1639) wifi:config NVS flash: enabled
I(1639) wifi:config nano formating: disabled
I(1649) wifi:Init data frame dynamic rx buffer num:32
I(1649) wifi:Init static rx mgmt buffer num:5
I(1649) wifi:Init management short buffer num:32
I(1659) wifi:Init dynamic tx buffer num:32
I(1659) wifi:Init static tx FG buffer num:2
I(1669) wifi:Init static rx buffer size:1600
I(1669) wifi:Init static rx buffer num:10
I(1679) wifi:Init dynamic rx buffer num:32
I(1679) wifi_init: rx ba win:6
I(1679) wifi_init: tcpip mbox:32
I(1689) wifi_init: udp mbox:6
I(1689) wifi_init: tcp mbox:6
I(1689) wifi_init: tcp tx win:5760
I(1699) wifi_init: tcp rx win:5760
I(1699) wifi_init: tcp mss:1440
I(1709) wifi_init: WiFi IRAM OP enabled
I(1709) wifi_init: WiFi RX IRAM OP enabled
W(1719) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
I(1729) wifi:Set ps type:1, coexist:0
I(1729) phy_init: phy_version 680,a6008b2,Jun  4 2024,16:41:10
I(1779) wifi:mode :sta(cc:8d:a2:ee:c5:94)
I(1779) wifi:enable tsf
W(1779) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I(2899) wifi:new:<6,1>, old:<1,0>, ap:<255,255>, sta:<6,1>, prof:1
I(2899) wifi:state: init ->auth(b0)
W(2899) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I(2899) wifi:state: auth ->assoc(0)
I(2909) wifi:state: assoc ->run(10)
I(2929) wifi:connected with xxx, aid =6, channel 6,40U, bssid = xxxx
I(2929) wifi:security: WPA2-PSK, phy: bgn, rssi:-37
I(2929) wifi:pm start, type:1
I(2929) wifi:dp:1, bi:102400, li:3, scale listen interval from 307200 us to 307200 us
I(2939) wifi:set rx beacon pti, rx_bcn_pti:0, bcn_timeout:25000, mt_pti:0, mt_time:10000
W(2949) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:4
I(2959) wifi:<ba-add>idx:0(ifx:0, b0:95xxx), tid:0, ssn:0, winSize:64
I(2989) wifi:AP's beacon interval =102400 us, DTIM period =1
I(3949) esp_netif_handlers: sta ip:192.168.0.105, mask:255.255.255.0, gw:192.168.0.1
I(3949) PERIPH_WIFI: Got ip:192.168.0.105
I(4359) BAIDU_AUTH: Access token=24.xxxx023
I(4359) AUDIO_BOARD: audio_board_init and codec adc
W(4359) I2C_BUS: I2C bus has been already created,[port:0]
I(4369) DRV8311: ES8311 in Slave mode
I(4389) gpio: GPIO[48]| InputEn:0| OutputEn:1| OpenDrain:0| Pullup:0| Pulldown:0| Intr:0
W(4389) I2C_BUS: I2C bus has been already created,[port:0]
I(4399) ES7210: ES7210 in Slave mode
I(4409) ES7210: Enable ES7210_INPUT_MIC1
I(4409) ES7210: Enable ES7210_INPUT_MIC2
I(4419) ES7210: Enable ES7210_INPUT_MIC3
W(4419) ES7210: Enable TDM mode. ES7210_SDP_INTERFACE2_REG12:2
I(4429) ES7210: config fmt 60
I(4429) AUDIO_HAL: Codec mode is 3, Ctrl:1
I(4439) AUDIO_PIPELINE: link el->rb, el:0x3fce91e0, tag:vtt_i2s, rb:xx
I(4439) MP3_DECODER: MP3 init
I(4439) AUDIO_PIPELINE: link el->rb, el:0x3fce2ba0, tag:tts_http, rb:xx
I(4449) AUDIO_PIPELINE: link el->rb, el:0x3fce2edc, tag:tts_mp3, rb:xx
I(4459) BAIDU_SPEECH_EXAMPLE:[4] Set up  event listener
I(4459) BAIDU_SPEECH_EXAMPLE:[4.1] Listening event from the pipeline
I(4469) BAIDU_SPEECH_EXAMPLE:[4.2] Listening event from peripherals
I(4479) BAIDU_SPEECH_EXAMPLE:[5] Listen for all pipeline events
I(4489) BAIDU_SPEECH_EXAMPLE: main_page_task
I(4489) BAIDU_SPEECH_EXAMPLE: lv_main_page
I(4499) BAIDU_SPEECH_EXAMPLE: lv_main_page done
I(14609) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:1048588, source:0x3fcd27e0 cmd:1, data:0x5, data_len:4
I(14609) BAIDU_SPEECH_EXAMPLE: msg.cmd=1
W(14609) AUDIO_PIPELINE: Without stop, st:1
W(14619) AUDIO_PIPELINE: Without wait stop, st:1
I(14619) BAIDU_TTS: TTS all el stopped
I(14629) BAIDU_SPEECH_EXAMPLE:[*] Resuming pipeline
I(14629) AUDIO_THREAD: The vtt_http task allocate stack on internal memory
I(14639) AUDIO_ELEMENT:[vtt_http-0x3fce01b8] Element task created
I(14649) AUDIO_THREAD: The vtt_i2s task allocate stack on internal memory
I(14649) AUDIO_ELEMENT:[vtt_i2s-0x3fce91e0] Element task created
I(14659) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:91340 Bytes      

I(14669) AUDIO_ELEMENT:[vtt_http] AEL_MSG_CMD_RESUME,state:1
I(14679) BAIDU_VTT:[+] HTTP client HTTP_STREAM_PRE_REQUEST, lenght=0
I(14679) AUDIO_ELEMENT:[vtt_i2s] AEL_MSG_CMD_RESUME,state:1
I(14689) AUDIO_PIPELINE: Pipeline started
I(14749) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce91e0 cmd:8, data:0xc, data_len:4
I(14749) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce01b8 cmd:10, data:0x0, data_len:0
I(14769) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce01b8
Total bytes written:69632
I(16929) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:1048588, source:0x3fcd27e0 cmd:2, data:0x5, data_len:4
I(16929) BAIDU_SPEECH_EXAMPLE: msg.cmd=2
I(16929) BAIDU_SPEECH_EXAMPLE:[*] Stop pipeline
W(16939) AUDIO_ELEMENT: IN-[vtt_http] AEL_IO_ABORT
I(16939) BAIDU_VTT:[+] HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I(17529) BAIDU_VTT:[+] HTTP client HTTP_STREAM_FINISH_REQUEST, read_len=132
I(17529) BAIDU_VTT: Got HTTP Response ={"corpus_no":"xxx","err_msg":"success.","err_no":0,"result":["你叫什么名字?"],"sn":"xxxx"}
I(17539) BAIDU_VTT: response_text:你叫什么名字?
I(17549) BAIDU_SPEECH_EXAMPLE: Original text = 你叫什么名字?
I(18019) MINIMAX_CHAT: Need to write 502, written 502
I(18449) MINIMAX_CHAT: read ={"created":xxx,"model":"abab5.5s-chat","reply":"我叫三哥,你呢?你叫什么名字?","choices":[{"finish_reason":"stop","messages":[{"sender_type":"BOT","sender_name":"三哥","text":"我叫三哥,你呢?你叫什么名字?"}]}],"usage":{"total_tokens":84},"input_sensitive":false,"output_sensitive":false,"id":"xxx","base_resp":{"status_code":0,"status_msg":""}}
I(18479) MINIMAX_CHAT: response_text:我叫三哥,你呢?你叫什么名字?
I(18489) BAIDU_SPEECH_EXAMPLE: minimax answer = 我叫三哥,你呢?你叫什么名字?
I(18499) AUDIO_THREAD: The tts_http task allocate stack on internal memory
I(18509) AUDIO_ELEMENT:[tts_http-0x3fce2ba0] Element task created
W(18509) AUDIO_THREAD: Make sure selected the `CONFIG_SPIRAM_BOOT_INIT` and `CONFIG_SPIRAM_ALLOW_STACK_EXTERNAL_MEMORY` by `make menuconfig`
I(18529) AUDIO_THREAD: The tts_mp3 task allocate stack on internal memory
I(18529) AUDIO_ELEMENT:[tts_mp3-0x3fce2edc] Element task created
I(18539) AUDIO_THREAD: The tts_i2s task allocate stack on internal memory
I(18549) AUDIO_ELEMENT:[tts_i2s-0x3fcdfa3c] Element task created
I(18549) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:64932 Bytes      

I(18559) AUDIO_ELEMENT:[tts_http] AEL_MSG_CMD_RESUME,state:1
I(18569) BAIDU_TTS:[+] HTTP client HTTP_STREAM_PRE_REQUEST, lenght=0
I(18579) AUDIO_ELEMENT:[tts_mp3] AEL_MSG_CMD_RESUME,state:1
I(18579) MP3_DECODER: MP3 opened
I(18589) AUDIO_ELEMENT:[tts_i2s] AEL_MSG_CMD_RESUME,state:1
I(18589) AUDIO_PIPELINE: Pipeline started
I(18649) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce91e0 cmd:11, data:0x3fcca038, data_len:64
I(18649) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce91e0 cmd:8, data:0xe, data_len:4
I(18659) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce01b8 cmd:11, data:0x3fcca07c, data_len:64
I(18669) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce01b8 cmd:8, data:0xe, data_len:4
I(18679) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fcdfa3c cmd:8, data:0xc, data_len:4
I(18859) HTTP_STREAM: total_bytes=30240
I(18869) CODEC_ELEMENT_HELPER: The element is 0x3fce2edc. The reserve data 2 is 0x0.
I(18889) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2ba0 cmd:10, data:0x0, data_len:0
I(18899) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2ba0 cmd:8, data:0xc, data_len:4
I(18909) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2edc cmd:8, data:0xc, data_len:4
I(18919) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2edc cmd:9, data:0x0, data_len:0
W(21229) HTTP_STREAM: No more data,errno:0, total_bytes:30240, rlen =0
I(21229) AUDIO_ELEMENT: IN-[tts_http] AEL_IO_DONE,0
I(21239) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2ba0 cmd:11, data:0x3fcd2500, data_len:64
I(21249) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2ba0 cmd:8, data:0xf, data_len:4
I(22249) AUDIO_ELEMENT: IN-[tts_mp3] AEL_IO_DONE,-2
I(22589) MP3_DECODER: Closed
I(22589) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2edc cmd:11, data:0x3fcc3e7c, data_len:64
I(22599) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fce2edc cmd:8, data:0xf, data_len:4
I(22649) AUDIO_ELEMENT: IN-[tts_i2s] AEL_IO_DONE,-2
I(22649) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fcdfa3c cmd:11, data:0x3fce9680, data_len:64
I(22659) BAIDU_SPEECH_EXAMPLE:[*] Event received: src_type:131072, source:0x3fcdfa3c cmd:8, data:0xf, data_len:4
I(22669) BAIDU_SPEECH_EXAMPLE:[*] TTS Finish
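In the log above, both services reply with JSON. Baidu VTT returns the transcript in the `result` array (`"result":["你叫什么名字?"]`), and MiniMax returns its answer in `reply`. The sketch below is a deliberately naive extractor that copies the first string value after a given key. It decodes only the `\"` and `\\` escapes. It illustrates the idea and is not the project's code; firmware should use a real JSON parser, such as the cJSON component bundled with ESP-IDF. The name `json_first_string` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Naive helper (hypothetical): find "key" in a JSON text and copy the first
 * string value after it into `out`, e.g. "result":["abc"] or "reply":"abc".
 * Only \" and \\ escapes are decoded. Returns 0 on success, -1 otherwise. */
int json_first_string(const char *json, const char *key, char *out, size_t out_size)
{
    char pattern[64];
    size_t klen = strlen(key);

    if (klen + 3 > sizeof pattern || out_size == 0) {
        return -1;
    }
    /* build the quoted key, e.g. "result" including the quotes */
    pattern[0] = '"';
    memcpy(pattern + 1, key, klen);
    pattern[klen + 1] = '"';
    pattern[klen + 2] = '\0';

    const char *p = strstr(json, pattern);
    if (p == NULL) {
        return -1;
    }
    p += klen + 2;                          /* skip past the quoted key */
    while (*p == ' ' || *p == ':' || *p == '[') {
        ++p;                                /* skip separators and array start */
    }
    if (*p != '"') {
        return -1;                          /* value is not a string */
    }
    ++p;

    size_t pos = 0;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && (p[1] == '"' || p[1] == '\\')) {
            ++p;                            /* drop the escaping backslash */
        }
        if (pos + 1 >= out_size) {
            return -1;
        }
        out[pos++] = *p++;
    }
    if (*p != '"') {
        return -1;                          /* unterminated string */
    }
    out[pos] = '\0';
    return 0;
}
```

On the two responses in the log, this would return the same strings that the firmware prints as `response_text`.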

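Technical support

You can get technical support through the following channels:
WeChat: robot3g
CSDN notes: https://blog.csdn.net/phlr5/category_12704106.html
Bilibili video tutorials:
https://space.bilibili.com/635929440/channel/seriesdetail?sid=4184155
For other short videos, search for "柔贝特三哥" on Douyin, Kuaishou, Xiaohongshu or WeChat Channels, and send Sange a private message.
We will reply as soon as possible.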


Reprinted from: https://blog.csdn.net/phlr5/article/details/143035195
Copyright belongs to the original author, 柔贝特三哥. In case of infringement, please contact us for removal.
