

Using OpenAI's Open-Source Speech Recognition Model Whisper from .NET


Preface

On September 21, 2022, OpenAI open-sourced Whisper, a neural network whose English speech recognition is claimed to have reached human-level accuracy, and which also supports automatic speech recognition in 98 other languages. The automatic speech recognition (ASR) models that Whisper provides are trained for speech recognition and translation tasks: they can transcribe speech in many languages into text, and can also translate that speech into English.

Whisper's core feature is speech recognition. For most people, it makes turning recordings of meetings, lectures, and classes into text much faster; for film and TV fans, it can automatically generate subtitles for material that has none, so there is no need to wait for subtitle groups; for foreign-language learners, running your pronunciation-practice recordings through Whisper is a good way to check your spoken language. The major cloud platforms all offer speech recognition services, of course, but they generally run over the network, which always raises privacy concerns. Whisper is completely different: it runs entirely locally, with no internet connection required, so personal privacy is fully protected, and its recognition accuracy is quite high.

OpenAI's original Whisper is a Python project; whisper.cpp is a C/C++ port of it, and sandrohanea's Whisper.net wraps whisper.cpp for .NET.

This article walks through how I used the open-source speech recognition model Whisper in a .NET web project, mainly so I can look it up again later; if it helps you as well, even better.

The .NET web project targets .NET 6.0.


Installing the Whisper.net packages

First, install the Whisper.net packages in the Core project. In the NuGet package manager, search for and install the Whisper.net and Whisper.net.Runtime packages.

Note that the packages we want are Whisper.net and Whisper.net.Runtime, not WhisperNet or Whisper.Runtime.
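
If you prefer the command line to the NuGet UI, the same two packages can be added with the dotnet CLI from the project directory:

dotnet add package Whisper.net
dotnet add package Whisper.net.Runtime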


Downloading the model files

Go to Hugging Face and download Whisper's model files. There are five models: ggml-tiny.bin, ggml-base.bin, ggml-small.bin, ggml-medium.bin, and ggml-large.bin; the file sizes increase in that order, and so does the recognition accuracy. In addition, the xxx.en.bin files are English-only models, while the xxx.bin files support multiple languages.

Put the model files somewhere in the project; here I place them under the Web project's wwwroot, in a WhisperModel folder that matches the path used in the code below.
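
If you would rather have the project fetch a model by itself, a plain HttpClient download is enough. The sketch below is only an illustration: the Hugging Face URL is the one commonly used for the whisper.cpp ggml files and should be double-checked, and the target folder matches the wwwroot/WhisperModel path used by the helper class later in this article.

// Download ggml-base.bin into wwwroot/WhisperModel if it is not there yet.
// .NET 6 top-level statements with implicit usings assumed; verify the URL before relying on it.
var modelUrl = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin";
var targetPath = Path.Combine("wwwroot", "WhisperModel", "ggml-base.bin");

if (!File.Exists(targetPath))
{
    Directory.CreateDirectory(Path.GetDirectoryName(targetPath)!);
    using var http = new HttpClient();
    await using var download = await http.GetStreamAsync(modelUrl);
    await using var file = File.Create(targetPath);
    await download.CopyToAsync(file);
}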


Creating a Whisper helper class

WhisperHelper.cs


using Whisper.net;
using System.IO;
using System.Collections.Generic;
using Market.Core.Enum;

namespace Market.Core.Util
{
    public class WhisperHelper
    {
        public static List<SegmentData> Segments { get; set; }

        public static WhisperProcessor Processor { get; set; }

        public WhisperHelper(ASRModelType modelType)
        {
            if (Segments == null || Processor == null)
            {
                Segments = new List<SegmentData>();
                var binName = "ggml-large.bin";
                switch (modelType)
                {
                    case ASRModelType.WhisperTiny:
                        binName = "ggml-tiny.bin";
                        break;
                    case ASRModelType.WhisperBase:
                        binName = "ggml-base.bin";
                        break;
                    case ASRModelType.WhisperSmall:
                        binName = "ggml-small.bin";
                        break;
                    case ASRModelType.WhisperMedium:
                        binName = "ggml-medium.bin";
                        break;
                    case ASRModelType.WhisperLarge:
                        binName = "ggml-large.bin";
                        break;
                    default:
                        break;
                }

                var modelFilePath = $"wwwroot/WhisperModel/{binName}";
                var factory = WhisperFactory.FromPath(modelFilePath);
                var builder = factory.CreateBuilder()
                    .WithLanguage("zh")                      // Chinese
                    .WithSegmentEventHandler(Segments.Add);  // collect recognized segments
                var processor = builder.Build();
                Processor = processor;
            }
        }

        /// <summary>
        /// Full speech recognition, reusing the shared processor instance.
        /// </summary>
        /// <returns>The recognized text.</returns>
        public string FullDetection(Stream speechStream)
        {
            Segments.Clear();
            var txtResult = string.Empty;

            // Run recognition; the handler registered above appends each segment.
            Processor.Process(speechStream);

            // Concatenate the recognized segments.
            foreach (var segment in Segments)
            {
                txtResult += segment.Text + "\n";
            }
            Segments.Clear();
            return txtResult;
        }
    }
}
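
For a quick check outside the web pipeline, the helper can be driven from a small console test along these lines (the sample.wav path is hypothetical; whisper.cpp works on 16 kHz mono WAV input, which is exactly what the recording page below produces):

using Market.Core.Enum;
using Market.Core.Util;

// Recognize a local recording with the base model (loads wwwroot/WhisperModel/ggml-base.bin).
using var wavStream = File.OpenRead("sample.wav");          // hypothetical test file
var whisper = new WhisperHelper(ASRModelType.WhisperBase);
var text = whisper.FullDetection(wavStream);
Console.WriteLine(text);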

ModelType.cs

The models have different file names, so an enum type is used to tell them apart:


using System.ComponentModel;

namespace Market.Core.Enum
{
    /// <summary>
    /// ASR model type
    /// </summary>
    [Description("ASR模型类型")]
    public enum ASRModelType
    {
        /// <summary>
        /// ASRT
        /// </summary>
        [Description("ASRT")]
        ASRT = 0,

        /// <summary>
        /// WhisperTiny
        /// </summary>
        [Description("WhisperTiny")]
        WhisperTiny = 100,

        /// <summary>
        /// WhisperBase
        /// </summary>
        [Description("WhisperBase")]
        WhisperBase = 110,

        /// <summary>
        /// WhisperSmall
        /// </summary>
        [Description("WhisperSmall")]
        WhisperSmall = 120,

        /// <summary>
        /// WhisperMedium
        /// </summary>
        [Description("WhisperMedium")]
        WhisperMedium = 130,

        /// <summary>
        /// WhisperLarge
        /// </summary>
        [Description("WhisperLarge")]
        WhisperLarge = 140,

        /// <summary>
        /// PaddleSpeech
        /// </summary>
        [Description("PaddleSpeech")]
        PaddleSpeech = 200,
    }
}
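
The numeric values are the same ones bound to the radio buttons on the page further below, so the integer posted by the frontend converts directly to the enum:

// 110 is what the page posts when "WhisperBase" is selected.
var modelType = (ASRModelType)110;
Console.WriteLine(modelType);   // WhisperBase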

Receiving and recognizing the audio on the backend

The backend endpoint receives the audio as a Base64-encoded byte payload and uses the Whisper helper class to perform speech recognition.


The key code is as follows:

public class ASRModel
{
    public string samples { get; set; }

    public ASRModelType ModelType { get; set; }   // model type selected on the page (posted as modelType)
}

/// <summary>
/// Speech recognition
/// </summary>
[HttpPost]
[Route("/auth/speechRecogize")]
public async Task<IActionResult> SpeechRecogizeAsync([FromBody] ASRModel model)
{
    ResultDto result = new ResultDto();
    byte[] wavData = Convert.FromBase64String(model.samples);
    model.samples = null; // let the Base64 string be collected

    // Run speech recognition with the Whisper helper class
    var speechStream = new MemoryStream(wavData);
    var whisperManager = new WhisperHelper(model.ModelType);
    var textResult = whisperManager.FullDetection(speechStream);

    speechStream.Dispose(); // release the buffers
    speechStream = null;
    wavData = null;

    result.Data = textResult;
    return Json(result.OK());
}
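
To sanity-check the endpoint without the page, you can post a Base64-encoded WAV from any HTTP client. A rough sketch (the base address, the sample.wav file, and the exact JSON shape returned by ResultDto are assumptions):

using System.Net.Http.Json;

var wavBytes = await File.ReadAllBytesAsync("sample.wav");   // hypothetical test file
var payload = new
{
    samples = Convert.ToBase64String(wavBytes),              // matches ASRModel.samples
    modelType = 110                                          // ASRModelType.WhisperBase
};

using var http = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };
var response = await http.PostAsJsonAsync("/auth/speechRecogize", payload);
Console.WriteLine(await response.Content.ReadAsStringAsync());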

Recording and uploading the audio on the frontend

The frontend mainly captures the audio, encodes the recorded file as Base64, and posts it to the backend API.

The page simply shows a radio group for choosing the recognition model and a press-and-hold record button.


The page code is as follows:

@{
    Layout = null;
}
@using Karambolo.AspNetCore.Bundling.ViewHelpers
@addTagHelper *, Karambolo.AspNetCore.Bundling
@addTagHelper *, Microsoft.AspNetCore.Mvc.TagHelpers
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>语音录制</title>
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
    <environment names="Development">
        <link href="~/content/plugins/element-ui/index.css" rel="stylesheet" />
        <script src="~/content/plugins/jquery/jquery-3.4.1.min.js"></script>
        <script src="~/content/js/matomo.js"></script>
        <script src="~/content/js/slick.min.js"></script>
        <script src="~/content/js/masonry.js"></script>
        <script src="~/content/js/instafeed.min.js"></script>
        <script src="~/content/js/headroom.js"></script>
        <script src="~/content/js/readingTime.min.js"></script>
        <script src="~/content/js/script.js"></script>
        <script src="~/content/js/prism.js"></script>
        <script src="~/content/js/recorder-core.js"></script>
        <script src="~/content/js/wav.js"></script>
        <script src="~/content/js/waveview.js"></script>
        <script src="~/content/js/vue.js"></script>
        <script src="~/content/plugins/element-ui/index.js"></script>
        <script src="~/content/js/request.js"></script>
    </environment>
    <environment names="Stage,Production">
        @await Styles.RenderAsync("~/bundles/login.css")
        @await Scripts.RenderAsync("~/bundles/login.js")
    </environment>
    <style>
        html, body { margin: 0; height: 100%; }
        body { padding: 20px; box-sizing: border-box; }
        audio { display: block; }
        audio + audio { margin-top: 20px; }
        .el-textarea .el-textarea__inner { color: #000 !important; font-size: 18px; font-weight: 600; }
        #app { height: 100%; }
        .content { height: calc(100% - 130px); overflow: auto; }
        .content > div { margin: 10px 0 20px; }
        .press { height: 40px; line-height: 40px; border-radius: 5px; border: 1px solid #dcdfe6; cursor: pointer; width: 100%; text-align: center; background: #fff; }
    </style>
</head>
<body>
    <div id="app">
        <div style="display: flex; justify-content: space-between; align-items: center;">
            <center>{{isPC ? '我是电脑版' : '我是手机版'}}</center>
            <center style="margin: 10px 0">
                <el-radio-group v-model="modelType">
                    <el-radio :label="0">ASRT</el-radio>
                    <el-radio :label="100">WhisperTiny</el-radio>
                    <el-radio :label="110">WhisperBase</el-radio>
                    <el-radio :label="120">WhisperSmall</el-radio>
                    <el-radio :label="130">WhisperMedium</el-radio>
                    <el-radio :label="140">WhisperLarge</el-radio>
                    <el-radio :label="200">PaddleSpeech</el-radio>
                </el-radio-group>
            </center>
            <el-button type="primary" size="small" onclick="window.location.href = '/'">返回</el-button>
        </div>
        <div class="content" id="wav_pannel">
            @*{{textarea}}*@
        </div>
        <div style="margin-top: 20px"></div>
        <center style="height: 40px;"><h4 id="msgbox" v-if="messageSatuts">{{message}}</h4></center>
        <button class="press" v-on:touchstart="start" v-on:touchend="end" v-if="!isPC">按住 说话</button>
        <button class="press" v-on:mousedown="start" v-on:mouseup="end" v-else>按住 说话</button>
    </div>
</body>
</html>
<script>
    var blob_wav_current;
    var rec;
    var recOpen = function (success) {
        rec = Recorder({
            type: "wav",
            sampleRate: 16000,
            bitRate: 16,
            onProcess: (buffers, powerLevel, bufferDuration, bufferSampleRate, newBufferIdx, asyncEnd) => { }
        });
        rec.open(() => {
            success && success();
        }, (msg, isUserNotAllow) => {
            app.textarea = (isUserNotAllow ? "UserNotAllow" : "") + "无法录音:" + msg;
        });
    };
    var app = new Vue({
        el: '#app',
        data: { textarea: '', message: '', messageSatuts: false, modelType: 0 },
        computed: {
            isPC() {
                var userAgentInfo = navigator.userAgent;
                var Agents = ["Android", "iPhone", "SymbianOS", "Windows Phone", "iPod", "iPad"];
                var flag = true;
                for (var i = 0; i < Agents.length; i++) {
                    if (userAgentInfo.indexOf(Agents[i]) > 0) { flag = false; break; }
                }
                return flag;
            }
        },
        methods: {
            start() {
                app.message = "正在录音...";
                app.messageSatuts = true;
                recOpen(function () { app.recStart(); });
            },
            end() {
                if (rec) {
                    rec.stop(function (blob, duration) {
                        app.messageSatuts = false;
                        rec.close();
                        rec = null;
                        blob_wav_current = blob;
                        var audio = document.createElement("audio");
                        audio.controls = true;
                        var dom = document.getElementById("wav_pannel");
                        dom.appendChild(audio);
                        audio.src = (window.URL || webkitURL).createObjectURL(blob);
                        //audio.play();
                        app.messageSatuts = false;
                        app.upload();
                    }, function (msg) {
                        console.log("录音失败:" + msg);
                        rec.close();
                        rec = null;
                    });
                    app.message = "录音停止";
                }
            },
            upload() {
                app.message = "正在上传识别...";
                app.messageSatuts = true;
                var blob = blob_wav_current;
                var reader = new FileReader();
                reader.onloadend = function () {
                    var data = {
                        samples: (/.+;\s*base64\s*,\s*(.+)$/i.exec(reader.result) || [])[1],
                        sample_rate: 16000,
                        channels: 1,
                        byte_width: 2,
                        modelType: app.modelType
                    };
                    $.post('/auth/speechRecogize', data, function (res) {
                        if (res.data && res.data.statusCode == 200000) {
                            app.messageSatuts = false;
                            app.textarea = res.data.text == '' ? '暂未识别出来,请重新试试' : res.data.text;
                        } else {
                            app.textarea = "识别失败";
                        }
                        var dom = document.getElementById("wav_pannel");
                        var div = document.createElement("div");
                        div.innerHTML = app.textarea;
                        dom.appendChild(div);
                        $('#wav_pannel').animate({ scrollTop: $('#wav_pannel')[0].scrollHeight - $('#wav_pannel')[0].offsetHeight });
                    });
                };
                reader.readAsDataURL(blob);
            },
            recStart() {
                rec.start();
            }
        }
    });
</script>

References

Whisper official site

Testing the basic usage of the offline speech-to-text model Whisper.net

whisper.cpp on GitHub

Whisper.net on GitHub

Whisper model downloads


Reposted from: https://blog.csdn.net/guigenyi/article/details/130955947
Copyright belongs to the original author, 切糕师学AI. In case of infringement, please contact us for removal.
