

[Model Deployment] Deploying YOLOv5-5.0 on Jetson Xavier NX (eMMC)


Preface

This article provides a complete walkthrough and summary of deploying YOLOv5-5.0 on the Jetson Xavier NX.

About the NX

Before we start, a quick look at the Jetson Xavier NX (hereafter "NX"). The NX is an edge-AI supercomputer released by NVIDIA in 2020. Roughly the size of a Jetson Nano, it delivers 14 TOPS at 10 W and 21 TOPS at 15 W, making it well suited to systems constrained in size and power. With 384 CUDA cores, 48 Tensor Cores, and 2 NVDLA engines, it can run multiple modern neural networks in parallel while processing high-resolution data from multiple sensors. The NX also gives us the full NVIDIA software stack and its accelerated libraries for running modern AI networks and frameworks, enabling deep learning as well as computer vision, computer graphics, multimedia, and more.

So what does the NX look like? Exactly like this:
Figure 1: Jetson Xavier NX
Figure 2: Jetson Xavier NX Developer Kit

Figure 1 shows the bare NX module: a board with only a SO-DIMM connector and no fan, USB ports, Ethernet port, or other peripherals. Figure 2 shows the Developer Kit, which provides the various peripheral interfaces. The hardware layout of the NX is shown below:
Figure 3: Front view of the NX module
Figure 4: Back view of the NX module

Version Differences (SD | eMMC)

The NX currently comes in two versions: one with a microSD card slot and one with an onboard eMMC storage chip:

  • The microSD version can boot directly from a card flashed with the system image, and also supports flashing via the SDK Manager tool running in a virtual machine
  • The eMMC version has 16 GB of onboard storage, does not support flashing via microSD card, and must be flashed with SDK Manager from a virtual machine

Apart from the storage medium, the two versions have identical performance, and once the system is flashed there is little practical difference between them

Specifications

The specifications of the NX are listed below:

Key points:

  • The NX is arm64 (aarch64), which is fundamentally different from x86; many packages are simply not available for it, so deployment choices must account for the architecture
  • 4 USB ports (very handy)
  • Cameras are supported via the CSI-2 interface, but a USB camera also works (covered later in this article)

JetPack 4.6.1 Environment Setup

Time to get started! Prepare the following:

  • A USB keyboard
  • A USB mouse
  • 1 female-to-female DuPont jumper wire - used to short the FC REC and GND pins on the NX so it enters recovery mode
  • A USB-A to Micro-USB data cable - used to connect the NX to your computer
  • A monitor with HDMI input
  • A power cable

Flashing the System (OS)

All of the following steps are performed on your own computer, which must be running Linux - either a dual-boot install or a virtual machine. The environment used in this article:

  • Jetson Xavier NX (Developer Kit version)
  • VMware 16
  • Ubuntu 18.04
  • JetPack 4.6.1

1. Install SDK Manager
Search for "SDK Manager" in your browser, go to the NVIDIA site, register an account (email + password), and download the .deb package; it lands in the Downloads folder by default:
[image]
Then open a Terminal, cd into Downloads, run ls to confirm the file is there, and install it with sudo apt install ./sdkmanager_1.8.3-10409_amd64.deb:

xl@ubuntu:~$ cd Downloads/
xl@ubuntu:~/Downloads$ ls
nvidia  sdkmanager_1.8.3-10409_amd64.deb
xl@ubuntu:~/Downloads$ sudo apt install ./sdkmanager_1.8.3-10409_amd64.deb

Then run the sdkmanager command in the Terminal to open the application window (on first launch it asks you to log in with the NVIDIA account and password you just registered):
[image]

2. Connect the NX to your computer
Now connect everything as shown below:
[image]
The connection order doesn't matter, but it's best to plug the USB cable into the computer last. Once connected, the host detects the device and shows the dialog below; choose to attach it to the Ubuntu 18 virtual machine and confirm
[image]
The VM then shows the detected NX variant; select the second option (the Developer Kit version) and click OK
[image]

3. STEP 01
Now the actual flashing with SDK Manager begins! Configure it as follows:

  • Uncheck Host Machine
  • Select JetPack 4.6.1
  • Uncheck DeepStream

Click CONTINUE
[image]

4. STEP 02
Here we flash only the OS and uncheck the SDK components (the eMMC is a meager 16 GB; we can install the components after adding an SSD!)

The box at the bottom left selects the download location: SDK Manager first downloads the files into the virtual machine and then transfers them to the NX, so this is where those files are stored. After choosing the download path, tick "I accept" at the bottom left and click CONTINUE
[image]

5. STEP 03
Downloading and installation now begin (of the two progress bars, the first tracks downloading the files to the virtual machine, the second tracks installing them onto the NX). Note that when Installing reaches about 50%, a dialog pops up asking for some settings:

  • If this is the first time you flash the board, use Automatic mode: it creates a temporary local connection to the device at 192.168.55.1, and you enter a new username and password
  • If it is not the first time, choose Manual mode, in which case you first need to look up the board's current IP address on the device itself

[image]
The last 0.5% is painfully slow; once it succeeds, click FINISH to exit

6. STEP 04
Congratulations! At this point the OS has been flashed successfully!

Booting from SSD

Once flashing is done, remove the jumper wire and the USB cable, connect a mouse, keyboard, and monitor to the board, and carry on!

The NX reads from an SSD roughly 7x faster than from an SD card, so booting from the SSD not only expands storage but also noticeably improves performance - well worth doing

Partitioning the SSD

First, you need an SSD; install it on the NX:

Then connect the NX to power, log in, search for "disk", and open the Disks utility
[image]
The NX now shows the SSD's information in Disks. Click the menu in the top-right corner, choose Format Disk, and format the drive
[image]
Click Format
[image]
Once that finishes, the entire SSD shows up as Free Space

Next, click the + button to allocate the space: you can leave 16 GB as Free Space and give the rest to the NX. Click Next, give the new volume a name, then click Create

Done!

Setting the SSD as the Boot Device

Next, set up booting from the SSD by running the open-source rootOnNVMe scripts:

  • Clone the rootOnNVMe repository
  • Copy the root filesystem to the SSD
  • Enable the boot-from-SSD service
  • Reboot so the service takes effect
git clone https://github.com/jetsonhacks/rootOnNVMe.git
cd rootOnNVMe
./copy-rootfs-ssd.sh
./setup-service.sh
sudo reboot

Deep Learning Environment Setup

At this point the basic NX setup and SSD boot are complete. Next, we install the deep-learning environment so we can really put the NX to work

Set Language/Region

If you try to flash the CUDA, cuDNN, and other components directly from SDK Manager at this point, you will hit the following error:

Cannot contact to the device via SSH, validate that SSH service is running on the device

Fix: on the NX, finish setting the region (e.g. Shanghai), language, keyboard, and so on, then reboot

Flashing the SDK Components

This works much like flashing the OS, except this time the DuPont jumper wire is not needed!! The component versions that ship with JetPack 4.6.1 are:

  • CUDA: 10.2.300
  • cuDNN: 8.2.1.32
  • TensorRT: 8.2.1.8
  • OpenCV: 4.1.1

With the cables connected, open SDK Manager and in STEP 02 select only the second item, Jetpack SDK Components, then click CONTINUE
[image]
It will then ask for the virtual machine's password and check that the installation environment is correct, after which the download and installation start. During this process it asks for the NX's username and password; finally, click Install
[image]
在这里插入图片描述

Switch to the Tsinghua Mirror (optional)

  • Edit the sources.list file
sudo vim /etc/apt/sources.list
  • Replace the contents with the Tsinghua mirror
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main universe restricted
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main universe restricted
  • Update the package lists
sudo apt-get update

YOLOv5-5.0

Now we install the YOLOv5-5.0 codebase on the NX: first install Anaconda and create a virtual environment, then clone the YOLOv5-5.0 repository onto the NX, install the required packages, and finally download yolov5s.pt and run an inference test

Virtual Environment

In the NX's Chromium browser, search for Anaconda and download it from the official site - make sure to pick the ARM64 (aarch64) build. It is saved to the Downloads folder by default
[image]
Alternatively, download it with wget in the Terminal and install it with bash:

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-aarch64.sh
bash Anaconda3-2022.05-Linux-aarch64.sh

Then press Enter and answer yes all the way through until the installation finishes

Reopen the Terminal and you will see (base) in front of the prompt. Now create a virtual environment for YOLOv5-5.0:

(base) nx@ubuntu:~$ conda create -n yolo python=3.6 -y
(base) nx@ubuntu:~$ conda activate yolo

Download the v5.0 Repository

On the YOLOv5 GitHub page, find the v5.0 release, download it onto the NX, then cd into the yolov5-5.0 folder
[image]
Inside the yolo virtual environment, install the dependencies:

(base) nx@ubuntu:~$ conda activate yolo
(yolo) nx@ubuntu:~$ cd yolov5-5.0/
(yolo) nx@ubuntu:~/yolov5-5.0$ pip install -r requirements.txt

Inference Demo

Download yolov5s.pt into the yolov5-5.0 folder and run detect.py to test. (Note: if you run python detect.py without downloading the weights first, it automatically fetches the latest yolov5s.pt rather than the v5.0 release; the network structure differs between v5.0 and the latest version, so this causes an error)

(yolo) nx@ubuntu:~/yolov5-5.0$ wget https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s.pt
(yolo) nx@ubuntu:~/yolov5-5.0$ python detect.py

Results:
bus
zidane

Connecting to the NX from VS Code

Reference: 【开发工具】VScode连接远程服务器+设置免密登录 (VS Code: connect to a remote server and set up passwordless login)

Real-Time Detection with a USB Camera

First, get a camera - either a USB camera or the official CSI-2 camera - and plug it into the NX
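If you are unsure which device index the USB camera ends up on, a quick OpenCV probe helps. This is a small sketch of my own, not part of the original write-up; the index that reports OK is the one to pass to detect.py via --source later:

import cv2

# Try the first few video device indices and report which ones actually return a frame
for idx in range(4):
    cap = cv2.VideoCapture(idx)
    ok, _ = cap.read()
    print("index {}: {}".format(idx, "OK" if ok else "no frame"))
    cap.release()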

Next, change line 280 of datasets.py to:

if 'youtube.com/' in str(url) or 'youtu.be/' in str(url):

[image]
Then, to display the live FPS, two files need to be modified: datasets.py and detect.py

1. datasets.py: in the __next__ function of the LoadStreams class in utils/datasets.py, return self.fps
[image]
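For reference, after this change the end of __next__ looks roughly like the sketch below. This is my own sketch based on the v5.0 LoadStreams (which already stores self.fps in __init__); the surrounding lines may differ slightly in your copy:

    def __next__(self):
        self.count += 1
        img0 = self.imgs.copy()
        if cv2.waitKey(1) == ord('q'):  # press q to quit
            cv2.destroyAllWindows()
            raise StopIteration

        # letterbox, stack and convert BGR -> RGB exactly as in the original code
        img = [letterbox(x, self.img_size, auto=self.rect, stride=self.stride)[0] for x in img0]
        img = np.stack(img, 0)
        img = img[:, :, :, ::-1].transpose(0, 3, 1, 2)
        img = np.ascontiguousarray(img)

        # originally the last returned value was None; returning self.fps instead
        # lets detect.py receive the stream's frame rate in its vid_cap slot
        return self.sources, img, img0, self.fps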

2. detect.py: use cv2.putText to draw the FPS text on the current frame, and add a not in front of vid_cap to prevent an error (reason: what we now return is just the fps value, not a cap object), as shown below:

[image]
The code:

            # Stream results
            if view_img:
                # overlay the live FPS (1 / (t2 - t1)) on the frame
                cv2.putText(im0, "YOLOv5 FPS: {0}".format(float('%.3f' % (1 / (t2 - t1)))), (100, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (30, 144, 255), 3)
                cv2.imshow(str(p), im0)
                cv2.waitKey(1)  # 1 millisecond

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video' or 'stream'
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer
                        if not vid_cap:  # video (vid_cap now carries the fps value, not a cap object)
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:  # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                            save_path += '.mp4'
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
                    vid_writer.write(im0)

Finally, run the following command in the Terminal:

python detect.py --source 0

The result:

[image]
(PS: this FPS is truly heartbreaking /(ㄒoㄒ)/~~, but the detection accuracy is decent (●ˇ∀ˇ●))

Model Conversion

Now for the main event of this article: deploying YOLOv5 with TensorRT acceleration!!!! The basic workflow is:

  • Use gen_wts.py from tensorrtx/yolov5 inside yolov5-5.0 to convert yolov5s.pt into a yolov5s.wts file
  • Build tensorrtx/yolov5 to produce the yolov5 executable
  • Use the yolov5 executable to generate yolov5s.engine, i.e. the TensorRT model

pt -> wts (in yolov5-5.0)

First, download tensorrtx-yolov5-v5.0 - be sure to pick the yolov5-v5.0 release - then copy tensorrtx-yolov5-v5.0/yolov5/gen_wts.py into yolov5-5.0/ and run the command below; it generates the yolov5s.wts file in the current directory

(yolo) nx@ubuntu:~/yolov5-5.0$ python gen_wts.py -w yolov5s.pt

wts -> engine (in tensorrtx-yolov5-v5.0)

Next, switch to the tensorrtx-yolov5-v5.0/yolov5/ directory, create a build folder, cd into it, and compile:

(yolo) nx@ubuntu:~$ cd tensorrtx-yolov5-v5.0/yolov5/
(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5$ mkdir build
(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5$ cd build
(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ cmake ..
(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ make

Note! If you want to convert a model you trained yourself, you must edit the parameters in yololayer.h before compiling:

static constexpr int CLASS_NUM = 80; // number of classes in the dataset
static constexpr int INPUT_H = 608;
static constexpr int INPUT_W = 608;

Change CLASS_NUM to your own number of classes, then rerun the build steps above

With the build complete, the yolov5 executable has been generated. We can now use it to create the .engine file: copy the yolov5s.wts file from the previous step into the build directory, then run the following command to generate yolov5s.engine

(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ sudo ./yolov5 -s yolov5s.wts yolov5s.engine s

Note that the last argument, s, indicates the model scale. If your model is the m, l, or x scale, change s to the corresponding letter

Once yolov5s.engine has been generated, run the following command as a quick test:

(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ ./yolov5 -d yolov5s.engine ../samples

The detection result images are written to the build directory, where you can check the results:
bus
zidane

Real-Time Detection with a USB Camera

After generating yolov5s.engine, modify yolov5.cpp to open a USB camera and run detection in real time. The code is adapted from: Jetson nano + yolov5 + TensorRT加速+调用usb摄像头. Replace the original contents of yolov5.cpp with the following:

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

// all class names of the dataset
char* my_classes[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
        "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
        "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
        "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle",
        "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
        "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed",
        "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven",
        "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" };

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}
 
ICudaEngine*build_engine(unsignedint maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt,float& gd,float& gw, std::string& wts_name){
    INetworkDefinition* network = builder->createNetworkV2(0U);// Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W });assert(data);
 
    std::map<std::string, Weights> weightMap =loadWeights(wts_name);/* ------ yolov5 backbone------ */auto focus0 =focus(network, weightMap,*data,3,get_width(64, gw),3,"model.0");auto conv1 =convBlock(network, weightMap,*focus0->getOutput(0),get_width(128, gw),3,2,1,"model.1");auto bottleneck_CSP2 =C3(network, weightMap,*conv1->getOutput(0),get_width(128, gw),get_width(128, gw),get_depth(3, gd),true,1,0.5,"model.2");auto conv3 =convBlock(network, weightMap,*bottleneck_CSP2->getOutput(0),get_width(256, gw),3,2,1,"model.3");auto bottleneck_csp4 =C3(network, weightMap,*conv3->getOutput(0),get_width(256, gw),get_width(256, gw),get_depth(9, gd),true,1,0.5,"model.4");auto conv5 =convBlock(network, weightMap,*bottleneck_csp4->getOutput(0),get_width(512, gw),3,2,1,"model.5");auto bottleneck_csp6 =C3(network, weightMap,*conv5->getOutput(0),get_width(512, gw),get_width(512, gw),get_depth(9, gd),true,1,0.5,"model.6");auto conv7 =convBlock(network, weightMap,*bottleneck_csp6->getOutput(0),get_width(1024, gw),3,2,1,"model.7");auto spp8 =SPP(network, weightMap,*conv7->getOutput(0),get_width(1024, gw),get_width(1024, gw),5,9,13,"model.8");/* ------ yolov5 head ------ */auto bottleneck_csp9 =C3(network, weightMap,*spp8->getOutput(0),get_width(1024, gw),get_width(1024, gw),get_depth(3, gd),false,1,0.5,"model.9");auto conv10 =convBlock(network, weightMap,*bottleneck_csp9->getOutput(0),get_width(512, gw),1,1,1,"model.10");auto upsample11 = network->addResize(*conv10->getOutput(0));assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());
 
    ITensor* inputTensors12[]={ upsample11->getOutput(0), bottleneck_csp6->getOutput(0)};auto cat12 = network->addConcatenation(inputTensors12,2);auto bottleneck_csp13 =C3(network, weightMap,*cat12->getOutput(0),get_width(1024, gw),get_width(512, gw),get_depth(3, gd),false,1,0.5,"model.13");auto conv14 =convBlock(network, weightMap,*bottleneck_csp13->getOutput(0),get_width(256, gw),1,1,1,"model.14");auto upsample15 = network->addResize(*conv14->getOutput(0));assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());
 
    ITensor* inputTensors16[]={ upsample15->getOutput(0), bottleneck_csp4->getOutput(0)};auto cat16 = network->addConcatenation(inputTensors16,2);auto bottleneck_csp17 =C3(network, weightMap,*cat16->getOutput(0),get_width(512, gw),get_width(256, gw),get_depth(3, gd),false,1,0.5,"model.17");// yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);auto conv18 =convBlock(network, weightMap,*bottleneck_csp17->getOutput(0),get_width(256, gw),3,2,1,"model.18");
    ITensor* inputTensors19[]={ conv18->getOutput(0), conv14->getOutput(0)};auto cat19 = network->addConcatenation(inputTensors19,2);auto bottleneck_csp20 =C3(network, weightMap,*cat19->getOutput(0),get_width(512, gw),get_width(512, gw),get_depth(3, gd),false,1,0.5,"model.20");//yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);auto conv21 =convBlock(network, weightMap,*bottleneck_csp20->getOutput(0),get_width(512, gw),3,2,1,"model.21");
    ITensor* inputTensors22[]={ conv21->getOutput(0), conv10->getOutput(0)};auto cat22 = network->addConcatenation(inputTensors22,2);auto bottleneck_csp23 =C3(network, weightMap,*cat22->getOutput(0),get_width(1024, gw),get_width(1024, gw),get_depth(3, gd),false,1,0.5,"model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);auto yolo =addYoLoLayer(network, weightMap,"model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));// Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16*(1<<20));// 16MB#ifdefined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);#elifdefined(USE_INT8)
    std::cout <<"Your platform support int8: "<<(builder->platformHasFastInt8()?"true":"false")<< std::endl;assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator =newInt8EntropyCalibrator2(1, INPUT_W, INPUT_H,"./coco_calib/","int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);#endif
 
    std::cout <<"Building engine, please wait for a while..."<< std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network,*config);
    std::cout <<"Build engine successfully!"<< std::endl;// Don't need the network any more
    network->destroy();// Release host memoryfor(auto& mem : weightMap){free((void*)(mem.second.values));}return engine;}
 
ICudaEngine*build_engine_p6(unsignedint maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt,float& gd,float& gw, std::string& wts_name){
    INetworkDefinition* network = builder->createNetworkV2(0U);// Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W });assert(data);
 
    std::map<std::string, Weights> weightMap =loadWeights(wts_name);/* ------ yolov5 backbone------ */auto focus0 =focus(network, weightMap,*data,3,get_width(64, gw),3,"model.0");auto conv1 =convBlock(network, weightMap,*focus0->getOutput(0),get_width(128, gw),3,2,1,"model.1");auto c3_2 =C3(network, weightMap,*conv1->getOutput(0),get_width(128, gw),get_width(128, gw),get_depth(3, gd),true,1,0.5,"model.2");auto conv3 =convBlock(network, weightMap,*c3_2->getOutput(0),get_width(256, gw),3,2,1,"model.3");auto c3_4 =C3(network, weightMap,*conv3->getOutput(0),get_width(256, gw),get_width(256, gw),get_depth(9, gd),true,1,0.5,"model.4");auto conv5 =convBlock(network, weightMap,*c3_4->getOutput(0),get_width(512, gw),3,2,1,"model.5");auto c3_6 =C3(network, weightMap,*conv5->getOutput(0),get_width(512, gw),get_width(512, gw),get_depth(9, gd),true,1,0.5,"model.6");auto conv7 =convBlock(network, weightMap,*c3_6->getOutput(0),get_width(768, gw),3,2,1,"model.7");auto c3_8 =C3(network, weightMap,*conv7->getOutput(0),get_width(768, gw),get_width(768, gw),get_depth(3, gd),true,1,0.5,"model.8");auto conv9 =convBlock(network, weightMap,*c3_8->getOutput(0),get_width(1024, gw),3,2,1,"model.9");auto spp10 =SPP(network, weightMap,*conv9->getOutput(0),get_width(1024, gw),get_width(1024, gw),3,5,7,"model.10");auto c3_11 =C3(network, weightMap,*spp10->getOutput(0),get_width(1024, gw),get_width(1024, gw),get_depth(3, gd),false,1,0.5,"model.11");/* ------ yolov5 head ------ */auto conv12 =convBlock(network, weightMap,*c3_11->getOutput(0),get_width(768, gw),1,1,1,"model.12");auto upsample13 = network->addResize(*conv12->getOutput(0));assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[]={ upsample13->getOutput(0), c3_8->getOutput(0)};auto cat14 = network->addConcatenation(inputTensors14,2);auto c3_15 =C3(network, weightMap,*cat14->getOutput(0),get_width(1536, gw),get_width(768, gw),get_depth(3, gd),false,1,0.5,"model.15");auto conv16 =convBlock(network, weightMap,*c3_15->getOutput(0),get_width(512, gw),1,1,1,"model.16");auto upsample17 = network->addResize(*conv16->getOutput(0));assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[]={ upsample17->getOutput(0), c3_6->getOutput(0)};auto cat18 = network->addConcatenation(inputTensors18,2);auto c3_19 =C3(network, weightMap,*cat18->getOutput(0),get_width(1024, gw),get_width(512, gw),get_depth(3, gd),false,1,0.5,"model.19");auto conv20 =convBlock(network, weightMap,*c3_19->getOutput(0),get_width(256, gw),1,1,1,"model.20");auto upsample21 = network->addResize(*conv20->getOutput(0));assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[]={ upsample21->getOutput(0), c3_4->getOutput(0)};auto cat22 = network->addConcatenation(inputTensors21,2);auto c3_23 =C3(network, weightMap,*cat22->getOutput(0),get_width(512, gw),get_width(256, gw),get_depth(3, gd),false,1,0.5,"model.23");auto conv24 =convBlock(network, weightMap,*c3_23->getOutput(0),get_width(256, gw),3,2,1,"model.24");
    ITensor* inputTensors25[]={ conv24->getOutput(0), conv20->getOutput(0)};auto cat25 = network->addConcatenation(inputTensors25,2);auto c3_26 =C3(network, weightMap,*cat25->getOutput(0),get_width(1024, gw),get_width(512, gw),get_depth(3, gd),false,1,0.5,"model.26");auto conv27 =convBlock(network, weightMap,*c3_26->getOutput(0),get_width(512, gw),3,2,1,"model.27");
    ITensor* inputTensors28[]={ conv27->getOutput(0), conv16->getOutput(0)};auto cat28 = network->addConcatenation(inputTensors28,2);auto c3_29 =C3(network, weightMap,*cat28->getOutput(0),get_width(1536, gw),get_width(768, gw),get_depth(3, gd),false,1,0.5,"model.29");auto conv30 =convBlock(network, weightMap,*c3_29->getOutput(0),get_width(768, gw),3,2,1,"model.30");
    ITensor* inputTensors31[]={ conv30->getOutput(0), conv12->getOutput(0)};auto cat31 = network->addConcatenation(inputTensors31,2);auto c3_32 =C3(network, weightMap,*cat31->getOutput(0),get_width(2048, gw),get_width(1024, gw),get_depth(3, gd),false,1,0.5,"model.32");/* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0),3*(Yolo::CLASS_NUM +5), DimsHW{1,1}, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);auto yolo =addYoLoLayer(network, weightMap,"model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));// Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16*(1<<20));// 16MB#ifdefined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);#elifdefined(USE_INT8)
    std::cout <<"Your platform support int8: "<<(builder->platformHasFastInt8()?"true":"false")<< std::endl;assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator =newInt8EntropyCalibrator2(1, INPUT_W, INPUT_H,"./coco_calib/","int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);#endif
 
    std::cout <<"Building engine, please wait for a while..."<< std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network,*config);
    std::cout <<"Build engine successfully!"<< std::endl;// Don't need the network any more
    network->destroy();// Release host memoryfor(auto& mem : weightMap){free((void*)(mem.second.values));}return engine;}voidAPIToModel(unsignedint maxBatchSize, IHostMemory** modelStream,float& gd,float& gw, std::string& wts_name){// Create builder
    IBuilder* builder =createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();// Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine =build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);assert(engine !=nullptr);// Serialize the engine(*modelStream)= engine->serialize();// Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();}voiddoInference(IExecutionContext& context, cudaStream_t& stream,void** buffers,float* input,float* output,int batchSize){// DMA input batch data to device, infer on the batch asynchronously, and DMA output back to hostCUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize *3* INPUT_H * INPUT_W *sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream,nullptr);CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE *sizeof(float), cudaMemcpyDeviceToHost, stream));cudaStreamSynchronize(stream);}boolparse_args(int argc,char** argv, std::string& engine){if(argc <3)returnfalse;if(std::string(argv[1])=="-v"&& argc ==3){
        engine = std::string(argv[2]);}else{returnfalse;}returntrue;}intmain(int argc,char** argv){cudaSetDevice(DEVICE);//std::string wts_name = "";
    std::string engine_name ="";//float gd = 0.0f, gw = 0.0f;//std::string img_dir;if(!parse_args(argc, argv, engine_name)){
        std::cerr <<"arguments not right!"<< std::endl;
        std::cerr <<"./yolov5 -v [.engine] // run inference with camera"<< std::endl;return-1;}
 
    std::ifstream file(engine_name, std::ios::binary);if(!file.good()){
        std::cerr <<" read "<< engine_name <<" error! "<< std::endl;return-1;}char* trtModelStream{nullptr};
    size_t size =0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream =newchar[size];assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();// prepare input data ---------------------------staticfloat data[BATCH_SIZE *3* INPUT_H * INPUT_W];//for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)//    data[i] = 1.0;staticfloat prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime =createInferRuntime(gLogger);assert(runtime !=nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);assert(engine !=nullptr);
    IExecutionContext* context = engine->createExecutionContext();assert(context !=nullptr);delete[] trtModelStream;assert(engine->getNbBindings()==2);void* buffers[2];// In order to bind the buffers, we need to know the names of the input and output tensors.// Note that indices are guaranteed to be less than IEngine::getNbBindings()constint inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);constint outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);assert(inputIndex ==0);assert(outputIndex ==1);// Create GPU buffers on deviceCUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE *3* INPUT_H * INPUT_W *sizeof(float)));CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE *sizeof(float)));// Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    // camera index to open
    cv::VideoCapture capture(0);
    //cv::VideoCapture capture("../overpass.mp4");
    //int fourcc = cv::VideoWriter::fourcc('M','J','P','G');
    //capture.set(cv::CAP_PROP_FOURCC, fourcc);
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

    int key;
    int fcount = 0;
    while (1) {
        cv::Mat frame;
        capture >> frame;if(frame.empty()){
            std::cout <<"Fail to read image from camera!"<< std::endl;break;}
        fcount++;//if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;for(int b =0; b < fcount; b++){//cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;if(img.empty())continue;
            cv::Mat pr_img =preprocess_img(img, INPUT_W, INPUT_H);// letterbox BGR to RGBint i =0;for(int row =0; row < INPUT_H;++row){
                uchar* uc_pixel = pr_img.data + row * pr_img.step;for(int col =0; col < INPUT_W;++col){
                    data[b *3* INPUT_H * INPUT_W + i]=(float)uc_pixel[2]/255.0;
                    data[b *3* INPUT_H * INPUT_W + i + INPUT_H * INPUT_W]=(float)uc_pixel[1]/255.0;
                    data[b *3* INPUT_H * INPUT_W + i +2* INPUT_H * INPUT_W]=(float)uc_pixel[0]/255.0;
                    uc_pixel +=3;++i;}}}// Run inferenceauto start = std::chrono::system_clock::now();doInference(*context, stream, buffers, data, prob, BATCH_SIZE);auto end = std::chrono::system_clock::now();// std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;int fps =1000.0/ std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::cout <<"fps: "<< fps << std::endl;
        std::vector<std::vector<Yolo::Detection>>batch_res(fcount);for(int b =0; b < fcount; b++){auto& res = batch_res[b];nms(res,&prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);}for(int b =0; b < fcount; b++){auto& res = batch_res[b];//std::cout << res.size() << std::endl;//cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);for(size_t j =0; j < res.size(); j++){
                cv::Rect r =get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27,0xC1,0x36),6);
                std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y -1), cv::FONT_HERSHEY_PLAIN,2, cv::Scalar(0xFF,0xFF,0xFF),2);
                std::string jetson_fps ="Jetson Xavier NX FPS: "+ std::to_string(fps);
                cv::putText(frame, jetson_fps, cv::Point(11,80), cv::FONT_HERSHEY_PLAIN,3, cv::Scalar(0,0,255),2, cv::LINE_AA);}//cv::imwrite("_" + file_names[f - fcount + 1 + b], img);}
        cv::imshow("yolov5", frame);
        key = cv::waitKey(1);if(key =='q'){break;}
        fcount =0;}
 
    capture.release();// Release stream and bufferscudaStreamDestroy(stream);CUDA_CHECK(cudaFree(buffers[inputIndex]));CUDA_CHECK(cudaFree(buffers[outputIndex]));// Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();return0;}

Notes:

  • Change the dataset class names: edit the my_classes[] array near the top of yolov5.cpp to match your own classes

[image]

  • Change the camera index: edit the index in the cv::VideoCapture capture(0) line if your camera is not device 0

[image]

  • Optional: change the on-screen output text (the label and FPS overlays drawn with cv::putText)

[image]

Then rebuild and run the test:

(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ make
(yolo) nx@ubuntu:~/tensorrtx-yolov5-v5.0/yolov5/build$ sudo ./yolov5 -v yolov5s.engine

The result:
[image]
(PS: this time the FPS is genuinely impressive!!!)

DeepStream Deployment

Installation

1. Before installing, note the version mapping between JetPack and DeepStream:

JetPack    DeepStream
4.6        6.0
4.5.1      5.1
4.4.1      5.0

The JetPack version installed in this article is 4.6, so the matching DeepStream 6.0 is installed

2. Install dependencies
Run the following command to install the required packages:

sudo apt install \
libssl1.0.0 \
libgstreamer1.0-0 \
gstreamer1.0-tools \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-ugly \
gstreamer1.0-libav \
libgstrtspserver-1.0-0 \
libjansson4=2.11-1

3. Install the DeepStream SDK
Step 1 already downloaded the DeepStream 6.0 Jetson tar package deepstream_sdk_v6.0.0_jetson.tbz2 onto the NX, so now run the following commands to extract and install the DeepStream SDK:

sudo tar -xvf deepstream_sdk_v6.0.0_jetson.tbz2 -C /
cd /opt/nvidia/deepstream/deepstream-6.0
sudo ./install.sh
sudo ldconfig

Demo Test

After installation, go to the official sample folder

cd /opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/
# run a quick test
deepstream-app -c source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt

This sample is slow to start, so be patient. The result:

[image]

Accelerated YOLOv5 Deployment (Coming soon)

Real-Time Detection with a CSI-2 Camera (Coming soon)

Reference

Jetson“家族”在NVIDIA的定位是什么?对比市面上其他嵌入式平台,Jetson有什么优势?

Jetson Xavier NX 刷机+更换清华源完美讲解

Jetson开发实战记录(二):Jetson Xavier NX版本区别以及烧录系统

Jetson开发实战记录(三):Jetson Xavier NX具体开发(Ubuntu18.04系统)

YOLOV5环境快速配置 Jetson Xavier NX 版本(基本详细)

Jetson Xavier NX 部署Yolov5

Jetson nano上部署自己的Yolov5模型(TensorRT加速)

Jetson nano + yolov5 + TensorRT加速+调用usb摄像头

yolov5s模型转tensorrt+deepstream检测+CSI和USB摄像头检测

Jetson AGX Xavier实现TensorRT加速YOLOv5进行实时检测


This article is reposted from: https://blog.csdn.net/weixin_43799388/article/details/127021386
Copyright belongs to the original author 嗜睡的篠龙. If there is any infringement, please contact us for removal.
