Installing CUDA and cuDNN on Ubuntu and Verifying the Installation


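This tutorial explains how to install CUDA (NVIDIA's parallel computing platform) and cuDNN (its deep neural network library) on Ubuntu, and how to verify that the installation succeeded. By following these steps you will configure your system to use GPU acceleration for deep learning and other compute-intensive tasks. It also covers setting environment variables and building and running sample code to confirm that CUDA and cuDNN work correctly.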

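Installing CUDA

Before installing CUDA, complete a few pre-installation steps. First, install the headers and development packages for the kernel that is currently running. Open a terminal and run:

    sudo apt-get install linux-headers-$(uname -r)

Next, remove the outdated signing key:

    sudo apt-key del 7fa2af80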

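Installing CUDA from the network repository (Ubuntu)

The GPG public key of the new CUDA repository is `3bf863cc`. You can add it to your system with the `cuda-keyring` package or manually; using the `apt-key` command is not recommended. Proceed as follows: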

  1. Install the new cuda-keyring package, replacing `$distro/$arch` to match your system:

      wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
      sudo dpkg -i cuda-keyring_1.1-1_all.deb

`$distro/$arch` should be replaced with one of the following:

  • ubuntu1604/x86_64: Ubuntu 16.04, 64-bit x86.
  • ubuntu1804/cross-linux-sbsa: Ubuntu 18.04, cross-compilation (SBSA architecture).
  • ubuntu1804/ppc64el: Ubuntu 18.04, 64-bit PowerPC.
  • ubuntu1804/sbsa: Ubuntu 18.04, SBSA architecture.
  • ubuntu1804/x86_64: Ubuntu 18.04, 64-bit x86.
  • ubuntu2004/cross-linux-aarch64: Ubuntu 20.04, cross-compilation (AArch64 architecture).
  • ubuntu2004/arm64: Ubuntu 20.04, 64-bit ARM.
  • ubuntu2004/cross-linux-sbsa: Ubuntu 20.04, cross-compilation (SBSA architecture).
  • ubuntu2004/sbsa: Ubuntu 20.04, SBSA architecture.
  • ubuntu2004/x86_64: Ubuntu 20.04, 64-bit x86.
  • ubuntu2204/sbsa: Ubuntu 22.04, SBSA architecture.
  • ubuntu2204/x86_64: Ubuntu 22.04, 64-bit x86.

Pick the entry that matches your Ubuntu version and architecture, then run the installation steps with it.
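If you prefer not to pick the `$distro/$arch` value by hand, it can be derived from the running system. The following is a minimal sketch, not part of the original instructions: the function name `cuda_repo_path` is hypothetical, and mapping `aarch64` to `sbsa` is an assumption that fits ARM servers (Jetson-class boards would use `arm64` instead). Always check the result against the list above.

```shell
# Map os-release fields and a machine name to a "$distro/$arch" repository path.
# Arguments: ID (e.g. ubuntu), VERSION_ID (e.g. 22.04), machine (e.g. x86_64).
cuda_repo_path() {
    id=$1 version_id=$2 machine=$3
    # "ubuntu" + "22.04" with the dot removed -> "ubuntu2204"
    distro="${id}$(printf '%s' "$version_id" | tr -d '.')"
    case "$machine" in
        x86_64)  arch=x86_64 ;;
        aarch64) arch=sbsa ;;     # assumption: ARM server, not a Jetson board
        ppc64le) arch=ppc64el ;;
        *) echo "unsupported machine: $machine" >&2; return 1 ;;
    esac
    printf '%s/%s\n' "$distro" "$arch"
}

# Usage on the local machine: read ID and VERSION_ID from /etc/os-release.
if [ -r /etc/os-release ]; then
    ( . /etc/os-release; cuda_repo_path "$ID" "$VERSION_ID" "$(uname -m)" ) || true
fi
```

On an Ubuntu 22.04 x86_64 machine this prints `ubuntu2204/x86_64`, which can then be substituted into the `wget` URL.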
  2. Update the apt repository cache:

      sudo apt-get update

  3. Install the CUDA SDK. You can list the available CUDA packages with:

      cat /var/lib/apt/lists/*cuda*Packages | grep "Package:"

Or consult the list of meta packages below (package: purpose):

  • cuda: Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it's released.
  • cuda-12-2: Installs all CUDA Toolkit and Driver packages. Remains at version 12.2 until an additional version of CUDA is installed.
  • cuda-toolkit-12-2: Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver.
  • cuda-toolkit-12: Installs all CUDA Toolkit packages required to develop applications. Will not upgrade beyond the 12.x series toolkits. Does not include the driver.
  • cuda-toolkit: Installs all CUDA Toolkit packages required to develop applications. Handles upgrading to the next 12.x version of CUDA when it's released. Does not include the driver.
  • cuda-tools-12-2: Installs all CUDA command line and visual tools.
  • cuda-runtime-12-2: Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages.
  • cuda-compiler-12-2: Installs all CUDA compiler packages.
  • cuda-libraries-12-2: Installs all runtime CUDA Library packages.
  • cuda-libraries-dev-12-2: Installs all development CUDA Library packages.
  • cuda-drivers: Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they're released.
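The raw `grep "Package:"` listing repeats each name once per available version. As a small sketch (the helper name `list_cuda_packages` is made up for illustration), the index can be reduced to unique package names whose name starts with `cuda`:

```shell
# Print the unique package names whose "Package:" field starts with "cuda".
# Arguments: one or more apt "Packages" index files.
list_cuda_packages() {
    awk '$1 == "Package:" && $2 ~ /^cuda/ { print $2 }' "$@" | sort -u
}

# Usage against the local apt cache; prints nothing if the CUDA
# repository has not been configured yet.
for index in /var/lib/apt/lists/*cuda*Packages; do
    if [ -e "$index" ]; then
        list_cuda_packages "$index"
    fi
done
```

Piping the result through `grep -- '-11-8$'` narrows the list to the packages for a single toolkit release.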
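Choose the package you need. This tutorial installs CUDA 11.8 (the `cuda-11-8` package):

      sudo apt-get install cuda-11-8

This package includes the GPU driver. During installation you will be prompted to enter a password. Remember it: you will need it after the reboot, on the Perform MOK management screen.

  4. When the installation finishes, reboot the system:

      sudo reboot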

Completing Perform MOK management

On the MOK management screen, select `Enroll MOK` -> `Continue` -> `Enroll the key` -> `Yes`, type the password you entered in step 3, then select `Reboot`.

After the reboot, the NVIDIA GPU driver installation is complete.
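After the reboot you may want to confirm that the key enrollment worked and the driver actually loaded. The sketch below is an optional check that is not part of the original steps. It assumes the `mokutil` utility may or may not be installed and falls back to a message if it is missing; the function names are illustrative only.

```shell
# Report the Secure Boot state if mokutil is available.
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>/dev/null || echo "unknown (could not read EFI variables)"
    else
        echo "mokutil not installed"
    fi
}

# Report whether the nvidia kernel module is currently loaded.
nvidia_module_state() {
    if [ -r /proc/modules ] && grep -q '^nvidia ' /proc/modules; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

echo "Secure Boot: $(secure_boot_state)"
echo "nvidia kernel module: $(nvidia_module_state)"
```

If Secure Boot is enabled and the module is reported as not loaded, the MOK enrollment probably did not complete and may need to be repeated.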

Configuring environment variables

  1. Open ~/.bashrc in vim (no sudo is needed for a file in your own home directory):

      vim ~/.bashrc

  2. Append the following lines to the end of the file:

      export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
      export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
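`${PATH:+:${PATH}}` is a Bash parameter expansion. When a new directory is added to an environment variable, it ensures that any paths the variable (here `$PATH`) already holds are kept after the new directory, separated from it by a colon.

Specifically, `${PATH:+:${PATH}}` means:

  • If `$PATH` is set and non-empty, it expands to a colon followed by the existing value, so the result is the new directory, a colon, and then the old paths.
  • If `$PATH` is unset or empty, it expands to nothing, so only the new directory is used and no colon is added.

The purpose of this syntax is to keep the entries of `$PATH` correctly separated by colons without leaving a stray colon at the end. This is useful for many environment variables, because an empty entry is interpreted as the current directory and a missing separator would merge two paths into one invalid path.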
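The behaviour is easy to see with scratch values instead of the real `PATH`. This is only an illustration; `prepend_path` is a hypothetical helper, not something the CUDA packages provide.

```shell
# Prepend a directory to a colon-separated list the same way the
# .bashrc lines do. Arguments: new directory, existing list (may be empty).
prepend_path() {
    new=$1 old=$2
    printf '%s\n' "${new}${old:+:${old}}"
}

# Existing value is kept after a colon:
prepend_path /usr/local/cuda-11.8/bin "/usr/bin:/bin"
# Empty value: only the new directory, no stray colon:
prepend_path /usr/local/cuda-11.8/bin ""
```

The first call prints `/usr/local/cuda-11.8/bin:/usr/bin:/bin` and the second prints `/usr/local/cuda-11.8/bin`.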

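LD_LIBRARY_PATH is an environment variable that tells the dynamic linker where to search for shared libraries (shared object files) when it runs an executable. On Linux and other Unix-like systems, shared libraries are used by many programs, which lets them share the same library code, reduces memory use, and improves system efficiency.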

  3. Reload the configuration. Run the following command in the terminal so that the new environment variables take effect:

      source ~/.bashrc
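To confirm that the reload took effect, you can test whether the CUDA directories are now entries of the two variables. A minimal sketch, assuming the CUDA 11.8 paths used above; `path_contains` is a hypothetical helper name.

```shell
# Succeed if directory $2 is an entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$PATH" /usr/local/cuda-11.8/bin; then
    echo "PATH contains the CUDA bin directory"
else
    echo "PATH is missing /usr/local/cuda-11.8/bin"
fi

if path_contains "${LD_LIBRARY_PATH:-}" /usr/local/cuda-11.8/lib64; then
    echo "LD_LIBRARY_PATH contains the CUDA lib64 directory"
else
    echo "LD_LIBRARY_PATH is missing /usr/local/cuda-11.8/lib64"
fi
```

Wrapping the list in colons before matching avoids false positives from directories that merely share a prefix, such as `/usr/local/cuda-11.8/bin-old`.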

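Verifying the installation

First, install the third-party libraries that some of the CUDA samples depend on. The samples usually detect the libraries they need during the build, but if a library is not detected you have to install it manually. Open a terminal and run:

    sudo apt-get install g++ freeglut3-dev build-essential libx11-dev \
        libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev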

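Once the third-party dependencies are installed, download the source code from GitHub: https://github.com/nvidia/cuda-samples

Make sure you switch to the version that matches your installed CUDA release, here v11.8.

After downloading, build the samples with:

    cd cuda-samples
    make

If the whole build completes, the installation is working.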
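One way to fetch exactly the matching version is to clone that tag directly. The sketch below assumes the repository names its releases `v<version>` (as with `v11.8` above) and that `git` is installed; the helper names are illustrative. The clone itself is left commented out because it needs network access.

```shell
# Build the tag name for a CUDA toolkit version, e.g. 11.8 -> v11.8.
samples_tag() {
    printf 'v%s\n' "$1"
}

# Shallow-clone only the matching tag into ./cuda-samples.
clone_samples() {
    git clone --depth 1 --branch "$(samples_tag "$1")" \
        https://github.com/NVIDIA/cuda-samples.git
}

# Uncomment to clone and build:
# clone_samples 11.8 && cd cuda-samples && make
echo "would check out tag $(samples_tag 11.8)"
```

A shallow clone keeps the download small, since the full history of the samples is not needed just to build them.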

From the source directory, run:

    ./bin/x86_64/linux/release/deviceQuery

The output looks like this:

    cheungxiongwei@root:~/Source/cuda-samples$ ./bin/x86_64/linux/release/deviceQuery
    ./bin/x86_64/linux/release/deviceQuery Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
    CUDA Driver Version / Runtime Version 12.2 / 11.8
    CUDA Capability Major/Minor version number: 8.9
    Total amount of global memory: 7940 MBytes (8325824512 bytes)
    MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
    MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
    (024) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
    GPU Max Clock rate: 2250 MHz (2.25 GHz)
    Memory Clock rate: 8001 Mhz
    Memory Bus Width: 128-bit
    L2 Cache Size: 33554432 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
    Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total shared memory per multiprocessor: 102400 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 1536
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Device supports Managed Memory: Yes
    Device supports Compute Preemption: Yes
    Supports Cooperative Kernel Launch: Yes
    Supports MultiDevice Co-op Kernel Launch: Yes
    Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
    Result = PASS
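Two lines of this output matter most: the final `Result = PASS`, and the driver/runtime version pair. A driver version (12.2) that is higher than the runtime version (11.8) is expected, because the driver reports the newest CUDA release it supports and remains compatible with older runtimes. The following sketch checks both from a script; the function names are hypothetical and the sample text stands in for real `deviceQuery` output.

```shell
# Read deviceQuery output on stdin; succeed only if it reports "Result = PASS".
device_query_passed() {
    grep -q '^Result = PASS'
}

# Print "<driver> <runtime>" from the "CUDA Driver Version / Runtime Version" line.
device_query_versions() {
    awk '/CUDA Driver Version \/ Runtime Version/ { print $(NF-2), $NF }'
}

# Stand-in for real output; in practice pipe ./bin/x86_64/linux/release/deviceQuery.
sample='  CUDA Driver Version / Runtime Version          12.2 / 11.8
Result = PASS'

printf '%s\n' "$sample" | device_query_versions
if printf '%s\n' "$sample" | device_query_passed; then
    echo "deviceQuery passed"
fi
```

With the real binary, the same check becomes `./bin/x86_64/linux/release/deviceQuery | device_query_passed`.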

Installing cuDNN

Install the cuDNN library and the cuDNN samples:

    sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
    sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
    sudo apt-get install libcudnn8-samples=${cudnn_version}-1+${cuda_version}

Substitute the placeholders as follows:

  • `${cudnn_version}` is 8.9.4.*
  • `${cuda_version}` is cuda12.2 or cuda11.8

Use the following command to list the available `libcudnn8` packages and their versions:

    cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"

The output looks like this:

    cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
    Filename: ./libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb
    Filename: ./libcudnn8-dev_8.5.0.96-1+cuda11.7_amd64.deb
    Filename: ./libcudnn8_8.6.0.163-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.6.0.163-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.7.0.84-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.7.0.84-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.0.121-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.8.0.121-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8_8.8.1.3-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8-dev_8.8.1.3-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
    Filename: ./libcudnn8_8.9.0.131-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.0.131-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.0.131-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.0.131-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.1.23-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.1.23-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.2.26-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.2.26-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-dev_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8-samples_8.9.3.28-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.3.28-1+cuda12.1_amd64.deb
    Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb
    Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-dev_8.9.4.25-1+cuda12.2_amd64.deb
    Filename: ./libcudnn8-samples_8.9.4.25-1+cuda11.8_amd64.deb
    Filename: ./libcudnn8-samples_8.9.4.25-1+cuda12.2_amd64.deb
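Scanning such a listing by eye is error-prone, so the newest cuDNN build for a given CUDA release can be extracted mechanically. This is a sketch under two assumptions: the `Filename:` lines have the format shown above, and GNU `sort -V` is available (it is on Ubuntu). The function name `latest_cudnn_for` is hypothetical.

```shell
# Print the highest libcudnn8 version built for one CUDA release.
# Argument: CUDA tag such as cuda11.8. Input: "Filename:" lines on stdin.
latest_cudnn_for() {
    grep -F "./libcudnn8_" |            # runtime package only, not -dev or -samples
        grep -F "+${1}_" |              # keep builds for the requested CUDA release
        sed -E 's|.*libcudnn8_([0-9.]+)-1\+.*|\1|' |
        sort -V |                       # version-aware sort
        tail -n 1
}

# Usage against the local apt cache (prints nothing if no index is present).
cat /var/lib/apt/lists/*cuda*Packages 2>/dev/null | latest_cudnn_for cuda11.8 || true
```

For the listing above, the result for `cuda11.8` is `8.9.4.25`.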

Here we choose the latest release, cuDNN 8.9.4.25, together with CUDA 11.8. After substitution, the complete commands are:

    sudo apt-get install libcudnn8=8.9.4.25-1+cuda11.8
    sudo apt-get install libcudnn8-dev=8.9.4.25-1+cuda11.8
    sudo apt-get install libcudnn8-samples=8.9.4.25-1+cuda11.8
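Because all three packages must carry the same version string, generating the pins from one pair of values avoids typos. The helper below is only a sketch (`cudnn_pins` is a made-up name); the actual install commands stay commented out because they need root.

```shell
# Print the three pinned package arguments, one per line.
# Arguments: cuDNN version (e.g. 8.9.4.25), CUDA tag (e.g. cuda11.8).
cudnn_pins() {
    cudnn_version=$1 cuda_version=$2
    for pkg in libcudnn8 libcudnn8-dev libcudnn8-samples; do
        printf '%s=%s-1+%s\n' "$pkg" "$cudnn_version" "$cuda_version"
    done
}

cudnn_pins 8.9.4.25 cuda11.8

# To install all three in one transaction:
#   sudo apt-get install $(cudnn_pins 8.9.4.25 cuda11.8)
# Optionally keep apt from upgrading them to a build for another CUDA release:
#   sudo apt-mark hold libcudnn8 libcudnn8-dev libcudnn8-samples
```

Installing the three packages in a single `apt-get` call lets apt resolve them together, so a mismatch is reported before anything is changed.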

Verifying cuDNN

To verify that cuDNN is installed and working, build the mnistCUDNN sample in the `/usr/src/cudnn_samples_v8` directory.

  1. Copy the cuDNN samples to your home directory:

      cp -r /usr/src/cudnn_samples_v8/ $HOME

  2. Change to the sample directory:

      cd $HOME/cudnn_samples_v8/mnistCUDNN

  3. Build the mnistCUDNN sample:

      make clean && make

If the build fails because FreeImage.h cannot be found, run `sudo apt-get install libfreeimage-dev` to install the missing dependency.

  4. Run the mnistCUDNN sample:

      ./mnistCUDNN

If cuDNN is installed correctly and the sample builds and runs on your Linux system, you will see output similar to the following:

    cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
    Executing: mnistCUDNN
    cudnnGetVersion() : 8904 , CUDNN_VERSION from cudnn.h : 8904 (8.9.4)
    Host compiler version : GCC 11.4.0
    There are 1 CUDA capable devices on your machine :
    device 0 : sms 24 Capabilities 8.9, SmClock 2250.0 Mhz, MemSize (Mb) 7940, MemClock 8001.0 Mhz, Ecc=0, boardGroupID=0
    Using device 0
    Testing single precision
    Loading binary file data/conv1.bin
    Loading binary file data/conv1.bias.bin
    Loading binary file data/conv2.bin
    Loading binary file data/conv2.bias.bin
    Loading binary file data/ip1.bin
    Loading binary file data/ip1.bias.bin
    Loading binary file data/ip2.bin
    Loading binary file data/ip2.bias.bin
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.018432 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.032992 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.047104 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.051200 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049152 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.058368 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063648 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.130112 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.007328 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011264 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.024576 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.026624 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025376 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.030720 time requiring 128848 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.036864 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
    Result of classification: 1 3 5
    Test passed!
    Testing half precision (math in single precision)
    Loading binary file data/conv1.bin
    Loading binary file data/conv1.bias.bin
    Loading binary file data/conv2.bin
    Loading binary file data/conv2.bias.bin
    Loading binary file data/ip1.bin
    Loading binary file data/ip1.bias.bin
    Loading binary file data/ip2.bin
    Loading binary file data/ip2.bias.bin
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.022592 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.025600 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.033792 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.074752 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.031744 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051168 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009216 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.021312 time requiring 184784 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 4608 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.024352 time requiring 178432 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029696 time requiring 2057744 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 2450080 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 4656640 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 1536 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
    Result of classification: 1 3 5
    Test passed!
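In a log this long, two things are worth checking: the version reported by `cudnnGetVersion()`, and that both precision runs end in `Test passed!`. The sketch below shows one way to check them. It relies on the cuDNN 8.x encoding of the version integer (major*1000 + minor*100 + patch), which matches the `8904 (8.9.4)` line above; other major versions may encode it differently. The helper names are illustrative only.

```shell
# Decode a cuDNN 8.x version integer, e.g. 8904 -> 8.9.4.
decode_cudnn8_version() {
    v=$1
    printf '%d.%d.%d\n' $((v / 1000)) $((v % 1000 / 100)) $((v % 100))
}

# Read mnistCUDNN output on stdin; succeed only if both the single-precision
# and the half-precision run printed "Test passed!".
mnist_both_passed() {
    [ "$(grep -c '^Test passed!')" -eq 2 ]
}

decode_cudnn8_version 8904

# Usage with the real sample:
#   ./mnistCUDNN | mnist_both_passed && echo "cuDNN OK"
```

The `decode_cudnn8_version 8904` call prints `8.9.4`, the cuDNN version installed earlier.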

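This article is reposted from: https://blog.csdn.net/cheungxiongwei/article/details/132655076
Copyright belongs to the original author, cheungxiongwei.com. If there is any infringement, please contact us for removal.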
