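Training an AI with PPO on Google Research Football (Win10)

1. Introduction

While setting up the environment I ran into a series of problems, the most fundamental of which was GPU compatibility. Google Research Football recommends TensorFlow 1.15, whose official build is compiled against CUDA 10.0. On my RTX 3060, the training code failed with "failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED", even though a check showed that little GPU memory was in use. The root cause is that 30-series cards work poorly with CUDA 10.0 (the Ampere architecture needs CUDA 11 or later). This article therefore covers two setups, one tested on a 1650 Ti and one on an RTX 3060. Owners of 10-series and 20-series cards can follow the 1650 Ti instructions; owners of 30-series and 40-series cards can follow the 3060 instructions.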

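2. Installing the reinforcement learning environment on a 1650 Ti

2.1 Dependencies

CUDA 10.0
cuDNN 7.6.4
Python 3.7
TensorFlow 1.15

2.2 Installation steps

Install dm-sonnet and TensorFlow with the commands below. It is best to install a 2.* release of dm-sonnet; otherwise it conflicts with many of football's dependencies.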

python -m pip install dm-sonnet==2.* psutil -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install tensorflow-gpu==1.15 -i https://pypi.tuna.tsinghua.edu.cn/simple
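
Once TensorFlow is installed, it can be worth confirming that it actually sees the GPU before going any further. The snippet below is a minimal sketch and not part of the original setup: gpu_visible is a hypothetical helper name, and it relies on tf.test.is_gpu_available(), which exists in TensorFlow 1.15 and is deprecated but still present in 2.6.

```python
def gpu_visible():
    """Return True/False when TensorFlow is importable, or None when it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Available in TF 1.15; deprecated but still callable in TF 2.6.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    print("GPU visible to TensorFlow:", gpu_visible())
```

If this prints False on a machine that has a GPU, the CUDA and cuDNN versions are the first thing to re-check.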

Install OpenAI Baselines with the following commands:

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple

Run the training code with the command below. If everything works, the terminal shows output like that in the figure that follows:

python -m gfootball.examples.run_ppo2 --level=academy_empty_goal_close

[Figure: terminal output of a successful training run]

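3. Installing the reinforcement learning environment on a 3060

On a 3060, installing CUDA 11.7 and cuDNN 8.4 gives better compatibility.

3.1 Dependencies

CUDA 11.7
cuDNN 8.4
Python 3.7
TensorFlow 2.6

3.2 Installation steps

Install dm-sonnet and TensorFlow with the commands below. It is best to install a 2.* release of dm-sonnet; otherwise it conflicts with many of football's dependencies.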

python -m pip install dm-sonnet==2.* psutil -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m pip install tensorflow-gpu==2.6 -i https://pypi.tuna.tsinghua.edu.cn/simple

Install OpenAI Baselines with the following commands:

git clone https://github.com/openai/baselines.git
cd baselines
pip install -e . -i https://pypi.tuna.tsinghua.edu.cn/simple

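The first two steps are much the same as on the 1650 Ti, but on a 3060 the real installation work only starts here. If the process below seems too involved, you can download my already-modified baselines code instead. The first error you may hit is "ImportError: cannot import name 'dtensor' from 'tensorflow.compat.v2.experimental'".
The main cause is a mismatch between the TensorFlow and Keras versions: the installed Keras is too new, and it should be the same version as TensorFlow.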

pip install keras==2.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

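With that fixed, the next error is "AttributeError: module 'tensorflow' has no attribute 'set_random_seed'".
This error, and every later error about a missing attribute of this kind, occurs because these attributes belong to the TensorFlow 1.x API and were removed from the top-level namespace in TensorFlow 2.x, so the code has to run in compatibility mode. Open the file that raised the error, find the statement "import tensorflow as tf", and replace it with "import tensorflow.compat.v1 as tf". Many files have this problem, and all similar errors can be fixed the same way.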
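
Because the same import has to be changed in many files, the edit can also be scripted. The following is only a sketch of that idea, not the procedure used above: patch_source and patch_tree are hypothetical helper names, the pattern matches nothing but the exact statement "import tensorflow as tf" on a line of its own, and it rewrites files in place, so keep a backup or a clean git checkout.

```python
import re
from pathlib import Path

# A whole line that reads "import tensorflow as tf", with optional indentation.
TF1_IMPORT = re.compile(r"^([ \t]*)import tensorflow as tf[ \t]*$", re.MULTILINE)


def patch_source(text):
    """Return text with the plain TF import switched to the compat.v1 module."""
    return TF1_IMPORT.sub(r"\1import tensorflow.compat.v1 as tf", text)


def patch_tree(root):
    """Rewrite every .py file under root in place and return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.py"):
        original = path.read_text(encoding="utf-8")
        patched = patch_source(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling patch_tree on the cloned baselines directory would then list the files it touched; files that import TensorFlow in some other form still need a manual edit.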
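Next comes another class of error: "AttributeError: 'int' object has no attribute 'value'".

This error is also caused by the TensorFlow version: in 2.x the dimensions of a shape are plain integers rather than objects with a ".value" attribute. The fix is simple: delete the ".value" characters from the offending statement. All errors of this kind can be fixed this way.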
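
If the same code should keep working under both TensorFlow 1.x and 2.x, an alternative to deleting ".value" is a tiny accessor. This is a sketch of one possible approach, and dim_value is a hypothetical helper name that does not exist in baselines or TensorFlow.

```python
def dim_value(dim):
    """Return a shape dimension as a plain value under TF 1.x and TF 2.x.

    TF 1.x dimensions expose the number as .value; TF 2.x dimensions are
    already plain ints (or None), so they are returned unchanged.
    """
    return getattr(dim, "value", dim)


class FakeDimension:
    """Stand-in for a TF 1.x Dimension object, used only for this demo."""

    def __init__(self, value):
        self.value = value


if __name__ == "__main__":
    print(dim_value(FakeDimension(64)), dim_value(64))
```

A statement such as x.get_shape()[1].value would then become dim_value(x.get_shape()[1]).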
Once all of these problems are fixed, training runs successfully.

4. Playing matches with the trained weights

Google Research Football lets two models play against each other, using the following command:

python -m gfootball.play_game --players "ppo2_cnn:left_players=1,checkpoint=weights/01600;ppo2_cnn:right_players=1,checkpoint=weights/01900"
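
The --players value is easy to mistype when comparing several checkpoints, so it can help to build it programmatically. The helpers below are a hypothetical sketch (player_spec and players_flag are not part of gfootball); they only reproduce the string format used in the command above.

```python
def player_spec(policy, side, checkpoint, count=1):
    """Format one player entry, e.g. 'ppo2_cnn:left_players=1,checkpoint=...'."""
    return f"{policy}:{side}_players={count},checkpoint={checkpoint}"


def players_flag(left_checkpoint, right_checkpoint, policy="ppo2_cnn"):
    """Join the left and right player entries with ';' as in the command above."""
    return ";".join([
        player_spec(policy, "left", left_checkpoint),
        player_spec(policy, "right", right_checkpoint),
    ])


if __name__ == "__main__":
    print(players_flag("weights/01600", "weights/01900"))
```

The printed string can be pasted, in quotes, after --players.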

5. Common errors

Running the training code may report: your generated code is out of date and must be regenerated with protoc >= 3.19.0
Fix: this error is caused by a protobuf version that is too new; downgrade to a 3.20.x release or lower:

pip install protobuf==3.20.* -i https://pypi.tuna.tsinghua.edu.cn/simple
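
To see whether the installed protobuf is affected before running a long training job, the version can be checked from Python. This is a minimal sketch under the assumption stated above (anything newer than 3.20.x is too new); protobuf_needs_downgrade is a hypothetical helper name.

```python
def protobuf_needs_downgrade(version):
    """Return True when a protobuf version string is newer than the 3.20 series."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


if __name__ == "__main__":
    try:
        import google.protobuf
        installed = google.protobuf.__version__
        print(installed, "needs downgrade:", protobuf_needs_downgrade(installed))
    except ImportError:
        print("protobuf is not installed")
```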

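Running the training code may also report: TypeError: can't pickle FlagValues.
Fix: the cause of this error is not yet clear; it may be a version incompatibility among the dependencies. Fortunately it can be worked around by modifying the code. In the football Anaconda environment, locate run_ppo2.py at football\lib\site-packages\gfootball\examples\run_ppo2.py and replace its contents with the following code: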

"""Runs football_env on OpenAI's ppo2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import multiprocessing
import os
from absl import app
from absl import flags
from baselines import logger
from baselines.bench import monitor
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.ppo2 import ppo2
import gfootball.env as football_env
from gfootball.examples import models  # Unused here; importing it registers the gfootball policy networks.

FLAGS = flags.FLAGS

flags.DEFINE_string('level', 'academy_empty_goal_close',
                    'Defines type of problem being solved')
flags.DEFINE_enum('state', 'extracted_stacked', ['extracted',
                                                 'extracted_stacked'],
                  'Observation to be used for training.')
flags.DEFINE_enum('reward_experiment', 'scoring',
                  ['scoring', 'scoring,checkpoints'],
                  'Reward to be used for training.')
flags.DEFINE_enum('policy', 'cnn', ['cnn', 'lstm', 'mlp', 'impala_cnn',
                                    'gfootball_impala_cnn'],
                  'Policy architecture')
flags.DEFINE_integer('num_timesteps', int(2e6),
                     'Number of timesteps to run for.')
flags.DEFINE_integer('num_envs', 1,
                     'Number of environments to run in parallel.')
flags.DEFINE_integer('nsteps', 128, 'Number of environment steps per epoch; '
                     'batch size is nsteps * nenv')
flags.DEFINE_integer('noptepochs', 4, 'Number of updates per epoch.')
flags.DEFINE_integer('nminibatches', 8,
                     'Number of minibatches to split one epoch to.')
flags.DEFINE_integer('save_interval', 100,
                     'How frequently checkpoints are saved.')
flags.DEFINE_integer('seed', 0, 'Random seed.')
flags.DEFINE_float('lr', 0.00008, 'Learning rate')
flags.DEFINE_float('ent_coef', 0.01, 'Entropy coefficient')
flags.DEFINE_float('gamma', 0.993, 'Discount factor')
flags.DEFINE_float('cliprange', 0.27, 'Clip range')
flags.DEFINE_float('max_grad_norm', 0.5, 'Max gradient norm (clipping)')
flags.DEFINE_bool('render', False, 'If True, environment rendering is enabled.')
flags.DEFINE_bool('dump_full_episodes', False,
                  'If True, trace is dumped after every episode.')
flags.DEFINE_bool('dump_scores', False,
                  'If True, sampled traces after scoring are dumped.')
flags.DEFINE_string('load_path', None, 'Path to load initial checkpoint from.')

def create_single_football_env(iprocess, level, state, reward_experiment,
                               render, dump_full_episodes, dump_scores):
  """Creates one gfootball environment; only process 0 renders and dumps."""
  env = football_env.create_environment(
      env_name=level,
      stacked=('stacked' in state),
      rewards=reward_experiment,
      logdir=logger.get_dir(),
      write_goal_dumps=dump_scores and (iprocess == 0),
      write_full_episode_dumps=dump_full_episodes and (iprocess == 0),
      render=render and (iprocess == 0),
      dump_frequency=50 if render and iprocess == 0 else 0)
  # Wrap the environment in a Monitor that logs to a per-process directory.
  env = monitor.Monitor(
      env, logger.get_dir() and os.path.join(logger.get_dir(), str(iprocess)))
  return env

def train(level, state, reward_experiment, policy, num_timesteps, num_envs,
          nsteps, noptepochs, nminibatches, save_interval, seed, lr, ent_coef,
          gamma, cliprange, max_grad_norm, render, dump_full_episodes,
          dump_scores, load_path):
  """Trains a PPO2 agent; every setting arrives as a plain argument."""
  # One environment per subprocess; _i=i binds the index at definition time.
  vec_env = SubprocVecEnv([
      (lambda _i=i: create_single_football_env(
          _i, level, state, reward_experiment, render, dump_full_episodes,
          dump_scores))
      for i in range(num_envs)
  ], context=None)
  # Import tensorflow after we create environments. TF is not fork safe, and
  # we could be using TF as part of environment if one of the players is
  # controlled by an already trained model.
  import tensorflow.compat.v1 as tf
  ncpu = multiprocessing.cpu_count()
  config = tf.ConfigProto(allow_soft_placement=True,
                          intra_op_parallelism_threads=ncpu,
                          inter_op_parallelism_threads=ncpu)
  config.gpu_options.allow_growth = True
  tf.Session(config=config).__enter__()

  ppo2.learn(network=policy,
            total_timesteps=num_timesteps,
            env=vec_env,
            seed=seed,
            nsteps=nsteps,
            nminibatches=nminibatches,
            noptepochs=noptepochs,
            max_grad_norm=max_grad_norm,
            gamma=gamma,
            ent_coef=ent_coef,
            lr=lr,
            log_interval=1,
            save_interval=save_interval,
            cliprange=cliprange,
            load_path=load_path)
  
if __name__ == '__main__':
  # app.run(train)
  app.run(lambda _: train(
    FLAGS.level,
    FLAGS.state,
    FLAGS.reward_experiment,
    FLAGS.policy,
    FLAGS.num_timesteps,
    FLAGS.num_envs,
    FLAGS.nsteps,
    FLAGS.noptepochs,
    FLAGS.nminibatches,
    FLAGS.save_interval,
    FLAGS.seed,
    FLAGS.lr,
    FLAGS.ent_coef,
    FLAGS.gamma,
    FLAGS.cliprange,
    FLAGS.max_grad_norm,
    FLAGS.render,
    FLAGS.dump_full_episodes,
    FLAGS.dump_scores,
    FLAGS.load_path
    ))
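
The replacement script hands every flag value to train and create_single_football_env as a plain argument instead of reading the FLAGS object inside them. Whether that is exactly what cures the error is not confirmed here, but one plausible reading is that plain values can be pickled and sent to a subprocess while a flags container cannot. The sketch below illustrates only that general point with the standard library; FakeFlags is a hypothetical stand-in and not absl's FlagValues.

```python
import pickle
import threading


class FakeFlags:
    """Hypothetical flags container that also holds unpicklable internal state."""

    def __init__(self):
        self.level = "academy_empty_goal_close"
        self.num_envs = 1
        self._lock = threading.Lock()  # Locks cannot be pickled.


def can_pickle(obj):
    """Return True when obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    fake = FakeFlags()
    print("container:", can_pickle(fake))
    print("plain values:", can_pickle((fake.level, fake.num_envs)))
```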

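On Windows, enabling the render option during training may raise "BrokenPipeError: [WinError 109] The pipe has been ended" or "OSError: [WinError 6] The handle is invalid", so enabling rendering during training is not recommended.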
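
One way to follow this advice automatically is to switch rendering off whenever the script runs on Windows. This is a sketch of the idea only; effective_render is a hypothetical helper and is not part of run_ppo2.py.

```python
import platform


def effective_render(requested, system=None):
    """Return the render setting to use, forcing it off on Windows.

    system defaults to platform.system(); passing it explicitly makes the
    function easy to check on any machine.
    """
    if system is None:
        system = platform.system()
    return bool(requested) and system != "Windows"


if __name__ == "__main__":
    print("render during training:", effective_render(True))
```

In the training script, the value passed as render could be wrapped as effective_render(FLAGS.render).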


Reposted from: https://blog.csdn.net/keyanjun_AI/article/details/138586710
Copyright belongs to the original author, keyanjun_AI.
