NVIDIA Docker

NVIDIA Docker 是一个github上的开源项目,主要提供了一下两个组件,nvidia-docker基本上是docker命令的一个包装,透明地为容器提供必要的组件以在GPU上执行代码。

1.driver-agnostic CUDA images;
2.a Docker command line wrapper that mounts the user mode components of the driver and the GPUs (character devices) into the container at launch.

一、安装Docker和NVIDIA Docker

创建容器化GPU APP之前需要先安装下面的软件

  • 最新的NVIDIA驱动
  • Docker
  • ‘nvidia-docker’

驱动和Docker之前都安装过,参见install Tensorflowinstall and learn docker,现在安装nvidia-docker。

1.安装nvidia-docker

这里只看ubuntu16.04的效果,其他版本请参考Github上的安装步骤

安装nivdia-docker之前也需要安装nvidia驱动和Docker

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

按照以上步骤进行安装没有遇到问题。

运行以下代码,检测一下

nvidia-docker run --rm hello-world

二、搭建容器化GPU程序

1.开发环境

从docker hub上拉去cuba镜像

nvidia-docker pull nvidia/cuda

查看版本

nvidia-docker run --rm -ti nvidia/cuda nvcc --version

nvcc

可以使用以下命令查看镜像,exit退出

nvidia-docker run --rm -ti nvidia/cuda bash

2.运行一个容器化CUDA app

Dockerfile 内容如下:

# FROM defines the base image
FROM nvidia/cuda

# RUN executes a shell command
# You can chain multiple commands together with && 
# A \ is used to split long lines to help with readability
# This particular instruction installs the source files 
# for deviceQuery by installing the CUDA samples via apt
RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-samples-$CUDA_PKG_VERSION && \
    rm -rf /var/lib/apt/lists/*

# set the working directory 
WORKDIR /usr/local/cuda/samples/1_Utilities/deviceQuery

RUN make

# CMD defines the default command to be run in the container 
# CMD is overridden by supplying a command + arguments to 
# `docker run`, e.g. `nvcc --version` or `bash`
CMD ./deviceQuery

编辑完Dockefile之后运行以下命令

nvidia-docker build -t device-query .

build完之后运行

nvidia-docker run --rm -ti device-query

默认的docker会映射所有的NVIDIA GPU到容器,也可以控制使用哪一个容器

NV_GPU=1 nvidia-docker run --rm -ti device-query

device-query

3.部署一个GPU容器

打标记

nvidia-docker tag device-query dockerkotar/device-query

上传到Docker Hub(要先登陆,参考share image)

nvidia-docker push dockerkotar/device-query

上传完之后运行以下命令可以得到和之前相同的结果

nvidia-docker run --rm -ti dockerkotar/device-query

4.Docker for Deep Learning and HPC

nvidia-docker run --name digits --rm -ti -p 8000:34448 nvidia/digits

这个命令运行 nVidia/digits,并将容器的34448端口映射到8000端口
部署过程出现错误

libdc1394 error: Failed to initialize libdc1394

解决方法,在运行时添加-v /dev/null:/dev/raw1394

nvidia-docker run --name digits --rm -ti -p 8000:34448 -v /dev/null:/dev/raw1394 nvidia/digits

解决了这个错误,但是还有这个警告,而且不能正确运行。

warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
2018-06-14 08:43:17 [INFO ] Loaded 0 jobs.

待解决: