使用bazel安装tensorflow

踩坑大型现场

Posted by xhhszc on January 8, 2019

使用bazel安装tensorflow


为了提高CPU运行速度,使用SSE/AVX/FMA指令集,需要从source安装tensorflow,其中最简便的就是利用bazel安装tensorflow,但是安装过程也是充满了血泪。。。

1. 安装bazel

conda install bazel

2. 下载tensorflow代码包

在https://github.com/tensorflow/tensorflow中下载tf包 并上传到服务器中并解压 当然,服务器网速好的话也可以使用指令 $git clone https://github.com/tensorflow/tensorflow

3. 进入tensorflow-master文件夹

cd tensorflow #cd to the top-level directory created

4. 设置configure(请打起精神!)

这里比较重要,虽然都有提示,但还是很容易踩坑

./configure

configure 的时候要选择一些东西是否支持,这里建议都选N,不然后面会报错,如果支持显卡,就在cuda的时候选择y,然后按照提示填写自己的cuda cudnn的版本

整个flow如下(#字为我的注释):

(python2bazel) hthong@node150:~/software/tensorflow-master$ ./configure
WARNING: detected http_proxy set in env, setting no_proxy for localhost.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
INFO: Invocation ID: a54e5cb9-a34d-4848-ae0c-b17d9973d404
You have bazel 0.20.0- (@non-git) installed.
Please specify the location of python. [Default is /home/hthong/anaconda3/envs/python2bazel/bin/python]: 


Found possible Python library paths:
  /home/hthong/anaconda3/envs/python2bazel/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/home/hthong/anaconda3/envs/python2bazel/lib/python2.7/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y  #这里可以设置为Y,可以提高tensorflow的运行效率
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 


Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]: 


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. #去这个官网查自己GPU型号对应的计算能力系数,一般默认值都与官网一致
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1,6.1,6.1,6.1,6.1,6.1]: 


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /home/hthong/anaconda3/envs/python2bazel/bin/gcc]: /usr/bin/gcc
#gcc要求是多版本的(64、32)的gcc,一般使用系统目录下的, 
#对于gcc 5或更高版本的说明:TensorFlow 网站上提供的二进制pip软件包是使用gcc4编译的,高版本gcc涉及一些setting,请自己翻官网


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws          # Disable AWS S3 filesystem support.
    --config=nogcp          # Disable GCP support.
    --config=nohdfs         # Disable HDFS support.
    --config=noignite       # Disable Apacha Ignite support.
    --config=nokafka        # Disable Apache Kafka support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished

5.编译

编译之前,需安装:numpy,enum,keras,mock 根据是否使用GPU选择以下命令执行:

bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package # CUP-only 

bazel build --config=opt --config=cuda --verbose_failures //tensorflow/tools/pip_package:build_pip_package # GPU support

这里我遇到了一个坑:

ERROR: missing input file '@pasta//:LICENSE'
ERROR: /home/shiki/tensorflow/tensorflow/tools/pip_package/BUILD:235:1: //tensorflow/tools/pip_package:build_pip_package: missing input file '@pasta//:LICENSE'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/shiki/tensorflow/tensorflow/tools/pip_package/BUILD:235:1 1 input file(s) do not exist

根据网友的智慧:https://github.com/tensorflow/tensorflow/issues/24722 删掉了(注释掉也行)tensorflow/tools/pip_package/BUILD文件中的第L172行代码(”@pasta//:LICENSE”),完美解决

然后又遇到了一个坑:

    from tensorflow.python.ops import variables
  File "/home/hthong/.cache/bazel/_bazel_hthong/36e03d3843aed90a32db1b13ecd82b39/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/keras/api/create_tensorflow.python_api_2_keras_python_api_gen_compat_v2.runfiles/org_tensorflow/tensorflow/python/ops/variables.py", line 118, in <module>
    "* `ONLY_FIRST_TOWER`: Deprecated alias for `ONLY_FIRST_REPLICA`.\n  ")
AttributeError: attribute '__doc__' of 'type' objects is not writable
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3.041s, Critical Path: 0.72s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

还是网友智慧:https://github.com/tensorflow/tensorflow/issues/12491

卸载enum,安装enum34

6.生成whl

过了很久之后,第五步完成,生成了一个uild_pip_package脚本。

成功示例:

Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 270.072s, Critical Path: 269.03s
INFO: 60 processes: 60 local.
INFO: Build completed successfully, 61 total actions
(python2bazel) hthong@node150:~/software/tensorflow-master$

然后我们就可以根据这个脚本生成whl文件了

The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the /home/hthong/software/tensorflow_pkg directory

bazel-bin/tensorflow/tools/pip_package/build_pip_package /home/hthong/software/tensorflow_pkg

显示如下:

(python2bazel) hthong@node150:~/software/tensorflow-master$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /home/hthong/software/tensorflow_pkg
Mon Jan 7 17:52:44 UTC 2019 : === Preparing sources in dir: /tmp/tmp.GFfdUdpEig
~/software/tensorflow-master ~/software/tensorflow-master
~/software/tensorflow-master
Mon Jan 7 17:53:00 UTC 2019 : === Building wheel
warning: no files found matching '*.pyd' under directory '*'
warning: no files found matching '*.pd' under directory '*'
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
Mon Jan 7 17:53:32 UTC 2019 : === Output wheel file is in: /home/hthong/software/tensorflow_pkg
(python2bazel) hthong@node150:~/software/tensorflow-master$ ls
ACKNOWLEDGMENTS     CONTRIBUTING.md    SECURITY.md         bazel-tensorflow-master  third_party
ADOPTERS.md         ISSUES.md          WORKSPACE           bazel-testlogs           tools
AUTHORS             ISSUE_TEMPLATE.md  arm_compiler.BUILD  configure
BUILD               LICENSE            bazel-bin           configure.py
CODEOWNERS          README.md          bazel-genfiles      models.BUILD
CODE_OF_CONDUCT.md  RELEASE.md         bazel-out           tensorflow
(python2bazel) hthong@node150:~/software/tensorflow-master$ cd ..
(python2bazel) hthong@node150:~/software$ ls
Anaconda3-2018.12-Linux-x86_64.sh  srun.sh  tensorflow-master  tensorflow-master.zip  tensorflow_pkg
(python2bazel) hthong@node150:~/software$ cd tensorflow_pkg
(python2bazel) hthong@node150:~/software/tensorflow_pkg$ ls
tensorflow-1.12.0-cp27-cp27mu-linux_x86_64.whl
(python2bazel) hthong@node150:~/software/tensorflow_pkg$

7.安装

终于可以安装了:

pip install /home/hthong/software/tensorflow_pkg/your_tensorflow_whlname.whl

提示成功之后就可以验证tensorflow是否安装成功了,记住退出当前tensorflow文件夹后再进行验证,不然会fail

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

你可以看到,没有warning了!!!tensorflow速度是不是变快了!!