This document compares and summarizes several deep learning frameworks: Caffe, Chainer, CNTK, DL4J, Keras, MXNet, TensorFlow, Theano, and Torch. It describes who created each framework, when it was released, where it is used, what motivated its design, and its key features from technical, design, and programming perspectives.
5. Caffe
• Owner
– Created by
• Yangqing Jia (http://daggerfs.com/)
– Ph.D. in computer science, UC Berkeley (advisor: Trevor Darrell, head of BAIR)
– Worked on the TensorFlow project at Google Brain
– Research scientist at Facebook
• Evan Shelhamer (http://imaginarynumber.net/)
– Ph.D. in computer science, UC Berkeley (advisor: Trevor Darrell, head of BAIR)
– Maintained by
• BAIR (Berkeley Artificial Intelligence Research, http://bair.berkeley.edu/)
• Release
– 2013: DeCAF (https://arxiv.org/abs/1310.1531)
– Dec. 2013: Caffe v0
• Use cases
– Facebook, Adobe, Microsoft, Samsung, Flickr, Tesla, Yelp, Pinterest, etc.
• Motivation
– Reproduce AlexNet, presented at ILSVRC 2012
– Build a general-purpose framework for defining, training, and deploying DNNs
http://caffe.berkeleyvision.org/
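6. Caffe
• Features
– Pros
• Specialized for image processing
• Training is defined in configuration files rather than in code
• Many pre-trained models are available through the Caffe Model Zoo
• De facto standard for image-based reference models
– Cons
• Poorly suited to non-image data such as text and audio
• Inflexible API
– New functionality has to be implemented directly in C++/CUDA
• Poorly documented
Caffe2 (http://caffe2.ai/) released
- By Facebook
- Android supported; iOS support planned
- Distributed processing supported
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Caffe | BAIR | Linux, Mac | - | C++ | Python, MATLAB | Y | Y | - | Y |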
7. Chainer
• Owner
– Created & maintained by
• Preferred Networks, Inc. (https://www.preferred-networks.jp/ja/)
• Release
– Jun. 2015
• Use cases
– Toyota Motors, Panasonic (https://www.wsj.com/articles/japan-seeks-tech-revival-with-artificial-intelligence-1448911981)
– FANUC (http://www.fanucamerica.com/FanucAmerica-news/Press-releases/PressReleaseDetails.aspx?id=79)
• Motivation
– Define-by-Run architecture
• The network graph is defined at run time
• Complex network definitions can be expressed more flexibly
http://docs.chainer.org/en/latest/index.html
8. Chainer
• Features
– Pros
• Fast
Source: Performance of Distributed Deep Learning using ChainerMN
http://chainer.org/general/2017/02/08/Performance-of-Distributed-Deep-Learning-Using-ChainerMN.html
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Chainer | Preferred Networks | Linux | - | Python | Python | - | Y | - | Y | Y
9. Chainer
• Features
– Pros
• Flexibility from the Define-by-Run model
– Cons
• Small user community
Source: Complex neural networks made easy by Chainer
https://www.oreilly.com/learning/complex-neural-networks-made-easy-by-chainer
[Define-and-Run (TensorFlow)] [Define-by-Run (Chainer, PyTorch)]
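The contrast between the two models can be sketched in plain Python. This is a toy illustration, not Chainer or TensorFlow code: the classes `Placeholder`, `BinaryOp`, and `Var` are hypothetical names invented for this example.

```python
# Define-and-Run: build a symbolic graph first, then feed data into it.
class Placeholder:
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class BinaryOp:
    def __init__(self, kind, left, right):
        self.kind, self.left, self.right = kind, left, right

    def evaluate(self, feed):
        a, b = self.left.evaluate(feed), self.right.evaluate(feed)
        return a + b if self.kind == "add" else a * b


x, y = Placeholder("x"), Placeholder("y")
static_graph = BinaryOp("add", BinaryOp("mul", x, x), y)  # nothing computed yet
static_result = static_graph.evaluate({"x": 3.0, "y": 1.0})


# Define-by-Run: each operation computes its value immediately and
# records how it was produced, so the graph is a by-product of running.
class Var:
    def __init__(self, value, op="input", parents=()):
        self.value, self.op, self.parents = value, op, parents

    def __add__(self, other):
        return Var(self.value + other.value, "add", (self, other))

    def __mul__(self, other):
        return Var(self.value * other.value, "mul", (self, other))


a, b = Var(3.0), Var(1.0)
dynamic_result = a * a + b  # value is available right away

print(static_result, dynamic_result.value)
```

In the first half the graph exists before any data does; in the second half the graph only exists because the code ran, which is what makes ordinary Python control flow usable inside a network definition.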
10. CNTK
• Owner
– Created & maintained by
• Microsoft Research
• Release
– Jan. 2016
• Use cases
– Microsoft’s speech recognition engine
– Skype Translator
• Motivation
– Efficient performance in distributed environments
https://www.microsoft.com/en-us/research/product/cognitive-toolkit/
https://www.microsoft.com/en-us/research/blog/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/
11. CNTK
• Features
– Pros
• Processing performance scales linearly in distributed training
– Cons
• Small user community
Source: Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance
https://www.microsoft.com/en-us/research/blog/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/
[2015. 7]
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
CNTK | Microsoft | Linux, Windows | - | C++ | Python, C++ | Y | Y | - | Y | Y
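One way to make "linear scaling" concrete is to compute scaling efficiency from measured throughput. The helper below is a minimal sketch; the function name and the throughput numbers are made up for illustration and are not taken from the cited benchmark.

```python
def scaling_efficiency(throughput_1, throughput_n, n_devices):
    """Return speedup / n_devices; 1.0 means perfectly linear scaling."""
    speedup = throughput_n / throughput_1
    return speedup / n_devices


# Hypothetical measurements in samples per second.
perfect = scaling_efficiency(100.0, 400.0, 4)  # 4 devices, 4x throughput
partial = scaling_efficiency(100.0, 300.0, 4)  # 4 devices, 3x throughput
print(perfect, partial)
```

An efficiency close to 1.0 across device counts is what a linear-scaling claim amounts to; lower values indicate communication or synchronization overhead.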
12. DL4J
• Owner
– Created by
• Adam Gibson @Skymind (CTO)
• Chris Nicholson @Skymind (CEO)
– Maintained by
• Skymind (https://skymind.ai/)
• Release
– Jun. 2014
• Use cases
– Research partnership on bank fraud detection with Nextremer in Japan (https://skymind.ai/press/nextremer)
• Motivation
– Build a deep learning framework on Java, the language with the largest programmer base
– Guarantee enterprise-grade stability for the inference engine
https://deeplearning4j.org/
13. DL4J
• Features
– Pros
• Java-based, so easy to port, with enterprise-grade stability
• Distributed processing on Spark
• Well documented; provides DL4J UI, a visualization tool for debugging training
• Technical consulting offered to enterprises
– Cons
• Training and testing are cumbersome in Java
• Small user community
• Few examples
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
DL4J | Skymind | Cross-platform (JVM) | Android | Java | Java, Scala, Python | Y | Y | - | Y | Y (Spark)
14. Keras
• Owner
– Created & maintained by
• François Chollet @Google
• Release
– Mar. 2015
• Use cases
– TensorFlow (http://www.fast.ai/2017/01/03/keras)
• Motivation
– Provide a high-level interface on top of deep learning frameworks such as Theano and TensorFlow
– Easy to use
– Minimal, simple, modular
– Easy integration with various deep learning frameworks
https://keras.io/
15. Keras
• Features
– Pros
• Intuitive API
• Can import models from other deep learning frameworks such as Caffe, Torch, and TensorFlow
• Well documented
– Cons
• Hard to debug when a problem occurs in the underlying Theano framework
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Keras | François Chollet | Linux, Mac, Windows | - | Python | Python | Y (Theano), N (TF) | Y | - | Y |
16. MXNet
• Owner
– Created by
• CMU (http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf)
– Maintained by
• DMLC (Distributed Machine Learning Community)
– CMU, NYU, NVIDIA, Baidu, Amazon, etc.
• Release
– Oct. 2015
• Use cases
– AWS (https://www.infoq.com/news/2016/11/amazon-mxnet-deep-learning)
• Motivation
– Support for a mixed programming model: imperative & symbolic
– Support for portability: desktops, clusters, mobile devices, etc.
– Support for multiple languages: C++, R, Python, MATLAB, JavaScript, etc.
http://mxnet.io/
17. MXNet
• Features
– Pros
• Programming interfaces for many languages
• Mobile support
• Evolving rapidly
• Both low-level and high-level APIs
• Both imperative and graph programming models
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
MXNet | DMLC | Linux, Mac, Windows, JavaScript | Android, iOS | C++ | C++, Python, Julia, MATLAB, JavaScript, Go, R, Scala, Perl | Y | Y | - | Y | Y
18. MXNet
– Cons
• Somewhat slow in the benchmarks cited below
Source: How to run deep neural networks on weak hardware
https://www.linkedin.com/pulse/how-run-deep-neural-networks-weak-hardware-dmytro-prylipko
Source: Benchmarking State-of-the-Art Deep Learning Software Tools
http://dlbench.comp.hkbu.edu.hk/?v=v7
19. TensorFlow
• Owner
– Created & maintained by
• Google Brain
• Release
– Nov. 2015
• Use cases
– Google
• Search signals (https://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines)
• Email auto-responder (https://research.googleblog.com/2015/11/computer-respond-to-this-email.html)
• Photo search (https://techcrunch.com/2015/11/09/google-open-sources-the-machine-learning-tech-behind-google-photos-search-smart-reply-and-more/#.t38yrr8:fUIZ)
• Motivation
– It’s Google
https://www.tensorflow.org/
20. TensorFlow
• Features
– Pros
• Abstracted graph model
• Provides TensorBoard, a visualization tool for debugging training
• Mobile support
• Both low-level and high-level APIs
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
TensorFlow | Google | Linux, Mac, Windows | Android, iOS | C++, Python | Python, C/C++, Java, Go | N | Y | - | Y | Y
21. TensorFlow
– Pros
• Very large user community
Source: Machine Learning Frameworks Comparison
https://blog.paperspace.com/which-ml-framework-should-i-use
22. TensorFlow
– Cons
• Define-and-Run model: the graph cannot be changed at run time
• Slower than Torch
Source: soumith/convnet-benchmarks
https://github.com/soumith/convnet-benchmarks
23. Theano
• Owner
– Created by
• James Bergstra, Frederic Bastien, etc. (http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf)
– Maintained by
• LISA lab @ Université de Montréal
• Release
– Nov. 2010
• Use cases
– Keras
– Lasagne
– Blocks
• Motivation
– There’s any.
http://deeplearning.net/software/theano/index.html
24. Theano
• Features
– Pros
• API that allows low-level control
• Supports an abstracted graph model
• Fast and flexible
• Base framework for wrapper frameworks such as Keras, Lasagne, and Blocks
– Cons
• Complexity of the low-level API
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Theano | Université de Montréal | Linux, Mac, Windows | - | Python | Python | Y | Y | - | Y |
25. Torch
• Owner
– Created & maintained by
• Ronan Collobert: Research Scientist @ Facebook
• Clément Farabet: Senior Software Engineer @ Twitter
• Koray Kavukcuoglu: Research Scientist @ Google DeepMind
• Soumith Chintala: Research Engineer @ Facebook
• Release
– Jul. 2014
• Use cases
– Facebook, Google, Twitter, Element Inc., etc.
• Motivation
– Unlike Caffe, aimed at research rather than the mass market
– Unlike Theano, easy to use because it is based on an imperative model rather than a symbolic one
http://torch.ch/
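26. Torch
• Features
– Pros
• Algorithms are well modularized, so they are easy to use
• Many utilities for data preprocessing and visualization
• Simple Lua syntax
• Intuitive API based on the imperative programming model
• OpenCL support
• Mobile support
– Cons
• No Python interface (PyTorch exists separately)
• Poorly documented
• Small user community
• No symbolic model
• Better suited to research than to commercial applications
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Torch | Ronan, Clément, Koray, Soumith | Linux, Mac, Windows | Android, iOS | C, Lua | Lua | Y | Y | Y | Y | Not officially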
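28. Open-source deep learning frameworks
• Comparison criteria
– Key characteristics
• Installation platform
• Mobile support
• Implementation language
• Programming interface
• OpenMP support
• CUDA / OpenCL support
• Multi-node support
• Programming model
– Tech. Stack
• Visualization
• Workflow management
• Computational Graph (CG) management
• Multi-dimensional array processing
• Numerical computation
• Computational device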
29. Open-source deep learning frameworks
• Comparison criteria
– Design
• (Which) Interface language
• (How) Compute backprop
• (How) Update parameters
• (When) Run user code
• (How) Optimize CG
• (How) Scale up training
30. Deep learning framework sheet – Key characteristics
F/W | Owner | Platform | Mobile | Language | Interface | OpenMP | CUDA | OpenCL | Multi-GPU | Distributed
Caffe | BAIR | Linux, Mac | - | C++ | Python, MATLAB | Y | Y | - | Y |
Chainer | Preferred Networks | Linux | - | Python | Python | - | Y | - | Y | Y
CNTK | Microsoft | Linux, Windows | - | C++ | Python, C++ | Y | Y | - | Y | Y
DL4J | Skymind | Cross-platform (JVM) | Android | Java | Java, Scala, Python | Y | Y | - | Y | Y (Spark)
Keras | François Chollet | Linux, Mac, Windows | - | Python | Python | Y (Theano), N (TF) | Y | - | Y |
MXNet | DMLC | Linux, Mac, Windows, JavaScript | Android, iOS | C++ | C++, Python, Julia, MATLAB, JavaScript, Go, R, Scala, Perl | Y | Y | - | Y | Y
TensorFlow | Google | Linux, Mac, Windows | Android, iOS | C++, Python | Python, C/C++, Java, Go | N | Y | - | Y | Y
Theano | Université de Montréal | Linux, Mac, Windows | - | Python | Python | Y | Y | - | Y |
Torch | Ronan, Clément, Koray, Soumith | Linux, Mac, Windows | Android, iOS | C, Lua | Lua | Y | Y | Y | Y | Not officially
Source: Comparison of deep learning software
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software#cite_note-29
31. Deep learning framework sheet – Key characteristics
Source: Getting Started with Deep Learning
https://svds.com/getting-started-deep-learning/
32. Deep learning framework sheet – Tech. Stack
• Main tech stack of a deep learning framework
Source: DLIF: Common design of neural network implementations
https://www.dropbox.com/s/qfz34ba3ftuli6b/AAAI2017-2-0203.pdf
33. Deep learning framework sheet – Tech. Stack
Stack layers, top to bottom: visualization, workflow management, CG management, multi-dimensional array processing, numerical computation, computational device
[Caffe] Caffe; blob; BLAS, cuBLAS, cuDNN; CPU, GPU
[Chainer] Chainer; Numpy, Cupy; BLAS, cuBLAS, cuDNN; CPU, GPU
[CNTK] TensorBoard; CNTK; Tensor; BLAS, cuBLAS, cuDNN; CPU, GPU
[DL4J] DL4J UI; DL4J; ND4J; BLAS, cuBLAS, cuDNN; CPU, GPU
Source: DLIF: Common design of neural network implementations
https://www.dropbox.com/s/qfz34ba3ftuli6b/AAAI2017-2-0203.pdf
34. Deep learning framework sheet – Tech. Stack
Stack layers, top to bottom: visualization, workflow management, CG management, multi-dimensional array processing, numerical computation, computational device
[MXNet] mxnet.viz; MXNet; mxnet.ndarray; BLAS, cuBLAS, cuDNN; CPU, GPU
[TensorFlow] TensorBoard; TF slim; TensorFlow; Tensor; BLAS, cuBLAS, cuDNN; CPU, GPU
[Theano] Keras, Lasagne, Blocks, etc.; Theano; Numpy, libgpuarray; BLAS, CUDA, OpenCL, CUDA Toolkit; CPU, GPU
[Torch] visdom; Torch; Tensor; BLAS, cuBLAS, cuDNN; CPU, GPU
Source: DLIF: Common design of neural network implementations
https://www.dropbox.com/s/qfz34ba3ftuli6b/AAAI2017-2-0203.pdf
35. Deep learning framework sheet – Design
• Design strategies of deep learning frameworks
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
36. Deep learning framework sheet – Design
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
37. Deep learning framework sheet – Design
Write NNs – in which languages?
Approach | Pros | Cons | Frameworks
Declarative config. file | High portability | Low flexibility | Caffe
Procedural scripting | High flexibility | Low portability | Others
Compute backprop – how?
Approach | Pros | Cons | Frameworks
Graph | Easy and simple to implement | Low flexibility | Torch, Caffe, Chainer
Extended graph | High flexibility; operations can also be applied to the extended backprop graph (e.g., backprop of backprop) | Implementation gets complicated | Theano, MXNet, TensorFlow
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
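The "backprop of backprop" idea can be illustrated with a toy expression graph whose derivative is itself a graph. This sketch uses simple symbolic differentiation rather than a real reverse-mode implementation, and every name in it (`Node`, `grad`, `evaluate`, and the constructors) is hypothetical.

```python
class Node:
    """Expression graph node: constant, variable, sum, or product."""

    def __init__(self, kind, args=(), value=None, name=None):
        self.kind, self.args, self.value, self.name = kind, args, value, name


def const(v):
    return Node("const", value=v)


def var(name):
    return Node("var", name=name)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def evaluate(node, env):
    if node.kind == "const":
        return node.value
    if node.kind == "var":
        return env[node.name]
    left, right = (evaluate(arg, env) for arg in node.args)
    return left + right if node.kind == "add" else left * right


def grad(node, wrt):
    """Return d(node)/d(wrt) as a new graph, so it can be differentiated again."""
    if node.kind == "const":
        return const(0.0)
    if node.kind == "var":
        return const(1.0 if node.name == wrt else 0.0)
    a, b = node.args
    if node.kind == "add":
        return add(grad(a, wrt), grad(b, wrt))
    return add(mul(grad(a, wrt), b), mul(a, grad(b, wrt)))  # product rule


x = var("x")
f = mul(mul(x, x), x)  # x^3
df = grad(f, "x")      # graph for 3x^2
d2f = grad(df, "x")    # graph for 6x, obtained by differentiating a gradient graph

env = {"x": 4.0}
print(evaluate(f, env), evaluate(df, env), evaluate(d2f, env))
```

Because `grad` returns ordinary graph nodes, nothing special is needed to differentiate twice, which is the flexibility the extended-graph row describes, at the cost of a more involved implementation in a real framework.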
38. Deep learning framework sheet – Design
Update parameters – how to represent them?
Approach | Pros | Cons | Frameworks
Parameters as part of operator nodes | Intuitive | Low flexibility and reusability | Torch, Caffe, MXNet
Parameters as separate nodes in the graph | High flexibility and reusability; any operation that works on variable nodes can be applied to the parameters | | Theano, Chainer, TensorFlow
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
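A minimal sketch of the two representations, using scalar weights for brevity. The names `LinearWithOwnWeight`, `Parameter`, `scale`, and `l2_penalty` are invented for this illustration and do not correspond to any framework's API.

```python
class LinearWithOwnWeight:
    """Parameter lives inside the operator node."""

    def __init__(self, weight):
        self._weight = weight

    def __call__(self, x):
        return self._weight * x


class Parameter:
    """Parameter is a standalone node that operators receive as input."""

    def __init__(self, value):
        self.value = value


def scale(x, param):
    return param.value * x


def l2_penalty(param):
    # Any operation can be applied to the parameter itself.
    return param.value ** 2


layer = LinearWithOwnWeight(2.0)
embedded_out = layer(3.0)

w = Parameter(2.0)
shared_out = scale(scale(3.0, w), w)  # the same parameter reused by two operators
penalty = l2_penalty(w)

print(embedded_out, shared_out, penalty)
```

With the separate-node style, sharing a weight between operators or computing a penalty on it needs no extra support from the operator classes.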
39. Deep learning framework sheet – Design
Update parameters – how to update them?
Approach | Pros | Cons | Frameworks
By routines outside the graph | Easy to implement | Low integrity | Torch, Caffe, MXNet, Chainer
As part of the graph | High integrity | Implementation gets complicated | Theano, TensorFlow
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
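Updating "by routines outside the graph" can be sketched as follows: the forward and backward computation is one function, and the update is a separate plain-Python routine. The function names, the one-parameter model, and the learning rate are assumptions chosen for illustration.

```python
def loss_and_grad(w, x, y):
    """Forward and backward pass for loss = (w*x - y)^2 (the 'graph' part)."""
    error = w * x - y
    return error ** 2, 2.0 * error * x


def sgd_update(w, grad, lr):
    """Update routine that lives outside the graph."""
    return w - lr * grad


w = 0.0
for _ in range(50):
    loss, grad = loss_and_grad(w, x=2.0, y=4.0)
    w = sgd_update(w, grad, lr=0.05)

print(w, loss)
```

The update rule is trivial to write and swap out here, but the graph knows nothing about it. Frameworks that put the update inside the graph gain that integrity at the cost of a more complex implementation.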
40. Deep learning framework sheet – Design
Run user code – when?
Approach | Pros | Cons | Frameworks
Static CG (define-and-run) | Easy to optimize the computations | Low flexibility and usability | Others
Dynamic CG (define-by-run) | High flexibility and usability; users can build different graphs for different iterations using ordinary language syntax | Hard to optimize the computations; optimizing at every iteration is too costly | Chainer
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
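The point that a dynamic CG lets users "build different graphs for different iterations" can be shown with a toy recorder. The `Var` class and the helper functions are hypothetical, and only multiplication is recorded.

```python
class Var:
    """Value that remembers which values it was computed from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other))


def count_ops(node):
    """Count operation nodes reachable from node (inputs have no parents)."""
    if not node.parents:
        return 0
    return 1 + sum(count_ops(parent) for parent in node.parents)


def forward(x, depth):
    """An ordinary Python loop decides how deep this iteration's graph is."""
    h = x
    half = Var(0.5)
    for _ in range(depth):
        h = h * half
    return h


out_short = forward(Var(8.0), depth=1)
out_long = forward(Var(8.0), depth=3)
print(out_short.value, count_ops(out_short))
print(out_long.value, count_ops(out_long))
```

Each call produces a graph of a different size without any special graph-construction API. The flip side, as the table notes, is that a graph rebuilt on every iteration is hard to optimize ahead of time.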
41. Deep learning framework sheet – Design
Optimize CG – how to extend the current F/W?
Approach | Frameworks
Transform the graph to optimize the computations | Theano, TensorFlow
Provide easy ways to write custom operator nodes | Torch, MXNet, Chainer
Chainer also provides ways to write custom CUDA kernels with JIT compilation
Source: DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
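As one small example of "transforming the graph to optimize the computations", the sketch below performs constant folding on a toy graph. The tuple-based graph encoding and the function name `fold_constants` are assumptions for this illustration; real frameworks apply many more rewrites.

```python
def fold_constants(expr):
    """Pre-compute sub-expressions whose inputs are all constants.

    A graph is a nested tuple ("add" | "mul", left, right); leaves are
    numbers (constants) or strings (variable names).
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return left + right if op == "add" else left * right
    return (op, left, right)


graph = ("mul", ("add", 2, 3), ("add", "x", ("mul", 4, 5)))
optimized = fold_constants(graph)
print(optimized)
```

The transformed graph computes the same function with fewer operations at run time, which is only possible because the whole graph is available before execution.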
Scale up training – how?
Approach | Frameworks
Multi-GPU | All
Distributed computations | MXNet, TensorFlow, CNTK, Chainer, DL4J, Torch (not officially)
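Multi-GPU and distributed training are commonly organized as data parallelism: each worker computes a gradient on its own shard and the gradients are averaged before the update. The sketch below simulates two workers in a single process; the function names, data, and learning rate are made up, and no real communication library is involved.

```python
def local_gradient(w, shard):
    """Mean gradient of (w*x - y)^2 over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for the gradient averaging step across workers."""
    return sum(values) / len(values)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
shards = [data[:2], data[2:]]  # two simulated workers with equal-sized shards

w = 0.0
for _ in range(100):
    grads = [local_gradient(w, shard) for shard in shards]
    w -= 0.02 * all_reduce_mean(grads)

print(w)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the result is the same as single-device training while the per-worker work is halved.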
51. References
• Comparison of deep learning software
https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software#cite_note-29
• Deep Learning Frameworks
https://developer.nvidia.com/deep-learning-frameworks
• Comparing Frameworks: Deeplearning4j, Torch, Theano, TensorFlow, Caffe, Paddle, MxNet, Keras & CNTK
https://deeplearning4j.org/compare-dl4j-torch7-pylearn#caffe
• Performance of Distributed Deep Learning using ChainerMN
http://chainer.org/general/2017/02/08/Performance-of-Distributed-Deep-Learning-Using-ChainerMN.html
• Comparison with Other Frameworks
http://docs.chainer.org/en/latest/comparison.html
• Complex neural networks made easy by Chainer
https://www.oreilly.com/learning/complex-neural-networks-made-easy-by-chainer
• How is PyTorch different from Tensorflow?
https://hackernoon.com/how-is-pytorch-different-from-tensorflow-2c90f44747d6
• How to run deep neural networks on weak hardware
https://www.linkedin.com/pulse/how-run-deep-neural-networks-weak-hardware-dmytro-prylipko
• Deep Learning frameworks: a review before finishing 2016
https://medium.com/@ricardo.guerrero/deep-learning-frameworks-a-review-before-finishing-2016-5b3ab4010b06
• Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance
https://www.microsoft.com/en-us/research/blog/microsoft-computational-network-toolkit-offers-most-efficient-distributed-deep-learning-computational-performance/
• Machine Learning Frameworks Comparison
https://blog.paperspace.com/which-ml-framework-should-i-use/
• soumith/convnet-benchmarks
https://github.com/soumith/convnet-benchmarks
• Getting Started with Deep Learning
https://svds.com/getting-started-deep-learning/
• Benchmarking State-of-the-Art Deep Learning Software Tools
http://dlbench.comp.hkbu.edu.hk/?v=v7
• DLIF: Common design of neural network implementations
https://www.dropbox.com/s/qfz34ba3ftuli6b/AAAI2017-2-0203.pdf
• DLIF: Differences of deep learning frameworks
https://www.dropbox.com/s/6sbt9jmrwg414c8/AAAI2017-3-0331.pdf
• Torch vs Theano
http://fastml.com/torch-vs-theano/