Igor Freitas
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on
system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at
Performance results are based on testing as of Aug. 20, 2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be
absolutely secure.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and
provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel
representative to obtain the latest forecast, schedule, specifications and roadmaps.
Any forecasts of goods and services needed for Intel’s operations are provided for discussion purposes only. Intel will have no liability to make any purchase in connection with
forecasts published in this document.
ARDUINO 101 and the ARDUINO infinity logo are trademarks or registered trademarks of Arduino, LLC.
Altera, Arria, the Arria logo, Intel, the Intel logo, Intel Atom, Intel Core, Intel Nervana, Intel Saffron, Iris, Movidius, OpenVINO, Stratix and Xeon are trademarks of Intel Corporation or
its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright 2019 Intel Corporation.
Agenda (partially recovered from the slide): Parallel ... in AI/ML; "traditional" ... (HPC); Parallel ... in ML frameworks; ... of code

Big Data Analytics
HPC != Big Data Analytics != Artificial Intelligence?
*Other brands and names are the property of their respective owners.
FORTRAN / C++ Applications
High Performance
Java, Python, Go, etc.*
Simple to Use
Supports large scale startup
More resilient to hardware failures
Remote Storage
Local Storage
Compute & Memory Focused
High Performance Components
Storage Focused
Standard Server Components
Server Storage
Modelo de
Sistema de
Server Storage
Trends in HPC + Big Data Analytics
Business viability
Code Modernization
(Vector instructions)
Faster time-to-market
Lower costs (HPC in the cloud?)
Better products
Easy to maintain HW & SW
Integrated solutions:
Storage + Network +
Processing + Memory
Public investments
Varied Resource Needs
Typical HPC
Big Data
Big Data & HPC
Production Environments
Small Data + Small
e.g. Data analysis
Big Data +
Small Compute
e.g. Search, Streaming,
Data Preconditioning
Small Data +
Big Compute
e.g. Mechanical Design, Multi-physics
Oil & Gas
Video Survey Traffic
Digital Health
Processor Memory Interconnect Storage
† Formerly the Intel® Computer Vision SDK
Developer personas shown above represent the primary user base for each row, but are not mutually exclusive
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
OpenVINO™† Intel® Movidius™ SDK
Open Visual Inference & Neural Network Optimization toolkit for
inference deployment on CPU/GPU/FPGA/VPU using TensorFlow*,
Caffe* & MXNet*
Optimized inference deployment
for all Intel® Movidius™ VPUs using TensorFlow
& Caffe
Intel® Deep
Learning Studio‡
Open-source tool to compress
deep learning development
Now optimized for CPU Optimizations in progress
TensorFlow MXNet Caffe BigDL* (Spark) Caffe2 PyTorch CNTK PaddlePaddle
Python R Distributed
• Scikit-
• Pandas
• NumPy
• Cart
• Random
• e1071
• MlLib (on
• Mahout
Intel distribution
optimized for
machine learning
Intel® Data Analytics
Acceleration Library
(incl machine learning)
Open-source deep neural
network functions for
CPU / integrated graphics
Intel® nGraph™ Compiler (Alpha)
Open-sourced compiler for deep learning model
computations optimized for multiple devices from
multiple frameworks
Agnostic, complementary to major frameworks; cross-platform flexibility
Supports >100 Public
Models, incl. 30+
Pretrained Models
OpenCV* OpenCL™
CV Library
(Kernel & Graphic APIs)
Over 20 Customer Products Launched based
on Intel® Distribution of OpenVINO™ toolkit
Breadth of vision product portfolio
12,000+ Developers
High performance, high efficiency
Optimized media
encode/decode functions
An open source version is available at
What’s Inside Intel® Distribution of OpenVINO™ toolkit
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Intel® Architecture-Based
Platforms Support
OS Support: CentOS* 7.4 (64 bit), Ubuntu* 16.04.3 LTS (64 bit), Microsoft Windows* 10 (64 bit), Yocto Project* version Poky Jethro v2.0.3 (64 bit)
Intel® Deep Learning Deployment Toolkit Traditional Computer Vision
Model Optimizer
Convert & Optimize
Inference Engine
Optimized Inference
IR
OpenCV* OpenVX*
Optimized Libraries & Code Samples
IR = Intermediate Representation file
For Intel® CPU & GPU/Intel® Processor Graphics
Increase Media/Video/Graphics Performance
Intel® Media SDK
Open Source version
Drivers & Runtimes
For GPU/Intel® Processor Graphics
Optimize Intel® FPGA (Linux* only)
FPGA RunTime Environment
(from Intel® FPGA SDK for OpenCL™)
An open source version is available at (some deep learning functions support Intel CPU/GPU only).
Tools & Libraries
Intel® Vision Accelerator
Design Products &
AI in Production/
Developer Kits
30+ Pre-trained
Computer Vision
IR = Intermediate
Representation format
Load, infer
CPU Plugin
GPU Plugin
FPGA Plugin
NCS Plugin
Convert &
Model Optimizer
▪ What it is: A Python-based tool that imports trained models
and converts them to Intermediate Representation (IR).
▪ Why important: Optimizes for performance/space with
conservative topology transformations; the biggest boost comes
from conversion to data types matching the hardware.
Inference Engine
▪ What it is: High-level inference API
▪ Why important: The interface is implemented as dynamically
loaded plugins for each hardware type. Delivers the best
performance for each type without requiring users to
implement and maintain multiple code pathways.
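The plugin idea described for the Inference Engine can be sketched in a few lines of Python. This is a toy illustration of the pattern only, not the OpenVINO API: `PLUGINS`, `register_plugin`, `infer`, and the two backends are hypothetical names, and both "devices" just run the same stand-in function on the host.

```python
from typing import Callable, Dict, List

# Hypothetical registry: device name -> function that runs the "model" there.
PLUGINS: Dict[str, Callable[[List[float]], List[float]]] = {}


def register_plugin(device: str):
    """Decorator that registers a backend function under a device name."""
    def wrap(fn):
        PLUGINS[device] = fn
        return fn
    return wrap


@register_plugin("CPU")
def run_on_cpu(inputs: List[float]) -> List[float]:
    # Stand-in "model": a ReLU over the inputs.
    return [max(0.0, x) for x in inputs]


@register_plugin("GPU")
def run_on_gpu(inputs: List[float]) -> List[float]:
    # A real plugin would dispatch to device-specific kernels;
    # here it computes the same ReLU on the host.
    return [max(0.0, x) for x in inputs]


def infer(device: str, inputs: List[float]) -> List[float]:
    """Common entry point: the caller names a device, not a code path."""
    if device not in PLUGINS:
        raise ValueError(f"no plugin registered for device {device!r}")
    return PLUGINS[device](inputs)


if __name__ == "__main__":
    for name in sorted(PLUGINS):
        print(name, infer(name, [-1.0, 2.0]))
```

The calling code stays the same when the target device changes, which is the point the slide makes about not maintaining multiple code pathways.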
Common API
(C++ / Python)
Optimized cross-
platform inference
GPU = Intel CPU with integrated graphics processing unit/Intel® Processor Graphics
GNA Plugin

Intel® CPUs
(Atom®, Core™, Xeon®)
Intel® CPUs
w/ Integrated Graphics
Intel® Movidius™ VPUs
& Intel® FPGAs
Future Accelerators
(Keem Bay, etc.)
Write once - deploy across Intel architecture - leverage common algorithms
Add to existing Intel® architectures for
accelerated DL inference capabilities
1. Intel® Distribution of OpenVINO™ toolkit: Computer
vision & deep learning inference tool with common API
2. Portfolio of hardware for computer vision & deep
learning inference, device to cloud
3. Ecosystem to cover the breadth of IoT vision systems
Reference Use Cases, AI Models,
High-level APIs, Feature Engineering, etc.
Bringing Deep Learning to Big Data
▪ Open-source deep learning library for
Apache Spark*
▪ Makes deep learning more accessible to big
data users and data scientists
▪ Feature parity with popular DL frameworks such as
Caffe, Torch, and TensorFlow
▪ Easy customer and developer experience
▪ Run deep learning applications as standard
Spark programs
▪ Run on top of existing Spark/Hadoop clusters
(no cluster changes)
▪ High performance powered by Intel MKL and
multi-threaded programming
▪ Efficient scale-out leveraging Spark
Spark Core
SQL SparkR
MLlib GraphX
ML Pipeline
For developers looking to run deep learning on Hadoop/Spark due to familiarity or analytics use
Open-source compiler enabling flexibility to run models
across a variety of frameworks and hardware
nGraph™ – Deep Learning Compiler

Example: Layer Fusion
Without fusion: Convolution 1x1 -> Convolution 3x3 -> Convolution 1x1, each layer with its own Memory Read and Memory Write (three reads, three writes).
With fusion: Convolution 1x1 + Convolution 3x3 + Convolution 1x1 run as one fused primitive, with a single Memory Read and a single Memory Write.
Result: fewer memory ops.
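The effect of fusion on memory traffic can be shown with a small counting sketch. It is a simplified model under stated assumptions: the three "layers" are elementwise stand-ins rather than real convolutions (a real 3x3 convolution also needs neighbouring values), and `CountingMemory`, `run_unfused`, and `run_fused` are hypothetical names introduced only for this illustration.

```python
class CountingMemory:
    """Toy 'main memory' that counts whole-tensor reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, values):
        self.writes += 1
        self.data = list(values)


# Elementwise stand-ins for the three layers (assumption, not convolutions).
def layer_a(xs):
    return [2.0 * x for x in xs]


def layer_b(xs):
    return [x + 1.0 for x in xs]


def layer_c(xs):
    return [max(0.0, x) for x in xs]


LAYERS = (layer_a, layer_b, layer_c)


def run_unfused(mem):
    """Each layer reads its input from memory and writes its output back."""
    for layer in LAYERS:
        mem.write(layer(mem.read()))


def run_fused(mem):
    """One read, all layers applied on the local values, one write."""
    values = mem.read()
    for layer in LAYERS:
        values = layer(values)
    mem.write(values)


if __name__ == "__main__":
    unfused = CountingMemory([-2.0, 0.5])
    fused = CountingMemory([-2.0, 0.5])
    run_unfused(unfused)
    run_fused(fused)
    print("unfused:", unfused.reads, "reads,", unfused.writes, "writes")
    print("fused:  ", fused.reads, "reads,", fused.writes, "writes")
```

Both variants compute the same result; only the number of round trips to memory differs.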
Optimization techniques
Integer Matrix Multiply Performance
on Intel® Xeon® Platinum 8180 Processor
Configuration Details on Slide: 13
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors
may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Source: Intel
measured as of June 2017 Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Enhanced matrix multiply performance on Intel® Xeon® Scalable Processor
integer ops
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or
Parallel Programming Applied to AI
HPC techniques applied to AI
Job 0
Job 1
Job 2
Job 3
libnumactl kmp_affinity

Artificial Intelligence Centers of Excellence - Intel
Success stories
“Validador Cognitivo de Infrações de Trânsito” (Cognitive Validator of Traffic Violations)
✓ 22.5x faster performance on Xeon Scalable Processors
“Processing of traffic fines that used to take 45 hours can now be done in under 2 hours.”
✓ Development of the mathematical model
“With this, we reached 90% accuracy in the system, in addition to automating the whole project,”
said Gustavo Rocha, division head at SERPRO.
Thiago Oliveira, superintendent of Infrastructure
Engineering at SERPRO
TensorFlow for CPU
intra_op_parallelism_threads: Nodes that can use multiple threads to parallelize their execution will schedule the
individual pieces into this pool.
inter_op_parallelism_threads: All ready nodes are scheduled in this pool.
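import tensorflow as tf  # TensorFlow 1.x API
config = tf.ConfigProto()
config.intra_op_parallelism_threads = 44  # threads used inside a single op
config.inter_op_parallelism_threads = 44  # threads used to run independent ops
sess = tf.Session(config=config)  # the settings take effect for this session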
Applying the “process affinity” technique (NUMA-aware) in TensorFlow
Parallel Programming Applied to AI
Understanding the environment:
• Dual socket
• AVX-512
• 16 cores / socket
• 32 threads / socket
• Total: 64 threads
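The thread counts for this machine follow from simple arithmetic, sketched below. The `Topology` class is a hypothetical helper, and the value of two hardware threads per core is an inference from the slide's figures (32 threads over 16 cores per socket), not something the slide states directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical description of a multi-socket machine."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def threads_per_socket(self) -> int:
        return self.cores_per_socket * self.threads_per_core

    @property
    def logical_cpus(self) -> int:
        return self.sockets * self.threads_per_socket


if __name__ == "__main__":
    # Values from the slide; threads_per_core=2 is inferred (32 / 16).
    machine = Topology(sockets=2, cores_per_socket=16, threads_per_core=2)
    print("physical cores:    ", machine.physical_cores)
    print("threads per socket:", machine.threads_per_socket)
    print("logical CPUs:      ", machine.logical_cpus)
```

Knowing the physical core count matters for the experiments that follow, since the thread counts tried there range from a single core up to all 64 hardware threads.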

Parallel Programming Applied to AI
Demonstration code:
Convolution + ReLU + MaxPool +
Convolution + ReLU + MaxPool
• Preparing the environment with Anaconda
“conda create -n tf-pip-2”
“pip install intel-tensorflow”

Parallel Programming Applied to AI
“numactl -C 0-7 python”
“numactl -C 0 python”
numactl -C 0-15,16-31 python
• More cores does not mean higher [performance]
• 48 threads gave the same performance as 64 threads
• Best time with 32 threads (83 s): 1.22x speedup
Threads   Time (s)
4         271
8         140
16        112
32        83
48        102
64        105
Baseline, 64 threads in “default” mode: 102 seconds
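The speedup figures can be recomputed from the timings on this slide. The sketch below uses only numbers shown above (the per-thread-count times and the 102-second "default" baseline); the function names are illustrative.

```python
# Timings from the slide: thread count -> seconds.
TIMES = {4: 271.0, 8: 140.0, 16: 112.0, 32: 83.0, 48: 102.0, 64: 105.0}

# Baseline from the slide: 64 threads in "default" mode.
BASELINE_SECONDS = 102.0


def speedup(baseline_seconds: float, seconds: float) -> float:
    """Speedup relative to the baseline (greater than 1 means faster)."""
    return baseline_seconds / seconds


def best_thread_count(times: dict) -> int:
    """Thread count with the lowest measured time."""
    return min(times, key=times.get)


if __name__ == "__main__":
    for threads in sorted(TIMES):
        ratio = speedup(BASELINE_SECONDS, TIMES[threads])
        print(f"{threads:>2} threads: {TIMES[threads]:>5.0f} s, {ratio:.2f}x")
    print("best:", best_thread_count(TIMES), "threads")
```

With these numbers the best configuration is 32 threads at roughly 1.2x over the baseline, and going from 32 to 48 or 64 threads makes the run slower rather than faster.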
Parallel Programming Applied to AI
numactl -C 0-15,16-31 python
• KMP_BLOCKTIME: time in milliseconds that a thread waits, after finishing its task, before [going to sleep]
• 2.68x speedup
• Best time with 16 threads
• Best performance-to-cost ratio with 2 threads
Threads   Time (s), new series   Time (s), series from the previous chart
1         67                     -
2         43                     -
4         41                     271
8         40                     140
16        39                     112
32        41                     83
48        50                     102
64        46                     105
Reference line: 64 cores, “default” mode

Parallel Programming Applied to AI
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0
numactl -C 0-15 python
• 16 threads: 4.86x speedup!
• Lower infrastructure cost
• More training jobs at the same time
• Larger models
• No code changes
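Because the tuning is done entirely through environment variables and the `numactl` launcher, it can be wrapped in a small launch helper. The sketch below only builds the command and environment. `build_launch` and the script name `train.py` are hypothetical, and the `blocktime_ms` parameter is optional because the slides do not state which KMP_BLOCKTIME value was used.

```python
import os
from typing import Dict, List, Optional, Tuple

# Affinity string taken from the slide.
DEFAULT_AFFINITY = "granularity=fine,verbose,compact,1,0"


def build_launch(
    script: str,
    cpu_list: str,
    affinity: str = DEFAULT_AFFINITY,
    blocktime_ms: Optional[int] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for running `script` pinned to `cpu_list`."""
    env = dict(os.environ)          # copy, so the caller's environment is untouched
    env["KMP_AFFINITY"] = affinity
    if blocktime_ms is not None:    # value is the caller's choice (not given in the slides)
        env["KMP_BLOCKTIME"] = str(blocktime_ms)
    argv = ["numactl", "-C", cpu_list, "python", script]
    return argv, env


if __name__ == "__main__":
    argv, env = build_launch("train.py", "0-15")  # hypothetical script name
    print(" ".join(argv))
    print("KMP_AFFINITY=" + env["KMP_AFFINITY"])
    # To actually run it (requires numactl to be installed):
    #   import subprocess
    #   subprocess.run(argv, env=env, check=True)
```

Keeping the pinning outside the training script matches the "no code changes" point above: the same script can be launched with different CPU lists and affinity settings.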
Parallel Programming Applied to AI
• How threads are distributed among cores and ...
• Affects bandwidth (“memory speed”)
• Compact:
• Threads are placed close to each other
• Faster data exchange between them
• Data fits in cache
• Little data exchange between CPU and DRAM
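The intuition behind "compact" placement can be illustrated with a toy model. This is not how the OpenMP runtime is implemented, and it ignores hyper-threads and the `1,0` permute/offset arguments. It only contrasts filling one socket first with a hypothetical round-robin placement across sockets, and counts how many consecutive threads end up on the same socket.

```python
from typing import List, Tuple

Placement = List[Tuple[int, int]]  # one (socket, core) pair per thread


def place_compact(num_threads: int, sockets: int, cores_per_socket: int) -> Placement:
    """Toy 'compact' placement: fill socket 0's cores, then socket 1's, ..."""
    if num_threads > sockets * cores_per_socket:
        raise ValueError("more threads than cores in this toy model")
    return [divmod(i, cores_per_socket) for i in range(num_threads)]


def place_round_robin(num_threads: int, sockets: int, cores_per_socket: int) -> Placement:
    """Hypothetical contrast: alternate sockets for consecutive threads."""
    if num_threads > sockets * cores_per_socket:
        raise ValueError("more threads than cores in this toy model")
    return [(i % sockets, i // sockets) for i in range(num_threads)]


def neighbours_on_same_socket(placement: Placement) -> int:
    """Number of consecutive thread pairs (i, i+1) that share a socket."""
    return sum(1 for a, b in zip(placement, placement[1:]) if a[0] == b[0])


if __name__ == "__main__":
    compact = place_compact(16, sockets=2, cores_per_socket=16)
    spread = place_round_robin(16, sockets=2, cores_per_socket=16)
    print("compact:    ", neighbours_on_same_socket(compact), "neighbour pairs share a socket")
    print("round-robin:", neighbours_on_same_socket(spread), "neighbour pairs share a socket")
```

In this model all 16 threads of the compact placement sit on one socket, so neighbouring threads can exchange data through that socket's caches instead of crossing to the other socket.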
Thank you!

What's New in Intel® Distribution of OpenVINO™ toolkit 2018 R5
▪ Extends neural network support to include LSTM (long short-term memory) from ONNX*, TensorFlow* & MXNet* frameworks, & 3D convolutional-based networks in preview mode (CPU-only) for non-vision use cases.
▪ Introduces Neural Network Builder API (preview), providing flexibility to create a graph from simple API calls and directly deploy via the Inference Engine.
▪ Improves Performance - Delivers significant CPU performance boost on multicore systems through new parallelization techniques via streams. Optimizes performance on Intel® Xeon®, Core™ & Atom processors through INT8-based primitives for Intel® Advanced Vector Extensions (Intel® AVX-512), Intel® AVX2 & SSE4.2.
▪ Supports Raspberry Pi* hardware as a host for the Intel® Neural Compute Stick 2 (preview). Offload your deep learning workloads to this low-cost, low-power USB.
▪ Adds 3 new optimized pretrained models (for a total of 30+): Text detection of indoor/outdoor scenes, and 2 single-image super resolution networks that enhance image resolution by a factor of 3 or 4.
See product site & release notes for more details about 2018 R4.
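The "parallelization via streams" idea in the bullets above can be illustrated with a generic sketch. This is not the OpenVINO API: `fake_infer` and `run_streams` are hypothetical names, and a thread pool merely stands in for independent execution streams that each serve inference requests concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_infer(x):
    # Hypothetical stand-in for one inference request; a real model call would go here.
    return sum(v * v for v in x)


def run_streams(inputs, num_streams):
    # Each worker thread plays the role of one independent execution stream.
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(fake_infer, inputs))


if __name__ == "__main__":
    requests = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(run_streams(requests, num_streams=2))
```

In pure Python the interpreter lock limits the real speedup of this toy; the point is only the structure, where several requests are in flight at once instead of one large request occupying all cores.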
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. No computer system can be absolutely
secure. Check with your system manufacturer or retailer or learn more at []..
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced
data are accurate.
The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel based product, combined with a number of
situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given
product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products. For more complete information visit
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult
other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations
include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice. Notice Revision #20110804
Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process.
© 2018 Intel Corporation. Intel, the Intel logo, Intel Optane and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
Benchmark Segment: AI/ML/DL
Benchmark type: Training
Benchmark Metric: Training Throughput (images/sec)
Framework: BigDL master trunk with Spark 2.1.1
Topology: Inception V1, VGG, ResNet-50, ResNet-152
# of Nodes: 8, 16 (multiple configurations)
Platform: Purley
Sockets: 2S
Processor: Intel® Xeon® Scalable Platinum 8180 Processor (Skylake): 28-core @ 2.5 GHz (base), 3.8 GHz (max turbo), 205W
Processor: Intel® Xeon® Processor E5-2699v4 (Broadwell): 22-core @ 2.2 GHz (base), 3.6 GHz (max turbo), 145W
Enabled Cores: Skylake: 56 per node, Broadwell: 44 per node
Total Memory: Skylake: 384 GB, Broadwell: 256 GB
Memory Configuration: Skylake: 12 slots * 32 GB @ 2666 MHz Micron DDR4 RDIMMs
Memory Configuration: Broadwell: 8 slots * 32 GB @ 2400 MHz Kingston DDR4 RDIMMs
Storage: Skylake: Intel® SSD DC P3520 Series (2TB, 2.5in PCIe 3.0 x4, 3D1, MLC)
Storage: Broadwell: 8 * 3 TB Seagate HDDs
Network: 1 * 10 GbE network per node
OS: CentOS Linux release 7.3.1611 (Core), Linux kernel
Turbo: On
Computer Type: Dual-socket server
Framework Version:
Topology Version:
Dataset, version: ImageNet, 2012; Cifar-10
Performance command (Inception v1):
spark-submit --class --master spark://$master_hostname:7077 --executor-cores=36 --num-executors=16 --total-executor-cores=576 --driver-memory=60g --executor-memory=300g $BIGDL_HOME/dist/lib/bigdl-*-SNAPSHOT-jar-with-dependencies.jar --batchSize 2304 --learningRate 0.0896 -f hdfs:///user/root/sequence/ --checkpoint $check_point_folder
Data setup: Data was stored on HDFS and cached in memory before
Java: JDK 1.8.0 update 144
MKL Library version: Intel MKL 2017
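The numbers in the Inception v1 command above are related by simple arithmetic, which the following sketch checks. The helper name `check_spark_sizing` is hypothetical, and the rule that the batch size should divide evenly across all executor cores is stated here as an assumption about BigDL's usual guidance, not something the slide itself asserts.

```python
def check_spark_sizing(num_executors, executor_cores, total_executor_cores, batch_size):
    """Return the per-core batch size after checking the sizing is consistent."""
    expected_total = num_executors * executor_cores
    if expected_total != total_executor_cores:
        raise ValueError(
            f"total cores {total_executor_cores} != {num_executors} * {executor_cores}"
        )
    # Assumption: the global batch is split evenly across all executor cores.
    if batch_size % total_executor_cores != 0:
        raise ValueError("batch size is not a multiple of total executor cores")
    return batch_size // total_executor_cores


if __name__ == "__main__":
    # Values copied from the spark-submit command in the configuration above.
    per_core = check_spark_sizing(
        num_executors=16, executor_cores=36, total_executor_cores=576, batch_size=2304
    )
    print(per_core)
```

With the values from the slide, 16 executors of 36 cores give the 576 total cores passed to spark-submit, and a batch of 2304 works out to 4 images per core.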
4.3X for Spark MLlib thru Intel Math Kernel Library (MKL)
▪ Spark-Perf (same for before and after): 9 nodes each with Intel® Xeon® processor E5-2697A v4 @ 2.60GHz * 2 (16 cores, 32 threads); 256 GB ; 10x SSDs; 10Gbps NIC
19x for HDFS Erasure Coding in micro workload (RawErasureCoderBenchmark) and 1.25x in Terasort, plus 50+% storage capacity saving and higher failure tolerance level.
▪ RawErasureCoderBenchmark (same for before and after): single node with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz *2 (22 cores, 44 threads); 256GB; 8x HDDs; 10Gbps NIC
▪ Terasort (same for before and after): 10 nodes each with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz *2 (22 cores, 44 threads); 256GB; 8x HDDs; 10Gbps NIC
5.6x for HBase off-heap reads in micro workload (PE) and 1.3x in real Alibaba production workload
▪ PE (same for before and after): Intel® Xeon® Processor X5670 @ 2.93GHz *2 (6 cores, 12 threads); RAM: 150 GB; 1Gbps NIC
▪ Alibaba (same for before and after): 400 nodes cluster with Intel® Xeon® processors
1.22x Spark Shuffle File Encryption performance for TeraSort and 1.28x for BigBench
▪ Terasort (same for before and after): Single node with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz *2 (18 cores, 36 threads); 128GB; 4x SSD; 10Gbps NIC
▪ BigBench (same for before and after): 6 nodes each with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz *2 (18 cores, 36 threads); 256GB; 1x SSD; 8x SATA HDD 3TB, 10Gbps NIC
1.35X Spark Shuffle RPC encryption performance for TeraSort and 1.18x for BigBench
▪ Terasort (same for before and after): 3 nodes each with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz *2 (18 cores, 36 threads); 128GB; 4x SSD; 10Gbps NIC
▪ BigBench (same for before and after): 5 nodes. 1x head node: Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz *2 (18 cores, 36 threads); 384GB; 1x SSD; 8x SATA HDD 3TB, 10Gbps NIC. 4x
worker nodes: each with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz *2 (22 cores, 44 threads); 384GB; 1x SSD; 8x SATA HDD 3TB, 10Gbps NIC.
10X scalability for Word2Vec
▪ E5-2630v2 * 2, 128 GB Memory, 12x HDDs; 1000Mb NIC (14 nodes)
70X scalability for LDA (Latent Dirichlet Allocation)
▪ Intel Xeon E5-2630v2 * 2, 288GB Memory, SAS Raid5, 10Gb NIC

Hardware
DRAM: 192GB (12x 16GB DDR4) | 768GB (24x 32GB DDR4)
Apache Pass: 1TB (ES2: 8 x 128GB) | N/A
AEP Mode: App Direct (Memkind) | N/A
CPU: Worker: Intel® Xeon® Platinum 8170 @ 2.10GHz (Thread(s) per core: 2, Core(s) per socket: 26, Socket(s): 2), CPU max MHz: 3700.0000, CPU min MHz: 1000.0000, L1d cache: 32K, L1i cache: 32K, L2 cache: 1024K, L3 cache:
OS: 4.16.6-202.fc27.x86_64 (BKC: WW26, BIOS: SE5C620.86B.01.00.0918.062020181644)
Software
OAP: 1TB AEP based OAP cache | 620GB DRAM based OAP cache
Hadoop: 8 * HDD disk (ST1000NX0313, 1-replica uncompressed & plain encoded data on Hadoop)
Spark: 1 * Driver (5GB) + 2 * Executor (62 cores, 74GB), spark.sql.oap.rowgroup.size=1MB
JDK: Oracle JDK 1.8.0_161
Data Scale: 2.6TB (data for the 9 queries totals 729.4GB)
9 I/O intensive queries (Q19, Q42, Q43, Q52, Q55, Q63, Q68, Q73, Q98)
Multi-Tenants: 9 threads (Fair scheduled)
Columns: NVMe | Apache Pass
Server Hardware
System Details: Intel® Server Board Purley Platform (2 socket)
CPU: Dual Intel® Xeon® Platinum 8180 Processors, 28 core/socket, 2 sockets, 2 threads per core
Hyper-Threading: Enabled
DRAM: DDR4 dual rank 192GB total = 12 DIMMs 16GB@2667MHz | DDR4 dual rank 384GB total = 12 DIMMs 32GB@2667MHz
Apache Pass: N/A | AEP ES.2 1.5TB total = 12 DIMMs * 128GB Capacity each: Single Rank, 128GB, 15W
Apache Pass Mode: N/A | App-Direct
NVMe: 4 x Intel P3500 1.6TB NVMe devices | N/A
Network: 10Gbit on board Intel NIC
Software
OS: Fedora 27
Kernel: 4.16.6-202.fc27.x86_64
Cassandra Version: 3.11.2 release | Cassandra 4.0 trunk, with App Direct patch version 2.1, software found at with PCJ library:
JDK: Oracle Hotspot JDK (JDK1.8 u131)
Spectre/Meltdown Compliant: Patched for variants 1/2/3
Number of Cassandra: 1 | 14
Cluster Nodes: One per Cluster
Garbage Collector: CMS | Parallel
JVM Options (difference from
Schema: cqlstress-insanity-example.yaml
DataBase Size per Instance: 1.25 Billion entries | 100 K entries
Client(s) Hardware
Number of Client machines: 1 | 2
System: Intel® Server Board model S2600WFT (2 socket)
CPU: Dual Intel® Xeon® Platinum 8176M CPU @ 2.1GHz, 28 core/socket, 2 sockets, 2 threads per core
DRAM: DDR4 384GB total = 12 DIMMs 32GB@2666MHz
Network: 10Gbit on board Intel NIC
Software
OS: Fedora 27
Kernel: 4.16.6-202.fc27.x86_64
JDK: Oracle Hotspot JDK (JDK1.8 u131)
Workload
Benchmark: Cassandra-Stress
Cassandra-Stress Instances: 1 | 14
Command line to write
cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(insert=1) n=1250000000 cl=ONE no-warmup -pop seq=1..1250000000 -mode native cql3 -node <ip_addr> -rate threads=10
cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(insert=1) n=100000 cl=ONE no-warmup -pop seq=1..100000 -mode native cql3 -node <ip_addr> -rate threads=10
Command line to read
cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(simple1=1) duration=10m cl=ONE no-warmup -pop dist=UNIFORM(1..1250000000) -mode native cql3 -node <ip_addr> -rate threads=300
cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(simple1=1) duration=3m cl=ONE no-warmup -pop dist=UNIFORM(1..100000) -mode native cql3 -node <ip_addr> -rate threads=320
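The two write commands above differ only in the number of entries, so they can be generated from one template. The sketch below is a convenience wrapper, not part of the original benchmark: `stress_write_cmd` is a hypothetical helper, the node address in the usage example is a placeholder, and only arguments that appear in the commands above are used.

```python
import shlex


def stress_write_cmd(profile, entries, node, threads=10):
    """Build the argument list for a cassandra-stress write run."""
    return [
        "cassandra-stress", "user", f"profile={profile}",
        "ops(insert=1)", f"n={entries}", "cl=ONE", "no-warmup",
        "-pop", f"seq=1..{entries}",
        "-mode", "native", "cql3",
        "-node", node,
        "-rate", f"threads={threads}",
    ]


if __name__ == "__main__":
    profile = "/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml"
    # 127.0.0.1 is a placeholder for <ip_addr>; the entry counts come from the table above.
    for entries in (1_250_000_000, 100_000):
        cmd = stress_write_cmd(profile, entries, node="127.0.0.1")
        # shlex.join quotes arguments such as ops(insert=1) so the line is safe to paste in a shell.
        print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` without shell quoting issues around the parentheses in `ops(insert=1)`.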
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Learning

Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Cloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process PhaseCloud Technology: Now Entering the Business Process Phase
Cloud Technology: Now Entering the Business Process Phase

TDC2019 Intel Software Day - Parallel Programming Techniques in Machine Learning

  • 2. Notices and Disclaimers. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at Performance results are based on testing as of Aug. 20, 2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Any forecasts of goods and services needed for Intel’s operations are provided for discussion purposes only. Intel will have no liability to make any purchase in connection with forecasts published in this document. ARDUINO 101 and the ARDUINO infinity logo are trademarks or registered trademarks of Arduino, LLC. Altera, Arria, the Arria logo, Intel, the Intel logo, Intel Atom, Intel Core, Intel Nervana, Intel Saffron, Iris, Movidius, OpenVINO, Stratix and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright 2019 Intel Corporation.
  • 3. Agenda: Parallel Programming in AI/ML vs. “Traditional” Parallel Programming (HPC); Parallel Programming in ML Frameworks; Code Example
  • 4. Agenda: Parallel Programming in AI/ML vs. “Traditional” Parallel Programming (HPC); Parallel Programming in ML Frameworks; Code Example
  • 5. HPC != Big Data Analytics != Artificial Intelligence?
      Programming model. HPC: FORTRAN / C++ applications, MPI (high performance). Big Data Analytics: Java, Python, Go, etc.* applications, Hadoop* (simple to use).
      Resource manager. HPC: SLURM (supports large-scale startup). Big Data Analytics: YARN* (more resilient to hardware failures).
      File system. HPC: Lustre* (remote storage). Big Data Analytics: HDFS*, Spark* (local storage).
      Hardware. HPC: compute- and memory-focused, high-performance components (servers, SSD storage, fabric switch infrastructure). Big Data Analytics: storage-focused, standard server components (servers, HDD storage, Ethernet switch infrastructure).
      *Other brands and names are the property of their respective owners.
  • 6. Trends in HPC + Big Data Analytics: Standards; Business viability; Performance; Code modernization (vector instructions); Many-core, FPGA, ASICs; Usability; Faster time-to-market; Lower costs (HPC in the cloud?); Better products; Easy-to-maintain HW & SW; Portability; Open, common environments; Integrated solutions: storage + network + processing + memory; Public investments
  • 7. Big Data & HPC: Production Environments. Varied resource needs (data vs. compute): Small Data + Small Compute, e.g. data analysis; Big Data + Small Compute, e.g. search, streaming, data preconditioning; Small Data + Big Compute, e.g. mechanical design, multi-physics. Typical HPC workloads: high-frequency trading, numeric weather simulation, oil & gas seismic. Typical Big Data workloads: video survey, traffic monitor, personal digital health. System cost balance is shown for each group across processor, memory, interconnect and storage.
  • 8. Agenda: Code Example; Parallel Programming in ML Frameworks; Parallel Programming in AI/ML vs. “Traditional” Parallel Programming (HPC)
  • 9. Intel® AI Tools: portfolio of software tools to expedite and enrich AI development. TOOLKITS (Application Developers). DEEP LEARNING DEPLOYMENT: OpenVINO™† (Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU/GPU/FPGA/VPU using TensorFlow*, Caffe* & MXNet*); Intel® Movidius™ SDK (optimized inference deployment for all Intel® Movidius™ VPUs using TensorFlow & Caffe). DEEP LEARNING: Intel® Deep Learning Studio‡ (open-source tool to compress deep learning development cycle). LIBRARIES (Data Scientists). DEEP LEARNING FRAMEWORKS (now optimized for CPU, or optimizations in progress): TensorFlow, MXNet, Caffe, BigDL* (Spark), Caffe2, PyTorch, CNTK, PaddlePaddle. MACHINE LEARNING LIBRARIES: Python (Scikit-learn, Pandas, NumPy); R (CART, Random Forest, e1071); Distributed (MLlib on Spark, Mahout). FOUNDATION (Library Developers). ANALYTICS, MACHINE & DEEP LEARNING PRIMITIVES: Python* (Intel distribution optimized for machine learning); DAAL (Intel® Data Analytics Acceleration Library, incl. machine learning); MKL-DNN and clDNN (open-source deep neural network functions for CPU / integrated graphics). DEEP LEARNING GRAPH COMPILER: Intel® nGraph™ Compiler (Alpha), an open-sourced compiler for deep learning model computations optimized for multiple devices from multiple frameworks. † Formerly the Intel® Computer Vision SDK. *Other names and brands may be claimed as the property of others. Developer personas shown above represent the primary user base for each row, but are not mutually exclusive. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
  • 10. Intel® Distribution of OpenVINO™ toolkit: write once, deploy everywhere. Strong Adoption + Rapidly Expanding Capability; Agnostic, Complementary to major frameworks; Cross-platform flexibility; Supports >100 Public Models, incl. 30+ Pretrained Models. DEEP LEARNING / COMPUTER VISION: OpenCV*, OpenCL™, CV Algorithms, Model Optimizer, Inference Engine, CV Library (Kernel & Graphic APIs). Over 20 Customer Products Launched based on Intel® Distribution of OpenVINO™ toolkit; Breadth of vision product portfolio; 12,000+ Developers; High Performance, High Efficiency; Optimized media encode/decode functions. Optimization Notice. An open source version is available at
  • 11. What’s Inside Intel® Distribution of OpenVINO™ toolkit. Intel® Deep Learning Deployment Toolkit: Model Optimizer (Convert & Optimize), IR, Inference Engine (Optimized Inference); IR = Intermediate Representation file; 30+ Pre-trained Models; Samples. Traditional Computer Vision: Optimized Libraries & Code Samples (OpenCV*, OpenVX*); Computer Vision Algorithms; Samples. For Intel® CPU & GPU/Intel® Processor Graphics. Tools & Libraries: Increase Media/Video/Graphics Performance: Intel® Media SDK (Open Source version); OpenCL™ Drivers & Runtimes, for GPU/Intel® Processor Graphics; Optimize Intel® FPGA (Linux* only): FPGA RunTime Environment (from Intel® FPGA SDK for OpenCL™), Bitstreams. Intel® Architecture-Based Platforms Support; Intel® Vision Accelerator Design Products & AI in Production/Developer Kits. OS Support: CentOS* 7.4 (64 bit), Ubuntu* 16.04.3 LTS (64 bit), Microsoft Windows* 10 (64 bit), Yocto Project* version Poky Jethro v2.0.3 (64 bit). An open source version is available at (some deep learning functions support Intel CPU/GPU only). OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
  • 12. Intel® Deep Learning Deployment Toolkit for Deep Learning Inference. Flow: Trained Models (Caffe*, TensorFlow*, MXNet*, Kaldi*, ONNX*), then Model Optimizer (Convert & Optimize), then IR (.data), then Inference Engine Common API (C++ / Python) for optimized cross-platform inference (load, infer), then hardware plugins: CPU Plugin, GPU Plugin, FPGA Plugin, NCS Plugin, GNA Plugin. Extendibility: C++, OpenCL™. IR = Intermediate Representation format. Model Optimizer ▪ What it is: A Python-based tool to import trained models and convert them to Intermediate Representation. ▪ Why important: Optimizes for performance/space with conservative topology transformations; biggest boost is from conversion to data types matching hardware. Inference Engine ▪ What it is: High-level inference API ▪ Why important: Interface is implemented as dynamically loaded plugins for each hardware type. Delivers best performance for each type without requiring users to implement and maintain multiple code pathways. GPU = Intel CPU with integrated graphics processing unit/Intel® Processor Graphics. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
  • 13. Write once, deploy across Intel architecture, leverage common algorithms. Intel® Vision Products: Intel® CPUs (Atom®, Core™, Xeon®); Intel® CPUs w/ Integrated Graphics; Intel® Vision Accelerator Design Products: Intel® Movidius™ VPUs & Intel® FPGAs; Future Accelerators (Keem Bay, etc.). Add to existing Intel® architectures for accelerated DL inference capabilities. 1. Intel® Distribution of OpenVINO™ toolkit: computer vision & deep learning inference tool with common API 2. Portfolio of hardware for computer vision & deep learning inference, device to cloud 3. Ecosystem to cover the breadth of IoT vision systems
  • 15. BigDL: Bringing Deep Learning to Big Data ▪ Open-sourced deep learning library for Apache Spark* ▪ Makes deep learning more accessible to big data users and data scientists ▪ Feature parity with popular DL frameworks like Caffe, Torch, TensorFlow, etc. ▪ Easy customer and developer experience ▪ Run deep learning applications as standard Spark programs ▪ Run on top of existing Spark/Hadoop clusters (no cluster change) ▪ High performance powered by Intel MKL and multi-threaded programming ▪ Efficient scale-out leveraging Spark architecture. Spark stack: Spark Core, SQL, SparkR, Streaming, MLlib, GraphX, ML Pipeline, DataFrame, BigDL. For developers looking to run deep learning on Hadoop/Spark due to familiarity or analytics use
  • 16. Intel® nGraph™ Compiler: open-source compiler enabling flexibility to run models across a variety of frameworks and hardware. nGraph™ (Deep Learning Compiler) sits between frameworks (including future frameworks) and hardware (including GPU and future hardware). *Other names and brands may be claimed as the property of others. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
  • 17. HPC + AI: Optimization techniques. Example: Layer Fusion (continuous software optimizations). Unfused: Input, Convolution 1x1, Convolution 3x3, Convolution 1x1, SUM, ReLU, Output, where each separate primitive incurs its own memory read and memory write. Fused: Input, Convolution 1x1, Convolution 3x3, fused primitive (Convolution 1x1 + SUM + ReLU), Output, so the fused stage needs a single memory read and memory write. Memory ops reduced.
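The fusion idea on slide 17 can be illustrated with a small, pure-Python sketch. It is not the implementation used by any Intel library: the function names, the tiny input and the per-pixel list layout are assumptions chosen for readability. A 1x1 convolution is modelled as a per-pixel weighted sum over channels; the unfused version materializes an intermediate buffer after every step, while the fused version computes convolution, SUM and ReLU in a single pass.

```python
# Illustrative sketch only (hypothetical helper names, toy data).

def conv1x1(pixels, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    return [[sum(w * c for w, c in zip(row, px)) for row in weights] for px in pixels]

def unfused(pixels, weights, residual):
    # Three separate passes; each one writes a full intermediate buffer.
    conv = conv1x1(pixels, weights)
    summed = [[a + b for a, b in zip(p, r)] for p, r in zip(conv, residual)]
    return [[max(0.0, v) for v in p] for p in summed]

def fused(pixels, weights, residual):
    # One pass: convolution, SUM and ReLU are applied value by value, so no
    # intermediate buffer is written out and read back.
    out = []
    for px, res in zip(pixels, residual):
        out.append([
            max(0.0, sum(w * c for w, c in zip(row, px)) + r)
            for row, r in zip(weights, res)
        ])
    return out

pixels = [[1.0, -2.0], [0.5, 3.0]]     # 2 pixels, 2 input channels
weights = [[1.0, 0.5], [-1.0, 2.0]]    # 2 output channels x 2 input channels
residual = [[0.0, 1.0], [-5.0, 0.25]]  # same shape as the convolution output

result = fused(pixels, weights, residual)
print(result)
```

Both versions return the same values; the difference is only in how many times the data travels through memory, which is the saving the slide refers to.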
  • 18. Enhanced matrix multiply performance on Intel® Xeon® Scalable Processor: lower precision integer ops. Integer Matrix Multiply Performance on Intel® Xeon® Platinum 8180 Processor. Configuration Details on Slide: 13. Source: Intel measured as of June 2017. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
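To make "lower precision integer ops" concrete, here is a minimal sketch of a quantized matrix multiply in plain Python. It assumes a simple symmetric scheme with one scale factor per matrix and values clamped to the int8 range [-127, 127]; the slides do not say which quantization scheme was measured, and the helper names are hypothetical.

```python
# Sketch under stated assumptions: symmetric per-matrix quantization to int8.

def matmul(a, b):
    """Plain matrix multiply; works for both float and integer entries."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def scale_for(matrix):
    """One scale factor per matrix; falls back to 1.0 for an all-zero matrix."""
    return (max(abs(v) for row in matrix for v in row) / 127) or 1.0

def quantize(matrix, scale):
    """Map floats to integers in the int8 range [-127, 127]."""
    return [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]

def quantized_matmul(a, b):
    sa, sb = scale_for(a), scale_for(b)
    # Products of int8 values are accumulated as integers (hardware would use
    # a wider accumulator such as int32), then rescaled to floating point.
    acc = matmul(quantize(a, sa), quantize(b, sb))
    return [[v * sa * sb for v in row] for row in acc]

a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0], [-0.5, 4.0]]

exact = matmul(a, b)
approx = quantized_matmul(a, b)
max_error = max(abs(e - q) for er, qr in zip(exact, approx) for e, q in zip(er, qr))
print(exact, approx, max_error)
```

The integer result differs slightly from the floating-point one; that small accuracy loss is the price paid for cheaper, denser arithmetic.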
  • 20. Parallel Programming Applied to AI. HPC techniques applied to AI: Job 0, Job 1, Job 2 and Job 3, each running 12 threads, pinned with libnumactl and kmp_affinity.
  • 21. Intel Artificial Intelligence Centers of Excellence: success stories. "Cognitive Validator of Traffic Violations" (“Validador Cognitivo de Infrações de Trânsito”). ✓ Performance 22.5x faster on Xeon Scalable Processors: "processing of fines that used to take 45 hours can now be done in less than 2 hours." ✓ Development of the mathematical model: "With this, we reached 90% accuracy in the system, in addition to automating the entire project," said Gustavo Rocha, division head at SERPRO. Thiago Oliveira, Superintendent of Infrastructure Engineering at SERPRO.
  • 22. TensorFlow for CPU: applying the "process affinity" (NUMA-aware) technique in TensorFlow. intra_op_parallelism_threads: nodes that can use multiple threads to parallelize their execution will schedule the individual pieces into this pool. inter_op_parallelism_threads: all ready nodes are scheduled in this pool. Code: config = tf.ConfigProto(); config.intra_op_parallelism_threads = 44; config.inter_op_parallelism_threads = 44; tf.Session(config=config). Source:
  • 23. Code example: parallel programming in ML vs. parallel programming (HPC); parallel programming in ML frameworks.
  • 24. Parallel Programming Applied to AI. Understanding the environment: • Dual socket • AVX-512 • 16 cores per socket • 32 threads per socket • Total: 64 threads
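The thread count on this slide follows from simple arithmetic, sketched below. The value of 2 hardware threads per core is inferred from "32 threads per socket" over "16 cores per socket"; `os.cpu_count()` is shown only as a way to compare against whatever machine runs the snippet.

```python
import os

SOCKETS = 2
CORES_PER_SOCKET = 16
THREADS_PER_CORE = 2  # inferred: 32 threads / 16 cores per socket

physical_cores = SOCKETS * CORES_PER_SOCKET        # cores across both sockets
logical_cpus = physical_cores * THREADS_PER_CORE   # hardware threads in total

print("physical cores:", physical_cores)
print("logical CPUs:", logical_cpus)
print("logical CPUs on this machine:", os.cpu_count())
```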
  • 25. Parallel Programming Applied to AI. HPC techniques applied to AI: Job 0, Job 1, Job 2 and Job 3, each running 12 threads, pinned with libnumactl and kmp_affinity.
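One way to read the four-job layout is as four `numactl` invocations, each bound to its own block of 12 CPUs. The sketch below only builds the command strings. The script name `train.py` is hypothetical (the slides do not show it), and the mapping of contiguous CPU ids to sockets is an assumption that depends on how the machine enumerates its CPUs.

```python
def cpu_ranges(num_jobs, cpus_per_job, first_cpu=0):
    """Split CPU ids into contiguous, non-overlapping (low, high) ranges."""
    ranges = []
    for job in range(num_jobs):
        start = first_cpu + job * cpus_per_job
        ranges.append((start, start + cpus_per_job - 1))
    return ranges

def numactl_commands(num_jobs, cpus_per_job, script="train.py"):
    """Build one 'numactl -C' command per job; 'train.py' is a placeholder."""
    return [
        f"numactl -C {low}-{high} python {script}"
        for low, high in cpu_ranges(num_jobs, cpus_per_job)
    ]

for command in numactl_commands(4, 12):
    print(command)
```

Keeping the ranges disjoint is the point of the technique: no two jobs compete for the same CPUs.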
  • 26. Parallel Programming Applied to AI. Demo code: MNIST. Topology: Convolution + ReLU + MaxPool + Convolution + ReLU + MaxPool.
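To make the topology concrete, the sketch below traces tensor shapes through the two Convolution + ReLU + MaxPool stages for a 28x28 grayscale MNIST image. The slide gives no hyperparameters, so the "same" padding, the 2x2 pooling and the filter counts (32 and 64) are assumptions chosen for illustration.

```python
def conv_same(shape, filters):
    """Convolution with 'same' padding (assumed): height and width are kept."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, size=2):
    """Max pooling with stride equal to the window size (assumed 2x2)."""
    height, width, channels = shape
    return (height // size, width // size, channels)

def mnist_shapes(filters1=32, filters2=64):
    """Trace shapes through Conv+ReLU+MaxPool twice; filter counts are assumed."""
    shape = (28, 28, 1)  # MNIST images are 28x28 grayscale
    trace = [("input", shape)]
    for index, filters in enumerate((filters1, filters2), start=1):
        shape = conv_same(shape, filters)  # ReLU does not change the shape
        trace.append((f"conv{index}+relu", shape))
        shape = max_pool(shape)
        trace.append((f"maxpool{index}", shape))
    return trace

for name, shape in mnist_shapes():
    print(name, shape)
```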
  • 27. Parallel Programming Applied to AI. Preparing the environment via Anaconda: "conda create -n tf-pip-2", then "pip install intel-tensorflow".
  • 28. Parallel Programming Applied to AI. "python"
  • 29. Parallel Programming Applied to AI. "numactl -C 0-7 python"
  • 30. Parallel Programming Applied to AI. "numactl -C 0 python"
  • 31. Parallel Programming Applied to AI. numactl -C 0-15,16-31 python • More cores do not mean higher performance • 48 threads matched the time of the 64-thread "default" run (102 s) • Best time with 32 threads (83 s): 1.22x speedup. Chart "NUMACTL" (x-axis: threads, y-axis: seconds), threads → seconds: 4 → 271, 8 → 140, 16 → 112, 32 → 83, 48 → 102, 64 → 105. Baseline ("64 cores, default mode"): 64 threads in 102 seconds.
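The speedup figure can be checked against the chart values with a short sketch: each time is divided into the 102-second default baseline. The computed ratio for 32 threads is about 1.23, which the slide reports as 1.22x.

```python
BASELINE_SECONDS = 102  # 64 threads, "default" mode, from the slide

# threads -> seconds, read from the "NUMACTL" chart on the slide
numactl_seconds = {4: 271, 8: 140, 16: 112, 32: 83, 48: 102, 64: 105}

def speedups(times, baseline):
    """Speedup relative to the baseline: values above 1.0 beat the default run."""
    return {threads: baseline / seconds for threads, seconds in times.items()}

numactl_speedups = speedups(numactl_seconds, BASELINE_SECONDS)
best_threads = min(numactl_seconds, key=numactl_seconds.get)

for threads, ratio in sorted(numactl_speedups.items()):
    print(f"{threads:>2} threads: {ratio:.2f}x")
print("best thread count:", best_threads)
```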
  • 32. Parallel Programming Applied to AI. export KMP_BLOCKTIME=0; numactl -C 0-15,16-31 python • KMP_BLOCKTIME: time, in milliseconds, that a thread waits after finishing its task before going to sleep • 2.68x speedup • Best time with 16 threads • Best cost-benefit with 2 threads. Chart "NUMACTL - KMP_BLOCKTIME=0" (x-axis: threads, y-axis: seconds), threads → seconds. Series KMP_BLOCKTIME=0: 1 → 67, 2 → 43, 4 → 41, 8 → 40, 16 → 39, 32 → 41, 48 → 50, 64 → 46. Series KMP_BLOCKTIME=default: 4 → 271, 8 → 140, 16 → 112, 32 → 83, 48 → 102, 64 → 105. Baseline: "64 cores, default mode".
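The "best cost-benefit with 2 threads" remark can be read off the KMP_BLOCKTIME=0 series by looking at how many seconds each increase in thread count saves. The sketch below uses only the chart values from the slide; the helper name is illustrative.

```python
# threads -> seconds with KMP_BLOCKTIME=0, read from the chart on the slide
blocktime0_seconds = {1: 67, 2: 43, 4: 41, 8: 40, 16: 39, 32: 41, 48: 50, 64: 46}

def seconds_saved_per_step(times):
    """Seconds saved when moving from one measured thread count to the next."""
    counts = sorted(times)
    return {(a, b): times[a] - times[b] for a, b in zip(counts, counts[1:])}

saved = seconds_saved_per_step(blocktime0_seconds)
fastest = min(blocktime0_seconds, key=blocktime0_seconds.get)
biggest_step = max(saved, key=saved.get)

for step, seconds in saved.items():
    print(step, seconds)
print("fastest thread count:", fastest)
print("largest single gain:", biggest_step)
```

Going from 1 to 2 threads saves far more time than any later step, and beyond 16 threads the time gets worse, which is consistent with the two bullets on the slide.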
  • 33. Parallel Programming Applied to AI. export KMP_BLOCKTIME=0; export KMP_AFFINITY=granularity=fine,verbose,compact,1,0; numactl -C 0-15 python • 16 threads: 4.86x speedup • Lower infrastructure cost • More training jobs at the same time • Larger models • No code changes. Chart (x-axis: threads, y-axis: seconds) comparing NUMACTL, NUMACTL + KMP_BLOCKTIME=0 and NUMACTL + KMP_BLOCKTIME=0 + AFFINITY against the "64 cores, default mode" baseline.
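The same settings can be prepared from Python when a launcher script is more convenient than shell exports. This is a sketch under assumptions: `train.py` is a hypothetical script name (the slide omits it), and the command is only built and printed, since running it requires `numactl` and the script to be present.

```python
import os

def build_launch(cpus="0-15", script="train.py"):
    """Return (command, environment) mirroring the exports on the slide.

    'train.py' is a placeholder; the slide does not name the script.
    """
    env = dict(os.environ)  # copy, so the current process is left untouched
    env["KMP_BLOCKTIME"] = "0"
    env["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
    command = ["numactl", "-C", cpus, "python", script]
    return command, env

command, env = build_launch()
print(" ".join(command))
print("KMP_BLOCKTIME =", env["KMP_BLOCKTIME"])
print("KMP_AFFINITY =", env["KMP_AFFINITY"])

# To actually launch (needs numactl and the script on the machine):
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Because everything is passed through the environment and the launcher, the training script itself stays unchanged, in line with the "no code changes" bullet.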
  • 34. Parallel Programming Applied to AI. KMP_AFFINITY=granularity=fine,verbose,compact,1,0 • Controls how threads are distributed across cores and sockets • Affects bandwidth ("memory speed") • Compact: threads are placed close to each other; data exchange between them is faster; data fits in cache; little data traffic between CPU and DRAM.
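The idea behind "compact" placement can be shown with a simplified model that contrasts it with a "scatter" style placement. This is not the OpenMP runtime's actual algorithm: it ignores hyper-threads and the permute and offset values (the trailing 1,0), and the 2-socket, 16-core layout is taken from the environment slide.

```python
def place(num_threads, sockets=2, cores_per_socket=16, policy="compact"):
    """Simplified model: map thread ids to (socket, core) pairs.

    compact: fill one socket before using the next.
    scatter: spread threads round-robin across sockets.
    """
    if num_threads > sockets * cores_per_socket:
        raise ValueError("more threads than cores in this simplified model")
    placement = []
    for thread in range(num_threads):
        if policy == "compact":
            socket, core = divmod(thread, cores_per_socket)
        else:
            socket, core = thread % sockets, thread // sockets
        placement.append((socket, core))
    return placement

def sockets_used(placement):
    """Number of distinct sockets touched by a placement."""
    return len({socket for socket, _ in placement})

print("compact:", sockets_used(place(16, policy="compact")), "socket(s)")
print("scatter:", sockets_used(place(16, policy="scatter")), "socket(s)")
```

In this model 16 compact threads stay on one socket, so they can share that socket's cache, while scattered threads span both sockets.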
  • 37. What’s New in Intel® Distribution of OpenVINO™ toolkit 2018 R5 ▪ Extends neural network support to include LSTM (long short-term memory) from ONNX*, TensorFlow* & MXNet* frameworks, & 3D convolutional-based networks in preview mode (CPU-only) for non-vision use cases. ▪ Introduces Neural Network Builder API (preview), providing flexibility to create a graph from simple API calls and directly deploy via the Inference Engine. ▪ Improves performance: delivers significant CPU performance boost on multicore systems through new parallelization techniques via streams. Optimizes performance on Intel® Xeon®, Core™ & Atom processors through INT8-based primitives for Intel® Advanced Vector Extensions (Intel® AVX-512), Intel® AVX2 & SSE4.2. ▪ Supports Raspberry Pi* hardware as a host for the Intel® Neural Compute Stick 2 (preview). Offload your deep learning workloads to this low-cost, low-power USB. ▪ Adds 3 new optimized pretrained models (for a total of 30+): text detection of indoor/outdoor scenes, and 2 single-image super resolution networks that enhance image resolution by a factor of 3 or 4. See product site & release notes for more details about 2018 R4. OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
  • 38. Notices and Disclaimers. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at []. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. The cost reduction scenarios described are intended to enable you to get a better understanding of how the purchase of a given Intel based product, combined with a number of situation-specific variables, might affect future costs and savings. Circumstances will vary and there may be unaccounted-for costs related to the use and deployment of a given product. Nothing in this document should be interpreted as either a promise of or contract for a given level of costs or cost reduction. Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. 
For more complete information about performance and benchmark results, visit Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804 Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process. INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. © 2018 Intel Corporation. Intel, the Intel logo, Intel Optane and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
  • 39. BigDL Configuration Details. Benchmark segment: AI/ML/DL. Benchmark type: Training. Benchmark metric: Training throughput (images/sec). Framework: BigDL master trunk with Spark 2.1.1. Topology: Inception V1, VGG, ResNet-50, ResNet-152. Number of nodes: 8, 16 (multiple configurations). Platform: Purley. Sockets: 2S. Processor: Intel® Xeon® Scalable Platinum 8180 Processor (Skylake): 28-core @ 2.5 GHz (base), 3.8 GHz (max turbo), 205W; Intel® Xeon® Processor E5-2699v4 (Broadwell): 22-core @ 2.2 GHz (base), 3.6 GHz (max turbo), 145W. Enabled cores: Skylake: 56 per node; Broadwell: 44 per node. Total memory: Skylake: 384 GB; Broadwell: 256 GB. Memory configuration: Skylake: 12 slots * 32 GB @ 2666 MHz Micron DDR4 RDIMMs; Broadwell: 8 slots * 32 GB @ 2400 MHz Kingston DDR4 RDIMMs. Storage: Skylake: Intel® SSD DC P3520 Series (2TB, 2.5in PCIe 3.0 x4, 3D1, MLC); Broadwell: 8 * 3 TB Seagate HDDs. Network: 1 * 10 GbE network per node. OS: CentOS Linux release 7.3.1611 (Core), Linux kernel 4.7.2.el7.x86_64. HT: On. Turbo: On. Computer type: Dual-socket server. Framework version and topology version: not listed. Dataset, version: ImageNet, 2012; Cifar-10. Performance command (Inception v1): spark-submit --class --master spark://$master_hostname:7077 --executor-cores=36 --num-executors=16 --total-executor-cores=576 --driver-memory=60g --executor-memory=300g $BIGDL_HOME/dist/lib/bigdl-*-SNAPSHOT-jar-with-dependencies.jar --batchSize 2304 --learningRate 0.0896 -f hdfs:///user/root/sequence/ --checkpoint $check_point_folder. Data setup: data was stored on HDFS and cached in memory before training. Java: JDK 1.8.0 update 144. MKL library version: Intel MKL 2017.
  • 40. Spark Configuration Details. Configurations: 4.3X for Spark MLlib through Intel Math Kernel Library (MKL) ▪ Spark-Perf (same for before and after): 9 nodes each with Intel® Xeon® processor E5-2697A v4 @ 2.60GHz * 2 (16 cores, 32 threads); 256 GB; 10x SSDs; 10Gbps NIC. 19x for HDFS Erasure Coding in micro workload (RawErasureCoderBenchmark) and 1.25x in Terasort, plus 50+% storage capacity saving and higher failure tolerance level ▪ RawErasureCoderBenchmark (same for before and after): single node with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz * 2 (22 cores, 44 threads); 256GB; 8x HDDs; 10Gbps NIC ▪ Terasort (same for before and after): 10 nodes each with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz * 2 (22 cores, 44 threads); 256GB; 8x HDDs; 10Gbps NIC. 5.6x for HBase off-heap read in micro workload (PE) and 1.3x in real Alibaba production workload ▪ PE (same for before and after): Intel® Xeon® Processor X5670 @ 2.93GHz * 2 (6 cores, 12 threads); RAM: 150 GB; 1Gbps NIC ▪ Alibaba (same for before and after): 400-node cluster with Intel® Xeon® processors. 1.22x Spark Shuffle File Encryption performance for TeraSort and 1.28x for BigBench ▪ Terasort (same for before and after): single node with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz * 2 (18 cores, 36 threads); 128GB; 4x SSD; 10Gbps NIC ▪ BigBench (same for before and after): 6 nodes each with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz * 2 (18 cores, 36 threads); 256GB; 1x SSD; 8x SATA HDD 3TB; 10Gbps NIC. 1.35X Spark Shuffle RPC encryption performance for TeraSort and 1.18x for BigBench ▪ Terasort (same for before and after): 3 nodes each with Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz * 2 (18 cores, 36 threads); 128GB; 4x SSD; 10Gbps NIC ▪ BigBench (same for before and after): 5 nodes. 1x head node: Intel® Xeon® Processor E5-2699 v3 @ 2.30GHz * 2 (18 cores, 36 threads); 384GB; 1x SSD; 8x SATA HDD 3TB; 10Gbps NIC. 4x worker nodes: each with Intel® Xeon® processor E5-2699 v4 @ 2.20GHz * 2 (22 cores, 44 threads); 384GB; 1x SSD; 8x SATA HDD 3TB; 10Gbps NIC. 10X scalability for Word2Vec ▪ E5-2630v2 * 2, 128 GB Memory, 12x HDDs; 1000Mb NIC (14 nodes). 70X scalability for LDA (Latent Dirichlet Allocation) ▪ Intel Xeon E5-2630v2 * 2, 288GB Memory, SAS Raid5, 10Gb NIC. Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
  • 41. Spark SQL Configurations (AEP vs. DRAM). Hardware: DRAM: 192GB (12x 16GB DDR4) for AEP; 768GB (24x 32GB DDR4) for DRAM. Apache Pass: 1TB (ES2: 8 x 128GB) for AEP; N/A for DRAM. AEP mode: App Direct (Memkind) for AEP; N/A for DRAM. SSD: N/A for both. CPU: Worker: Intel® Xeon® Platinum 8170 @ 2.10GHz (Thread(s) per core: 2, Core(s) per socket: 26, Socket(s): 2, CPU max MHz: 3700.0000, CPU min MHz: 1000.0000, L1d cache: 32K, L1i cache: 32K, L2 cache: 1024K, L3 cache: 36608K). OS: 4.16.6-202.fc27.x86_64 (BKC: WW26, BIOS: SE5C620.86B.01.00.0918.062020181644). Software: OAP: 1TB AEP-based OAP cache for AEP; 620GB DRAM-based OAP cache for DRAM. Hadoop: 8 * HDD disk (ST1000NX0313, 1-replica uncompressed & plain encoded data on Hadoop). Spark: 1 * Driver (5GB) + 2 * Executor (62 cores, 74GB), spark.sql.oap.rowgroup.size=1MB. JDK: Oracle JDK 1.8.0_161. Workload: Data scale: 2.6TB (9 queries related data is of 729.4GB in capacity). TPC-DS queries: 9 I/O intensive queries (Q19, Q42, Q43, Q52, Q55, Q63, Q68, Q73, Q98). Multi-tenants: 9 threads (fair scheduled).
  • 42. Apache Cassandra Configurations (NVMe vs. Apache Pass). Server hardware: System details: Intel® Server Board Purley Platform (2 socket). CPU: Dual Intel® Xeon® Platinum 8180 Processors, 28 core/socket, 2 sockets, 2 threads per core. Hyper-Threading: Enabled. DRAM: DDR4 dual rank 192GB total = 12 DIMMs 16GB@2667Mhz (NVMe configuration); DDR4 dual rank 384GB total = 12 DIMMs 32GB@2667Mhz (Apache Pass configuration). Apache Pass: N/A (NVMe configuration); AEP ES.2 1.5TB total = 12 DIMMs * 128GB capacity each: Single Rank, 128GB, 15W (Apache Pass configuration). Apache Pass mode: N/A (NVMe configuration); App-Direct (Apache Pass configuration). NVMe: 4 x Intel P3500 1.6TB NVMe devices (NVMe configuration); N/A (Apache Pass configuration). Network: 10Gbit on board Intel NIC. Software: OS: Fedora 27. Kernel: 4.16.6-202.fc27.x86_64. Cassandra version: 3.11.2 release (NVMe configuration); Cassandra 4.0 trunk, with App Direct patch version 2.1, software found at with PCJ library: (Apache Pass configuration). JDK: Oracle Hotspot JDK (JDK1.8 u131). Spectre/Meltdown compliant: patched for variants 1/2/3. Cassandra parameters: Number of Cassandra instances: 1 (NVMe configuration); 14 (Apache Pass configuration). Cluster nodes: one per cluster. Garbage collector: CMS (NVMe configuration); Parallel (Apache Pass configuration). JVM options (difference from default): -Xms64G -Xmx64G (NVMe configuration); -Xms20G -Xmx20G -Xmn8G -XX:+UseAdaptiveSizePolicy -XX:ParallelGCThreads=5 (Apache Pass configuration). Schema: cqlstress-insanity-example.yaml. Database size per instance: 1.25 billion entries (NVMe configuration); 100 K entries (Apache Pass configuration). Client(s) hardware: Number of client machines: 1 (NVMe configuration); 2 (Apache Pass configuration). System: Intel® Server Board model S2600WFT (2 socket). CPU: Dual Intel® Xeon® Platinum 8176M CPU @ 2.1Ghz, 28 core/socket, 2 sockets, 2 threads per core. DRAM: DDR4 384GB total = 12 DIMMs 32GB@2666Mhz. Network: 10Gbit on board Intel NIC. Software: OS: Fedora 27. Kernel: 4.16.6-202.fc27.x86_64. JDK: Oracle Hotspot JDK (JDK1.8 u131). Workload: Benchmark: Cassandra-Stress. Cassandra-Stress instances: 1 (NVMe configuration); 14 (Apache Pass configuration). Command line to write database (NVMe configuration): cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(insert=1) n=1250000000 cl=ONE no-warmup -pop seq=1..1250000000 -mode native cql3 -node <ip_addr> -rate threads=10. Command line to write database (Apache Pass configuration): cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(insert=1) n=100000 cl=ONE no-warmup -pop seq=1..100000 -mode native cql3 -node <ip_addr> -rate threads=10. Command line to read database (NVMe configuration): cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(simple1=1) duration=10m cl=ONE no-warmup -pop dist=UNIFORM(1..1250000000) -mode native cql3 -node <ip_addr> -rate threads=300. Command line to read database (Apache Pass configuration): cassandra-stress user profile=/root/cassandra_4.0/tools/cqlstress-insanity-example.yaml ops(simple1=1) duration=3m cl=ONE no-warmup -pop dist=UNIFORM(1..100000) -mode native cql3 -node <ip_addr> -rate threads=320.