Linaro/UDS plenary
Orlando, 03-Nov-2011
David Brash
ARM Technology Update
Agenda
• ARMv7-A update
• Cortex-A7 announcement
• Energy-efficient processing
• big.LITTLE: Cortex-A15 & Cortex-A7
• Eco-system development
• The architecture roadmap: ARMv7 => ARMv8
• ARMv8-A announcement at TechCon 2011
ARM Cortex-A15 Momentum
• Expanding list of ARM Partners with designs in progress
• …and 5 other ARM partners
• Products expected in 2012
Introducing the Cortex-A7
• A highly efficient core for future smartphones
• Handles entry-level and some mainstream workloads, and more
• Redefines mobile computing
• big.LITTLE processing model
[Figure: power vs. performance for Cortex-A15 and Cortex-A7, marking the lowest and highest operating point of each core and an overdrive condition]
Cortex-A7 is ~1/6th the power of Cortex-A15, but half the performance, at the nominal operating point.
• Full backward compatibility with Cortex-A processors
• Feature set and software compliant with Cortex-A15
  - Virtualization
  - Large Physical Address Extensions
• Scalable and extensible
  - Multi-processor
  - System coherency
• Small
  - <0.5 mm² in 28 nm process
ARM Cortex-A7: RTL available now
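A rough worked reading of the "~1/6th the power, half the performance" figure follows. It assumes that energy for a fixed task is power multiplied by run time, and that run time scales inversely with performance. This simplification ignores memory-bound behaviour, idle power and differing operating points:

```latex
E = P \cdot t, \qquad t \propto \frac{1}{\mathrm{perf}}

\frac{E_{\mathrm{A7}}}{E_{\mathrm{A15}}}
  = \frac{P_{\mathrm{A7}}}{P_{\mathrm{A15}}} \cdot \frac{\mathrm{perf}_{\mathrm{A15}}}{\mathrm{perf}_{\mathrm{A7}}}
  \approx \frac{1}{6} \cdot \frac{2}{1} = \frac{1}{3}
```

Under these assumptions, the Cortex-A7 would take about twice as long for the same task while using roughly a third of the energy.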
Cortex-A15/7 big.LITTLE Processing
[Figure: Cortex-A15 MPCore cluster and Cortex-A7 MPCore cluster, each with CPUs and an L2 cache, connected through the CCI-400 coherent interconnect, with shared interrupt control]
• Uses the right processor for the right job
• Up to 70% energy savings on common workloads
• Flexible and transparent to apps, which makes seamless software handover important
• big: “Demanding tasks”
• LITTLE: “Always on, always connected tasks”
Performance and Energy-Efficiency
Cortex-A7 (LITTLE): most energy-efficient applications processor from ARM
• Simple, in-order, 8-stage pipeline
• Performance better than today’s mainstream, high-volume smartphones
Cortex-A15 (big): highest performance in mobile power envelope
• Complex, out-of-order, multi-issue pipeline
• Up to 5x the performance of today’s mainstream, high-volume smartphones
[Figure: pipeline diagrams for Cortex-A7 and Cortex-A15; stage labels include Queue, Issue and Integer]
big.LITTLE Cluster Migration Mechanics
[Figure: migration timeline for the outbound processor(s) in Cluster B and the inbound processor(s) in Cluster A]
Outbound processor(s), Cluster B:
• Normal operation
• Migration stimulus received: from the OS/virtualizer via the system firmware interface
• Save state
• Snooping allowed
• Clean cache
• Disable snooping
• Power down
• Outbound processor OFF
Inbound processor(s), Cluster A:
• Power on & reset
• Cache invalidate
• Enable snooping
• Ready for migration
• Switch state (snoop outbound processor)
• Restore state
• Normal operation
Timing annotations: less than 100 cycles; less than 20 microseconds.
The “critical period” is the interval where no work is being done on either cluster; its cycle count is OS dependent.
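The sketch below models the migration as one ordered timeline and extracts the critical period from it. It is hypothetical. The per-cluster step order follows the slide. The interleaving of the two clusters is an assumption and does not come from ARM documentation: the inbound cluster powers up while the outbound cluster keeps working. The critical period is taken to run from the outbound "Save state" step until the inbound cluster resumes normal operation.

```python
# Hypothetical model of big.LITTLE cluster migration. The interleaving of
# outbound and inbound steps is an assumption made for illustration.
OUT, IN = "outbound", "inbound"

TIMELINE = [
    (OUT, "Normal operation"),
    (OUT, "Migration stimulus received"),
    (IN, "Power on & reset"),      # assumed to overlap outbound normal work
    (IN, "Cache invalidate"),
    (IN, "Enable snooping"),
    (IN, "Ready for migration"),
    (OUT, "Save state"),
    (OUT, "Snooping allowed"),
    (IN, "Switch state"),          # inbound snoops the outbound processor
    (IN, "Restore state"),
    (IN, "Normal operation"),
    (OUT, "Clean cache"),
    (OUT, "Disable snooping"),
    (OUT, "Power down"),
]


def critical_period(timeline):
    """Return the steps during which neither cluster does useful work.

    Assumed to start when the outbound cluster begins saving state and to
    end just before the inbound cluster resumes normal operation.
    """
    start = timeline.index((OUT, "Save state"))
    end = timeline.index((IN, "Normal operation"))
    return timeline[start:end]


for cluster, step in critical_period(TIMELINE):
    print(f"{cluster:>8}: {step}")
```

In this model, bringing the inbound cluster up before the outbound cluster stops keeps the critical period down to the state save, switch and restore steps.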
Leading Software Ecosystem
• Broad support for Cortex-A processors
• 100,000s of apps already optimized
• Increasing ARM focus on the platform
• 1TB of physical address space (Cortex-A7/A15 systems) meets a wide spectrum of developer needs
• A vehicle for software development and sharing
• Linaro is key to deploying Linux and other open-source software and tools
[Figure: software stack: Virtualization and Firmware; OS; Power Management Software; Applications and Middleware]
Many ARMv7-A software developments logically extend into ARMv8-A.
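The 1TB figure comes from the 40-bit physical addresses that the Large Physical Address Extension provides on Cortex-A7/A15; the "1TB" is a binary terabyte, 2^40 bytes. The small sketch below shows the arithmetic, and its constant and function names are illustrative only:

```python
# Physical address widths: 40 bits with the Large Physical Address Extension,
# 32 bits for a classic ARMv7-A system without it.
PA_BITS_LPAE = 40
PA_BITS_CLASSIC = 32

TIB = 1 << 40
GIB = 1 << 30


def address_space_bytes(bits):
    """Number of byte addresses reachable with `bits` physical address bits."""
    return 1 << bits


print(address_space_bytes(PA_BITS_LPAE))                    # 1099511627776
print(address_space_bytes(PA_BITS_LPAE) // TIB, "TiB")      # 1 TiB
print(address_space_bytes(PA_BITS_CLASSIC) // GIB, "GiB")   # 4 GiB
```

The eight extra address bits multiply the reachable physical memory by 256 compared with a 32-bit system.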
• Focus for ARM system and software development
  - Cortex-A15 cluster
  - Cortex-A7 cluster
  - Mali graphics support + memory, IO, debug etc.
• Increasing use of “models-first”: processor, memory & IO
[Figure: Cortex-A15/A7/Mali platform, 2012 compute subsystem: Cortex-A15 cluster (CPU 0–3, L2 cache); Cortex-A7 cluster (CPU 0–3, L2 cache); Mali T600 series GPU (shader cores 0–3); Cache Coherent Interconnect (CCI-400); DMC-400 LPDDR2/DDR3 controller with DDR PHY or DDR memory; NIC-400 interconnects; SMMU; an additional L2 cache; CoreSight debug & trace (JTAG & trace); system power (PMIC/APB bus); resources management; on-chip memories (RAM, ROM); base peripherals; AMBA extensions interfaces (slave and master)]
ARMv8-A (announced 27-Oct-2011)
What is ARMv8?
• Next version of the ARM architecture
• First release covers the Applications profile only: ARMv8-A
  - Addition of a 64-bit operating capability
  - Introduction of a new 64-bit execution state: AArch64
  - Maintain low-power heritage: assess each feature against its PPA* impact
  - ARMv7-A compatibility a critical consideration: AArch32
  - Interprocessing: a defined relationship between 32- and 64-bit execution
• Maintain ARMv7-A (AArch32) momentum alongside AArch64
  - Strong compatibility plus ongoing evolution
*PPA: Power, Performance, Area
ARMv8-A – Context
• ARMv8
• A-profile only
(at this time)
• 64-bit architecture
support
AArch64 - registers
General-purpose registers: X0–X30, 31 × 64-bit registers. X30* also serves as the procedure link register (LR).
Per exception level:
                         EL0      EL1       EL2       EL3
Stack pointer            SP_EL0   SP_EL1    SP_EL2    SP_EL3
Exception link register  -        ELR_EL1   ELR_EL2   ELR_EL3
Saved process status     -        SPSR_EL1  SPSR_EL2  SPSR_EL3
Also shown: the PC and the current process status (CPSR). In AArch64 the PC is not a general-purpose register.
SIMD/FP registers: V0–V31, 32 × 128-bit registers, used for 32-bit single-precision and 64-bit double-precision scalar FP and for 128-bit vectors.
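As a way to make the register table concrete, the sketch below holds the registers listed above in a hypothetical container class. The class and method names are illustrative, and the model does not reproduce the full architectural state. It relies on two architectural facts. Writes to an X register wrap to 64 bits. The 32-bit and 64-bit scalar FP views of Vn are the low bits of that 128-bit register.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


@dataclass
class AArch64Context:
    """Hypothetical container for the AArch64 registers shown on the slide."""
    x: list = field(default_factory=lambda: [0] * 31)       # X0..X30, 64-bit
    v: list = field(default_factory=lambda: [0] * 32)       # V0..V31, 128-bit
    sp_el: list = field(default_factory=lambda: [0] * 4)    # SP_EL0..SP_EL3
    elr_el: dict = field(default_factory=lambda: {1: 0, 2: 0, 3: 0})   # ELR_EL1..3
    spsr_el: dict = field(default_factory=lambda: {1: 0, 2: 0, 3: 0})  # SPSR_EL1..3

    def write_x(self, n, value):
        # Registers hold 64 bits; wrap values (including negatives) to 64 bits.
        self.x[n] = value & MASK64

    @property
    def lr(self):
        # X30 doubles as the procedure link register.
        return self.x[30]

    def write_v(self, n, value):
        self.v[n] = value & MASK128

    def fp_scalar_bits(self, n, width):
        """Raw bits of the scalar view of Vn: 32 (single) or 64 (double precision)."""
        if width not in (32, 64):
            raise ValueError("scalar FP width must be 32 or 64")
        return self.v[n] & ((1 << width) - 1)


ctx = AArch64Context()
ctx.write_x(30, 0x4000_1000)
ctx.write_x(0, -1)
ctx.write_v(1, (0xAAAA << 64) | 0x1234_5678_9ABC_DEF0)
print(hex(ctx.lr))                      # 0x40001000
print(hex(ctx.x[0]))                    # 0xffffffffffffffff
print(hex(ctx.fp_scalar_bits(1, 32)))   # 0x9abcdef0
```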
Exception model overview
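[Figure: exception levels for AArch64 and AArch32 in the Non-secure and Secure worlds]
• EL0: applications. AArch64 EL0; AArch32 User mode.
• EL1: OS. AArch64 EL1t/EL1h; AArch32 Svc, Abt, Und, FIQ, IRQ and Sys modes.
• EL2: hypervisor, Non-secure only. AArch64 EL2t/EL2h; AArch32 Hyp mode.
• EL3: secure monitor. AArch64 EL3t/EL3h.
  - If EL3 is 64-bit, the Secure world has its own EL1 (Svc, Abt, Und, FIQ, IRQ, Sys in AArch32) and EL0 (User).
  - If EL3 is 32-bit, the Secure Svc, Abt, Und, FIQ, IRQ and Sys modes run at EL3 alongside Mon, for ARMv7-A compatibility.
• ‘h’andler and ‘t’hread stack options: ELnh uses the dedicated SP_ELn, ELnt uses SP_EL0.
Interprocessing:
• EL3: Secure Monitor => EL2: Hypervisor => EL1: OS => EL0: Application
• An AArch64 → AArch32 transition can occur only when moving down the hierarchy (EL3 → EL0)
• An AArch32 → AArch64 transition can occur only when moving up the hierarchy (EL0 → EL3)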
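The two interprocessing rules above imply that once an exception level runs AArch32, every level below it must also be AArch32. The handler/thread option determines which stack pointer is in use. The helpers below are a minimal sketch of both rules, and their names are hypothetical:

```python
def valid_execution_states(states):
    """Check a per-EL assignment of execution states.

    `states` maps an implemented exception level (0-3) to "AArch64" or
    "AArch32". Because AArch64 -> AArch32 only happens moving down the
    hierarchy, every EL below an AArch32 EL must also be AArch32.
    """
    seen_aarch32 = False
    for el in sorted(states, reverse=True):  # walk from EL3 down to EL0
        if states[el] == "AArch32":
            seen_aarch32 = True
        elif seen_aarch32:
            return False  # an AArch64 level below an AArch32 level
    return True


def stack_pointer(el, handler):
    """Stack pointer in use at `el`: ELnh uses SP_ELn, ELnt (and EL0) use SP_EL0."""
    if el == 0 or not handler:
        return "SP_EL0"
    return f"SP_EL{el}"


print(valid_execution_states({3: "AArch64", 2: "AArch64", 1: "AArch64", 0: "AArch32"}))  # True
print(valid_execution_states({3: "AArch64", 1: "AArch32", 0: "AArch64"}))  # False
print(stack_pointer(1, handler=True))    # SP_EL1
print(stack_pointer(2, handler=False))   # SP_EL0
```

A 64-bit hypervisor can therefore host a 32-bit guest OS with 32-bit applications, but a 32-bit OS cannot run 64-bit applications.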
Interprocessing & AArch32 save/restore
[Figure: AArch32 register banks per mode: R0–R12, R13 (SP) and R14 (LR) across User, svc, irq, und, fiq, abt, hyp and mon modes, with banked SP/LR per mode and R8–R12 additionally banked for fiq]
AArch32 general-purpose registers as mapped onto AArch64 X registers:
X0–X7 ↔ R0–R7
X8–X14 ↔ R8_usr–R14_usr (R13_usr = SP_usr, R14_usr = LR_usr)
X15 ↔ R13_hyp (SP_hyp)
X16 ↔ R14_irq, X17 ↔ R13_irq
X18 ↔ R14_svc, X19 ↔ R13_svc
X20 ↔ R14_abt, X21 ↔ R13_abt
X22 ↔ R14_und, X23 ↔ R13_und
X24–X28 ↔ R8_fiq–R12_fiq
X29 ↔ R13_fiq, X30 ↔ R14_fiq
SP_mon and LR_mon (Monitor mode) are not mapped.
Special-purpose registers:
AArch32: PC, A/CPSR, SPSR_svc, SPSR_abt, SPSR_und, SPSR_irq, SPSR_fiq, SPSR_hyp, SPSR_mon, ELR_hyp
AArch64: PC, PSTATE, SP_EL0, SP_EL1-3, ELR_EL1-3, SPSR_EL1-3
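The X-register mapping above lets a 64-bit exception level save and restore a complete AArch32 register context using only X0–X30. The sketch below builds that mapping and round-trips a few values. The function names are hypothetical. Missing registers default to 0, which is a simplification, and Monitor-mode registers are left out because they have no X-register slot.

```python
def build_aarch32_view():
    """Map AArch64 X register number -> AArch32 register name, per the table above."""
    view = {n: f"R{n}" for n in range(8)}                    # X0-X7  -> R0-R7
    view.update({n: f"R{n}_usr" for n in range(8, 15)})      # X8-X14 -> R8_usr-R14_usr
    view[15] = "R13_hyp"                                      # X15    -> SP_hyp
    for i, mode in enumerate(["irq", "svc", "abt", "und"]):
        view[16 + 2 * i] = f"R14_{mode}"                      # LR_<mode>
        view[17 + 2 * i] = f"R13_{mode}"                      # SP_<mode>
    view.update({24 + i: f"R{8 + i}_fiq" for i in range(5)})  # X24-X28 -> R8_fiq-R12_fiq
    view[29] = "R13_fiq"
    view[30] = "R14_fiq"
    return view


AARCH32_VIEW = build_aarch32_view()


def save_to_x(aarch32_regs):
    """Pack named AArch32 register values into a list indexed like X0-X30."""
    return [aarch32_regs.get(AARCH32_VIEW[n], 0) for n in range(31)]


def restore_from_x(x_regs):
    """Unpack X0-X30 back into named AArch32 registers."""
    return {AARCH32_VIEW[n]: value for n, value in enumerate(x_regs)}


regs = {"R0": 1, "R13_svc": 0x8000, "R14_fiq": 7}
x = save_to_x(regs)
print(x[19], x[30])   # 32768 7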
Summary
• Cortex-A7: a highly efficient applications processor
• Cortex-A7 enables big.LITTLE processing to expand performance and battery life
  - Seamless and transparent to application software
• ARM is increasing its platform software investments
  - A catalyst for many activities
• The ARM architecture roadmap is now clearer
  - ARMv8-A architecture development is well advanced (specification release expected 2H-2012)
