Winner of the Best Embedded Paper at DAC 2024.
Today’s embedded system application architects face the challenge of mapping applications onto increasingly diverse compute resources, including CPUs, AI Engines (AIEs), and FPGA accelerators. The architect must manage this mapping while also considering details such as data movement, memory structures, and data types. The result is a complex trade-space analysis of how best to map an application to a heterogeneous target such as the Versal FPGA SoC.
This technical talk outlines existing system architect workflows and shows a gap in today’s SoC tools in supporting the system architect’s evaluation of the application mapping trade space. The proposed “application explorer” tool supports the system architect in the early analysis of application mapping to the major compute resource types, based on a system-level stochastic model simulation. This model-based systems engineering tool lets the system architect iterate over different design mappings and ultimately provide downstream implementation teams with a definition of the scope of their functionality. The talk then presents a prototype of the concept, implemented as an extension to the Mirabilis VisualSim Architect tool, for a signal processing algorithm that targets a Versal FPGA SoC.
Author 1: Wesley Skeffington, AMD, Albany, NY, USA
Author 2: Surya Chongala, AMD, Hyderabad, India
Author 3: Deepak Shankar, Mirabilis Design Inc, Chennai, TN, India
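The kind of early mapping exploration described in the abstract can be illustrated with a very small stochastic sketch. This is not the application explorer tool or any VisualSim API; the kernel names, latency figures, jitter model and transfer cost below are all invented assumptions, chosen only to show how candidate mappings might be ranked by simulated latency.

```python
import itertools
import random
import statistics

# Hypothetical kernels with assumed mean latencies (ms) per resource type.
# Every number here is invented for illustration.
MEAN_LATENCY_MS = {
    "fir_filter": {"CPU": 4.0, "AIE": 0.8, "FPGA": 0.5},
    "fft":        {"CPU": 6.0, "AIE": 1.0, "FPGA": 1.2},
    "detector":   {"CPU": 2.0, "AIE": 1.5, "FPGA": 0.9},
}
RESOURCES = ["CPU", "AIE", "FPGA"]
# Assumed fixed data-movement cost (ms) when consecutive kernels
# run on different resource types.
TRANSFER_MS = 0.3


def sample_latency(mapping, rng):
    """Draw one end-to-end latency sample for a kernel-to-resource mapping."""
    total = 0.0
    previous = None
    for kernel, resource in mapping.items():
        mean = MEAN_LATENCY_MS[kernel][resource]
        total += rng.gauss(mean, 0.1 * mean)  # assumed 10% jitter
        if previous is not None and previous != resource:
            total += TRANSFER_MS
        previous = resource
    return total


def explore(samples=2000, seed=0):
    """Return (mean latency, mapping) pairs for all mappings, best first."""
    rng = random.Random(seed)
    kernels = list(MEAN_LATENCY_MS)
    results = []
    for choice in itertools.product(RESOURCES, repeat=len(kernels)):
        mapping = dict(zip(kernels, choice))
        draws = [sample_latency(mapping, rng) for _ in range(samples)]
        results.append((statistics.mean(draws), mapping))
    results.sort(key=lambda item: item[0])
    return results


if __name__ == "__main__":
    for mean_ms, mapping in explore()[:3]:
        print(f"{mean_ms:6.2f} ms  {mapping}")
```

A real trade-space study would also model memory structures, data types and contention, which this sketch leaves out.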
2. Mirabilis Design
EDA software company based in Silicon Valley
Integrating sub-system teams to the mission using System-Level Design
Highly experienced management and engineering team
Over 150 man-years of background in semiconductors, automotive and aerospace
VisualSim Architect – Design the Right Product
Graphical modeling and simulation platform with a complete set of system-level modeling IP
Eliminate all surprises prior to integration
Optimizing specification, collaboration between mission, sub-systems and suppliers, evaluating use-cases and identifying test scenarios for system validation
Company timeline:
2008: Company incorporated
2011: First engagement with HP and ISRO
2013: Announced VisualSim
2014: University Program; 10th customer
2015: Stochastic and network modeling
2016, 2018, 2019: Networking (18th companies & 32nd universities); Electronics Modeling (35th customer); Automotive & Avionics
2020: System-level IP; Open API
2021: Requirements Tracking; 50th customer
2022/23: Re-engineered; AI, DNN, Power, GPU
5. Assemble System Model using Pre-Built System-Level IP
Scheduling/Arbitration: proportional share, WFQ, static, dynamic, fixed priority, EDF, TDMA, FCFS
Communication Templates
Computation Templates: DSP, AI, GPU, DRAM, CPU, FPGA, mE
Figure: Architecture # 1 and Architecture # 2, assembled from the templates above (block labels: RISC, DSP, LookUp, Cipher, AI, CPU, GPU, mE, DDR; scheduler labels: TDMA, Priority, EDF, WFQ, static)
Which architecture is better suited for our application?
6. Add the Task Graph to Define the Workload
Figure labels: I/O, DSP, CPU1, CPU2; task1, task2, task3, task4
Contention
- limited resources
- scheduling/arbitration
Interference of multiple applications
- limited resources
- scheduling/arbitration
- anomalies
Complex behavior
- input stream
- data dependent behavior
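The contention effect listed above can be illustrated with a minimal sketch. It is not VisualSim code: the task durations, the second application's task, and the `simulate` helper are all hypothetical, and each resource simply serves tasks in list order (which matches arrival order in this example).

```python
from collections import defaultdict


def simulate(tasks):
    """Return the finish time of each task.

    tasks: list of (name, resource, duration_ms, dependencies), given in an
    order where every dependency appears before the task that needs it.
    Each resource runs one task at a time, in list order.
    """
    resource_free_at = defaultdict(float)
    finish = {}
    for name, resource, duration, deps in tasks:
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[d] for d in deps), default=0.0)
        # It starts when it is ready AND its resource is free.
        start = max(ready, resource_free_at[resource])
        finish[name] = start + duration
        resource_free_at[resource] = finish[name]
    return finish


# Hypothetical durations for the task1..task4 chain from the slide.
app = [
    ("task1", "I/O", 1.0, []),
    ("task2", "DSP", 2.0, ["task1"]),
    ("task3", "CPU1", 1.0, ["task2"]),
    ("task4", "CPU2", 1.0, ["task3"]),
]
# A hypothetical second application that also needs the DSP.
other = [("other_task", "DSP", 3.0, [])]

alone = simulate(app)
shared = simulate(other + app)
print(alone["task4"], shared["task4"])
```

With these made-up numbers the chain finishes later when the DSP is shared, which is the kind of interference a system-level model is meant to expose before integration.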
8. Impact of System Architecture Exploration
• System sizing and topology design
• Power consumption, cooling & management
• Device distribution across one/multi-die
• Application mapping on CPU, GPU, TPU, DSP
• SW, firmware, scheduler and network tuning
• Merges Shift-Left and Shift-Right
• System-level model integrates requirements, creates a single model of the entire system, trades off power, performance and area, and generates tests
• To optimize associated area
• To design thermal structure
• To create Chiplet IP industry
• To meet timing and power
• To meet mission requirements
• Single platform from Concept to End-of-Life
• Collaboration between design teams, suppliers and customers
9. ARM Cortex A53
Benchmark | FPGA | VisualSim | Difference | Comments
ED1 | 5.94 ms | 6.425 ms | 7.55% | Integer processing
MM | 12.084 ms | 11.863 ms | 1.08% | Most load operations with random addresses
MM_st | 13.984 ms | 14.65 ms | 4.5% | Most store operations with random addresses
Test System
Xilinx Zynq® UltraScale+™ XCZU9EG-2FFVB1156E MPSoC on the ZCU102 board
Specification: 4-core ARM Cortex A53 at 1200 MHz; 32 KiB i-cache; 32 KiB d-cache; 1 MiB L2; 2 GB DDR4-2400 DRAM
10. Comparing Power for ARM Cortex A53
Frequency | VisualSim Simulated Power | Measured Power as reported by Anandtech | Delta percentage
500.0 MHz | 0.037 W | 0.038 W | 2.63%
600.0 MHz | 0.053 W | 0.051 W | -3.92%
700.0 MHz | 0.073 W | 0.080 W | 8.75%
800.0 MHz | 0.097 W | 0.090 W | -7.77%
1000.0 MHz | 0.157 W | 0.159 W | 1.25%
1100.0 MHz | 0.193 W | 0.188 W | -2.65%
1200.0 MHz | 0.233 W | 0.227 W | -2.64%
1300.0 MHz | 0.277 W | 0.269 W | -2.97%
Source: Anandtech.com
Over 97% accuracy
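The delta column can be reproduced with a few lines of Python. The sign convention used here (measured minus simulated, divided by measured) is an assumption inferred from the tabulated values, not something the slide states; the last digit may differ from the slide where it appears to truncate rather than round.

```python
# (frequency in MHz, simulated power in W, measured power in W), from the table.
rows = [
    (500, 0.037, 0.038),
    (600, 0.053, 0.051),
    (700, 0.073, 0.080),
    (800, 0.097, 0.090),
    (1000, 0.157, 0.159),
    (1100, 0.193, 0.188),
    (1200, 0.233, 0.227),
    (1300, 0.277, 0.269),
]


def delta_percent(simulated_w, measured_w):
    """Relative difference of the simulation against the measurement, in percent."""
    return (measured_w - simulated_w) / measured_w * 100.0


for freq_mhz, simulated_w, measured_w in rows:
    print(f"{freq_mhz:>5} MHz  {delta_percent(simulated_w, measured_w):+.2f}%")
```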
11. Comparing different Cores- Dhrystone
Processor | MoP Hit Ratio | MoP Mean Latency | I1 Hit Ratio | I1 Mean Latency | D1 Hit Ratio | D1 Mean Latency | L2 Hit Ratio | L2 Mean Latency | DSU Hit Ratio | DSU Mean Latency
ARM Cortex A53 | - | - | 99.97 | 1.93E-09 | 99.98 | 2.02E-09 | 18.75 | 9.33E-08 | - | -
ARM Cortex A77 | 99.90 | 1.75E-09 | 67.22 | 6.25E-08 | 99.96 | 7.32E-10 | 14.19 | 1.82E-07 | 6.96 | 2.05E-09
RISC-V u74 | - | - | 99.98 | 4.15E-09 | 99.98 | 1.86E-09 | 39.58 | 5.25E-08 | - | -

Processor | Instructions | Latency | Max MIPS
ARM Cortex A53 | ~ 5,666,000 | 0.0055846 | ~ 1039
ARM Cortex A77 | ~ 4,478,000 | 0.0011795 | ~ 3960
RISC-V u74 | ~ 6,058,000 | 0.007726 | ~ 797
12. VisualSim drives Efficiency & Productivity
Model Creation (6)
Implementation (18)
Using Current Design Methodology
Project Schedule
Implementation (12)
Using VisualSim Design Methodology
Time savings based on a 24-month project: 20-40%
Note: All times in months
Communication and Refinement (4)
Analysis (2.5)
Model Creation (0.5)
Analysis (1.5)
Communication and Refinement (6)
Advantage over a generic modeling environment: shorter duration and greater applicability
14. Vary Compute, Interconnect and Traffic
Package_Type = Advanced
Max_Link_Speed_GTps = 32
Number of Modules = 4
Tx_Buffer_Size = 8192 (no packets dropped)
Protocol = PCIe_Gen6
Flit_Size = 256 Bytes
Num_of_Flits_per_Flow_Control_Check = 8
Run Simulation with Different Configurations and Topology
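A configuration sweep of this kind can be sketched as a plain cross-product of parameter values. The parameter names below follow the slide (with underscores added where needed); the alternative values in `sweep` and the `expand` helper are hypothetical, and the snippet only enumerates configurations rather than calling any VisualSim API.

```python
from itertools import product

# Baseline taken from the slide.
base = {
    "Package_Type": "Advanced",
    "Max_Link_Speed_GTps": 32,
    "Number_of_Modules": 4,
    "Tx_Buffer_Size": 8192,
    "Protocol": "PCIe_Gen6",
    "Flit_Size_Bytes": 256,
    "Num_of_Flits_per_Flow_Control_Check": 8,
}

# Hypothetical values to explore; not from the presentation.
sweep = {
    "Max_Link_Speed_GTps": [16, 32],
    "Flit_Size_Bytes": [128, 256],
}


def expand(base, sweep):
    """Return one full configuration per combination of swept values."""
    keys = list(sweep)
    return [
        {**base, **dict(zip(keys, values))}
        for values in product(*(sweep[k] for k in keys))
    ]


configs = expand(base, sweep)
for cfg in configs:
    print(cfg["Max_Link_Speed_GTps"], cfg["Flit_Size_Bytes"])
```

Each resulting dictionary would then be handed to one simulation run, so results can be compared across topologies on equal terms.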
15. Add the Power and Thermal
(1) Power Generation
• 4 types of power generators in VisualSim: constant, variable, motor, solar charge
• Charge sent to battery
(2) Power Storage
• Different charging schemes
• Impact of surge and shocks
• Battery lifecycle
• Battery consumption
• Statistics
(3) Power Consumption
• State-based power consumption of electronics (controller, SoC) and mechanical (brakes, wheels)
• Average, instant and cumulative
• Power per device and application
(4) Power Management
• Change in power state controlled by time, utilization, temperature and expected activity
(5) Thermal Management
• Heat and temperature
• Impact of cooling strategy
• Add impact of power spikes
(6) Verification and Debugging
• Optimize and test the power management algorithms
• Sizing of power generators and battery
• Optimize the schedule, supplynet and voltage
• Estimate power consumed by the software application
(7) Downstream Integration
• Generate UPF file with power domains and associated voltage levels
• Generate SystemVerilog power testbench
• Generate powerState change VCD dump
16. Behavior Task Graph
Power Table
Power management Unit
SystemVerilog Output for Power System Test
VCD Waveform for Verification
create_power_domain PD_Top -include_scope
create_power_domain -name PD_1_2.0 -elements {"CLKMUX"}
create_power_domain -name PD_1_1.0 -elements {"PLL","G2","G3"}
create_power_domain -name PD_1_3.0 -elements {"PROC"}
create_supply_port -port VDD_1.0 -direction in -domain PD_Top
create_supply_port -port VDD_2.0 -direction in -domain PD_Top
create_supply_port -port VDD_3.0 -direction in -domain PD_Top
create_supply_port -port VSS_0.0 -direction in -domain PD_Top
create_supply_net VDD_1.0 -domain PD_Top
create_supply_net VDD_2.0 -domain PD_Top
create_supply_net VDD_3.0 -domain PD_Top
create_supply_net VSS_0.0 -domain PD_Top
connect_supply_net VDD_1.0 -ports VDD_1.0
connect_supply_net VDD_2.0 -ports VDD_2.0
connect_supply_net VDD_3.0 -ports VDD_3.0
connect_supply_net VSS_0.0 -ports VSS_0.0
add_power_state PD_1_2.0 -state Active {-supply_expr (VDD_2.0 == {ON, 2.0}) && (VSS_0.0 =={ON,0.0})}
add_power_state PD_1_2.0 -state OFF {-supply_expr (VDD_2.0 == {OFF, 0.0}) && (VSS_0.0 =={ON,0.0})}
add_power_state PD_1_1.0 -state Active {-supply_expr (VDD_1.0 == {ON, 1.0}) && (VSS_0.0 =={ON,0.0})}
add_power_state PD_1_1.0 -state OFF {-supply_expr (VDD_1.0 == {OFF, 0.0}) && (VSS_0.0 =={ON,0.0})}
add_power_state PD_1_3.0 -state Active {-supply_expr (VDD_3.0 == {ON, 3.0}) && (VSS_0.0 =={ON,0.0})}
add_power_state PD_1_3.0 -state OFF {-supply_expr (VDD_3.0 == {OFF, 0.0}) && (VSS_0.0 =={ON,0.0})}
Power Modeling Integration
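To show how a power table might be turned into the `add_power_state` commands listed on this slide, here is a minimal Python sketch. It is not the VisualSim generator: the `power_state_lines` helper and the `power_table` structure are hypothetical, and the domain, supply and voltage values are simply read off the listing, with each state written on a single line.

```python
def power_state_lines(domain, supply, volts, ground="VSS_0.0"):
    """Build the Active and OFF add_power_state commands for one power domain."""
    active = (
        f"add_power_state {domain} -state Active "
        f"{{-supply_expr ({supply} == {{ON, {volts}}}) && ({ground} =={{ON,0.0}})}}"
    )
    off = (
        f"add_power_state {domain} -state OFF "
        f"{{-supply_expr ({supply} == {{OFF, 0.0}}) && ({ground} =={{ON,0.0}})}}"
    )
    return [active, off]


# Domain -> (supply net, active voltage), as they appear in the listing.
power_table = {
    "PD_1_2.0": ("VDD_2.0", 2.0),
    "PD_1_1.0": ("VDD_1.0", 1.0),
    "PD_1_3.0": ("VDD_3.0", 3.0),
}

upf_lines = []
for domain, (supply, volts) in power_table.items():
    upf_lines.extend(power_state_lines(domain, supply, volts))

print("\n".join(upf_lines))
```

Keeping the table as data and the command text as a template makes it easy to regenerate the power states when a domain voltage changes during exploration.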
17. System Verification
• Validate product not just HW/SW
• Application relevant test vectors
• Generate test cases and run against RTL
• Compare simulation output against RTL
• Match architecture timing within range
• Verify functional correctness
• Task sequencing @ DSP/uP
• Resource contention
Eliminate product failure by maximizing relevant verification
Figure labels: Golden Reference; Comparator; Match Tag; Architecture model of IP; Verilog/C/Hardware
19. Architecting Hardware-Software for Infotainment System
Mirabilis Design Confidential
Figure labels: DRAM, Display, IO, AMBA AXI Bus, CPU, GPU, Display Ctrl, PCIe, Video Camera, SRAM, Packet
• System Overview
• Camera: 30 fps, VGA-equivalent
• CPU: Multi-core ARM Cortex-A53, 1.2 GHz
• GPU: 64 cores (8 warps × 8 PEs), 32 threads, 1 GHz
• DisplayCtrl: DisplayBuffer 293,888 bytes
• SRAM: SDR, 64 MB, 1.0 GHz
• DRAM: DDR3, 64 MB, 2.4 GHz
Explore at the board- and semiconductor-level to size uP/GPU, memory bandwidth and bus/switch configuration
20. System Model of an Infotainment System
NXP i.MX6 /
nVIDIA Drive PX
Xilinx FPGA
Kintex 8
Discrete
DMA
ARM A53
GPU
Display Ctrl
SRAM3
DRAM3
Video IN
Parameters
Video OUT
21. Conducting Architecture Trade-off
• By changing the amount of video input data (packet number), observe the SRAM -> DRAM transfer performance and examine the upper limit of video input that the system can tolerate.
Plot labels: 21 Packet/Sec (41.4 us), 210 Packet/Sec (12 ms), 300 Packet/Sec
• 250 Packet/Sec is the system limit
• With 300 Packet/Sec, the simulation cannot be executed due to FIFO buffer overflow.
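Why an input rate above the system limit ends in a FIFO overflow can be seen with a simple fluid approximation. This is a sketch under stated assumptions: the 250 packets/sec drain rate is taken from the limit quoted on the slide, while the FIFO depth and the `seconds_until_overflow` helper are hypothetical.

```python
def seconds_until_overflow(arrival_rate, service_rate=250.0, fifo_depth=64):
    """Estimate when a FIFO overflows under constant arrival and drain rates.

    Rates are in packets per second, fifo_depth in packets (hypothetical).
    Returns None when the FIFO can keep up indefinitely.
    """
    if arrival_rate <= service_rate:
        return None
    # Occupancy grows by (arrival - service) packets every second.
    return fifo_depth / (arrival_rate - service_rate)


for rate in (21, 210, 250, 300):
    print(rate, seconds_until_overflow(rate))
```

Below the limit the backlog never builds up; above it, the buffer depth only determines how long the system survives, not whether it overflows.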
24. Experiments with Different Implementations
Run 1 – Base Configuration Mapped to Logic and ARM
Application latency increases over time.
Latency increases due to Segmentation.
Next step: remap the Segmentation task to AI Tiles.
Run 2 – Segmentation Mapped to AI Engine
Application latency stays in a bounded range.
NoC utilization is high.
Next step: change the interconnect for Segmentation from NoC to Direct.
Run 3 – Using Direct Path between Logic and AI
Latency is deterministic.
Latency requirement (App latency < 80 msec) is met.
Utilization across NoC is acceptable.
26. ADAS SoC Block Diagram
Figure labels: UCIe; AI Engine Tiles; Warp Scheduler; PE (x4); Local Mem; GPU; Memory chiplet; ADC; DDR5; Processor subsystem; Core; L1; Bus; SLC
• Optimal mesh size (m x n)?
• Best sample size (16 bytes vs 32 bytes, etc.)?
• Use a single protocol stack or a multi-protocol stack?
• Do we need PCIe Gen6, or can Gen5 still meet the application requirements?
28. Statistics for Multi-Die SoC
• Note the AI Engine latency spikes
• For multi-protocol, each protocol gets half the bandwidth
• Older-generation protocols are mixed with PCIe 6
• Lower FLIT size increases latency
29. Comparing Different Configurations using UCIe Interface
All Die Adapters using PCIe 6.0
Die Adapters using PCIe 6.0 and Streaming Protocols (AXI)
Lower latency when using PCIe 6.0
31. Mask Region-CNN (MR-CNN) for object detection and image segmentation
Overall representation of Mask R-CNN model
Network Architecture of Mask R-CNN
output
CPU Preprocessing
CPU Postprocessing
32. Using ChatGPT to translate AI model (Mask R-CNN) into VisualSim Task Graph
• Each layer is defined as a separate task in the task graph, and the dependencies between them are modeled.
• A database is used to list the layers/functions and the parameters associated with them.
• These are used to determine the number of Multiply Accumulate (MAC) operations corresponding to each layer/function.
Figure labels: Class, box, mask
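The per-layer MAC count mentioned above follows standard formulas, sketched here in Python. The layer shapes in `layers` are hypothetical examples and are not taken from the Mask R-CNN model or the VisualSim database used in the talk.

```python
def conv2d_macs(kernel_h, kernel_w, in_channels, out_channels, out_h, out_w):
    """MACs for a standard 2D convolution (no grouping, bias ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels * out_h * out_w


def dense_macs(in_features, out_features):
    """MACs for a fully connected layer (bias ignored)."""
    return in_features * out_features


# Hypothetical layer list standing in for the layer/parameter database.
layers = [
    ("conv3x3", conv2d_macs(3, 3, 64, 128, 56, 56)),
    ("fc", dense_macs(1024, 81)),
]

total_macs = sum(macs for _, macs in layers)
for name, macs in layers:
    print(f"{name}: {macs:,} MACs")
print(f"total: {total_macs:,} MACs")
```

In a task-graph model, each of these counts would become the compute demand of one task, which the hardware model then converts into time and power.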
33. VisualSim Model of DNN Hardware and Task Graph
Application sequence from Task Graph is mapped to HW architecture
• PE array: 12x14
• 4-level memory hierarchy
• Power computation per PE, buses and memory
34. Results – Base model (168 AI cores, 90% data availability at SRAM)
• Peak power consumption at around 10.8 W
• Obtained FPS = 0.414
35. Results – 8x8 (64) cores, 90% data availability at SRAM
• Peak power consumption at around 5.6 W, as the number of cores was reduced
• Obtained FPS = 0.29, which is lower than the base model result because fewer resources were available for MAC operations
36. Results – 100% data availability at SRAM, 168 cores
• The number of off-chip memory accesses was reduced. The only accesses made were to load the images and weights into the SRAM
• Obtained FPS = 9.93, which is higher than the base model result because off-chip memory accesses were reduced
• Peak power consumption (10.4 W) is lower because off-chip memory accesses were reduced
37. Results – 60% data availability at SRAM, 168 cores
• The number of off-chip memory accesses was increased
• Obtained FPS = 0.04, which is lower than the base model result because off-chip memory accesses were increased
39. SoC System Specification
Processor Core – RISC-V or ARM A53 core
Processor Speed – 1200 MHz
L1 cache:
I Cache: 32 KB, 2-way set associative
D Cache: 32 KB, 4-way set associative
L2 Cache
Size: 1 MB
Associativity: 16-way
Ext DRAM
Size: 4 GB
Type: DDR4
Speed: 2400 MHz
HW Accelerator
Speed: 100 MHz
Software
Multimedia task
Stochastic instruction trace
Goals
Peak Power < 1.0W
Number of Matrices > 19K
40. VisualSim SoC Model
MPEG Application
IP or RISC-V level
• Evaluate pipeline stages
• Width, Speed
• Number of execution units, Levels of cache
SoC
• Number of RISC-V cores
• Accelerators
• Cache memory hierarchy and coherence
System level
• Development of an IoT device, ECU or an integrated platform
Behavior
Hardware
Bus Topology
41. CASE 1: All SW tasks
Observations:
1. Avg power consumption within requirements (<1.0 W)
2. Performance requirement not achieved (only a max of 9.4K frames)
43. CASE 2: Run Rotate Frame Task on HW Accelerator
Observations:
1. Avg power consumption requirement not met (>1.3 W)
2. Performance requirement achieved (max of 19.9K frames)
44. CASE 3: Run Rotate Frame Task on HW Accelerator + Power Management
Observations:
1. Avg power consumption requirement met (<1.0 W)
2. Performance requirement achieved (max of 19.8K frames)
46. Generated Statistics
• Per-execution-unit stats, stall percentages and buffer occupancies are reported.
• Detailed cache, bus and memory stats are generated per simulation.
• Stats include hit ratio, throughput, latency, number of write-backs, evictions, etc.
49. Use cases
Run Num | Description | M4 (Latency) | M55 (Latency) | U74 (Latency)
1 | Running Dhrystone on core. No cache/bus/memory access | 5.576700039E-4 | 9.47200014E-5 | 1.77875568E-5
2 | Cache/Bus/Memory access | 8.7438000752E-4 | 1.6319750281E-4 | 5.05307708E-5
* The number of loops is different for each core
51. ECU Performance Analysis under Different Use Cases
Demo environment
1. Brake ECU integrated to a CAN Network
2. Sensors write data to the memory
3. Brake Pedal or Proximity sensor triggers the braking action from the Brake ECU
ECU
Using a RISC-V processor for the Brake ECU
Analysis
1. Latency (Time taken for the signal to reach all the wheels from the Brake ECU)
2. Processor performance (MIPS)
3. Power Consumption (Braking activity, ECU usage and Network activity)
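One component of the brake-signal latency is the time a single CAN frame occupies the bus. The sketch below uses the widely cited worst-case bit-stuffing bound for classic CAN frames with 11-bit identifiers; the 500 kbit/s bitrate and the 8-byte payload are assumptions for illustration and do not come from the presentation. It covers wire time for one frame only, not ECU processing, arbitration delay or gateway hops.

```python
def can_frame_bits(data_bytes, worst_case_stuffing=True):
    """Bits on the wire for a classic CAN data frame with an 11-bit identifier.

    Includes the 3-bit interframe space. With worst_case_stuffing, adds the
    maximum number of stuff bits over the 34 + 8*n stuffable bits.
    """
    if not 0 <= data_bytes <= 8:
        raise ValueError("classic CAN carries 0 to 8 data bytes")
    bits = 47 + 8 * data_bytes
    if worst_case_stuffing:
        bits += (34 + 8 * data_bytes - 1) // 4
    return bits


def can_frame_time_s(data_bytes, bitrate_bps, worst_case_stuffing=True):
    """Transmission time of one frame in seconds at the given bitrate."""
    return can_frame_bits(data_bytes, worst_case_stuffing) / bitrate_bps


# Assumed values: 8-byte payload at 500 kbit/s.
frame_time = can_frame_time_s(8, 500_000)
print(f"{can_frame_bits(8)} bits, {frame_time * 1e6:.0f} us per frame")
```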
6/28/2024 Mirabilis Design Inc. 52
52. System Overview
Gateway: transfers messages between different CAN networks
CAN Bus: the network that connects sensors and ECUs
Figure labels: Wheel 1, Wheel 2, Wheel 3, Wheel 4, Gateway, Engine, Proximity Sensor, Brake Pedal, Gyro Sensor, Road condition sensor, ECU, CAN Bus (x3)
53. Automotive Network System
Figure labels: CAN Wire; CAN Node (N); Wheel1, Wheel2, Wheel3, Wheel4; Brake Pedal; Proximity Sensor; Gyro Sensor; Gateway; ECU; Road condition sensor; Engine; CAN BUS (x3)
54. VisualSim Model
RISC-V
Model location: VS_ARdemo automotiveBrake_Model_With_ECU_A53 Brake_CAN_model_ECU_new_RISC-V.xml
55. Configuration of the ECU/Processor
Processor Spec
1. Processor (ECU): RISC-V, 5 pipeline stages
2. Number of cores: 1 - 2
3. Processor speed: 100 MHz - 1.2 GHz
4. DRAM type: DDR3 SDRAM (Synchronous DRAM)
5. DRAM speed range: 400 - 1066 MHz
6. Cache speed: 500 MHz
7. Cache size: 64 KB
8. Memory controller: DDR3, 750 MHz
9. Bus: CAN
ECU Data input
1. Wheels
2. Engine
3. Proximity Sensor
4. Brake Pedal
5. Gyro Sensor
6. Road Condition Sensor
56. Designing Brake ECU using Single Core – RISC-V
57. Results – single core RISC-V
Slight improvement in processor task latency at a few instances