
Showing 1–33 of 33 results for author: Merchant, F

  1. arXiv:2407.11275

    cs.CV

    M18K: A Comprehensive RGB-D Dataset and Benchmark for Mushroom Detection and Instance Segmentation

    Authors: Abdollah Zakeri, Mulham Fawakherji, Jiming Kang, Bikram Koirala, Venkatesh Balan, Weihang Zhu, Driss Benhaddou, Fatima A. Merchant

    Abstract: Automating agricultural processes holds significant promise for enhancing efficiency and sustainability in various farming practices. This paper contributes to the automation of agricultural processes by providing a dedicated mushroom detection dataset related to automated harvesting, growth monitoring, and quality control of the button mushroom produced using Agaricus bisporus fungus. With over 1…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.03843

    cs.ET

    Resistive Memory for Computing and Security: Algorithms, Architectures, and Platforms

    Authors: Simranjeet Singh, Farhad Merchant, Sachin Patkar

    Abstract: Resistive random-access memory (RRAM) is gaining popularity due to its ability to offer computing within the memory and its non-volatile nature. The unique properties of RRAM, such as binary switching, multi-state switching, and device variations, can be leveraged to design novel techniques and algorithms. This thesis proposes a technique for utilizing RRAM devices in three major directions: i) di…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted as PhD Forum at VLSI-SoC 2024

  3. arXiv:2407.02921

    cs.ET

    In-Memory Mirroring: Cloning Without Reading

    Authors: Simranjeet Singh, Ankit Bende, Chandan Kumar Jha, Vikas Rana, Rolf Drechsler, Sachin Patkar, Farhad Merchant

    Abstract: In-memory computing (IMC) has gained significant attention recently as it attempts to reduce the impact of memory bottlenecks. Numerous schemes for digital IMC are presented in the literature, focusing on logic operations. Often, an application's description has data dependencies that must be resolved. Contemporary IMC architectures perform read followed by write operations for this purpose, which…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted in IFIP/IEEE VLSI-SoC 2024

  4. arXiv:2404.09818

    cs.AR

    Error Detection and Correction Codes for Safe In-Memory Computations

    Authors: Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn

    Abstract: In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. However, the non-idealities and defects of emerging technologies used in advanced IMC can severely degrade the accuracy of inferred Neural Networks (NN) and lead to malfunctions in safety-critical applications. In this paper, we investigate a…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: This paper will be presented at the 29th IEEE European Test Symposium (ETS) 2024

  5. arXiv:2401.17819

    cs.CR cs.AR

    QTFlow: Quantitative Timing-Sensitive Information Flow for Security-Aware Hardware Design on RTL

    Authors: Lennart M. Reimann, Anshul Prashar, Chiara Ghinami, Rebecca Pelke, Dominik Sisejkovic, Farhad Merchant, Rainer Leupers

    Abstract: In contemporary Electronic Design Automation (EDA) tools, security often takes a backseat to the primary goals of power, performance, and area optimization. Commonly, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while keeping performance an…

    Submitted 6 February, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: accepted at IEEE VLSI-DAT 2024, Taiwan; 4 pages

  6. arXiv:2310.10460

    cs.ET

    Experimental Validation of Memristor-Aided Logic Using 1T1R TaOx RRAM Crossbar Array

    Authors: Ankit Bende, Simranjeet Singh, Chandan Kumar Jha, Tim Kempen, Felix Cüppers, Christopher Bengel, Andre Zambanini, Dennis Nielinger, Sachin Patkar, Rolf Drechsler, Rainer Waser, Farhad Merchant, Vikas Rana

    Abstract: Memristor-aided logic (MAGIC) design style holds a high promise for realizing digital logic-in-memory functionality. The ability to implement a specific gate in a MAGIC design style hinges on the SET-to-RESET threshold ratio. The TaOx memristive devices exhibit distinct SET-to-RESET ratios, enabling the implementation of OR and NOT operations. As the adoption of the MAGIC design style gains moment…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted in VLSID 2024
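    The abstract notes that whether a gate can be realized in the MAGIC style depends on the SET-to-RESET threshold ratio, and that TaOx devices support OR and NOT. As a minimal, idealized sketch of what these stateful gates compute, the Python below abstracts every device detail (voltages, thresholds, resistances) into a single "does the output switch?" decision. The encoding (logic 1 = low-resistance state, logic 0 = high-resistance state) and the initialize-then-conditionally-switch sequence follow the general MAGIC scheme in the literature; they are assumptions here, not details taken from this paper, and the function names are hypothetical.

```python
# Idealized, logic-level model of MAGIC-style stateful gates.
# Assumption: logic 1 = low-resistance state (LRS), logic 0 = high-resistance
# state (HRS). Device physics (voltages, SET/RESET thresholds) is reduced to
# a yes/no decision on whether the output memristor switches.


def magic_not(inp: int) -> int:
    out = 1          # step 1: initialize the output memristor to LRS (logic 1)
    if inp == 1:     # step 2: an LRS input passes enough current to RESET
        out = 0      #         the output to HRS (logic 0)
    return out


def magic_or(*inputs: int) -> int:
    out = 0                            # step 1: initialize output to HRS (logic 0)
    if any(b == 1 for b in inputs):    # step 2: any LRS input drives a SET
        out = 1                        #         of the output to LRS (logic 1)
    return out


def magic_nor(*inputs: int) -> int:
    # Composed from two in-memory steps: OR into an intermediate cell, then NOT.
    return magic_not(magic_or(*inputs))
```

    Because {OR, NOT} is functionally complete, chaining these two in-memory steps is in principle enough to build any Boolean function, as the composed NOR illustrates.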

  7. arXiv:2309.04868

    cs.ET cs.LO

    MemSPICE: Automated Simulation and Energy Estimation Framework for MAGIC-Based Logic-in-Memory

    Authors: Simranjeet Singh, Chandan Kumar Jha, Ankit Bende, Vikas Rana, Sachin Patkar, Rolf Drechsler, Farhad Merchant

    Abstract: Existing logic-in-memory (LiM) research is limited to generating mappings and micro-operations. In this paper, we present MemSPICE, a novel framework that addresses this gap by automatically generating both the netlist and testbench needed to evaluate the LiM on a memristive crossbar. MemSPICE goes beyond conventional approaches by providing energy estimation scripts to calculate the precis…

    Submitted 9 September, 2023; originally announced September 2023.

    Comments: Accepted in ASP-DAC 2024

  8. arXiv:2308.02694

    cs.CR cs.AR

    SoftFlow: Automated HW-SW Confidentiality Verification for Embedded Processors

    Authors: Lennart M. Reimann, Jonathan Wiesner, Dominik Sisejkovic, Farhad Merchant, Rainer Leupers

    Abstract: Despite its ever-increasing impact, security is not considered a design objective in commercial electronic design automation (EDA) tools. This results in vulnerabilities being overlooked during the software-hardware design process. Specifically, vulnerabilities that allow leakage of sensitive data might stay unnoticed by standard testing, as the leakage itself might not result in evident functi…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 6 pages, accepted at 31st IFIP/IEEE Conference on Very Large Scale Integration (VLSI-SoC 2023)

  9. arXiv:2307.03669

    cs.ET

    Should We Even Optimize for Execution Energy? Rethinking Mapping for MAGIC Design Style

    Authors: Simranjeet Singh, Chandan Kumar Jha, Ankit Bende, Phrangboklang Lyngton Thangkhiew, Vikas Rana, Sachin Patkar, Rolf Drechsler, Farhad Merchant

    Abstract: Memristor-based logic-in-memory (LiM) has become popular as a means to overcome the von Neumann bottleneck in traditional data-intensive computing. Recently, the memristor-aided logic (MAGIC) design style has gained immense traction for LiM due to its simplicity. However, understanding the energy distribution during the design of logic operations within the memristive memory is crucial in assessin…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in IEEE Embedded Systems Letters

  10. arXiv:2305.12914

    cs.AR cs.AI cs.ET cs.LG

    IMBUE: In-Memory Boolean-to-CUrrent Inference ArchitecturE for Tsetlin Machines

    Authors: Omar Ghazal, Simranjeet Singh, Tousif Rahman, Shengqi Yu, Yujin Zheng, Domenico Balsamo, Sachin Patkar, Farhad Merchant, Fei Xia, Alex Yakovlev, Rishad Shafik

    Abstract: In-memory computing for Machine Learning (ML) applications remedies the von Neumann bottlenecks by organizing computation to exploit parallelism and locality. Non-volatile memory devices such as Resistive RAM (ReRAM) offer integrated switching and storage capabilities showing promising performance for ML applications. However, ReRAM devices have design challenges, such as non-linear digital-analog…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted at ACM/IEEE International Symposium on Low Power Electronics and Design 2023 (ISLPED 2023)

  11. arXiv:2304.13552

    cs.ET

    Finite State Automata Design using 1T1R ReRAM Crossbar

    Authors: Simranjeet Singh, Omar Ghazal, Chandan Kumar Jha, Vikas Rana, Rolf Drechsler, Rishad Shafik, Alex Yakovlev, Sachin Patkar, Farhad Merchant

    Abstract: Data movement costs constitute a significant bottleneck in modern machine learning (ML) systems. When combined with the computational complexity of algorithms, such as neural networks, designing hardware accelerators with low energy footprint remains challenging. Finite state automata (FSA) constitute a type of computation model used as a low-complexity learning unit in ML systems. The implementat…

    Submitted 30 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted by 21st IEEE Interregional NEWCAS Conference 2023 (NEWCAS 2023)
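    The abstract describes FSA as low-complexity learning units. One well-known example of such a unit is the two-action Tsetlin automaton used in Tsetlin machines; whether this paper maps exactly this automaton onto the crossbar is an assumption. The minimal Python sketch below shows only the state-transition rule: a reward pushes the state deeper into the current action's half, and a penalty pushes it toward, and eventually across, the decision boundary. The class and parameter names are hypothetical.

```python
class TsetlinAutomaton:
    """Two-action Tsetlin automaton with 2*n states, numbered 1..2n.

    States 1..n select action 0; states n+1..2n select action 1.
    """

    def __init__(self, n: int, state=None):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        # Start at the boundary on the action-0 side unless told otherwise.
        self.state = n if state is None else state

    def action(self) -> int:
        return 0 if self.state <= self.n else 1

    def reward(self) -> None:
        # Reinforce the current action: move away from the boundary,
        # saturating at the outermost state of that half.
        if self.action() == 0:
            self.state = max(1, self.state - 1)
        else:
            self.state = min(2 * self.n, self.state + 1)

    def penalize(self) -> None:
        # Weaken the current action: move toward (and possibly across)
        # the boundary between the two halves.
        if self.action() == 0:
            self.state += 1
        else:
            self.state -= 1
```

    With n = 3, a fresh automaton sits in state 3 (action 0); a single penalty moves it to state 4 and flips its action to 1, while repeated rewards saturate it at state 6.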

  12. arXiv:2304.13531

    cs.ET

    Integrated Architecture for Neural Networks and Security Primitives using RRAM Crossbar

    Authors: Simranjeet Singh, Furqan Zahoor, Gokulnath Rajendran, Vikas Rana, Sachin Patkar, Anupam Chattopadhyay, Farhad Merchant

    Abstract: This paper proposes an architecture that integrates neural networks (NNs) and hardware security modules using a single resistive random access memory (RRAM) crossbar. The proposed architecture enables using a single crossbar to implement NN, true random number generator (TRNG), and physical unclonable function (PUF) applications while exploiting the multi-state storage characteristic of the RRAM c…

    Submitted 1 May, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  13. arXiv:2304.05686

    cs.ET cs.CR

    Gate Camouflaging Using Reconfigurable ISFET-Based Threshold Voltage Defined Logic

    Authors: Elmira Moussavi, Animesh Singh, Dominik Sisejkovic, Aravind Padma Kumar, Daniyar Kizatov, Sven Ingebrandt, Rainer Leupers, Vivek Pachauri, Farhad Merchant

    Abstract: Most chip designers outsource the manufacturing of their integrated circuits (ICs) to external foundries due to the exorbitant cost and complexity of the process. This involvement of untrustworthy, external entities opens the door to major security threats, such as reverse engineering (RE). RE can reveal the physical structure and functionality of intellectual property (IP) and ICs, leading to IP…

    Submitted 12 April, 2023; originally announced April 2023.

  14. arXiv:2211.03526

    cs.CR cs.AR cs.ET

    Hardware Security Primitives using Passive RRAM Crossbar Array: Novel TRNG and PUF Designs

    Authors: Simranjeet Singh, Furqan Zahoor, Gokulnath Rajendran, Sachin Patkar, Anupam Chattopadhyay, Farhad Merchant

    Abstract: With rapid advancements in electronic gadgets, the security and privacy aspects of these devices are significant. For the design of secure systems, physical unclonable function (PUF) and true random number generator (TRNG) are critical hardware security primitives for security applications. This paper proposes novel implementations of PUF and TRNGs on the RRAM crossbar structure. Firstly, two tech…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: To appear at ASP-DAC 2023

  15. arXiv:2208.04769

    eess.SY cs.ET

    A Temperature Independent Readout Circuit for ISFET-Based Sensor Applications

    Authors: Elmira Moussavi, Dominik Sisejkovic, Animesh Singh, Daniyar Kizatov, Rainer Leupers, Sven Ingebrandt, Vivek Pachauri, Farhad Merchant

    Abstract: The ion-sensitive field-effect transistor (ISFET) is an emerging technology that has received much attention in numerous research areas, including biochemistry, medicine, and security applications. However, compared to other types of sensors, the complexity of ISFETs makes it more challenging to achieve a sensitive, fast, and repeatable response. Therefore, various readout circuits have been develop…

    Submitted 9 August, 2022; originally announced August 2022.

    Comments: 4 pages, 6 figures, accepted in LATS 2022

  16. arXiv:2207.10526

    cs.CR

    PA-PUF: A Novel Priority Arbiter PUF

    Authors: Simranjeet Singh, Srinivasu Bodapati, Sachin Patkar, Rainer Leupers, Anupam Chattopadhyay, Farhad Merchant

    Abstract: This paper proposes a 3-input arbiter-based novel physically unclonable function (PUF) design. Firstly, a 3-input priority arbiter is designed using a simple arbiter, two multiplexers (2:1), and an XOR logic gate. The priority arbiter has an equal probability of 0's and 1's at the output, which results in excellent uniformity (49.45%) while retrieving the PUF response. Secondly, a new PUF design b…

    Submitted 21 July, 2022; originally announced July 2022.
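    Uniformity, as commonly defined in PUF evaluation, is the share of 1s in a response expressed as a percentage, with 50% as the ideal; we assume that is the definition behind the 49.45% quoted above. The Python sketch below computes it for a list of response bits. The function name and the random sample in the usage block are hypothetical stand-ins, not data from the paper.

```python
def uniformity(response_bits):
    """Return the percentage of 1s in a PUF response (ideal value: 50.0)."""
    bits = list(response_bits)
    if not bits:
        raise ValueError("response must contain at least one bit")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("response bits must be 0 or 1")
    return 100.0 * sum(bits) / len(bits)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Hypothetical stand-in for measured response bits from many challenges.
    sample = [rng.getrandbits(1) for _ in range(10_000)]
    print(f"uniformity = {uniformity(sample):.2f}%")
```

    A value close to 50% means the PUF outputs 0 and 1 about equally often, which is what the equal output probabilities of the priority arbiter are meant to deliver.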

  17. arXiv:2202.12085

    cs.ET

    pHGen: A pH-Based Key Generation Mechanism Using ISFETs

    Authors: Elmira Moussavi, Dominik Sisejkovic, Fabian Brings, Daniyar Kizatov, Animesh Singh, Xuan Thang Vu, Sven Ingebrandt, Rainer Leupers, Vivek Pachauri, Farhad Merchant

    Abstract: Digital keys are a fundamental component of many hardware- and software-based security mechanisms. However, digital keys are limited to binary values and easily exploitable when stored in standard memories. In this paper, based on emerging technologies, we introduce pHGen, a potential-of-hydrogen (pH)-based key generation mechanism that leverages chemical reactions in the form of a potential chang…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Accepted in HOST 2022

  18. arXiv:2112.13157

    cs.AR cs.ET

    A Parallel SystemC Virtual Platform for Neuromorphic Architectures

    Authors: Melvin Galicia, Farhad Merchant, Rainer Leupers

    Abstract: With the increasing interest in neuromorphic computing, designers of embedded systems face the challenge of efficiently simulating such platforms to enable architecture design exploration early in the development cycle. Executing artificial neural network applications on neuromorphic systems which are being simulated on virtual platforms (VPs) is an extremely demanding computational task. Neverthe…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: Accepted at 23rd International Symposium on Quality Electronic Design (ISQED'22)

  19. NeuroHammer: Inducing Bit-Flips in Memristive Crossbar Memories

    Authors: Felix Staudigl, Hazem Al Indari, Daniel Schön, Dominik Sisejkovic, Farhad Merchant, Jan Moritz Joseph, Vikas Rana, Stephan Menzel, Rainer Leupers

    Abstract: Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we p…

    Submitted 6 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

  20. QFlow: Quantitative Information Flow for Security-Aware Hardware Design in Verilog

    Authors: Lennart M. Reimann, Luca Hanel, Dominik Sisejkovic, Farhad Merchant, Rainer Leupers

    Abstract: The enormous amount of code required to design modern hardware implementations often leads to critical vulnerabilities being overlooked. Especially vulnerabilities that compromise the confidentiality of sensitive data, such as cryptographic keys, have a major impact on the trustworthiness of an entire system. Information flow analysis can elaborate whether information from sensitive signals flows…

    Submitted 22 December, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: 5 pages, accepted at International Conference on Computer Design 2021 (ICCD)

    Journal ref: 2021 IEEE 39th International Conference on Computer Design (ICCD)
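    The abstract refers to information flow analysis, i.e., checking whether information from sensitive signals can reach other signals. QFlow additionally quantifies that flow, which is not reproduced here; the minimal Python sketch below is only the simpler structural (yes/no) check, assuming an acyclic netlist given as hypothetical (target, operands) pairs in topological order.

```python
def tainted_signals(assignments, sources):
    """Structural information-flow check over a combinational design.

    assignments: (target, operands) pairs in topological order, e.g.
                 ("t1", ["key", "pt"]) meaning t1 is computed from key and pt.
    sources:     names of sensitive signals (e.g. a key register).
    Returns every signal that may carry information from a source.
    """
    tainted = set(sources)
    for target, operands in assignments:
        # A target is tainted if any of its operands is tainted.
        if any(op in tainted for op in operands):
            tainted.add(target)
    return tainted


if __name__ == "__main__":
    # Hypothetical design: the ciphertext is expected to depend on the key,
    # but the debug port "dbg" should not.
    design = [
        ("t1", ["key", "pt"]),
        ("ct", ["t1"]),
        ("status", ["valid"]),
        ("dbg", ["t1", "status"]),
    ]
    print(sorted(tainted_signals(design, {"key"})))
```

    A single pass suffices only for acyclic, topologically ordered logic; designs with registers or feedback loops need the pass repeated until the tainted set stops growing.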

  21. Deceptive Logic Locking for Hardware Integrity Protection against Machine Learning Attacks

    Authors: Dominik Sisejkovic, Farhad Merchant, Lennart M. Reimann, Rainer Leupers

    Abstract: Logic locking has emerged as a prominent key-driven technique to protect the integrity of integrated circuits. However, novel machine-learning-based attacks have recently been introduced to challenge the security foundations of locking schemes. These attacks are able to recover a significant percentage of the key without having access to an activated circuit. This paper addresses this issue through…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Accepted at IEEE TCAD 2021

    Journal ref: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), July, 2021
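    Logic locking, described above as a key-driven technique, inserts key-controlled gates so that a circuit computes its intended function only under the correct key. The toy Python sketch below shows the simplest form of that idea, XOR/XNOR key gates on a three-input circuit. It illustrates generic key-gate locking, not the deceptive scheme proposed in the paper, and the circuit and key are hypothetical.

```python
from itertools import product


def original(a, b, c):
    """Toy 1-bit combinational circuit: out = (a AND b) OR c."""
    return (a & b) | c


def locked(a, b, c, k0, k1):
    """The same circuit with two hypothetical key gates inserted."""
    n1 = (a & b) ^ k0          # XOR key gate on an internal wire: transparent iff k0 == 0
    out = 1 - ((n1 | c) ^ k1)  # XNOR key gate on the output: transparent iff k1 == 1
    return out


def correct_keys():
    """Try every 2-bit key and keep those that restore the original function."""
    inputs = list(product((0, 1), repeat=3))
    return [
        key
        for key in product((0, 1), repeat=2)
        if all(locked(*x, *key) == original(*x) for x in inputs)
    ]
```

    For this toy circuit, correct_keys() leaves only the key (0, 1); every other key corrupts at least one output. Exhaustive search is feasible only at toy key sizes and is shown here solely to make the correct-key condition explicit.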

  22. Logic Locking at the Frontiers of Machine Learning: A Survey on Developments and Opportunities

    Authors: Dominik Sisejkovic, Lennart M. Reimann, Elmira Moussavi, Farhad Merchant, Rainer Leupers

    Abstract: In the past decade, a lot of progress has been made in the design and evaluation of logic locking, a premier technique to safeguard the integrity of integrated circuits throughout the electronics supply chain. However, the widespread proliferation of machine learning has recently introduced a new pathway to evaluating logic locking schemes. This paper summarizes the recent developments in logic lo…

    Submitted 23 November, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 6 pages, 3 figures, accepted at VLSI-SoC 2021

    Journal ref: 2021 IFIP/IEEE 29th International Conference on Very Large Scale Integration (VLSI-SoC)

  23. arXiv:2101.06665

    cs.AR cs.MS

    Brightening the Optical Flow through Posit Arithmetic

    Authors: Vinay Saxena, Ankitha Reddy, Jonathan Neudorfer, John Gustafson, Sangeeth Nambiar, Rainer Leupers, Farhad Merchant

    Abstract: As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit data format, proposed as a drop-in replacement for IEEE 754 float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segment…

    Submitted 17 January, 2021; originally announced January 2021.

    Comments: To appear in ISQED 2021
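    The posit format mentioned above encodes a number as a sign bit, a variable-length regime, up to es exponent bits, and a fraction, with value (-1)^s × useed^k × 2^e × (1 + f) where useed = 2^(2^es). The Python sketch below decodes an n-bit posit into a float, assuming the parameterized <n, es> format of the original posit proposals (the 2022 posit standard fixes es = 2). It is an illustration with a hypothetical function name, not the arithmetic used in the paper.

```python
import math


def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan  # NaR ("not a real")
    negative = bool(bits >> (n - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    body = n - 1  # number of bits after the sign bit
    lead = (bits >> (body - 1)) & 1
    # Regime: run of identical bits right after the sign bit.
    i, run = body - 1, 0
    while i >= 0 and ((bits >> i) & 1) == lead:
        run += 1
        i -= 1
    k = run - 1 if lead else -run
    # Bit i (if any) terminates the regime; bits i-1..0 remain.
    remaining = max(i, 0)
    tail = bits & ((1 << remaining) - 1)
    e_bits = min(es, remaining)
    # Exponent bits cut off by a long regime count as zeros.
    exp = (tail >> (remaining - e_bits)) << (es - e_bits)
    f_bits = remaining - e_bits
    frac = tail & ((1 << f_bits) - 1)
    mantissa = 1.0 + frac / (1 << f_bits)  # hidden bit is always 1
    value = math.ldexp(mantissa, k * (1 << es) + exp)
    return -value if negative else value
```

    For example, with n = 8 and es = 0, the pattern 0x40 decodes to 1.0, 0x50 to 1.5, and 0xC0 (the two's complement of 0x40) to -1.0.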

  24. arXiv:2101.05591

    cs.AR

    ANDROMEDA: An FPGA Based RISC-V MPSoC Exploration Framework

    Authors: Farhad Merchant, Dominik Sisejkovic, Lennart M. Reimann, Kirthihan Yasotharan, Thomas Grass, Rainer Leupers

    Abstract: With the growing demands of consumer electronic products, the computational requirements are increasing exponentially. Due to the applications' computational needs, computer architects are trying to pack as many cores as possible on a single die for accelerated execution of the application program codes. In a multiprocessor system-on-chip (MPSoC), striking a balance among the number of cores,…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: Accepted in VLSI Design 2021

  25. arXiv:2101.01416

    cs.AR

    An Investigation on Inherent Robustness of Posit Data Representation

    Authors: Ihsen Alouani, Anouar Ben Khalifa, Farhad Merchant, Rainer Leupers

    Abstract: As the dimensions and operating voltages of computer electronics shrink to cope with consumers' demand for higher performance and lower power consumption, circuit sensitivity to soft errors increases dramatically. Recently, a new data type called posit has been proposed in the literature. Posit arithmetic has absolute advantages such as higher numerical accuracy, speed, and simpler hardware de…

    Submitted 5 January, 2021; originally announced January 2021.

    Comments: To appear in VLSID 2021

  26. arXiv:2011.10389

    cs.CR cs.LG cs.NE

    Challenging the Security of Logic Locking Schemes in the Era of Deep Learning: A Neuroevolutionary Approach

    Authors: Dominik Sisejkovic, Farhad Merchant, Lennart M. Reimann, Harshit Srivastava, Ahmed Hallawa, Rainer Leupers

    Abstract: Logic locking is a prominent technique to protect the integrity of hardware designs throughout the integrated circuit design and fabrication flow. However, in recent years, the security of locking schemes has been thoroughly challenged by the introduction of various deobfuscation attacks. As in most research branches, deep learning is being introduced in the domain of logic locking as well. Theref…

    Submitted 30 November, 2020; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: 25 pages, 17 figures, accepted at ACM JETC

    Journal ref: ACM J. Emerg. Technol. Comput. Syst. 17, 3, Article 30 (May 2021), 26 pages

  27. arXiv:2010.12869

    cs.AR cs.AI cs.ET cs.PF

    ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems

    Authors: Suresh Nambi, Salim Ullah, Aditya Lohana, Siva Satyendra Sahoo, Farhad Merchant, Akash Kumar

    Abstract: The recent advances in machine learning, in general, and Artificial Neural Networks (ANN), in particular, have made smart embedded systems an attractive option for a larger number of application areas. However, the high computational complexity, memory footprints, and energy requirements of machine learning models hinder their deployment on resource-constrained embedded systems. Most state-of-the-a…

    Submitted 27 October, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

  28. arXiv:2006.00364

    cs.AR

    CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism

    Authors: Niraj Sharma, Riya Jain, Madhumita Mohan, Sachin Patkar, Rainer Leupers, Nikhil Rishiyur, Farhad Merchant

    Abstract: Many engineering and scientific applications require high precision arithmetic. IEEE 754-2008 compliant (floating-point) arithmetic is the de facto standard for performing these computations. Recently, posit arithmetic has been proposed as a drop-in replacement for floating-point arithmetic. The posit™ data representation and arithmetic claim several absolute advantages over the float…

    Submitted 27 October, 2021; v1 submitted 30 May, 2020; originally announced June 2020.

  29. arXiv:1803.05320

    cs.DC cs.AR cs.MS

    Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization

    Authors: Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan, Rainer Leupers

    Abstract: We present an efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore processors and General Purpose Graphics Processing Units (GPGPUs). GGR improves on the classical Givens Rotation (GR) operation in that it can annihilate multiple elements of rows and columns of an input…

    Submitted 23 March, 2018; v1 submitted 14 March, 2018; originally announced March 2018.
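    For context on the rotation that GGR generalizes, the Python sketch below computes a QR factorization with classical Givens rotations: each rotation zeroes exactly one subdiagonal element by mixing two adjacent rows. GGR's ability to annihilate several elements at once is not reproduced; this is only the GR baseline, in plain Python with hypothetical function names.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] == [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Classical Givens-rotation QR: returns (Q, R) with A = Q @ R."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, i.e. Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j from the bottom up, one element per rotation.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            # Apply the same rotation to rows i-1 and i of R and of Qt.
            for M in (R, Qt):
                for k in range(len(M[0])):
                    x, y = M[i - 1][k], M[i][k]
                    M[i - 1][k] = c * x + s * y
                    M[i][k] = -s * x + c * y
    Q = [[Qt[k][i] for k in range(m)] for i in range(m)]
    return Q, R
```

    For an n × n matrix the loop applies n(n-1)/2 rotations, one per subdiagonal element; this one-element-at-a-time pattern is what GGR departs from.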

  30. arXiv:1802.03650

    cs.MS cs.AR

    Achieving Efficient Realization of Kalman Filter on CGRA through Algorithm-Architecture Co-design

    Authors: Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan

    Abstract: In this paper, we present an efficient realization of the Kalman Filter (KF) that can achieve up to 65% of the theoretical peak performance of the underlying architecture platform. KF is realized using the Modified Faddeeva Algorithm (MFA) as a basic building block due to its versatility, and the REDEFINE Coarse Grained Reconfigurable Architecture (CGRA) is used as a platform for experiments since REDEFINE is capable…

    Submitted 10 February, 2018; originally announced February 2018.

    Comments: Accepted in ARC 2018
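    The abstract names the Modified Faddeeva Algorithm (MFA) as the building block. The classical Faddeeva algorithm it is named after computes D + C A^-1 B by Gaussian elimination on the compound matrix [[A, B], [-C, D]]: once the lower-left block has been zeroed using rows of the upper block, the lower-right block holds the result. The Python sketch below shows only this classical version, without pivoting (so it assumes every leading principal minor of A is nonzero); the paper's modifications and its mapping onto REDEFINE are not reproduced, and the function name is hypothetical.

```python
def faddeev(A, B, C, D):
    """Classical Faddeeva algorithm: return D + C A^-1 B.

    Gaussian elimination without pivoting on [[A, B], [-C, D]];
    assumes every leading principal minor of A is nonzero.
    """
    n = len(A)
    top = [[float(x) for x in A[i]] + [float(x) for x in B[i]] for i in range(n)]
    bottom = [[-float(x) for x in C[i]] + [float(x) for x in D[i]] for i in range(len(C))]
    M = top + bottom
    width = len(M[0])
    for k in range(n):  # pivots come only from the A block
        pivot = M[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        for i in range(k + 1, len(M)):
            f = M[i][k] / pivot
            for j in range(k, width):
                M[i][j] -= f * M[k][j]
    # The lower-left block is now zero; the lower-right block is D + C A^-1 B.
    return [row[n:] for row in M[n:]]
```

    Choosing the blocks selects the operation: with C = I and D = 0 the result is A^-1 B (a linear solve), and with B = I, C = I, D = 0 it is A^-1. In conventional Kalman filter notation (an assumption here, not taken from the abstract), the gain K = P H^T (H P H^T + R)^-1 fits the same pattern with A = H P H^T + R, B = I, C = P H^T, and D = 0.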

  31. Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization

    Authors: Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan

    Abstract: We present an efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design, achieving a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicore processors, General Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and ClearSpeed CSX700. Theoretical and experimental analysis of classical…

    Submitted 13 December, 2016; originally announced December 2016.
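    As a reference point for the Householder Transform (HT) based QR factorization discussed above, the Python sketch below is the textbook algorithm: each step builds a reflector H = I - 2 v v^T / (v^T v) that zeroes everything below the diagonal in one column. It is a plain, unoptimized illustration with a hypothetical function name; the paper's co-designed realization is not reproduced.

```python
import math


def householder_qr(A):
    """Textbook Householder QR: returns (Q, R) with A = Q @ R."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        # Reflect x onto alpha * e1; the sign choice avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vv = sum(t * t for t in v)
        # R <- H R, acting on rows k..m-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[k + i][j] -= f * v[i]
        # Q <- Q H (H is symmetric), acting on columns k..m-1.
        for r in range(m):
            f = 2.0 * sum(Q[r][k + i] * v[i] for i in range(len(v))) / vv
            for i in range(len(v)):
                Q[r][k + i] -= f * v[i]
    return Q, R
```

    Because each reflector is symmetric and orthogonal, accumulating Q as the running product H1 H2 ... gives A = Q R directly, without a separate transpose step.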

  32. arXiv:1610.08705

    cs.AR

    Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

    Authors: Farhad Merchant, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan

    Abstract: Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate performance of the HPC applications. Performance in such tuned packages is attained through tuning of several algorithmic and architectural parameters such as the number of parallel operations in the Directed Acyclic Graph of…

    Submitted 13 November, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

  33. arXiv:1610.06385

    cs.AR cs.MS

    Accelerating BLAS on Custom Architecture through Algorithm-Architecture Co-design

    Authors: Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S K Nandy, Ranjani Narayan

    Abstract: Basic Linear Algebra Subprograms (BLAS) play a key role in high performance and scientific computing applications. Experimentally, yesteryear multicore processors and General Purpose Graphics Processing Units (GPGPUs) are capable of achieving up to 15% and 57% of the theoretical peak performance at 65W and 240W, respectively, for compute bound operations like Double/Single Precision General Matrix Multiplication (X…

    Submitted 27 November, 2016; v1 submitted 20 October, 2016; originally announced October 2016.