-
Unified Anomaly Detection methods on Edge Device using Knowledge Distillation and Quantization
Authors:
Sushovan Jena,
Arya Pulkit,
Kajal Singh,
Anoushka Banerjee,
Sharad Joshi,
Ananth Ganesh,
Dinesh Singh,
Arnav Bhavsar
Abstract:
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting separate models for each class. On the contrary, unified models eliminate the…
▽ More
With the rapid advances in deep learning and smart manufacturing in Industry 4.0, there is an imperative for high-throughput, high-performance, and fully integrated visual inspection systems. Most anomaly detection approaches using defect detection datasets, such as MVTec AD, employ one-class models that require fitting separate models for each class. On the contrary, unified models eliminate the need for fitting separate models for each class and significantly reduce cost and memory requirements. Thus, in this work, we experiment with considering a unified multi-class setup. Our experimental study shows that multi-class models perform at par with one-class models for the standard MVTec AD dataset. Hence, this indicates that there may not be a need to learn separate object/class-wise models when the object classes are significantly different from each other, as is the case of the dataset considered. Furthermore, we have deployed three different unified lightweight architectures on the CPU and an edge device (NVIDIA Jetson Xavier NX). We analyze the quantized multi-class anomaly detection models in terms of latency and memory requirements for deployment on the edge device while comparing quantization-aware training (QAT) and post-training quantization (PTQ) for performance at different precision widths. In addition, we explored two different methods of calibration required in post-training scenarios and show that one of them performs notably better, highlighting its importance for unsupervised tasks. Due to quantization, the performance drop in PTQ is further compensated by QAT, which yields at par performance with the original 32-bit Floating point in two of the models considered.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
EEG classification for visual brain decoding with spatio-temporal and transformer based paradigms
Authors:
Akanksha Sharma,
Jyoti Nigam,
Abhishek Rathore,
Arnav Bhavsar
Abstract:
In this work, we delve into the EEG classification task in the domain of visual brain decoding via two frameworks, involving two different learning paradigms. Considering the spatio-temporal nature of EEG data, one of our frameworks is based on a CNN-BiLSTM model. The other involves a CNN-Transformer architecture which inherently involves the more versatile attention based learning paradigm. In bo…
▽ More
In this work, we delve into the EEG classification task in the domain of visual brain decoding via two frameworks, involving two different learning paradigms. Considering the spatio-temporal nature of EEG data, one of our frameworks is based on a CNN-BiLSTM model. The other involves a CNN-Transformer architecture which inherently involves the more versatile attention based learning paradigm. In both cases, a special 1D-CNN feature extraction module is used to generate the initial embeddings with 1D convolutions in the time and the EEG channel domains. Considering the EEG signals are noisy, non stationary and the discriminative features are even less clear (than in semantically structured data such as text or image), we also follow a window-based classification followed by majority voting during inference, to yield labels at a signal level. To illustrate how brain patterns correlate with different image classes, we visualize t-SNE plots of the BiLSTM embeddings alongside brain activation maps for the top 10 classes. These visualizations provide insightful revelations into the distinct neural signatures associated with each visual category, showcasing the BiLSTM's capability to capture and represent the discriminative brain activity linked to visual stimuli. We demonstrate the performance of our approach on the updated EEG-Imagenet dataset with positive comparisons with state-of-the-art methods.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Attend, Distill, Detect: Attention-aware Entropy Distillation for Anomaly Detection
Authors:
Sushovan Jena,
Vishwas Saini,
Ujjwal Shaw,
Pavitra Jain,
Abhay Singh Raihal,
Anoushka Banerjee,
Sharad Joshi,
Ananth Ganesh,
Arnav Bhavsar
Abstract:
Unsupervised anomaly detection encompasses diverse applications in industrial settings where a high-throughput and precision is imperative. Early works were centered around one-class-one-model paradigm, which poses significant challenges in large-scale production environments. Knowledge-distillation based multi-class anomaly detection promises a low latency with a reasonably good performance but w…
▽ More
Unsupervised anomaly detection encompasses diverse applications in industrial settings where a high-throughput and precision is imperative. Early works were centered around one-class-one-model paradigm, which poses significant challenges in large-scale production environments. Knowledge-distillation based multi-class anomaly detection promises a low latency with a reasonably good performance but with a significant drop as compared to one-class version. We propose a DCAM (Distributed Convolutional Attention Module) which improves the distillation process between teacher and student networks when there is a high variance among multiple classes or objects. Integrated multi-scale feature matching strategy to utilise a mixture of multi-level knowledge from the feature pyramid of the two networks, intuitively helping in detecting anomalies of varying sizes which is also an inherent problem in the multi-class scenario. Briefly, our DCAM module consists of Convolutional Attention blocks distributed across the feature maps of the student network, which essentially learns to masks the irrelevant information during student learning alleviating the "cross-class interference" problem. This process is accompanied by minimizing the relative entropy using KL-Divergence in Spatial dimension and a Channel-wise Cosine Similarity between the same feature maps of teacher and student. The losses enables to achieve scale-invariance and capture non-linear relationships. We also highlight that the DCAM module would only be used during training and not during inference as we only need the learned feature maps and losses for anomaly scoring and hence, gaining a performance gain of 3.92% than the multi-class baseline with a preserved latency.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Evaluating the efficacy of haptic feedback, 360° treadmill-integrated Virtual Reality framework and longitudinal training on decision-making performance in a complex search-and-shoot simulation
Authors:
Akash K Rao,
Arnav Bhavsar,
Shubhajit Roy Chowdhury,
Sushil Chandra,
Ramsingh Negi,
Prakash Duraisamy,
Varun Dutt
Abstract:
Virtual Reality (VR) has made significant strides, offering users a multitude of ways to interact with virtual environments. Each sensory modality in VR provides distinct inputs and interactions, enhancing the user's immersion and presence. However, the potential of additional sensory modalities, such as haptic feedback and 360° locomotion, to improve decision-making performance has not been thoro…
▽ More
Virtual Reality (VR) has made significant strides, offering users a multitude of ways to interact with virtual environments. Each sensory modality in VR provides distinct inputs and interactions, enhancing the user's immersion and presence. However, the potential of additional sensory modalities, such as haptic feedback and 360° locomotion, to improve decision-making performance has not been thoroughly investigated. This study addresses this gap by evaluating the impact of a haptic feedback, 360° locomotion-integrated VR framework and longitudinal, heterogeneous training on decision-making performance in a complex search-and-shoot simulation. The study involved 32 participants from a defence simulation base in India, who were randomly divided into two groups: experimental (haptic feedback, 360° locomotion-integrated VR framework with longitudinal, heterogeneous training) and placebo control (longitudinal, heterogeneous VR training without extrasensory modalities). The experiment lasted 10 days. On Day 1, all subjects executed a search-and-shoot simulation closely replicating the elements/situations in the real world. From Day 2 to Day 9, the subjects underwent heterogeneous training, imparted by the design of various complexity levels in the simulation using changes in behavioral attributes/artificial intelligence of the enemies. On Day 10, they repeated the search-and-shoot simulation executed on Day 1. The results showed that the experimental group experienced a gradual increase in presence, immersion, and engagement compared to the placebo control group. However, there was no significant difference in decision-making performance between the two groups on day 10. We intend to use these findings to design multisensory VR training frameworks that enhance engagement levels and decision-making performance.
△ Less
Submitted 14 April, 2024;
originally announced April 2024.
-
Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN
Authors:
Rahul Mishra,
Arnav Bhavsar
Abstract:
In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transforme…
▽ More
In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Classification of attention performance post-longitudinal tDCS via functional connectivity and machine learning methods
Authors:
Akash K Rao,
Vishnu K Menon,
Arnav Bhavsar,
Shubhajit Roy Chowdhury,
Ramsingh Negi,
Varun Dutt
Abstract:
Attention is the brain's mechanism for selectively processing specific stimuli while filtering out irrelevant information. Characterizing changes in attention following long-term interventions (such as transcranial direct current stimulation (tDCS)) has seldom been emphasized in the literature. To classify attention performance post-tDCS, this study uses functional connectivity and machine learnin…
▽ More
Attention is the brain's mechanism for selectively processing specific stimuli while filtering out irrelevant information. Characterizing changes in attention following long-term interventions (such as transcranial direct current stimulation (tDCS)) has seldom been emphasized in the literature. To classify attention performance post-tDCS, this study uses functional connectivity and machine learning algorithms. Fifty individuals were split into experimental and control conditions. On Day 1, EEG data was obtained as subjects executed an attention task. From Day 2 through Day 8, the experimental group was administered 1mA tDCS, while the control group received sham tDCS. On Day 10, subjects repeated the task mentioned on Day 1. Functional connectivity metrics were used to classify attention performance using various machine learning methods. Results revealed that combining the Adaboost model and recursive feature elimination yielded a classification accuracy of 91.84%. We discuss the implications of our results in developing neurofeedback frameworks to assess attention.
△ Less
Submitted 31 January, 2024;
originally announced February 2024.
-
Prediction of multitasking performance post-longitudinal tDCS via EEG-based functional connectivity and machine learning methods
Authors:
Akash K Rao,
Shashank Uttrani,
Vishnu K Menon,
Darshil Shah,
Arnav Bhavsar,
Shubhajit Roy Chowdhury,
Varun Dutt
Abstract:
Predicting and understanding the changes in cognitive performance, especially after a longitudinal intervention, is a fundamental goal in neuroscience. Longitudinal brain stimulation-based interventions like transcranial direct current stimulation (tDCS) induce short-term changes in the resting membrane potential and influence cognitive processes. However, very little research has been conducted o…
▽ More
Predicting and understanding the changes in cognitive performance, especially after a longitudinal intervention, is a fundamental goal in neuroscience. Longitudinal brain stimulation-based interventions like transcranial direct current stimulation (tDCS) induce short-term changes in the resting membrane potential and influence cognitive processes. However, very little research has been conducted on predicting these changes in cognitive performance post-intervention. In this research, we intend to address this gap in the literature by employing different EEG-based functional connectivity analyses and machine learning algorithms to predict changes in cognitive performance in a complex multitasking task. Forty subjects were divided into experimental and active-control conditions. On Day 1, all subjects executed a multitasking task with simultaneous 32-channel EEG being acquired. From Day 2 to Day 7, subjects in the experimental condition undertook 15 minutes of 2mA anodal tDCS stimulation during task training. Subjects in the active-control condition undertook 15 minutes of sham stimulation during task training. On Day 10, all subjects again executed the multitasking task with EEG acquisition. Source-level functional connectivity metrics, namely phase lag index and directed transfer function, were extracted from the EEG data on Day 1 and Day 10. Various machine learning models were employed to predict changes in cognitive performance. Results revealed that the multi-layer perceptron and directed transfer function recorded a cross-validation training RMSE of 5.11% and a test RMSE of 4.97%. We discuss the implications of our results in developing real-time cognitive state assessors for accurately predicting cognitive performance in dynamic and complex tasks post-tDCS intervention
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
Image Forgery Detection with Interpretability
Authors:
Ankit Katiyar,
Arnav Bhavsar
Abstract:
In this work, we present a learning based method focusing on the convolutional neural network (CNN) architecture to detect these forgeries. We consider the detection of both copy-move forgeries and inpainting based forgeries. For these, we synthesize our own large dataset. In addition to classification, the focus is also on interpretability of the forgery detection. As the CNN classification yield…
▽ More
In this work, we present a learning based method focusing on the convolutional neural network (CNN) architecture to detect these forgeries. We consider the detection of both copy-move forgeries and inpainting based forgeries. For these, we synthesize our own large dataset. In addition to classification, the focus is also on interpretability of the forgery detection. As the CNN classification yields the image-level label, it is important to understand if forged region has indeed contributed to the classification. For this purpose, we demonstrate using the Grad-CAM heatmap, that in various correctly classified examples, that the forged region is indeed the region contributing to the classification. Interestingly, this is also applicable for small forged regions, as is depicted in our results. Such an analysis can also help in establishing the reliability of the classification.
△ Less
Submitted 2 February, 2022;
originally announced February 2022.
-
Stain Normalized Breast Histopathology Image Recognition using Convolutional Neural Networks for Cancer Detection
Authors:
Sruthi Krishna,
Suganthi S. S,
Shivsubramani Krishnamoorthy,
Arnav Bhavsar
Abstract:
Computer assisted diagnosis in digital pathology is becoming ubiquitous as it can provide more efficient and objective healthcare diagnostics. Recent advances have shown that the convolutional Neural Network (CNN) architectures, a well-established deep learning paradigm, can be used to design a Computer Aided Diagnostic (CAD) System for breast cancer detection. However, the challenges due to stain…
▽ More
Computer assisted diagnosis in digital pathology is becoming ubiquitous as it can provide more efficient and objective healthcare diagnostics. Recent advances have shown that the convolutional Neural Network (CNN) architectures, a well-established deep learning paradigm, can be used to design a Computer Aided Diagnostic (CAD) System for breast cancer detection. However, the challenges due to stain variability and the effect of stain normalization with such deep learning frameworks are yet to be well explored. Moreover, performance analysis with arguably more efficient network models, which may be important for high throughput screening, is also not well explored.To address this challenge, we consider some contemporary CNN models for binary classification of breast histopathology images that involves (1) the data preprocessing with stain normalized images using an adaptive colour deconvolution (ACD) based color normalization algorithm to handle the stain variabilities; and (2) applying transfer learning based training of some arguably more efficient CNN models, namely Visual Geometry Group Network (VGG16), MobileNet and EfficientNet. We have validated the trained CNN networks on a publicly available BreaKHis dataset, for 200x and 400x magnified histopathology images. The experimental analysis shows that pretrained networks in most cases yield better quality results on data augmented breast histopathology images with stain normalization, than the case without stain normalization. Further, we evaluated the performance and efficiency of popular lightweight networks using stain normalized images and found that EfficientNet outperforms VGG16 and MobileNet in terms of test accuracy and F1 Score. We observed that efficiency in terms of test time is better in EfficientNet than other networks; VGG Net, MobileNet, without much drop in the classification accuracy.
△ Less
Submitted 3 January, 2022;
originally announced January 2022.
-
MHATC: Autism Spectrum Disorder identification utilizing multi-head attention encoder along with temporal consolidation modules
Authors:
Ranjeet Ranjan Jha,
Abhishek Bhardwaj,
Devin Garg,
Arnav Bhavsar,
Aditya Nigam
Abstract:
Resting-state fMRI is commonly used for diagnosing Autism Spectrum Disorder (ASD) by using network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating based on connectivity patterns among imaging data of the control population and that of ASD patients' brains is a non-trivial task. In order to tackle said c…
▽ More
Resting-state fMRI is commonly used for diagnosing Autism Spectrum Disorder (ASD) by using network-based functional connectivity. It has been shown that ASD is associated with brain regions and their inter-connections. However, discriminating based on connectivity patterns among imaging data of the control population and that of ASD patients' brains is a non-trivial task. In order to tackle said classification task, we propose a novel deep learning architecture (MHATC) consisting of multi-head attention and temporal consolidation modules for classifying an individual as a patient of ASD. The devised architecture results from an in-depth analysis of the limitations of current deep neural network solutions for similar applications. Our approach is not only robust but computationally efficient, which can allow its adoption in a variety of other research and clinical settings.
△ Less
Submitted 27 December, 2021;
originally announced January 2022.
-
Virtual Screening of Pharmaceutical Compounds with hERG Inhibitory Activity (Cardiotoxicity) using Ensemble Learning
Authors:
Aditya Sarkar,
Arnav Bhavsar
Abstract:
In silico prediction of cardiotoxicity with high sensitivity and specificity for potential drug molecules can be of immense value. Hence, building machine learning classification models, based on some features extracted from the molecular structure of drugs, which are capable of efficiently predicting cardiotoxicity is critical. In this paper, we consider the application of various machine learnin…
▽ More
In silico prediction of cardiotoxicity with high sensitivity and specificity for potential drug molecules can be of immense value. Hence, building machine learning classification models, based on some features extracted from the molecular structure of drugs, which are capable of efficiently predicting cardiotoxicity is critical. In this paper, we consider the application of various machine learning approaches, and then propose an ensemble classifier for the prediction of molecular activity on a Drug Discovery Hackathon (DDH) (1st reference) dataset. We have used only 2-D descriptors of SMILE notations for our prediction. Our ensemble classification uses 5 classifiers (2 Random Forest Classifiers, 2 Support Vector Machines and a Dense Neural Network) and uses Max-Voting technique and Weighted-Average technique for final decision.
△ Less
Submitted 5 June, 2021;
originally announced June 2021.
-
Semantic Features Aided Multi-Scale Reconstruction of Inter-Modality Magnetic Resonance Images
Authors:
Preethi Srinivasan,
Prabhjot Kaur,
Aditya Nigam,
Arnav Bhavsar
Abstract:
Long acquisition time (AQT) due to series acquisition of multi-modality MR images (especially T2 weighted images (T2WI) with longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep network based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by u…
▽ More
Long acquisition time (AQT) due to series acquisition of multi-modality MR images (especially T2 weighted images (T2WI) with longer AQT), though beneficial for disease diagnosis, is practically undesirable. We propose a novel deep network based solution to reconstruct T2W images from T1W images (T1WI) using an encoder-decoder architecture. The proposed learning is aided with semantic features by using multi-channel input with intensity values and gradient of image in two orthogonal directions. A reconstruction module (RM) augmenting the network along with a domain adaptation module (DAM) which is an encoder-decoder model built-in with sharp bottleneck module (SBM) is trained via modular training. The proposed network significantly reduces the total AQT with negligible qualitative artifacts and quantitative loss (reconstructs one volume in approximately 1 second). The testing is done on publicly available dataset with real MR images, and the proposed network shows (approximately 1dB) increase in PSNR over SOTA.
△ Less
Submitted 22 June, 2020;
originally announced June 2020.
-
Detecting Deepfakes with Metric Learning
Authors:
Akash Kumar,
Arnav Bhavsar
Abstract:
With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging on a very loose thread. On social media platforms, videos are widely circulated often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfakes classification in high com…
▽ More
With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging on a very loose thread. On social media platforms, videos are widely circulated often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfakes classification in high compression scenario and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Using less number of frames per video to assess its realism, the metric learning approach using a triplet network architecture proves to be fruitful. It learns to enhance the feature space distance between the cluster of real and fake videos embedding vectors. We validated our approaches on two datasets to analyze the behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms where data compression is inevitable.
△ Less
Submitted 19 March, 2020;
originally announced March 2020.
-
Copy-Move Forgery Classification via Unsupervised Domain Adaptation
Authors:
Akash Kumar,
Arnav Bhavsar
Abstract:
In the current era, image manipulation is becoming increasingly easier, yielding more natural looking images, owing to the modern tools in image processing and computer vision techniques. The task of the segregation of forged images has become very challenging. To tackle such problems, publicly available datasets are insufficient. In this paper, we propose to create a synthetic forged dataset usin…
▽ More
In the current era, image manipulation is becoming increasingly easier, yielding more natural looking images, owing to the modern tools in image processing and computer vision techniques. The task of the segregation of forged images has become very challenging. To tackle such problems, publicly available datasets are insufficient. In this paper, we propose to create a synthetic forged dataset using deep semantic image inpainting algorithm. Furthermore, we use an unsupervised domain adaptation network to detect copy-move forgery in images. Our approach can be helpful in those cases, where the classification of data is unavailable.
△ Less
Submitted 14 November, 2019;
originally announced November 2019.
-
Role of Class-specific Features in Various Classification Frameworks for Human Epithelial (HEp-2) Cell Images
Authors:
Vibha Gupta,
Arnav Bhavsar
Abstract:
The antinuclear antibody detection with human epithelial cells is a popular approach for autoimmune diseases diagnosis. The manual evaluation demands time, effort and capital, and automation in screening can greatly aid the physicians in these respects. In this work, we employ simple, efficient and visually more interpretable, class-specific features which defined based on the visual characteristi…
▽ More
The antinuclear antibody detection with human epithelial cells is a popular approach for autoimmune diseases diagnosis. The manual evaluation demands time, effort and capital, and automation in screening can greatly aid the physicians in these respects. In this work, we employ simple, efficient and visually more interpretable, class-specific features which defined based on the visual characteristics of each class. We believe that defining features with a good visual interpretation, is indeed important in a scenario, where such an approach is used in an interactive CAD system for pathologists. Considering that problem consists of few classes, and our rather simplistic feature definitions, frameworks can be structured as hierarchies of various binary classifiers. These variants include frameworks which are earlier explored and some which are not explored for this task. We perform various experiments which include traditional texture features and demonstrate the effectiveness of class-specific features in various frameworks. We make insightful comparisons between different types of classification frameworks given their silent aspects and pros and cons over each other. We also demonstrate an experiment with only intermediates samples for testing. The proposed work yields encouraging results with respect to the state-of-the-art and highlights the role of class-specific features in different classification frameworks.
△ Less
Submitted 30 October, 2018;
originally announced October 2018.
-
Considerations for a PAP Smear Image Analysis System with CNN Features
Authors:
Srishti Gautam,
Harinarayan K. K.,
Nirmal Jith,
Anil K. Sao,
Arnav Bhavsar,
Adarsh Natarajan
Abstract:
It has been shown that for automated PAP-smear image classification, nucleus features can be very informative. Therefore, the primary step for automated screening can be cell-nuclei detection followed by segmentation of nuclei in the resulting single cell PAP-smear images. We propose a patch based approach using CNN for segmentation of nuclei in single cell images. We then pose the question of ion…
▽ More
It has been shown that for automated PAP-smear image classification, nucleus features can be very informative. Therefore, the primary step for automated screening can be cell-nuclei detection followed by segmentation of nuclei in the resulting single cell PAP-smear images. We propose a patch based approach using CNN for segmentation of nuclei in single cell images. We then pose the question of ion of segmentation for classification using representation learning with CNN, and whether low-level CNN features may be useful for classification. We suggest a CNN-based feature level analysis and a transfer learning based approach for classification using both segmented as well full single cell images. We also propose a decision-tree based approach for classification. Experimental results demonstrate the effectiveness of the proposed algorithms individually (with low-level CNN features), and simultaneously proving the sufficiency of cell-nuclei detection (rather than accurate segmentation) for classification. Thus, we propose a system for analysis of multi-cell PAP-smear images consisting of a simple nuclei detection algorithm followed by classification using transfer learning.
△ Less
Submitted 23 June, 2018;
originally announced June 2018.
-
Classifying Object Manipulation Actions based on Grasp-types and Motion-Constraints
Authors:
Kartik Gupta,
Darius Burschka,
Arnav Bhavsar
Abstract:
In this work, we address a challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to the variations in geometrical and motion constraints, there are different manipulations actions possible to perform different sets of actions with an object. Also, there are subtle movements involved to complete most of object manipulation actions. This makes the tas…
▽ More
In this work, we address a challenging problem of fine-grained and coarse-grained recognition of object manipulation actions. Due to the variations in geometrical and motion constraints, there are different manipulations actions possible to perform different sets of actions with an object. Also, there are subtle movements involved to complete most of object manipulation actions. This makes the task of object manipulation action recognition difficult with only just the motion information. We propose to use grasp and motion-constraints information to recognise and understand action intention with different objects. We also provide an extensive experimental evaluation on the recent Yale Human Grasping dataset consisting of large set of 455 manipulation actions. The evaluation involves a) Different contemporary multi-class classifiers, and binary classifiers with one-vs-one multi- class voting scheme, b) Differential comparisons results based on subsets of attributes involving information of grasp and motion-constraints, c) Fine-grained and Coarse-grained object manipulation action recognition based on fine-grained as well as coarse-grained grasp type information, and d) Comparison between Instance level and Sequence level modeling of object manipulation actions. Our results justifies the efficacy of grasp attributes for the task of fine-grained and coarse-grained object manipulation action recognition.
△ Less
Submitted 20 June, 2018;
originally announced June 2018.
-
Learning to Decode 7T-like MR Image Reconstruction from 3T MR Images
Authors:
Aditya Sharma,
Prabhjot Kaur,
Aditya Nigam,
Arnav Bhavsar
Abstract:
Increasing demand for high field magnetic resonance (MR) scanner indicates the need for high-quality MR images for accurate medical diagnosis. However, cost constraints, instead, motivate a need for algorithms to enhance images from low field scanners. We propose an approach to process the given low field (3T) MR image slices to reconstruct the corresponding high field (7T-like) slices. Our framew…
▽ More
Increasing demand for high field magnetic resonance (MR) scanner indicates the need for high-quality MR images for accurate medical diagnosis. However, cost constraints, instead, motivate a need for algorithms to enhance images from low field scanners. We propose an approach to process the given low field (3T) MR image slices to reconstruct the corresponding high field (7T-like) slices. Our framework involves a novel architecture of a merged convolutional autoencoder with a single encoder and multiple decoders. Specifically, we employ three decoders with random initializations, and the proposed training approach involves selection of a particular decoder in each weight-update iteration for back propagation. We demonstrate that the proposed algorithm outperforms some related contemporary methods in terms of performance and reconstruction time.
△ Less
Submitted 18 June, 2018;
originally announced June 2018.
-
Siamese LSTM based Fiber Structural Similarity Network (FS2Net) for Rotation Invariant Brain Tractography Segmentation
Authors:
Shreyas Malakarjun Patil,
Aditya Nigam,
Arnav Bhavsar,
Chiranjoy Chattopadhyay
Abstract:
In this paper, we propose a novel deep learning architecture combining stacked Bi-directional LSTM and LSTMs with the Siamese network architecture for segmentation of brain fibers, obtained from tractography data, into anatomically meaningful clusters. The proposed network learns the structural difference between fibers of different classes, which enables it to classify fibers with high accuracy.…
▽ More
In this paper, we propose a novel deep learning architecture combining stacked Bi-directional LSTM and LSTMs with the Siamese network architecture for segmentation of brain fibers, obtained from tractography data, into anatomically meaningful clusters. The proposed network learns the structural difference between fibers of different classes, which enables it to classify fibers with high accuracy. Importantly, capturing such deep inter and intra class structural relationship also ensures that the segmentation is robust to relative rotation among test and training data, hence can be used with unregistered data. Our extensive experimentation over order of hundred-thousands of fibers show that the proposed model achieves state-of-the-art results, even in cases of large relative rotations between test and training data.
△ Less
Submitted 28 December, 2017;
originally announced December 2017.
-
Shape Estimation from Defocus Cue for Microscopy Images via Belief Propagation
Authors:
Arnav Bhavsar
Abstract:
In recent years, the usefulness of 3D shape estimation is being realized in microscopic or close-range imaging, as the 3D information can further be used in various applications. Due to limited depth of field at such small distances, the defocus blur induced in images can provide information about the 3D shape of the object. The task of `shape from defocus' (SFD), involves the problem of estimatin…
▽ More
In recent years, the usefulness of 3D shape estimation is being realized in microscopic or close-range imaging, as the 3D information can further be used in various applications. Due to limited depth of field at such small distances, the defocus blur induced in images can provide information about the 3D shape of the object. The task of `shape from defocus' (SFD), involves the problem of estimating good quality 3D shape estimates from images with depth-dependent defocus blur. While the research area of SFD is quite well-established, the approaches have largely demonstrated results on objects with bulk/coarse shape variation. However, in many cases, objects studied under microscopes often involve fine/detailed structures, which have not been explicitly considered in most methods. In addition, given that, in recent years, large data volumes are typically associated with microscopy related applications, it is also important for such SFD methods to be efficient. In this work, we provide an indication of the usefulness of the Belief Propagation (BP) approach in addressing these concerns for SFD. BP has been known to be an efficient combinatorial optimization approach, and has been empirically demonstrated to yield good quality solutions in low-level vision problems such as image restoration, stereo disparity estimation etc. For exploiting the efficiency of BP in SFD, we assume local space-invariance of the defocus blur, which enables the application of BP in a straightforward manner. Even with such an assumption, the ability of BP to provide good quality solutions while using non-convex priors, reflects in yielding plausible shape estimates in presence of fine structures on the objects under microscopy imaging.
△ Less
Submitted 30 December, 2016;
originally announced December 2016.
-
Resolution Enhancement of Range Images via Color-Image Segmentation
Authors:
Arnav Bhavsar
Abstract:
We report a method for super-resolution of range images. Our approach leverages the interpretation of LR image as sparse samples on the HR grid. Based on this interpretation, we demonstrate that our recently reported approach, which reconstructs dense range images from sparse range data by exploiting a registered colour image, can be applied for the task of resolution enhancement of range images.…
▽ More
We report a method for super-resolution of range images. Our approach leverages the interpretation of LR image as sparse samples on the HR grid. Based on this interpretation, we demonstrate that our recently reported approach, which reconstructs dense range images from sparse range data by exploiting a registered colour image, can be applied for the task of resolution enhancement of range images. Our method only uses a single colour image in addition to the range observation in the super-resolution process. Using the proposed approach, we demonstrate super-resolution results for large factors (e.g. 4) with good localization accuracy.
△ Less
Submitted 28 October, 2012;
originally announced October 2012.
-
Analysis of Magnification in Depth from Defocus
Authors:
Arnav Bhavsar
Abstract:
In depth from defocus (DFD), when images are captured with different camera parameters, a relative magnification is induced between them. Image warping is a simpler solution to account for magnification than seemingly more accurate optical approaches. This work is an investigation into the effects of magnification on the accuracy of DFD. We comment on issues regarding scaling effect on relative bl…
▽ More
In depth from defocus (DFD), when images are captured with different camera parameters, a relative magnification is induced between them. Image warping is a simpler solution to account for magnification than seemingly more accurate optical approaches. This work is an investigation into the effects of magnification on the accuracy of DFD. We comment on issues regarding scaling effect on relative blur computation. We statistically analyze accountability of scale factor, commenting on the bias and efficiency of the estimator that does not consider scale. We also discuss the effect of interpolation errors on blur estimation in a warping based solution to handle magnification and carry out experimental analysis to comment on the blur estimation accuracy.
△ Less
Submitted 5 November, 2012; v1 submitted 28 March, 2012;
originally announced March 2012.