-
Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities
Authors:
Mukaram Shahid,
Kunal Das,
Taimoor Ul Islam,
Christ Somiah,
Daji Qiao,
Arsalan Ahmad,
Jimming Song,
Zhengyuan Zhu,
Sarath Babu,
Yong Guan,
Tusher Chakraborty,
Suraj Jog,
Ranveer Chandra,
Hongwei Zhang
Abstract:
Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to high-speed Internet in under-served areas without the additional cost of expensive licensed spectrum. However, the current methods to utilize these white spaces are inefficient due to very conservative models and spectrum policies, causing under-utilization of valuable spectrum resources. This hampers the full potential of innovative wireless technologies that could benefit farmers, small Internet Service Providers (ISPs), or Mobile Network Operators (MNOs) operating in rural regions. This study explores the challenges faced by farmers and service providers when using shared spectrum bands to deploy their networks while ensuring maximum system performance and minimizing interference with other users. Additionally, we discuss how spatiotemporal spectrum models, in conjunction with database-driven spectrum-sharing solutions, can enhance the allocation and management of spectrum resources, ultimately improving the efficiency and reliability of wireless networks operating in shared spectrum bands.
Submitted 5 July, 2024;
originally announced July 2024.
-
Exploring Wireless Channels in Rural Areas: A Comprehensive Measurement Study
Authors:
Tianyi Zhang,
Guoying Zu,
Taimoor Ul Islam,
Evan Gossling,
Sarath Babu,
Daji Qiao,
Hongwei Zhang
Abstract:
The study of wireless channel behavior has been an active research topic for many years. However, there exists a noticeable scarcity of studies focusing on wireless channel characteristics in rural areas. With the advancement of smart agriculture practices in rural regions, there has been an increasing demand for affordable, high-capacity, and low-latency wireless networks to support various precision agriculture applications such as plant phenotyping, livestock health monitoring, and agriculture automation. To address this research gap, we conducted a channel measurement study on multiple wireless frequency bands at various crop and livestock farms near Ames, Iowa, based on Iowa State University (ISU)'s ARA Wireless Living Lab, one of the NSF PAWR platforms. We specifically investigate the impact of weather conditions, humidity, temperature, and farm buildings on wireless channel behavior. The resulting measurement dataset, which will soon be made publicly accessible, represents a valuable resource for researchers interested in wireless channel prediction and optimization.
Submitted 26 April, 2024;
originally announced April 2024.
-
iSeg: Interactive 3D Segmentation via Interactive Attention
Authors:
Itai Lang,
Fei Xu,
Dale Decatur,
Sudarshan Babu,
Rana Hanocka
Abstract:
We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is challenging since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.
Submitted 4 April, 2024;
originally announced April 2024.
-
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Authors:
Sudarshan Babu,
Richard Liu,
Avery Zhou,
Michael Maire,
Greg Shakhnarovich,
Rana Hanocka
Abstract:
We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes -- either zero-shot or with a few finetuning steps. Finetuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
Submitted 13 June, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Evaluation of Deep Neural Operator Models toward Ocean Forecasting
Authors:
Ellery Rajagopal,
Anantha N. S. Babu,
Tony Ryu,
Patrick J. Haley Jr.,
Chris Mirabito,
Pierre F. J. Lermusiaux
Abstract:
Data-driven, deep-learning modeling frameworks have been recently developed for forecasting time series data. Such machine learning models may be useful in multiple domains including the atmospheric and oceanic ones, and in general, the larger fluids community. The present work investigates the possible effectiveness of such deep neural operator models for reproducing and predicting classic fluid flows and simulations of realistic ocean dynamics. We first briefly evaluate the capabilities of such deep neural operator models when trained on a simulated two-dimensional fluid flow past a cylinder. We then investigate their application to forecasting ocean surface circulation in the Middle Atlantic Bight and Massachusetts Bay, learning from high-resolution data-assimilative simulations employed for real sea experiments. We confirm that trained deep neural operator models are capable of predicting idealized periodic eddy shedding. For realistic ocean surface flows and our preliminary study, they can predict several of the features and show some skill, providing potential for future research and applications.
Submitted 22 August, 2023;
originally announced August 2023.
-
Emotion Recognition for Challenged People Facial Appearance in Social using Neural Network
Authors:
P. Deivendran,
P. Suresh Babu,
G. Malathi,
K. Anbazhagan,
R. Senthil Kumar
Abstract:
Human communication uses vocal and nonverbal signals to communicate with others. Human expression is a significant biometric object in the picture and record databases of surveillance systems. Face recognition has a serious role in biometric methods and is attractive for many applications, including visual surveillance and security. Facial expressions are a form of nonverbal communication; recognizing them helps improve human-machine interaction. This paper proposes an approach for face- and illumination-invariant recognition of facial expressions from images. From these images, the person's face can be computed. The facial expression is passed to a CNN classifier, a deep feed-forward artificial neural network, which categorizes the acquired picture into different emotion categories. The outcome surpasses human performance and shows alternate performance across poses. Varying lighting conditions can influence the fitting process and reduce recognition precision. The results illustrate that reliable facial-expression recognition under changing lighting conditions is an efficient representation of pure and mixed emotional expressions. This process can also handle the proportions of different basic emotional expressions that are mixed together to produce realistic emotional facial expressions. Our system contains a pre-defined data set, which was developed by a data scientist and includes all pure and mixed expressions. On average, the data set achieved 92.4% correct validation of the expressions synthesized by our technique. These facial expressions are compared against the pre-defined data set inside our system. If the system recognizes the person in an abnormal condition, an alert is sent to the nearby hospital or doctor as a message.
Submitted 11 May, 2023;
originally announced May 2023.
-
Soft robotics towards sustainable development goals and climate actions
Authors:
Goffredo Giordano,
Saravana Prashanth Murali Babu,
Barbara Mazzolai
Abstract:
Soft robotics technology can aid in achieving United Nations Sustainable Development Goals (SDGs) and the Paris Climate Agreement through development of autonomous, environmentally responsible machines powered by renewable energy. By utilizing soft robotics, we can mitigate the detrimental effects of climate change on human society and the natural world through fostering adaptation, restoration, and remediation. Moreover, the implementation of soft robotics can lead to groundbreaking discoveries in material science, biology, control systems, energy efficiency, and sustainable manufacturing processes. However, to achieve these goals, we need further improvements in understanding biological principles at the basis of embodied and physical intelligence, environment-friendly materials, and energy-saving strategies to design and manufacture self-piloting and field-ready soft robots. This paper provides insights on how soft robotics can address the pressing issue of environmental sustainability. Sustainable manufacturing of soft robots at a large scale, exploring the potential of biodegradable and bioinspired materials, and integrating onboard renewable energy sources to promote autonomy and intelligence are some of the urgent challenges of this field that we discuss in this paper. Specifically, we will present field-ready soft robots that address targeted productive applications in urban farming, healthcare, land and ocean preservation, disaster remediation, and clean and affordable energy, thus supporting some of the SDGs. By embracing soft robotics as a solution, we can concretely support economic growth and sustainable industry, drive solutions for environment protection and clean energy, and improve overall health and well-being.
Submitted 21 March, 2023;
originally announced March 2023.
-
Identification of Surface Defects on Solar PV Panels and Wind Turbine Blades using Attention based Deep Learning Model
Authors:
Divyanshi Dwivedi,
K. Victor Sam Moses Babu,
Pradeep Kumar Yemula,
Pratyush Chakraborty,
Mayukha Pal
Abstract:
The global generation of renewable energy has rapidly increased, primarily due to the installation of large-scale renewable energy power plants. However, monitoring renewable energy assets in these large plants remains challenging due to environmental factors that could result in reduced power generation, malfunctioning, and degradation of asset life. Therefore, the detection of surface defects on renewable energy assets is crucial for maintaining the performance and efficiency of these plants. This paper proposes an innovative detection framework to achieve an economical surface monitoring system for renewable energy assets. High-resolution images of the assets are captured regularly and inspected to identify surface or structural damages on solar panels and wind turbine blades. Vision transformer (ViT), one of the latest attention-based deep learning (DL) models in computer vision, is proposed in this work to classify surface defects. The ViT model outperforms other DL models, including MobileNet, VGG16, Xception, EfficientNetB7, and ResNet50, achieving high accuracy scores above 97% for both wind and solar plant assets. From the results, our proposed model demonstrates its potential for monitoring and detecting damages in renewable energy assets for efficient and reliable operation of renewable power plants.
Submitted 8 January, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Enhancement to Training of Bidirectional GAN : An Approach to Demystify Tax Fraud
Authors:
Priya Mehta,
Sandeep Kumar,
Ravi Kumar,
Ch. Sobhan Babu
Abstract:
Outlier detection is a challenging activity. Several machine learning techniques are proposed in the literature for outlier detection. In this article, we propose a new training approach for bidirectional GAN (BiGAN) to detect outliers. To validate the proposed approach, we train a BiGAN with the proposed training approach to detect taxpayers who are manipulating their tax returns. For each taxpayer, we derive six correlation parameters and three ratio parameters from the tax returns they submitted. We train a BiGAN with the proposed training approach on this nine-dimensional derived ground-truth data set. Next, we encode this data set into its latent representation using the encoder and regenerate (decode) the data set using the generator, with the latent representation as input. For each taxpayer, we compute the cosine similarity between their ground-truth data and regenerated data. Taxpayers with lower cosine similarity measures are potential return manipulators. We applied our method to analyze the iron and steel taxpayers data set provided by the Commercial Taxes Department, Government of Telangana, India.
Submitted 16 August, 2022;
originally announced August 2022.
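The scoring step described in the abstract above (cosine similarity between each taxpayer's ground-truth vector and its regenerated vector, with low scores flagged) can be sketched as follows. This is a minimal illustration, not the authors' code: the BiGAN encoder and generator are not implemented, the regenerated vectors are supplied as hypothetical inputs, the function names are invented for this sketch, and three-dimensional toy vectors stand in for the nine-dimensional features.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_reconstruction_similarity(original, regenerated):
    """Return (taxpayer_id, similarity) pairs, lowest similarity first."""
    scores = {
        taxpayer_id: cosine_similarity(original[taxpayer_id], regenerated[taxpayer_id])
        for taxpayer_id in original
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data: "regenerated" stands in for generator(encoder(x)).
original = {"A": [1.0, 2.0, 3.0], "B": [1.0, 0.0, 0.0]}
regenerated = {"A": [1.1, 2.1, 2.9], "B": [0.0, 1.0, 0.0]}
ranking = rank_by_reconstruction_similarity(original, regenerated)
```

In this toy data, taxpayer "B" is reconstructed poorly and therefore appears first in the ranking; where to place the cut-off between normal and suspicious scores is a separate choice that the abstract does not specify.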
-
Representation Learning on Graphs to Identifying Circular Trading in Goods and Services Tax
Authors:
Priya Mehta,
Sanat Bhargava,
M. Ravi Kumar,
K. Sandeep Kumar,
Ch. Sobhan Babu
Abstract:
Circular trading is a form of tax evasion in Goods and Services Tax where a group of fraudulent taxpayers (traders) aims to mask illegal transactions by superimposing several fictitious transactions (where no value is added to the goods or service) among themselves in a short period. Due to the vast database of taxpayers, it is infeasible for authorities to manually identify groups of circular traders and the illegitimate transactions they are involved in. This work uses big data analytics and graph representation learning techniques to propose a framework to identify communities of circular traders and isolate the illegitimate transactions in the respective communities. Our approach is tested on real-life data provided by the Department of Commercial Taxes, Government of Telangana, India, where we uncovered several communities of circular traders.
Submitted 16 August, 2022;
originally announced August 2022.
-
Peer-to-Peer Sharing of Energy Storage Systems under Net Metering and Time-of-Use Pricing
Authors:
K. Victor Sam Moses Babu,
Satya Surya Vinay K,
Pratyush Chakraborty
Abstract:
The sharing economy has become a socio-economic trend in the transportation and housing sectors. It develops business models leveraging underutilized resources. Like those sectors, the power grid is also becoming smarter with many flexible resources, and researchers are investigating how sharing resources here can likewise help to reduce cost and extract value. In this work, we investigate the sharing of energy storage devices among individual households in a cooperative fashion. Coalitional game theory is used to model the scenario where the utility company imposes a time-of-use (ToU) price and a net metering billing mechanism. The resulting game has a non-empty core, and we can develop a cost allocation mechanism with an easy-to-compute analytical formula. The allocation is fair and cost-effective for every household. We design the price for the peer-to-peer (P2P) network and an algorithm for sharing that keeps the grand coalition always stable. Thus, sharing the electricity of storage devices among consumers can be effective in this set-up. Our mechanism is implemented in a community of 80 households in Texas using real data of demand and solar irradiance, and the results show significant cost savings for our method.
Submitted 1 October, 2022; v1 submitted 25 July, 2022;
originally announced July 2022.
-
Cooperate or Compete: A New Perspective on Training of Generative Networks
Authors:
Ch. Sobhan Babu,
Ravindra Guravannavar,
Arvind Hulgeri
Abstract:
GANs have two competing modules: the generator module is trained to generate new examples, and the discriminator module is trained to discriminate real examples from generated examples. The training procedure of GAN is modeled as a finitely repeated simultaneous game. Each module tries to increase its performance at every repetition of the base game (at every batch of training data) in a non-cooperative manner. We observed that each module can perform better and learn faster if training is modeled as an infinitely repeated simultaneous game. At every repetition of the base game (at every batch of training data) the stronger module (whose performance is increased or remains the same compared to the previous batch of training data) cooperates with the weaker module (whose performance is decreased compared to the previous batch of training data) and only the weaker module is allowed to increase its performance.
Submitted 28 September, 2022; v1 submitted 5 July, 2022;
originally announced July 2022.
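The cooperation rule in the abstract above (after each batch, only the module whose performance dropped is allowed to improve) can be sketched as a small selection function. This is an illustrative sketch under stated assumptions, not the authors' training procedure: performance is measured here by a loss where lower is better, the function name is hypothetical, and the fallback of updating both modules when neither or both got worse is an assumption, since the abstract does not cover those cases.

```python
def modules_to_update(prev_gen_loss, gen_loss, prev_disc_loss, disc_loss):
    """Pick which module(s) may take an update step on the next batch.

    A module is "weaker" if its loss rose compared to the previous batch,
    and "stronger" if its loss fell or stayed the same.
    """
    gen_weaker = gen_loss > prev_gen_loss
    disc_weaker = disc_loss > prev_disc_loss
    if gen_weaker and not disc_weaker:
        # The discriminator cooperates by holding still.
        return ("generator",)
    if disc_weaker and not gen_weaker:
        # The generator cooperates by holding still.
        return ("discriminator",)
    # Assumption: both or neither got worse, so fall back to updating both.
    return ("generator", "discriminator")


# Hypothetical losses: the generator got worse, the discriminator improved.
choice = modules_to_update(1.0, 1.2, 0.5, 0.4)
```

In a real training loop the returned tuple would gate which optimizer steps are taken for the batch; that wiring depends on the framework and is left out here.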
-
Investigating the impact of BTI, HCI and time-zero variability on neuromorphic spike event generation circuits
Authors:
Shaik Jani Babu,
Rohit Singh,
Siona Menezes Picardo,
Nilesh Goel,
Sonal Singhal
Abstract:
Neuromorphic computing refers to brain-inspired computing, which differs from the von Neumann architecture. Analog VLSI-based neuromorphic circuits are a current research interest. Two simpler spiking integrate-and-fire neuron models, namely the axon-Hillock (AH) and voltage integrate-and-fire (VIF) circuits, are commonly used for generating spike events. This paper discusses the impact of reliability issues such as Bias Temperature Instability (BTI) and Hot Carrier Injection (HCI), as well as time-zero variability, on these CMOS-based neuromorphic circuits. The AH and VIF circuits are implemented using HKMG-based 45 nm technology. For reliability analysis, the industry-standard Cadence RelXpert tool is used. For time-zero variability analysis, 1000 Monte Carlo simulations are performed.
Submitted 19 May, 2022;
originally announced May 2022.
-
VoxelHop: Successive Subspace Learning for ALS Disease Classification Using Structural MRI
Authors:
Xiaofeng Liu,
Fangxu Xing,
Chao Yang,
C. -C. Jay Kuo,
Suma Babu,
Georges El Fakhri,
Thomas Jenkins,
Jonghye Woo
Abstract:
Deep learning has great potential for accurate detection and classification of diseases with medical imaging data, but the performance is often limited by the number of training datasets and memory requirements. In addition, many deep learning models are considered a "black-box," thereby often limiting their adoption in clinical applications. To address this, we present a successive subspace learning model, termed VoxelHop, for accurate classification of Amyotrophic Lateral Sclerosis (ALS) using T2-weighted structural MRI data. Compared with popular convolutional neural network (CNN) architectures, VoxelHop has modular and transparent structures with fewer parameters without any backpropagation, so it is well-suited to small dataset size and 3D imaging data. Our VoxelHop has four key components, including (1) sequential expansion of near-to-far neighborhood for multi-channel 3D data; (2) subspace approximation for unsupervised dimension reduction; (3) label-assisted regression for supervised dimension reduction; and (4) concatenation of features and classification between controls and patients. Our experimental results demonstrate that our framework using a total of 20 controls and 26 patients achieves an accuracy of 93.48% and an AUC score of 0.9394 in differentiating patients from controls, even with a relatively small number of datasets, showing its robustness and effectiveness. Our thorough evaluations also show its validity and superiority to the state-of-the-art 3D CNN classification methods. Our framework can easily be generalized to other classification tasks using different imaging modalities.
Submitted 13 January, 2021;
originally announced January 2021.
-
Tutorial I: Learning the Principles of Mobile Radio Propagation through Smartphone and CRFO
Authors:
Prabhu Chandhar,
Sathish Babu,
Tamizh Elakkiya
Abstract:
In this tutorial, we present three simple smartphone-based experiments for understanding the basic concepts of mobile communications such as path loss, shadow fading, and small-scale fading. We also explain the use of the Collaborative Radio Frequency Observatory (CRFO), an online platform, for visualizing radio coverage maps.
Submitted 5 January, 2021;
originally announced January 2021.
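For readers unfamiliar with the concepts named in the tutorial abstract above, path loss is commonly described by the textbook log-distance model, PL(d) = PL(d0) + 10 n log10(d/d0), with shadow fading appearing as the scatter of measurements around that line. The sketch below fits the exponent n to measurements by least squares. It is an illustration of the standard model only; whether the tutorial uses this exact model is an assumption, and the function names and sample values are hypothetical.

```python
import math


def log_distance_path_loss_db(d_m, pl_d0_db, exponent, d0_m=1.0):
    """Mean path loss in dB at distance d_m under the log-distance model."""
    return pl_d0_db + 10.0 * exponent * math.log10(d_m / d0_m)


def fit_path_loss(distances_m, losses_db, d0_m=1.0):
    """Least-squares fit returning (exponent, pl_d0_db) for measured losses."""
    xs = [10.0 * math.log10(d / d0_m) for d in distances_m]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(losses_db) / len(losses_db)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    exponent = sxy / sxx
    pl_d0_db = mean_y - exponent * mean_x
    return exponent, pl_d0_db


# Hypothetical noise-free measurements generated from the model itself.
distances = [10.0, 100.0, 1000.0]
losses = [log_distance_path_loss_db(d, 40.0, 3.0) for d in distances]
exponent, pl_d0 = fit_path_loss(distances, losses)
```

With real smartphone readings, the residuals between the measured losses and the fitted line give a rough picture of shadow fading; small-scale fading needs much finer spatial or temporal sampling and is not captured by this fit.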
-
Learning to Count in the Crowd from Limited Labeled Data
Authors:
Vishwanath A. Sindagi,
Rajeev Yasarla,
Deepak Sam Babu,
R. Venkatesh Babu,
Vishal M. Patel
Abstract:
Recent crowd counting approaches have achieved excellent performance. However, they are essentially based on a fully supervised paradigm and require a large number of annotated samples. Obtaining annotations is an expensive and labour-intensive process. In this work, we focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples while leveraging a large pool of unlabeled data. Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data, which is then used as supervision for training the network. The proposed method is shown to be effective under the reduced data (semi-supervised) settings for several datasets like ShanghaiTech, UCF-QNRF, WorldExpo, UCSD, etc. Furthermore, we demonstrate that the proposed method can be leveraged to enable the network to learn to count from a synthetic dataset while generalizing better to real-world datasets (synthetic-to-real transfer).
Submitted 8 July, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Black or White? How to Develop an AutoTuner for Memory-based Analytics [Extended Version]
Authors:
Mayuresh Kunjir,
Shivnath Babu
Abstract:
There is a lot of interest today in building autonomous (or, self-driving) data processing systems. An emerging school of thought is to leverage AI-driven "black box" algorithms for this purpose. In this paper, we present a contrarian view. We study the problem of autotuning the memory allocation for applications running on modern distributed data processing systems. For this problem, we show that an empirically-driven "white-box" algorithm, called RelM, that we have developed provides a close-to-optimal tuning at a fraction of the overheads compared to state-of-the-art AI-driven "black box" algorithms, namely, Bayesian Optimization (BO) and Deep Distributed Policy Gradient (DDPG). The main reason for RelM's superior performance is that the memory management in modern memory-based data analytics systems is an interplay of algorithms at multiple levels: (i) at the resource-management level across various containers allocated by resource managers like Kubernetes and YARN, (ii) at the container level among the OS, pods, and processes such as the Java Virtual Machine (JVM), (iii) at the application level for caching, aggregation, data shuffles, and application data structures, and (iv) at the JVM level across various pools such as the Young and Old Generation. RelM understands these interactions and uses them in building an analytical solution to autotune the memory management knobs. In another contribution, called GBO, we use the RelM's analytical models to speed up Bayesian Optimization. Through an evaluation based on Apache Spark, we showcase that RelM's recommendations are significantly better than what commonly-used Spark deployments provide, and are close to the ones obtained by brute-force exploration; while GBO provides optimality guarantees for a higher, but still significantly lower compared to the state-of-the-art AI-driven policies, cost overhead.
Submitted 26 February, 2020;
originally announced February 2020.
-
Domain-independent Dominance of Adaptive Methods
Authors:
Pedro Savarese,
David McAllester,
Sudarshan Babu,
Michael Maire
Abstract:
From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks.
Submitted 16 March, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
iVRNote: Design, Creation and Evaluation of an Interactive Note-Taking Interface for Study and Reflection in VR Learning Environments
Authors:
Yi-Ting Chen,
Chi-Hsuan Hsu,
Chih-Han Chung,
Yu-Shuen Wang,
Sabarish V. Babu
Abstract:
In this contribution, we design, implement and evaluate the pedagogical benefits of a novel interactive note-taking interface (iVRNote) in VR for the purpose of learning and reflection during lectures. In future VR learning environments, students will have difficulty taking notes while wearing a head-mounted display (HMD). To solve this problem, we installed a digital tablet on the desk and provided several tools in VR to facilitate the learning experience. Specifically, we track the stylus position and orientation in the physical world and then render a virtual stylus in VR. In other words, when students see a virtual stylus somewhere on the desk, they can reach out with their hand for the physical stylus. The information provided also lets them know where they will draw or write before the stylus touches the tablet. Since iVRNote is a digital environment, we also help students save effort in taking extensive notes by providing several functions, such as post-editing and picture taking, so that they can pay more attention to lectures in VR. We also record the time of each stroke in the notes to help students review a lecture: they can select a part of their notes to revisit the corresponding segment of a virtual online lecture. Figures and the accompanying video demonstrate the feasibility of the presented iVRNote system. To evaluate the system, we conducted a user study with 20 participants to assess the preference and pedagogical benefits of the iVRNote interface.
Submitted 3 October, 2019;
originally announced October 2019.
-
Scalable K-Medoids via True Error Bound and Familywise Bandits
Authors:
Aravindakshan Babu,
Saurabh Agarwal,
Sudarshan Babu,
Hariharan Chandrasekaran
Abstract:
K-Medoids (KM) is a standard clustering method, used extensively on semi-metric data. Error analyses of KM have traditionally used an in-sample notion of error, which can be far from the true error and suffer from a generalization gap. We formalize the true K-Medoid error based on the underlying data distribution. We decompose the true error into two fundamental statistical problems: minimum estimation (ME) and minimum mean estimation (MME). We provide a convergence result for MME. We show that the MME error decreases no slower than $\Theta(\frac{1}{n^{\frac{2}{3}}})$, where $n$ is a measure of sample size. Inspired by this bound, we propose a computationally efficient, distributed KM algorithm, namely MCPAM. MCPAM has expected runtime $\mathcal{O}(km)$, where $k$ is the number of medoids and $m$ is the number of samples. MCPAM provides massive computational savings for a small tradeoff in accuracy. We verify the quality and scaling properties of MCPAM on various datasets, and achieve the hitherto unachieved feat of calculating the KM of 1 billion points on semi-metric spaces.
Submitted 29 October, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
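The in-sample notion of error mentioned in this abstract can be illustrated with a small sketch. This is not the MCPAM algorithm; the function names `in_sample_error` and `medoid` are hypothetical, and the exhaustive medoid search is only an assumed baseline that works with any user-supplied (semi-)metric `dist`.

```python
def in_sample_error(points, medoids, dist):
    """Mean distance from each sample to its nearest medoid (in-sample error)."""
    return sum(min(dist(p, m) for m in medoids) for p in points) / len(points)


def medoid(points, dist):
    """Exhaustive search for the sample with the smallest total distance to all samples.

    Ties are broken in favour of the earliest sample in the list.
    """
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def abs_diff(a, b):
    # Hypothetical one-dimensional distance used only for the example below.
    return abs(a - b)
```

For example, with the samples `[0, 1, 2, 10]` and `abs_diff`, `medoid` returns `1`, and the in-sample error for the medoids `[1, 10]` is `0.5`. The exhaustive search evaluates every pair of samples, which is the quadratic cost that sampling-based methods aim to avoid.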
-
Relation Networks for Optic Disc and Fovea Localization in Retinal Images
Authors:
Sudharshan Chandra Babu,
Shishira R Maiya,
Sivasankar Elango
Abstract:
Diabetic Retinopathy is the leading cause of blindness in the world. At least 90% of new cases can be reduced with proper treatment and monitoring of the eyes. However, scanning the entire population of patients is a difficult endeavor. Computer-aided diagnosis tools in retinal image analysis can make the process scalable and efficient. In this work, we focus on the problem of localizing the centers of the Optic disc and Fovea, a task crucial to the analysis of retinal scans. Current systems recognize the Optic disc and Fovea individually, without exploiting their relations during learning. We propose a novel approach to localizing the centers of the Optic disc and Fovea by simultaneously processing them and modeling their relative geometry and appearance. We show that our approach improves localization and recognition by incorporating object-object relations efficiently, and achieves highly competitive results.
Submitted 23 November, 2018;
originally announced December 2018.
-
Slum Segmentation and Change Detection : A Deep Learning Approach
Authors:
Shishira R Maiya,
Sudharshan Chandra Babu
Abstract:
More than one billion people live in slums around the world. In some developing countries, slum residents make up more than half of the population and lack reliable sanitation services, clean water, electricity, and other basic services. Thus, slum rehabilitation and improvement is an important global challenge, and a significant amount of effort and resources have been put into this endeavor. These initiatives rely heavily on slum mapping and monitoring, and it is essential to have robust and efficient methods for mapping and monitoring existing slum settlements. In this work, we introduce an approach to segment and map individual slums from satellite imagery, leveraging regional convolutional neural networks for instance segmentation using transfer learning. In addition, we introduce a method to perform change detection and monitor slum change over time. We show that our approach effectively learns slum shape and appearance, and demonstrates strong quantitative results, with a maximum AP of 80.0.
Submitted 19 November, 2018;
originally announced November 2018.
-
A Rate-Optimal Construction of Codes with Sequential Recovery with Low Block Length
Authors:
Balaji Srinivasan Babu,
Ganesh R. Kini,
P. Vijay Kumar
Abstract:
An erasure code is said to be a code with sequential recovery with parameters $r$ and $t$ if, for any $s \leq t$ erased code symbols, there is an $s$-step recovery process in which at each step we recover exactly one erased code symbol by contacting at most $r$ other code symbols. In earlier work by the same authors, presented at ISIT 2017, we gave a construction for binary codes with sequential recovery from $t$ erasures, with locality parameter $r$, which were optimal in terms of code rate for given $r,t$, but where the block length was large, on the order of $r^{c^t}$, for some constant $c >1$. In the present paper, we present an alternative construction of a rate-optimal code for any value of $t$ and any $r\geq3$, where the block length is significantly smaller, on the order of $r^{\frac{5t}{4}+\frac{7}{4}}$ (in some instances of order $r^{\frac{3t}{2}+2}$). Our construction is based on the construction of a certain kind of tree-like graph with girth $t+1$. We construct these graphs, and hence the codes, recursively.
Submitted 21 January, 2018;
originally announced January 2018.
-
An algorithmic approach to handle circular trading in commercial taxing system
Authors:
Jithin Mathews,
Priya Mehta,
S. V. Kasi Visweswara Rao,
Ch. Sobhan Babu
Abstract:
Tax manipulation comes in a variety of forms with different motivations and of varying complexities. In this paper, we deal with a specific technique used by tax-evaders known as circular trading. In particular, we define algorithms for the detection and analysis of circular trade. To achieve this, we have modelled the whole system as a directed graph with the actors being vertices and the transactions among them as directed edges. We illustrate the results obtained after running the proposed algorithm on the commercial tax dataset of the government of Telangana, India, which contains the transaction details of a set of participants involved in a known circular trade.
Submitted 5 November, 2017; v1 submitted 30 October, 2017;
originally announced October 2017.
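The graph model described in this abstract (actors as vertices, transactions as directed edges) can be sketched as follows. This is a generic depth-first search for one directed cycle, not the authors' detection algorithm; the function name `find_cycle` and the `(seller, buyer)` tuple format are assumptions made for illustration.

```python
from collections import defaultdict


def find_cycle(transactions):
    """Return one directed cycle as a list of traders, or None if none exists.

    `transactions` is an iterable of (seller, buyer) pairs (hypothetical format).
    """
    graph = defaultdict(list)
    for seller, buyer in transactions:
        graph[seller].append(buyer)

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    colour = defaultdict(int)         # every vertex starts as WHITE
    path = []                         # vertices on the current DFS path

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:   # back edge: the path from nxt to node closes a cycle
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):         # copy keys: visit() may add sink vertices
        if colour[start] == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

Calling `find_cycle([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])` returns `["A", "B", "C"]`. The recursive search is adequate for small examples; a large transaction graph would need an iterative traversal to avoid Python's recursion limit.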
-
Urban Delay Tolerant Network Simulator (UDTNSim v0.1)
Authors:
Sarath Babu,
Gaurav Jain,
B. S. Manoj
Abstract:
Delay Tolerant Networking (DTN) is an approach to networking that handles the network disruptions and high delays that may occur in many kinds of communication networks. The major reasons for high delay include partial connectivity of networks, as can be seen in many types of ad hoc wireless networks with frequent network partitions; long propagation time, as experienced in inter-planetary and deep space networks; and frequent link disruptions due to the mobility of nodes, as observed in terrestrial wireless network environments. Experimenting with network architectures, protocols, and mobility models in such real-world scenarios is difficult due to the complexities involved in the network environment. Therefore, in this document, we present the documentation of the Urban Delay Tolerant Network Simulator (UDTNSim) version 0.1, capable of simulating urban road network environments with DTN characteristics, including mobility models and routing protocols. The mobility models included in this version of UDTNSim are (i) Stationary Movement, (ii) Simple Random Movement, (iii) Path Type Based Movement, (iv) Path Memory Based Movement, (v) Path Type with Restricted Movement, and (vi) Path Type with Wait Movement. In addition to mobility models, we also provide three routing and data hand-off protocols: (i) Epidemic Routing, (ii) Superior Only Handoff, and (iii) Superior Peer Handoff. UDTNSim v0.1 is designed using an object-oriented programming approach in order to provide flexibility in adding new features to the DTN environment. UDTNSim v0.1 is distributed as an open source simulator for the use of the research community.
Submitted 17 September, 2017;
originally announced September 2017.
-
Analyzing Query Performance and Attributing Blame for Contentions in a Cluster Computing Framework
Authors:
Prajakta Kalmegh,
Shivnath Babu,
Sudeepa Roy
Abstract:
There are many approaches in use today to either prevent or minimize the impact of inter-query interactions on a shared cluster. Despite these measures, performance issues due to concurrent executions of mixed workloads still prevail, causing undue waiting times for queries. Analyzing these resource interferences is thus critical in order to answer time-sensitive questions like 'who is causing my query to slow down' in a multi-tenant environment. More importantly, diagnosing whether the slowdown of a query is a result of resource contentions caused by other queries or some other external factor can help an admin narrow down the many possibilities of performance degradation. This process of investigating the symptoms of resource contentions and attributing blame to concurrent queries is non-trivial and tedious, and involves hours of manually debugging through a cycle of query interactions.
In this paper, we present ProtoXplore - a Proto or first system to eXplore contentions, that helps administrators determine whether the blame for resource bottlenecks can be attributed to concurrent queries, and uses a methodology called Resource Acquire Time Penalty (RATP) to quantify this blame towards contentious sources accurately. Further, ProtoXplore builds on the theory of explanations and enables a step-wise deep exploration of various levels of performance bottlenecks faced by a query during its execution using a multi-level directed acyclic graph called ProtoGraph. Our experimental evaluation uses ProtoXplore to analyze the interactions between TPC-DS queries on Apache Spark to show how ProtoXplore provides explanations that help in diagnosing contention-related issues and better managing a changing mixed workload in a shared cluster.
Submitted 29 May, 2018; v1 submitted 28 August, 2017;
originally announced August 2017.
-
A Cognitive Theory-based Opportunistic Resource-Pooling Scheme for Ad hoc Networks
Authors:
Seema B Hegde,
B. Sathish Babu,
Pallapa Venkatram
Abstract:
Resource pooling in ad hoc networks deals with accumulating computing and network resources to implement network control schemes such as routing, congestion control, traffic management, and so on. Pooling of resources can be accomplished using the distributed and dynamic nature of ad hoc networks to achieve collaboration between the devices. Ad hoc networks need a resource-pooling technique that offers quick response, adaptability, and reliability. In this context, we propose an opportunistic resource-pooling scheme that uses a cognitive computing model to accumulate resources with a faster resource convergence rate, reliability, and lower latency. The proposed scheme is implemented using the behaviors-observations-beliefs cognitive model, in which resource-pooling decisions are made based on knowledge accumulated over the various behaviors exhibited by nodes in ad hoc networks.
Submitted 11 July, 2017;
originally announced July 2017.
-
An Opportunistic AODV Routing Scheme : A Cognitive Mobile Agents Approach
Authors:
Seema B Hegde,
Sathish Babu,
Pallapa Venkatram
Abstract:
In MANETs, dynamics and robustness are the key features of the nodes, which are governed by routing protocols such as AODV and DSR. However, growing resource demand in the network leads to resource scarcity. Node mobility often leads to link breakages and high routing overhead, decreasing the stability and reliability of network connectivity. In this context, the paper proposes a novel opportunistic AODV routing scheme that implements a cognitive-agent-based intelligent technique to set up stable connectivity over the MANET. The scheme computes the routing metric (rf) based on the collaboration sensitivity levels of the nodes, obtained through knowledge-based decisions. This routing metric is subsequently used to set up a stable path for network connectivity, which minimizes the route overhead and increases the stability of the path. The performance evaluation is conducted in comparison with the AODV and sleep AODV routing protocols and validated.
Submitted 11 July, 2017;
originally announced July 2017.
-
Automatic Keyword Extraction for Text Summarization: A Survey
Authors:
Santosh Kumar Bharti,
Korra Sathya Babu
Abstract:
In recent times, data has been growing rapidly in every domain, such as news, social media, banking, and education. Due to this excess of data, there is a need for an automatic summarizer capable of summarizing data, especially textual data, from the original document without losing any critical content. Text summarization has emerged as an important research area in the recent past. In this regard, a review of existing work on the text summarization process is useful for carrying out further research. In this paper, recent literature on automatic keyword extraction and text summarization is presented, since the text summarization process is highly dependent on keyword extraction. This literature includes a discussion of the different methodologies used for keyword extraction and text summarization. It also discusses the different databases used for text summarization in several domains, along with evaluation metrics. Finally, it briefly discusses the issues and research challenges faced by researchers, along with future directions.
Submitted 11 April, 2017;
originally announced April 2017.
-
Tempo: Robust and Self-Tuning Resource Management in Multi-tenant Parallel Databases
Authors:
Zilong Tan,
Shivnath Babu
Abstract:
Multi-tenant database systems have a component called the Resource Manager, or RM that is responsible for allocating resources to tenants. RMs today do not provide direct support for performance objectives such as: "Average job response time of tenant A must be less than two minutes", or "No more than 5% of tenant B's jobs can miss the deadline of 1 hour." Thus, DBAs have to tinker with the RM's low-level configuration settings to meet such objectives. We propose a framework called Tempo that brings simplicity, self-tuning, and robustness to existing RMs. Tempo provides a simple interface for DBAs to specify performance objectives declaratively, and optimizes the RM configuration settings to meet these objectives. Tempo has a solid theoretical foundation which gives key robustness guarantees. We report experiments done on Tempo using production traces of data-processing workloads from companies such as Facebook and Cloudera. These experiments demonstrate significant improvements in meeting desired performance objectives over RM configuration settings specified by human experts.
Submitted 2 December, 2015;
originally announced December 2015.
-
ROBUS: Fair Cache Allocation for Multi-tenant Data-parallel Workloads
Authors:
Mayuresh Kunjir,
Brandon Fain,
Kamesh Munagala,
Shivnath Babu
Abstract:
Systems for processing big data (e.g., Hadoop, Spark, and massively parallel databases) need to run workloads on behalf of multiple tenants simultaneously. The abundant disk-based storage in these systems is usually complemented by a smaller, but much faster, cache. Cache is a precious resource: Tenants who get to use cache can see two orders of magnitude performance improvement. Cache is also a limited and hence shared resource: Unlike a resource like a CPU core, which can be used by only one tenant at a time, a cached data item can be accessed by multiple tenants at the same time. Cache, therefore, has to be shared by a multi-tenancy-aware policy across tenants, each having a unique set of priorities and workload characteristics.
In this paper, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant. We build a novel fairness model targeted at the shared resource setting that not only incorporates the more standard concepts of Pareto-efficiency and sharing incentive, but also defines envy freeness via the notion of the core from cooperative game theory. Our cache management platform, ROBUS, uses randomization over small time batches, and we develop a proportionally fair allocation mechanism that satisfies the core property in expectation. We show that this algorithm and related fair algorithms can be approximated to arbitrary precision in polynomial time. We evaluate these algorithms on a ROBUS prototype implemented on Spark with the RDD store used as cache. Our evaluation on a synthetically generated industry-standard workload shows that our algorithms provide a speedup close to that of performance-optimal algorithms while guaranteeing fairness across tenants.
Submitted 3 September, 2015; v1 submitted 25 April, 2015;
originally announced April 2015.
-
A Narrative Vehicle Protection Representation for Vehicle Speed Regulator Under Driver Exhaustion -- A Study
Authors:
V. Karthikeyan,
B. Praveen Kumar,
S. Suresh Babu,
R. Purusothaman,
Shijin Thomas
Abstract:
Driver fatigue is one of the important factors that cause traffic accidents, and the ever-increasing number of accidents due to drivers' diminished vigilance level has become a problem of serious concern to society. Drivers with a diminished vigilance level suffer from a marked decline in their abilities of perception, recognition, and vehicle control, and therefore pose a serious danger to their own lives and the lives of other people. Exhaustion resulting from sleep deprivation or sleep disorders is an important factor in the increasing number of accidents. In this work, we discuss various existing methods and a proposed method based on a real-time online safety prototype that controls the vehicle speed under driver fatigue. The purpose of such a model is to advance a system that detects fatigue symptoms in drivers and controls the speed of the vehicle to avoid accidents. The system was tested against the techniques of various researchers, and finally the validity of the proposed model for a vehicle speed controller based on driver fatigue detection is shown.
Submitted 15 February, 2014;
originally announced February 2014.
-
Analysis & Prediction of Sales Data in SAP-ERP System using Clustering Algorithms
Authors:
S. Hanumanth Sastry,
Prof. M. S. Prasada Babu
Abstract:
Clustering is an important data mining technique in which we are interested in minimizing intracluster distance and maximizing intercluster distance. We have utilized clustering techniques to detect deviations in product sales and to identify and compare sales over a particular period of time. Clustering is suited to grouping items that seem to fall naturally together when there is no specified class for any new item. We have utilized annual sales data of a steel major to analyze sales volume and value with respect to dependent attributes like products, customers and quantities sold. The demand for steel products is cyclical and depends on many factors like customer profile, price, discounts and tax issues. In this paper, we have analyzed sales data with clustering algorithms like K-Means and EM, which revealed many interesting patterns useful for improving sales revenue and achieving higher sales volume. Our study confirms that partition methods like the K-Means and EM algorithms are better suited to analyzing our sales data than density-based methods like DBSCAN and OPTICS or hierarchical methods like COBWEB.
Submitted 10 December, 2013;
originally announced December 2013.
-
Implementation of CRISP Methodology for ERP Systems
Authors:
S. Hanumanth Sastry,
Prof. M. S. Prasada Babu
Abstract:
ERP systems contain huge amounts of data related to the actual execution of business processes. These systems have a particular way of recording activities, which results in an unclear display of business processes in event logs. Several works have been conducted on ERP systems, most of them focusing on the development of new algorithms for the automatic discovery of business processes. We focus on the question of how organizations with ERP systems can apply process mining to analyze and improve their business processes. The data handling of ERP systems contrasts with that of BPMS or workflow-based systems, whose systematic storage of events facilitates the application of process mining techniques. CRISP-DM has emerged as the de facto standard for developing data mining and knowledge discovery projects. Successful data mining requires three families of analytical capabilities, namely reporting, classification and forecasting. A data miner uses more than one analytical method to get the best results. The objective of this paper is to improve the usability and understandability of process mining techniques by implementing the CRISP-DM methodology for their application in ERP contexts, detailed in terms of specific implementation tools and step-by-step coordination. Our study confirms that data discovery from ERP systems improves strategic and operational decision making.
Submitted 7 December, 2013;
originally announced December 2013.
-
Stubby: A Transformation-based Optimizer for MapReduce Workflows
Authors:
Harold Lim,
Herodotos Herodotou,
Shivnath Babu
Abstract:
There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces--ranging from program-based to query-based interfaces--for generating MapReduce workflows. Studies have shown that the gap in performance can be quite large between optimized and unoptimized workflows. However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, large size of the execution plan space, and the frequent unavailability of all types of information needed for optimization. We introduce a comprehensive plan space for MapReduce workflows generated by popular workflow generators. We then propose Stubby, a cost-based optimizer that searches selectively through the subspace of the full plan space that can be enumerated correctly and costed based on the information available in any given setting. Stubby enumerates the plan space based on plan-to-plan transformations and an efficient search algorithm. Stubby is designed to be extensible to new interfaces and new types of optimizations, which is a desirable feature given how rapidly MapReduce systems are evolving. Stubby's efficiency and effectiveness have been evaluated using representative workflows from many domains.
Submitted 31 July, 2012;
originally announced August 2012.
-
DBC based Face Recognition using DWT
Authors:
H S Jagadeesh,
K Suresh Babu,
K B Raja
Abstract:
Applications using face biometrics have proved their reliability in the last decade. In this paper, we propose a DBC based Face Recognition using DWT (DBC-FR) model. The Poly-U Near Infrared (NIR) database images are scanned and cropped in pre-processing to retain only the face part. The face part is resized to 100*100 and DWT is applied to derive the LL, LH, HL and HH subbands. The LL subband, of size 50*50, is divided into 100 cells of dimension 5*5 each. The Directional Binary Code (DBC) is applied to each 5*5 cell to derive 100 features. The Euclidean distance measure is used to compare the features of the test image with those of the database images. The proposed algorithm renders a better percentage recognition rate than the existing algorithm.
Submitted 8 May, 2012;
originally announced May 2012.
-
An entropy based proof of the Moore bound for irregular graphs
Authors:
S. Ajesh Babu,
Jaikumar Radhakrishnan
Abstract:
We provide proofs of the following theorems by considering the entropy of random walks.
Theorem 1 (Alon, Hoory and Linial). Let $G$ be an undirected simple graph with $n$ vertices, girth $g$, minimum degree at least 2 and average degree $d$. Odd girth: if $g = 2r+1$, then $n \geq 1 + d \sum_{i=0}^{r-1} (d-1)^i$. Even girth: if $g = 2r$, then $n \geq 2 \sum_{i=0}^{r-1} (d-1)^i$.
Theorem 2 (Hoory). Let $G = (V_L, V_R, E)$ be a bipartite graph of girth $g = 2r$, with $n_L = |V_L|$ and $n_R = |V_R|$, minimum degree at least 2 and left and right average degrees $d_L$ and $d_R$. Then $n_L \geq \sum_{i=0}^{r-1} (d_R-1)^{i/2} (d_L-1)^{i/2}$ and $n_R \geq \sum_{i=0}^{r-1} (d_L-1)^{i/2} (d_R-1)^{i/2}$.
Submitted 3 November, 2010;
originally announced November 2010.
-
Nash equilibria in Fisher market
Authors:
Bharat Adsul,
Ch. Sobhan Babu,
Jugal Garg,
Ruta Mehta,
Milind Sohoni
Abstract:
Much work has been done on the computation of market equilibria. However, due to strategic play by buyers, it is not clear whether these are actually observed in the market. Motivated by the observation that a buyer may derive a better payoff by feigning a different utility function and thereby manipulating the Fisher market equilibrium, we formulate the Fisher market game, in which buyers strategize by posing different utility functions. We show that existence of a conflict-free allocation is a necessary condition for the Nash equilibria (NE) and also sufficient for the symmetric NE in this game. There are many NE with very different payoffs, and the Fisher equilibrium payoff is captured at a symmetric NE. We provide a complete polyhedral characterization of all the NE for the two-buyer market game. Surprisingly, all the NE of this game turn out to be symmetric and the corresponding payoffs constitute a piecewise linear concave curve. We also study the correlated equilibria of this game and show that third-party mediation does not help to achieve a better payoff than NE payoffs.
Submitted 11 May, 2010; v1 submitted 25 February, 2010;
originally announced February 2010.
-
Call Admission Control performance model for Beyond 3G Wireless Networks
Authors:
H. S. Ramesh Babu,
Gowri Shankar,
P. S. Satyanarayana
Abstract:
Next Generation Wireless Networks (NGWN) will be heterogeneous in nature, with different Radio Access Technologies (RATs) operating together. The mobile terminals operating in this heterogeneous environment will have different QoS requirements to be handled by the system. These QoS requirements are determined by a set of QoS parameters. Radio resource management is one of the key challenges in NGWN. Call admission control is a radio resource management technique that plays an instrumental role in ensuring the desired QoS for users working on different applications that have diverse QoS requirements from the wireless networks. The call blocking probability is one such QoS parameter for the wireless network. For better QoS it is desirable to reduce the call blocking probability. In this scenario it is highly desirable to obtain an analytic performance model. In this paper we propose a higher-order Markov chain based performance model for call admission control in a heterogeneous wireless network environment. In the proposed algorithm we consider three classes of traffic with different QoS requirements, and a heterogeneous network environment that includes RATs able to effectively handle applications with varied QoS parameters, such as voice calls, Web browsing and file transfer. The paper presents the call blocking probabilities for all three types of traffic, for both fixed and varied traffic scenarios.
Submitted 13 January, 2010;
originally announced January 2010.
-
Why Did My Query Slow Down?
Authors:
Nedyalko Borisov,
Shivnath Babu,
Sandeep Uttamchandani,
Ramani Routray,
Aameek Singh
Abstract:
Many enterprise environments have databases running on network-attached server-storage infrastructure (referred to as Storage Area Networks or SANs). Both the database and the SAN are complex systems that need their own separate administrative teams. This paper puts forth the vision of an innovative management framework to simplify administrative tasks that require an in-depth understanding of both the database and the SAN. As a concrete instance, we consider the task of diagnosing the slowdown in performance of a database query that is executed multiple times (e.g., in a periodic report-generation setting). This task is very challenging because the space of possible causes includes problems specific to the database, problems specific to the SAN, and problems that arise due to interactions between the two systems. In addition, the monitoring data available from these systems can be noisy.
We describe the design of DIADS, an integrated diagnosis tool for database and SAN administrators. DIADS generates and uses a powerful abstraction called Annotated Plan Graphs (APGs) that ties together the execution path of queries in the database and the SAN. Using an innovative workflow that combines domain-specific knowledge with machine-learning techniques, DIADS was applied successfully to diagnose query slowdowns caused by complex combinations of events across a PostgreSQL database and a production SAN.
Submitted 22 October, 2011; v1 submitted 18 July, 2009;
originally announced July 2009.
-
Reducing Order Enforcement Cost in Complex Query Plans
Authors:
Ravindra Guravannavar,
S Sudarshan,
Ajit A Diwan,
Ch. Sobhan Babu
Abstract:
Algorithms that exploit sort orders are widely used to implement joins, grouping, duplicate elimination and other set operations. Query optimizers traditionally deal with sort orders by using the notion of interesting orders. The number of interesting orders is unfortunately factorial in the number of participating attributes. Optimizer implementations use heuristics to prune the number of interesting orders, but the quality of the heuristics is unclear. Increasingly complex decision support queries and increasing use of covering indices, which provide multiple alternative sort orders for relations, motivate us to better address the problem of optimization with interesting orders.
We show that even a simplified version of optimization with sort orders is NP-hard and provide principled heuristics for choosing interesting orders. We have implemented the proposed techniques in a Volcano-style cost-based optimizer, and our performance study shows significant improvements in estimated cost. We also executed our plans on a widely used commercial database system, and on PostgreSQL, and found that actual execution times for our plans were significantly better than for plans generated by those systems in several cases.
Submitted 20 November, 2006;
originally announced November 2006.