subscribe to arXiv mailings

Improved Outerplanarity Bounds for Planar Graphs

Authors: Therese Biedl, Debajyoti Mondal

Abstract: In this paper, we study the outerplanarity of planar graphs, i.e., the number of times that we must (in a planar embedding that we can initially freely choose) remove the outerface vertices until the graph is empty. It is well-known that there are $n$-vertex graphs with outerplanarity $\tfrac{n}{6}+Θ(1)$, and not difficult to show that the outerplanarity can never be bigger. We give here improved… ▽ More In this paper, we study the outerplanarity of planar graphs, i.e., the number of times that we must (in a planar embedding that we can initially freely choose) remove the outerface vertices until the graph is empty. It is well-known that there are $n$-vertex graphs with outerplanarity $\tfrac{n}{6}+Θ(1)$, and not difficult to show that the outerplanarity can never be bigger. We give here improved bounds of the form $\tfrac{n}{2g}+2g+O(1)$, where $g$ is the fence-girth, i.e., the length of the shortest cycle with vertices on both sides. This parameter $g$ is at least the connectivity of the graph, and often bigger; for example, our results imply that planar bipartite graphs have outerplanarity $\tfrac{n}{8}+O(1)$. We also show that the outerplanarity of a planar graph $G$ is at most $\tfrac{1}{2}$diam$(G)+O(\sqrt{n})$, where diam$(G)$ is the diameter of the graph. All our bounds are tight up to smaller-order terms, and a planar embedding that achieves the outerplanarity bound can be found in linear time. △ Less

Submitted 5 July, 2024; originally announced July 2024.

MSC Class: 05C85 ACM Class: F.2; G.2

arXiv:2407.03381 [pdf, other]

SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing

Authors: Devam Mondal, Atharva Inamdar

Abstract: RNA sequencing techniques, like bulk RNA-seq and Single Cell (sc) RNA-seq, are critical tools for the biologist looking to analyze the genetic activity/transcriptome of a tissue or cell during an experimental procedure. Platforms like Illumina's next-generation sequencing (NGS) are used to produce the raw data for this experimental procedure. This raw FASTQ data must then be prepared via a complex… ▽ More RNA sequencing techniques, like bulk RNA-seq and Single Cell (sc) RNA-seq, are critical tools for the biologist looking to analyze the genetic activity/transcriptome of a tissue or cell during an experimental procedure. Platforms like Illumina's next-generation sequencing (NGS) are used to produce the raw data for this experimental procedure. This raw FASTQ data must then be prepared via a complex series of data manipulations by bioinformaticians. This process currently takes place on an unwieldy textual user interface like a terminal/command line that requires the user to install and import multiple program packages, preventing the untrained biologist from initiating data analysis. Open-source platforms like Galaxy have produced a more user-friendly pipeline, yet the visual interface remains cluttered and highly technical, remaining uninviting for the natural scientist. To address this, SeqMate is a user-friendly tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis (differential expression, trajectory analysis, etc). Furthermore, by utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes with sources cited from known repositories like PubMed, PDB, and Uniprot. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02659 [pdf, other]

Ensuring Responsible Sourcing of Large Language Model Training Data Through Knowledge Graph Comparison

Authors: Devam Mondal, Carlo Lipizzi

Abstract: In light of recent plagiarism allegations Brough by publishers, newspapers, and other creators of copyrighted corpora against large language model (LLM) developers, we propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model. Unlike current methods, we utilize an approach that… ▽ More In light of recent plagiarism allegations Brough by publishers, newspapers, and other creators of copyrighted corpora against large language model (LLM) developers, we propose a novel system, a variant of a plagiarism detection system, that assesses whether a knowledge source has been used in the training or fine-tuning of a large language model. Unlike current methods, we utilize an approach that uses Resource Description Framework (RDF) triples to create knowledge graphs from both a source document and a LLM continuation of that document. These graphs are then analyzed with respect to content using cosine similarity and with respect to structure using a normalized version of graph edit distance that shows the degree of isomorphism. Unlike traditional systems that focus on content matching and keyword identification between a source and target corpus, our approach enables a broader evaluation of similarity and thus a more accurate comparison of the similarity between a source document and LLM continuation by focusing on relationships between ideas and their organization with regards to others. Additionally, our approach does not require access to LLM metrics like perplexity that may be unavailable in closed large language modeling "black-box" systems, as well as the training corpus. A prototype of our system will be found on a hyperlinked GitHub repository. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2405.04063 [pdf, other]

doi 10.1145/3639478.3643116

xNose: A Test Smell Detector for C#

Authors: Partha P. Paul, Md Tonoy Akanda, M. Raihan Ullah, Dipto Mondal, Nazia S. Chowdhury, Fazle M. Tawsif

Abstract: Test smells, similar to code smells, can negatively impact both the test code and the production code being tested. Despite extensive research on test smells in languages like Java, Scala, and Python, automated tools for detecting test smells in C# are lacking. This paper aims to bridge this gap by extending the study of test smells to C#, and developing a tool (xNose) to identify test smells in t… ▽ More Test smells, similar to code smells, can negatively impact both the test code and the production code being tested. Despite extensive research on test smells in languages like Java, Scala, and Python, automated tools for detecting test smells in C# are lacking. This paper aims to bridge this gap by extending the study of test smells to C#, and developing a tool (xNose) to identify test smells in this language and analyze their distribution across projects. We identified 16 test smells from prior studies that were language-independent and had equivalent features in C# and evaluated xNose, achieving a precision score of 96.97% and a recall score of 96.03%. In addition, we conducted an empirical study to determine the prevalence of test smells in xUnit-based C# projects. This analysis sheds light on the frequency and distribution of test smells, deepening our understanding of their impact on C# projects and test suites. The development of xNose and our analysis of test smells in C# code aim to assist developers in maintaining code quality by addressing potential issues early in the development process. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Full report of our ICSE'24 poster

arXiv:2404.16308 [pdf, other]

WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models

Authors: Wenlong Zhao, Debanjan Mondal, Niket Tandon, Danica Dillion, Kurt Gray, Yuling Gu

Abstract: The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses. However, this awareness of LMs has been insufficiently studied, since the computer science community lacks access to the large-scale real-world data about multi-cultural values. In this paper, we present WorldValuesBench, a globally diverse, large-scale benc… ▽ More The awareness of multi-cultural human values is critical to the ability of language models (LMs) to generate safe and personalized responses. However, this awareness of LMs has been insufficiently studied, since the computer science community lacks access to the large-scale real-world data about multi-cultural values. In this paper, we present WorldValuesBench, a globally diverse, large-scale benchmark dataset for the multi-cultural value prediction task, which requires a model to generate a rating response to a value question based on demographic contexts. Our dataset is derived from an influential social science project, World Values Survey (WVS), that has collected answers to hundreds of value questions (e.g., social, economic, ethical) from 94,728 participants worldwide. We have constructed more than 20 million examples of the type "(demographic attributes, value question) $\rightarrow$ answer" from the WVS responses. We perform a case study using our dataset and show that the task is challenging for strong open and closed-source models. On merely $11.1\%$, $25.0\%$, $72.2\%$, and $75.0\%$ of the questions, Alpaca-7B, Vicuna-7B-v1.5, Mixtral-8x7B-Instruct-v0.1, and GPT-3.5 Turbo can respectively achieve $<0.2$ Wasserstein 1-distance from the human normalized answer distributions. WorldValuesBench opens up new research avenues in studying limitations and opportunities in multi-cultural value awareness of LMs. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: Accepted at LREC-COLING 2024. Wenlong and Debanjan contributed equally

arXiv:2404.04535 [pdf, other]

Quantum Speedup for Some Geometric 3SUM-Hard Problems and Beyond

Authors: J. Mark Keil, Fraser McLeod, Debajyoti Mondal

Abstract: The classical 3SUM conjecture states that the class of 3SUM-hard problems does not admit a truly subquadratic $O(n^{2-δ})$-time algorithm, where $δ>0$, in classical computing. The geometric 3SUM-hard problems have widely been studied in computational geometry and recently, these problems have been examined under the quantum computing model. For example, Ambainis and Larka [TQC'20] designed a quant… ▽ More The classical 3SUM conjecture states that the class of 3SUM-hard problems does not admit a truly subquadratic $O(n^{2-δ})$-time algorithm, where $δ>0$, in classical computing. The geometric 3SUM-hard problems have widely been studied in computational geometry and recently, these problems have been examined under the quantum computing model. For example, Ambainis and Larka [TQC'20] designed a quantum algorithm that can solve many geometric 3SUM-hard problems in $O(n^{1+o(1)})$-time, whereas Buhrman [ITCS'22] investigated lower bounds under quantum 3SUM conjecture that claims there does not exist any sublinear $O(n^{1-δ})$-time quantum algorithm for the 3SUM problem. The main idea of Ambainis and Larka is to formulate a 3SUM-hard problem as a search problem, where one needs to find a point with a certain property over a set of regions determined by a line arrangement in the plane. The quantum speed-up then comes from the application of the well-known quantum search technique called Grover search over all regions. This paper further generalizes the technique of Ambainis and Larka for some 3SUM-hard problems when a solution may not necessarily correspond to a single point or the search regions do not immediately correspond to the subdivision determined by a line arrangement. Given a set of $n$ points and a positive number $q$, we design $O(n^{1+o(1)})$-time quantum algorithms to determine whether there exists a triangle among these points with an area at most $q$ or a unit disk that contains at least $q$ points. We also give an $O(n^{1+o(1)})$-time quantum algorithm to determine whether a given set of intervals can be translated so that it becomes contained in another set of given intervals and discuss further generalizations. △ Less

Submitted 6 April, 2024; originally announced April 2024.

MSC Class: 68Q12 ACM Class: F.2

arXiv:2404.03751 [pdf, other]

The Maximum Clique Problem in a Disk Graph Made Easy

Authors: J. Mark Keil, Debajyoti Mondal

Abstract: A disk graph is an intersection graph of disks in $\mathbb{R}^2$. Determining the computational complexity of finding a maximum clique in a disk graph is a long-standing open problem. In 1990, Clark, Colbourn, and Johnson gave a polynomial-time algorithm for computing a maximum clique in a unit disk graph. However, finding a maximum clique when disks are of arbitrary size is widely believed to be… ▽ More A disk graph is an intersection graph of disks in $\mathbb{R}^2$. Determining the computational complexity of finding a maximum clique in a disk graph is a long-standing open problem. In 1990, Clark, Colbourn, and Johnson gave a polynomial-time algorithm for computing a maximum clique in a unit disk graph. However, finding a maximum clique when disks are of arbitrary size is widely believed to be a challenging open problem. The problem is open even if we restrict the disks to have at most two different sizes of radii, or restrict the radii to be within $[1,1+\varepsilon]$ for some $ε>0$. In this paper, we provide a new perspective to examine adjacencies in a disk graph that helps obtain the following results. - We design an $O(2^k n^{2k} poly(n))$-time algorithm to find a maximum clique in a $n$-vertex disk graph with $k$ different sizes of radii. This is polynomial for every fixed $k$, and thus settles the open question for the case when $k=2$. - Given a set of $n$ unit disks, we show how to compute a maximum clique inside each possible axis-aligned rectangle determined by the disk centers in $O(n^5\log n)$-time. This is at least a factor of $n^{4/3}$ faster than applying the fastest known algorithm for finding a maximum clique in a unit disk graph for each rectangle independently. - We give an $O(2^kn^{2rk} poly(n,r))$-time algorithm to find a maximum clique in a $n$-vertex ball graph with $k$ different sizes of radii where the ball centers lie on $r$ parallel planes. This is polynomial for every fixed $k$ and $r$, and thus contrasts the previously known NP-hardness result for finding a maximum clique in an arbitrary ball graph. △ Less

Submitted 15 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

MSC Class: 52C99; 68Q25 ACM Class: F.2.2; I.3.5

arXiv:2403.13925 [pdf, other]

Reducing Large Language Model Bias with Emphasis on 'Restricted Industries': Automated Dataset Augmentation and Prejudice Quantification

Authors: Devam Mondal, Carlo Lipizzi

Abstract: Despite the growing capabilities of large language models, there exists concerns about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers and in the context of 'restricted industries' with limited data. We additionally create two new additional metrics, the mb-index and db-index, to quan… ▽ More Despite the growing capabilities of large language models, there exists concerns about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers and in the context of 'restricted industries' with limited data. We additionally create two new additional metrics, the mb-index and db-index, to quantify bias, considering the idea that bias occurs due to both intrinsic model architecture and dataset. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.19120 [pdf]

A Naive Approach for Automatic Line-level Code Completion

Authors: Shamima Naznin, Dr. Manishankar Mondal

Abstract: Coding is an integral aspect of programming. A programmer can automatically complete a code fragment after writing a few tokens, and the process of automatic completion is known as code completion. Several research studies on code completion have previously been conducted for method body completion and method parameter completion. However, this fundamental study explores the automatic completion o… ▽ More Coding is an integral aspect of programming. A programmer can automatically complete a code fragment after writing a few tokens, and the process of automatic completion is known as code completion. Several research studies on code completion have previously been conducted for method body completion and method parameter completion. However, this fundamental study explores the automatic completion of any program statement that might not even be part of a method. The goal is to provide suggestions to the programmer for completing code throughout the codebase by identifying and analyzing code similarities. The proposed methodology can be regarded as a fundamental framework for automated code completion. From the investigation of hundreds of revisions of four subject systems written in C and Java, it is observed that the proposed method can automatically complete around 22% of code statements with an average accuracy of 87% that a programmer writes during development, accelerating software development time. The empirical analysis further demonstrates that the approach can be used with programming language neutrality. The study concludes by illustrating that taking 10 characters as prefixes before invoking completion provides maximum precision. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: 6 pages

arXiv:2401.12863 [pdf, other]

KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning

Authors: Debjyoti Mondal, Suraj Modi, Subhadarshi Panda, Rituraj Singh, Godawari Sudhakar Rao

Abstract: Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates C… ▽ More Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers. By incorporating external knowledge from KGs during reasoning, the model gains a deeper contextual understanding reducing hallucinations and enhancing the quality of answers. This knowledge-augmented CoT reasoning empowers the model to handle questions requiring external context, providing more informed answers. Experimental findings show KAM-CoT outperforms the state-of-the-art methods. On the ScienceQA dataset, we achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by 18% and GPT-4 (83.99%) by 10%. Remarkably, KAM-CoT achieves these results with only 280M trainable parameters at a time, demonstrating its cost-efficiency and effectiveness. △ Less

Submitted 23 January, 2024; originally announced January 2024.

Comments: AAAI 2024

arXiv:2312.03182 [pdf, other]

Investigating Technology Usage Span by Analyzing Users' Q&A Traces in Stack Overflow

Authors: Saikat Mondal, Debajyoti Mondal, Chanchal K. Roy

Abstract: Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. The selection of inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may lead to a complex learning curve and, thus, be more challengin… ▽ More Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. The selection of inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may lead to a complex learning curve and, thus, be more challenging. Therefore, it is crucial for software developers to find technologies that have a high usage span. Intuitively, the usage span of a technology can be determined by the time span developers have used that technology. Existing literature focuses on the technology landscape to explore the complex and implicit dependencies among technologies but lacks formal studies to draw insights about their usage span. This paper investigates the technology usage span by analyzing the question and answering (Q&A) traces of Stack Overflow (SO), the largest technical Q&A website available to date. In particular, we analyze 6.7 million Q&A traces posted by about 97K active SO users and see what technologies have appeared in their questions or answers over 15 years. According to our analysis, C# and Java programming languages have a high usage span, followed by JavaScript. Besides, developers used the .NET framework, iOS & Windows Operating Systems (OS), and SQL query language for a long time (on average). Our study also exposes the emerging (i.e., newly growing) technologies. For example, usages of technologies such as SwiftUI, .NET-6.0, Visual Studio 2022, and Blazor WebAssembly framework are increasing. The findings from our study can assist novice developers, startup software industries, and software users in determining appropriate technologies. This also establishes an initial benchmark for future investigation on the use span of software technologies. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted in the 30th Asia-Pacific Software Engineering Conference (APSEC 2023)

arXiv:2310.16314 [pdf, other]

Understanding Code Semantics: An Evaluation of Transformer Models in Summarization

Authors: Debanjan Mondal, Abhilasha Lodha, Ankita Sahoo, Beena Kumari

Abstract: This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore whether models truly understand code semantics or merely rely on textual cues. We have also introduced adversaries like dead code and commented code across three pr… ▽ More This paper delves into the intricacies of code summarization using advanced transformer-based language models. Through empirical studies, we evaluate the efficacy of code summarization by altering function and variable names to explore whether models truly understand code semantics or merely rely on textual cues. We have also introduced adversaries like dead code and commented code across three programming languages (Python, Javascript, and Java) to further scrutinize the model's understanding. Ultimately, our research aims to offer valuable insights into the inner workings of transformer-based LMs, enhancing their ability to understand code and contributing to more efficient software development practices and maintenance workflows. △ Less

Submitted 26 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted at GenBench, EMNLP 2023. All authors are co-first authors and have equal contributions

arXiv:2305.14956 [pdf, other]

Editing Common Sense in Transformers

Authors: Anshita Gupta, Debanjan Mondal, Akshay Krishna Sheshadri, Wenlong Zhao, Xiang Lorraine Li, Sarah Wiegreffe, Niket Tandon

Abstract: Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training (Meng et al., 2023). However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer. Commonsense knowledge with multiple correct answers, e.g., an apple can be green or red but not transparent, has not be… ▽ More Editing model parameters directly in Transformers makes updating open-source transformer-based models possible without re-training (Meng et al., 2023). However, these editing methods have only been evaluated on statements about encyclopedic knowledge with a single correct answer. Commonsense knowledge with multiple correct answers, e.g., an apple can be green or red but not transparent, has not been studied but is as essential for enhancing transformers' reliability and usefulness. In this paper, we investigate whether commonsense judgments are causally associated with localized, editable parameters in Transformers, and we provide an affirmative answer. We find that directly applying the MEMIT editing algorithm results in sub-par performance and improve it for the commonsense domain by varying edit tokens and improving the layer selection strategy, i.e., $MEMIT_{CSK}$. GPT-2 Large and XL models edited using $MEMIT_{CSK}$ outperform best-fine-tuned baselines by 10.97% and 10.73% F1 scores on PEP3k and 20Q datasets. In addition, we propose a novel evaluation dataset, PROBE SET, that contains unaffected and affected neighborhoods, affected paraphrases, and affected reasoning challenges. $MEMIT_{CSK}$ performs well across the metrics while fine-tuning baselines show significant trade-offs between unaffected and affected metrics. These results suggest a compelling future direction for incorporating feedback about common sense into Transformers through direct model editing. △ Less

Submitted 26 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023 Main Conference. Anshita, Debanjan, Akshay are co-first authors. Code and datasets for all experiments are available at https://github.com/anshitag/memit_csk

arXiv:2303.07645 [pdf, other]

Finding a Maximum Clique in a Disk Graph

Authors: Jared Espenant, J. Mark Keil, Debajyoti Mondal

Abstract: A disk graph is an intersection graph of disks in the Euclidean plane, where the disks correspond to the vertices of the graph and a pair of vertices are adjacent if and only if their corresponding disks intersect. The problem of determining the time complexity of computing a maximum clique in a disk graph is a long-standing open question. The problem is known to be open even when the radii of all… ▽ More A disk graph is an intersection graph of disks in the Euclidean plane, where the disks correspond to the vertices of the graph and a pair of vertices are adjacent if and only if their corresponding disks intersect. The problem of determining the time complexity of computing a maximum clique in a disk graph is a long-standing open question. The problem is known to be open even when the radii of all the disks are in the interval $[1,(1+\varepsilon)]$, where $\varepsilon>0$. However, the maximum clique problem is known to be APX-hard for the intersection graphs of many other convex objects such as intersection graphs of ellipses, triangles, and a combination of unit disks and axis-parallel rectangles. Furthermore, there exists an $O(n^3\log n)$-time algorithm to compute a maximum clique for unit disks. Here we obtain the following results. - We give an algorithm to compute a maximum clique in a unit disk graph in $O(n^{2.5}\log n)$-time, which improves the previously best known running time of $O(n^3\log n)$ [Eppstein '09]. - We extend a widely used `co-2-subdivision approach' to prove that computing a maximum clique in a combination of unit disks and axis-parallel rectangles is NP-hard to approximate within $4448/4449 \approx 0.9997 $. The use of a `co-2-subdivision approach' was previously thought to be unlikely in this setting [Bonnet et al. '20]. Our result improves the previously known inapproximability factor of $7633010347/7633010348\approx 0.9999$. - We show that the parameter minimum lens width of the disk arrangement may be used to make progress in the case when disk radii are in $[1,(1+\varepsilon)]$. For example, if the minimum lens width is at least $0.265$ and $ \varepsilon\le 0.0001$, which still allows for non-Helly triples in the arrangement, then one can find a maximum clique in polynomial time. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Preliminary results appeared at the 39th International Symposium on Computational Geometry (SoCG 2023)

MSC Class: 52C99; 68Q25 ACM Class: F.2.2; I.3.5

arXiv:2303.01435 [pdf, other]

Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection

Authors: Subroto Nag Pinku, Debajyoti Mondal, Chanchal K. Roy

Abstract: Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also e… ▽ More Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also emerging rapidly. The popularity of deep learning-based clone detection tools creates an opportunity to investigate how known strategies that boost the performances of deep learning models could be further leveraged to improve clone detection tools. In this paper, we investigate such a strategy, data augmentation, which has not yet been explored for cross-language clone detection as opposed to single-language clone detection. We show how the existing knowledge on transcompilers (source-to-source translators) can be used for data augmentation to boost the performance of cross-language clone detection models, as well as to adapt single-language clone detection models to create cross-language clone detection pipelines. To demonstrate the performance boost for cross-language clone detection through data augmentation, we exploit Transcoder, which is a pre-trained source-to-source translator. To show how to extend single-language models for cross-language clone detection, we extend a popular single-language model, Graph Matching Network (GMN) in a combination with the transcompilers. We evaluated our models on popular benchmark datasets. Our experimental results showed improvements in F1 scores (sometimes up to 3%) for the cutting-edge cross-language clone detection models. Even when extending GMN for cross-language clone detection, the models built leveraging data augmentation outperformed the baseline with scores of 0.90, 0.92, and 0.91 for precision, recall, and F1 score, respectively. △ Less

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted at the 31st IEEE/ACM International Conference on Program Comprehension (ICPC 2023)

ACM Class: D.2; D.2.13

arXiv:2209.13024 [pdf, other]

Improved and Generalized Algorithms for Burning a Planar Point Set

Authors: Prashant Gokhale, J. Mark Keil, Debajyoti Mondal

Abstract: Given a set $P$ of points in the plane, a point burning process is a discrete time process to burn all the points of $P$ where fires must be initiated at the given points. Specifically, the point burning process starts with a single burnt point from $P$, and at each subsequent step, burns all the points in the plane that are within one unit distance from the currently burnt points, as well as one… ▽ More Given a set $P$ of points in the plane, a point burning process is a discrete time process to burn all the points of $P$ where fires must be initiated at the given points. Specifically, the point burning process starts with a single burnt point from $P$, and at each subsequent step, burns all the points in the plane that are within one unit distance from the currently burnt points, as well as one other unburnt point of $P$ (if exists). The point burning number of $P$ is the smallest number of steps required to burn all the points of $P$. If we allow the fire to be initiated anywhere, then the burning process is called an anywhere burning process, and the corresponding burning number is called anywhere burning number. Computing the point and anywhere burning number is known to be NP-hard. In this paper we show that both these problems admit PTAS in one dimension. We then show that in two dimensions, point burning and anywhere burning are $(1.96296+\varepsilon)$ and $(1.92188+\varepsilon)$ approximable, respectively, for every $\varepsilon>0$, which improves the previously known $(2+\varepsilon)$ factor for these problems. We also observe that a known result on set cover problem can be leveraged to obtain a 2-approximation for burning the maximum number of points in a given number of steps. We show how the results generalize if we allow the points to have different fire spreading rates. Finally, we prove that even if the burning sources are given as input, finding a point burning sequence itself is NP-hard. △ Less

Submitted 26 September, 2022; originally announced September 2022.

MSC Class: 52C15; 68Q25 ACM Class: F.2

arXiv:2208.06122 [pdf, other]

Minimum Ply Covering of Points with Unit Squares

Authors: Stephane Durocher, J. Mark Keil, Debajyoti Mondal

Abstract: Given a set $P$ of points and a set $U$ of axis-parallel unit squares in the Euclidean plane, a minimum ply cover of $P$ with $U$ is a subset of $U$ that covers $P$ and minimizes the number of squares that share a common intersection, called the minimum ply cover number of $P$ with $U$. Biedl et al. [Comput. Geom., 94:101712, 2020] showed that determining the minimum ply cover number for a set of… ▽ More Given a set $P$ of points and a set $U$ of axis-parallel unit squares in the Euclidean plane, a minimum ply cover of $P$ with $U$ is a subset of $U$ that covers $P$ and minimizes the number of squares that share a common intersection, called the minimum ply cover number of $P$ with $U$. Biedl et al. [Comput. Geom., 94:101712, 2020] showed that determining the minimum ply cover number for a set of points by a set of axis-parallel unit squares is NP-hard, and gave a polynomial-time 2-approximation algorithm for instances in which the minimum ply cover number is constant. The question of whether there exists a polynomial-time approximation algorithm remained open when the minimum ply cover number is $ω(1)$. We settle this open question and present a polynomial-time $(8+\varepsilon)$-approximation algorithm for the general problem, for every fixed $\varepsilon>0$. △ Less

Submitted 12 August, 2022; originally announced August 2022.

MSC Class: 68W25; 52C15 ACM Class: F.2

arXiv:2205.13095 [pdf]

VizInspect Pro -- Automated Optical Inspection (AOI) solution

Authors: Faraz Waseem, Sanjit Menon, Haotian Xu, Debashis Mondal

Abstract: Traditional vision based Automated Optical Inspection (referred to as AOI in paper) systems present multiple challenges in factory settings including inability to scale across multiple product lines, requirement of vendor programming expertise, little tolerance to variations and lack of cloud connectivity for aggregated insights. The lack of flexibility in these systems presents a unique opportuni… ▽ More Traditional vision based Automated Optical Inspection (referred to as AOI in paper) systems present multiple challenges in factory settings including inability to scale across multiple product lines, requirement of vendor programming expertise, little tolerance to variations and lack of cloud connectivity for aggregated insights. The lack of flexibility in these systems presents a unique opportunity for a deep learning based AOI system specifically for factory automation. The proposed solution, VizInspect pro is a generic computer vision based AOI solution built on top of Leo - An edge AI platform. Innovative features that overcome challenges of traditional vision systems include deep learning based image analysis which combines the power of self-learning with high speed and accuracy, an intuitive user interface to configure inspection profiles in minutes without ML or vision expertise and the ability to solve complex inspection challenges while being tolerant to deviations and unpredictable defects. This solution has been validated by multiple external enterprise customers with confirmed value propositions. In this paper we show you how this solution and platform solved problems around model development, deployment, scaling multiple inferences and visualizations. △ Less

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.04643 [pdf, other]

Burning Number for the Points in the Plane

Authors: J. Mark Keil, Debajyoti Mondal, Ehsan Moradi

Abstract: The burning process on a graph $G$ starts with a single burnt vertex, and at each subsequent step, burns the neighbors of the currently burnt vertices, as well as one other unburnt vertex. The burning number of $G$ is the smallest number of steps required to burn all the vertices of the graph. In this paper, we examine the problem of computing the burning number in a geometric setting. The input i… ▽ More The burning process on a graph $G$ starts with a single burnt vertex, and at each subsequent step, burns the neighbors of the currently burnt vertices, as well as one other unburnt vertex. The burning number of $G$ is the smallest number of steps required to burn all the vertices of the graph. In this paper, we examine the problem of computing the burning number in a geometric setting. The input is a set of points $P$ in the Euclidean plane. The burning process starts with a single burnt point, and at each subsequent step, burns all the points that are within a distance of one unit from the currently burnt points and one other unburnt point. The burning number of $P$ is the smallest number of steps required to burn all the points of $P$. We call this variant \emph{point burning}. We consider another variant called \emph{anywhere burning}, where we are allowed to burn any point of the plane. We show that point burning and anywhere burning problems are both NP-complete, but $(2+\varepsilon)$ approximable for every $\varepsilon>0$. Moreover, if we put a restriction on the number of burning sources that can be used, then the anywhere burning problem becomes NP-hard to approximate within a factor of $\frac{2}{\sqrt{3}}-\varepsilon$. △ Less

Submitted 26 September, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

MSC Class: 52C15; 68Q25 ACM Class: F.2

arXiv:2203.04253 [pdf, other]

Oriented Diameter of Planar Triangulations

Authors: Debajyoti Mondal, N. Parthiban, Indra Rajasingh

Abstract: The diameter of an undirected or a directed graph is defined to be the maximum shortest path distance over all pairs of vertices in the graph. Given an undirected graph $G$, we examine the problem of assigning directions to each edge of $G$ such that the diameter of the resulting oriented graph is minimized. The minimum diameter over all strongly connected orientations is called the oriented diame… ▽ More The diameter of an undirected or a directed graph is defined to be the maximum shortest path distance over all pairs of vertices in the graph. Given an undirected graph $G$, we examine the problem of assigning directions to each edge of $G$ such that the diameter of the resulting oriented graph is minimized. The minimum diameter over all strongly connected orientations is called the oriented diameter of $G$. The problem of determining the oriented diameter of a graph is known to be NP-hard, but the time-complexity question is open for planar graphs. In this paper we compute the exact value of the oriented diameter for triangular grid graphs. We then prove an $n/3$ lower bound and an $n/2+O(\sqrt{n})$ upper bound on the oriented diameter of planar triangulations. It is known that given a planar graph $G$ with bounded treewidth and a fixed positive integer $k$, one can determine in linear time whether the oriented diameter of $G$ is at most $k$. In contrast, we consider a weighted version of the oriented diameter problem and show it to be is weakly NP-complete for planar graphs with bounded pathwidth. △ Less

Submitted 8 March, 2022; originally announced March 2022.

MSC Class: 05C85; 68Q25 ACM Class: F.2.0; G.2.2

arXiv:2201.10137 [pdf, other]

Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

Authors: Md Nadim, Debajyoti Mondal, Chanchal K. Roy

Abstract: The most common use of data visualization is to minimize the complexity for proper understanding. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that is challenging to comprehend if kept in a textual format. In this study, we propose a methodology to utilize the relational properties of source code in the… ▽ More The most common use of data visualization is to minimize the complexity for proper understanding. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that is challenging to comprehend if kept in a textual format. In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph to identify Just-in-Time (JIT) bug prediction in software systems during different revisions of software evolution and maintenance. We presented a method to convert the source codes of commit patches to equivalent graph representations and named it Source Code Graph (SCG). To understand and compare multiple source code graphs, we extracted several structural properties of these graphs, such as the density, number of cycles, nodes, edges, etc. We then utilized the attribute values of those SCGs to visualize and detect buggy software commits. We process more than 246K software commits from 12 subject systems in this investigation. Our investigation on these 12 open-source software projects written in C++ and Java programming languages shows that if we combine the features from SCG with conventional features used in similar studies, we will get the increased performance of Machine Learning (ML) based buggy commit detection models. We also find the increase of F1~Scores in predicting buggy and non-buggy commits statistically significant using the Wilcoxon Signed Rank Test. Since SCG-based feature values represent the style or structural properties of source code updates or changes in the software system, it suggests the importance of careful maintenance of source code style or structure for keeping a software system bug-free. △ Less

Submitted 25 January, 2022; originally announced January 2022.

Comments: Has been accepted for publication Automated Software Engineering (AUSE), an International Journal published by Springer

arXiv:2108.12500 [pdf, other]

Positive Planar Satisfiability Problems under 3-Connectivity Constraints

Authors: Md. Manzurul Hasan, Debajyoti Mondal, Md. Saidur Rahman

Abstract: A 3-SAT problem is called positive and planar if all the literals are positive and the clause-variable incidence graph (i.e., SAT graph) is planar. The NAE 3-SAT and 1-in-3-SAT are two variants of 3-SAT that remain NP-complete even when they are positive. The positive 1-in-3-SAT problem remains NP-complete under planarity constraint, but planar NAE 3-SAT is solvable in $O(n^{1.5}\log n)$ time. In… ▽ More A 3-SAT problem is called positive and planar if all the literals are positive and the clause-variable incidence graph (i.e., SAT graph) is planar. The NAE 3-SAT and 1-in-3-SAT are two variants of 3-SAT that remain NP-complete even when they are positive. The positive 1-in-3-SAT problem remains NP-complete under planarity constraint, but planar NAE 3-SAT is solvable in $O(n^{1.5}\log n)$ time. In this paper we prove that a positive planar NAE 3-SAT is always satisfiable when the underlying SAT graph is 3-connected, and a satisfiable assignment can be obtained in linear time. We also show that without 3-connectivity constraint, existence of a linear-time algorithm for positive planar NAE 3-SAT problem is unlikely as it would imply a linear-time algorithm for finding a spanning 2-matching in a planar subcubic graph. We then prove that positive planar 1-in-3-SAT remains NP-complete under the 3-connectivity constraint, even when each variable appears in at most 4 clauses. However, we show that the 3-connected planar 1-in-3-SAT is always satisfiable when each variable appears in an even number of clauses. △ Less

Submitted 27 August, 2021; originally announced August 2021.

MSC Class: 05C85; 05C85 ACM Class: F.2

arXiv:2108.12464 [pdf, other]

Bottleneck Convex Subsets: Finding $k$ Large Convex Sets in a Point Set

Authors: Stephane Durocher, J. Mark Keil, Saeed Mehrabi, Debajyoti Mondal

Abstract: Chvátal and Klincsek (1980) gave an $O(n^3)$-time algorithm for the problem of finding a maximum-cardinality convex subset of an arbitrary given set $P$ of $n$ points in the plane. This paper examines a generalization of the problem, the Bottleneck Convex Subsets problem: given a set $P$ of $n$ points in the plane and a positive integer $k$, select $k$ pairwise disjoint convex subsets of $P$ such… ▽ More Chvátal and Klincsek (1980) gave an $O(n^3)$-time algorithm for the problem of finding a maximum-cardinality convex subset of an arbitrary given set $P$ of $n$ points in the plane. This paper examines a generalization of the problem, the Bottleneck Convex Subsets problem: given a set $P$ of $n$ points in the plane and a positive integer $k$, select $k$ pairwise disjoint convex subsets of $P$ such that the cardinality of the smallest subset is maximized. Equivalently, a solution maximizes the cardinality of $k$ mutually disjoint convex subsets of $P$ of equal cardinality. We show the problem is NP-hard when $k$ is an arbitrary input parameter, we give an algorithm that solves the problem exactly, with running time polynomial in $n$ when $k$ is fixed, and we give a fixed-parameter tractable algorithm parameterized in terms of the number of points strictly interior to the convex hull. △ Less

Submitted 27 August, 2021; originally announced August 2021.

Comments: Preliminary results appeared at the 27th International Computing and Combinatorics Conference (COCOON 2021)

MSC Class: 05C85; 52C30; 52C35 ACM Class: F.2

arXiv:2108.00529 [pdf, other]

BigGraphVis: Leveraging Streaming Algorithms and GPU Acceleration for Visualizing Big Graphs

Authors: Ehsan Moradi, Debajyoti Mondal

Abstract: Graph layouts are key to exploring massive graphs. An enormous number of nodes and edges do not allow network analysis software to produce meaningful visualization of the pervasive networks. Long computation time, memory and display limitations encircle the software's ability to explore massive graphs. This paper introduces BigGraphVis, a new parallel graph visualization method that uses GPU paral… ▽ More Graph layouts are key to exploring massive graphs. An enormous number of nodes and edges do not allow network analysis software to produce meaningful visualization of the pervasive networks. Long computation time, memory and display limitations encircle the software's ability to explore massive graphs. This paper introduces BigGraphVis, a new parallel graph visualization method that uses GPU parallel processing and community detection algorithm to visualize graph communities. We combine parallelized streaming community detection algorithm and probabilistic data structure to leverage parallel processing of Graphics Processing Unit (GPU). To the best of our knowledge, this is the first attempt to combine the power of streaming algorithms coupled with GPU computing to tackle big graph visualization challenges. Our method extracts community information in a few passes on the edge list, and renders the community structures using the ForceAtlas2 algorithm. Our experiment with massive real-life graphs indicates that about 70 to 95 percent speedup can be achieved by visualizing graph communities, and the visualization appears to be meaningful and reliable. The biggest graph that we examined contains above 3 million nodes and 34 million edges, and the layout computation took about five minutes. We also observed that the BigGraphVis coloring strategy can be successfully applied to produce a more informative ForceAtlas2 layout. △ Less

Submitted 1 August, 2021; originally announced August 2021.

MSC Class: 68U05 ACM Class: C.1.4; I.3.2; I.3.5; G.2.3

arXiv:2107.05198 [pdf, other]

Finding a Maximum Clique in a Grounded 1-Bend String Graph

Authors: J. Mark Keil, Debajyoti Mondal, Ehsan Moradi, Yakov Nekrich

Abstract: A grounded 1-bend string graph is an intersection graph of a set of polygonal lines, each with one bend, such that the lines lie above a common horizontal line $\ell$ and have exactly one endpoint on $\ell$. We show that the problem of finding a maximum clique in a grounded 1-bend string graph is APX-hard, even for strictly $y$-monotone strings. For general 1-bend strings, the problem remains APX-… ▽ More A grounded 1-bend string graph is an intersection graph of a set of polygonal lines, each with one bend, such that the lines lie above a common horizontal line $\ell$ and have exactly one endpoint on $\ell$. We show that the problem of finding a maximum clique in a grounded 1-bend string graph is APX-hard, even for strictly $y$-monotone strings. For general 1-bend strings, the problem remains APX-hard even if we restrict the position of the bends and end-points to lie on at most three parallel horizontal lines. We give fast algorithms to compute a maximum clique for different subclasses of grounded segment graphs, which are formed by restricting the strings to various forms of $L$-shapes. △ Less

Submitted 12 July, 2021; originally announced July 2021.

Comments: A preliminary version of the paper was presented at the 32nd Canadian Conference on Computational Geometry (CCCG 2020)

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:2106.04693 [pdf, other]

On the Evolution of Neuron Communities in a Deep Learning Architecture

Authors: Sakib Mostafa, Debajyoti Mondal

Abstract: Deep learning techniques are increasingly being adopted for classification tasks over the past decade, yet explaining how deep learning architectures can achieve state-of-the-art performance is still an elusive goal. While all the training information is embedded deeply in a trained model, we still do not understand much about its performance by only analyzing the model. This paper examines the ne… ▽ More Deep learning techniques are increasingly being adopted for classification tasks over the past decade, yet explaining how deep learning architectures can achieve state-of-the-art performance is still an elusive goal. While all the training information is embedded deeply in a trained model, we still do not understand much about its performance by only analyzing the model. This paper examines the neuron activation patterns of deep learning-based classification models and explores whether the models' performances can be explained through neurons' activation behavior. We propose two approaches: one that models neurons' activation behavior as a graph and examines whether the neurons form meaningful communities, and the other examines the predictability of neurons' behavior using entropy. Our comprehensive experimental study reveals that both the community quality and entropy can provide new insights into the deep learning models' performances, thus paves a novel way of explaining deep learning models directly from the neurons' activation pattern. △ Less

Submitted 8 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2103.15037 [pdf, other]

StreamTable: An Area Proportional Visualization for Tables with Flowing Streams

Authors: Jared Espenant, Debajyoti Mondal

Abstract: Let $M$ be a two-dimensional table with each cell weighted by a nonzero positive number. A StreamTable visualization of $M$ represents the columns as non-overlapping vertical streams and the rows as horizontal stripes such that the intersection between a stream and a stripe is a rectangle with area equal to the weight of the corresponding cell. To avoid large wiggle of the streams, it is desirable… ▽ More Let $M$ be a two-dimensional table with each cell weighted by a nonzero positive number. A StreamTable visualization of $M$ represents the columns as non-overlapping vertical streams and the rows as horizontal stripes such that the intersection between a stream and a stripe is a rectangle with area equal to the weight of the corresponding cell. To avoid large wiggle of the streams, it is desirable to keep the consecutive cells in a stream to be adjacent. Let $B$ be the smallest axis-aligned bounding box containing the StreamTable. Then the difference between the area of $B$ and the sum of the weights is referred to as the excess area. We attempt to optimize various StreamTable aesthetics (e.g., minimizing excess area, or maximizing cell adjacencies in streams). (A) If the row permutation is fixed and the row heights are given, then we give an $O(rc)$-time algorithm to optimize these aesthetics, where $r$ and $c$ are the number of rows and columns, respectively. (B) If the row permutation is fixed but the row heights can be chosen, then we discuss a technique to compute an aesthetic (but not necessarily optimal) StreamTable by solving a quadratically-constrained quadratic program, followed by iterative improvements. If the row heights are restricted to be integers, then we prove the problem to be NP-hard. (C) If the row permutations can be chosen, then we show that it is NP-hard to find a row permutation that optimizes the area or adjacency aesthetics. △ Less

Submitted 25 March, 2022; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Preliminary results appeared at the 16th International Conference and Workshops (WALCOM 2022), Jember, Indonesia

MSC Class: 52C45; 65D18 ACM Class: F.2.0; I.3.5

arXiv:2101.06596 [pdf, other]

Simultaneous Embedding of Colored Graphs

Authors: Debajyoti Mondal

Abstract: A set of colored graphs are compatible, if for every color $i$, the number of vertices of color $i$ is the same in every graph. A simultaneous embedding of $k$ compatibly colored graphs, each with $n$ vertices, consists of $k$ planar polyline drawings of these graphs such that the vertices of the same color are mapped to a common set of vertex locations. We prove that simultaneous embedding of… ▽ More A set of colored graphs are compatible, if for every color $i$, the number of vertices of color $i$ is the same in every graph. A simultaneous embedding of $k$ compatibly colored graphs, each with $n$ vertices, consists of $k$ planar polyline drawings of these graphs such that the vertices of the same color are mapped to a common set of vertex locations. We prove that simultaneous embedding of $k\in o(\log \log n)$ colored planar graphs, each with $n$ vertices, can always be computed with a sublinear number of bends per edge. Specifically, we show an $O(\min\{c, n^{1-1/γ}\})$ upper bound on the number of bends per edge, where $γ= 2^{\lceil k/2 \rceil}$ and $c$ is the total number of colors. Our bound, which results from a better analysis of a previously known algorithm [Durocher and Mondal, SIAM J. Discrete Math., 32(4), 2018], improves the bound for $k$, as well as the bend complexity by a factor of $\sqrt{2}^{k}$. The algorithm can be generalized to obtain small universal point sets for colored graphs. We prove that $n\lceil c/b \rceil$ vertex locations, where $b\ge 1$, suffice to embed any set of compatibly colored $n$-vertex planar graphs with bend complexity $O(b)$, where $c$ is the number of colors. △ Less

Submitted 17 January, 2021; originally announced January 2021.

MSC Class: 68R10 ACM Class: G.2.2

arXiv:2006.14733 [pdf, other]

APX-Hardness and Approximation for the k-Burning Number Problem

Authors: Debajyoti Mondal, N. Parthiban, V. Kavitha, Indra Rajasingh

Abstract: Consider an information diffusion process on a graph $G$ that starts with $k>0$ burnt vertices, and at each subsequent step, burns the neighbors of the currently burnt vertices, as well as $k$ other unburnt vertices. The \emph{$k$-burning number} of $G$ is the minimum number of steps $b_k(G)$ such that all the vertices can be burned within $b_k(G)$ steps. Note that the last step may have smaller t… ▽ More Consider an information diffusion process on a graph $G$ that starts with $k>0$ burnt vertices, and at each subsequent step, burns the neighbors of the currently burnt vertices, as well as $k$ other unburnt vertices. The \emph{$k$-burning number} of $G$ is the minimum number of steps $b_k(G)$ such that all the vertices can be burned within $b_k(G)$ steps. Note that the last step may have smaller than $k$ unburnt vertices available, where all of them are burned. The $1$-burning number coincides with the well-known burning number problem, which was proposed to model the spread of social contagion. The generalization to $k$-burning number allows us to examine different worst-case contagion scenarios by varying the spread factor $k$. In this paper we prove that computing $k$-burning number is APX-hard, for any fixed constant $k$. We then give an $O((n+m)\log n)$-time 3-approximation algorithm for computing $k$-burning number, for any $k\ge 1$, where $n$ and $m$ are the number of vertices and edges, respectively. Finally, we show that even if the burning sources are given as an input, computing a burning sequence itself is an NP-hard problem. △ Less

Submitted 17 January, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

MSC Class: 03D15; 68Q25

arXiv:2004.07996 [pdf, other]

Compatible Paths on Labelled Point Sets

Authors: Elena Arseneva, Yeganeh Bahoo, Ahmad Biniaz, Pilar Cano, Farah Chanchary, John Iacono, Kshitij Jain, Anna Lubiw, Debajyoti Mondal, Khadijeh Sheikhan, Csaba D. Tóth

Abstract: Let $P$ and $Q$ be finite point sets of the same cardinality in $\mathbb{R}^2$, each labelled from $1$ to $n$. Two noncrossing geometric graphs $G_P$ and $G_Q$ spanning $P$ and $Q$, respectively, are called compatible if for every face $f$ in $G_P$, there exists a corresponding face in $G_Q$ with the same clockwise ordering of the vertices on its boundary as in $f$. In particular, $G_P$ and $G_Q$… ▽ More Let $P$ and $Q$ be finite point sets of the same cardinality in $\mathbb{R}^2$, each labelled from $1$ to $n$. Two noncrossing geometric graphs $G_P$ and $G_Q$ spanning $P$ and $Q$, respectively, are called compatible if for every face $f$ in $G_P$, there exists a corresponding face in $G_Q$ with the same clockwise ordering of the vertices on its boundary as in $f$. In particular, $G_P$ and $G_Q$ must be straight-line embeddings of the same connected $n$-vertex graph. Deciding whether two labelled point sets admit compatible geometric paths is known to be NP-complete. We give polynomial-time algorithms to find compatible paths or report that none exist in three scenarios: $O(n)$ time for points in convex position; $O(n^2)$ time for two simple polygons, where the paths are restricted to remain inside the closed polygons; and $O(n^2 \log n)$ time for points in general position if the paths are restricted to be monotone. △ Less

Submitted 16 April, 2020; originally announced April 2020.

Comments: A preliminary version of the paper was presented at the 30th Canadian Conference on Computational Geometry (CCCG 2018)

MSC Class: 05C85; 52C30; 52C35

arXiv:2002.09740 [pdf, other]

(Faster) Multi-Sided Boundary Labelling

Authors: Prosenjit Bose, Saeed Mehrabi, Debajyoti Mondal

Abstract: A 1-bend boundary labelling problem consists of an axis-aligned rectangle $B$, $n$ points (called sites) in the interior, and $n$ points (called ports) on the labels along the boundary of $B$. The goal is to find a set of $n$ axis-aligned curves (called leaders), each having at most one bend and connecting one site to one port, such that the leaders are pairwise disjoint. A 1-bend boundary labelli… ▽ More A 1-bend boundary labelling problem consists of an axis-aligned rectangle $B$, $n$ points (called sites) in the interior, and $n$ points (called ports) on the labels along the boundary of $B$. The goal is to find a set of $n$ axis-aligned curves (called leaders), each having at most one bend and connecting one site to one port, such that the leaders are pairwise disjoint. A 1-bend boundary labelling problem is $k$-sided ($1\leq k\leq 4$) if the ports appear on $k$ different sides of $B$. Kindermann et al. ["Multi-Sided Boundary Labeling", Algorithmica, 76(1): 225-258, 2016] showed that the 1-bend three-sided and four-sided boundary labelling problems can be solved in $O(n^4)$ and $O(n^9)$ time, respectively. Bose et al. [SWAT, 12:1-12:14, 2018] improved the latter running time to $O(n^6)$ by reducing the problem to computing maximum independent set in an outerstring graph. In this paper, we improve both previous results by giving new algorithms with running times $O(n^3\log n)$ and $O(n^5)$ to solve the 1-bend three-sided and four-sided boundary labelling problems, respectively. △ Less

Submitted 22 February, 2020; originally announced February 2020.

Comments: 16 pages, 12 figures

arXiv:2002.05099 [pdf, other]

Parameterized Complexity of Two-Interval Pattern Problem

Authors: Prosenjit Bose, Saeed Mehrabi, Debajyoti Mondal

Abstract: A \emph{2-interval} is the union of two disjoint intervals on the real line. Two 2-intervals $D_1$ and $D_2$ are \emph{disjoint} if their intersection is empty (i.e., no interval of $D_1$ intersects any interval of $D_2$). There can be three different relations between two disjoint 2-intervals; namely, preceding ($<$), nested ($\sqsubset$) and crossing ($\between$). Two 2-intervals $D_1$ and… ▽ More A \emph{2-interval} is the union of two disjoint intervals on the real line. Two 2-intervals $D_1$ and $D_2$ are \emph{disjoint} if their intersection is empty (i.e., no interval of $D_1$ intersects any interval of $D_2$). There can be three different relations between two disjoint 2-intervals; namely, preceding ($<$), nested ($\sqsubset$) and crossing ($\between$). Two 2-intervals $D_1$ and $D_2$ are called \emph{$R$-comparable} for some $R\in\{<,\sqsubset,\between\}$, if either $D_1RD_2$ or $D_2RD_1$. A set $\mathcal{D}$ of disjoint 2-intervals is $\mathcal{R}$-comparable, for some $\mathcal{R}\subseteq\{<,\sqsubset,\between\}$ and $\mathcal{R}\neq\emptyset$, if every pair of 2-intervals in $\mathcal{R}$ are $R$-comparable for some $R\in\mathcal{R}$. Given a set of 2-intervals and some $\mathcal{R}\subseteq\{<,\sqsubset,\between\}$, the objective of the \emph{2-interval pattern problem} is to find a largest subset of 2-intervals that is $\mathcal{R}$-comparable. The 2-interval pattern problem is known to be $W[1]$-hard when $|\mathcal{R}|=3$ and $NP$-hard when $|\mathcal{R}|=2$ (except for $\mathcal{R}=\{<,\sqsubset\}$, which is solvable in quadratic time). In this paper, we fully settle the parameterized complexity of the problem by showing it to be $W[1]$-hard for both $\mathcal{R}=\{\sqsubset,\between\}$ and $\mathcal{R}=\{<,\between\}$ (when parameterized by the size of an optimal solution); this answers an open question posed by Vialette [Encyclopedia of Algorithms, 2008]. △ Less

Submitted 12 February, 2020; originally announced February 2020.

Comments: 11 pages, 6 figures

arXiv:1910.10376 [pdf, other]

Emanation Graph: A Plane Geometric Spanner with Steiner Points

Authors: Bardia Hamedmohseni, Zahed Rahmati, Debajyoti Mondal

Abstract: An emanation graph of grade $k$ on a set of points is a plane spanner made by shooting $2^{k+1}$ equally spaced rays from each point, where the shorter rays stop the longer ones upon collision. The collision points are the Steiner points of the spanner. Emanation graphs of grade one were studied by Mondal and Nachmanson in the context of network visualization. They proved that the spanning ratio o… ▽ More An emanation graph of grade $k$ on a set of points is a plane spanner made by shooting $2^{k+1}$ equally spaced rays from each point, where the shorter rays stop the longer ones upon collision. The collision points are the Steiner points of the spanner. Emanation graphs of grade one were studied by Mondal and Nachmanson in the context of network visualization. They proved that the spanning ratio of such a graph is bounded by $(2+\sqrt{2})\approx 3.414$. We improve this upper bound to $\sqrt{10} \approx 3.162$ and show this to be tight, i.e., there exist emanation graphs with spanning ratio $\sqrt{10}$. We show that for every fixed $k$, the emanation graphs of grade $k$ are constant spanners, where the constant factor depends on $k$. An emanation graph of grade two may have twice the number of edges compared to grade one graphs. Hence we introduce a heuristic method for simplifying them. In particular, we compare simplified emanation graphs against Shewchuk's constrained Delaunay triangulations on both synthetic and real-life datasets. Our experimental results reveal that the simplified emanation graphs outperform constrained Delaunay triangulations in common quality measures (e.g., edge count, angular resolution, average degree, total edge length) while maintaining a comparable spanning ratio and Steiner point count. △ Less

Submitted 14 November, 2022; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: A preliminary version of this work was presented at the 30th Canadian Conference on Computational Geometry (CCCG) and the 46th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM)

arXiv:1904.06760 [pdf, other]

Drawing HV-Restricted Planar Graphs

Authors: Stephane Durocher, Stefan Felsner, Saeed Mehrabi, Debajyoti Mondal

Abstract: A strict orthogonal drawing of a graph $G=(V, E)$ in $\mathbb{R}^2$ is a drawing of $G$ such that each vertex is mapped to a distinct point and each edge is mapped to a horizontal or vertical line segment. A graph $G$ is $HV$-restricted if each of its edges is assigned a horizontal or vertical orientation. A strict orthogonal drawing of an $HV$-restricted graph $G$ is good if it is planar and resp… ▽ More A strict orthogonal drawing of a graph $G=(V, E)$ in $\mathbb{R}^2$ is a drawing of $G$ such that each vertex is mapped to a distinct point and each edge is mapped to a horizontal or vertical line segment. A graph $G$ is $HV$-restricted if each of its edges is assigned a horizontal or vertical orientation. A strict orthogonal drawing of an $HV$-restricted graph $G$ is good if it is planar and respects the edge orientations of $G$. In this paper, we give a polynomial-time algorithm to check whether a given $HV$-restricted plane graph (i.e., a planar graph with a fixed combinatorial embedding) admits a good orthogonal drawing preserving the input embedding, which settles an open question posed by Maňuch et al. (Graph Drawing 2010). We then examine $HV$-restricted planar graphs (i.e., when the embedding is not fixed), and give a complete characterization of the $HV$-restricted biconnected outerplanar graphs that admit good orthogonal drawings. △ Less

Submitted 14 April, 2019; originally announced April 2019.

Comments: 17 pages, 9 figures

arXiv:1903.07024 [pdf, other]

Computing Maximum Independent Set on Outerstring Graphs and Their Relatives

Authors: Prosenjit Bose, Paz Carmi, J. Mark Keil, Anil Maheshwari, Saeed Mehrabi, Debajyoti Mondal, Michiel Smid

Abstract: A graph $G$ with $n$ vertices is called an outerstring graph if it has an intersection representation of a set of $n$ curves inside a disk such that one endpoint of every curve is attached to the boundary of the disk. Given an outerstring graph representation, the Maximum Independent Set (MIS) problem of the underlying graph can be computed in $O(s^3)$ time, where $s$ is the number of segments in… ▽ More A graph $G$ with $n$ vertices is called an outerstring graph if it has an intersection representation of a set of $n$ curves inside a disk such that one endpoint of every curve is attached to the boundary of the disk. Given an outerstring graph representation, the Maximum Independent Set (MIS) problem of the underlying graph can be computed in $O(s^3)$ time, where $s$ is the number of segments in the representation (Keil et al., Comput. Geom., 60:19--25, 2017). If the strings are of constant size (e.g., line segments, L-shapes, etc.), then the algorithm takes $O(n^3)$ time. In this paper, we examine the fine-grained complexity of the MIS problem on some well-known outerstring representations. We show that solving the MIS problem on grounded segment and grounded square-L representations is at least as hard as solving MIS on circle graph representations. Note that no $O(n^{2-δ})$-time algorithm, $δ>0$, is known for the MIS problem on circle graphs. For the grounded string representations where the strings are $y$-monotone simple polygonal paths of constant length with segments at integral coordinates, we solve MIS in $O(n^2)$ time and show this to be the best possible under the strong exponential time hypothesis (SETH). For the intersection graph of $n$ L-shapes in the plane, we give a $(4\cdot \log OPT)$-approximation algorithm for MIS (where $OPT$ denotes the size of an optimal solution), improving the previously best-known $(4\cdot \log n)$-approximation algorithm of Biedl and Derka (WADS 2017). △ Less

Submitted 1 August, 2021; v1 submitted 17 March, 2019; originally announced March 2019.

Comments: A preliminary version of this paper appeared in the 16th International Symposium on Algorithms and Data Structures (WADS 2019)

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:1903.06981 [pdf, other]

doi 10.46298/dmtcs.8383

Token Swapping on Trees

Authors: Ahmad Biniaz, Kshitij Jain, Anna Lubiw, Zuzana Masárová, Tillmann Miltzow, Debajyoti Mondal, Anurag Murty Naredla, Josef Tkadlec, Alexi Turcotte

Abstract: The input to the token swapping problem is a graph with vertices $v_1, v_2, \ldots, v_n$, and $n$ tokens with labels $1, 2, \ldots, n$, one on each vertex. The goal is to get token $i$ to vertex $v_i$ for all $i= 1, \ldots, n$ using a minimum number of swaps, where a swap exchanges the tokens on the endpoints of an edge. We present some results about token swapping on a tree, also known as "sortin… ▽ More The input to the token swapping problem is a graph with vertices $v_1, v_2, \ldots, v_n$, and $n$ tokens with labels $1, 2, \ldots, n$, one on each vertex. The goal is to get token $i$ to vertex $v_i$ for all $i= 1, \ldots, n$ using a minimum number of swaps, where a swap exchanges the tokens on the endpoints of an edge. We present some results about token swapping on a tree, also known as "sorting with a transposition tree": 1. An optimum swap sequence may need to perform a swap on a leaf vertex that has the correct token (a "happy leaf"), disproving a conjecture of Vaughan. 2. Any algorithm that fixes happy leaves -- as all known approximation algorithms for the problem do -- has approximation factor at least $4/3$. Furthermore, the two best-known 2-approximation algorithms have approximation factor exactly 2. 3. A generalized problem -- weighted coloured token swapping -- is NP-complete on trees, even when they are restricted to be subdivided stars, but solvable in polynomial time on paths and stars. In this version, tokens and vertices have colours, and colours have weights. The goal is to get every token to a vertex of the same colour, and the cost of a swap is the sum of the weights of the two tokens involved. △ Less

Submitted 12 January, 2023; v1 submitted 16 March, 2019; originally announced March 2019.

Comments: 37 pages, Discrete Mathematics and Theoretical Computer Science, DMTCS vol. 24:2, 2022, #9

MSC Class: 03D15; 05C05; 68R05 ACM Class: F.2.0

Journal ref: Discrete Mathematics & Theoretical Computer Science, vol. 24, no 2, Discrete Algorithms (January 18, 2023) dmtcs:8383

arXiv:1812.05656 [pdf, other]

Polygon Simplification by Minimizing Convex Corners

Authors: Yeganeh Bahoo, Stephane Durocher, J. Mark Keil, Debajyoti Mondal, Saeed Mehrabi, Sahar Mehrpour

Abstract: Let $P$ be a polygon with $r>0$ reflex vertices and possibly with holes and islands. A subsuming polygon of $P$ is a polygon $P'$ such that $P \subseteq P'$, each connected component $R$ of $P$ is a subset of a distinct connected component $R'$ of $P'$, and the reflex corners of $R$ coincide with those of $R'$. A subsuming chain of $P'$ is a minimal path on the boundary of $P'$ whose two end edges… ▽ More Let $P$ be a polygon with $r>0$ reflex vertices and possibly with holes and islands. A subsuming polygon of $P$ is a polygon $P'$ such that $P \subseteq P'$, each connected component $R$ of $P$ is a subset of a distinct connected component $R'$ of $P'$, and the reflex corners of $R$ coincide with those of $R'$. A subsuming chain of $P'$ is a minimal path on the boundary of $P'$ whose two end edges coincide with two edges of $P$. Aichholzer et al. proved that every polygon $P$ has a subsuming polygon with $O(r)$ vertices, and posed an open problem to determine the computational complexity of computing subsuming polygons with the minimum number of convex vertices. We prove that the problem of computing an optimal subsuming polygon is NP-complete, but the complexity remains open for simple polygons (i.e., polygons without holes). Our NP-hardness result holds even when the subsuming chains are restricted to have constant length and lie on the arrangement of lines determined by the edges of the input polygon. We show that this restriction makes the problem polynomial-time solvable for simple polygons. △ Less

Submitted 13 December, 2018; originally announced December 2018.

Comments: 15 pages, 9 figures

arXiv:1809.00340 [pdf, other]

Identifying Land Patterns from Satellite Imagery in Amazon Rainforest using Deep Learning

Authors: Somnath Rakshit, Soumyadeep Debnath, Dhiman Mondal

Abstract: The Amazon rainforests have been suffering widespread damage, both via natural and artificial means. Every minute, it is estimated that the world loses forest cover the size of 48 football fields. Deforestation in the Amazon rainforest has led to drastically reduced biodiversity, loss of habitat, climate change, and other biological losses. In this respect, it has become essential to track how the… ▽ More The Amazon rainforests have been suffering widespread damage, both via natural and artificial means. Every minute, it is estimated that the world loses forest cover the size of 48 football fields. Deforestation in the Amazon rainforest has led to drastically reduced biodiversity, loss of habitat, climate change, and other biological losses. In this respect, it has become essential to track how the nature of these forests change over time. Image classification using deep learning can help speed up this process by removing the manual task of classifying each image. Here, it is shown how convolutional neural networks can be used to track changes in land patterns in the Amazon rainforests. In this work, a testing accuracy of 96.71% was obtained. This can help governments and other agencies to track changes in land patterns more effectively and accurately. △ Less

Submitted 2 September, 2018; originally announced September 2018.

arXiv:1808.10005 [pdf, other]

Recognition and Drawing of Stick Graphs

Authors: Felice De Luca, Md Iqbal Hossain, Stephen Kobourov, Anna Lubiw, Debajyoti Mondal

Abstract: A \emph{Stick graph} is an intersection graph of axis-aligned segments such that the left end-points of the horizontal segments and the bottom end-points of the vertical segments lie on a `ground line,' a line with slope $-1$. It is an open question to decide in polynomial time whether a given bipartite graph $G$ with bipartition $A\cup B$ has a Stick representation where the vertices in $A$ and… ▽ More A \emph{Stick graph} is an intersection graph of axis-aligned segments such that the left end-points of the horizontal segments and the bottom end-points of the vertical segments lie on a `ground line,' a line with slope $-1$. It is an open question to decide in polynomial time whether a given bipartite graph $G$ with bipartition $A\cup B$ has a Stick representation where the vertices in $A$ and $B$ correspond to horizontal and vertical segments, respectively. We prove that $G$ has a Stick representation if and only if there are orderings of $A$ and $B$ such that $G$'s bipartite adjacency matrix with rows $A$ and columns $B$ excludes three small `forbidden' submatrices. This is similar to characterizations for other classes of bipartite intersection graphs. We present an algorithm to test whether given orderings of $A$ and $B$ permit a Stick representation respecting those orderings, and to find such a representation if it exists. The algorithm runs in time linear in the size of the adjacency matrix. For the case when only the ordering of $A$ is given, we present an $O(|A|^3|B|^3)$-time algorithm. When neither ordering is given, we present some partial results about graphs that are, or are not, Stick representable. △ Less

Submitted 29 August, 2018; originally announced August 2018.

Comments: Appears in the Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization (GD 2018)

arXiv:1806.04742 [pdf, other]

Minimum Shared-Power Edge Cut

Authors: Sergio Cabello, Kshitij Jain, Anna Lubiw, Debajyoti Mondal

Abstract: We introduce a problem called the Minimum Shared-Power Edge Cut (MSPEC). The input to the problem is an undirected edge-weighted graph with distinguished vertices s and t, and the goal is to find an s-t cut by assigning "powers" at the vertices and removing an edge if the sum of the powers at its endpoints is at least its weight. The objective is to minimize the sum of the assigned powers. MSPEC… ▽ More We introduce a problem called the Minimum Shared-Power Edge Cut (MSPEC). The input to the problem is an undirected edge-weighted graph with distinguished vertices s and t, and the goal is to find an s-t cut by assigning "powers" at the vertices and removing an edge if the sum of the powers at its endpoints is at least its weight. The objective is to minimize the sum of the assigned powers. MSPEC is a graph generalization of a barrier coverage problem in a wireless sensor network: given a set of unit disks with centers in a rectangle, what is the minimum total amount by which we must shrink the disks to permit an intruder to cross the rectangle undetected, i.e. without entering any disc. This is a more sophisticated measure of barrier coverage than the minimum number of disks whose removal breaks the barrier. We develop a fully polynomial time approximation scheme (FPTAS) for MSPEC. We give polynomial time algorithms for the special cases where the edge weights are uniform, or the power values are restricted to a bounded set. Although MSPEC is related to network flow and matching problems, its computational complexity (in P or NP-hard) remains open. △ Less

Submitted 12 June, 2018; originally announced June 2018.

arXiv:1803.10812 [pdf, other]

Boundary Labeling for Rectangular Diagrams

Authors: Prosenjit Bose, Paz Carmi, J. Mark Keil, Saeed Mehrabi, Debajyoti Mondal

Abstract: Given a set of $n$ points (sites) inside a rectangle $R$ and $n$ points (label locations or ports) on its boundary, a boundary labeling problem seeks ways of connecting every site to a distinct port while achieving different labeling aesthetics. We examine the scenario when the connecting lines (leaders) are drawn as axis-aligned polylines with few bends, every leader lies strictly inside $R$, no… ▽ More Given a set of $n$ points (sites) inside a rectangle $R$ and $n$ points (label locations or ports) on its boundary, a boundary labeling problem seeks ways of connecting every site to a distinct port while achieving different labeling aesthetics. We examine the scenario when the connecting lines (leaders) are drawn as axis-aligned polylines with few bends, every leader lies strictly inside $R$, no two leaders cross, and the sum of the lengths of all the leaders is minimized. In a $k$-sided boundary labeling problem, where $1\le k\le 4$, the label locations are located on the $k$ consecutive sides of $R$. In this paper, we develop an $O(n^3\log n)$-time algorithm for 2-sided boundary labeling, where the leaders are restricted to have one bend. This improves the previously best known $O(n^8\log n)$-time algorithm of Kindermann et al. (Algorithmica, 76(1):225-258, 2016). We show the problem is polynomial-time solvable in more general settings such as when the ports are located on more than two sides of $R$, in the presence of obstacles, and even when the objective is to minimize the total number of bends. Our results improve the previous algorithms on boundary labeling with obstacles, as well as provide the first polynomial-time algorithms for minimizing the total leader length and number of bends for 3- and 4-sided boundary labeling. These results settle a number of open questions on the boundary labeling problems (Wolff, Handbook of Graph Drawing, Chapter 23, Table 23.1, 2014). △ Less

Submitted 28 March, 2018; originally announced March 2018.

Comments: 18 pages, 9 figures

arXiv:1802.06699 [pdf, other]

The Complexity of Drawing a Graph in a Polygonal Region

Authors: Anna Lubiw, Tillmann Miltzow, Debajyoti Mondal

Abstract: We prove that the following problem is complete for the existential theory of the reals: Given a planar graph and a polygonal region, with some vertices of the graph assigned to points on the boundary of the region, place the remaining vertices to create a planar straight-line drawing of the graph inside the region. This strengthens an NP-hardness result by Patrignani on extending partial planar g… ▽ More We prove that the following problem is complete for the existential theory of the reals: Given a planar graph and a polygonal region, with some vertices of the graph assigned to points on the boundary of the region, place the remaining vertices to create a planar straight-line drawing of the graph inside the region. This strengthens an NP-hardness result by Patrignani on extending partial planar graph drawings. Our result is one of the first showing that a problem of drawing planar graphs with straight-line edges is hard for the existential theory of the reals. The complexity of the problem is open in the case of a simply connected region. We also show that, even for integer input coordinates, it is possible that drawing a graph in a polygonal region requires some vertices to be placed at irrational coordinates. By contrast, the coordinates are known to be bounded in the special case of a convex region, or for drawing a path in any polygonal region. △ Less

Submitted 5 September, 2018; v1 submitted 19 February, 2018; originally announced February 2018.

Comments: Appears in the Proceedings of the 26th International Symposium on Graph Drawing and Network Visualization (GD 2018)

arXiv:1801.06290 [pdf, other]

Angle-Monotone Graphs: Construction and Local Routing

Authors: Anna Lubiw, Debajyoti Mondal

Abstract: A geometric graph in the plane is angle-monotone of width $γ$ if every pair of vertices is connected by an angle-monotone path of width $γ$, a path such that the angles of any two edges in the path differ by at most $γ$. Angle-monotone graphs have good spanning properties. We prove that every point set in the plane admits an angle-monotone graph of width $90^\circ$, hence with spanning ratio… ▽ More A geometric graph in the plane is angle-monotone of width $γ$ if every pair of vertices is connected by an angle-monotone path of width $γ$, a path such that the angles of any two edges in the path differ by at most $γ$. Angle-monotone graphs have good spanning properties. We prove that every point set in the plane admits an angle-monotone graph of width $90^\circ$, hence with spanning ratio $\sqrt 2$, and a subquadratic number of edges. This answers an open question posed by Dehkordi, Frati and Gudmundsson. We show how to construct, for any point set of size $n$ and any angle $α$, $0 < α< 45^\circ$, an angle-monotone graph of width $(90^\circ+α)$ with $O(\frac{n}α)$ edges. Furthermore, we give a local routing algorithm to find angle-monotone paths of width $(90^\circ+α)$ in these graphs. The routing ratio, which is the ratio of path length to Euclidean distance, is at most $1/\cos(45^\circ + \fracα{2})$, i.e., ranging from $\sqrt 2 \approx 1.414$ to $2.613$. For the special case $α= 30^\circ$, we obtain the $Θ_6$-graph and our routing algorithm achieves the known routing ratio 2 while finding angle-monotone paths of width $120^\circ$. △ Less

Submitted 18 January, 2018; originally announced January 2018.

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:1708.09560 [pdf, other]

A Note on Plus-Contacts, Rectangular Duals, and Box-Orthogonal Drawings

Authors: Therese Biedl, Debajyoti Mondal

Abstract: A plus-contact representation of a planar graph $G$ is called $c$-balanced if for every plus shape $+_v$, the number of other plus shapes incident to each arm of $+_v$ is at most $ c Δ+O(1)$, where $Δ$ is the maximum degree of $G$. Although small values of $c$ have been achieved for a few subclasses of planar graphs (e.g., $2$- and $3$-trees), it is unknown whether $c$-balanced representations wit… ▽ More A plus-contact representation of a planar graph $G$ is called $c$-balanced if for every plus shape $+_v$, the number of other plus shapes incident to each arm of $+_v$ is at most $ c Δ+O(1)$, where $Δ$ is the maximum degree of $G$. Although small values of $c$ have been achieved for a few subclasses of planar graphs (e.g., $2$- and $3$-trees), it is unknown whether $c$-balanced representations with $c<1$ exist for arbitrary planar graphs. In this paper we compute $(1/2)$-balanced plus-contact representations for all planar graphs that admit a rectangular dual. Our result implies that any graph with a rectangular dual has a 1-bend box-orthogonal drawings such that for each vertex $v$, the box representing $v$ is a square of side length $\frac{deg(v)}{2}+ O(1)$. △ Less

Submitted 31 August, 2017; originally announced August 2017.

Comments: A poster related to this research appeared at the 25th International Symposium on Graph Drawing & Network Visualization (GD 2017)

MSC Class: 68R10 ACM Class: G.2.2

arXiv:1708.09515 [pdf, other]

On Upward Drawings of Trees on a Given Grid

Authors: Therese Biedl, Debajyoti Mondal

Abstract: Computing a minimum-area planar straight-line drawing of a graph is known to be NP-hard for planar graphs, even when restricted to outerplanar graphs. However, the complexity question is open for trees. Only a few hardness results are known for straight-line drawings of trees under various restrictions such as edge length or slope constraints. On the other hand, there exist polynomial-time algorit… ▽ More Computing a minimum-area planar straight-line drawing of a graph is known to be NP-hard for planar graphs, even when restricted to outerplanar graphs. However, the complexity question is open for trees. Only a few hardness results are known for straight-line drawings of trees under various restrictions such as edge length or slope constraints. On the other hand, there exist polynomial-time algorithms for computing minimum-width (resp., minimum-height) upward drawings of trees, where the height (resp., width) is unbounded. In this paper we take a major step in understanding the complexity of the area minimization problem for strictly-upward drawings of trees, which is one of the most common styles for drawing rooted trees. We prove that given a rooted tree $T$ and a $W\times H$ grid, it is NP-hard to decide whether $T$ admits a strictly-upward (unordered) drawing in the given grid. △ Less

Submitted 30 August, 2017; originally announced August 2017.

Comments: Appears in the Proceedings of the 25th International Symposium on Graph Drawing and Network Visualization (GD 2017)

arXiv:1707.02953 [pdf, other]

On String Contact Representations in 3D

Authors: Debajyoti Mondal

Abstract: An axis-aligned string is a simple polygonal path, where each line segment is parallel to an axis in $\mathbb{R}^3$. Given a graph $G$, a string contact representation $Ψ$ of $G$ maps the vertices of $G$ to interior disjoint axis-aligned strings, where no three strings meet at a point, and two strings share a common point if and only if their corresponding vertices are adjacent in $G$. The complex… ▽ More An axis-aligned string is a simple polygonal path, where each line segment is parallel to an axis in $\mathbb{R}^3$. Given a graph $G$, a string contact representation $Ψ$ of $G$ maps the vertices of $G$ to interior disjoint axis-aligned strings, where no three strings meet at a point, and two strings share a common point if and only if their corresponding vertices are adjacent in $G$. The complexity of $Ψ$ is the minimum integer $r$ such that every string in $Ψ$ is a $B_r$-string, i.e., a string with at most $r$ bends. While a result of Duncan et al. implies that every graph $G$ with maximum degree 4 has a string contact representation using $B_4$-strings, we examine constraints on $G$ that allow string contact representations with complexity 3, 2 or 1. We prove that if $G$ is Hamiltonian and triangle-free, then $G$ admits a contact representation where all the strings but one are $B_3$-strings. If $G$ is 3-regular and bipartite, then $G$ admits a contact representation with string complexity 2, and if we further restrict $G$ to be Hamiltonian, then $G$ has a contact representation, where all the strings but one are $B_1$-strings (i.e., $L$-shapes). Finally, we prove some complementary lower bounds on the complexity of string contact representations. △ Less

Submitted 10 July, 2017; originally announced July 2017.

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:1706.09086 [pdf, other]

On Compatible Triangulations with a Minimum Number of Steiner Points

Authors: Anna Lubiw, Debajyoti Mondal

Abstract: Two vertex-labelled polygons are \emph{compatible} if they have the same clockwise cyclic ordering of vertices. The definition extends to polygonal regions (polygons with holes) and to triangulations---for every face, the clockwise cyclic order of vertices on the boundary must be the same. It is known that every pair of compatible $n$-vertex polygonal regions can be extended to compatible triangul… ▽ More Two vertex-labelled polygons are \emph{compatible} if they have the same clockwise cyclic ordering of vertices. The definition extends to polygonal regions (polygons with holes) and to triangulations---for every face, the clockwise cyclic order of vertices on the boundary must be the same. It is known that every pair of compatible $n$-vertex polygonal regions can be extended to compatible triangulations by adding $O(n^2)$ Steiner points. Furthermore, $Ω(n^2)$ Steiner points are sometimes necessary, even for a pair of polygons. Compatible triangulations provide piecewise linear homeomorphisms and are also a crucial first step in morphing planar graph drawings, aka "2D shape animation". An intriguing open question, first posed by Aronov, Seidel, and Souvaine in 1993, is to decide if two compatible polygons have compatible triangulations with at most $k$ Steiner points. In this paper we prove the problem to be NP-hard for polygons with holes. The question remains open for simple polygons. △ Less

Submitted 27 June, 2017; originally announced June 2017.

Comments: A preliminary version appeared at the 29th Canadian Conference on Computational Geometry (CCCG 2017)

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:1705.05479 [pdf, other]

A New Approach to GraphMaps, a System Browsing Large Graphs as Interactive Maps

Authors: Debajyoti Mondal, Lev Nachmanson

Abstract: A GraphMaps is a system that visualizes a graph using zoom levels, which is similar to a geographic map visualization. GraphMaps reveals the structural properties of the graph and enables users to explore the graph in a natural way by using the standard zoom and pan operations. The available implementation of GraphMaps faces many challenges such as the number of zoom levels may be large, nodes may… ▽ More A GraphMaps is a system that visualizes a graph using zoom levels, which is similar to a geographic map visualization. GraphMaps reveals the structural properties of the graph and enables users to explore the graph in a natural way by using the standard zoom and pan operations. The available implementation of GraphMaps faces many challenges such as the number of zoom levels may be large, nodes may be unevenly distributed to different levels, shared edges may create ambiguity due to the selection of multiple nodes. In this paper, we develop an algorithmic framework to construct GraphMaps from any given mesh (generated from a 2D point set), and for any given number of zoom levels. We demonstrate our approach introducing competition mesh, which is simple to construct, has a low dilation and high angular resolution. We present an algorithm for assigning nodes to zoom levels that minimizes the change in the number of nodes on visible on the screen while the user zooms in and out between the levels. We think that keeping this change small facilitates smooth browsing of the graph. We also propose new node selection techniques to cope with some of the challenges of the GraphMaps approach. △ Less

Submitted 12 August, 2018; v1 submitted 15 May, 2017; originally announced May 2017.

Comments: Appears in the Proceedings of the 8th International Conference on Information Visualization Theory and Applications (IVAPP 2018)

MSC Class: 97R99 ACM Class: I.3.5; I.5.5; D.4.7

arXiv:1702.08380 [pdf, other]

Exploring Increasing-Chord Paths and Trees

Authors: Yeganeh Bahoo, Stephane Durocher, Sahar Mehrpour, Debajyoti Mondal

Abstract: A straight-line drawing $Γ$ of a graph $G=(V,E)$ is a drawing of $G$ in the Euclidean plane, where every vertex in $G$ is mapped to a distinct point, and every edge in $G$ is mapped to a straight line segment between their endpoints. A path $P$ in $Γ$ is called increasing-chord if for every four points (not necessarily vertices) $a,b,c,d$ on $P$ in this order, the Euclidean distance between $b,c$… ▽ More A straight-line drawing $Γ$ of a graph $G=(V,E)$ is a drawing of $G$ in the Euclidean plane, where every vertex in $G$ is mapped to a distinct point, and every edge in $G$ is mapped to a straight line segment between their endpoints. A path $P$ in $Γ$ is called increasing-chord if for every four points (not necessarily vertices) $a,b,c,d$ on $P$ in this order, the Euclidean distance between $b,c$ is at most the Euclidean distance between $a,d$. A spanning tree $T$ rooted at some vertex $r$ in $Γ$ is called increasing-chord if $T$ contains an increasing-chord path from $r$ to every vertex in $T$. In this paper we prove that given a vertex $r$ in a straight-line drawing $Γ$, it is NP-complete to determine whether $Γ$ contains an increasing-chord spanning tree rooted at $r$. We conjecture that finding an increasing-chord path between a pair of vertices in $Γ$, which is an intriguing open problem posed by Alamdari et al., is also NP-complete, and show a (non-polynomial) reduction from the 3-SAT problem. △ Less

Submitted 1 July, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Comments: A preliminary version appeared at the 29th Canadian Conference on Computational Geometry (CCCG 2017)

MSC Class: 68Q25; 65D18 ACM Class: I.3.5

arXiv:1607.05422 [pdf, ps, other]

A Novel Information Theoretic Framework for Finding Semantic Similarity in WordNet

Authors: Abhijit Adhikari, Shivang Singh, Deepjyoti Mondal, Biswanath Dutta, Animesh Dutta

Abstract: Information content (IC) based measures for finding semantic similarity is gaining preferences day by day. Semantics of concepts can be highly characterized by information theory. The conventional way for calculating IC is based on the probability of appearance of concepts in corpora. Due to data sparseness and corpora dependency issues of those conventional approaches, a new corpora independent i… ▽ More Information content (IC) based measures for finding semantic similarity is gaining preferences day by day. Semantics of concepts can be highly characterized by information theory. The conventional way for calculating IC is based on the probability of appearance of concepts in corpora. Due to data sparseness and corpora dependency issues of those conventional approaches, a new corpora independent intrinsic IC calculation measure has evolved. In this paper, we mainly focus on such intrinsic IC model and several topological aspects of the underlying ontology. Accuracy of intrinsic IC calculation and semantic similarity measure rely on these aspects deeply. Based on these analysis we propose an information theoretic framework which comprises an intrinsic IC calculator and a semantic similarity model. Our approach is compared with state of the art semantic similarity measures based on corpora dependent IC calculation as well as intrinsic IC based methods using several benchmark data set. We also compare our model with the related Edge based, Feature based and Distributional approaches. Experimental results show that our intrinsic IC model gives high correlation value when applied to different semantic similarity models. Our proposed semantic similarity model also achieves significant results when embedded with some state of the art IC models including ours. △ Less

Submitted 19 July, 2016; originally announced July 2016.

Showing 1–50 of 53 results for author: Mondal, D