The eXtensible Markup Language (XML) has emerged internationally as the format for data representation over the web. Yet most organisations still use relational databases as their database solutions. As such, it is crucial to provide seamless integration via effective transformation between these two database infrastructures. In this paper, we propose XML-REG to bridge the two technologies using node-based and path-based approaches: the node-based approach annotates each positional node uniquely, while the path-based approach provides summarised path information for joining the nodes. On top of that, a new range labelling scheme is proposed to annotate nodes uniquely while ensuring that the structural relationships between nodes are maintained. If a new node is added to the document, no re-labelling is required, as the proposed labelling scheme assigns a fresh label to the new node. Experimental evaluations indicated that XML-REG outperformed XMap, XRecursive, XAncestor and Mini-XML in storage time, query retrieval time and scalability. This research produces a core framework for XML to relational database (RDB) mapping, which could be adopted in various industries.
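The range-labelling idea described above can be illustrated with a small sketch. This is not the XML-REG algorithm itself, only a generic interval scheme under two assumptions: each node gets a (start, end) pair from a depth-first walk, and gaps are left between labels so a later insertion can reuse unallocated label space without re-labelling.

```python
# Hypothetical range (interval) labelling for XML nodes: ancestry reduces to
# interval containment, and the gap of `step` between consecutive labels
# leaves room for future insertions without re-labelling existing nodes.

def label_tree(node, counter=None, step=10):
    """Assign (start, end) labels in document order, leaving gaps of `step`."""
    if counter is None:
        counter = [0]
    counter[0] += step
    node["start"] = counter[0]
    for child in node.get("children", []):
        label_tree(child, counter, step)
    counter[0] += step
    node["end"] = counter[0]
    return node

def is_ancestor(a, b):
    """a is an ancestor of b iff a's range strictly contains b's."""
    return a["start"] < b["start"] and b["end"] < a["end"]

# A tiny document: <book><title/><author/></book>
doc = {"children": [{"children": []}, {"children": []}]}
label_tree(doc)
title, author = doc["children"]
```

Because sibling order is preserved by the start values, both ancestor-descendant and document-order relationships survive an insertion into one of the gaps.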
Extracting interesting knowledge from versions of dynamic XML documents
IJRET : International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
This document discusses techniques for clustering hierarchical documents based on their structural similarity. It summarizes several existing approaches:
1) A tree edit distance-based method that represents trees as paths and computes the distance between subtrees. However, it requires trees to have a pre-specified structure.
2) Chawathe's algorithm that uses pre-order tree traversal and transforms trees into sequences of node labels and depths to calculate distances. It allows efficient assignment of new documents to clusters.
3) The XCLSC algorithm that clusters documents in two phases - grouping structurally similar documents and then searching to further improve clustering results and performance. However, it has high computational requirements.
4) The XPattern and PathXP
An Efficient and Fault Tolerant Data Replica Placement Technique for Cloud ba...
The growth of the Internet of Things and wireless technology has led to enormous generation of data for various applications, such as healthcare, scientific and data-intensive applications. Cloud-based Storage Area Networks (SANs) have been widely used in recent times for storing and processing these data. Providing fault-tolerant, continuous access to data with minimal latency and cost is challenging, and an efficient fault-tolerance mechanism is required. Data replication is an efficient fault-tolerance mechanism that has been considered by existing methodologies. However, data replica placement is challenging, and existing methods are not efficient with respect to the dynamic application requirements of cloud-based storage area networks, thereby incurring latency and, in turn, higher data transmission cost. This work presents an efficient replica placement and transmission technique, Bipartite Graph based Data Replica Placement (BGDRP), that helps minimise latency and computing cost. The performance of BGDRP is evaluated using a real-time scientific application workflow. The outcome shows that the BGDRP technique minimises data access latency, computation time and cost over state-of-the-art techniques.
Bitmap Indexes for Relational XML Twig Query Processing
This document proposes bitmap indexes to improve the processing of XML twig queries on shredded XML data stored in relational database tables. It introduces several bitmap indexes built on the tag, path, and tag+level domains that provide quick access to relevant XML elements. A hybrid "bitTwig" index is also presented that can find all instances matching a twig query using only a pair of cursor-enabled bitmap indexes, without accessing the actual XML data stored in the tables. Experiments show this bitTwig index outperforms existing indexing approaches in most cases.
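A tag-domain bitmap index of the kind the summary describes can be sketched in a few lines. The names and layout below are illustrative, not the paper's actual bitTwig structures: one bit-vector per tag, where bit i is set if element i in document order carries that tag.

```python
# Minimal tag-domain bitmap index over a shredded XML document: Python ints
# stand in for bit-vectors, so tag lookups and combinations become bitwise ops.

from collections import defaultdict

def build_tag_bitmaps(elements):
    """elements: list of tag names in document order -> {tag: bitmask}."""
    bitmaps = defaultdict(int)
    for i, tag in enumerate(elements):
        bitmaps[tag] |= 1 << i
    return dict(bitmaps)

def positions(bitmap):
    """Decode a bitmask back into element positions in document order."""
    out, i = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(i)
        bitmap >>= 1
        i += 1
    return out

# Shredded document order: book, title, author, title
bitmaps = build_tag_bitmaps(["book", "title", "author", "title"])
```

The appeal of this layout is that a twig-matching step can intersect or union whole tag sets with single bitwise operations before ever touching the relational tables.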
New proximity estimate for incremental update of non uniformly distributed cl...
The conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters. Many real-life databases keep growing incrementally, and for such dynamic databases the patterns extracted from the original database become obsolete. The conventional clustering algorithms are thus not suitable for incremental databases, as they lack the capability to modify clustering results in accordance with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) to handle numerical data, and suggests a new proximity metric called the Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA makes use of the proposed proximity metric to determine the membership of a data point in a cluster.
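The two-term proximity idea in the abstract above can be sketched as follows. This is a hedged illustration, not CFICA's actual formula: the exact weighting of the two terms, and what "vicinity" denotes, are assumptions here (the vicinity is taken to be the cluster's member points, and the terms are simply added).

```python
# Illustrative two-term proximity: distance to the cluster representative plus
# distance to the farthest point in an assumed vicinity (cluster members).
# The additive combination is an assumption, not the paper's exact metric.

import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def inverse_proximity(point, representative, vicinity):
    """Smaller is closer: d(point, representative) + max d(point, vicinity)."""
    d_rep = euclidean(point, representative)
    d_far = max(euclidean(point, v) for v in vicinity)
    return d_rep + d_far

def assign(point, clusters):
    """clusters: {name: (representative, member_points)} -> best cluster name."""
    return min(clusters,
               key=lambda c: inverse_proximity(point, clusters[c][0], clusters[c][1]))

clusters = {"c1": ((0.0, 0.0), [(0.0, 1.0), (1.0, 0.0)]),
            "c2": ((5.0, 5.0), [(6.0, 5.0)])}
best = assign((0.4, 0.4), clusters)
```

The second term penalises clusters whose far edge lies away from the point, which is one way such a metric can stay stable as new points arrive incrementally.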
XML is a standard for data exchange between web applications such as e-commerce, e-learning and other web portals. Data volume on the web has grown substantially, and to effectively retrieve or store these data it is recommended that they be physically or virtually fragmented and distributed across different nodes. Basically, a fragmentation design consists of two parts: the fragmentation operation and the fragmentation method. There are three kinds of fragmentation operation (horizontal, vertical and hybrid), which determine how the XML should be fragmented. The aim of this paper is to give an overview of fragmentation design considerations.
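Of the three operations mentioned above, horizontal fragmentation is the easiest to sketch: records are split across nodes by selection predicates, so each fragment holds complete records. The node names and predicates below are purely illustrative.

```python
# Toy horizontal fragmentation: each record is routed to every node whose
# predicate it satisfies; disjoint predicates yield a clean partition.

def horizontal_fragment(records, predicates):
    """predicates: {node_name: function(record) -> bool} -> {node: records}."""
    fragments = {node: [] for node in predicates}
    for rec in records:
        for node, pred in predicates.items():
            if pred(rec):
                fragments[node].append(rec)
    return fragments

orders = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"},
          {"id": 3, "region": "EU"}]
frags = horizontal_fragment(orders, {
    "node_eu": lambda r: r["region"] == "EU",
    "node_us": lambda r: r["region"] == "US",
})
```

Vertical fragmentation would instead split each record's attributes (or XML subtrees) across nodes, and hybrid combines both.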
This thesis focuses on classification and clustering techniques using kernel density estimates that can be efficiently implemented using P-trees. Chapter 1 introduces the topics of data mining, classification, clustering, and P-trees. Chapter 2 analyzes bit-column-based data organization and P-trees. Chapter 3 describes P-trees and a new sorting scheme. Chapters 4-7 present various classification and clustering algorithms developed using the P-tree framework, including a kernel-based classifier, a decision tree approach, a semi-naive Bayes classifier, and a hierarchical clustering method. Chapter 8 concludes the thesis.
This document discusses GCUBE indexing, which is a method for indexing and aggregating spatial/continuous values in a data warehouse. The key challenges addressed are defining and aggregating spatial/continuous values, and efficiently representing, indexing, updating and querying data that includes both categorical and continuous dimensions. The proposed GCUBE approach maps multi-dimensional data to a linear ordering using the Hilbert curve, and then constructs an index structure on the ordered data to enable efficient query processing. Empirical results show the GCUBE indexing offers significant performance advantages over alternative approaches.
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
Due to the flexibility and ease of use of XML, it is nowadays widely used in a vast number of application areas, and new information is increasingly being encoded as XML documents. It is therefore important to provide a repository for XML documents that supports efficient management and storage of XML data. For this purpose, many proposals have been made, the most common being node labeling schemes. On the other hand, XML repeatedly uses tags to describe the data itself. This self-describing nature makes XML verbose, with the result that its storage requirements are often expanded and can be excessive; the increased size also leads to increased costs for data manipulation. It therefore seems natural to use compression techniques to increase the efficiency of storing and querying XML data.
In our previous works, we aimed at combining the advantages of both areas (labeling and compaction technologies). Specifically, we took advantage of XML's structural peculiarities to reduce storage space requirements and to improve the efficiency of XML query processing using labeling schemes. In this paper, we continue our investigations into variations of binary string encoding forms to decrease the label size. We also report experimental results examining the impact of binary string encoding on query performance and on the storage size needed to store the compacted XML documents.
This document presents a new link-based approach for improving categorical data clustering through cluster ensembles. It transforms categorical data matrices into numerical representations to apply graph partitioning techniques. The approach uses a Weighted Triple-Quality similarity algorithm to construct the representation and measure cluster similarity. An experimental evaluation shows the link-based method outperforms traditional categorical clustering algorithms and benchmark ensemble techniques on several real datasets in terms of accuracy, normalized mutual information, and adjusted rand index.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
Many applications of automatic document classification require learning accurately with little training data. The semi-supervised classification technique uses labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial.
On the other hand, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification.
We used support vector machines, one of the most effective algorithms studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an accuracy improvement of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
Hex-Cell is an interconnection network with attractive features, such as the capability to embed topological structures like bus, ring, tree and mesh topologies. In this paper, we present two algorithms for embedding bus and ring topologies onto the Hex-Cell interconnection network. We use three metrics to evaluate our proposed algorithms: dilation, congestion and expansion. Our evaluation results show that the congestion of both proposed algorithms is equal to one, and the dilation is equal to 2d-1 for the first algorithm and 1 for the second.
MAP REDUCE BASED ON CLOAK DHT DATA REPLICATION EVALUATION
Distributed databases and data replication are effective ways to increase the accessibility and reliability of unstructured, semi-structured and structured data in order to extract new knowledge. Replication offers better performance and greater availability of data. With the advent of Big Data, new storage and processing challenges are emerging.
To meet these challenges, Hadoop and DHTs compete in the storage domain, and MapReduce and others in distributed processing, each with their strengths and weaknesses.
We propose an analysis of the circular and radial replication mechanisms of the CLOAK DHT. We evaluate their performance through a comparative study of simulation data. The results show that radial replication is better for storage, unlike circular replication, which gives better search results.
This document describes Dremel, an interactive query system for analyzing large nested datasets. Dremel uses a multi-level execution tree to parallelize queries across thousands of CPUs. It stores nested data in a novel columnar format that improves performance by only reading relevant columns from storage. Dremel has been in production at Google since 2006 and is used by thousands of users to interactively analyze datasets containing trillions of records.
TRANSFORMATION RULES FOR BUILDING OWL ONTOLOGIES FROM RELATIONAL DATABASES
Relational databases (RDBs) are used as the backend database by most information systems. RDBs encapsulate the conceptual model and metadata needed in ontology construction. Schema mapping is the technique used by all existing approaches for building ontologies from RDBs. However, most of those methods use poor transformation rules that prevent advanced database mining for building rich ontologies. In this paper, we propose transformation rules for building OWL ontologies from RDBs, allowing all possible cases in RDBs to be transformed into ontological constructs. The proposed rules are enriched by analyzing stored data to detect disjointness and totalness constraints in hierarchies, and by calculating the participation level of tables in n-ary relations. In addition, our technique is generic, so it can be applied to any RDB. The proposed rules were evaluated using a normalized, open RDB. The obtained ontology is richer in terms of non-taxonomic relationships.
The document discusses advanced database management systems (ADBMS). It provides background on how databases have become essential in modern society and outlines new applications like multimedia databases, geographic information systems, and data warehouses. The document then covers the history of database applications from early hierarchical and network systems to relational databases and object-oriented databases needed for e-commerce. It also discusses how database capabilities have been extended to support new applications involving scientific data, images, videos, data mining, spatial data, and time series data.
Column store databases approaches and optimization techniques
A column-store database stores data column by column. The need for column-store databases arose from the demand for efficient query processing in read-intensive relational databases, for which extensive research on efficient data storage and query processing has been performed. This paper gives an overview of the storage and performance optimization techniques used in column-stores.
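The layout difference at the heart of this comparison can be shown in a few lines. The table and query below are made up for illustration: the same data is held row by row and column by column, and a read-intensive scan over one attribute touches only that column's array.

```python
# Row layout vs column layout for the same table: a single-attribute scan in
# the column layout reads one contiguous list instead of every whole record.

rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": 7.0}, {"id": 3, "price": 12.0}]

def to_columns(rows):
    """Pivot a row layout into a {column: values} layout."""
    return {col: [r[col] for r in rows] for col in rows[0]}

columns = to_columns(rows)
# An aggregate over one attribute now scans a single array:
total = sum(columns["price"])
```

On disk, this per-column contiguity is also what makes the compression and late-materialization techniques surveyed in such papers effective.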
Towards a new hybrid approach for building document-oriented data wareh...
Schemaless databases offer a large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in not only SQL (NoSQL) databases makes the use of data for decision-analysis purposes even more complex and difficult. In this paper, we propose an original approach to build a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user-requirements analysis. The first, data-driven step exploits the fast, distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step consists of analyzing the semantics of the decisional requirements expressed in natural language and mapping them to the schemas of the collections. At the end of the process, a decisional schema is generated in JavaScript Object Notation (JSON) format and the data loading, with the necessary transformations, is performed.
ENHANCING KEYWORD SEARCH OVER RELATIONAL DATABASES USING ONTOLOGIES
This document summarizes a research paper that proposes a system to enhance keyword search over relational databases using ontologies. The system builds structures during pre-processing like a reachability index to store connectivity information and an ontology concept graph. During querying, it maps keywords to concepts, uses the ontology to find related concepts and tuples, and generates top-k answer trees combining syntactic and semantic matches while limiting redundant results. The system is expected to perform better than existing approaches by reducing storage requirements through its approach to materializing neighborhood information in the reachability index.
Enhancing keyword search over relational databases using ontologies
Keyword Search Over Relational Databases (KSORDB) provides an easy way for casual users to access relational databases using a set of keywords. Although much research has been done and several prototypes have been developed recently, most of this research implements exact (also called syntactic or keyword) match, so if there is a vocabulary mismatch, the user cannot get an answer even though the database may contain relevant data. In this paper we propose a system that overcomes this issue. Our system extends existing schema-free KSORDB systems with semantic match features: if there are no or very few answers, the system exploits a domain ontology to progressively return related terms that can be used to retrieve answers more relevant to the user.
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that proposes an algorithm called Replica Placement in Graph Topology Grid (RPGTG) to optimally place data replicas in a graph-based data grid while ensuring quality of service (QoS). The algorithm aims to minimize data access time, balance load among replica servers, and avoid unnecessary replications, while restricting QoS in terms of the number of hops and the deadline to complete requests. The article describes how the algorithm converts the graph structure of the data grid to a hierarchical structure to better manage replica servers, and proposes services to facilitate dynamic replication, including a replica catalog to track replica locations and a replica manager to perform replication.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial step in analysing structured data and in finding association relationships between items, and it stands as an elementary foundation for supervised learning, which encompasses classifier and feature-extraction methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most structured data in the scientific domain are voluminous, and processing such data requires state-of-the-art computing machines; setting up such an infrastructure is expensive. Hence a distributed environment such as a clustered setup is employed for tackling such scenarios. The Apache Hadoop distribution is one of the cluster frameworks for distributed environments that helps by distributing voluminous data across a number of nodes in the framework. This paper focuses on the map/reduce design and implementation of the Apriori algorithm for structured data analysis.
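The candidate-counting pass at the core of a map/reduce Apriori can be sketched with plain functions standing in for Hadoop's mapper and reducer. Nothing here is the paper's actual code; the transactions and support threshold are illustrative.

```python
# One map/reduce pass of Apriori candidate counting: the mapper emits
# (k-itemset, 1) pairs per transaction; the reducer sums counts and keeps
# only itemsets meeting the minimum support.

from itertools import combinations
from collections import Counter

def mapper(transaction, k):
    """Emit (itemset, 1) for every k-itemset in one transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1

def reduce_counts(pairs, min_support):
    """Sum counts per itemset and keep only the frequent ones."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_support}

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}]
pairs = [kv for t in transactions for kv in mapper(t, 2)]
frequent = reduce_counts(pairs, min_support=2)
```

In the distributed setting, the mapper runs per data split and the shuffle phase groups identical itemsets before the reducer sums them; the logic per record is the same.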
AN ENTROPIC OPTIMIZATION TECHNIQUE IN HETEROGENEOUS GRID COMPUTING USING BION...
This document summarizes a research paper that proposes a new method for improving both fault tolerance and load balancing in grid computing networks. The method converts the tree structure of grid computing nodes into a distributed R-tree index structure and then applies an entropy estimation technique. This entropy estimation helps discard nodes with high entropy from the tree, reducing complexity. The method then uses thresholding and control algorithms to select optimal route paths based on load balance and fault tolerance. Various optimization techniques like genetic algorithms, ant colony optimization, and particle swarm optimization are also applied to reach better solutions. Experimental results showed the proposed method improved performance over other existing methods.
The document proposes using an A* algorithm along with a relational framework to more efficiently calculate shortest paths in graph data stored in a relational database. The system initializes a source node, then iteratively selects the next frontier node and expands paths until the target node is found. Experimental results on road network data show the proposed approach has faster execution time than bidirectional search, especially on larger datasets containing over 500,000 records. The approach requires more memory than bidirectional search but is more efficient than other shortest path algorithms.
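The frontier-selection loop the summary describes can be sketched as a compact A*. The graph, weights and heuristic below are made up for illustration (with h = 0 the search degrades to Dijkstra's algorithm); the relational storage of the edge table is not modelled.

```python
# Compact A*: repeatedly pop the frontier node with the lowest f = g + h,
# expand its neighbours, and stop when the target is popped.

import heapq

def a_star(edges, source, target, h):
    """edges: {node: [(neighbor, weight), ...]}; h: admissible heuristic."""
    frontier = [(h(source), 0, source, [source])]
    best = {source: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == target:
            return g, path
        for nbr, w in edges.get(node, []):
            ng = g + w
            if ng < best.get(nbr, float("inf")):
                best[nbr] = ng
                heapq.heappush(frontier, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None

edges = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
cost, path = a_star(edges, "A", "D", h=lambda n: 0)  # h=0 reduces to Dijkstra
```

In the relational setting the `edges.get(node, [])` step would become an indexed query over the edge table, which is where the frontier-at-a-time expansion pays off.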
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
Nodes in a mobile ad-hoc network are connected wirelessly and the network is auto-configuring [1]. This paper introduces the usefulness of a data warehouse as an alternative for managing data collected by WSNs. A Wireless Sensor Network produces a huge quantity of data that need to be processed and homogenised, so as to help researchers and others interested in the information. Collected data are managed and compared with data coming from other sources and systems, and can feed into technical reports and decision making. This paper proposes a model to design, extract, transform and normalize data collected by Wireless Sensor Networks by implementing a multidimensional warehouse for comparing many aspects of WSNs (such as routing protocol [4], sensor, sensor mobility, cluster, ...). Hence, the data warehouse defined and applied in this context is presented as a useful approach that gives specialists raw data and information for decision processes, letting them navigate from one aspect to another.
Data Partitioning in Mongo DB with Cloud
Cloud computing offers various useful services, such as IaaS, PaaS and SaaS, for deploying applications at low cost, making them available anytime and anywhere with the expectation that they be scalable and consistent. One technique for improving scalability is data partitioning, but the existing techniques are not capable of tracking the data access pattern. This paper implements a scalable, workload-driven technique for improving the scalability of web applications. The experiments are carried out over the cloud using the NoSQL data store MongoDB to scale out. This approach offers low response time, high throughput and a smaller number of distributed transactions. The partitioning technique is evaluated using the TPC-C benchmark.
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
The document reviews existing methods for the k-means clustering algorithm. It discusses how k-means clustering works and some of its limitations when dealing with large datasets, such as being dependent on the initial choice of centroids. It then proposes using Hadoop to overcome big data challenges and calculate preliminary centroids for k-means clustering in a distributed manner. Finally, it reviews different techniques that have been proposed in other research to improve k-means clustering, such as methods for selecting better initial centroids or determining the optimal number of clusters.
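One family of remedies for the initial-centroid sensitivity mentioned above can be sketched with a farthest-first selection. This particular scheme is an illustration of the idea, not necessarily one of the reviewed methods verbatim.

```python
# Farthest-first initial centroid selection: start from one point, then
# repeatedly take the point farthest from all centroids chosen so far,
# spreading the initial centroids across the data instead of picking randomly.

import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def farthest_first_centroids(points, k):
    """Greedy spread of k initial centroids over `points`."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

pts = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
cents = farthest_first_centroids(pts, 2)
```

In a Hadoop setting of the kind the review proposes, the per-point minimum-distance computation parallelises naturally across data splits, with only the small centroid list shared.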
Query Optimization Techniques in Graph Databases
Graph databases (GDBs) have recently arisen to overcome the limits of traditional databases for storing and managing data with graph-like structure. Today, they represent a requirement for many applications that manage graph-like data, such as social networks. Most of the techniques applied to optimize queries in graph databases have been used in traditional databases and distributed systems, or are inspired by graph theory. However, their reuse in graph databases should take account of the main characteristics of graph databases, such as their dynamic structure, highly interconnected data, and ability to efficiently access data relationships. In this paper, we survey the query optimization techniques in graph databases. In particular, we focus on the features they have in
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
This document summarizes a research paper that presents a task-decomposition based anomaly detection system for analyzing massive and highly volatile session data from the Science Information Network (SINET), Japan's academic backbone network. The system uses a master-worker design with dynamic task scheduling to process over 1 billion sessions per day. It discriminates incoming and outgoing traffic using GPU parallelization and generates histograms of traffic volumes over time. Long short-term memory (LSTM) neural networks detect anomalies like spikes in incoming traffic volumes. The experiment analyzed SINET data from February 27 to March 8, 2021, detecting some anomalies while processing 500-650 gigabytes of daily session data.
Growth of relational model: Interdependence and complementary to big data
A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements, from the conventional structured database to the recent buzzword, big data. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties: atomicity, consistency, isolation and durability. Specifically, the objective of this paper is to highlight the adoption of relational-model approaches by big data techniques. To address the reasons for this incorporation, this paper qualitatively studies the advancements made over time in the relational data model. First, the variations in data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques such as indexing, query processing and concurrency control methods are reviewed. The paper provides vital insights for appraising the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of hybrid transactional and analytical database management systems.
Big data is the latest industry buzzword describing large volumes of structured and unstructured data that can be difficult to process and analyze. Most organizations are looking for the best approach to manage and analyze large volumes of data, especially for decision making. XML is chosen by many organizations because of its powerful retrieval and storage processes. However, with the XML approach, the execution time for retrieving large volumes of data is still considerably inefficient due to several factors. In this contribution, two database approaches, namely Extensible Markup Language (XML) and JavaScript Object Notation (JSON), were investigated to evaluate their suitability for handling thousands of records of publication data. The results showed JSON is the best choice for query retrieval speed and CPU usage. These are essential to cope with the characteristics of publication data. XML and JSON technologies are relatively new compared to the relational database. Indeed, JSON demonstrates greater potential to become a key database technology for handling huge data volumes, given the annual increase of data.
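The XML-versus-JSON comparison above can be made concrete with a toy publication record in both formats, parsed with Python's standard library; the record fields are invented for illustration:

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical publication record in both formats.
xml_doc = """<publication>
  <title>XML-REG</title>
  <year>2021</year>
</publication>"""
json_doc = '{"publication": {"title": "XML-REG", "year": 2021}}'

# XML: navigate the element tree to reach the field.
xml_title = ET.fromstring(xml_doc).find("title").text

# JSON: plain nested dictionaries after one parse call.
json_title = json.loads(json_doc)["publication"]["title"]

assert xml_title == json_title  # both formats carry the same data
```

The JSON path maps directly onto native data structures, which is one informal reason retrieval benchmarks of the kind described above often favour it.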
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that proposes a new dynamic data replication and job scheduling strategy for data grids. The strategy aims to improve data access time and reduce bandwidth consumption by replicating data based on file popularity, storage limitations at nodes, and data category. It replicates more popular files that are in the same category as frequently accessed data to nodes close to where jobs are run. This is intended to optimize performance by locating data and jobs close together. The document provides context on related work and outlines the proposed system architecture and replication/scheduling approach.
A New Architecture for Group Replication in Data Grid — Editor IJCATR
Nowadays, grid systems are a vital technology for running programs with high performance and solving large-scale problems in science, engineering and business. In grid systems, heterogeneous computational resources and data are shared between independent organizations that are geographically scattered. A data grid is a type of grid that relates computational and storage resources. Data replication is an efficient way to obtain high performance and high availability in a data grid by saving numerous replicas in different locations, e.g. grid sites. In this research, we propose a new architecture for dynamic group data replication. In our architecture, we add two components to the OptorSim architecture: a Group Replication Management component (GRM) and a Management of Popular Files Group component (MPFG). OptorSim was developed by the European DataGrid project to evaluate replication algorithms. Using this architecture, popular file groups are replicated to grid sites at the end of each predefined time interval.
A Survey of File Replication Techniques In Grid Systems — Editor IJCATR
Grid is a type of parallel and distributed system designed to provide reliable access to data and computational resources in wide area networks. These resources are distributed across different geographical locations. Efficient data sharing in global networks is complicated by erratic node failure, unreliable network connectivity and limited bandwidth. Replication is a technique used in grid systems to improve applications' response time and to reduce bandwidth consumption. In this paper, we present a survey of basic and new replication techniques that have been proposed by other researchers, followed by a full comparative study of these replication strategies.
This document provides a survey of file replication techniques used in grid systems. It begins with an introduction to grid systems and discusses their use of replication to improve response times and reduce bandwidth consumption. It then categorizes replication techniques as static or dynamic and describes challenges of replication including maintaining consistency and overhead. The document surveys various replication strategies for different grid topologies like peer-to-peer, tree and hybrid. It evaluates strategies based on factors like access latency, bandwidth consumption and fault tolerance. Specific replication techniques are discussed for peer-to-peer architectures aimed at availability, placement strategies and balancing workloads.
Similar to: Transforming data-centric eXtensible markup language into relational databases using hybrid approach
Square transposition: an approach to the transposition process in block cipher — journalBEEI
The transposition process is needed in cryptography to create a diffusion effect in the data encryption standard (DES) and advanced encryption standard (AES) algorithms, the standard information security algorithms of the National Institute of Standards and Technology. The problem with the DES and AES algorithms is that their transposition index values form patterns rather than random values. This condition makes it easier for a cryptanalyst to look for relationships between ciphertexts, because some processes are predictable. This research designs a transposition algorithm called square transposition. Each process uses an 8 × 8 square as a place to insert and retrieve 64 bits. Pairing an input scheme and a retrieval scheme with unequal flows is an important factor in producing a good transposition. The square transposition can generate random, pattern-free indices, so transposition can be done better than in DES and AES.
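The square-transposition idea, writing 64 bits into an 8 × 8 square with one scheme and reading them out with another, can be sketched as follows; the row-wise/column-wise pairing below is only a stand-in for the paper's unequal-flow schemes, which are not reproduced here:

```python
def square_transpose(bits, size=8):
    """Toy transposition: insert 64 bits row-wise into an 8x8
    square, then retrieve them column-wise. The paper pairs two
    *unequal* insertion/retrieval schemes; this row/column pair
    is just an illustrative choice."""
    assert len(bits) == size * size
    square = [bits[r * size:(r + 1) * size] for r in range(size)]
    return [square[r][c] for c in range(size) for r in range(size)]

block = list(range(64))            # stand-in for a 64-bit block
scrambled = square_transpose(block)
# Applying the paired schemes twice restores the original block.
assert square_transpose(scrambled) == block
```

With the real scheme pair, the goal stated above is that the resulting index sequence shows no exploitable pattern, unlike the fixed permutation tables of DES/AES.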
Hyper-parameter optimization of convolutional neural network based on particl... — journalBEEI
The document proposes using a particle swarm optimization (PSO) algorithm to optimize the hyperparameters of a convolutional neural network (CNN) for image classification. The PSO algorithm is used to find optimal values for CNN hyperparameters like the number and size of convolutional filters. In experiments on the MNIST handwritten digit dataset, the optimized CNN achieved a testing error rate of 0.87%, which is competitive with state-of-the-art models. The proposed approach finds optimized CNN architectures automatically without requiring manual design or encoding strategies during training.
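The PSO search over CNN hyperparameters can be sketched with a generic particle swarm minimiser; a cheap quadratic stands in for the real validation-error objective, and the swarm constants are conventional defaults rather than the paper's settings:

```python
import random

random.seed(0)  # deterministic demo run

def pso(f, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimiser over box-bounded
    hyperparameters. In the paper's setting, f would be the CNN's
    validation error; here any cheap surrogate works."""
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # keep the particle inside the hyperparameter box
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Surrogate "validation error" with optimum at (32 filters, kernel 3).
err = lambda p: (p[0] - 32) ** 2 + (p[1] - 3) ** 2
best, best_err = pso(err, bounds=[(8, 128), (1, 11)])
```

The real system evaluates each particle by training a CNN, which is why PSO's small population and few iterations matter for cost.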
Supervised machine learning based liver disease prediction approach with LASS... — journalBEEI
In this contemporary era, the use of machine learning techniques is increasing rapidly in the field of medical science for detecting various diseases such as liver disease (LD). Around the globe, a large number of people die because of this deadly disease. By diagnosing the disease at a primary stage, early treatment can help cure the patient. In this research paper, a method is proposed to diagnose LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also deployed a least absolute shrinkage and selection operator (LASSO) feature selection technique on our dataset to suggest the most highly correlated attributes of LD. The predictions made by the algorithms with 10-fold cross-validation (CV) are tested in terms of accuracy, sensitivity, precision and f1-score values to forecast the disease. It is observed that the decision tree algorithm has the best performance score, with accuracy, precision, sensitivity and f1-score values of 94.295%, 92%, 99% and 96% respectively with the inclusion of LASSO. Furthermore, a comparison with recent studies is shown to prove the significance of the proposed system.
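The LASSO step above drops weakly correlated attributes by shrinking their coefficients to exactly zero. A minimal sketch of the soft-thresholding operator at the heart of LASSO's coordinate-descent solver; `lam` plays the role of the regularisation strength (the dataset and classifiers themselves are not reproduced):

```python
def soft_threshold(z, lam):
    """LASSO's soft-thresholding operator: shrinks a coefficient
    toward zero, and sets it exactly to zero when |z| <= lam,
    which is how weak features get dropped from the model."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# A weak coefficient is eliminated; a strong one is merely shrunk.
assert soft_threshold(0.3, 0.5) == 0.0
assert soft_threshold(2.0, 0.5) == 1.5
```

This zeroing behaviour is what lets the paper report a reduced, highly correlated attribute subset rather than just re-weighted features.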
A secure and energy saving protocol for wireless sensor networks — journalBEEI
The research domain of wireless sensor networks (WSN) has been extensively studied due to innovative technologies and research directions addressing the usability of WSN under various schemes. This domain permits dependable tracking of a diversity of environments for both military and civil applications. The key management mechanism is a primary protocol for keeping the privacy and confidentiality of the data transmitted among different sensor nodes in WSNs. Since a node's size is small, nodes are intrinsically limited by inadequate resources such as battery lifetime and memory capacity. The proposed secure and energy saving protocol (SESP) for wireless sensor networks has a significant impact on overall network lifetime and energy dissipation. To encrypt sent messages, SESP uses the concept of public-key cryptography. It depends on sensor nodes' identities (IDs) to prevent message replay, so that the security goals of authentication, confidentiality, integrity, availability and freshness are achieved. Finally, simulation results show that the proposed approach produced better energy consumption and network lifetime compared to the LEACH protocol; sensors die after 900 rounds in the proposed SESP protocol, while in the low-energy adaptive clustering hierarchy (LEACH) scheme, sensors die after 750 rounds.
Plant leaf identification system using convolutional neural network — journalBEEI
This paper proposes a leaf identification system using a convolutional neural network (CNN). The proposed system can identify five types of local Malaysian leaves: acacia, papaya, cherry, mango and rambutan. Using a CNN from deep learning, the network is trained for image classification on a database of leaf images captured by mobile phone. ResNet-50 was the architecture used for neural network image classification and for training the network for leaf identification. Recognizing leaf photographs requires several steps, starting with image pre-processing, feature extraction, plant identification, matching and testing, and finally extracting the achieved results in MATLAB. The testing set consists of three types of images: white-background, noise-added and random-background images. Finally, an interface for the leaf identification system was developed as the end software product using MATLAB App Designer. As a result, the accuracy achieved for each training set on the five leaf classes was recorded above 98%, so the recognition process was successfully implemented.
Customized moodle-based learning management system for socially disadvantaged... — journalBEEI
This study aims to develop a Moodle-based LMS with customized learning content and a modified user interface to facilitate pedagogical processes during the covid-19 pandemic, and to investigate how teachers of socially disadvantaged schools perceived its usability and technology acceptance. A co-design process was conducted with two activities: 1) a needs assessment phase using an online survey and interview sessions with the teachers, and 2) the development phase of the LMS. The system was evaluated by 30 teachers from socially disadvantaged schools for relevance to their distance learning activities. We employed the computer software usability questionnaire (CSUQ) to measure perceived usability and the technology acceptance model (TAM) with the insertion of 3 original variables (i.e., perceived usefulness, perceived ease of use, and intention to use) and 5 external variables (i.e., attitude toward the system, perceived interaction, self-efficacy, user interface design, and course design). The average CSUQ rating exceeded 5.0 on a 7-point scale, indicating that teachers agreed the information quality, interaction quality, and user interface quality were clear and easy to understand. The TAM results concluded that the LMS design was usable, interactive, and well-developed. Teachers reported an effective user interface that allows effective teaching operations, leading to prompt adoption of the system.
Understanding the role of individual learner in adaptive and personalized e-l... — journalBEEI
The dynamic learning environment has emerged as a powerful platform in modern e-learning systems. The constantly changing learning situation has forced learning platforms to adapt and personalize their learning resources for students. Evidence suggests that adaptation and personalization of e-learning systems (APLS) can be achieved by utilizing learner modeling, domain modeling, and instructional modeling. In the APLS literature, questions have been raised about the role of the individual characteristics that are relevant for adaptation. With several options, a new problem arises: the attributes of students in APLS often overlap and are not related across studies. Therefore, this study proposes a list of learner model attributes in dynamic learning to support adaptation and personalization. The study was conducted by exploring concepts from literature selected on best-practice criteria. We then describe the important concepts in student modeling and provide definitions and examples of the data values that researchers have used. We also discuss the implementation of the selected learner model in providing adaptation in dynamic learning.
Prototype mobile contactless transaction system in traditional markets to sup... — journalBEEI
1) Researchers developed a prototype contactless transaction system using QR codes and digital payments to support physical distancing during the COVID-19 pandemic in traditional markets.
2) The system allows sellers and buyers in traditional markets to conduct fast, secure transactions via smartphones without direct cash exchange. Buyers scan sellers' QR codes to view product details and make e-wallet payments.
3) Testing showed the system's functions worked properly and users found it easy to use and useful for supporting contactless transactions and digital transformation of traditional markets. However, further development is needed to increase trust in digital payments for users unfamiliar with the technology.
Wireless HART stack using multiprocessor technique with laxity algorithm — journalBEEI
The use of a real-time operating system (RTOS) is required for the demarcation of industrial wireless sensor network (IWSN) stacks. In the industrial world, a vast number of sensors are utilised to gather various types of data. The data gathered by the sensors cannot be prioritised ahead of time, because all of the information is equally essential. As a result, a protocol stack is employed to guarantee that data is acquired and processed fairly. In IWSN, the protocol stack is implemented on an RTOS. The data collected from IWSN sensor nodes is processed using non-preemptive scheduling and the protocol stack, and then sent in parallel to the IWSN's central controller. The RTOS mediates between hardware and software. Packets must be sent at specific times, and some packets may collide during transmission; this project is undertaken to avoid such collisions. As a prototype, the project is divided into two parts: the first uses an RTOS on the LPC2148 as a master node, while the second serves as a standard data collection node to which sensors are attached. Any controller may be used in the second part, depending on the situation. WirelessHART allows the two nodes to communicate with each other.
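The laxity algorithm named in the title can be illustrated with a least-laxity-first selection rule, where laxity is the slack a task has before it must run to meet its deadline; the task-tuple layout below is a hypothetical simplification, not the paper's actual packet format:

```python
def least_laxity_first(tasks, now=0):
    """Pick the next task by minimum laxity, where
    laxity = deadline - now - remaining execution time.
    tasks: list of (name, deadline, remaining_time) tuples."""
    return min(tasks, key=lambda t: t[1] - now - t[2])

queue = [("temperature", 10, 2),   # laxity 8
         ("pressure",     6, 5),   # laxity 1  <- most urgent
         ("vibration",    9, 4)]   # laxity 5
assert least_laxity_first(queue)[0] == "pressure"
```

A scheduler that always services the minimum-laxity packet gives equally important sensor data a fair chance of meeting its transmission deadline.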
Implementation of double-layer loaded on octagon microstrip yagi antenna — journalBEEI
This document describes the implementation of a double-layer structure on an octagon microstrip yagi antenna (OMYA) to improve its performance at 5.8 GHz. The double-layer consists of two double positive (DPS) substrates placed above the OMYA. Simulation and experimental results show that the double-layer configuration increases the gain of the OMYA by 2.5 dB compared to without the double-layer. The measured bandwidth of the OMYA with double-layer is 14.6%, indicating the double-layer can increase both the gain and bandwidth of the OMYA.
The calculation of the field of an antenna located near the human head — journalBEEI
In this work, a numerical calculation was carried out in one of the universal programs for automatic electrodynamic design. The calculation is aimed at obtaining numerical values of the specific absorption rate (SAR). The SAR value can be used to determine the effect of a wireless device's antenna on biological objects; the dipole parameters were selected for GSM1800. Investigation of the influence of the distance to a cell phone shows that the electromagnetic radiation absorbed in a person's head, and its effect on the brain, decreases by about three times. This is an important result: the SAR value decreased by almost a factor of three, which is an acceptable outcome.
Exact secure outage probability performance of uplink-downlink multiple access... — journalBEEI
In this paper, we study uplink-downlink non-orthogonal multiple access (NOMA) systems by considering secrecy performance at the physical layer. In the considered system model, the base station acts as a relay to allow two users on the left side to communicate with two users on the right side. Considering imperfect channel state information (CSI), secrecy performance needs to be studied, since an eavesdropper wants to overhear signals processed on the downlink. To provide a secrecy performance metric, we derive exact expressions for the secrecy outage probability (SOP) and evaluate the impacts of the main parameters on the SOP metric. The important finding is that higher secrecy performance can be achieved at high signal-to-noise ratio (SNR). Moreover, the numerical results demonstrate that the SOP tends to a constant at high SNR. Finally, our results show that the power allocation factors and target rates are the main factors affecting the secrecy performance of the considered uplink-downlink NOMA systems.
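The secrecy metrics above follow the standard physical-layer-security definitions; as a reference, the instantaneous secrecy capacity and the SOP are generally defined as below, where $\gamma_B$ and $\gamma_E$ are the legitimate and eavesdropper SNRs and $R_s$ is the target secrecy rate (the paper's exact closed-form expressions under imperfect CSI are not reproduced here):

```latex
C_s = \left[\log_2\!\left(1+\gamma_B\right) - \log_2\!\left(1+\gamma_E\right)\right]^{+},
\qquad
\mathrm{SOP} = \Pr\!\left(C_s < R_s\right)
```

The high-SNR floor reported above corresponds to both $\gamma_B$ and $\gamma_E$ growing together, so the gap inside $C_s$, and hence the SOP, saturates.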
Design of a dual-band antenna for energy harvesting application — journalBEEI
This report presents an investigation into improving a current dual-band antenna to achieve better antenna parameters for energy harvesting applications. In addition, a new design is developed and validated at operating frequencies of 2.4 GHz and 5.4 GHz. At 5.4 GHz, more data can be transmitted than at 2.4 GHz; however, 2.4 GHz has a longer radiation distance, so it can be used far from the antenna module, whereas 5 GHz has a short radiation distance. The development of this project includes designing and testing the antenna using computer simulation technology (CST) 2018 software and vector network analyzer (VNA) equipment. In the design process, fundamental antenna parameters are measured and validated in order to identify the better antenna performance.
Key performance requirement of future next wireless networks (6G) — journalBEEI
The document provides an overview of the key performance indicators (KPIs) for 6G wireless networks compared to 5G networks. Some of the major KPIs discussed for 6G include: achieving data rates of up to 1 Tbps and individual user data rates up to 100 Gbps; reducing latency below 10 milliseconds; supporting up to 10 million connected devices per square kilometer; improving spectral efficiency by up to 100 times through technologies like terahertz communications and smart surfaces; and achieving an energy efficiency of 1 pico-joule per bit transmitted through techniques like wireless power transmission and energy harvesting. The document outlines how 6G aims to integrate terrestrial, aerial and maritime communications into a single network to provide ubiquitous connectivity with higher
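The 1 pJ/bit energy-efficiency KPI above can be sanity-checked against the 1 Tbps peak-rate KPI with a one-line calculation:

```python
energy_per_bit = 1e-12          # 1 pJ/bit, the 6G energy-efficiency KPI
peak_rate = 1e12                # 1 Tbps, the 6G peak data-rate KPI
power_watts = energy_per_bit * peak_rate
# At the full peak rate, the radio energy budget works out to about 1 W.
assert abs(power_watts - 1.0) < 1e-9
```

This is why the two KPIs are usually quoted together: the energy target keeps a terabit-class link within roughly a handset-scale power budget.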
Noise resistance territorial intensity-based optical flow using inverse confi... — journalBEEI
This paper presents the use of the inverse confidential technique on the bilateral function with territorial intensity-based optical flow to prove its effectiveness in a noise-resistance environment. In general, an image's motion vector is coded by the technique called optical flow, where sequences of images are used to determine the motion vector. However, the accuracy of the motion vector is reduced when the source image sequences are corrupted by noise. This work proves that the inverse confidential technique on the bilateral function can increase the percentage accuracy of motion vector determination by territorial intensity-based optical flow under a noisy environment. We performed testing with several kinds of non-Gaussian noise on several patterns of standard image sequences, analyzing the resulting motion vectors in the form of the error vector magnitude (EVM) and comparing them with several noise-resistance techniques in the territorial intensity-based optical flow method.
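Results reported as error vector magnitude (EVM) compare each estimated motion vector with its ground truth; a minimal sketch of that measure (the 2-D tuple representation of a flow vector is an assumption):

```python
import math

def error_vector_magnitude(estimated, ground_truth):
    """EVM between an estimated motion vector and the
    ground-truth vector, both given as 2-D (dx, dy) tuples."""
    dx = estimated[0] - ground_truth[0]
    dy = estimated[1] - ground_truth[1]
    return math.hypot(dx, dy)

# A (3, 4) error vector has magnitude 5.
assert error_vector_magnitude((4.0, 5.0), (1.0, 1.0)) == 5.0
```

Averaging this magnitude over all pixels of a sequence gives the per-method score the comparison above is based on.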
Modeling climate phenomenon with software grids analysis and display system i... — journalBEEI
This study aims to model climate change based on rainfall, air temperature, pressure, humidity and wind using the GrADS software, and to create a global warming module. The research uses a 3D model: define, design, and develop. The modeling results for the five climate elements are as follows: the annual average temperature in Indonesia in 2009-2015 was between 29 °C and 30.1 °C; the horizontal distribution of the annual average pressure in Indonesia in 2009-2018 was between 800 mBar and 1000 mBar; and the horizontal distribution of the average annual humidity in Indonesia ranged between 27-57 in 2009 and 2011, and between 30-60 in 2012-2015, 2017 and 2018. During the east monsoon, wind circulation moves from northern to southern Indonesia; during the west monsoon, it moves from southern to northern Indonesia. The resulting global warming module for SMA/MA is feasible to use, in accordance with the validator's score of 69, which falls in the appropriate category, and a 91% positive response from teachers and students via questionnaire.
An approach of re-organizing input dataset to enhance the quality of emotion ... — journalBEEI
The purpose of this paper is to propose an approach of re-organizing input data to recognize emotion based on short signal segments and increase the quality of emotional recognition using physiological signals. MIT's long physiological signal set was divided into two new datasets, with shorter and overlapped segments. Three different classification methods (support vector machine, random forest, and multilayer perceptron) were implemented to identify eight emotional states based on statistical features of each segment in these two datasets. By re-organizing the input dataset, the quality of recognition results was enhanced. The random forest shows the best classification result among three implemented classification methods, with an accuracy of 97.72% for eight emotional states, on the overlapped dataset. This approach shows that, by re-organizing the input dataset, the high accuracy of recognition results can be achieved without the use of EEG and ECG signals.
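The re-organisation described above, splitting long physiological signals into shorter, optionally overlapped segments, can be sketched as follows; the segment length and step values are illustrative, not the MIT dataset's actual parameters:

```python
def segment(signal, length, step):
    """Split a long signal into fixed-length segments.
    step == length gives non-overlapping segments;
    step < length gives overlapped segments, as in the
    re-organised dataset described above."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, step)]

sig = list(range(10))
# Non-overlapping: two disjoint windows.
assert segment(sig, 4, 4) == [[0, 1, 2, 3], [4, 5, 6, 7]]
# 50% overlap: four windows, each sharing half its samples.
assert segment(sig, 4, 2) == [[0, 1, 2, 3], [2, 3, 4, 5],
                              [4, 5, 6, 7], [6, 7, 8, 9]]
```

Overlapping multiplies the number of training segments from the same recording, which is one plausible reason the overlapped dataset yields higher accuracy.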
Parking detection system using background subtraction and HSV color segmentation — journalBEEI
In a manual vehicle parking system, finding vacant parking lots is difficult, since the vacant spaces must be checked directly. When many people park, this takes a great deal of time or requires many people to handle it. This research develops a real-time system to detect parking. The system is designed using the HSV color segmentation method to determine the background image, and the detection process uses the background subtraction method. Applying these two methods requires image preprocessing using several methods such as grayscaling and blurring (low-pass filter), followed by thresholding and filtering to get the best image for the detection process. A region of interest (ROI) is determined to set the focus area of objects identified as empty parking spaces. The parking detection process produces a best average accuracy of 95.76%, with a minimum threshold value of 0.4 on the 255-level pixel scale. This value is the best over 33 test data under several criteria, such as time of capture, vehicle composition and color, the shape of shadows in the object's environment, and light intensity. This parking detection system can be implemented in real time to determine the position of an empty space.
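The background-subtraction decision can be sketched for a single ROI; flat greyscale lists stand in for preprocessed images, the 0.4 ratio mirrors the threshold reported above, and everything else is illustrative rather than the paper's pipeline:

```python
def occupied(background, frame, pixel_thresh=30, count_thresh=0.4):
    """Toy background subtraction over one ROI: the slot is
    occupied when the fraction of pixels whose absolute
    grey-level change exceeds pixel_thresh is above
    count_thresh (0.4, matching the threshold above)."""
    changed = sum(1 for b, f in zip(background, frame)
                  if abs(b - f) > pixel_thresh)
    return changed / len(background) > count_thresh

empty_background = [100] * 100            # ROI learned as background
parked_car = [100] * 40 + [180] * 60      # 60% of pixels changed
assert not occupied(empty_background, [100] * 100)
assert occupied(empty_background, parked_car)
```

In the full system the background itself comes from HSV segmentation and the frames pass through grayscaling, blurring, thresholding and filtering before this per-ROI test.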
Quality of service performances of video and voice transmission in universal ... — journalBEEI
The universal mobile telecommunications system (UMTS) has the distinct benefit of supporting a wide range of quality of service (QoS) criteria that users require to fulfill their requirements. The transmission of video and audio in real-time applications places a high demand on the cellular network, so QoS is a major problem in these applications. Providing QoS in the UMTS backbone network necessitates an active QoS mechanism to maintain the necessary level of convenience on UMTS networks. For UMTS networks, models for end-to-end QoS, total transmitted and received data, packet loss, and throughput-provisioning techniques are run and assessed, and the simulation results are examined. According to the results, appropriate QoS adaptation allows for dedicated voice and video transmission. Finally, by analyzing the existing QoS parameters, the QoS performance of 4G/UMTS networks may be improved.
A multi-task learning based hybrid prediction algorithm for privacy preservin... — journalBEEI
There is an ever-increasing need to use computer vision devices to capture videos as part of many real-world applications. However, invading people's privacy is a cause of concern. There is a need to protect people's privacy while videos are used purposefully based on objective functions. One such use case is human activity recognition without disclosing human identity. In this paper, we propose a multi-task learning based hybrid prediction algorithm (MTL-HPA) towards realising a privacy preserving human activity recognition framework (PPHARF). It serves this purpose by recognizing human activities from videos while preserving the identity of the humans present in the multimedia object. The face of any person in the video is anonymized to preserve privacy, while the person's actions remain exposed for extraction. Anonymization is achieved without losing the utility of human activity recognition: human and face detection methods fail to reveal the identity of the persons in the video. We experimentally confirm with the joint-annotated human motion database (JHMDB) and daily action localization in YouTube (DALY) datasets that the framework recognises human activities and ensures non-disclosure of private information. Our approach is better than many traditional anonymization techniques such as noise adding, blurring, and masking.
Online music portal management system project report.pdf — Kamal Acharya
The iMMS is a unique application that synchronizes user experience and copyrights while providing services like online music management, legal downloads, and artists' management. Several other applications available in the market provide either specific services or large-scale integrated solutions. Our product differs from the rest in that we give more power to the users while remaining within the copyright circle.
How to Manage Internal Notes in Odoo 17 POS — Celine George
In this slide, we'll explore how to leverage internal notes within Odoo 17 POS to enhance communication and streamline operations. Internal notes provide a platform for staff to exchange crucial information regarding orders, customers, or specific tasks, all while remaining invisible to the customer. This fosters improved collaboration and ensures everyone on the team is on the same page.
Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large... — YanKing2
Pre-trained Large Language Models (LLMs) have achieved remarkable successes in several domains. However, code-oriented LLMs are often computationally heavy, with cost growing quadratically in the length of the input code sequence. Toward simplifying the input program of an LLM, the state-of-the-art approach filters input code tokens based on the attention scores given by the LLM. However, the decision to simplify the input program should not rely on an LLM's attention patterns, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input program belongs, the outcome may differ when the model is trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of the input code tokens. In an empirical study on LLMs including CodeBERT, CodeT5, and GPT-4 for two main tasks, code search and summarization, we report that 1) the reduction ratio of code has a near-linear relation with the saving ratio in training time, 2) the impact of categorized tokens on code simplification can vary significantly, 3) the impact of categorized tokens on code simplification is task-specific but model-agnostic, and 4) the above findings hold for the prompt-engineering and interactive in-context learning paradigms; this study can reduce the cost of invoking GPT-4 by 24% per API query. Importantly, SlimCode simplifies the input code with a greedy strategy and runs up to 133 times faster than the state-of-the-art technique, a significant improvement. This paper calls for a new direction in code-based, model-agnostic code simplification solutions to further empower LLMs.
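SlimCode's key idea, deciding what to drop based on lexical categories of tokens rather than a model's attention scores, can be caricatured in a few lines; removing comment-only and blank lines is just one category choice for illustration, not the paper's full greedy strategy:

```python
def simplify(code):
    """Model-agnostic simplification in the spirit of SlimCode:
    drop input text by lexical category (here: blank lines and
    comment-only lines) instead of by model attention scores.
    Which categories to drop is an assumption; the paper studies
    several and orders them greedily by impact."""
    kept = [ln for ln in code.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]
    return "\n".join(kept)

src = "# add two numbers\n\ndef add(a, b):\n    return a + b\n"
assert simplify(src) == "def add(a, b):\n    return a + b"
```

Because the rule depends only on the token category, the same simplified input can be fed to CodeBERT, CodeT5, or GPT-4 without retraining anything, which is what "model-agnostic" means here.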
Response & Safe AI at Summer School of AI at IIITH — IIIT Hyderabad
Talk covering guardrails, jailbreaks, the alignment problem, RLHF, the EU AI Act, machine and graph unlearning, bias, inconsistency, probing, and interpretability.
OCS Training Institute is pleased to co-operate with a global provider of rig inspection/audits, commissioning, compliance and acceptance, as well as engineering for offshore drilling rigs, to deliver Drilling Rig Inspection Workshops (RIW), which teach the inspection and maintenance procedures required to ensure equipment integrity. Candidates learn to implement the relevant standards and understand industry requirements so that they can verify the condition of a rig's equipment and improve safety, thus reducing the number of accidents and protecting the asset.
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ... — IJAEMSJORNAL
This study primarily aimed to determine the best practices of clothing businesses, to be used as a foundation for strategic business advancement. It also examined how frequently the businesses' best practices are tracked, which best practices apparel firms most aim to retain, and how best practices can be used for strategic business advancement. The respondents of the study are the owners of clothing businesses in Talavera, Nueva Ecija. Data were collected and analyzed using a quantitative approach and a descriptive research design. Best practices were unveiled through statistical analysis, frequency and percentage, and weighted means, identifying the most to least important performance indicators of the businesses across all variables. Based on the survey conducted on clothing businesses in Talavera, Nueva Ecija, several best practices emerge across different areas of business operations. These practices are categorized into three main sections: the first being the Business Profile and Legal Requirements, followed by the tracking of indicators in terms of Product, Place, Promotion, and Price, and Key Performance Indicators (KPIs) covering finance, marketing, production, technical, and distribution aspects. The research study delved into identifying the core best practices of clothing businesses, serving as a strategic guide for their advancement. Through meticulous analysis, several key findings emerged. Firstly, prioritizing product factors, such as maintaining optimal stock levels and maximizing customer satisfaction, was deemed essential for driving sales and fostering loyalty. Additionally, selecting the right store location was crucial for visibility and accessibility, directly impacting footfall and sales.
Vigilance towards competitors and demographic shifts was highlighted as essential for maintaining relevance. Understanding the relationship between marketing spend and customer acquisition proved pivotal for optimizing budgets and achieving a higher ROI. Strategic analysis of profit margins across clothing items emerged as crucial for maximizing profitability and revenue. Creating a positive customer experience, investing in employee training, and implementing effective inventory management practices were also identified as critical success factors. In essence, these findings underscored the holistic approach needed for sustainable growth in the clothing business, emphasizing the importance of product management, marketing strategies, customer experience, and operational efficiency.
Development of Chatbot Using AI/ML Technologiesmaisnampibarel
The rapid advancements in artificial intelligence and natural language processing have significantly transformed human-computer interactions. This thesis presents the design, development, and evaluation of an intelligent chatbot capable of engaging in natural and meaningful conversations with users. The chatbot leverages state-of-the-art deep learning techniques, including transformer-based architectures, to understand and generate human-like responses.
Key contributions of this research include the implementation of a context- aware conversational model that can maintain coherent dialogue over extended interactions. The chatbot's performance is evaluated through both automated metrics and user studies, demonstrating its effectiveness in various applications such as customer service, mental health support, and educational assistance. Additionally, ethical considerations and potential biases in chatbot responses are examined to ensure the responsible deployment of this technology.
The findings of this thesis highlight the potential of intelligent chatbots to enhance user experience and provide valuable insights for future developments in conversational AI.
Software Engineering and Project Management - Introduction to Project ManagementPrakhyath Rai
Introduction to Project Management: Introduction, Project and Importance of Project Management, Contract Management, Activities Covered by Software Project Management, Plans, Methods and Methodologies, some ways of categorizing Software Projects, Stakeholders, Setting Objectives, Business Case, Project Success and Failure, Management and Management Control, Project Management life cycle, Traditional versus Modern Project Management Practices.
A brief introduction to quadcopter (drone) working. It provides an overview of flight stability, dynamics, general control system block diagram, and the electronic hardware.
In May 2024, globally renowned natural diamond crafting company Shree Ramkrishna Exports Pvt. Ltd. (SRK) became the first company in the world to achieve GNFZ’s final net zero certification for existing buildings, for its two two flagship crafting facilities SRK House and SRK Empire. Initially targeting 2030 to reach net zero, SRK joined forces with the Global Network for Zero (GNFZ) to accelerate its target to 2024 — a trailblazing achievement toward emissions elimination.
Bulletin of Electrical Engineering and Informatics
Vol. 10, No. 6, December 2021, pp. 3256~3264
ISSN: 2302-9285, DOI: 10.11591/eei.v10i6.2865
Journal homepage: http://beei.org
Transforming data-centric eXtensible markup language into
relational databases using hybrid approach
Su-Cheng Haw, Emyliana Song
Faculty of Computing and Informatics, Multimedia University, 63100 Cyberjaya, Malaysia
Article Info

Article history:
Received Apr 17, 2021
Revised Jul 19, 2021
Accepted Oct 12, 2021

ABSTRACT
eXtensible markup language (XML) appeared internationally as the format for
data representation over the web. Yet, most organizations are still utilising
relational databases as their database solutions. As such, it is crucial to
provide seamless integration via effective transformation between these
database infrastructures. In this paper, we propose XML-REG to bridge these
two technologies based on node-based and path-based approaches. The node-based
approach is well suited to annotating the position of each node uniquely, while the
path-based approach provides summarised path information to join the nodes.
On top of that, a new range labelling is also proposed to annotate nodes
uniquely by ensuring the structural relationships are maintained between
nodes. If a new node is to be added to the document, re-labelling is not
required as the new label will be assigned to the node via the new proposed
labelling scheme. Experimental evaluations indicated that the performance of
XML-REG exceeded XMap, XRecursive, XAncestor and Mini-XML
concerning storing time, query retrieval time and scalability. This research
produces a core framework for XML to relational databases (RDB) mapping,
which could be adopted in various industries.
Keywords: Model-based mapping, XML database, XML labelling, XML to RDB, XML transformation
This is an open access article under the CC BY-SA license.
Corresponding Author:
Su-Cheng Haw
Faculty of Computing and Informatics
Multimedia University
Jalan Multimedia, 63100 Cyberjaya, Malaysia
Email: sucheng@mmu.edu.my
1. INTRODUCTION
In today’s information age, technology is taking precedence over traditional ways of getting work
done. Technology is advancing every minute of the day, and millions of data items are produced each second.
For instance, social media platforms such as Twitter and Facebook tag data using XML. Information in XML
format is then exported or imported to make it usable and standardized for others to use [1]-[3]. Hence, a
significant amount of data is generated and needs to be properly processed by organizations for data storage
and manipulation. XML is often used to distribute data over the internet because this format eases the data
exchange process. Due to the advantages of XML, many organizations have chosen XML as the standard
format for business transactions [4].
On the database management front, the ability to process various types of data, such as structured,
semi-structured, and unstructured data, has become important [5]. Although native XML databases do
exist, the cost of migrating from an existing database storage management system to an XML-based one is
far from trivial. There are various possible underlying storages such as big data, temporal databases,
object-oriented databases, relational databases (RDB) and object-relational databases [6]. Nevertheless, the
focus of this paper is only on RDB as it is the most widely used back-end database in many organizations.
Subsequently, many attempts to come up with an efficient mapping scheme between the two technologies
have emerged [7].
The mapping schemes of XML to RDB are categorized into four main groups, namely, edge-based,
node-based, path-based and hybrid-based schemes [8]. Among these, the edge-based scheme is the easiest,
whereby all the edges are stored in a table. Nevertheless, this technique requires huge storage space and may
require several self-joins within the table to answer complex queries. The path-based scheme tracks the
hierarchical path information [9]. For the storage, it consists of two tables, whereby the first table stores the
path information of non-leaf nodes, while the second table stores the path information of leaf nodes. As
such, depending on the query expressed, it could involve retrieving the results from either table or both
tables. On the contrary, the node-based scheme annotates each node to denote the absolute position of the node
in the XML tree. To be able to identify the nodes and their associated relationships uniquely, this technique
may assign labels that can become an overhead as the size of the XML tree grows. The hybrid-based scheme,
however, is a combination of several techniques. Bousalem and Cherti [10] proposed XMap to transform
XML into RDB based on the ORDPATH scheme [11] to annotate the data in the XML document. The authors
performed theoretical comparisons between Edge [12], XRel [13], XParent [14] and XMap. Theoretically,
their approach leads in support for dynamic updates as compared to the other approaches. Fakharaldien et al.
[15] proposed XRecursive as the mapping scheme between XML and RDB. XRecursive uses the parent id
information to identify the path to each node recursively. Experimental evaluation indicated that XRecursive
performed better as compared to SUXCENT [16], as it only uses two tables while SUXCENT uses five tables.
Qtaish and Ahmad [8] proposed XAncestor, which has the uniqueness of storing the path
information in a pre-defined scheme. This reduces the necessary storage size, which is most significant when
a huge dataset is involved. The experimental evaluation demonstrated that XAncestor has the most
robust storage time and space as compared to XRel [13], XRecursive [15], s-XML [17], SMX/R [18] and
the approach of Ying et al. [19]. Zhu et al. proposed a path-based mapping scheme, Mini-XML [20], which stores
leaf nodes independently from the data table. They adopted the persistent labelling scheme [21] to annotate
each node as the traversal proceeds in depth-first manner. Experimental evaluations were compared against
s-XML [17] on various dataset sizes (ranging from 2.2 MB to 683 MB). The results exhibited that Mini-XML
achieved better performance in storage time and storage space. In a more recent study, Hsu and Liao [22]
proposed a compact indexing scheme named UCIS-X based on a branch map. The branch map preserves the
mapped information between parent and child nodes without the need to annotate each node. In separate
research, Taktek and Thakker proposed a pentagonal scheme based on prefix labelling to capture the
structural relationships. As a result, there is no need to access the physical document during query processing
[23]. A summary of some approaches is depicted in Table 1.
Table 1. Comparison on various approaches

Approach         | No. of tables | Advantages                                                                 | Disadvantages
XMap [10]        | 3             | Utilises the ORDPATH scheme [11] that supports dynamic updates.            | Redundant attributes stored in data table.
XRecursive [15]  | 2             | Good for query retrieval involving P-C relationship.                       | Suffers from query retrieval that involves recursive search and A-D relationship.
XAncestor [8]    | 2             | Utilises the pre-defined scheme for more efficient storage and retrieval.  | Labelling technique does not support certain types of queries.
s-XML [17]       | 2             | Efficient query retrieval over large datasets.                             | Redundant attributes stored in tables.
Ying et al. [19] | 4             | Utilises the path table for efficient storage and retrieval.               | Stores unnecessary leaf node path information.
Mini-XML [20]    | 2             | Utilises the persistent labelling scheme for dynamic updates.              | Redundant attributes stored in tables.
In this section, we have reviewed the XML mapping schemes and the labelling technologies. It
shows that most recent approaches adopt the path-based approach, and some even propose hybrid
systems comprising node-based and path-based schemes. From the review thus far, we noticed that
the number of tables used highly affects the query retrieval time. This is due to the join operations required to
acquire the desired query results. A rule of thumb for our proposed mapping scheme is to have the minimum
number of tables that is yet rich enough to provide the necessary information to support assorted query retrieval. As
such, we propose a new mapping scheme named XML-REG, where REG represents Region, which is the
labelling scheme under the region-based grouping. The following section elaborates further on
our proposed scheme.
2. RESEARCH METHOD
2.1. The system architecture
Figure 1 illustrates an overview of the system architecture design. The processes involved are reading the
XML document, XML mapping (tree annotation and data transformation) and query retrieval. Firstly, the XML
document is read by an event-driven parser, the StAX parser [24]. Being an event-based parser,
StAX does not have to wait for the document to be fully loaded in order to parse it. This is also
due to the fact that it is a pull API, which pulls only the data that the client requests. None of the data is stored
in memory; thus, for data to be re-read, one must parse the document all over again. StAX reads each
tag and its data, then creates an event that the calling program can use.
Figure 1. Overview of system architecture design
According to Khanjari and Gaeini [25], a labelling scheme is efficient if it uses small labels,
keeps the exploited algorithm simple in order to avoid complex computations, and retains
the readability of the structural relationships between nodes. We designed the labelling scheme based on this
definition. Figure 2 shows the XML data model annotated with XML-REG. Each node is traversed in
depth-first order to annotate the label based on its position. Each node is represented by a (l, s, e) label,
whereby l is the level of the node, s is the startid, and e is the endid. Two nodes are said to be in a parent-child
(P-C) relationship if the difference between the levels of the parent and the child node is one. Conversely, two nodes
are said to be in an ancestor-descendant (A-D) relationship under two conditions: (i) the level difference is more than one, and
(ii) the nodes must be in a range (region). For example, to identify whether node 5 has an A-D relationship with node
1, the id of node 5 must be within the range of the startid and endid of node 1.
Figure 2. XML document labelled with XML-REG labeling scheme
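The two structural checks above follow directly from the (l, s, e) label. The sketch below is a minimal illustration; the class and method names are our own and not from the paper, and it assumes a descendant's region lies inside its ancestor's [startid, endid] range, as in the node 5 / node 1 example.

```java
// Sketch of the XML-REG (level, startid, endid) label and its
// structural-relationship tests. Names are illustrative.
public class RegLabel {
    final int level;
    final int startId;
    final int endId;

    RegLabel(int level, int startId, int endId) {
        this.level = level;
        this.startId = startId;
        this.endId = endId;
    }

    // Parent-Child: level difference is exactly one and the child
    // falls inside this node's [startId, endId] region.
    boolean isParentOf(RegLabel child) {
        return child.level - this.level == 1 && contains(child);
    }

    // Ancestor-Descendant: level difference is more than one and the
    // descendant's ids lie within this node's region.
    boolean isAncestorOf(RegLabel desc) {
        return desc.level - this.level > 1 && contains(desc);
    }

    private boolean contains(RegLabel other) {
        return other.startId > this.startId && other.endId <= this.endId;
    }
}
```

The containment test is what makes the range labelling work: no label comparison ever has to walk the tree, so both checks translate into simple range predicates in SQL.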
4. Bulletin of Electr Eng & Inf ISSN: 2302-9285
Transforming data-centric eXtensible markup language into relational … (Su-Cheng Haw)
3259
Dynamic updates can be classified into a few groups, such as insertion of a new node, node deletion
and editing of node information. Nevertheless, the edit and delete operations are pretty clear-cut as there will
not be any changes to the node positioning. Insertion, on the other hand, is the most crucial operation as it
may cause the whole XML to be regenerated due to changes in the node positioning. Henceforth, the focus
of this paper is on the insertion operation for dynamic updates. Figure 3 exemplifies the three possible
insertion situations: (i) leftmost, (ii) rightmost and (iii) in-between. For any leftmost insertion, ‘.0’ is
appended to the end of the initial leftmost label. In this example, since the leftmost node has
label 2, the node to be inserted will be labelled 2.0 (see node A). For a subsequent leftmost insertion, for
instance node B, a further ‘0’ is appended; thus, the label for node B is 2.00. In the rightmost
insertion case, the next id is assigned and added into the value table as it is. For instance, node D will be
labelled 6. On the other hand, for an in-between insertion, in this case node F, ‘.0’ is appended
to the end of the preceding node's label.
Figure 3. Dynamic updates based on XML-REG labeling scheme
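A minimal sketch of the insertion rules described above, under the assumption that labels are kept as strings so existing nodes never need relabelling; the helper names are illustrative, not from the paper.

```java
// Sketch of the XML-REG insertion labelling rules from Figure 3.
public class RegInsert {
    // Leftmost and in-between insertions extend an existing label:
    // the first extension appends ".0" (2 -> 2.0); any further
    // extension appends another "0" (2.0 -> 2.00), matching the
    // node A / node B example.
    static String extend(String label) {
        return label.contains(".") ? label + "0" : label + ".0";
    }

    // Rightmost insertion simply takes the next integer id after the
    // current rightmost label (5 -> 6, as for node D).
    static String rightmost(String currentRightmost) {
        return Integer.toString(Integer.parseInt(currentRightmost) + 1);
    }
}
```

Because every new label either extends or follows an existing one, none of the labels already stored in the Value table change, which is the point of the scheme.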
In XML-REG, two tables, namely, (i) the Element_path table and (ii) the Value table, are created. By
having a minimal number of tables that is still sufficient to facilitate the join operation, storage
space is saved while complex queries remain supported. The Element_path table stores all distinct path
information of nodes in the XML document. Figure 4 (a) depicts a snippet of the Element_path table, while
Figure 4 (b) depicts a snippet of the Value table. The ancestor and parent nodes are identified based on the
RPathId and PathId attributes in the Value and Element_path tables respectively.
Figure 4. Snippet of: (a) Element_path table, (b) Value table
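As an in-memory mock of how the two tables cooperate, the sketch below joins Value rows back to Element_path via the path id to answer a simple path query; the column names and sample rows are assumptions for illustration, not the paper's exact schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Mock of the two XML-REG tables: Element_path keeps each distinct
// path once, and Value rows point back to it via a path id.
public class TwoTableJoin {
    // A Value-table row: (RPathId, value). Illustrative columns only.
    record ValueRow(int rPathId, String value) {}

    // Element_path table: PathId -> distinct path.
    static final Map<Integer, String> ELEMENT_PATH = Map.of(
            1, "/dblp/article/title",
            2, "/dblp/article/year");

    // Value table rows, in document order.
    static final List<ValueRow> VALUE = List.of(
            new ValueRow(1, "XML-REG"),
            new ValueRow(2, "2021"),
            new ValueRow(1, "Mini-XML"));

    // Answer a simple path query by joining Value.RPathId with
    // Element_path.PathId, as a SELECT ... JOIN would in the RDB.
    static List<String> query(String path) {
        List<String> out = new ArrayList<>();
        for (ValueRow v : VALUE)
            if (path.equals(ELEMENT_PATH.get(v.rPathId())))
                out.add(v.value());
        return out;
    }
}
```

Keeping only two tables means a path query needs a single join, which is the design rationale stated earlier for minimising retrieval time.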
2.2. XML-REG implementation
Figure 5 shows the algorithm of the XML-REG annotation. The algorithm takes in an XML file and
outputs the annotated XML file based on the XML-REG labelling scheme (lines 2 to 3). Firstly, the connection to
the database needs to be established as in line 4. Next, the chosen XML dataset is loaded. In
the algorithm, a stack named stackPath is constructed to keep path information in a hierarchical manner
commencing from the root to the current node (see line 5). Subsequently, the StAX parser is activated through
the function getEventType to return the type of the node (see line 9, line 30 and line 37).
The startElement retrieves the elements and attributes that exist within the angle bracket tag (< >). If
an element is encountered, its name and id are retrieved and stored in the variable qName.
The element name will then be concatenated to form the path. Yet, if the path is already found in the path table, it will
not be stored in stackPath. On the other hand, if an attribute is encountered, its information is stored
in the Value table with the RPathid formed from stackPath. The other EventType is character; when this tag is
encountered, the text node information is saved into the Value table. As for the endElement, which is
encountered when the </ > is reached, the last qName in the string path is removed and the level is
decremented by 1.
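To make the event loop concrete, here is a small, self-contained StAX sketch of the traversal in Figure 5; it maintains the current path while pulling events and collects each distinct element path in a list that stands in for the Element_path table. The database inserts and attribute handling are omitted, and the class name is our own.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PathCollector {
    // Pull events from the document, keeping the current root-to-node
    // path, and record each distinct element path once (the role the
    // Element_path table plays in XML-REG).
    static List<String> distinctPaths(String xml) {
        List<String> paths = new ArrayList<>();
        StringBuilder path = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        path.append('/').append(r.getLocalName());
                        if (!paths.contains(path.toString()))
                            paths.add(path.toString());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        // Drop the last "/name" segment on </name>.
                        path.setLength(path.lastIndexOf("/"));
                        break;
                    default:
                        break; // text and other events are ignored here
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return paths;
    }
}
```

Note that, as the text explains, the pull parser never holds the whole document in memory: each event is consumed and discarded, so re-reading requires re-parsing.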
2.3. Experimental evaluation
Four existing approaches were implemented, namely Mini-XML [20], XAncestor [8], XMap [10]
and XRecursive [15]. Unlike XML-REG, XRecursive uses a node-based mapping technique, while the XAncestor
and Mini-XML approaches use the path-based mapping technique to store XML data into RDB.
All experiments were evaluated on an AMD Ryzen 7 processor with 32 GB of RAM and 237 GB
of storage. The system is implemented in the Java SE Development Kit, while the RDBMS is Microsoft SQL
Server. The experiments were carried out on the original DBLP dataset (130.73 MB) [26] for the query evaluation,
while to prove the scalability of our proposed approach, we demonstrate the behavior of XML-REG across the
original DBLP dataset while increasing the scale up to 15 times, DBLP15 (1.99 GB), as depicted in Table 2.
The DBLP dataset was selected for two main reasons: (1) the depth of the dataset should be at least three
levels in order to test for the A-D relationship, and (2) the dataset should contain attributes in order to
demonstrate query retrieval with an attribute as the constraint. In terms of queries, there are two classes,
namely path queries (simple queries) and twig queries (complex queries with branching nodes) [27].
Figure 5. XML-REG pseudocode
Algorithm 1: XML-REG annotation
1. Function createXML-REG {
2. Input: An XML document
3. Output: Annotated XML document
4. Establish database connection
5. Stack stackPath = new Stack()
6. Create table
7. Get event type
8.
9. If is start element
10. Id++
11. Level++
12. qName = startElement.getName()
13. path = currentpath + “/” + qName
14. If !stackPath.contains(path)
15. Pathid++
16. stackPath.add(path)
17. While (attributes.hasNext())
18. Id++
19. Level++
20. attName = “@” + attr.getName()
21. attrPath = path + “/” + attName
22. If !stackPath.contains(attrPath)
23. Pathid++
24. stackPath.add(attrPath)
25. Pathvalue = stackPath.indexOf(attrPath) + 1
26. Insert into Value table
27. Level = level – 1
28. End If
29.
30. If character
31. elementValue = characters.getData()
32. If !elementValue.isEmpty()
33. Pathvalue = stackPath.indexOf(path) + 1
34. Insert into Value table
35. End If
36.
37. If end element
38. pathLast = path.substring(path.lastIndexOf(“/”) + 1)
39. If pathLast.equals(qName)
40. path = path.substring(0, path.lastIndexOf(“/”))
41. Level = level – 1
42. End If
43. InsertPath()
44. } //End function
Table 2. DBLP dataset on various sizes

Dataset   | DBLP       | DBLP5      | DBLP10     | DBLP15
Data size | 130.726 MB | 653.625 MB | 1307.25 MB | 1960.874 MB
In the first section of the evaluation, data storing, the XML document is mapped and transformed into
RDB storage. The structure and number of tables are designed uniquely in each approach with the aim of an
efficient and yet lossless transformation. Thus, the time taken for each approach to complete the mapping is
recorded, in addition to the storage size after each mapping approach. To achieve
higher accuracy of the storing time, each mapping approach is executed six times, and the average of the
five consecutive runs is taken as the result, excluding the first run as the first run usually involves
some buffer time.
For the second part of the evaluation, each approach was tested on the duration it takes to
retrieve various queries from the RDB after each mapping approach. The results vary due to
the number of join operations used by each approach, considering the design of its tables and how data are
stored in them. Two types of queries were prepared to assess these approaches, covering P-C,
A-D and mixed relationships: three path queries and three twig queries. Last but not least,
our proposed approach, XML-REG, was put through some tests in order to demonstrate that it is able to
support dynamic updates. These tests include rightmost, leftmost, and in-between insertions.
Figure 6 (a) and Figure 6 (b) exhibit the simulation engine for the performance evaluation on the data
storing and query retrieval processes. Figure 6 (a) shows the XML-RDB mapping tab, which is used for the data
storing evaluation. The user clicks on the browse button to choose a dataset to be transformed into RDB
storage. In the left text area, the content of the selected XML document is displayed. The result of the
evaluation is presented in the result text area (the right text area). Figure 6 (b) depicts the query
retrieval tab in the simulation engine. Firstly, the user selects the mapping approach, followed by the dataset
and query selection respectively. Once the query is selected, the corresponding SQL statement is
displayed in the SQL window. The user can then click the “Load Query” button to confirm the selection. The
time taken for the query execution and the number of returned results are then presented on the
interface. In addition, the bottom part of the interface also outlines the time taken by the other approaches to
perform the same evaluation.
Figure 6. Simulation engine: (a) data storing process, (b) query retrieval process
3. RESULTS AND DISCUSSIONS
Three evaluations involving data storing time, query retrieval and scalability were recorded. The
results are discussed in detail in the sub-sections according to the evaluation type.
3.1. Results on data storing evaluation
The experiment is repeated seven times, with the first reading being eliminated to avoid inaccuracy
due to buffering effects. Subsequently, the final result is generated as the average of the six consecutive
runs. Table 3 depicts the data storing results. We observed that XML-REG has the best storing time, followed
by XMap [10], XAncestor [8], Mini-XML [20] and XRecursive [15]. As the dataset increases, the
competence of each approach can be observed clearly. XRecursive suffers greatly as it recursively calls the
child nodes and stores irrelevant data into the database. In contrast, XML-REG stores minimal information,
that is, the unique path of the respective node and the values based on the proposed unique label id.
Table 3. Data storing evaluation on various approaches

Insertion time (minute)
XML-REG | Mini-XML | XAncestor | XMap  | XRecursive
11.23   | 18.58    | 15.72     | 15.01 | 26.16
3.2. Results on query retrieval evaluation
Table 4 shows the query response times for the DBLP dataset. The evaluation outcomes indicated
that XML-REG is the best, followed by XMap, XAncestor, Mini-XML and XRecursive. It is noteworthy
that XRecursive is unable to support any query that involves the A-D relationship. This is because, in the
XRecursive approach, one needs to recursively find the ancestor and parent nodes, which is impossible as
one does not know the nodes that exist between the ancestor and the particular node. As such, for queries
with the A-D relationship (PQ2, PQ3, TQ2 and TQ3), the remark ‘not supported’ is placed in the respective
column. It can be observed that XMap is faster than XAncestor because it requires fewer joins. Nevertheless,
for a query with high complexity such as TQ3, XML-REG performs the best, followed by XAncestor, XMap
and finally Mini-XML. XMap employs three tables in the storage; as such, the number of join operations is
far greater, as it first needs to find the join path among the path and vertex tables, and consequently find the
intersection join to retrieve the query matches.
Table 4. Query retrieval evaluation on various approaches (in ms)

Query | XML-REG | Mini-XML | XAncestor | XMap   | XRecursive
PQ1   | 433     | 1047.8   | 882.6     | 1237.4 | 1610.2
PQ2   | 402.4   | 5530     | 768.4     | 1294   | not supported
PQ3   | 400     | 1049.8   | 884.2     | 458.4  | not supported
TQ1   | 1246.4  | 1826     | 1661.6    | 3979.8 | 8715
TQ2   | 1791.6  | 5699.4   | 4350.8    | 4168.8 | not supported
TQ3   | 1229.6  | 10764.2  | 1899.6    | 4198.2 | not supported
3.3. Results on scaling evaluation
The scalability test indicates how efficiently each approach handles large-scale datasets.
Concerning this, the DBLP dataset is scaled up in steps of five (DBLP, DBLP5, DBLP10 and
DBLP15) to show the scalability of each approach. Figure 7 expresses the time taken for the storage
evaluation on various sizes of DBLP. It is observed that XML-REG shows the best performance, as its line
graph is almost flat with the increment of the DBLP sizes. The XRecursive approach, however, has the sharpest
increase with the growth of the DBLP sizes; as such, it is the least scalable.
Figure 7. Result of path query on various DBLP datasets
4. CONCLUSION
In this paper, we proposed XML-REG, which hybridises the best features of the path-based and
node-based approaches to map XML to RDB storage. Experimental evaluations demonstrated that
XML-REG exhibits the best outcomes for all the evaluations, (i) data storing time and (ii) query
retrieval time, as compared to XMap, XAncestor, Mini-XML, and XRecursive. In addition, in the scalability
test, the complexity of XML-REG is O(n), indicating that XML-REG is scalable enough to support huge datasets.
The results also indicated that the hybrid approach is workable for effective mapping. In our future work, we
propose to look into XML compression technology to reduce the label size and
accommodate huge numbers of insertions. To improve performance further, we will implement XML-REG in a
distributed environment.
REFERENCES
[1] P. Singh and S. Sachdevaa, "A Landscape of XML Data from Analytics Perspective," Procedia Computer Science,
vol. 173, pp. 392-402, 2020, doi: 10.1016/j.procs.2020.06.046.
[2] X. Lin et al., "A Fast Filtering Method of Invalid Information in XML File," Big Data Analytics for Cyber-Physical
System in Smart City, 2020, pp 259-264, doi: 10.1007/978-981-33-4572-0_38.
[3] F. Azzedin, S. Mohammed, M. Ghaleb, J. Yazdani and A. Ahmed, "Systematic Partitioning and Labeling XML
Subtrees for Efficient Processing of XML Queries in IoT Environments," in IEEE Access, vol. 8, pp. 61817-61833,
2020, doi: 10.1109/ACCESS.2020.2984600.
[4] K. T. Chau, Q. He, X. Hu and R. Wu, "Comparison on Performance of Text-Based and Model-Based Architecture
in Open Source Native XML Database," 2019 IEEE 4th International Conference on Signal and Image Processing
(ICSIP), 2019, pp. 340-344, doi: 10.1109/SIPROCESS.2019.8868709.
[5] A. H. Al-Hamami and A. A. Flayyih, "Enhancing Big Data Analysis by using Map-reduce Technique," Bulletin of
Electrical Engineering and Informatics, vol. 7, no. 1, pp. 113-116, 2018, doi: 10.11591/eei.v7i1.895.
[6] F. Arena and G. Pau, "An overview of big data analysis," Bulletin of Electrical Engineering and Informatics, vol. 9,
no. 4, pp. 1646-1652, 2020, doi: 10.11591/eei.v9i4.2359.
[7] H. Nassiri et al., "Integrating XML to Relational Data," Procedia Computer Science, vol. 110, pp. 422-427, 2017,
doi: 10.1016/j.procs.2017.06.107
[8] A. Qtaish and K. Ahmad, "XAncestor: An efficient mapping approach for storing and querying XML documents in
relational database using path-based technique," Knowledge-Based Systems, vol. 114, pp. 167-192, 2016, doi:
10.1016/j.knosys.2016.10.009.
[9] S. Jawari Kapisha and G. Vijaya Lakshmi, "Exploring XML Index Structures and Evaluating C-Tree Index-based
Algorithm," 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), 2020, pp. 212-218, doi:
10.1109/ICISS49785.2020.9316052.
[10] Z. Bousalem and I. Cherti, "XMap: A Novel Approach to Store and Retrieve XML Document in Relational
Databases," Journal Of Software, vol. 10, no. 12, pp. 1389-1401, 2015, doi: 10.17706/jsw.10.12.1389-1401.
[11] P. O'Neil et al., "ORDPATHs: insert-friendly XML node labels," ACM SIGMOD International Conference on
Management of Data, 2004, pp. 903-908, doi: 10.1145/1007568.1007686.
[12] D. Florescu and D. Kossmann, "Storing and Querying XML Data using an RDBMS," IEEE Data Engineering
Bulletin, vol. 1060, no. 22, pp. 27-34, 1999.
[13] M. Yoshikawa et al., "XREL: A Path-Based Approach to Storage and Retrieval of XML Documents Using
Relational Databases," ACM Transactions on Internet Technology, vol. 1, no. 1, pp. 110-141, 2001, doi:
10.1145/383034.383038.
[14] H. Jiang, H. Lu, W. Wang and J. X. Yu, "XParent: an efficient RDBMS-Based XML database system,"
Proceedings 18th International Conference on Data Engineering, 2002, pp. 335-336, doi:
10.1109/ICDE.2002.994745.
[15] M. A. I. Fakharaldien, J. M. Zain and N. Sulaiman, "XRecursive: A Storage Method for XML Document Based on
Relational Database," in International Conference on Software Engineering and Computer Systems. Springer,
Berlin, Heidelberg, 2011, doi: 10.1007/978-3-642-22191-0_40.
[16] P. Sandeep, "Efficient storage and query processing of XML data in relational database systems," Master’s thesis,
Nanyang Technological University, Singapore, 2005.
[17] S. Subramaniam et al., "s-XML: An efficient mapping scheme to bridge XML and relational database,"
Knowledge-Based Systems, vol. 27, pp. 369-380, 2012, doi: 10.1016/j.knosys.2011.11.007.
[18] F. Abduljwad, W. Ning and X. De, "SMX/R: Efficient way of storing and managing XML documents using
RDBMSs based on paths," 2010 2nd International Conference on Computer Engineering and Technology, 2010,
pp. V1-143-V1-147, doi: 10.1109/ICCET.2010.5486247.
[19] J. Ying, S. Cao and Y. Long, "An efficient mapping approach to store and query XML documents in relational
database," Proceedings of 2012 2nd International Conference on Computer Science and Network Technology,
2012, pp. 2140-2144, doi: 10.1109/ICCSNT.2012.6526341.
[20] H. Zhu, H. Yu, G. Fan and H. Sun, "Mini-XML: An efficient mapping approach between XML and relational
database," 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), 2017, pp.
839-843, doi: 10.1109/ICIS.2017.7960109.
[21] A. Gabillon and M. Fansi, "A new Persistent Labelling Scheme for XML," Journal of Digital Information
Management, vol. 4, no. 2, pp. 112-116, 2006.
[22] W. -C. Hsu and I. -E. Liao, "UCIS-X: An Updatable Compact Indexing Scheme for Efficient Extensible Markup
Language Document Updating and Query Evaluation," in IEEE Access, vol. 8, pp. 176375-176392, 2020, doi:
10.1109/ACCESS.2020.3025566.
[23] E. Taktek and D. Thakker, "Pentagonal scheme for dynamic XML prefix labelling," Knowledge-Based Systems,
vol. 209, p. 106446, 2020, doi: 10.1016/j.knosys.2020.106446.
[24] Sun Microsystems, "Streaming APIs for XML Parsers, Java Web Services Performance," Team White Paper,
retrieved Jan 2021.
[25] E. Khanjari and L. Gaeini, "A new effective method for labeling dynamic XML data," Journal of Big Data, vol. 5,
no. 50, pp. 1-17, 2018, doi: 10.1186/s40537-018-0161-4.
[26] University of Washington XML repository, retrieved Jan 2021,
http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html.
[27] S. Subramaniam, S. -C. Haw and L. -K. Soon, "Improved Centralized XML Query Processing Using Distributed
Query Workload," in IEEE Access, vol. 9, pp. 29127-29142, 2021, doi: 10.1109/ACCESS.2021.3058383.
BIOGRAPHIES OF AUTHORS
Su-Cheng Haw is an Associate Professor at the Faculty of Computing and Informatics, Multimedia
University, where she leads several funded research projects on XML databases. Her research
interests include XML databases, query optimization, data modeling, the semantic web, and
recommender systems.
Emyliana Song is a postgraduate student at the Faculty of Computing and Informatics, Multimedia
University. She is currently researching XML data mapping schemes for transforming XML
documents into relational databases.