This document provides a literature survey on latent semantic indexing (LSI). It begins with an abstract that describes LSI as an indexing technique used by search engines to store and retrieve web pages. The introduction provides an overview of LSI and discusses how it identifies patterns in unstructured text to find relationships. Several information retrieval techniques are then compared, including vector space models and LSI. The document focuses on describing the LSI technique and its algorithm, which uses singular value decomposition to project text into a concept space and reduce noise from synonyms and multiple meanings. Limitations of LSI are also mentioned.
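The core of LSI described above is a truncated singular value decomposition of the term-document matrix. A minimal sketch, using a hypothetical toy term-document matrix (the terms and counts below are made up for illustration):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([
    [1, 1, 0, 0],   # "car"
    [1, 0, 0, 0],   # "auto" (synonym of "car")
    [0, 1, 1, 0],   # "engine"
    [0, 0, 1, 1],   # "wheat"
    [0, 0, 0, 1],   # "farm"
], dtype=float)

# Full SVD, then keep only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k "concept space" approximation

# Documents are compared in the reduced space; synonyms such as
# "car"/"auto" acquire similar coordinates even when they never co-occur.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T     # one k-dim vector per document
```

Dropping the smaller singular values is what suppresses the noise from synonymy and polysemy that the abstract mentions.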
The Search of New Issues in the Detection of Near-duplicated Documents (ijceronline)
International Journal of Computational Engineering Research (IJCER) is an international, English-language monthly online journal. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Centralized Class Specific Dictionary Learning for wearable sensors based phy... (Sherin Mathews)
With recent progress in pervasive healthcare, physical activity recognition with wearable body sensors has become an important and challenging area in both the research and industrial communities. Here, we address a novel technique for a sensor platform that performs physical activity recognition by leveraging a class-specific regularizer term in the dictionary pair learning objective function. The proposed algorithm jointly learns a synthesis dictionary and an analysis dictionary in order to simultaneously perform signal representation and classification once the time-domain features have been extracted. Specifically, the class-specific regularizer term ensures that sparse codes belonging to the same class are concentrated, which proves beneficial for the classification stage. In order to develop a more practical approach, we employ a combination of the alternating direction method of multipliers and an l1-ls minimization method to approximately minimize the objective function. We validate the effectiveness of our proposed model by employing it on two activity recognition problems and an intensity estimation problem, all of which include a large number of physical activities. Experimental results demonstrate that classifiers built in this dictionary-learning-based framework outperform state-of-the-art algorithms using simple features, thereby achieving competitive results when compared with classical systems built upon features with prior knowledge.
International Journal of Engineering Research and Development (IJERD Editor)
This paper proposes a framework to enhance the performance of digital notes organization based on an auto arranger approach. The framework uses artificial intelligence techniques like fuzzy logic and data mining to separate a single document containing notes on multiple subjects into individual subject-specific folders. It analyzes the document and identifies "cue words" that are frequently associated with each subject based on a pre-existing word frequency database. These cue words are then used to automatically distribute portions of the original text to the relevant subject folders based on cue word frequency within each portion. The framework has the potential to save user time by automatically organizing notes instead of requiring manual sorting.
This document proposes a method for harvesting training examples of bi-concepts (images containing two visual concepts) from social media images to build bi-concept detectors. It presents a multi-modal approach that uses both visual features and semantic text to gather positive and negative bi-concept examples at a large scale from sources like Flickr. Experiments show this approach can accurately learn bi-concept detectors for complex queries, outperforming combinations of single concept detectors. The method introduces a framework for collecting examples, detecting bi-concepts in unlabeled images, and iteratively improving bi-concept retrieval.
This document discusses scoring and ranking documents in information retrieval systems. It introduces the vector space model and term weighting schemes like TF-IDF that are used to assign relevance scores to documents for a given query. TF-IDF weighting increases scores for terms that appear frequently in a document but rarely in the whole collection. This allows more relevant documents containing rare, informative query terms to be ranked higher. IDF on its own does not affect ranking for single-term queries but boosts rarer terms' influence for multi-term queries.
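The TF-IDF weighting just described can be sketched in a few lines; the toy documents below are made up for illustration, and the raw (unnormalized) variant of the formula is used:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

N = len(docs)
# Document frequency: in how many documents does each term appear?
df = Counter(term for doc in docs for term in set(doc))

def tfidf(term, doc):
    tf = doc.count(term)                 # term frequency in this document
    if tf == 0 or df[term] == 0:
        return 0.0
    idf = math.log(N / df[term])         # rarer terms get a larger idf
    return tf * idf
```

As the abstract notes, "cat" (rare in the collection) outscores "the" (common) even though "the" occurs more often within the document.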
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM... (dannyijwest)
Due to the explosion of information and knowledge on the web and the wide use of search engines for finding desired information, the role of knowledge management (KM) is becoming more significant in organizations. Knowledge management in an organization is used to create, capture, store, share, retrieve and manage information efficiently. The semantic web, an intelligent and meaningful web, tends to provide a promising platform for knowledge management systems and vice versa, since the two have the potential to give each other the real substance for machine-understandable web resources, which in turn will lead to intelligent, meaningful and efficient information retrieval on the web. Today, the challenge for the web community is to integrate the distributed heterogeneous resources on the web with the objective of an intelligent web environment focused on data semantics and user requirements. Semantic Annotation (SA), which assigns to the entities in a text links to their semantic descriptions, is widely used. Various tools such as KIM and Amaya may be used for semantic annotation.
This document provides an overview of different techniques for clustering categorical data. It discusses various clustering algorithms that have been used for categorical data, including K-modes, ROCK, COBWEB, and EM algorithms. It also reviews more recently developed algorithms for categorical data clustering, such as algorithms based on particle swarm optimization, rough set theory, and feature weighting schemes. The document concludes that clustering categorical data remains an important area of research, with opportunities to develop techniques that initialize cluster centers better.
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING (ijnlc)
In this paper, we propose a novel algorithm that rearranges the topic assignments obtained from topic modeling algorithms, including NMF and LDA. The effectiveness of the algorithm is measured by how closely the results conform to expert opinion, captured in a data structure we define, called TDAG, that represents the probability that a pair of highly correlated words appear together. To ensure that the internal structure does not change too much under the rearrangement, coherence, a well-known metric for measuring the effectiveness of topic modeling, is used to control the balance of the internal structure. We develop two ways to systematically obtain the expert opinion from data, depending on whether the data has relevant expert writing or not. The final algorithm, which takes into account both coherence and expert opinion, is presented. Finally, we compare the amount of adjustment needed for each topic modeling method, NMF and LDA.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (dannyijwest)
Social networks have become one of the most popular platforms that allow users to communicate and share their interests without being at the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn and Twitter produces huge amounts of user-generated content. Thus, improving information quality and integrity becomes a great challenge for all social media sites, so that users can get the desired content or be linked to the best relation through improved search and linking techniques. Introducing semantics to social networks will therefore widen the representation of the social network. In this paper, a new model of social networks based on semantic tag ranking is introduced. The model is based on the concept of multi-agent systems. In the proposed model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with TagRank as the social network ranking algorithm. The E-LDA phase improves on LDA by tuning its parameters to optimal values, and a filter is introduced to enhance the final indexing output. In the ranking phase, basing TagRank on the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in both indexing and ranking output.
With the ever increasing number of documents on the web and in other repositories, organizing and categorizing these documents to meet the diverse needs of users by manual means is a complicated job; hence a machine learning technique named clustering is very useful. Text documents are clustered by pairwise similarity of documents using similarity measures like Cosine, Jaccard or Pearson. The best clustering results are seen when overlapping of terms between documents is low, that is, when clusters are distinguishable. Hence, to find document similarity for this problem we apply the link and neighbor concepts introduced in ROCK. A link specifies the number of shared neighbors of a pair of documents; significantly similar documents are called neighbors. This work applies links and neighbors to Bisecting K-means clustering for identifying seed documents in the dataset, as a heuristic measure in choosing a cluster to be partitioned, and as a means to find the number of partitions possible in the dataset. Our experiments on real-time datasets showed a significant improvement in accuracy with minimal time.
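The link and neighbor concepts borrowed from ROCK can be sketched as follows; the threshold, documents and term sets are hypothetical, and Jaccard similarity stands in for whichever measure defines neighborhood:

```python
def jaccard(a, b):
    """Set-based similarity between two documents' term sets."""
    return len(a & b) / len(a | b)

# Hypothetical toy documents represented as term sets.
docs = {
    "d1": {"apple", "banana", "cherry"},
    "d2": {"apple", "banana", "date"},
    "d3": {"apple", "banana", "cherry", "fig"},
    "d4": {"x", "y", "z"},
}

theta = 0.4   # similarity threshold defining "significantly similar"
names = list(docs)
neighbors = {
    n: {m for m in names if jaccard(docs[n], docs[m]) >= theta}
    for n in names
}  # note: each document counts as its own neighbor, as in ROCK

def link(p, q):
    """link(p, q) = number of neighbors shared by documents p and q."""
    return len(neighbors[p] & neighbors[q])
```

A high link count indicates two documents that the collection "agrees" are related, which is more robust than raw pairwise similarity when term overlap is noisy.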
Document ranking using qprp with concept of multi dimensional subspace (Prakash Dubey)
This presentation discusses a project titled "Document Ranking Using QPRP with Concept of Multi-Dimensional Subspace". It was presented by Prakash Kumar Dubey and guided by Mr. Sourish Dhar and Mr. Bhagaban Swain of the Department of IT. The presentation provides an overview of the project, including an introduction to information retrieval, classical IR models such as Boolean, vector space, and probabilistic models. It then discusses quantum probability and how it can be applied to document ranking. The presentation outlines the proposed solution, data collection and implementation, and concludes with future work.
This document discusses using the bag-of-words model to classify text data from a sentiment analysis dataset. It preprocesses the text data by converting it to lowercase, tokenizing it, removing punctuation and stop words. This reduces the data by 16.26%. A bag-of-words model with 58714 words across 40000 documents is created. Results are visualized through word clouds and histograms. The word cloud for clean data shows the most frequent words. The histogram shows neutral sentiment has the highest frequency distribution.
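The preprocessing pipeline described above (lowercasing, tokenization, punctuation and stop-word removal, then a bag-of-words count) can be sketched as below; the stop list and example sentences are made up, not the dataset from the document:

```python
import re
from collections import Counter

STOP = {"the", "is", "a", "an", "and", "of", "to", "in", "it"}

def preprocess(text):
    """Lowercase, tokenize, and drop punctuation and stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP]

docs = ["The movie is great, great fun!", "It is an awful film."]
bow = [Counter(preprocess(d)) for d in docs]     # one bag of words per document
vocab = sorted(set().union(*bow))                # overall vocabulary
```

Word-cloud and histogram visualizations like those in the document are then just plots over these per-word counts.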
An Enhanced Suffix Tree Approach to Measure Semantic Similarity between Multi... (iosrjce)
1) The document discusses an approach to measure semantic similarity between multiple documents using an enhanced suffix tree. It involves preprocessing documents, constructing a suffix tree with documents' phrases as edges, calculating weights of shared nodes using TF-IDF, and applying cosine, dice, and hellinger similarity measures to determine pairwise document similarities.
2) The approach first preprocesses documents by removing stop words, special characters, and converting to lowercase. A suffix tree is constructed with documents' phrases as edges. Shared nodes in the tree represent common phrases between documents.
3) Node weights are calculated using TF-IDF, with higher weights given to rarer phrases. Several similarity measures (cosine, dice, hellinger) are then applied.
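The three similarity measures named in step 3 can be sketched over weight vectors; the vectors below are hypothetical node weights, and the Hellinger measure is implemented here as the Hellinger affinity between L1-normalized vectors (one common convention, which may differ from the paper's exact formula):

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def dice(a, b):
    num = 2 * sum(x * y for x, y in zip(a, b))
    den = sum(x * x for x in a) + sum(y * y for y in b)
    return num / den if den else 0.0

def hellinger(a, b):
    # Hellinger affinity between L1-normalized weight vectors.
    sa, sb = sum(a), sum(b)
    return sum(math.sqrt((x / sa) * (y / sb)) for x, y in zip(a, b))

# Hypothetical shared-node weight vectors for two documents.
d1 = [3.0, 1.0, 0.0]
d2 = [2.0, 1.0, 1.0]
```

Each measure returns 1.0 for identical vectors and smaller values as the shared-phrase weights diverge.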
A Novel Clustering Method for Similarity Measuring in Text Documents (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) covers all fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Activity Context Modeling in Context-Aware... (Editor IJCATR)
The explosion of mobile devices has fuelled the advancement of pervasive computing to provide personal assistance in this
information-driven world. Pervasive computing takes advantage of context-aware computing to track, use and adapt to contextual
information. The context that has attracted the attention of many researchers is the activity context. There are six major techniques that
are used to model activity context. These techniques are key-value, logic-based, ontology-based, object-oriented, mark-up schemes and
graphical. This paper analyses these techniques in detail, describing how each technique is implemented and reviewing their pros and cons. The paper ends with a hybrid modeling method that fits heterogeneous environments while considering the entire modeling process through the data acquisition and utilization stages. The modeling stages of activity context are data sensation, data abstraction, and reasoning and planning. The work revealed that mark-up schemes and object-oriented techniques are best applicable at the data sensation stage. Key-value and object-oriented techniques fairly support the data abstraction stage, whereas the logic-based and ontology-based techniques are the ideal techniques for the reasoning and planning stage. In a distributed system, mark-up schemes are very useful for data communication over a network, and the graphical technique should be used when saving context data into a database.
Reduct generation for the incremental data using rough set theory (csandit)
In today's changing world huge amounts of data are generated and transferred frequently. Although the data is sometimes static, most commonly it is dynamic and transactional, and newly generated data is constantly added to the old/existing data. To discover knowledge from this incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time consuming. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment for the generation of a reduced attribute set as a dynamic reduct. The method analyzes the new dataset when it becomes available and modifies the reduct accordingly to fit the entire dataset. The concepts of discernibility relation, attribute dependency and attribute significance from Rough Set Theory are integrated for the generation of the dynamic reduct set, which not only reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed method has been applied to a few benchmark datasets collected from the UCI repository and a dynamic reduct is computed. Experimental results show the efficiency of the proposed method.
An object is a software entity that encapsulates both data and code. It mirrors real-world entities and protects its internal data from external access. Object-oriented programming breaks programs into small objects that fuse code and data. Classes are templates that define objects, which are instances of classes. Objects receive and respond to messages by invoking methods that access only their own data and may send messages to other objects. Abstraction, classification, generalization, and aggregation structure information into object hierarchies. Inheritance allows defining new classes that extend existing ones, while polymorphism allows different classes to respond to the same message.
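The concepts above (encapsulation, classes as templates, inheritance, and polymorphism as different responses to the same message) can be illustrated with a minimal sketch; the shape classes are a made-up example, not from the original text:

```python
class Shape:
    """Base class: encapsulates data and defines the message interface."""
    def __init__(self, name):
        self._name = name          # internal state, conventionally protected

    def area(self):
        raise NotImplementedError  # each subclass responds in its own way

    def describe(self):
        return f"{self._name}: area {self.area():.2f}"

class Circle(Shape):               # inheritance: Circle extends Shape
    def __init__(self, r):
        super().__init__("circle")
        self._r = r

    def area(self):                # polymorphism: same message, own response
        return 3.14159 * self._r ** 2

class Square(Shape):
    def __init__(self, side):
        super().__init__("square")
        self._side = side

    def area(self):
        return self._side ** 2

# Different objects respond to the same "area" message differently.
shapes = [Circle(1.0), Square(2.0)]
areas = [s.area() for s in shapes]
```

Each object accesses only its own data (`_r`, `_side`), and the caller sends the same message (`area`) without knowing which class will handle it.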
This document describes a proposed concept-based mining model that aims to improve document clustering and information retrieval by extracting concepts and semantic relationships rather than just keywords. The model uses natural language processing techniques like part-of-speech tagging and parsing to extract concepts from text. It represents concepts and their relationships in a semantic network and clusters documents based on conceptual similarity rather than term frequency. The model is evaluated using singular value decomposition to increase the precision of key term and phrase extraction.
1) The document discusses a review of semantic approaches for nearest neighbor search. It describes using an ontology to add a semantic layer to an information retrieval system to relate concepts using query words.
2) A technique called spatial inverted index is proposed to locate multidimensional information and handle nearest neighbor queries by finding the hospitals closest to a given address.
3) Several semantic approaches are described including using clustering measures, specificity measures, link analysis, and relation-based page ranking to improve search and interpret hidden concepts behind keywords.
Web Service Discovery Mechanisms Based on IR Models (IRJET Journal)
This document discusses various approaches for web service discovery that employ information retrieval (IR) methods. It describes five main approaches:
1. Using singular value decomposition (SVD) to find similar services by representing them as vectors and calculating cosine similarity.
2. Applying the vector space model of IR to represent services and queries as vectors and calculate cosine similarity to discover analogous services.
3. Combining the vector space model with a structure matching algorithm to refine service discovery results.
4. Measuring semantic similarity of services instead of structural similarity by representing data types as trees and calculating edit distances.
5. Enhancing service requests and descriptions with ontologies, representing them as vectors using latent semantic
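Approaches 1 and 2 in the list both reduce to ranking service descriptions by cosine similarity of term vectors against a query. A minimal sketch, with a hypothetical two-service registry (the service names and descriptions are made up):

```python
import math
from collections import Counter

def vec(text):
    """Term-frequency vector of a service description or query."""
    return Counter(text.lower().split())

def cos_sim(a, b):
    shared = set(a) & set(b)
    num = sum(a[t] * b[t] for t in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Hypothetical service registry.
services = {
    "WeatherLookup": "get current weather forecast for a city",
    "StockQuote": "get latest stock price quote for a ticker",
}
query = vec("city weather forecast")
ranked = sorted(services, key=lambda s: cos_sim(query, vec(services[s])),
                reverse=True)
```

The SVD-based variant (approach 1) would first project these vectors into a lower-dimensional space before taking the same cosine.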
The document discusses various information retrieval models, including:
1) Classic models like Boolean and vector space models that use index terms to represent documents and queries.
2) Probabilistic models that view IR as estimating the probability of relevance between documents and queries.
3) Structured models that incorporate document structure, including models based on non-overlapping text regions and hierarchical document structure.
4) Browsing models like flat, structure-guided, and hypertext models for navigating document collections.
Information retrieval system and PageRank algorithm (Rupali Bhatnagar)
We discuss the various models for information retrieval systems present in the literature and treat them mathematically. We also study the PageRank algorithm, which is used for relevance-based search.
Different Similarity Measures for Text Classification Using KNN (IOSR Journals)
This document summarizes research on classifying textual data using the k-nearest neighbors (KNN) algorithm and different similarity measures. It explores generating 9 different vector representations of text documents and using KNN with similarity measures like Euclidean, Manhattan, squared Euclidean, etc. to classify documents. The researchers tested KNN on a Reuters news corpus with 5,485 training documents across 8 classes and found that normalization and k=4 produced the best accuracy of 94.47%. They conclude KNN with different similarity measures and vector representations is effective for multi-class text classification.
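The KNN procedure the researchers used (find the k nearest training documents under some distance, then take a majority vote) can be sketched as follows; the two-dimensional vectors and labels are made up for illustration, not the Reuters data:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, train, k=3, dist=euclidean):
    """train: list of (vector, label) pairs; majority vote among k nearest."""
    nearest = sorted(train, key=lambda vl: dist(query, vl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled document vectors.
train = [
    ([1.0, 0.0], "sports"), ([0.9, 0.1], "sports"), ([0.8, 0.0], "sports"),
    ([0.0, 1.0], "politics"), ([0.1, 0.9], "politics"),
]
label = knn_classify([0.95, 0.05], train, k=3)
```

Swapping `dist` for Manhattan or squared-Euclidean distance reproduces the paper's comparison of similarity measures within the same framework.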
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study a comprehensive evaluation of two supervised feature selection methods for dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). This is gauged against unsupervised techniques like fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than algorithms like LSI and PCA. Results show that the clustering of features improves the accuracy of document classification.
Clustering of Deep Web Pages: A Comparative Study (ijcsit)
The internet has a massive amount of information, stored in the form of zillions of webpages. The information that can be retrieved by search engines is huge, and this information constitutes the 'surface web'. But the remaining information, which is not indexed by search engines – the 'deep web' – is much bigger than the 'surface web' and remains largely unexploited. Several machine learning techniques have been commonly employed to access deep web content. Under machine learning, topic models provide a simple way to analyze large volumes of unlabeled text. A 'topic' is a cluster of words that frequently occur together, and topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. In this paper, we cluster deep web databases employing several methods and then perform a comparative study. In the first method, we apply Latent Semantic Analysis (LSA) over the dataset. In the second method, we use a generative probabilistic model called Latent Dirichlet Allocation (LDA) for modeling content representative of deep web databases. Both techniques are implemented after preprocessing the set of web pages to extract page contents and form contents. Further, we propose another version of Latent Dirichlet Allocation (LDA) for the dataset. Experimental results show that the proposed method outperforms the existing clustering methods.
Algorithm for calculating relevance of documents in information retrieval sys... (IRJET Journal)
The document proposes an algorithm to calculate the relevance of documents returned in response to user queries in information retrieval systems. It is based on classical similarity formulas like cosine, Jaccard, and dice that calculate similarity between document and query vectors. The algorithm aims to integrate user search preferences as a variable in determining document relevance, as classic models do not account for this. It uses text and web mining techniques to process user query and document metadata.
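The classical set-based similarity formulas named above, plus one way a user-preference variable could be folded into the score, can be sketched as follows. The preference weighting shown is an illustrative scheme of ours, not the paper's exact formula, and the query and document term sets are hypothetical:

```python
def jaccard(q, d):
    """Jaccard similarity between query and document term sets."""
    return len(q & d) / len(q | d) if q | d else 0.0

def dice(q, d):
    """Dice coefficient between query and document term sets."""
    return 2 * len(q & d) / (len(q) + len(d)) if (q or d) else 0.0

def preference_score(q, d, pref):
    """Classical score boosted by a per-document user-preference
    factor pref in [0, 1] (hypothetical integration scheme)."""
    return jaccard(q, d) * (1.0 + pref)

# Hypothetical query and documents.
q = {"semantic", "web", "search"}
d1 = {"semantic", "web", "services"}
d2 = {"database", "indexing"}
```

A document matching the user's search preferences thus outranks an equally similar document that does not.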
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, covering new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS (ijseajournal)
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining its context. Typically, clustering is performed on numerical data; however, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach using pre-trained word embeddings. We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points with common clustering algorithms like K-means.
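The embed-then-cluster idea can be sketched end to end; the two-dimensional "embeddings" below are a made-up stand-in for real pre-trained GloVe vectors (which have 50-300 dimensions), and the K-means here uses a simple deterministic initialization:

```python
# Toy stand-in for pre-trained GloVe vectors; words and coordinates
# are made up so that animals and cities form two obvious groups.
embeddings = {
    "cat": [0.9, 0.1], "dog": [0.85, 0.15], "horse": [0.8, 0.2],
    "paris": [0.1, 0.9], "rome": [0.15, 0.85], "tokyo": [0.2, 0.8],
}

def kmeans(points, k, iters=10):
    # Deterministic init: first and last points as starting centers.
    centers = [list(points[0]), list(points[-1])][:k]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

words = list(embeddings)
labels = dict(zip(words, kmeans([embeddings[w] for w in words], k=2)))
```

Because the embedding places contextually similar words near each other, the clusters recover semantic groups that raw categorical codes would miss.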
Context Driven Technique for Document Classification (IDES Editor)
In this paper we present an innovative hybrid Text Classification (TC) system that bridges the gap between statistical and context-based techniques. Our algorithm harnesses contextual information at two stages. First, it extracts a cohesive set of keywords for each category by using lexical references, implicit context as derived from LSA, and word-vicinity-driven semantics. Second, each document is represented by a set of context-rich features whose values are derived by considering both lexical cohesion and the extent of coverage of salient concepts via lexical chaining. After keywords are extracted, a subset of the input documents is apportioned as the training set, whose members are assigned categories based on their keyword representation. These labeled documents are used to train binary SVM classifiers, one per category. The remaining documents are supplied to the trained classifiers in the form of their context-enhanced feature vectors, and each document is finally ascribed its appropriate category by an SVM classifier.
SEARCH OF INFORMATION BASED CONTENT IN SEMI-STRUCTURED DOCUMENTS USING INTERF... (ijcsitcejournal)
This paper proposes a semi-structured information retrieval model based on a new method for calculating similarity. We have developed the CASISS (Calculation of Similarity of Semi-Structured documents) method to quantify how similar two given texts are. The method identifies elements of semi-structured documents using element descriptors: each semi-structured document is pre-processed before the extraction of a set of descriptors for each element, which characterize the contents of the elements. It can be used to increase the accuracy of the information retrieval process by taking into account not only the presence of query terms in the given document but also the topology (position continuity) of these terms.
This document provides lecture notes on information retrieval systems. It covers key concepts like precision and recall, different retrieval strategies including vector space model and probabilistic models, and retrieval utilities. The vector space model represents documents and queries as vectors in a shared space and calculates similarity using cosine similarity. Probabilistic models assign probabilities to terms and documents and estimate relevance probabilities. The notes discuss term weighting schemes, inverted indexes to improve efficiency, and integrating structured data with text retrieval. The overall objective is for students to learn fundamental models and techniques for information storage and retrieval.
Construction of Keyword Extraction using Statistical Approaches and Document ... (IJERA Editor)
Organizing the continuing growth of dynamic unstructured documents is a major challenge for field experts, and handling such unorganized documents is expensive. Clustering such dynamic documents helps to reduce the cost, and clustering documents by analysing their keywords is one of the best methods to organize unstructured dynamic documents. Statistical analysis is the best adaptive method to extract keywords from documents. In this paper an algorithm is proposed to cluster the documents. It has two parts: the first part extracts keywords using a statistical method, and the second part constructs the clusters by keyword using an agglomerative method. The proposed algorithm achieves more than 90% accuracy.
IRJET- Concept Extraction from Ambiguous Text Document using K-Means (IRJET Journal)
This document discusses using a K-means clustering algorithm to extract concepts from ambiguous text documents. It involves preprocessing the text by tokenizing, removing stop words, and stemming words. The words are then represented as vectors and dimensionality reduction using PCA is applied. Finally, K-means clustering is used to group similar words into clusters to identify the overall concepts in the document without reading the entire text. The aim is to help users understand the key topics in a document in a time-efficient manner without having to read the full text.
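The preprocessing stages named above (tokenizing, stop-word removal, stemming) can be sketched as a small pipeline; the stop list is made up, and the crude suffix-stripper below is only a stand-in for a real Porter-style stemmer:

```python
import re

STOP = {"the", "a", "an", "of", "and", "to", "is", "are"}

def simple_stem(word):
    """Crude suffix stripping; a stand-in for a Porter-style stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())     # tokenize
    tokens = [t for t in tokens if t not in STOP]    # remove stop words
    return [simple_stem(t) for t in tokens]          # stem
```

After this step the surviving stems are vectorized, reduced with PCA, and fed to K-means as the document describes.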
International Journal of Computational Engineering Research (IJCER) (ijceronline)
This document summarizes a research paper on reengineering relational databases to object-oriented databases. It discusses developing an integrated environment that maps a relational schema to an object-oriented schema without modifying the existing relational schema. The proposed system architecture has two major components - one for mapping the relational schema to an object-oriented schema, and another for mapping relational data to objects. The schema mapping process is two-phased - the first phase transforms the relational schema, and the second phase extracts object-oriented structures. The system aims to allow existing applications and data in a relational database to be accessible from object-oriented programs.
An effective pre-processing algorithm for information retrieval systems (ijdms)
The Internet is probably the most successful distributed computing system ever. However, our capabilities for data querying and manipulation on the internet are primordial at best. User expectations have grown over time, along with the increased amount of operational data of the past few decades: the data user expects deeper, more exact and more detailed results. Result retrieval for a user query is always relative to the pattern of data storage and indexing. In information retrieval systems, tokenization is an integral part whose prime objective is to identify the tokens and their counts. In this paper, we propose an effective tokenization approach based on a training vector, and the results show the efficiency and effectiveness of the proposed algorithm. Tokenization of documents, which helps to satisfy the user's information need more precisely and sharply reduces the search space, is believed to be a part of information retrieval. Pre-processing of the input document is an integral part of tokenization: it generates the tokens on which probabilistic IR bases its scoring, yielding a reduced search space. The comparative analysis is based on two parameters: the number of tokens generated and the pre-processing time.
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Similar to call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technolog (20)
This document discusses the impact of data mining on business intelligence. It begins by defining business intelligence as using new technologies to quickly respond to changes in the business environment. Data mining is an important part of the business intelligence lifecycle, which includes determining requirements, collecting and analyzing data, generating reports, and measuring performance. Data mining allows businesses to access real-time, accurate data from multiple sources to improve decision making. Using business intelligence and data mining techniques can help businesses become more efficient and make better decisions to increase profits and customer satisfaction. The expected results of applying business intelligence include improved decision making through accurate, timely information to support organizational goals and strategic plans.
This document presents a novel technique for solving the transcendental equations of selective harmonics elimination pulse width modulation (SHEPWM) inverters based on the secant method. The proposed algorithm uses the secant method to simplify the numerical solution of the nonlinear equations and solve them faster compared to other methods. Simulation results validate that the proposed method accurately estimates the switching angles to eliminate specific harmonics from the output voltage waveform and achieves near sinusoidal output current for various modulation indices and numbers of harmonics eliminated.
This document summarizes a research paper that designed and implemented a dual tone multi-frequency (DTMF) based GSM-controlled car security system. The system uses a DTMF decoder and GSM module to allow a car to be remotely controlled and secured from a mobile phone. It works by sending DTMF tones from the phone through calls to the GSM module in the car. The decoder interprets the tones and a microcontroller executes commands to disable the ignition or control other devices. The system was created to improve car security and accessibility through remote monitoring and control with DTMF and GSM technology.
This document presents an algorithm for imperceptibly embedding a DNA-encoded watermark into a color image for authentication purposes. It applies a multi-resolution discrete wavelet transform to decompose the image. The watermark, encoded into DNA nucleotides, is then embedded into the third-level wavelet coefficients through a quantization process. Specifically, the watermark nucleotides are complemented and used to quantize coefficients in the middle frequency band, modifying the coefficients. The watermarked image is reconstructed through inverse wavelet transform. Extraction reverses these steps to recover the watermark without the original image. The algorithm aims to balance imperceptibility and robustness through this wavelet-based, blind watermarking scheme.
1) The document analyzes the dynamic saturation point of a deep-water channel in Shanghai port based on actual traffic data and a ship domain model.
2) A dynamic channel transit capacity model is established that considers factors like channel width, ship density, speed, and reductions due to traffic conditions.
3) Based on AIS data from the channel, the average traffic flow is calculated to be 15.7 ships per hour, resulting in a dynamic saturation of 32.5%, or 43.3% accounting for uneven day/night traffic volumes.
The document summarizes research on the use of earth air tunnels and wind towers as passive solar techniques. Key findings include:
- Earth air tunnels circulate air through underground pipes to take advantage of the stable temperature 4 meters below ground for cooling in summer and heating in winter. Testing showed the technique can reduce ambient temperatures by up to 14 degrees Celsius.
- Wind towers circulate air through tall shafts to cool air entering buildings at night and provide downward airflow of cooled air during the day.
- Experimental testing of an earth air tunnel system over multiple months found maximum temperature reductions of 33% in spring and minimum reductions of 15% in summer.
The document compares the mechanical and physical properties of low density polyethylene (LDPE) thin films and sheets reinforced with graphene nanoparticles. LDPE/graphene thin films were produced via solution casting, while sheets were made by compression molding. Testing showed that the thin films had enhanced tensile strength, lower melt flow index, and higher thermal stability compared to sheets. The tensile strength of thin films increased by up to 160% with 1% graphene, while sheets increased by 70%. Melt flow index decreased more for thin films, indicating higher viscosity. Thin films also showed greater improvement in glass transition temperature. These results demonstrate that processing technique affects the properties of LDPE/graphene nanocomposites.
The document describes improvements made to a friction testing machine. A stepper motor and PLC control system were added to automatically vary the load on friction pairs, replacing the manual method. Tests using the improved machine found that the friction coefficient decreases as the load increases, and that abrasive and adhesive wear increased with higher loads. The improved machine allows more accurate and convenient testing of friction pairs under varying load conditions.
This document summarizes a research article that investigates the steady, two-dimensional Falkner-Skan boundary layer flow over a stationary wedge with momentum and thermal slip boundary conditions. The flow considers a temperature-dependent thermal conductivity in the presence of a porous medium and viscous dissipation. Governing partial differential equations are non-dimensionalized and transformed into ordinary differential equations using similarity transformations. The equations are highly nonlinear and cannot be solved analytically, so a numerical solver is used. Numerical results are presented for the skin friction coefficient, local Nusselt number, velocity and temperature profiles for varying parameters like the Falkner-Skan parameter and Eckert number.
An improvised white board compass was designed and developed to enhance the teaching of geometrical construction concepts in basic technology courses. The compass allows teachers to visually demonstrate geometric concepts and constructions on a white board in an engaging, hands-on manner. It supports constructivist learning principles by enabling students to observe and emulate the teacher. The design process utilized design and development research methodology to test educational theories and validate the practical application of the compass. The improvised compass was found to effectively engage students and improve their performance in learning geometric constructions.
The document describes the design of an energy meter that calculates energy using a one second logic for improved accuracy. The meter samples voltage and current values using an ADC synchronized to the line frequency via PLL. It calculates active and reactive power by averaging the sampled values over each second. The accumulated active power for each second is multiplied by one second to calculate energy, which is accumulated and converted to kWh. Test results showed the meter achieved an error of 0.3%, within the acceptable limit for class 1 meters. Considering energy over longer durations like one second helps reduce percentage error in the calculation.
This document presents a two-stage method for solving fuzzy transportation problems where the costs, supplies, and demands are represented by symmetric trapezoidal fuzzy numbers. In the first stage, the problem is solved to satisfy minimum demand requirements. Remaining supplies are then distributed in the second stage to further minimize costs. A numerical example demonstrates using robust ranking techniques to convert the fuzzy problem into a crisp one, which is then solved using a zero suffix method. The total optimal costs from both stages provide the solution to the original fuzzy transportation problem.
1) The document proposes using an Adaptive Neuro-Fuzzy Inference System (ANFIS) controller for a Distributed Power Flow Controller (DPFC) to improve voltage regulation and power quality in a transmission system.
2) A DPFC is placed at a load bus in an IEEE 4 bus system and its performance is compared using a PI controller and ANFIS controller.
3) Simulation results show the ANFIS controller provides faster convergence and better voltage profile maintenance during voltage sags and swells compared to the PI controller.
The document describes an improved particle swarm optimization algorithm to solve vehicle routing problems. It introduces concepts of leptons and hadrons to particles in the algorithm. Leptons interact weakly based on individual and neighborhood best positions, while hadrons (local best particles) undergo strong interactions by colliding with the global best particle. When stagnation occurs, particle decay is used to increase diversity. Simulations show the improved algorithm avoids premature convergence and finds better solutions compared to the basic particle swarm optimization.
This document presents a method for analyzing photoplethysmographic (PPG) signals using correlative analysis. The method involves calculating the autocorrelation function of the PPG signal, extracting the envelope of the autocorrelation function using a low pass filter, and approximating the envelope by determining attenuation coefficients. Ten PPG signals were collected from volunteers and analyzed using this method. The attenuation coefficients were found to have similar values around 0.46, providing a potentially useful parameter for medical diagnosis.
This document describes the simulation and design of a process to recover monoethylene glycol (MEG) from effluent waste streams of a petrochemical company in Iran. Aspen Plus simulation software was used to model the process, which involves separating water, salts, and various glycols (MEG, DEG, TEG, TTEG) using a series of distillation columns. Sensitivity analyses were performed to optimize column parameters such as pressure, reflux ratio, and boilup ratio. The results showed that MEG, DEG, TEG, and TTEG could be recovered at rates of 5.01, 2.039, 0.062, and 0.089 kg/hr, respectively.
This document presents a numerical analysis of fluid flow and heat transfer characteristics of ventilated disc brake rotors using computational fluid dynamics (CFD). Two types of rotor configurations are considered: circular pillared (CP) and diamond pillared radial vane (DP). A 20° sector of each rotor is modeled and meshed. Governing equations for mass, momentum, and energy are solved using ANSYS CFX. Boundary conditions include 900K and 1500K isothermal rotor walls for different speeds. Results show the DP rotor has 70% higher mass flow and 24% higher heat dissipation than the CP rotor. Velocity and pressure distributions are more uniform for the DP rotor at higher speeds, ensuring more uniform cooling. The
This document describes the design and testing of an automated cocoa drying house prototype in Trinidad and Tobago. The prototype included automated features like a retractable roof, automatic heaters, and remote control. It aims to address issues with the traditional manual sun drying process, which is time-consuming and relies on human monitoring of changing weather conditions. Initial testing with farmers showed interest in the automated system as a potential solution.
This document presents the design of a telemedical system for remote monitoring of cardiac insufficiency. The system includes an electrocardiography (ECG) device that collects and digitizes ECG signals. The ECG signals undergo digital signal processing including autocorrelation analysis. Graphical interfaces allow patients and doctors to view ECG data and attenuation coefficients derived from autocorrelation analysis. Data is transmitted between parties using TCP/IP protocol. The system aims to facilitate remote monitoring of cardiac patients to reduce hospitalizations through early detection of health changes.
The document summarizes a polygon oscillating piston engine invention. The engine uses multiple pistons arranged around the sides of a polygon within cylinders. As the pistons oscillate, they compress and combust air-fuel mixtures to produce power. This design achieves a very high power-to-weight ratio of up to 2 hp per pound. Engineering analysis and design of a prototype 6-sided engine is presented, showing it can produce 168 hp from a 353 cubic feet per minute air flow at 12,960 rpm. The invention overcomes issues with prior oscillating piston designs by keeping the pistons moving in straight lines within cylinders using conventional piston rings.
More from International Journal of Engineering Inventions www.ijeijournal.com (20)
call for papers, research paper publishing, where to publish research paper, journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJEI, call for papers 2012,journal of science and technolog
International Journal of Engineering Inventions
ISSN: 2278-7461, www.ijeijournal.com
Volume 1, Issue 4 (September 2012) PP: 01-05
A Literature Survey on Latent Semantic Indexing
Ashwini Deshmukh, Gayatri Hegde
Computer Engineering Department, Mumbai University, New Panvel, India.
Abstract––A web search engine stores and retrieves web pages using methods such as crawling and indexing. In this paper we discuss latent semantic indexing, which uses the indexing technique. When a user enters a query into a search engine, the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed. In this study, we give a systematic introduction to different information retrieval techniques and compare them with latent semantic indexing (LSI). A comparison table is provided to give a better view of the different models. A few limitations of LSI are also discussed.
Keywords –– Data mining, Latent Semantic Indexing, retrieval techniques, Singular Value Decomposition
I. INTRODUCTION
Latent semantic indexing is a retrieval technique that indexes documents using a mathematical technique called Singular Value Decomposition (SVD), which identifies patterns in an unstructured collection of text and finds relationships between them. Latent semantic indexing (LSI) has emerged as a competitive text retrieval technique. LSI is a variant of the vector space model in which a low-rank approximation to the vector space representation of the database is computed [1]. There are usually many ways to express a given concept, so the literal terms in a user's query may not match those of a relevant document. In addition, most words have multiple meanings, so terms in a user's query will literally match terms in documents that are not of interest to the user [2]. Textual documents share many similar words, so documents can be classified by term similarity using various techniques. The classic techniques are the Vector Space Model (VSM) and the SMART system, but owing to a few disadvantages of these methods, improved techniques such as LSI and PLSI are used for the semantic search of documents. The first section of the paper gives an overview of different techniques for information retrieval and compares them with LSI. The second section discusses various applications of LSI.
II. LATENT SEMANTIC INDEXING (LSI)
Latent Semantic Indexing is an information retrieval technique that indexes documents using the mathematical technique SVD, which identifies patterns in an unstructured collection of text and finds relationships between them. A well-known method for improving the quality of similarity search in text is LSI, in which the data is transformed into a new concept space [6]. This concept space depends upon the document collection in question, since different collections have different sets of concepts. LSI tries to capture this hidden structure using techniques from linear algebra. The idea in LSI is to project the data into a small subspace of the original data such that the noise effects of synonymy and polysemy are removed. The advantageous effects of the conceptual representation extend to problems well beyond the text domain. LSI has emerged as a competitive text retrieval technique: a variant of the vector space model in which a low-rank approximation to the vector space representation of the database is computed.
2.1 Algorithm for LSI
To perform Latent Semantic Indexing on a group of documents, the following steps are performed:
First, convert each document in the index into a vector of word occurrences. The number of dimensions of each vector equals the number of unique words in the entire document set. Most document vectors will have large empty patches; some will be quite full. Next, scale each vector so that every term reflects the frequency of its occurrence in context. Then combine these column vectors into a large term-document matrix, in which rows represent terms and columns represent documents. Perform SVD on the term-document matrix. This results in three matrices, commonly called U, S and V. S is of particular interest: it is the diagonal matrix of singular values of the document system.
Set all but the k highest singular values to 0. k is a parameter that needs to be tuned for the space at hand. Very low values of k are very lossy and give poor results, but very high values of k do not change the results much from a simple vector search. This yields a new matrix, S'. Recombine the factors to form the reduced matrix (i.e., U * S' * V(t) = M', where (t) signifies transpose). Break this reduced-rank term-document matrix back into column vectors and associate them with their corresponding documents. The result is the latent semantic index.
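The steps above can be sketched with NumPy; the tiny term-document matrix below is made up for illustration and is not from the paper:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# (The term counts here are illustrative, not from the paper.)
A = np.array([
    [1, 0, 1, 0],   # "matrix"
    [1, 0, 1, 1],   # "algebra"
    [0, 1, 0, 1],   # "retrieval"
    [0, 1, 0, 0],   # "text"
], dtype=float)

# SVD factorization: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values; set the rest to 0.
k = 2
s_k = np.where(np.arange(len(s)) < k, s, 0.0)

# Rank-k reconstruction M' = U @ S' @ Vt; its columns are the
# documents re-expressed through the reduced concept space.
M_prime = U @ np.diag(s_k) @ Vt
print(np.round(M_prime, 2))
```

Setting k = 2 here keeps the two largest singular values; in practice k must be tuned empirically for the collection, as noted above.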
III. INFORMATION RETRIEVAL TECHNIQUES
There exist various information retrieval techniques.
3.1 SMART System
In the SMART system, the vector-processing model of retrieval is used to transform both the available information requests and the stored documents into vectors of the form:

Di = (wi1, wi2, ..., wit)    (1)

where Di represents a document (or query) text and wik is the weight of term Tk in document Di. A weight of zero is used for terms that are absent from a particular document, and positive weights characterize terms actually assigned [4]. The SMART system is a fully automatic document retrieval system, capable of processing search requests on a 7094 computer against documents available in English, and of retrieving those documents most nearly similar to the corresponding queries. The machine programs, consisting of approximately 150,000 program steps, can be used not only for language analysis and retrieval, but also for the evaluation of search effectiveness, by processing each search request in several different ways and comparing the results obtained in each case [5]. The steps of information retrieval in the SMART system are shown in Fig. 1 [11].
[Figure 1 (omitted) shows the SMART retrieval pipeline as two chains of boxes: establish the document collection, classify document concepts and weights, and index by concepts; the user conveys an information need, similarity is calculated, and relevance feedback is collected from the user.]
Figure 1. Information Retrieval SMART System [11]
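The document vectors of Eq. (1) can be sketched as follows; the two-document collection and the raw term-frequency weighting are illustrative assumptions (SMART itself supports several weighting schemes):

```python
from collections import Counter

# Hypothetical mini-collection; following Eq. (1), terms absent
# from a document get weight 0 and present terms get a positive weight.
docs = {
    "D1": "matrix algebra matrix applications",
    "D2": "text retrieval and text mining",
}

# The vector dimension t is the number of unique terms in the collection.
vocab = sorted({term for text in docs.values() for term in text.split()})

def doc_vector(text):
    counts = Counter(text.split())
    # w_ik = raw term frequency, 0 if the term is absent
    return [counts.get(term, 0) for term in vocab]

vectors = {d: doc_vector(text) for d, text in docs.items()}
print(vocab)
print(vectors["D1"])
```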
3.2 Vector Space Model (VSM)
The representation of a set of documents as vectors in a common vector space is known as the vector space model and is fundamental to a host of information retrieval operations. The Vector Space Model's retrieval operations range from scoring documents on a query to document classification and document clustering. The vector model is a mathematically based model that represents terms, documents and queries by vectors and provides a ranking [6]. VSM can be divided into three steps:
Document indexing, where content-bearing words are extracted
Weighting of indexed terms to enhance retrieval of documents
Ranking of the documents with respect to the query according to a similarity measure [14]
In the vector space model, a word is represented as a vector in which each dimension corresponds to a contextual pattern. The similarity between two words is calculated by the cosine of the angle between their vectors [7].
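The cosine similarity measure described above can be sketched as follows; the vocabulary size and weight values are made up for illustration:

```python
import math

# Cosine of the angle between two term-weight vectors, as used by
# the vector space model to rank documents against a query.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Illustrative vectors over a shared 4-term vocabulary (values assumed).
query = [1, 1, 0, 0]
doc_a = [2, 1, 0, 0]   # shares both query terms
doc_b = [0, 0, 1, 2]   # shares no query terms

print(cosine(query, doc_a))  # high similarity
print(cosine(query, doc_b))  # no overlap, similarity 0
```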
3.3 Concept Indexing (CI)
In the term-matching method, the similarity between the query and the document is tested lexically [7]. Polysemy (words having multiple meanings) and synonymy (multiple words having the same meaning) are two fundamental problems in efficient information retrieval. Here we compare two techniques for conceptual indexing based on projecting document vectors (in the least-squares sense) onto a lower-dimensional vector space: latent semantic indexing (LSI) and concept indexing (CI), which indexes using a concept decomposition (CD) instead of the SVD used in LSI. Concept decomposition was introduced in 2001.
First step: clustering of the documents of the term-document matrix A into k groups. The spherical k-means algorithm, a variant of k-means which uses the fact that document vectors are of unit norm, can be used for this clustering. The concept matrix is the matrix whose columns are the centroids of the groups, cj being the centroid of the j-th group.
Second step: calculating the concept decomposition. The concept decomposition Dk of the term-document matrix A is the least-squares approximation of A on the space of the concept vectors:
DK = CK Z    (2)

where Z is the solution of the least-squares problem, given by:

Z = (CK^T CK)^-1 CK^T A    (3)

The rows of CK correspond to terms, and the columns of Z correspond to documents.
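A minimal sketch of the concept decomposition, assuming a clustering of the documents is already given (a hand-fixed assignment stands in for spherical k-means here, and the matrix values are illustrative):

```python
import numpy as np

# Term-document matrix A (rows = terms, columns = documents); values assumed.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])

# Assume documents have already been clustered into k = 2 groups
# (fixed by hand here instead of running spherical k-means).
groups = {0: [0, 1], 1: [2, 3]}

# Concept matrix C: column j is the centroid of group j.
C = np.column_stack([A[:, idx].mean(axis=1) for idx in groups.values()])

# Z solves the least-squares problem min ||A - C Z||, i.e. Eq. (3).
Z, *_ = np.linalg.lstsq(C, A, rcond=None)

# Concept decomposition D_k = C Z, Eq. (2): the least-squares
# approximation of A on the space of the concept vectors.
D_k = C @ Z
print(np.round(D_k, 2))
```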
3.4 Probabilistic Latent Semantic Analysis (PLSA)
PLSA is a statistical technique for the analysis of two-mode and co-occurrence data. PLSA evolved from latent semantic analysis, adding a sounder probabilistic model. Compared to standard latent semantic analysis, which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model. This results in a more principled approach with a solid foundation in statistics. Probabilistic Latent Semantic Analysis (PLSA) is one of the most popular statistical techniques for the analysis of two-mode and co-occurrence data [8]. Considering observations in the form of co-occurrences (w, d) of words and documents, PLSA models the probability of each co-occurrence as a mixture of conditionally independent multinomial distributions:
P(w, d) = Σc P(c) P(d|c) P(w|c) = P(d) Σc P(c|d) P(w|c)    (4)
The first formulation is the symmetric formulation, where w and d are both generated from the latent class c in similar ways (using the conditional probabilities P(d|c) and P(w|c)), whereas the second formulation is the asymmetric formulation, where, for each document d, a latent class is chosen conditionally on the document according to P(c|d), and a word is then generated from that class according to P(w|c). The starting point for Probabilistic Latent Semantic Analysis is a statistical model which has been called the aspect model [13]. The aspect model is a latent variable model for co-occurrence data which associates an unobserved class variable z ∈ Z = {z1, ..., zK} with each observation. The joint probability model over D × W is shown in Fig. 2.
Figure 2: Graphical model representation of the aspect model in the asymmetric (a) and symmetric (b) parameterization
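The equivalence of the two parameterizations can be checked numerically; the probability tables below are randomly generated toy values, not fitted to any data:

```python
import numpy as np

# Toy aspect model with 2 latent classes, 3 documents, 4 words.
rng = np.random.default_rng(0)
P_c   = np.array([0.6, 0.4])                  # P(c)
P_d_c = rng.dirichlet(np.ones(3), size=2)     # P(d|c), each row sums to 1
P_w_c = rng.dirichlet(np.ones(4), size=2)     # P(w|c), each row sums to 1

# Symmetric formulation: P(w, d) = sum_c P(c) P(d|c) P(w|c)
P_wd_sym = np.einsum('c,cd,cw->dw', P_c, P_d_c, P_w_c)

# Asymmetric formulation: P(w, d) = P(d) sum_c P(c|d) P(w|c),
# where P(c|d) = P(c) P(d|c) / P(d) by Bayes' rule.
P_d = P_c @ P_d_c                             # marginal P(d)
P_c_d = (P_c[:, None] * P_d_c) / P_d          # P(c|d)
P_wd_asym = P_d[:, None] * np.einsum('cd,cw->dw', P_c_d, P_w_c)

# The two parameterizations give the same joint distribution.
print(np.allclose(P_wd_sym, P_wd_asym))
```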
IV. COMPARING VARIOUS TECHNIQUES
While composing a document, different expressions are used to express the same or somewhat different data, and the use of synonyms or acronyms is very common. The vector space model regards each term as a separate dimension axis. Various studies have shown that VSM creates a hyper-dimensional space in which 99% of the term-document matrix is empty [3]. Limitations of VSM, such as long documents being poorly represented because they have poor similarity values, search keywords having to precisely match document terms, and word substrings possibly resulting in a "false positive match", make it difficult to use in some cases; hence LSI, an improved method for semantic search and dimension reduction, was introduced. The SMART system preprocesses each document by tokenizing the text into words, removing common words that appear on its stop-list, and performing stemming on the remaining words to derive a set of terms. It is a UNIX-based retrieval system, hence the cataloging and retrieval of software components is difficult. LSI is superior in performance to SMART in most cases [9].
Table 1: Document Table 1

D1   Survey of text mining: clustering, classification, and retrieval
D2   Automatic text processing: the transformation, analysis and retrieval of information by computer
D3   Elementary linear algebra: A matrix approach
D4   Matrix algebra & its applications: statistics and econometrics
D5   Effective databases for text & document management
D6   Matrices, vector spaces, and information retrieval
D7   Matrix analysis and applied linear algebra
D8   Topological vector spaces and algebras

Table 2: Document Table 2

D9   Information retrieval: data structures & algorithms
D10  Vector spaces and algebras for chemistry and physics
D11  Classification, clustering and data analysis
D12  Clustering of large data sets
D13  Clustering algorithms
D14  Document warehousing and text mining: techniques for improving business operations, marketing and sales
D15  Data mining and knowledge discovery
Table 3: Terms

Document term     Linear algebra term   Neutral term
text              linear                analysis
mining            algebra               application
clustering        matrix                algorithm
classification    vector
retrieval         space
information
document
data
[Figure: two scatter plots (omitted) showing the projection of the data mining terms and the linear algebra terms of Table 3 into two dimensions: projection of terms by SVD (left) and projection of terms by CD (right).]
4.1 Result from Analysis
V. APPLICATIONS OF LSI
LSI is used in various applications. As web technology grows, new web search engines are developed to retrieve data as accurately as possible. A few areas where LSI is used are listed as follows:
LSI is used in clustering algorithms for medical documents, since such documents include many acronyms of clinical data [3]. It is used as a retrieval method for analysing broken web links; the best results are obtained by applying KLD when the cached page is available, and a co-occurrence method otherwise. The use of Latent Semantic Analysis has been prevalent in the study of human memory, especially in the areas of free recall and memory search, where a positive correlation was found between the semantic similarity of two words (as measured by LSA) and the probability that the words would be recalled one after another in free recall tasks. LSI is used to identify similar pages in web applications; this approach is based on a process that first computes the dissimilarity between web pages using latent semantic indexing, and then groups similar pages using clustering algorithms [12]. LSI is used to compare the content of audio: it is applied to the matrix of audio-clip feature vectors, mapping the clip content into a low-dimensional latent semantic space; the clips are then compared using a document-document comparison measure based on LSI, and the similarity based on LSI is compared with the results obtained by using the standard vector space model [15]. LSI is used to overcome the semantic problem in image retrieval based on automatic image annotation, in which statistical machine translation is used to automatically annotate the image; this approach considers image annotation as the translation of image regions to words [15]. A bug triage system is used for the validation and allocation of bug reports to the most appropriate developers; an automatic bug triage system may reduce the software maintenance time and improve software quality through correct and timely assignment of new bug reports to the appropriate developers [16].