Reengineering PDF-Based Documents
Targeting Complex Software
Specifications
Moutasm Tamimi, Ahid Yaseen
Software Engineering
Nojoumian, M., & Lethbridge, T. C. (2011). Reengineering PDF-based documents targeting complex
software specifications. International Journal of Knowledge and Web Intelligence, 2(4), 292-319.
Outline
o Review
o Abstract
o Contribution and Motivation
o Related Work
o Document Transformation
o Evaluation
o Logical Structure Extraction
o Multilayer Hypertext Versions Elements
o Checking Well-formedness and Validity
◦ Producing Multiple Outputs
◦ Examples
◦ Concept extraction
◦ Cross referencing
◦ Evaluation, Usability, And Architecture
◦ Architecture of the proposed framework
◦ Conclusion
◦ Future Work
Review
1. Extensible Markup Language (XML) is a markup language that
defines a set of rules for encoding documents in a format that is
both human-readable and machine-readable.
2. XPath functions: XML Path Language (XPath) functions can be used
to refine XPath queries and enhance the programming power and
flexibility of XPath.
Abstract
• This paper investigates the process of reengineering complex
PDF documents, focusing on Object Management Group (OMG)
standards, to produce multilayer hypertext interfaces that are
more usable than the original electronic documents.
Contribution and Motivation
Key contributions:
1. An efficient technique for capturing document structure
2. Various techniques for text extraction
3. A general approach for document engineering
4. Significant values and usability in the final result.
Related Work
1. Document Structure Analysis
2. PDF Document Analysis
3. Leveraging Tables of Contents
Document Transformation
Criteria for extracting the document's logical structure and converting
it to XML:
• Generality
• Low volume
• Easy processing
• Tagging structure
• Containing clues
Evaluation
Examining the candidate techniques against the given transformation criteria:
• DOC and RTF formats are generally messy
• PDF complexity
Logical Structure Extraction
1. First Refinement Approach (it failed on different chapters)
• This method searches for and matches the main tags, such as
<Part>, <Sect>, and <Div>, which Adobe Acrobat places at the start and
end of chapters or sections.
• In practice, the authors applied the method to a sample of a large
document with uneven chapters and found that it failed: because of
incorrect tagging, <Sect> tags were closed in the wrong places.
1. Logical Structure Extraction
2. Second Implementation Approach (LinkTarget, LinkTargetQueue)
2. Text Extraction
• In 1990, Nielsen described hypertext and hypermedia, which
link related information held in other data sources. Their
importance has been illustrated in computer applications
associated with structured information, such as on-line
documentation and computer-aided learning. The authors drew
on this work to construct a general structure for their
hypertext interfaces.
Multilayer Hypertext Versions Elements
• A page for the table of contents (i.e., a single page of the document)
• A separate page for each heading type (i.e., part, chapter, section, and subsection)
• Hyperlinks for accessing the table of contents (i.e., associations)
• Some pages for extracted concepts (i.e., the package and class hierarchy of the UML)
• Various cross references throughout the document (i.e., content linked with figures)
2.1 Checking Well-formedness and Validity
• A well-formed XML document has matching opening and closing
tags that follow the logical nesting rules. The authors checked
well-formedness and validity with the Stylus Studio® XML tool;
i.e., the document must conform to its schema, and the tags it
uses must be declared in that schema.
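The two checks above can be sketched in a few lines. This is only an illustration using Python's standard library, not the Stylus Studio workflow the slides describe; the function names are hypothetical, and the "allowed tag set" is a simplified stand-in for real schema validation, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses, i.e. tags are matched and properly nested."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def undeclared_tags(xml_text, allowed):
    """Return the tags used in the document that are not in the allowed set.

    Simplified stand-in for schema validation (assumption for illustration).
    """
    root = ET.fromstring(xml_text)
    return {element.tag for element in root.iter()} - set(allowed)


good = "<Part><Sect><P>text</P></Sect></Part>"
bad = "<Part><Sect><P>text</Sect></P></Part>"  # <Sect> closed in the wrong place

print(is_well_formed(good))                     # True
print(is_well_formed(bad))                      # False
print(undeclared_tags(good, {"Part", "Sect"}))  # {'P'}
```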
2.2 Producing Multiple Outputs
• Five motivations for generating small hypertext pages:
1. A better sense of location: cross-references in the content,
i.e., the syntax <a name="xyz"> and <a href="#xyz">, let end users
navigate and move between sections.
2. Less chance of getting lost: end users can scroll within pages
and move between parts, which addresses the problem of jumps when
moving from one part to another.
3. A less-overwhelming sensation: end users can work through large
amounts of data and comprehend the content one small document at
a time.
4. Faster loading: end users avoid downloading the whole large
document.
5. Statistical analysis: looking at the importance of information
helps with enhancing the specification itself.
Page generation is based on three elements:
• A folder named "folder-name", which contains the hypertext files
• @Number, an attribute of <Part>, <Chapter>, <Section>, and <Subsection>
• Outputs: I.html, 7.html, 7.1.html, 7.2.html, 7.3.html, 7.3.1.html,
7.3.2.html
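A minimal sketch of how the Number attribute could map to output file names. It uses Python instead of the XSLT the authors used; the `<Document>` wrapper element, the sample structure, and the function name `output_files` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

HEADING_TAGS = ("Part", "Chapter", "Section", "Subsection")


def output_files(xml_text, folder="folder-name"):
    """List one HTML file path per heading element, in document order."""
    root = ET.fromstring(xml_text)
    paths = []
    for element in root.iter():  # iter() walks the tree in document order
        number = element.get("Number")
        if element.tag in HEADING_TAGS and number:
            paths.append(f"{folder}/{number}.html")
    return paths


# Hypothetical input; only the tag names and @Number come from the slides.
sample = """
<Document>
  <Part Number="I">
    <Chapter Number="7">
      <Section Number="7.1"/>
      <Section Number="7.2"/>
      <Section Number="7.3">
        <Subsection Number="7.3.1"/>
        <Subsection Number="7.3.2"/>
      </Section>
    </Chapter>
  </Part>
</Document>
"""

for path in output_files(sample):
    print(path)
```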
Examples
2.3 Connecting Hypertext Pages Sequentially
• XSLT code in a file adds Previous and Next links at the top
of each hypertext page.
• The element numbers are extracted sequentially (1, 2, …, 7,
7.1, 7.2, 7.3, 7.3.1, etc.) and stored in the Num.txt file;
the Procedure Linker() algorithm then uses them to build the
links between the hypertext pages.
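The linking step can be sketched as follows. This is not the authors' Procedure Linker() algorithm, whose details the slides do not give; it only illustrates the idea of deriving Previous and Next targets from the ordered numbers. The functions `link_pages` and `nav_bar` are hypothetical, and the numbers are passed as a list where the source reads them from Num.txt.

```python
def link_pages(numbers):
    """Map each page number to its (previous, next) page file names."""
    links = {}
    for index, number in enumerate(numbers):
        previous_page = f"{numbers[index - 1]}.html" if index > 0 else None
        next_page = f"{numbers[index + 1]}.html" if index < len(numbers) - 1 else None
        links[number] = (previous_page, next_page)
    return links


def nav_bar(previous_page, next_page):
    """Render the Previous/Next links placed at the top of a page."""
    parts = []
    if previous_page:
        parts.append(f'<a href="{previous_page}">Previous</a>')
    if next_page:
        parts.append(f'<a href="{next_page}">Next</a>')
    return " | ".join(parts)


numbers = ["7", "7.1", "7.2", "7.3", "7.3.1"]
links = link_pages(numbers)
print(nav_bar(*links["7.1"]))
```

The first page gets no Previous link and the last page gets no Next link, so the chain has clean ends.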
2.4 Forming Major Document Elements
• 2.4.1 Figure
• 2.4.2 Table
• 2.4.3 List
2.4.1 Figures
• Figures are handled in the transformation phase with XPath
expressions and XSLT code, using the following procedure:
• Convert the document to an initial XML file with Adobe
Acrobat Professional; a folder called "images" is created
alongside that file, and all figures are stored in it under
names such as "folder-name_img_1.jpg". For each figure, the
XML file contains two elements: <ImageData>, which carries
the "src" of the image, and the figure's <Caption>.
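As a rough illustration of the figure step, the sketch below turns one figure into an HTML fragment. It uses Python in place of the authors' XSLT; the enclosing `<Figure>` element name, the HTML layout, and the function `figure_to_html` are assumptions, while `<ImageData>`, `src`, and `<Caption>` come from the slide.

```python
import xml.etree.ElementTree as ET
from html import escape


def figure_to_html(figure):
    """Build an <img> plus caption paragraph from one figure element."""
    image = figure.find("ImageData")
    caption = figure.find("Caption")
    src = image.get("src", "") if image is not None else ""
    text = "".join(caption.itertext()).strip() if caption is not None else ""
    img_tag = f'<img src="{escape(src)}" alt="{escape(text)}"/>'
    return f'<div class="figure">{img_tag}<p>{escape(text)}</p></div>'


# Hypothetical figure element; the file name pattern follows the slide.
sample_figure = ET.fromstring(
    '<Figure><ImageData src="images/folder-name_img_1.jpg"/>'
    "<Caption>Figure 7.1 - Example</Caption></Figure>"
)
figure_html = figure_to_html(sample_figure)
print(figure_html)
```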
Cells                             Level string
<TD> When: position() = 1 <TD>    Level 1
<TD> When: position() = 2 <TD>    Level 2
2.4.2 Tables
• For tables, the authors first generated the relevant caption and
then selected the TableRow elements, from which they constructed all
table cells. They used the XPath function position() to obtain the
index of the node currently being processed, and finally applied
different expressions to each column.
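The position-based handling of cells can be sketched like this. Python's `enumerate(..., start=1)` stands in for the XPath function position(), which is 1-based; the `<Table>` wrapper, the sample cell contents, and the function `table_rows` are assumptions, and the "Level N" labels follow the Cells / Level string table on the slide.

```python
import xml.etree.ElementTree as ET


def table_rows(table):
    """Return one dict per TableRow, keyed by the cell's position-based label."""
    rows = []
    for row in table.findall("TableRow"):
        cells = {}
        # start=1 mirrors XPath's 1-based position()
        for position, cell in enumerate(row.findall("TD"), start=1):
            cells[f"Level {position}"] = "".join(cell.itertext()).strip()
        rows.append(cells)
    return rows


# Hypothetical table content for illustration only.
sample_table = ET.fromstring(
    "<Table>"
    "<TableRow><TD>Kernel</TD><TD>Class</TD></TableRow>"
    "<TableRow><TD>Kernel</TD><TD>Package</TD></TableRow>"
    "</Table>"
)
rows = table_rows(sample_table)
print(rows)
```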
2.4.3 Lists
• Lists are extracted and transformed with XPath expressions
in a style sheet. The style sheet design and the XPath
expressions are summarized below:
Style sheet design:
  List element: <L></L>
  List items: <LI_Label> ……….. </LI_Label>, <LI_Title> ……….. </LI_Title>
XPath expressions:
  <xsl:for-each select="LI_Label">……….
  <xsl:for-each select="LI_Title">
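A small sketch of the list step in Python, as a stand-in for the two `xsl:for-each` loops. The element names `<L>`, `<LI_Label>`, and `<LI_Title>` come from the slide; the sample content, the flat nesting, and the function `list_items` are assumptions.

```python
import xml.etree.ElementTree as ET


def list_items(list_element):
    """Pair each list label with its title, in document order."""
    labels = ["".join(e.itertext()).strip() for e in list_element.iter("LI_Label")]
    titles = ["".join(e.itertext()).strip() for e in list_element.iter("LI_Title")]
    return list(zip(labels, titles))


# Hypothetical list content for illustration only.
sample_list = ET.fromstring(
    "<L>"
    "<LI_Label>1.</LI_Label><LI_Title>Figure</LI_Title>"
    "<LI_Label>2.</LI_Label><LI_Title>Table</LI_Title>"
    "</L>"
)
items = list_items(sample_list)
print(items)
```

Because `iter()` also visits descendants, the same function works if the labels and titles are wrapped in intermediate item elements.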
3. Concept extraction
1. Modeling Class Hierarchy Extraction
2. Modeling Package Hierarchy Extraction
4. Cross referencing
• To facilitate document browsing for end users, we created hyperlinks
for major document keywords (for example, class names as well as
package names) throughout the generated user interfaces. As we
mentioned previously, since these keywords were among document
headings, each of them had an independent hypertext page or anchor
link in the final user interfaces.
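One way to picture the cross-referencing step is a keyword-to-page lookup applied to running text. The sketch below is an assumption about how such linking could work, not the authors' implementation; the function `add_cross_references`, the keyword table, and the page names are hypothetical.

```python
import re
from html import escape


def add_cross_references(text, targets):
    """Wrap each whole-word keyword in a hyperlink to its page."""
    if not targets:
        return escape(text)
    # Longest keywords first so a longer name wins over a shorter prefix.
    keywords = sorted(targets, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b")
    pieces = []
    # With one capturing group, split() alternates plain text and keywords.
    for index, piece in enumerate(pattern.split(text)):
        if index % 2 == 1:
            pieces.append(f'<a href="{targets[piece]}">{escape(piece)}</a>')
        else:
            pieces.append(escape(piece))
    return "".join(pieces)


targets = {"Class": "7.3.1.html", "Package": "7.3.2.html"}
linked = add_cross_references("A Class belongs to a Package.", targets)
print(linked)
```

Matching on word boundaries keeps a keyword such as Class from being linked inside a longer word such as Classifier.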
Evaluation, Usability, And Architecture
1. Reengineering of Various OMG Specifications
2. Usability of Multilayer Hypertext Interfaces: our usability studies
showed the following benefits, which did not exist in the original
PDF formats or the Adobe-generated HTML formats:
• Navigating
• Scrolling
• Processing
• Learning
• Monitoring
• Downloading
• Referencing
• Coloring
• Keeping track
Architecture of the Proposed Framework
Conclusion
• An approach for taking raw PDF versions of complex documents (e.g.,
specifications) and converting them into multilayer hypertext
interfaces. For each document, we first generated a clean XML
document with meaningful tags, and then constructed from this a
series of hypertext pages constituting the final system.
Future Work
1. Extract the initial XML document from other formats such as DOC,
RTF, HTML, etc. This can extend our framework for other kinds of
formats and documents.
2. Automate the concept extractions or at least create some features
for the detection of the logical relationships among headings.
3. Improve the current solution and discover new users' demands.
Only by such an investigation can we have a deep understanding of
users' difficulties.
Example
• https://www.iro.umontreal.ca/~pift1025/bigjava/Ch26/ch26.html
Thank you
Speaker Information
• Moutasm Tamimi
• Masters of Software Engineering
• Independent Consultant, IT Researcher
• CEO at ITG7.com, IT-CRG.com
• Email: tamimi@itg7.com
React vs Next js: Which is Better for Web Development? - Semiosis Software Pr...
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
 
Attendance Tracking From Paper To Digital
Attendance Tracking From Paper To DigitalAttendance Tracking From Paper To Digital
Attendance Tracking From Paper To Digital
 
dachnug51 - Whats new in domino 14 .pdf
dachnug51 - Whats new in domino 14  .pdfdachnug51 - Whats new in domino 14  .pdf
dachnug51 - Whats new in domino 14 .pdf
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
 
How we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hoursHow we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hours
 
dachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdfdachnug51 - All you ever wanted to know about domino licensing.pdf
dachnug51 - All you ever wanted to know about domino licensing.pdf
 
Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)Software development... for all? (keynote at ICSOFT'2024)
Software development... for all? (keynote at ICSOFT'2024)
 
MVP Mobile Application - Codearrest.pptx
MVP Mobile Application - Codearrest.pptxMVP Mobile Application - Codearrest.pptx
MVP Mobile Application - Codearrest.pptx
 
Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …Prada Group Reports Strong Growth in First Quarter …
Prada Group Reports Strong Growth in First Quarter …
 
Google ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learningGoogle ML-Kit - Understanding on-device machine learning
Google ML-Kit - Understanding on-device machine learning
 
active-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptxactive-directory-auditing-solution (2).pptx
active-directory-auditing-solution (2).pptx
 
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdfIndependence Day Hasn’t Always Been a U.S. Holiday.pdf
Independence Day Hasn’t Always Been a U.S. Holiday.pdf
 
Development of Chatbot Using AI\ML Technologies
Development of Chatbot Using AI\ML TechnologiesDevelopment of Chatbot Using AI\ML Technologies
Development of Chatbot Using AI\ML Technologies
 
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.CViewSurvey Digitech Pvt Ltd that  works on a proven C.A.A.G. model.
CViewSurvey Digitech Pvt Ltd that works on a proven C.A.A.G. model.
 

Reengineering PDF-Based Documents Targeting Complex Software Specifications

  • 1. Reengineering PDF-Based Documents Targeting Complex Software Specifications Moutasm tamimi, Ahid yaseen Software Engineering Nojoumian, M., & Lethbridge, T. C. (2011). Reengineering PDF-based documents targeting complex software specifications. International Journal of Knowledge and Web Intelligence, 2(4), 292-319.
  • 2. Outline o Review o Abstract o Contribution and Motivation o Related Work o Document Transformation o Evaluation o Logical Structure Extraction o multilayer hypertext versions elements o Checking Well-formedness and Validity ◦ Producing Multiple Outputs ◦ Examples ◦ Concept extraction ◦ Cross referencing ◦ Evaluation, Usability, And Architecture ◦ Architecture of the proposed framework ◦ Conclusion ◦ Future Work
  • 3. Review 1. Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. 2. XPath functions: XML Path Language (XPath) functions refine XPath queries and add programming power and flexibility to XPath.
  • 4. Abstract • This paper investigates the reengineering of complex PDF documents, focusing on Object Management Group (OMG) standards, to produce multilayer hypertext interfaces that are more usable than the original electronic documents.
  • 5. Contribution and Motivation Key contributions: 1. An efficient technique for capturing document structure 2. Various techniques for text extraction 3. A general approach for document engineering 4. Significant value and usability in the final result.
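To make the XML and XPath review above concrete, here is a minimal sketch in Python. The sample document, its element names, and the `Number` and `Title` attributes are assumptions made for illustration, not taken from the paper. Python's standard library supports only a subset of XPath (path steps and simple predicates), so the full XPath function library is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """
<Document>
  <Chapter Number="7" Title="Classes">
    <Section Number="7.1" Title="Overview"/>
    <Section Number="7.2" Title="Abstract Syntax"/>
  </Chapter>
</Document>
"""


def section_titles(xml_text: str, chapter_number: str) -> list[str]:
    """Return the titles of all sections inside the chapter with the given number."""
    root = ET.fromstring(xml_text)
    # ElementTree understands a subset of XPath, including attribute predicates.
    path = f".//Chapter[@Number='{chapter_number}']/Section"
    return [section.get("Title") for section in root.findall(path)]


print(section_titles(SAMPLE, "7"))
```

The predicate `[@Number='7']` is what refines the query: without it, the path would return the sections of every chapter.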
  • 6. Related Work 1. Document Structure Analysis 2. PDF Document Analysis 3. Leveraging Tables of Contents
  • 7. Document Transformation Criteria for extracting the document’s logical structure and converting it to XML: generality; low volume; easy processing; tagging structure; containing clues
  • 8. Evaluation Examining candidate formats against the given transformation criteria: DOC and RTF formats are generally messy; PDF complexity
  • 9. Logical Structure Extraction 1. First Refinement Approach (it failed in different chapters) • This method searches for and matches the main tags, such as <Part>, <Sect>, and <Div>, which mark the start and end of chapters or sections in the output of Adobe Acrobat. • In practice, the authors applied the method to a sample of large documents with uneven chapters and found that it failed because the tagging was incorrect: closing <Sect> tags were placed in the wrong positions.
  • 10. 1. Logical Structure Extraction • 2. Second Implementation Approach (LinkTarget, LinkTargetQueue) • This method searches for and matches the main tags, such as <Part>, <Sect>, and <Div>, which mark the start and end of chapters or sections in the output of Adobe Acrobat. • In practice, the authors applied the method to a sample of large documents with uneven chapters and found that it failed because the tagging was incorrect: closing <Sect> tags were placed in the wrong positions.
  • 11. 2. Text Extraction • In 1990, Nielsen described hypertext and hypermedia, which link related information held in other data sources. Their importance has been illustrated in computer applications that handle structured information, such as online documentation and computer-aided learning. This work is used to construct a general structure for our hypertext interfaces.
  • 12. Multilayer Hypertext Versions Elements • A page for the table of contents (i.e., a single page of a document) • A separate page for each heading type (i.e., part, chapter, section, and subsection) • Hyperlinks for accessing the table of contents (i.e., associations) • Some pages for extracted concepts (i.e., package and class hierarchy of the UML) • Various cross references throughout the document (i.e., content linked with figures)
  • 13. 2.1 Checking Well-formedness and Validity • A well-formed XML document has matching opening and closing tags that are properly nested, so that it can be checked and validated with the Stylus Studio® XML tool. For validity, the document must conform to its schema: every tag it uses must be defined in that schema.
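The two checks above can be sketched in Python as follows. This is a simplified stand-in, not the Stylus Studio® workflow used by the authors: well-formedness is tested by attempting to parse the text, and the "validity" check only confirms that every tag belongs to an allowed set, which is far weaker than real schema validation. The function names and the allowed tag set are assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (tags match and nest properly)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def uses_only_allowed_tags(xml_text: str, allowed: set[str]) -> bool:
    """Rough validity check: every element name must be in the allowed set.

    This is an assumption-laden simplification of schema validation.
    """
    root = ET.fromstring(xml_text)
    return all(element.tag in allowed for element in root.iter())


print(is_well_formed("<Sect><P>text</P></Sect>"))
print(is_well_formed("<Sect><P>text</Sect>"))
```

A misplaced closing tag, such as the `</Sect>` in the second call, is exactly the kind of error that makes a document fail the well-formedness check.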
  • 14. 2.2 Producing Multiple Outputs • Five motivations for generating small hypertext pages: 1. A better sense of location: cross-references in the content use anchors, i.e., the syntax <a name="xyz"> and <a href="#xyz">, to navigate between sections. 2. Less chance of getting lost: end users can move between pages and parts without the disorienting jump that occurs when moving from one part of a large document to another. 3. A less-overwhelming sensation: end users can work through large amounts of data and comprehend the content in small documents. 4. Faster loading: end users avoid downloading the whole large document. 5. Statistical analysis: examining the importance of information in order to improve the specification itself.
  • 15. The output-producing function is based on three items • A folder named “folder-name”, which contains the hypertext files • @Number, the number attribute of <Part>, <Chapter>, <Section>, and <Subsection> • Outputs: I.html, 7.html, 7.1.html, 7.2.html, 7.3.html, 7.3.1.html, 7.3.2.html.
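The naming scheme above can be sketched in Python. This is an illustration under assumptions, not the authors' XSLT code: the sample document, the `Document` root element, and the function name `output_files` are hypothetical. Only the heading tags, the `Number` attribute, and the output names come from the slide.

```python
import xml.etree.ElementTree as ET

HEADING_TAGS = ("Part", "Chapter", "Section", "Subsection")

# Hypothetical sample; only the heading tags and Number attribute follow the slide.
SAMPLE = """
<Document>
  <Part Number="I">
    <Chapter Number="7">
      <Section Number="7.1"/>
      <Section Number="7.2"/>
      <Section Number="7.3">
        <Subsection Number="7.3.1"/>
        <Subsection Number="7.3.2"/>
      </Section>
    </Chapter>
  </Part>
</Document>
"""


def output_files(xml_text: str, folder: str = "folder-name") -> list[str]:
    """List one HTML file per heading element, named after its Number attribute."""
    root = ET.fromstring(xml_text)
    files = []
    # iter() walks the tree in document order, so files follow the reading order.
    for element in root.iter():
        number = element.get("Number")
        if element.tag in HEADING_TAGS and number:
            files.append(f"{folder}/{number}.html")
    return files


for name in output_files(SAMPLE):
    print(name)
```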
  • 17. 2.3 Connecting Hypertext Pages Sequentially • XSLT code places Previous and Next links at the top of each hypertext page. • The number attributes of the elements are extracted in sequence (1, 2, …, 7, 7.1, 7.2, 7.3, 7.3.1, etc.) and stored in the Num.txt file; the Procedure Linker() algorithm then uses them to build the links between hypertext pages.
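The slide does not show the Procedure Linker() algorithm itself, so the following Python sketch is only one plausible way to derive Previous and Next links from an ordered list of numbers. It assumes that Num.txt holds one number per line; the function names `parse_numbers` and `link_pages` are hypothetical.

```python
from typing import Optional


def parse_numbers(text: str) -> list[str]:
    """Read heading numbers, assuming one per line (an assumption about Num.txt)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def link_pages(numbers: list[str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Map each page number to its (previous, next) page file names."""
    links = {}
    for index, number in enumerate(numbers):
        previous_page = f"{numbers[index - 1]}.html" if index > 0 else None
        next_page = f"{numbers[index + 1]}.html" if index < len(numbers) - 1 else None
        links[number] = (previous_page, next_page)
    return links


numbers = parse_numbers("I\n7\n7.1\n7.2\n")
for number, (previous_page, next_page) in link_pages(numbers).items():
    print(number, previous_page, next_page)
```

The first page has no Previous link and the last page has no Next link, which is why both values are optional.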
  • 18. 2.4 Forming Major Document Elements • 2.4.1 Figure • 2.4.2 Table • 2.4.3 List
  • 19. 2.4.1 Figures • Figures are handled in the transformation phase with XPath expressions and XSLT code, using the following procedure: • Convert the document to the initial XML file with Adobe Acrobat Professional; a folder called “images” is created alongside that file, and all figures are stored in it with names such as “folder-name_img_1.jpg”. For each figure, the XML file contains two elements: <ImageData>, which carries the “src” of the image, and the figure’s <Caption>. • Cells and level strings: <TD> when position() = 1: Level 1; <TD> when position() = 2: Level 2
  • 20. 2.4.2 Tables • The authors first generated the relevant caption and then selected the TableRow element in order to construct all table cells. They then used the XPath function position() to return the index position of the node currently being processed, and finally applied different expressions to each column.
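The role of position() in the table step can be imitated in Python with a 1-based counter over the cells of a row. This is a sketch under assumptions rather than the authors' XSLT: the CSS class names and the function name `row_to_html` are hypothetical, and treating the first two columns specially is only an example of applying a different expression per column.

```python
import html
import xml.etree.ElementTree as ET


def row_to_html(row_xml: str) -> str:
    """Render one TableRow as an HTML row, styling cells by their position."""
    row = ET.fromstring(row_xml)
    cells = []
    # enumerate(..., start=1) plays the role of XPath's 1-based position().
    for position, cell in enumerate(row.findall("TD"), start=1):
        if position == 1:
            css_class = "level-1"  # hypothetical class name
        elif position == 2:
            css_class = "level-2"  # hypothetical class name
        else:
            css_class = "cell"
        text = html.escape(cell.text or "")
        cells.append(f'<td class="{css_class}">{text}</td>')
    return "<tr>" + "".join(cells) + "</tr>"


print(row_to_html("<TableRow><TD>a</TD><TD>b</TD><TD>c</TD></TableRow>"))
```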
  • 21. 2.4.3 Lists • XPath expressions in the style sheet extract and transform the list data in a document, as summarized below. • Element: <L></L> (lists) • Style sheet design: <LI_Label> ……….. </LI_Label> and <LI_Title> ……….. </LI_Title> • XPath expressions: <xsl:for-each select="LI_Label"> ………. and <xsl:for-each select="LI_Title">
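A Python counterpart of the two for-each loops above might look like the sketch below. It assumes that each <LI_Label> is followed by its <LI_Title> in document order inside <L>, which the slide does not state, and it pairs them into HTML list items. The function name `list_to_html` and the choice of <ul> output are assumptions.

```python
import html
import xml.etree.ElementTree as ET


def list_to_html(list_xml: str) -> str:
    """Convert an <L> element with LI_Label/LI_Title children to an HTML list."""
    root = ET.fromstring(list_xml)
    # Collect labels and titles separately, mirroring the two for-each loops.
    labels = [(element.text or "").strip() for element in root.iter("LI_Label")]
    titles = [(element.text or "").strip() for element in root.iter("LI_Title")]
    items = [
        f"<li>{html.escape(label)} {html.escape(title)}</li>"
        for label, title in zip(labels, titles)  # assumes labels and titles align
    ]
    return "<ul>" + "".join(items) + "</ul>"


SAMPLE_LIST = (
    "<L>"
    "<LI_Label>a)</LI_Label><LI_Title>First item</LI_Title>"
    "<LI_Label>b)</LI_Label><LI_Title>Second item</LI_Title>"
    "</L>"
)
print(list_to_html(SAMPLE_LIST))
```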
  • 22. 3. Concept extraction 1. Modeling Class Hierarchy Extraction 2. Modeling Package Hierarchy Extraction
  • 23. 4. Cross referencing • To facilitate document browsing for end users, we created hyperlinks for major document keywords (for example, class names as well as package names) throughout the generated user interfaces. As we mentioned previously, since these keywords were among document headings, each of them had an independent hypertext page or anchor link in the final user interfaces.
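The keyword linking described above can be sketched in Python with a single regular expression. The keyword-to-page mapping and the function name `add_cross_references` are hypothetical, and the sketch assumes keywords are plain identifiers such as class or package names. It matches whole words only, so a keyword like Class is not linked inside Classifier.

```python
import html
import re


def add_cross_references(text: str, targets: dict[str, str]) -> str:
    """Wrap each known keyword in a hyperlink to its hypertext page."""
    escaped = html.escape(text)
    if not targets:
        return escaped
    # Try longer keywords first so that a longer name wins over its prefix.
    keywords = sorted(targets, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b")
    return pattern.sub(
        lambda match: f'<a href="{targets[match.group(1)]}">{match.group(1)}</a>',
        escaped,
    )


# Hypothetical mapping from keyword to generated page.
PAGES = {"Class": "7.3.1.html", "Property": "7.3.2.html"}
print(add_cross_references("A Class owns Property items.", PAGES))
```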
  • 24. Evaluation, Usability, And Architecture 1. Reengineering of Various OMG Specifications 2. Usability of Multilayer Hypertext Interfaces: our usability studies found the following benefits, which did not exist in the original PDF formats or the Adobe-generated HTML formats: • Navigating • Scrolling • Processing • Learning • Monitoring • Downloading • Referencing • Coloring • Keeping track
  • 25. Architecture of the Proposed Framework
  • 26. Conclusion • An approach for taking raw PDF versions of complex documents (e.g., specifications) and converting them into multilayer hypertext interfaces. For each document, we first generated a clean XML document with meaningful tags, and then constructed from this a series of hypertext pages constituting the final system.
  • 27. Future Work 1. Extract the initial XML document from other formats such as DOC, RTF, HTML, etc. This can extend our framework to other kinds of formats and documents. 2. Automate the concept extraction, or at least create some features for detecting the logical relationships among headings. 3. Improve the current solution and discover new user demands. Only through such an investigation can we gain a deep understanding of users’ difficulties.
  • 30. Speaker Information • Moutasm tamimi • Masters of Software Engineering • Independent Consultant, IT Researcher • CEO at ITG7.com, IT-CRG.com • Email: tamimi@itg7.com