Ana ROXIN – ana-maria.roxin@u-bourgogne.fr
Pieter PAUWELS – Pieter.pauwels@ugent.be
Querying and reasoning over
large scale building datasets: an outline of
a performance benchmark
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo,
Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
Agenda
Introduction
• Context description
• Problem identified
Testing environment
• ifcOWL and building models
• Rules and queries
• Triple stores
Results
• Query performance
• Additional findings
Context description
◼ The architectural design and construction domains work on a daily basis with massive amounts of data.
◼ In the context of BIM, the Industry Foundation Classes (IFC) standard provides a neutral, interoperable representation of information
– The EXPRESS format is difficult to handle
◼ Semantic Web technologies have been identified as a possible solution
– Semantic data enrichment
– Schema and data transformations
◼ A semantic approach involves 3 main components:
Schema (TBox)
• OWL ontology
• Information structure
Instances (ABox)
• Assertions
• Respects the schema definition
Rules (RBox)
• If-Then statements
• Involving elements from the ABox and the TBox
Problem identified
◼ Different implementations exist for the components (TBox, ABox, RBox) of such a semantic approach
– Diverse reasoning engines
– Diverse query processing techniques
– Diverse query handling
– Diverse dataset sizes
– Diverse dataset complexity
◼ An appropriate rule and query execution performance benchmark is missing
Expressiveness vs. performance
Performance benchmark variables
◼ Main components
◼ These elements are implemented in 3 different systems
– SPIN (SPARQL Inferencing Notation) and Jena
– EYE
– Stardog
◼ An ensemble of queries is addressed to the resulting systems
Schema (TBox)
• ifcOWL
Instances (ABox)
• 369 ifcOWL-compliant building models
Rules (RBox)
• 68 data transformation rules
TBox - the ifcOWL ontology
◼ All building models are encoded using the ifcOWL ontology
– Built up under the impulse of numerous initiatives during the last 10 years
◼ The ontology used is the one made publicly available by the buildingSMART Linked Data Working Group (LDWG)
– http://ifcowl.openbimstandards.org/IFC4#
– http://ifcowl.openbimstandards.org/IFC4_ADD1#
– http://ifcowl.openbimstandards.org/IFC2X3_TC1#
– http://ifcowl.openbimstandards.org/IFC2X3_Final#
ifcOWL Stats
Axioms 21306
Logical Axioms 13649
Classes 1230
Object properties 1578
Data properties 5
Individuals 1627
DL expressivity SROIQ(D)
SubClassOf axioms 4622
EquivalentClasses axioms 266
DisjointClasses axioms 2429
SubObjectPropertyOf axioms 1
InverseObjectProperties axioms 94
FunctionalObjectProperty axioms 1441
TransitiveObjectProperty axioms 1
ObjectPropertyDomain axioms 1577
ObjectPropertyRange axioms 1576
FunctionalDataProperty axioms 5
DataPropertyDomain axioms 5
DataPropertyRange axioms 5
Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for construction industry: towards a recommendable and usable ifcOWL ontology. Automation in Construction 63: 100-133 (2016).
Call for papers – special issue in SWJ
◼ Semantic Web Journal – Interoperability, Usability, Applicability
– http://www.semantic-web-journal.net
◼ Special issue on "Semantic Technologies and Interoperability in the Built Environment"
◼ Important dates
– March 1st, 2017 – paper submission deadline
– May 1st, 2017 – notification of acceptance
Topics of interest:
• Ontologies for AEC/FM
• Linking BIM models to external data sources
• Multiple scale integration through semantic interoperability
• Multilingual data access and annotation
• Query processing, query performance
• Semantic-based building monitoring systems
• Reasoning with building data
• Building data publication strategies
• Big Linked Data for building information
ABox – Building sets
◼ Some BIM models are publicly available (364), whereas others are undisclosed (5)
Building information models created with different BIM modelling environments
Exported to IFC2x3
Transformed into ifcOWL-compliant RDF graphs using a publicly available converter

BIM environment         Number of files
Tekla Structures        227 (61.5%)
unknown or manual       38 (10.3%)
Autodesk Revit          27 (7.3%)
Xella BIM               15
Autodesk AutoCAD        12
iTConcrete              9
SDS                     8
Nemetschek AllPlan      7
GraphiSoft ArchiCAD     5
Various others          21

IFC instances           Average file size
0 – 500,000             0 – 30 MB
500,000 – 2,000,000     30 – 100 MB
> 2,000,000             > 100 MB
RBox – Data transformation rules
◼ Need for a representative set of rewrite rules
◼ 68 manually built rules
◼ Classified into several rule sets according to their content (an illustrative rule sketch follows the table)

Rule Set (RS)  Description
RS1  Contains 2 rules for rewriting property set references into additional property statements sbd:hasPropertySet and sbd:hasProperty. This is a small, yet often used rule set that can be used in many contexts to simplify querying and data publication of common simple properties attached to IFC entity instances.
RS2  Includes 31 rules, all involving subtypes of the IfcRelationship class (e.g. ifcowl:IfcRelAssigns, ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines, ifcowl:IfcRelConnects).
RS3  Contains 3 rules related to handling lists in IFC.
RS4  Contains one rule that allows wrapping simple data types.
RS5  Consists of 20 rules for inferring single property statements sbd:hasPropertySet and sbd:hasProperty.
RS6  Extends RS5 and RS1 with 6 additional rules for inferring whether an object is internal or external to a building.
RS7  Contains 7 rules dealing with the (de)composition of building spaces and spatial elements.
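To make the flavour of these rewrite rules concrete, the sketch below shows an RS1-style rule written as a SPARQL CONSTRUCT query, the form in which SPIN rules are serialised. The sbd: namespace IRI and the ifcOWL property names used in the WHERE clause are assumptions for illustration, not quoted from the benchmark rule set.

# Illustrative RS1-style rule as a SPARQL CONSTRUCT (sketch only; the sbd:
# namespace IRI and the ifcOWL property names below are assumed, not taken
# from the actual rule set)
PREFIX ifcowl: <http://ifcowl.openbimstandards.org/IFC2X3_TC1#>
PREFIX sbd:    <http://example.org/sbd#>

CONSTRUCT {
  ?obj sbd:hasPropertySet ?pset .
  ?obj sbd:hasProperty    ?prop .
}
WHERE {
  # an IfcRelDefinesByProperties instance attaches a property set to an object
  ?rel  ifcowl:relatedObjects_IfcRelDefines                          ?obj ;
        ifcowl:relatingPropertyDefinition_IfcRelDefinesByProperties  ?pset .
  # the property set lists its individual properties
  ?pset ifcowl:hasProperties_IfcPropertySet                          ?prop .
}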
ifcOWL Example Transformation
[Figure] Derived triples for a wall with two windows:
inst:IfcWallStandardCase_696 sbim:hasWindow inst:IfcWindow_1893 .
inst:IfcWallStandardCase_696 sbim:hasWindow inst:IfcWindow_1842 .
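A rule producing such sbim:hasWindow links could look as follows when expressed as a SPARQL CONSTRUCT. The wall-to-window chain (IfcRelVoidsElement to an opening, IfcRelFillsElement from the opening to the window) follows the IFC schema, but the exact ifcOWL property names and the sbim: namespace IRI are assumptions for illustration only.

# Sketch of a rule deriving sbim:hasWindow links (assumed property names and
# namespace IRIs; not quoted from the actual rule set)
PREFIX ifcowl: <http://ifcowl.openbimstandards.org/IFC2X3_TC1#>
PREFIX sbim:   <http://example.org/sbim#>

CONSTRUCT {
  ?wall sbim:hasWindow ?window .
}
WHERE {
  # the opening voids the wall ...
  ?voids ifcowl:relatingBuildingElement_IfcRelVoidsElement ?wall ;
         ifcowl:relatedOpeningElement_IfcRelVoidsElement   ?opening .
  # ... and the window fills that opening
  ?fills ifcowl:relatingOpeningElement_IfcRelFillsElement  ?opening ;
         ifcowl:relatedBuildingElement_IfcRelFillsElement  ?window .
  ?window a ifcowl:IfcWindow .
}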
Implementation
SPIN + Jena TDB
• Implemented based on the open source APIs of TopBraid SPIN (SPIN API 1.4.0) and Apache Jena (Jena Core 2.11.0, Jena ARQ 2.11.0, Jena TDB 1.0.0).
• Rules are written with TopBraid Composer Free edition and exported as RDF Turtle files.
• A small Java program is implemented to read the RDF models, schema and rules from the TDB store and to query the data.
• All SPARQL queries are configured using the Jena org.apache.jena.sparql.algebra package.
• To avoid unnecessary reasoning processes, only the RDFS vocabulary is supported in this test environment.

EYE
• Version 'EYE-Winter16.0302.1557' ('SWI-Prolog 7.2.3 (amd64): Aug 25 2015, 12:24:59').
• EYE is a semi-backward reasoner enhanced with Euler path detection.
• As our rule set currently contains only rules using =>, forward reasoning takes place.
• Each command is executed 5 times.
• Each command includes the full ontology, the full set of rules and the RDFS vocabulary, as well as one of the 369 building model files and one of the 3 query files.
• No triple store is used: triples are processed directly from the considered files.

Stardog
• Stardog 4.0.2 semantic graph database (Java 8, RDF 1.1 graph data model, OWL 2 profiles, SPARQL 1.1).
• OWL reasoner + rule engine.
• Support for SWRL rules and backward-chaining reasoning.
• Reasoning is performed through a query rewriting approach; SWRL rules are taken into account during the rewriting process.
• Stardog allows attaining a DL expressivity level of SROIQ(D).
Queries
◼ We have built a limited list of 60 queries, each of which triggers at least one of the available rules.
◼ As we focus here on query execution performance, the considered queries are entirely based on the right-hand sides of the considered rules.
◼ 3 queries are reported here (a complete SPARQL form of Q2 is sketched after the table):
– Q1: a simple query with few results,
– Q2: a simple query with many results,
– Q3: a complex query that triggers a considerable number of rules.

Query  Query contents
Q1     ?obj sbd:hasProperty ?p
Q2     ?point sbd:hasCoordinateX ?x .
       ?point sbd:hasCoordinateY ?y .
       ?point sbd:hasCoordinateZ ?z
Q3     ?d rdf:type sbd:ExternalWall
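For reference, Q2 could be submitted to the engines in the following complete SPARQL form; only the triple patterns come from the table above, while the sbd: namespace IRI is a placeholder.

# Q2 written out as a full SPARQL SELECT query (namespace IRI assumed)
PREFIX sbd: <http://example.org/sbd#>

SELECT ?point ?x ?y ?z
WHERE {
  ?point sbd:hasCoordinateX ?x .
  ?point sbd:hasCoordinateY ?y .
  ?point sbd:hasCoordinateZ ?z .
}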
Test environment
◼ One central server
– Supplied by the University of Burgundy, research group CheckSem
– Specifications: Ubuntu OS, Intel Xeon CPU E5-2430 at 2.2 GHz, 6 cores and 16 GB of DDR3 RAM
◼ 3 Virtual Machines (VMs) were set up on this central server
– SPIN VM (Jena TDB), EYE VM (EYE inference engine), Stardog VM (Stardog triple store)
◼ The VMs were managed as separate test environments
– Each of these VMs had 2 of the 6 cores allocated
– Each contained the above resources (ontologies, data, rules, queries).
Results
◼ Queries applied on 6 hand-picked building models of varying size
◼ In the SPIN approach
– For Q1 and Q2, execution time = backward-chaining inference process + actual query execution time
– For Q3, execution time = the query execution time itself
◼ In the EYE approach
– Network traffic time is ignored
◼ In the Stardog approach
– Execution time = backward-chaining inference + actual query execution time
Query                       Building model   SPIN (s)   EYE (s)   Stardog (s)
Q1 (simple, few results)    BM1              135.36     37.11     13.44
                            BM2              1.47       0.29      0.17
                            BM3              24.01      4.87      1.40
                            BM4              41.28      12.95     3.55
                            BM5              4.99       1.05      0.33
                            BM6              0.55       0.16      0.08
Q2 (simple, many results)   BM1              46.17      2.10      6.82
                            BM2              92.03      4.20      15.83
                            BM3              82.68      4.12      15.28
                            BM4              19.93      1.04      2.81
                            BM5              3.69       0.21      1.36
                            BM6              0.74       0.045     1.00
Q3 (complex)                BM1              0.001      0.001     0.07
                            BM2              0.006      0.003     0.12
                            BM3              0.002      0.003     0.31
                            BM4              0.005      0.001     0.20
                            BM5              0.006      0.013     0.20
                            BM6              0.001      0.001     0.13
Query time related to result count
[Figure] Query time vs. number of results for Q1, for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog)
[Figure] Query time vs. number of results for Q2, for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog)
Additional findings
Indexing algorithms, query rewriting techniques and rule handling strategies
• The three considered procedures are quite far apart from each other, which explains the considerable performance differences, not only between the procedures but also between diverse usages within one and the same system.
• Each approach relies on different indexing algorithms, query rewriting techniques and rule handling strategies.

Forward- versus backward-chaining
• The disadvantage of a forward-chaining reasoning process is that millions of triples can be materialized (EYE, SPIN for Q1 and Q2).
• Backward-chaining reasoning avoids triple materialization, thus saving query execution time (Stardog, SPIN for Q3).

Type of data in the building model
• Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the first rule does not fire, however, the process stops early.
• Query Q2, on the other hand, fires relatively long rules; making these matches takes more time in all three approaches.

Impact of the triple store
• Loading files into memory at query execution time leads to considerable delays.

Impact of the number of output results
• Linear relation: the more results are available, the more triples need to be matched, leading to more assertions.
Conclusion and future work
◼ Comparison of 3 different approaches
– SPIN, EYE and Stardog
◼ 3 queries applied over 6 different building models
◼ Future work consists in
– Refining this initial performance benchmark with additional data and rules
– Executing additional queries on the rest of the set of building models
– Comparing results on a wider scale:
― for the individual approaches separately,
― as well as with other approaches not considered here.
Thank you for your attention.
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang,
Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA
Editor's Notes
Depending on the rules that are being triggered, one set of information then has the potential to be made available in a diverse number of forms, bringing an entirely new form of interoperability to an industry that has always relied heavily on the combination of an agreed standard with many opaque import and export procedures implemented using procedural programming languages.