ACM SIGMOD SBD2016 - Querying and reasoning over large scale building datasets: an outline of a performance benchmark
Ana ROXIN – ana-maria.roxin@u-bourgogne.fr
Pieter PAUWELS – Pieter.pauwels@ugent.be
Context description
◼ The architectural design and construction domains work on a daily basis with massive amounts of data.
◼ In the context of BIM, a neutral, interoperable representation of information is provided by the Industry Foundation Classes (IFC) standard
The EXPRESS format is difficult to handle
◼ Semantic Web technologies have been identified as a possible solution
Semantic data enrichment
Schema and data transformations
◼ A semantic approach involves 3 main components:
Schema (TBox)
• OWL ontology
• Information structure
Instances (ABox)
• Assertions
• Respects schema definition
Rules (RBox)
• If-Then statements
• Involving elements from the ABox and the TBox
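To make the interplay of the three components concrete, here is a minimal Python sketch, assuming a toy in-memory set of triples instead of a real RDF store. The classes ifcowl:IfcWallStandardCase and ifcowl:IfcWall come from IFC, but the instances (inst:wall_1, inst:wall_2), the simplified property sbd:isExternal and the single if-then rule are hypothetical illustrations of what a TBox axiom and an RBox rule do together.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# TBox: one schema axiom (a tiny slice of what an OWL ontology such as ifcOWL contains)
TBOX: Set[Triple] = {
    ("ifcowl:IfcWallStandardCase", "rdfs:subClassOf", "ifcowl:IfcWall"),
}

# ABox: assertions about a hypothetical building model; sbd:isExternal is a
# hypothetical, simplified property used only for this illustration
ABOX: Set[Triple] = {
    ("inst:wall_1", "rdf:type", "ifcowl:IfcWallStandardCase"),
    ("inst:wall_1", "sbd:isExternal", "true"),
    ("inst:wall_2", "rdf:type", "ifcowl:IfcWall"),
}


def infer(tbox: Set[Triple], abox: Set[Triple]) -> Set[Triple]:
    """Apply the subclass axiom and one RBox-style rule until no new triple appears."""
    facts = tbox | abox
    while True:
        new: Set[Triple] = set()
        # TBox entailment: ?x rdf:type ?c . ?c rdfs:subClassOf ?d  =>  ?x rdf:type ?d
        for (x, p, c) in facts:
            if p == "rdf:type":
                for (c2, p2, d) in facts:
                    if p2 == "rdfs:subClassOf" and c2 == c:
                        new.add((x, "rdf:type", d))
        # RBox if-then rule (hypothetical):
        # ?x rdf:type ifcowl:IfcWall . ?x sbd:isExternal "true"  =>  ?x rdf:type sbd:ExternalWall
        for (x, p, c) in facts:
            if p == "rdf:type" and c == "ifcowl:IfcWall" and (x, "sbd:isExternal", "true") in facts:
                new.add((x, "rdf:type", "sbd:ExternalWall"))
        if new <= facts:  # fixpoint reached: nothing new was derived
            return facts
        facts = facts | new


external = sorted(s for (s, p, o) in infer(TBOX, ABOX) if o == "sbd:ExternalWall")
print(external)  # ['inst:wall_1']
```

The rule only fires for inst:wall_1 after the TBox axiom has first typed it as an ifcowl:IfcWall, which is why the loop runs until a fixpoint rather than a single pass.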
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
July 1st, 2016
TBox – the ifcOWL ontology
◼ All building models are encoded using the ifcOWL ontology
Developed through numerous initiatives over the last 10 years
◼ The ontology used is the one made publicly available by the buildingSMART Linked Data Working Group (LDWG)
http://ifcowl.openbimstandards.org/IFC4#
http://ifcowl.openbimstandards.org/IFC4_ADD1#
http://ifcowl.openbimstandards.org/IFC2X3_TC1#
http://ifcowl.openbimstandards.org/IFC2X3_Final#
ifcOWL Stats
Axioms 21306
Logical Axioms 13649
Classes 1230
Object properties 1578
Data properties 5
Individuals 1627
DL expressivity SROIQ(D)
SubClassOf axioms 4622
EquivalentClasses axioms 266
DisjointClasses axioms 2429
SubObjectPropertyOf axioms 1
InverseObjectProperties axioms 94
FunctionalObjectProperty axioms 1441
TransitiveObjectProperty axioms 1
ObjectPropertyDomain axioms 1577
ObjectPropertyRange axioms 1576
FunctionalDataProperty axioms 5
DataPropertyDomain axioms 5
DataPropertyRange axioms 5
Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for
construction industry: towards a recommendable and usable
ifcOWL ontology. Automation in Construction 63: 100-133 (2016).
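The per-type counts above can be checked against the total number of logical axioms with a few lines of Python. The listed axiom types add up to 12022, leaving 1627 logical axioms unaccounted for, which equals the number of individuals. This is consistent with one ClassAssertion axiom per individual, but that interpretation is our assumption: the table does not list assertion axioms.

```python
# Logical axiom counts per type, copied from the ifcOWL statistics table above
LOGICAL_AXIOMS_BY_TYPE = {
    "SubClassOf": 4622,
    "EquivalentClasses": 266,
    "DisjointClasses": 2429,
    "SubObjectPropertyOf": 1,
    "InverseObjectProperties": 94,
    "FunctionalObjectProperty": 1441,
    "TransitiveObjectProperty": 1,
    "ObjectPropertyDomain": 1577,
    "ObjectPropertyRange": 1576,
    "FunctionalDataProperty": 5,
    "DataPropertyDomain": 5,
    "DataPropertyRange": 5,
}
TOTAL_LOGICAL_AXIOMS = 13649
INDIVIDUALS = 1627

listed = sum(LOGICAL_AXIOMS_BY_TYPE.values())
remainder = TOTAL_LOGICAL_AXIOMS - listed
print(f"listed: {listed}, remainder: {remainder}, individuals: {INDIVIDUALS}")
# listed: 12022, remainder: 1627, individuals: 1627
```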
Call for papers – special issue in SWJ
◼ Semantic Web Journal – Interoperability, Usability, Applicability
http://www.semantic-web-journal.net
◼ Special issue on "Semantic Technologies and Interoperability in the Built Environment"
◼ Important dates
March 1st, 2017 – paper submission deadline
May 1st, 2017 – notification of acceptance
Ontologies for AEC/FM
Linking BIM models to external data sources
Multiple scale integration through semantic interoperability
Multilingual data access and annotation
Query processing, query performance
Semantic-based building monitoring systems
Reasoning with building data
Building data publication strategies
Big Linked Data for building information
ABox – Building sets
◼ Some BIM models are publicly available (364), whereas others are undisclosed (5)
Building information models created with different BIM modelling environments
Exported to IFC2x3
Transformed into ifcOWL-compliant RDF graphs using a publicly available converter
BIM environment | Number of files
Tekla Structures | 227 (61.5%)
unknown or manual | 38 (10.3%)
Autodesk Revit | 27 (7.3%)
Xella BIM | 15
Autodesk AutoCAD | 12
iTConcrete | 9
SDS | 8
Nemetschek Allplan | 7
Graphisoft ArchiCAD | 5
Various others | 21
Number of IFC instances | Average file size
0 – 500,000 | 0 – 30 MB
500,000 – 2,000,000 | 30 – 100 MB
> 2,000,000 | > 100 MB
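As a quick consistency check, the per-environment file counts can be summed and converted to shares in Python. The total comes to 369 files, matching the 364 public plus 5 undisclosed models, and the computed shares reproduce the percentages given for the first three rows.

```python
# Number of files per BIM authoring environment, copied from the table above
FILES_PER_ENVIRONMENT = {
    "Tekla Structures": 227,
    "unknown or manual": 38,
    "Autodesk Revit": 27,
    "Xella BIM": 15,
    "Autodesk AutoCAD": 12,
    "iTConcrete": 9,
    "SDS": 8,
    "Nemetschek Allplan": 7,
    "Graphisoft ArchiCAD": 5,
    "Various others": 21,
}

total = sum(FILES_PER_ENVIRONMENT.values())
assert total == 364 + 5  # publicly available + undisclosed models
for env, n in FILES_PER_ENVIRONMENT.items():
    print(f"{env:<20} {n:>4} {100 * n / total:5.1f}%")
```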
RBox – Data transformation rules
◼ Need for a representative set of rewrite rules
◼ 68 manually built rules
◼ Classified into several rule sets according to their content
Rule set (RS) | Description
RS1 | Contains 2 rules for rewriting property set references into additional property statements sbd:hasPropertySet and sbd:hasProperty. This is a small, yet often used rule set that can be used in many contexts to simplify querying and data publication of common simple properties attached to IFC entity instances.
RS2 | Includes 31 rules, all involving subtypes of the IfcRelationship class (e.g. ifcowl:IfcRelAssigns, ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines, ifcowl:IfcRelConnects).
RS3 | Contains 3 rules related to handling lists in IFC.
RS4 | Contains one rule that allows wrapping simple datatypes.
RS5 | Consists of 20 rules for inferring single property statements sbd:hasPropertySet and sbd:hasProperty.
RS6 | Extends RS5 and RS1 with 6 additional rules for inferring whether an object is internal or external to a building.
RS7 | Contains 7 rules dealing with the (de)composition of building spaces and spatial elements.
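The slides do not list the rules themselves, so the following Python sketch only illustrates the kind of rewriting RS1 performs, applied by naive forward chaining (materialization) until a fixpoint. The ifcOWL property names (relatedObjects_IfcRelDefinesByProperties, relatingPropertyDefinition_IfcRelDefinesByProperties, hasProperties_IfcPropertySet) follow the IFC4 ifcOWL naming convention but are assumptions to check against the ontology version in use; the instance names are hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]

# Assumed ifcOWL (IFC4) property names; check them against the ontology version in use
REL_OBJ = "ifcowl:relatedObjects_IfcRelDefinesByProperties"
REL_PSET = "ifcowl:relatingPropertyDefinition_IfcRelDefinesByProperties"
HAS_PROPS = "ifcowl:hasProperties_IfcPropertySet"


def rule_has_property_set(g: Set[Triple]) -> Iterable[Triple]:
    # ?rel relatedObjects ?obj . ?rel relatingPropertyDefinition ?pset => ?obj sbd:hasPropertySet ?pset
    for (rel, p, obj) in g:
        if p == REL_OBJ:
            for (rel2, p2, pset) in g:
                if rel2 == rel and p2 == REL_PSET:
                    yield (obj, "sbd:hasPropertySet", pset)


def rule_has_property(g: Set[Triple]) -> Iterable[Triple]:
    # ?obj sbd:hasPropertySet ?pset . ?pset hasProperties ?prop => ?obj sbd:hasProperty ?prop
    for (obj, p, pset) in g:
        if p == "sbd:hasPropertySet":
            for (pset2, p2, prop) in g:
                if pset2 == pset and p2 == HAS_PROPS:
                    yield (obj, "sbd:hasProperty", prop)


def materialize(graph: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Naive forward chaining: fire all rules repeatedly until no new triple is derived."""
    g = set(graph)
    while True:
        inferred = {t for rule in rules for t in rule(g)}
        if inferred <= g:
            return g
        g |= inferred


MODEL: Set[Triple] = {
    ("inst:rel_1", REL_OBJ, "inst:wall_1"),
    ("inst:rel_1", REL_PSET, "inst:pset_1"),
    ("inst:pset_1", HAS_PROPS, "inst:prop_isExternal"),
    ("inst:pset_1", HAS_PROPS, "inst:prop_loadBearing"),
}
closure = materialize(MODEL, [rule_has_property_set, rule_has_property])
print(len(closure) - len(MODEL), "triples materialized")  # 3 triples materialized
```

Even on this four-triple model the closure grows by three triples; on models with millions of IFC instances, this growth is the materialization cost discussed in the findings.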
Implementation
SPIN + Jena TDB
• Implemented with the open source APIs of TopBraid SPIN (SPIN API 1.4.0) and Apache Jena (Jena Core 2.11.0, Jena ARQ 2.11.0, Jena TDB 1.0.0).
• Rules are written with TopBraid Composer Free Edition and exported as RDF Turtle files.
• A small Java program reads the RDF models, schema and rules from the TDB store and queries the data.
• All SPARQL queries are configured using the Jena org.apache.jena.sparql.algebra package.
• To avoid unnecessary reasoning processes, only the RDFS vocabulary is supported in this test environment.
EYE
• Version ‘EYE-Winter16.0302.1557’ (‘SWI-Prolog 7.2.3 (amd64): Aug 25 2015, 12:24:59’).
• EYE is a semi-backward reasoner enhanced with Euler path detection.
• As our rule set currently contains only rules using =>, forward reasoning takes place.
• Each command is executed 5 times.
• Each command includes the full ontology, the full set of rules and the RDFS vocabulary, as well as one of the 369 building model files and one of the 3 query files.
• No triple store is used: triples are processed directly from the input files.
Stardog
• Stardog 4.0.2 semantic graph database (Java 8, RDF 1.1 graph data model, OWL 2 profiles, SPARQL 1.1).
• OWL reasoner + rule engine.
• Support for SWRL rules and backward-chaining reasoning.
• Reasoning is performed by applying a query rewriting approach: SWRL rules are taken into account during the query rewriting process.
• Stardog supports a DL expressivity level of SROIQ(D).
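The EYE setup runs each command 5 times. The slides do not state how the five timings are aggregated or which EYE command-line options were used, so this Python harness is only a sketch: it times an arbitrary command with wall-clock time and reports mean and standard deviation (our choice of statistics). The placeholder command just starts Python; substituting the real reasoner invocation is left to the reader.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run `cmd` `runs` times and return each run's wall-clock duration in seconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Placeholder command: substitute the actual reasoner invocation here
    # (ontology, rules, RDFS vocabulary, one building model file and one query file)
    command = [sys.executable, "-c", "pass"]
    times = time_command(command)
    print(f"mean {statistics.mean(times):.3f} s, "
          f"stdev {statistics.stdev(times):.3f} s over {len(times)} runs")
```

Wall-clock timing around a subprocess includes process start-up and file loading, which matches the EYE setup where no triple store is used and files are read at query time.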
Queries
◼ We have built a limited list of 60 queries, each of which triggers at least one of the available rules.
◼ As we focus here on query execution performance, the considered queries are entirely based on the right-hand sides of the considered rules.
◼ 3 queries:
Q1, a simple query with few results,
Q2, a simple query with many results,
and Q3, a complex query that triggers a considerable number of rules
Query | Query contents
Q1 | ?obj sbd:hasProperty ?p
Q2 | ?point sbd:hasCoordinateX ?x . ?point sbd:hasCoordinateY ?y . ?point sbd:hasCoordinateZ ?z
Q3 | ?d rdf:type sbd:ExternalWall
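Q2 is a basic graph pattern: three triple patterns joined on the shared variable ?point. The Python sketch below evaluates it with a naive nested-loop join, assuming the coordinate triples have already been produced by the rules. The points inst:p1 and inst:p2 are hypothetical, and real engines such as Jena TDB or Stardog use indexes rather than scanning every triple.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # an already-bound variable must keep the same value (this performs the join)
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(bgp: List[Triple], graph: Set[Triple]) -> Iterator[Binding]:
    """Naive nested-loop evaluation of a basic graph pattern."""
    def solve(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(bgp):
            yield binding
            return
        for t in graph:
            b = match(bgp[i], t, binding)
            if b is not None:
                yield from solve(i + 1, b)
    yield from solve(0, {})


Q2 = [
    ("?point", "sbd:hasCoordinateX", "?x"),
    ("?point", "sbd:hasCoordinateY", "?y"),
    ("?point", "sbd:hasCoordinateZ", "?z"),
]
GRAPH: Set[Triple] = {
    ("inst:p1", "sbd:hasCoordinateX", "0.0"),
    ("inst:p1", "sbd:hasCoordinateY", "2.5"),
    ("inst:p1", "sbd:hasCoordinateZ", "3.0"),
    ("inst:p2", "sbd:hasCoordinateX", "1.0"),  # no Y or Z coordinate: not a Q2 answer
}
for row in evaluate(Q2, GRAPH):
    print(row)
```

Every answer requires three matched triples, so the work grows with the number of results, in line with the "many results" character of Q2.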
Test environment
◼ On one central server
Supplied by the University of Burgundy, research group CheckSem
Specifications: Ubuntu OS, Intel Xeon CPU E5-2430 at 2.2 GHz, 6 cores and 16 GB of DDR3 RAM
◼ 3 virtual machines (VMs) were set up on this central server
SPIN VM (Jena TDB), EYE VM (EYE inference engine), Stardog VM (Stardog triple store)
◼ The VMs were managed as separate test environments:
Each of these VMs was allocated 2 of the 6 cores
Each contained the above resources (ontologies, data, rules, queries).
Results
◼ Queries applied on 6 hand-picked building models of varying size
◼ In the SPIN approach
For Q1 and Q2, execution time = forward-chaining inference process + actual query execution time
For Q3, execution time = query execution time itself
◼ In the EYE approach
Network traffic time is ignored
◼ In the Stardog approach
Execution time = backward-chaining inference + actual query execution time
Query | Building model | SPIN (s) | EYE (s) | Stardog (s)
Q1 (simple, few results) | BM1 | 135.36 | 37.11 | 13.44
Q1 | BM2 | 1.47 | 0.29 | 0.17
Q1 | BM3 | 24.01 | 4.87 | 1.4
Q1 | BM4 | 41.28 | 12.95 | 3.55
Q1 | BM5 | 4.99 | 1.05 | 0.33
Q1 | BM6 | 0.55 | 0.16 | 0.08
Q2 (simple, many results) | BM1 | 46.17 | 2.10 | 6.82
Q2 | BM2 | 92.03 | 4.20 | 15.83
Q2 | BM3 | 82.68 | 4.12 | 15.28
Q2 | BM4 | 19.93 | 1.04 | 2.81
Q2 | BM5 | 3.69 | 0.21 | 1.36
Q2 | BM6 | 0.74 | 0.045 | 1.00
Q3 (complex) | BM1 | 0.001 | 0.001 | 0.07
Q3 | BM2 | 0.006 | 0.003 | 0.12
Q3 | BM3 | 0.002 | 0.003 | 0.31
Q3 | BM4 | 0.005 | 0.001 | 0.20
Q3 | BM5 | 0.006 | 0.013 | 0.20
Q3 | BM6 | 0.001 | 0.001 | 0.13
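The Q1 and Q2 rows can be summarized per building model with a short Python script that picks the fastest approach and the ratio between slowest and fastest. Q3 is left out because several of its rows are tied at 0.001 s, so "fastest" is not well defined there. The numbers are copied from the table; the summary itself is only a reading aid.

```python
# Execution times in seconds for Q1 and Q2, copied from the results table above
RESULTS = {
    "Q1": {
        "BM1": {"SPIN": 135.36, "EYE": 37.11, "Stardog": 13.44},
        "BM2": {"SPIN": 1.47, "EYE": 0.29, "Stardog": 0.17},
        "BM3": {"SPIN": 24.01, "EYE": 4.87, "Stardog": 1.4},
        "BM4": {"SPIN": 41.28, "EYE": 12.95, "Stardog": 3.55},
        "BM5": {"SPIN": 4.99, "EYE": 1.05, "Stardog": 0.33},
        "BM6": {"SPIN": 0.55, "EYE": 0.16, "Stardog": 0.08},
    },
    "Q2": {
        "BM1": {"SPIN": 46.17, "EYE": 2.10, "Stardog": 6.82},
        "BM2": {"SPIN": 92.03, "EYE": 4.20, "Stardog": 15.83},
        "BM3": {"SPIN": 82.68, "EYE": 4.12, "Stardog": 15.28},
        "BM4": {"SPIN": 19.93, "EYE": 1.04, "Stardog": 2.81},
        "BM5": {"SPIN": 3.69, "EYE": 0.21, "Stardog": 1.36},
        "BM6": {"SPIN": 0.74, "EYE": 0.045, "Stardog": 1.00},
    },
}


def summarize(results):
    """Yield (query, model, fastest approach, slowest/fastest ratio) for each table row."""
    for query, rows in results.items():
        for model, times in rows.items():
            fastest = min(times, key=times.get)
            slowest = max(times, key=times.get)
            yield query, model, fastest, times[slowest] / times[fastest]


for query, model, fastest, ratio in summarize(RESULTS):
    print(f"{query} {model}: fastest = {fastest}, slowest is {ratio:.1f}x slower")
```

Stardog is fastest on every Q1 row and EYE on every Q2 row, which is the pattern the additional findings set out to explain.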
Additional findings
Indexing algorithms, query rewriting techniques, and rule handling strategies
• The three considered procedures differ considerably from one another, which explains the considerable performance differences, not only between the procedures but also between different usages of one and the same system.
• The algorithms and optimization techniques differ per approach: each uses its own indexing algorithms, query rewriting techniques and rule handling strategies.
Forward- versus backward-chaining
• The disadvantage of a forward-chaining reasoning process is that millions of triples may be materialized (EYE, SPIN for Q1 and Q2).
• Backward-chaining reasoning avoids triple materialization, thus saving query execution time (Stardog, SPIN for Q3).
Type of data in the building model
• Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the first rule does not fire, however, the process stops early.
• Query Q2, by contrast, fires relatively long rules, so finding these matches takes more time in all three approaches.
Impact of the triple store
• Loading files into memory at query execution time leads to considerable delays.
Impact of the number of output results
• Linear relation: the more results are available, the more triples need to be matched, leading to more assertions.
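To contrast with materialization, here is a minimal Python sketch of backward chaining by query rewriting, in the spirit of (but not reproducing) Stardog's approach. Q1 is unfolded into a union of conjunctive queries over stored predicates and evaluated directly, so no sbd:hasProperty triple is ever added to the data. The two rules are hypothetical RS1-style rules, the ifcOWL property names are assumed from the IFC4 naming convention, and the sketch handles non-recursive rules only.

```python
from itertools import count
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, ...]
Binding = Dict[str, str]

# Assumed ifcOWL (IFC4) property names
REL_OBJ = "ifcowl:relatedObjects_IfcRelDefinesByProperties"
REL_PSET = "ifcowl:relatingPropertyDefinition_IfcRelDefinesByProperties"
HAS_PROPS = "ifcowl:hasProperties_IfcPropertySet"

# Hypothetical, non-recursive rules written as (head, body)
RULES: List[Tuple[Triple, List[Triple]]] = [
    (("?o", "sbd:hasPropertySet", "?s"), [("?r", REL_OBJ, "?o"), ("?r", REL_PSET, "?s")]),
    (("?o", "sbd:hasProperty", "?q"), [("?o", "sbd:hasPropertySet", "?s"), ("?s", HAS_PROPS, "?q")]),
]
_fresh = count()


def is_var(term: str) -> bool:
    return term.startswith("?")


def rewrite(atom: Triple) -> List[List[Triple]]:
    """Unfold one query atom into alternative conjunctions (a union of conjunctive queries)."""
    alternatives: List[List[Triple]] = [[atom]]  # the triple may also be stored explicitly
    for head, body in RULES:
        if head[1] != atom[1]:
            continue
        n = next(_fresh)
        # head variables take the query's terms; other rule variables are renamed apart
        subst = {h: a for h, a in zip(head, atom) if is_var(h)}
        renamed = [tuple(subst.get(t, f"{t}_{n}") if is_var(t) else t for t in b) for b in body]
        expanded: List[List[Triple]] = [[]]
        for b in renamed:
            expanded = [conj + alt for conj in expanded for alt in rewrite(b)]
        alternatives.extend(expanded)
    return alternatives


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(bgp: List[Triple], graph: Set[Triple], binding: Binding) -> Iterator[Binding]:
    if not bgp:
        yield binding
        return
    for t in graph:
        b = match(bgp[0], t, binding)
        if b is not None:
            yield from evaluate(bgp[1:], graph, b)


def answer(query: Triple, graph: Set[Triple], variables: Tuple[str, ...]) -> Set[Tuple[str, ...]]:
    """Backward chaining via query rewriting: nothing is ever added to `graph`."""
    return {tuple(b[v] for v in variables)
            for conj in rewrite(query) for b in evaluate(conj, graph, {})}


MODEL: Set[Triple] = {
    ("inst:rel_1", REL_OBJ, "inst:wall_1"),
    ("inst:rel_1", REL_PSET, "inst:pset_1"),
    ("inst:pset_1", HAS_PROPS, "inst:prop_isExternal"),
    ("inst:pset_1", HAS_PROPS, "inst:prop_loadBearing"),
}
print(sorted(answer(("?obj", "sbd:hasProperty", "?p"), MODEL, ("?obj", "?p"))))
# [('inst:wall_1', 'inst:prop_isExternal'), ('inst:wall_1', 'inst:prop_loadBearing')]
```

The trade-off is visible in the rewriting: Q1 becomes three alternative conjunctions, so the cost moves from storing inferred triples to evaluating larger queries, and a rule that does not match stops that branch early.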
Conclusion and future work
◼ Comparison of 3 different approaches
SPIN, EYE and Stardog
◼ 3 queries applied over 6 different building models
◼ Future work consists of
Further specifying this initial performance benchmark with additional data and rules
Executing additional queries on the rest of the set of building models
Comparing results on a wider scale:
― for the individual approaches separately,
― as well as with other approaches not considered here.
Editor's Notes
- Depending on the rules that are triggered, one set of information can be made available in a diverse number of forms. This brings an entirely new form of interoperability to an industry that has always relied heavily on the combination of an agreed standard with many non-transparent import and export procedures implemented in procedural programming languages.