ACM SIGMOD SBD2016 - Querying and reasoning over large scale building datasets: an outline of a performance benchmark

Ana ROXIN – ana-maria.roxin@u-bourgogne.fr
Pieter PAUWELS – Pieter.pauwels@ugent.be
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle
International Workshop on Semantic Big Data (SBD 2016)
in conjunction with the 2016 ACM SIGMOD Conference in San Francisco, USA

Agenda
Introduction
• Context description
• Problem identified
Testing environment
• ifcOWL and building models
• Rules and queries
• Triplestores
Results
• Query performance
• Additional findings
Querying and reasoning over large scale building datasets: an outline of a performance benchmark
July 1st, 2016

Context description
◼ The architectural design and construction domains work on a daily basis with massive amounts of data.
◼ In the context of BIM, a neutral, interoperable representation of information is provided by the Industry Foundation Classes (IFC) standard
  • The EXPRESS format is difficult to handle
◼ Semantic Web technologies have been identified as a possible solution
  • Semantic data enrichment
  • Schema and data transformations
◼ A semantic approach involves 3 main components:
  • Schema (TBox): OWL ontology defining the information structure
  • Instances (ABox): assertions that respect the schema definition
  • Rules (RBox): If-Then statements involving elements from the ABox and the TBox
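To make the split between the three components concrete, the following Python sketch keeps a one-axiom TBox, a four-triple ABox and a single If-Then rule (RBox) as plain data structures. It is only an illustration under our own assumptions: the sbd:isExternal predicate and the inst: instance names are hypothetical, and the benchmarked systems use real ontologies, triplestores and rule engines rather than Python sets.

```python
# Minimal sketch: TBox, ABox and RBox kept as plain Python data.
# Assumption: "sbd:isExternal" and the inst: names are hypothetical
# simplifications; they are not taken from ifcOWL or the benchmark data.

RDF_TYPE = "rdf:type"

# TBox: schema axioms, here only (subclass, superclass) pairs.
TBOX_SUBCLASS_OF = {("ifcowl:IfcWallStandardCase", "ifcowl:IfcWall")}

# ABox: instance assertions as (subject, predicate, object) triples.
ABOX = {
    ("inst:wall_1", RDF_TYPE, "ifcowl:IfcWallStandardCase"),
    ("inst:wall_1", "sbd:isExternal", "true"),
    ("inst:wall_2", RDF_TYPE, "ifcowl:IfcWallStandardCase"),
    ("inst:wall_2", "sbd:isExternal", "false"),
}


def apply_tbox(triples):
    """Add rdf:type triples implied by one rdfs:subClassOf step
    (enough for this one-axiom TBox)."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            for sub, sup in TBOX_SUBCLASS_OF:
                if o == sub:
                    inferred.add((s, RDF_TYPE, sup))
    return inferred


def rule_external_wall(triples):
    """RBox If-Then rule:
    IF ?w rdf:type ifcowl:IfcWall AND ?w sbd:isExternal "true"
    THEN ?w rdf:type sbd:ExternalWall."""
    walls = {s for s, p, o in triples if p == RDF_TYPE and o == "ifcowl:IfcWall"}
    external = {s for s, p, o in triples if p == "sbd:isExternal" and o == "true"}
    return {(w, RDF_TYPE, "sbd:ExternalWall") for w in walls & external}


graph = apply_tbox(ABOX)            # TBox applied to the ABox
graph |= rule_external_wall(graph)  # RBox rule on top
print(sorted(s for s, p, o in graph if o == "sbd:ExternalWall"))
```

Running it prints `['inst:wall_1']`: the rule only fires for wall_1 because the TBox step first types it as ifcowl:IfcWall, which is why rules are said to involve elements from both the ABox and the TBox.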

Problem identified
◼ Different implementations exist for the components (TBox, ABox, RBox) of such a semantic approach
  • Diverse reasoning engines
  • Diverse query processing techniques
  • Diverse query handling
  • Diverse dataset sizes
  • Diverse dataset complexity
◼ An appropriate rule and query execution performance benchmark is missing
Expressiveness vs. performance

Performance benchmark variables
◼ Main components
  • Schema (TBox): ifcOWL
  • Instances (ABox): 369 ifcOWL-compliant building models
  • Rules (RBox): 68 data transformation rules
◼ These elements are implemented in 3 different systems
  • SPIN (SPARQL Inferencing Notation) and Jena
  • EYE
  • Stardog
◼ A set of queries is run against each of these systems

TBox - the ifcOWL ontology
◼ All building models are encoded using the ifcOWL ontology
  • Developed under the impetus of numerous initiatives during the last 10 years
◼ The ontology used is the one made publicly available by the buildingSMART Linked Data Working Group (LDWG)
  • http://ifcowl.openbimstandards.org/IFC4#
  • http://ifcowl.openbimstandards.org/IFC4_ADD1#
  • http://ifcowl.openbimstandards.org/IFC2X3_TC1#
  • http://ifcowl.openbimstandards.org/IFC2X3_Final#

ifcOWL Stats
Axioms 21306
Logical Axioms 13649
Classes 1230
Object properties 1578
Data properties 5
Individuals 1627
DL expressivity SROIQ(D)
SubClassOf axioms 4622
EquivalentClasses axioms 266
DisjointClasses axioms 2429
SubObjectPropertyOf axioms 1
InverseObjectProperties axioms 94
FunctionalObjectProperty axioms 1441
TransitiveObjectProperty axioms 1
ObjectPropertyDomain axioms 1577
ObjectPropertyRange axioms 1576
FunctionalDataProperty axioms 5
DataPropertyDomain axioms 5
DataPropertyRange axioms 5
Pieter Pauwels and Walter Terkaj, EXPRESS to OWL for
construction industry: towards a recommendable and usable
ifcOWL ontology. Automation in Construction 63: 100-133 (2016).

Call for papers – special issue in SWJ
◼ Semantic Web Journal: Interoperability, Usability, Applicability
  • http://www.semantic-web-journal.net
◼ Special issue on "Semantic Technologies and Interoperability in the Built Environment"
◼ Important dates
  • March 1st, 2017: paper submission deadline
  • May 1st, 2017: notification of acceptance
◼ Topics
  • Ontologies for AEC/FM
  • Linking BIM models to external data sources
  • Multiple scale integration through semantic interoperability
  • Multilingual data access and annotation
  • Query processing, query performance
  • Semantic-based building monitoring systems
  • Reasoning with building data
  • Building data publication strategies
  • Big Linked Data for building information

ABox – Building sets
◼ Some BIM models are publicly available (364), whereas others are undisclosed (5)
◼ Building information models were created with different BIM modelling environments, exported to IFC2x3, and transformed into ifcOWL-compliant RDF graphs using a publicly available converter

BIM environment | Number of files
Tekla Structures | 227 (61.5%)
Unknown or manual | 38 (10.3%)
Autodesk Revit | 27 (7.3%)
Xella BIM | 15
Autodesk AutoCAD | 12
iTConcrete | 9
SDS | 8
Nemetschek Allplan | 7
Graphisoft ArchiCAD | 5
Various others | 21

IFC instances | Average file size
0 – 500,000 | 0 – 30 MB
500,000 – 2,000,000 | 30 – 100 MB
> 2,000,000 | > 100 MB
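The size bands and the share of each modelling environment can be recomputed from the two tables. The sketch below assumes that each band's upper bound is inclusive (the table leaves the 500,000 boundary ambiguous) and uses our own band names small/medium/large.

```python
# Sketch: size bands and BIM-environment shares taken from the tables above.
# Assumption: band upper bounds are inclusive, and the labels
# "small"/"medium"/"large" are our own naming.

FILES_PER_ENVIRONMENT = {
    "Tekla Structures": 227,
    "Unknown or manual": 38,
    "Autodesk Revit": 27,
    "Xella BIM": 15,
    "Autodesk AutoCAD": 12,
    "iTConcrete": 9,
    "SDS": 8,
    "Nemetschek Allplan": 7,
    "Graphisoft ArchiCAD": 5,
    "Various others": 21,
}


def size_band(instance_count: int) -> str:
    """Map an IFC instance count to the benchmark's size band."""
    if instance_count < 0:
        raise ValueError("instance count must be non-negative")
    if instance_count <= 500_000:
        return "small"   # roughly 0-30 MB
    if instance_count <= 2_000_000:
        return "medium"  # roughly 30-100 MB
    return "large"       # more than 100 MB


total = sum(FILES_PER_ENVIRONMENT.values())
shares = {env: round(100 * n / total, 1) for env, n in FILES_PER_ENVIRONMENT.items()}
print(total, shares["Tekla Structures"], size_band(1_200_000))
```

It prints `369 61.5 medium`: the ten environment counts add up to the 369 models, and the Tekla Structures share matches the 61.5% in the table.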

RBox – Data transformation rules
◼ Need for a representative set of rewrite rules
◼ 68 manually built rules
◼ Classified in several rule sets according to their content

Rule Set (RS) | Description
RS1 | Contains 2 rules for rewriting property set references into additional property statements sbd:hasPropertySet and sbd:hasProperty. This is a small yet often used rule set that can be used in many contexts to simplify querying and data publication of common simple properties attached to IFC entity instances.
RS2 | Includes 31 rules, all involving subtypes of the IfcRelationship class (e.g. ifcowl:IfcRelAssigns, ifcowl:IfcRelDecomposes, ifcowl:IfcRelAssociates, ifcowl:IfcRelDefines, ifcowl:IfcRelConnects).
RS3 | Contains 3 rules related to handling lists in IFC.
RS4 | Contains one rule that allows wrapping simple datatypes.
RS5 | Consists of 20 rules for inferring single property statements sbd:hasPropertySet and sbd:hasProperty.
RS6 | Extends RS5 and RS1 with 6 additional rules for inferring whether an object is internal or external to a building.
RS7 | Contains 7 rules dealing with the (de)composition of building spaces and spatial elements.
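A rule from RS1 rewrites the indirect property set references of IFC into the shortcut statements sbd:hasPropertySet and sbd:hasProperty. The sketch below applies such a rewrite by forward chaining over a set of triples. It is a hypothetical reconstruction, not the authors' rule: ex:relatedObject, ex:relatingPropertySet and ex:hasProperties stand in for the longer ifcOWL property names, and we assume sbd:hasProperty links an object directly to a property node.

```python
# Sketch of RS1-style rewrite rules applied by forward chaining.
# Assumption: ex:relatedObject, ex:relatingPropertySet and ex:hasProperties
# are hypothetical stand-ins for the longer ifcOWL property names, and
# sbd:hasProperty is assumed to point at the property node.

def rs1_style_rewrite(triples):
    """?rel ex:relatedObject ?obj . ?rel ex:relatingPropertySet ?pset
           => ?obj sbd:hasPropertySet ?pset
       ?obj sbd:hasPropertySet ?pset . ?pset ex:hasProperties ?prop
           => ?obj sbd:hasProperty ?prop"""
    psets_of_rel, props_of_pset = {}, {}
    for s, p, o in triples:
        if p == "ex:relatingPropertySet":
            psets_of_rel.setdefault(s, set()).add(o)
        elif p == "ex:hasProperties":
            props_of_pset.setdefault(s, set()).add(o)

    # First rule: object -> property set.
    has_pset = {
        (obj, "sbd:hasPropertySet", pset)
        for rel, p, obj in triples if p == "ex:relatedObject"
        for pset in psets_of_rel.get(rel, ())
    }
    # Second rule: object -> individual property, via the derived triples.
    has_prop = {
        (obj, "sbd:hasProperty", prop)
        for obj, _, pset in has_pset
        for prop in props_of_pset.get(pset, ())
    }
    return has_pset | has_prop


data = {
    ("inst:rel_1", "ex:relatedObject", "inst:wall_1"),
    ("inst:rel_1", "ex:relatedObject", "inst:wall_2"),
    ("inst:rel_1", "ex:relatingPropertySet", "inst:pset_1"),
    ("inst:pset_1", "ex:hasProperties", "inst:prop_IsExternal"),
    ("inst:pset_1", "ex:hasProperties", "inst:prop_LoadBearing"),
}
derived = rs1_style_rewrite(data)
for triple in sorted(derived):
    print(*triple)
```

The five input triples yield six derived ones (two sbd:hasPropertySet, four sbd:hasProperty), after which a query such as Q1 only needs a single triple pattern.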

ifcOWL Example Transformation
inst:IfcWallStandardCase_696 sbim:hasWindow inst:IfcWindow_1893 .
inst:IfcWallStandardCase_696 sbim:hasWindow inst:IfcWindow_1842 .
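In IFC, a window is usually related to its host wall through an opening element (IfcRelVoidsElement links the wall to the opening, IfcRelFillsElement links the opening to the window). The sketch below shows how a rule could collapse that chain into the sbim:hasWindow shortcut from the example. The slides do not show the rule that produced these triples; ex:hasOpening, ex:filledBy, the opening names and the door instance are hypothetical simplifications.

```python
# Sketch: deriving sbim:hasWindow shortcuts from a wall-opening-window chain.
# Assumption: ex:hasOpening and ex:filledBy are hypothetical flattened forms of
# IfcRelVoidsElement and IfcRelFillsElement; opening and door names are made up.

def derive_has_window(triples):
    """?wall ex:hasOpening ?op . ?op ex:filledBy ?win . ?win a ifcowl:IfcWindow
       => ?wall sbim:hasWindow ?win"""
    windows = {s for s, p, o in triples if p == "rdf:type" and o == "ifcowl:IfcWindow"}
    fillers = {}
    for s, p, o in triples:
        if p == "ex:filledBy":
            fillers.setdefault(s, set()).add(o)
    return {
        (wall, "sbim:hasWindow", win)
        for wall, p, op in triples if p == "ex:hasOpening"
        for win in fillers.get(op, ())
        if win in windows  # keep only elements typed as windows
    }


model = {
    ("inst:IfcWallStandardCase_696", "ex:hasOpening", "inst:opening_1"),
    ("inst:IfcWallStandardCase_696", "ex:hasOpening", "inst:opening_2"),
    ("inst:IfcWallStandardCase_696", "ex:hasOpening", "inst:opening_3"),
    ("inst:opening_1", "ex:filledBy", "inst:IfcWindow_1893"),
    ("inst:opening_2", "ex:filledBy", "inst:IfcWindow_1842"),
    ("inst:opening_3", "ex:filledBy", "inst:IfcDoor_2001"),  # hypothetical door
    ("inst:IfcWindow_1893", "rdf:type", "ifcowl:IfcWindow"),
    ("inst:IfcWindow_1842", "rdf:type", "ifcowl:IfcWindow"),
    ("inst:IfcDoor_2001", "rdf:type", "ifcowl:IfcDoor"),
}
for triple in sorted(derive_has_window(model)):
    print(*triple)
```

It prints the two sbim:hasWindow triples for IfcWindow_1842 and IfcWindow_1893; the door filling the third opening is filtered out by the type check.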

Implementation
SPIN + Jena TDB
• Implemented based on the open source APIs of TopBraid SPIN (SPIN API 1.4.0) and Apache Jena (Jena Core 2.11.0, Jena ARQ 2.11.0, Jena TDB 1.0.0)
• Rules are written with TopBraid Composer Free Edition and exported as RDF Turtle files.
• A small Java program reads RDF models, schema and rules from the TDB store and queries the data.
• All SPARQL queries are configured using the Jena org.apache.jena.sparql.algebra package.
• To avoid unnecessary reasoning processes, only the RDFS vocabulary is supported in this test environment.
EYE
• Version 'EYE-Winter16.0302.1557' ('SWI-Prolog 7.2.3 (amd64): Aug 25 2015, 12:24:59').
• EYE is a semi-backward reasoner enhanced with Euler path detection.
• As our rule set currently contains only rules using =>, forward reasoning takes place.
• Each command is executed 5 times.
• Each command includes the full ontology, the full set of rules and the RDFS vocabulary, as well as one of the 369 building model files and one of the 3 query files.
• No triplestore is used: triples are processed directly from the considered files.
Stardog
• Stardog 4.0.2 semantic graph database (Java 8, RDF 1.1 graph data model, OWL 2 profiles, SPARQL 1.1)
• OWL reasoner + rule engine
• Support for SWRL rules and backward-chaining reasoning
• Reasoning is performed by applying a query rewriting approach: SWRL rules are taken into account during the query rewriting process.
• Stardog supports a DL expressivity level of SROIQ(D).
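The EYE setup executes each command 5 times. A minimal harness for such repeated measurements is sketched below. The aggregation (mean, median, minimum) is our assumption, since the slides do not say how the five runs were combined, and the placeholder workload stands in for an actual reasoner call whose command line is not given here.

```python
# Sketch of a repeated-run timing harness ("each command is executed 5 times").
# Assumption: reporting mean/median/min is our choice; the slides do not say
# how the five runs were aggregated into the results table.
import statistics
import time
from typing import Callable, Dict, List


def time_runs(command: Callable[[], object], runs: int = 5) -> List[float]:
    """Run `command` `runs` times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        command()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Aggregate repeated measurements into a few summary statistics."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "min": min(durations),
    }


# Placeholder workload; a real run would wrap the reasoner invocation,
# e.g. with subprocess.run([...], check=True).
durations = time_runs(lambda: sum(range(100_000)))
print(len(durations), summarize(durations))
```

Keeping the timed callable separate from the aggregation makes it easy to report several statistics for the same five runs.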

Queries
◼ We have built a limited list of 60 queries, each of which triggers at least one of the available rules.
◼ As we focus here on query execution performance, the queries are based entirely on the right-hand sides of the considered rules.
◼ 3 of these queries are considered here:
  • Q1: a simple query with few results
  • Q2: a simple query with many results
  • Q3: a complex query that triggers a considerable number of rules

Query | Query contents
Q1 | ?obj sbd:hasProperty ?p
Q2 | ?point sbd:hasCoordinateX ?x . ?point sbd:hasCoordinateY ?y . ?point sbd:hasCoordinateZ ?z
Q3 | ?d rdf:type sbd:ExternalWall
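Q2 is a basic graph pattern of three triple patterns joined on ?point. The sketch below evaluates it with a naive nested-loop join over an in-memory list of triples, assuming the sbd:hasCoordinate* statements have already been produced by the rules. It illustrates pattern matching only; the benchmarked systems use their own indexes and query engines, and the inst:p1/inst:p2 data is hypothetical.

```python
# Sketch: evaluating Q2 as a basic graph pattern (three triple patterns joined
# on ?point). Assumption: the sbd:hasCoordinate* triples are already present,
# i.e. the rules producing them have fired; the point data is made up.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(triples: List[Triple], pattern: Triple, binding: Binding) -> List[Binding]:
    """Extend `binding` with every triple matching `pattern` ('?x' = variable)."""
    results = []
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            results.append(candidate)
    return results


def evaluate_bgp(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Join triple patterns left to right (nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b for current in bindings for b in match(triples, pattern, current)]
    return bindings


Q2 = [
    ("?point", "sbd:hasCoordinateX", "?x"),
    ("?point", "sbd:hasCoordinateY", "?y"),
    ("?point", "sbd:hasCoordinateZ", "?z"),
]
data = [
    ("inst:p1", "sbd:hasCoordinateX", "0.0"),
    ("inst:p1", "sbd:hasCoordinateY", "2.5"),
    ("inst:p1", "sbd:hasCoordinateZ", "3.0"),
    ("inst:p2", "sbd:hasCoordinateX", "1.0"),  # no Y/Z, so not an answer
]
print(evaluate_bgp(data, Q2))
```

Only inst:p1 is returned, because inst:p2 lacks the Y and Z coordinates. Each extra answer requires matching more triples across all three patterns.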

Test environment
◼ One central server
  • Supplied by the University of Burgundy, research group CheckSem
  • Specifications: Ubuntu OS, Intel Xeon CPU E5-2430 at 2.2 GHz, 6 cores and 16 GB of DDR3 RAM
◼ 3 virtual machines (VMs) were set up on this central server
  • SPIN VM (Jena TDB), EYE VM (EYE inference engine), Stardog VM (Stardog triplestore)
◼ The VMs were managed as separate test environments:
  • Each VM had 2 of the 6 cores allocated
  • Each VM contained the above resources (ontologies, data, rules, queries)

Results
◼ Queries applied to 6 hand-picked building models of varying size
◼ In the SPIN approach
  • For Q1 and Q2, execution time = forward-chaining inference process + actual query execution time
  • For Q3, execution time = query execution time itself
◼ In the EYE approach
  • Network traffic time is ignored
◼ In the Stardog approach
  • Execution time = backward-chaining inference + actual query execution time
Query / Building model | SPIN (s) | EYE (s) | Stardog (s)
Q1 (simple, few results)
BM1 | 135.36 | 37.11 | 13.44
BM2 | 1.47 | 0.29 | 0.17
BM3 | 24.01 | 4.87 | 1.4
BM4 | 41.28 | 12.95 | 3.55
BM5 | 4.99 | 1.05 | 0.33
BM6 | 0.55 | 0.16 | 0.08
Q2 (simple, many results)
BM1 | 46.17 | 2.10 | 6.82
BM2 | 92.03 | 4.20 | 15.83
BM3 | 82.68 | 4.12 | 15.28
BM4 | 19.93 | 1.04 | 2.81
BM5 | 3.69 | 0.21 | 1.36
BM6 | 0.74 | 0.045 | 1.00
Q3 (complex)
BM1 | 0.001 | 0.001 | 0.07
BM2 | 0.006 | 0.003 | 0.12
BM3 | 0.002 | 0.003 | 0.31
BM4 | 0.005 | 0.001 | 0.20
BM5 | 0.006 | 0.013 | 0.20
BM6 | 0.001 | 0.001 | 0.13
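To read the table row by row, the sketch below stores the Q1 and Q2 timings and reports the fastest system and the ratio between the slowest and the fastest. Q3 is left out because several of its timings tie at 0.001 s, which makes a single winner ill-defined; the dictionary layout is our own choice.

```python
# Sketch: fastest system per query/model, using the Q1 and Q2 timings from the
# results table (seconds). Q3 is omitted because several of its timings tie.
TIMINGS = {
    "Q1": {
        "BM1": {"SPIN": 135.36, "EYE": 37.11, "Stardog": 13.44},
        "BM2": {"SPIN": 1.47, "EYE": 0.29, "Stardog": 0.17},
        "BM3": {"SPIN": 24.01, "EYE": 4.87, "Stardog": 1.4},
        "BM4": {"SPIN": 41.28, "EYE": 12.95, "Stardog": 3.55},
        "BM5": {"SPIN": 4.99, "EYE": 1.05, "Stardog": 0.33},
        "BM6": {"SPIN": 0.55, "EYE": 0.16, "Stardog": 0.08},
    },
    "Q2": {
        "BM1": {"SPIN": 46.17, "EYE": 2.10, "Stardog": 6.82},
        "BM2": {"SPIN": 92.03, "EYE": 4.20, "Stardog": 15.83},
        "BM3": {"SPIN": 82.68, "EYE": 4.12, "Stardog": 15.28},
        "BM4": {"SPIN": 19.93, "EYE": 1.04, "Stardog": 2.81},
        "BM5": {"SPIN": 3.69, "EYE": 0.21, "Stardog": 1.36},
        "BM6": {"SPIN": 0.74, "EYE": 0.045, "Stardog": 1.00},
    },
}


def fastest(row):
    """Name of the system with the lowest execution time."""
    return min(row, key=row.get)


def slowest_to_fastest(row):
    """How many times slower the slowest system is than the fastest."""
    return max(row.values()) / min(row.values())


for query, models in TIMINGS.items():
    for model, row in models.items():
        print(query, model, fastest(row), round(slowest_to_fastest(row), 1))
```

On these numbers Stardog is the fastest for all six models on Q1 and EYE for all six models on Q2; for BM1 on Q1, SPIN is about 10.1 times slower than Stardog.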

Query time related to result count
[Figure: query time related to result count, for Q1 and for Q2, for each of the considered approaches (green = SPIN; blue = EYE; black = Stardog)]

Additional findings
Indexing algorithms, query rewriting techniques, and rule handling strategies
• The three procedures differ considerably from one another, which explains the considerable performance differences, not only between the procedures, but also between different usages within one and the same system.
• The algorithms and optimization techniques used in each approach are not entirely comparable: indexing algorithms, query rewriting techniques and rule handling strategies differ.
Forward- versus backward-chaining
• The disadvantage of forward-chaining reasoning is that millions of triples can be materialized (EYE, SPIN for Q1 and Q2).
• Backward-chaining reasoning avoids triple materialization, thus saving query execution time (Stardog, SPIN for Q3).
Type of data in the building model
• Query Q3 triggers a rule that in turn triggers several other rules in the rule set. If the first rule does not fire, however, the process stops early.
• Query Q2, by contrast, fires relatively long rules, and making these matches takes more time in all three approaches.
Impact of the triple store
• Loading files into memory at query execution time leads to considerable delays.
Impact of the number of output results
• Linear relation: the more results are available, the more triples need to be matched, leading to more assertions.
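The forward- versus backward-chaining finding can be illustrated on a single hypothetical containment rule. The sketch below materializes all entailed pairs by forward chaining and, separately, answers individual goals by a goal-directed search that stores nothing new. ex:contains and the site/building/storey/space names are assumptions; the rule is not one of the 68 benchmark rules, and the search is a generic stand-in for backward chaining, not Stardog's query rewriting.

```python
# Sketch contrasting forward and backward chaining on one rule:
#   ?a ex:contains ?b . ?b ex:contains ?c  =>  ?a ex:contains ?c
# Assumption: ex:contains (implicit in the pairs) and all names are
# hypothetical; the backward search is a generic illustration only.
from collections import defaultdict


def forward_closure(edges):
    """Forward chaining: materialize every entailed pair until a fixpoint."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def backward_prove(edges, start, goal):
    """Backward chaining: prove (start contains goal) by goal-directed search,
    without materializing any new pairs."""
    successors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


edges = {("site", "building")}
edges |= {("building", f"storey_{i}") for i in range(3)}
edges |= {(f"storey_{i}", f"space_{i}_{j}") for i in range(3) for j in range(4)}

closure = forward_closure(edges)
print(len(edges), len(closure))  # asserted pairs vs. asserted + materialized
print(backward_prove(edges, "site", "space_2_3"), backward_prove(edges, "storey_0", "space_1_0"))
```

It prints `16 43` and `True False`: forward chaining turns 16 asserted pairs into 43 stored pairs, while each backward query only explores the part of the graph reachable from its start node. On building models with millions of triples, this materialization overhead grows accordingly.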

Conclusion and future work
◼ Comparison of 3 different approaches
  • SPIN, EYE and Stardog
◼ 3 queries applied over 6 different building models
◼ Future work consists of
  • Further specifying this initial performance benchmark with additional data and rules
  • Executing additional queries on the rest of the set of building models
  • Comparing results on a wider scale:
    - for the individual approaches separately,
    - as well as with other approaches not considered here.

Thank you for your attention.
Pieter Pauwels, Tarcisio Mendes de Farias, Chi Zhang, Ana Roxin, Jakob Beetz, Jos De Roo, Christophe Nicolle
