© Copyright 2012 STI INNSBRUCK www.sti-innsbruck.at
Steps towards a Data Value Chain 
José M. García, University of Innsbruck 
Tirrenia (Pisa), Italy, June 2013
Contents
1. Big Data
2. Public Open Data
3. Linked (Open) Data
4. Data Economy
BIG Data
What is Big Data?
• Every day, we create 2.5 quintillion* bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
• These data come from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.
• These data are big data.**
* 10^18 (short scale)
** http://www-01.ibm.com/software/data/bigdata
What is Big Data?
• “Big data” is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools.
– White, Tom. Hadoop: The Definitive Guide. 1st Edition. O'Reilly Media, 2009, p. 3.
– MIKE2.0, Big Data Definition http://mike2.openmethodology.org/wiki/Big_Data_Definition
[Figure: Information explosion in data and real-world events (IBM)]
Big Data Application Areas 
Picture taken from http://www-01.ibm.com/software/data/bigdata/industry.html 
Use case: Climate Research
• EISCAT and EISCAT 3D are multimillion research projects doing environmental research as well as evaluation of the built infrastructures.
– Observation of climate: sun, troposphere, etc.
– Simulations, e.g. creation of artificial northern lights
– Run by the European Incoherent Scatter Association
• 1.5 petabytes of data are generated daily (1.5 million gigabytes).
– Processing this data would require 1K petaFLOPS of performance
– Or 1 billion euro in electricity costs p.a.
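The unit figures on the climate research slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes; the variable names are illustrative only.

```python
# Decimal (SI) prefixes.
GIGA = 10**9
PETA = 10**15
EXA = 10**18

daily_bytes = 15 * PETA // 10      # 1.5 petabytes per day, kept as an exact integer
daily_gigabytes = daily_bytes // GIGA
required_flops = 1_000 * PETA      # "1K petaFLOPS"

print(daily_gigabytes)             # gigabytes generated per day
print(required_flops // EXA)       # the same performance figure in exaFLOPS
```

Under these assumptions, 1.5 petabytes per day corresponds to 1.5 million gigabytes, and 1K petaFLOPS is one exaFLOPS.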
Large Scale Reasoning
• Performing deductive inference with a given set of axioms at Web scale is practically impossible
– Too many RDF triples to process
– Too much processing power is needed
– Too much time is needed
• LarKC aimed at contributing to an ‘infinitely scalable’ Semantic Web reasoning platform by
– Giving up on 100% correctness and completeness (trading quality for size)
– Including heuristic search and logic reasoning in a new process
– Massive parallelization (cluster computing)
Volumes of Data Exceed the Available Storage Volume Globally
There is a need to throw data away due to the limited storage space.
Data Stream Processing for Big Data
• Before throwing the data away, some processing can be done at run time
– Processing streams of data as they happen
• Embracing the streaming model
– Data is seen as a constant flow (sequence) of transient elements
– Fits naturally with many application domains (sensors, social media, etc.)
• “Big data” brings an inherent set of complexities
– Data structures exceeding the available memory
– Approximate/incomplete results are taken for granted
• Always look at the latest part of a dataset
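The streaming model described above can be sketched as a bounded window over an incoming sequence. This is a minimal illustration, not a method taken from the slides: the function name, the window size and the sample readings are all assumptions.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield the mean of the most recent `window_size` readings.

    Only the latest elements are kept in memory; older ones are discarded,
    so each result describes the latest part of the data, not its full history.
    """
    window = deque(maxlen=window_size)  # the oldest item is dropped automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)


# Hypothetical sensor readings arriving one at a time.
readings = [10.0, 12.0, 14.0, 16.0]
means = list(rolling_mean(readings, window_size=2))
print(means)
```

Memory use stays constant regardless of how long the stream runs, which is the trade the slide describes: approximate answers in exchange for not storing everything.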
Data Stream Processing for Big Data
• Logical reasoning in real time on multiple, heterogeneous, gigantic and inevitably noisy data streams in order to support the decision process…
-- S. Ceri, E. Della Valle, F. van Harmelen and H. Stuckenschmidt, 2010
[Figure: extremely large input streams pass through a window; the query engine takes stream subsets for query answering against a registered continuous query and emits streams of answers. Picture taken from Emanuele Della Valle, “Challenges, Approaches, and Solutions in Stream Reasoning”, Semantic Days 2012]
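The idea of a registered continuous query that is answered from stream subsets can be sketched as follows. This is an assumption-laden illustration: it uses consecutive, non-overlapping windows, and the event shape, function names and sample data are hypothetical rather than taken from any stream reasoning engine.

```python
from typing import Callable, Iterable, Iterator, List

Event = dict  # hypothetical event shape: {"sensor": str, "value": float}


def continuous_query(
    stream: Iterable[Event],
    window_size: int,
    query: Callable[[List[Event]], object],
) -> Iterator[object]:
    """Evaluate a registered query over consecutive, non-overlapping windows."""
    window: List[Event] = []
    for event in stream:
        window.append(event)
        if len(window) == window_size:
            yield query(window)  # one answer per full window
            window = []          # discard the processed elements


def max_value(window: List[Event]) -> float:
    """Example query: the largest value seen in the window."""
    return max(e["value"] for e in window)


events = [{"sensor": "s1", "value": v} for v in (3.0, 7.0, 5.0, 1.0, 9.0)]
answers = list(continuous_query(events, window_size=2, query=max_value))
print(answers)
```

The query is registered once and produces a stream of answers as windows fill; the trailing element that does not complete a window yields no answer in this sketch.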
Conclusions
• Big Data describes datasets so large and complex that they become awkward to work with using on-hand database management tools.
• Big data application domains are diverse
– Embracing a big data processing strategy can have a significant impact
• Tackling the issues of big data processing requires loosening the requirements on completeness and precision
• Big data at Web scale suffers from inherent heterogeneity and different levels of expressiveness
• Complexity is more than just size!
Public Open Data 
Public Open Data: What is Open Data?
Definitions:
• Open data is non-personally identifiable data produced in the course of an organisation’s ordinary business, which has been released under an unrestricted licence (like the Open Government Licence).
• Open public data is underpinned by the philosophy that data generated or collected by organisations in the public sector should belong to the taxpayers, wherever financially feasible and where releasing it won’t violate any laws or rights to privacy (either for citizens or government staff).
[LinkedGov project] http://linkedgov.org
Public Open Data: What is Open Data?
Definitions:
The idea behind open data is that information held by government should be freely available to use and re-mix by the public. It’s a movement to:
• make non-personal data open so that it can be turned into useful applications
• support transparency and accountability
• make sharing data between public sector partners more efficient.
The Government is committed to making much more public data openly available. On 22 March 2010, the Prime Minister announced that the Government was going to:
“...use digital technology to open up data with the aim of providing every citizen in Britain with true ownership and accountability over the services they demand from government.”
http://www.idea.gov.uk/
Public Open Data: Features of Open Data
Open Data principles [1]:
1. completeness – all data that can be open (w.r.t. privacy and security) should be open
2. primary source – all open data should be gathered at their source in raw format
3. temporal closeness – all open data should be up-to-date
4. easy access – all open data should be easily accessible
5. machine readability – all open data should be structured for machine processing
[1] Source [Kaltenböck M., Thurner T. (Eds.): Open Government Data Weißbuch, 2011]
Public Open Data: Features of Open Data
Open Data principles [1]:
6. non-discriminating – all open data should be accessible for everyone
7. open standards – all open data should use open standards
8. liberal licensing – all open data should use liberal licensing without huge obligations for potential users
9. durability – all open data should be available on a long-term basis
10. non-discriminating usage costs – some open data might involve usage costs. These should be kept as low as possible.
[1] Source [Kaltenböck M., Thurner T. (Eds.): Open Government Data Weißbuch, 2011]
Public Open Data: And it works (See UK)
• See UK uses data that have been sourced from data.gov.uk and processed into Linked Data.
• All the datasets are enriched and cross-linked to additional sources.
• The visualisation provides a view centred on a chosen region of the specified size, and most noticeably gives a "pie chart" that shows the viewer how that region compares with similar regions around it.
Public Open Data: And it works (police.uk)
• Different apps such as “Vehicle Crime & Road Accident Map”, “Crime Sounds” and “UK Crimeview” are provided
• The user can get a quick idea about different areas of cities and towns and their crime statistics.
Public Open Data
• Openness: Open Data is about changing behaviour
• Heterogeneity: Different vocabularies are used
• Interlinkage: Need to link these data sets to prevent data silos
• Linked Open Data
Linked Open Data
Motivation: From a Web of Documents to a Web of Data
• Web of Documents
• Fundamental elements:
1. Names (URIs)
2. Documents (Resources) described by HTML, XML, etc.
3. Interactions via HTTP
4. (Hyper)Links between documents or anchors in these documents
• Shortcomings:
– Untyped links
– Web search engines fail on complex queries
[Figure: “Documents” connected by hyperlinks]
Motivation: From a Web of Documents to a Web of Data
• Web of Documents
• Web of Data
[Figure: “Documents” connected by hyperlinks versus “Things” connected by typed links]
Motivation: From a Web of Documents to a Web of Data
• Web of Data
• Characteristics:
– Links between arbitrary things (e.g., persons, locations, events, buildings)
– Structure of data on Web pages is made explicit
– Things described on Web pages are named and get URIs
– Links between things are made explicit and are typed
[Figure: “Things” connected by typed links]
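The difference between an untyped hyperlink and a typed link between things can be sketched with subject, predicate, object triples. This is an illustration only: every URI below is a placeholder under example.org, and the vocabulary terms are invented for the example.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str    # URI naming a thing
    predicate: str  # URI naming the type of the link
    obj: str        # URI of the linked thing


# Illustrative placeholder URIs, not real identifiers.
EX = "http://example.org/"
triples = [
    Triple(EX + "person/alice", EX + "vocab/livesIn", EX + "place/innsbruck"),
    Triple(EX + "person/alice", EX + "vocab/attended", EX + "event/summer-school"),
    Triple(EX + "event/summer-school", EX + "vocab/heldIn", EX + "place/pisa"),
]


def linked_things(subject: str, predicate: str) -> List[str]:
    """Follow links of one type from a thing; an untyped hyperlink cannot express this."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(linked_things(EX + "person/alice", EX + "vocab/livesIn"))
```

Because the link type is itself named by a URI, a query can distinguish "lives in" from "attended" instead of treating both as the same kind of hyperlink.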
Google Knowledge Graph
• “A huge knowledge graph of interconnected entities and their attributes”. Amit Singhal, Senior Vice President at Google
• “A knowledge base used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources” http://en.wikipedia.org/wiki/Knowledge_Graph
• Based on information derived from many sources including Freebase, CIA World Factbook, Wikipedia
• Contains about 3.5 billion facts about 500 million objects
Google Knowledge Graph 
http://goo.gl/zp3IH 
Linked Data – a definition and principles
• Linked Data is about the use of Semantic Web technologies to publish structured data on the Web and set links between data sources.
Figure from C. Bizer
Linked Data – a definition and principles
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
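Principles 2 and 3 can be sketched as an HTTP lookup that asks for an RDF serialization through the Accept header. This is a minimal sketch using only the Python standard library; the URI is a placeholder, and whether a given server honours the requested media type is an assumption.

```python
from urllib.request import Request, urlopen


def build_lookup(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare an HTTP GET that asks for an RDF serialization of the named thing."""
    return Request(uri, headers={"Accept": media_type})


def dereference(uri: str, media_type: str = "text/turtle") -> str:
    """Send the request and return the response body; requires network access."""
    with urlopen(build_lookup(uri, media_type), timeout=10) as response:
        return response.read().decode("utf-8")


# Placeholder URI; a real Linked Data URI would be substituted here.
request = build_lookup("http://example.org/resource/Innsbruck")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Only the request is built in the example, so it runs without a network connection; calling `dereference` on a real Linked Data URI would fetch the description itself.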
5-star Linked OPEN Data
★ Available on the web (whatever format) but with an open licence, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ As (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (URIs, RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
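The cumulative nature of the 5-star scheme, where each level requires all levels below it, can be sketched as a small rating function. The `Dataset` descriptor and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical descriptor; field names are illustrative, not a standard schema.
    open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool
    links_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: each level requires all the levels below it."""
    levels = [
        ds.open_licence,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed CSV file without URIs or outgoing links.
csv_dump = Dataset(True, True, True, False, False)
print(star_rating(csv_dump))
```

A dataset without an open licence scores zero in this sketch regardless of its format, which mirrors the first star being the entry condition for Open Data.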
★ Available on the web (+ open licence)
• Easy to publish web data
• Data can be easily accessed and stored locally
• Data can be entered manually into another system
Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
★★ Available as structured data
• All benefits from ★
• Data can be directly processed with proprietary software
• Easy to export it into another structured format
Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
★★★ Non-proprietary format is used
• All benefits from ★★
• No need to pay for a format controlled by a single organization
Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
★★★★ Use open standards to identify things
• All benefits from ★★★
• Link to data from anywhere, either on the web or locally
• It can be bookmarked and parts of the data can be reused
• Access to data items can be optimized (caching, load balancing, etc.)
• BUT the publisher needs to identify separable items, assign URIs to each one, and
Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
★★★★★ Data is linked to provide context
• All benefits from ★★★★
• New data of interest can be discovered while consuming other data
• Data schema can be obtained
• Added value to the data
• Linked datasets are discoverable
• BUT resources have to be invested to link datasets
Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
LOD Cloud May 2007
Basics:
The Linked Open Data cloud is an interconnected set of datasets all of which were published and interlinked following the Linked Data principles.
Facts:
• Focal points:
– DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links
– Music-related datasets
• Big datasets include FOAF, US Census data
• Size approx. 1 billion triples, 250k links
Figure from http://linkeddata.org/
LOD Cloud March 2009 
Figure from http://linkeddata.org/ 
LOD Cloud September 2011
Facts:
• 295 data sets
• Over 31 billion triples
• Over 504 million RDF links between data sources
Figure from http://linkeddata.org/
Linked Open Data – silver bullet for data integration
• Linked Open Data can be seen as a global data integration platform
– Heterogeneous data items from different data sets are linked to each other following the Linked Data principles
– Widely deployed vocabularies (e.g. FOAF) provide the predicates to specify links between data items
• Data integration with LOD requires:
1. Access to Linked Data
• HTTP, SPARQL endpoints, RDF dumps
• Crawling and caching
2. Normalize vocabularies – data sets that overlap in content use different vocabularies
• Use schema mapping techniques based on rules (e.g. RIF, SWRL) or query languages (e.g. SPARQL CONSTRUCT, etc.)
3. Resolve identities – data sets that overlap in content use different URIs for the same real-world entities
• Use manual merging or approaches such as Silk (part of the Linked Data Integration Framework) or LIMES
4. Filter data
• Use Sieve (part of the Linked Data Integration Framework)
See: http://www4.wiwiss.fu-berlin.de/bizer/ldif/
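The four integration steps above can be sketched over plain (subject, predicate, object) tuples. This is a toy illustration in pure Python, not how LDIF, Silk or LIMES work internally; all URIs, the mapping tables and the `integrate` function are assumptions made for the example.

```python
# Triples are (subject, predicate, object) tuples; all URIs are illustrative placeholders.
source_a = [("http://a.example/innsbruck", "http://a.example/vocab/name", "Innsbruck")]
source_b = [
    ("http://b.example/city/42", "http://b.example/schema/label", "Innsbruck"),
    ("http://b.example/city/42", "http://b.example/schema/internalId", "42"),
]

# Step 2: map source-specific predicates onto one target vocabulary.
VOCAB_MAP = {
    "http://a.example/vocab/name": "http://target.example/name",
    "http://b.example/schema/label": "http://target.example/name",
}

# Step 3: URIs known to denote the same real-world entity
# (in practice the output of manual merging or a link discovery tool).
SAME_AS = {"http://b.example/city/42": "http://a.example/innsbruck"}

# Step 4: keep only the predicates the application cares about.
WANTED = {"http://target.example/name"}


def integrate(*sources):
    """Merge sources into one set of triples with shared vocabulary and identifiers."""
    merged = set()
    for triples in sources:          # step 1: data is assumed to be fetched already
        for s, p, o in triples:
            p = VOCAB_MAP.get(p, p)  # normalize vocabulary
            s = SAME_AS.get(s, s)    # resolve identity
            if p in WANTED:          # filter
                merged.add((s, p, o))
    return merged


result = integrate(source_a, source_b)
print(result)
```

After normalization and identity resolution the two sources state the same fact, so the merged set holds a single triple; the internal identifier from the second source is filtered out.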
Example – Mashup: DBpedia Mobile
• Geospatial entry point into the Web of Data.
• It exploits information coming from DBpedia, Revyu and Flickr data.
• It provides a way to explore maps of cities and gives pointers to more information which can be explored
Try yourself: http://wiki.dbpedia.org/DBpediaMobile
Pictures from DBpedia Mobile
From Intranet to Enterprise Data Web around a knowledge hub
Data Economy 
“Your data is worth more if you give it away.”
Commission Vice President Neelie Kroes
What is Data Economy?
• Non-tangible assets (i.e. data) play a significant role in the creation of economic value
• Data is nowadays more important than, for example, search or advertisement
• The value of the data, its potential to be used to create new products and services, is more important than the data itself
Why a Data Economy?
• Total market for public sector information: €28 billion in 2008 for the EU27
• Annual growth of 7%, leading to around €32 billion in 2010
• Estimated €40 billion annual boost for the European economy.
• The total direct and indirect economic gains across the whole EU27 economy would be in the order of €140 billion annually.
See: Review of recent studies on PSI re-use and related market developments, G. Vickery, August 2011.
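The step from €28 billion in 2008 to around €32 billion in 2010 follows from compounding the 7% growth over two years. A short check, assuming the growth rate applies once per year:

```python
market_2008 = 28.0    # billion euro, EU27, as stated on the slide
annual_growth = 0.07  # 7% per year

# Compound the growth over the two years from 2008 to 2010.
market_2010 = market_2008 * (1 + annual_growth) ** 2
print(round(market_2010, 1))
```

The result is roughly 32 billion euro, consistent with the slide's figure.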
Why a Data Economy?
• New businesses can be built on the back of these data: Data are an essential raw material for a wide range of new information products and services which build on new possibilities to analyse and visualise data from different sources. Facilitating re-use of these raw data will create jobs and thus stimulate growth.
• More Transparency: Open data is a powerful tool to increase the transparency of public administration, improving the visibility of previously inaccessible information, informing citizens and business about policies, public spending and outcomes.
• Evidence-based policy making and administrative efficiency: The availability of solid EU-wide public data will lead to better evidence-based policy making at all levels of government, resulting in better public services and more efficient public spending.
See: http://europa.eu/rapid/pressReleasesAction.do?reference=MEMO11/891&format=HTML&aged=0&language=EN&guiLanguage=en
Combining Open Data and Services – Tourist Map Austria
• Use LOD to integrate and look up data about
– places and routes
– timetables for public transport
– hiking trails
– ski slopes
– points of interest
Combining Open Data and Services – Tourist Map Austria
LOD data sets
• OpenStreetMap
• Google Places
• Databases of government
– TIRIS
– DVT
• Tourism & Ticketing association
• IVB (buses and trams)
• OEBB (trains)
• Ärztekammer
• Supermarket chains: listing of products
• Hofer and similar: weekly offers
• ASFINAG: Traffic/Congestion data
• Herold (yellow pages)
• City archive
• Museums/Zoo
• News sources like TT (Tyrol's major daily newspaper)
• Statistik Austria
• Innsbruck Airport (travel times, airline schedules)
• ZAMG (Weather)
• University of Innsbruck (Curricula, student statistics, study possibilities)
• IKB (electricity, water consumption)
• Entertainment facilities (Stadtcafe, Cinema...)
• Special offers (Groupon)
Combining Open Data and Services – Tourist Map Austria
• Data and services from destination sites integrated for recommendation and booking of
– Hotels
– Restaurants
– Cultural and entertainment events
– Sightseeing
– Shops
Combining Open Data and Services – Tourist Map Austria
• Web scraping integration
• Create wrappers for current web sites and extract data automatically
• Many Web scraping tools available on the market
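A wrapper in the sense above can be sketched with the HTML parser from the Python standard library. The class name `offer`, the sample markup and the `OfferWrapper` class are hypothetical; a real wrapper is written against the structure of the specific site, and this simple depth counter does not handle unclosed void elements such as a bare br tag inside an offer.

```python
from html.parser import HTMLParser


class OfferWrapper(HTMLParser):
    """Collect the text of every element whose class attribute is exactly `offer`."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside an offer element
        self.offers = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1                      # nested tag inside an offer
        elif dict(attrs).get("class") == "offer":
            self._depth = 1                       # entering a new offer
            self.offers.append("")

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.offers[-1] += data               # accumulate text of the current offer


# Hypothetical page fragment from a destination site.
page = (
    '<ul><li class="offer">Bike rental <b>15 EUR</b></li>'
    '<li>Imprint</li>'
    '<li class="offer">Ski pass 48 EUR</li></ul>'
)
wrapper = OfferWrapper()
wrapper.feed(page)
print(wrapper.offers)
```

The extracted strings could then be turned into structured records; a wrapper like this breaks whenever the site changes its markup, which is one reason scraping is framed on the later slide as a quick first step before backend integration.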
Combining Open Data and Services – Tourist Map Austria
• Integration into a comprehensive map of multi-channel communication, seekda booking engine, Linked Open Data and on-the-fly service integration as you pay, to generate added value for businesses as well as customers
• Combination of multi-channel communication and yield management
– Semantic Communication Engine Innsbruck (SCEI)
– seekda booking solutions
• enriched with Linked (Open) Data
– Machine-understandable interlinked data
– Bike and hiking trails, sight information, etc.
• and on-the-fly service integration as you pay
– Solutions for ad-hoc service integration for touristic destination sites
– Bike rental, ski passes, etc.
– Services are quickly integrated through scraping (and later through legal frameworks and backend integration in case of business volumes)
Combining Open Data and Services – Tourist Map Austria
• Based on OpenStreetMap
• Increase on-line visibility for hotels and destinations via multi-channel communication – SCEI
• Hotels, ski passes, etc. are directly bookable – seekda engine
• LOD to integrate and look up data about hiking trails, ski slopes, etc.
• On-the-fly service integration as you pay
“There's No Money in Linked (Open) Data”
http://knoesis.wright.edu/faculty/pascal/pub/nomoneylod.pdf
• It turns out that using LOD datasets in realistic settings is not always easy.
– Surprisingly, in many cases the underlying issues are not technical but legal barriers erected by the LD data publishers.
– Generally, mostly non-technical but socio-economic barriers hamper the reuse of data (do patents and IPR protections hamper or facilitate knowledge reuse?).
– Business intelligence
– Dynamic data
– On-the-fly generation of data
Thanks for your attention 
61

More Related Content

Steps towards a Data Value Chain

  • 1. ©w Cwowp.ysrtiig-ihntn 2s0b1r2u c k S.aTtI INNSBRUCK www.sti-innsbruck.at Steps towards a Data Value Chain José M. García, University of Innsbruck Tirrenia(Pisa), Italy, June 2013
  • 2. www.sti-innsbruck.at Contents 1. Big Data 2. Public Open Data 3. Linked (Open) Data 4. Data Economy 2
  • 3. ©w Cwowp.ysrtiig-ihntn 2s0b1r2u c k S.aTtI INNSBRUCK www.sti-innsbruck.at BIG Data
  • 4. www.sti-innsbruck.at What is Big Data? • Every day, we create 2.5 quintillion* bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. • These data come from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. • These data are big data.** 30 * 10 ** http://www-01.ibm.com/software/data/bigdata 4
  • 5. www.sti-innsbruck.at What is Big Data? • “Big data”is a loosely-defined term • used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools. – White, Tom. Hadoop: The Definitive Guide. 2009. 1st Edition. O'Reilly Media. Pg 3. – MIKE2.0, Big Data Definition http://mike2.openmethodology.org/wiki/Big_Data_Definition Infromation Explosion in data and real world events (IBM) 5
  • 6. www.sti-innsbruck.at Big Data Application Areas Picture taken from http://www-01.ibm.com/software/data/bigdata/industry.html 6
  • 7. www.sti-innsbruck.at Use case : Climate Research • Eiscat and Eiscat 3D are multimillion reserch projects doing environmental research as well as evaluation of the built infrastructures. – Observation of climate: sun, troposphere, etc. – Simulations, e.g. Creation of artificial Nothern light – Run by European Incoherent Scatter Association • 1,5 Petabytes of data are generated daily (1,5 Million Gigabytes). – Processing of this data would require1K petaFLOPSperformance – Or1 billionEuro electricitycostsp.a. 7
  • 8. www.sti-innsbruck.at Large Scale Reasoning • Performing deductive inference with a given set of axioms at the Web scale is practically impossible – Too manyRDFtriples to process – Too much processing power is needed – Too much time is needed • LarKCaimed at contributing to an ‘infinitely scalable’ Semantic Web reasoning platform by – Giving up on 100% correctness and completeness (trading quality for size) – Include heuristic search and logic reasoning into a new process – Massive parallelization (cluster computing) 8
  • 10. www.sti-innsbruck.at Volumes of Data Exceed the Availale Storage Volume Globally There is a need to throw the data away due to the limited storage space. 10
  • 11. www.sti-innsbruck.at Data Stream Processing for Big Data • Before throwing the data away some processing can be done at run- time – Processing streams of data as they happen • Embracing the streaming model – Data is seen as a constant flow (sequence) of transient elements – Fits naturally with many application domains (sensors, social media, etc.) • “Big data” is bringing an inherent set of complexities – Data structures exceeding the available memory – Approximate/incomplete results are taken as granted • Always look at the latest part of a dataset 11
  • 12. www.sti-innsbruck.at Data Stream Processing for Big Data • Logical reasoning in real time on multiple, heterogeneous, gigantic and inevitably noisy data streams in order to support the decision process… --S. Ceri,E. Della Valle,F. van HarmelenandH. Stuckenschmidt, 2010 window Extremely large input streams streams of answer Registered Continuous Query Picture taken from EmanueleDella Valle “Challenges, Approaches, and Solutions in Stream Reasoning”, Semantic Days 2012 Query engine takes stream subsets for query answering 12
  • 13. www.sti-innsbruck.at Conclusions • Big Data describes datasets so large and complex that they become awkward to work with using on-hand database management tools. • Big data application domains are diverse – Embracing a big data processing strategy can have a significant impact • Tacking the issues of big data processing requires to loose the requirements on completeness and precision • Big data on Web scale suffers from an inherent heterogeneity and different levels of expressiveness • Complexity is more than just size! 13
  • 14. ©w Cwowp.ysrtiig-ihntn 2s0b1r2u c k S.aTtI INNSBRUCK www.sti-innsbruck.at Public Open Data Open Data
  • 15. www.sti-innsbruck.at Public Open Data: What is Open Data? Definitions: • Open data is non-personally identifiable data produced in the course of an organisation’sordinary business, which has been released under an unrestricted licence(like the Open Government Licence). •Open public data is underpinned by the philosophy that data generated or collected by organisationsin the public sector should belong to the taxpayers, wherever financially feasible and where releasing it won’t violate any laws or rights to privacy (either for citizens or government staff). [linkedgovproject] http://linkedgov.org 15
  • 16. www.sti-innsbruck.at Public Open Data: What is Open Data? Definitions: The idea behind open data is that information held by government should be freely available to use and re-mix by the public. It’s a movement to make non-personal data: •open so that it can be turned into useful applications •support transparency and accountability •make sharing data between public sector partners more efficient. The Government is committed to making much more public data openly available. On 22 March 2010, the Prime Minister announced that the Government was going to: “...use digital technology to open up data with the aim of providing every citizen in Britain with true ownership and accountability over the services they demand from government.” http://www.idea.gov.uk/ 16
  • 17. www.sti-innsbruck.at Public Open Data: Features of Open Data Open Data principles [1]: 1. completeness–all data that can be open (w.r.t. privacy and security) should be open 2. primary source–all open data should be gathered at their source in raw format 3. temporal closeness–all open data should be up-to-date 4. easy access–all open data should be easily accessible 5. machine readability–all open data should be structured for machine processing [1] Source [Kaltenböck M., Thurner T., (Hg.): Open Government Data Weißbuch, 2011] 17
  • 18. www.sti-innsbruck.at Public Open Data: Features of Open Data Open Data principles [1]: 6. non-discriminating–all open data should be accessible for everyone 7. open standards–all open data should use open standards 8. liberal licensing–all open data should use a liberal licensing without huge obligations for potential users 9. durability–all open data should be available on a long term basis 10. non-discriminating usage costs–some open data might involve usage costs. These should be kept as low as possible. [1] Source [Kaltenböck M., Thurner T., (Hg.): Open Government Data Weißbuch, 2011] 18
  • 19. www.sti-innsbruck.at Public Open Data: Anditworks(SeeUK) 19
  • 20. www.sti-innsbruck.at Public Open Data: Anditworks(SeeUK) • See UK uses data that have been sourced from data.gov.uk and processed into Linked Data . • All the datasets are enriched and cross-linked to additional sources. • The visualisationprovides a view centredon a chosen region of the specified size, and most noticeably gives a "pie- chart" that shows the viewer how that region compares with similar regions around it. 20
  • 21. www.sti-innsbruck.at Public Open Data: Anditworks(police.uk) 21
  • 22. www.sti-innsbruck.at Public Open Data: Anditworks(police.uk) • Different appssuch as„VehicleCrime & Road AccidentMap“, „Crime Sounds“ and„UK Crimeview“ areprovided • The usercangeta quick ideaaboutdifferent areasofcitiesandtownsandtheircrimestatistics. 22
  • 23. www.sti-innsbruck.at Public Open Data • Openess: Open Data is about changing behaviour • Heterogenity: Different vocabularies are used • Interlinkage: Need to link these data sets to prevent data silos • LinkedOpen Data 23
  • 24. ©w Cwowp.ysrtiig-ihntn 2s0b1r2u c k S.aTtI INNSBRUCK www.sti-innsbruck.at Linked Open Data
  • 25. www.sti-innsbruck.at Motivation: From a Web of Documents to a Web of Data • Web of Documents • Fundamental elements: 1.Names(URIs) 2.Documents(Resources) described by HTML, XML, etc. 3.Interactionsvia HTTP 4.(Hyper)Linksbetween documents or anchors in these documents •Shortcomings: –Untyped links –Web search engines fail on complex queries “Documents” Hyperlinks 25
  • 26. www.sti-innsbruck.at Motivation: From a Web of Documents to a Web of Data • Web of Documents • Web of Data “Documents” “Things” Hyperlinks Typed Links 26
  • 27. www.sti-innsbruck.at Motivation: From a Web of Documents to a Web of Data • Characteristics: –Links between arbitrary things (e.g., persons, locations, events, buildings) –Structure of data on Web pages is made explicit –Things described on Web pages are named and get URIs –Links between things are made explicit and are typed • Web of Data “Things” Typed Links 27
  • 28. www.sti-innsbruck.at Google Knowledge Graph • “A huge knowledge graph of interconnected entities and their attributes”. AmitSinghal, Senior Vice President at Google • “A knowledge based used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources” http://en.wikipedia.org/wiki/Knowledge_Graph • Based on information derived from many sources including Freebase, CIA World Factbook, Wikipedia • Contains about 3.5 billion facts about 500 million objects 28
  • 29. www.sti-innsbruck.at Google Knowledge Graph http://goo.gl/zp3IH 29
  • 30. www.sti-innsbruck.at Linked Data –a definition and principles • Linked Datais about the use of Semantic Web technologies to publish structured data on the Web and set links between data sources. Figure from C. Bizer 30
  • 31. www.sti-innsbruck.at Linked Data –a definition and principles 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up (dereference) those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) 4. Include links to other URIs. so that they can discover more things. 31
  • 32. www.sti-innsbruck.at 5-star Linked OPEN Data ★Available on the web (whatever format) but with an open licence, to be Open Data ★★Available as machine- readable structured data (e.g. excel instead of image scan of a table) ★★★as (2) plus non-proprietary format (e.g. CSV instead of excel) ★★★★All the above plus, Use open standards from W3C (URIs, RDF and SPARQL) to identify things, so that people can point at your stuff ★★★★★All the above, plus: Link your data to other people’s data to provide context 32
  • 33. ★ Available on the web (+ open licence) • Easy to publish web data • Data can be easily accessed and stored locally • Data can be entered manually into another system Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
  • 34. ★★ Available as structured data • All benefits from ★ • Data can be directly processed with proprietary software • Easy to export it into another structured format Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
  • 35. ★★★ Non-proprietary format is used • All benefits from ★★ • No need to pay for a format controlled by a single organization Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
  • 36. ★★★★ Use open standards to identify things • All benefits from ★★★ • Link to data from anywhere, either on the web or locally • It can be bookmarked and parts of the data can be reused • Access to data items can be optimized (caching, load balancing, etc.) • BUT the publisher needs to identify separable items, assign URIs to each one, and Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
  • 37. ★★★★★ Data is linked to provide context • All benefits from ★★★★ • New data of interest can be discovered while consuming other • Data schema can be obtained • Added value to the data • Linked datasets are discoverable • BUT resources have to be invested to link datasets Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]
  • 38. LOD Cloud May 2007 Figure from http://linkeddata.org/
  • 39. LOD Cloud May 2007 Basics: The Linked Open Data cloud is an interconnected set of datasets, all of which were published and interlinked following the Linked Data principles. Facts: • Focal points: • DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links • Music-related datasets • Big datasets include FOAF, US Census data • Size approx. 1 billion triples, 250k links Figure from http://linkeddata.org/
  • 40. LOD Cloud March 2009 Figure from http://linkeddata.org/
  • 41. LOD Cloud September 2011 Figure from http://linkeddata.org/
  • 42. LOD Cloud September 2011 Facts: • 295 data sets • Over 31 billion triples • Around 504 million RDF links between data sources Figure from http://linkeddata.org/
  • 43. Linked Open Data – silver bullet for data integration • Linked Open Data can be seen as a global data integration platform – Heterogeneous data items from different data sets are linked to each other following the Linked Data principles – Widely deployed vocabularies (e.g. FOAF) provide the predicates to specify links between data items • Data integration with LOD requires: 1. Access to Linked Data • HTTP, SPARQL endpoints, RDF dumps • Crawling and caching 2. Normalize vocabularies – data sets that overlap in content use different vocabularies • Use schema mapping techniques based on rules (e.g. RIF, SWRL) or query languages (e.g. SPARQL CONSTRUCT, etc.) 3. Resolve identifiers – data sets that overlap in content use different URIs for the same real world entities • Use manual merging or approaches such as Silk (part of Linked Data Integration Framework) or LIMES 4. Filter data • Use Sieve (part of Linked Data Integration Framework) See: http://www4.wiwiss.fu-berlin.de/bizer/ldif/
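Steps 2 to 4 of the integration list can be sketched on plain (subject, predicate, object) tuples. This is a hand-rolled illustration under stated assumptions, not the API of Silk, LIMES, Sieve or LDIF: the two lookup tables are hypothetical and would in practice be produced by schema-mapping rules and a link-discovery tool.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical step 2 table: source-specific predicate -> shared vocabulary term.
PREDICATE_MAP = {"http://example.org/vocabB/fullName": FOAF_NAME}

# Hypothetical step 3 table: alias URI -> canonical URI for the same entity.
SAME_AS = {"http://b.example.org/id/42": "http://a.example.org/person/alice"}


def integrate(triples, predicate_map, same_as, keep_predicates=None):
    """Normalize vocabularies, resolve identifiers, then filter.

    Returns a set, so statements that become identical after
    normalization are merged automatically.
    """
    result = set()
    for s, p, o in triples:
        p = predicate_map.get(p, p)   # step 2: normalize vocabulary
        s = same_as.get(s, s)         # step 3: resolve identifiers
        o = same_as.get(o, o)
        if keep_predicates is None or p in keep_predicates:  # step 4: filter
            result.add((s, p, o))
    return result


# Two data sets describing the same person with different URIs and vocabularies.
source_a = [("http://a.example.org/person/alice", FOAF_NAME, "Alice")]
source_b = [("http://b.example.org/id/42", "http://example.org/vocabB/fullName", "Alice")]

merged = integrate(source_a + source_b, PREDICATE_MAP, SAME_AS, {FOAF_NAME})
```

After integration the two source statements collapse into a single triple about the canonical URI, which is the effect the slide describes for overlapping data sets.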
  • 44. Example – Mashup: DBpedia Mobile • Geospatial entry point into the Web of Data. • It exploits information coming from DBpedia, Revyu and Flickr data. • It provides a way to explore maps of cities and gives pointers to more information which can be explored. Try yourself: http://wiki.dbpedia.org/DBpediaMobile Pictures from DBpedia Mobile
  • 45. From Intranet to Enterprise Data Web around a knowledge hub
  • 46. Data Economy “Your data is worth more if you give it away.” Commission Vice President Neelie Kroes
  • 47. What is Data Economy? • Intangible assets (i.e. data) play a significant role in the creation of economic value • Data is nowadays more important than, for example, search or advertisement • The value of the data, its potential to be used to create new products and services, is more important than the data itself
  • 48. Why a Data Economy? • Total market for public sector information: €28 billion in 2008 for the EU27 • Annual growth of 7%, leading to around €32 billion in 2010 • Estimated €40 billion annual boost for the European economy. • The total direct and indirect economic gains across the whole EU27 economy would be in the order of €140 billion annually. See: Review of recent studies on PSI re-use and related market developments, G. Vickery, August 2011.
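The first two figures on this slide are consistent with each other: compounding 7% annual growth over the two years from 2008 to 2010 takes €28 billion to roughly €32 billion. A short check, assuming simple annual compounding (the helper name is illustrative):

```python
def compound(value, annual_rate, years):
    """Value after `years` of growth at `annual_rate`, compounded annually."""
    return value * (1 + annual_rate) ** years


# PSI market size in billions of euros: 2008 figure grown at 7% for two years.
psi_2010 = compound(28.0, 0.07, 2)
```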
  • 49. Why a Data Economy? • New businesses can be built on the back of these data: Data are an essential raw material for a wide range of new information products and services which build on new possibilities to analyse and visualise data from different sources. Facilitating re-use of these raw data will create jobs and thus stimulate growth. • More Transparency: Open data is a powerful tool to increase the transparency of public administration, improving the visibility of previously inaccessible information, informing citizens and business about policies, public spending and outcomes. • Evidence-based policy making and administrative efficiency: The availability of solid EU-wide public data will lead to better evidence-based policy making at all levels of government, resulting in better public services and more efficient public spending. See: http://europa.eu/rapid/pressReleasesAction.do?reference=MEMO11/891&format=HTML&aged=0&language=EN&guiLanguage=en
  • 50. Combining Open Data and Services – Tourist Map Austria • Use LOD to integrate and look up data about – places and routes – time-tables for public transport – hiking trails – ski slopes – points-of-interest
  • 51. Combining Open Data and Services – Tourist Map Austria LOD data sets • OpenStreetMap • Google Places • Databases of government – TIRIS – DVT • Tourism & Ticketing association • IVB (buses and trams) • OEBB (trains) • Ärztekammer • Supermarket chains: listing of products • Hofer and similar: weekly offers • ASFINAG: Traffic/Congestion data • Herold (yellow pages) • City archive • Museums/Zoo • News sources like TT (Tyrol's major daily newspaper) • Statistik Austria • Innsbruck Airport (travel times, airline schedules) • ZAMG (Weather) • University of Innsbruck (Curricula, student statistics, study possibilities) • IKB (electricity, water consumption) • Entertainment facilities (Stadtcafe, Cinema...) • Special offers (Groupon)
  • 52. Combining Open Data and Services – Tourist Map Austria • Data and services from destination sites integrated for recommendation and booking of – Hotels – Restaurants – Cultural and entertainment events – Sightseeing – Shops
  • 53. Combining Open Data and Services – Tourist Map Austria • Web scraping integration • Create wrappers for current web sites and extract data automatically • Many Web scraping tools available on the market
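A wrapper in this sense is a small program that knows the layout of one web site and turns its HTML into structured records. The sketch below uses only Python's standard `html.parser`; the page markup, the class names (`offer`, `name`, `price`) and the sample values are invented for illustration and do not come from any real destination site.

```python
from html.parser import HTMLParser


class OfferWrapper(HTMLParser):
    """Extracts {name, price} records from a hypothetical offers page."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "offer":
            self.offers.append({})          # a new record starts
        elif tag == "span" and css_class in ("name", "price") and self.offers:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            record = self.offers[-1]
            record[self._field] = record.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_offers(html):
    """Run the wrapper over an HTML string and return the extracted records."""
    wrapper = OfferWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.offers


# Invented sample page standing in for a scraped site.
SAMPLE = (
    '<ul>'
    '<li class="offer"><span class="name">Bike rental</span>'
    '<span class="price">15 EUR</span></li>'
    '<li class="offer"><span class="name">Ski pass</span>'
    '<span class="price">45 EUR</span></li>'
    '</ul>'
)

offers = extract_offers(SAMPLE)
```

Wrappers of this kind break whenever the site layout changes, which is one reason the deck later mentions moving to backend integration once business volumes justify it.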
  • 54. Combining Open Data and Services – Tourist Map Austria • Integration into a comprehensive map of multi-channel communication, seekda booking engine, Linked Open Data and on-the-fly service integration as you pay, to generate added value for businesses as well as customers • Combination of multi-channel communication and yield management – Semantic Communication Engine Innsbruck (SCEI) – seekda booking solutions • enriched with Linked (Open) Data – Machine-understandable interlinked data – Bike and hiking trails, sight information, etc. • and on-the-fly service integration as you pay – Solutions for ad-hoc service integration for touristic destination sites – Bike rental, ski passes, etc. – Services are quickly integrated through scraping (and later through legal frameworks and backend integration in case of business volumes)
  • 55. Combining Open Data and Services – Tourist Map Austria • Based on OpenStreetMap
  • 56. Combining Open Data and Services – Tourist Map Austria • Based on OpenStreetMap • Increase on-line visibility for hotels and destinations via multi-channel communication – SCEI
  • 57. Combining Open Data and Services – Tourist Map Austria • Based on OpenStreetMap • Increase on-line visibility for hotels and destinations via multi-channel communication – SCEI • Hotels, ski passes, etc. are directly bookable – seekda engine
  • 58. Combining Open Data and Services – Tourist Map Austria • Based on OpenStreetMap • Increase on-line visibility for hotels and destinations via multi-channel communication – SCEI • Hotels, ski passes, etc. are directly bookable – seekda engine • LOD to integrate and look up data about hiking trails, ski slopes, etc.
  • 59. Combining Open Data and Services – Tourist Map Austria • Based on OpenStreetMap • Increase on-line visibility for hotels and destinations via multi-channel communication – SCEI • Hotels, ski passes, etc. are directly bookable – seekda engine • LOD to integrate and look up data about hiking trails, ski slopes, etc. • On-the-fly service integration as you pay
  • 60. “There's No Money in Linked (Open) Data” http://knoesis.wright.edu/faculty/pascal/pub/nomoneylod.pdf • It turns out that using LOD datasets in realistic settings is not always easy. – Surprisingly, in many cases the underlying issues are not technical but legal barriers erected by the Linked Data publishers. – Generally, it is mostly non-technical, socio-economic barriers that hamper the reuse of data (do patents and IPR protections hamper or facilitate knowledge reuse?). – Business intelligence – Dynamic Data – On-the-fly generation of data
  • 61. Thanks for your attention