Data 2.0: a new way of integrating data?
Neil Chue Hong
SC07, Reno

Summary
From Data Grids to Data Services
The Rise of Web 2.0
Towards Data 2.0

Grid versus Users
The Grid is about: sharing resources, interoperable middleware, allowing bigger problems to be tackled, integrating communities, improving security, bringing together data.
Users want to: access more resources, ignore middleware, solve bigger problems, form communities, have simple security, bring together data.
The Grid and its users want very similar things, yet there is still a “want-got gap” between them. How can it be bridged?

Data Grids
The first generation of Grids concentrated on Compute Grids, harnessing capacity to improve capability.
Then came the first Data Grids: mechanisms for dealing with the large amounts of data generated by sensors and simulations.

Data Challenges
Diversity: of data resource types, vendors, middleware, schemas and metadata.
Scale: of collections and formats, and of geographical, political and social distance.
Ownership: at individual, group and organisation levels; intersecting yet independent.
Security: for client, service and data owner; at many levels, with many tradeoffs.

Move towards data services
A data service is a defined interface to a stored collection of data, e.g. Google and Amazon.
But the data behind it could be replicated, shared, federated, virtual or incomplete.
Data services should improve the ability to discover, reference, annotate and search data, and to provide its provenance.
Make access transparent. Make integration easy. Make management simple.

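To make "a defined interface to a stored collection of data" concrete, here is a minimal Python sketch, assuming a hypothetical `DataService` protocol with a single `query` method. It shows how a federating layer can present replicated or federated collections as one service, so that access is transparent to the caller. The class names, the dictionary-based records and the de-duplication key are illustrative assumptions, not the API of any Grid middleware.

```python
from typing import Dict, Iterable, List, Protocol

Record = Dict[str, object]


class DataService(Protocol):
    """Hypothetical uniform interface: every data service answers the same call."""

    def query(self, predicate: Dict[str, object]) -> List[Record]: ...


class InMemoryService:
    """Stand-in for one stored collection exposed through the interface."""

    def __init__(self, name: str, records: Iterable[Record]) -> None:
        self.name = name
        self.records = list(records)

    def query(self, predicate: Dict[str, object]) -> List[Record]:
        # Return the records whose fields match every key/value in the predicate.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in predicate.items())]


class FederatedService:
    """Presents several services as one; the caller never sees where data lives."""

    def __init__(self, members: List[DataService], key: str) -> None:
        self.members = members
        self.key = key  # field used to drop duplicates returned by replicas

    def query(self, predicate: Dict[str, object]) -> List[Record]:
        seen = set()
        merged: List[Record] = []
        for member in self.members:
            for record in member.query(predicate):
                if record[self.key] not in seen:  # a replicated record is kept once
                    seen.add(record[self.key])
                    merged.append(record)
        return merged


# Two hypothetical sites; record 2 is replicated at both.
site_a = InMemoryService("site_a", [{"id": 1, "kind": "survey"}, {"id": 2, "kind": "sensor"}])
site_b = InMemoryService("site_b", [{"id": 2, "kind": "sensor"}, {"id": 3, "kind": "survey"}])
federation = FederatedService([site_a, site_b], key="id")
print([r["id"] for r in federation.query({"kind": "sensor"})])  # [2]
print([r["id"] for r in federation.query({"kind": "survey"})])  # [1, 3]
```

Because `FederatedService` exposes the same `query` method as its members, a federation can itself be a member of another federation; that composability is one reason to fix the interface first.
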
Grid Data Services
Data middleware provides a way of publishing data in a uniform way, so that it is accessible, discoverable and searchable.
It also provides tools such as registries, replica catalogs and mediators.

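Two of the tools named above, a registry and a replica catalog, can be sketched in a few lines. This is a hypothetical, in-memory illustration: `Registry`, `ReplicaCatalog`, the keyword search and the example endpoints are invented for the example and do not correspond to any particular middleware's interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceEntry:
    name: str
    endpoint: str        # hypothetical address, for illustration only
    keywords: List[str]


@dataclass
class Registry:
    """Services publish a description; clients discover them by keyword."""
    entries: List[ServiceEntry] = field(default_factory=list)

    def publish(self, entry: ServiceEntry) -> None:
        self.entries.append(entry)

    def search(self, keyword: str) -> List[ServiceEntry]:
        return [e for e in self.entries if keyword in e.keywords]


@dataclass
class ReplicaCatalog:
    """Maps a logical dataset name to the physical locations holding copies of it."""
    replicas: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, logical_name: str, location: str) -> None:
        self.replicas.setdefault(logical_name, []).append(location)

    def locate(self, logical_name: str) -> List[str]:
        return list(self.replicas.get(logical_name, []))


registry = Registry()
registry.publish(ServiceEntry("census", "https://example.org/census", ["census", "population"]))
registry.publish(ServiceEntry("borders", "https://example.org/borders", ["boundaries"]))
print([e.name for e in registry.search("census")])  # ['census']

catalog = ReplicaCatalog()
catalog.register("census-table", "site-a:/data/census")
catalog.register("census-table", "site-b:/mirror/census")
print(catalog.locate("census-table"))  # ['site-a:/data/census', 'site-b:/mirror/census']
```

The separation matters: clients refer to data by logical name, and only the catalog knows where the copies are, so replicas can be added or moved without changing client code.
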
Grid versus User: Round 2
Grids provide: data discovery services, distributed queries, basic provenance, and workflows to represent the analysis process.
Users want: information to find the right data, cross-database searches, and sophisticated annotation to explore the information space.
Data 2.0 must go beyond simple data access: domain-specific vs generic data services; composability, interoperability and ease of use.

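The combination of "basic provenance" and "workflows to represent the analysis process" can be illustrated with a small sketch in which every workflow step records a content hash of its input and output. The `Workflow` class, the `digest` helper and the record format are assumptions made for illustration; real provenance systems typically record more context than this.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


def digest(value: Any) -> str:
    """Short content hash, so a provenance record can name exact inputs and outputs."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class Workflow:
    """A linear analysis pipeline that logs one provenance record per step."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []
        self.provenance: List[Dict[str, str]] = []

    def add_step(self, name: str, fn: Callable[[Any], Any]) -> "Workflow":
        self.steps.append((name, fn))
        return self  # allows chaining add_step calls

    def run(self, data: Any) -> Any:
        for name, fn in self.steps:
            result = fn(data)
            self.provenance.append(
                {"step": name, "input": digest(data), "output": digest(result)}
            )
            data = result  # each step consumes the previous step's output
        return data


wf = (Workflow()
      .add_step("keep_positive", lambda xs: [x for x in xs if x > 0])
      .add_step("total", sum))
print(wf.run([3, -1, 4]))  # 7
for record in wf.provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Since each step's output hash equals the next step's input hash, the records form a chain that can be checked later to show which data produced a given result.
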
The Rise of Web 2.0
New sites allow non-technical users to share information and interact in programmable environments:
Social networking: MySpace, Bebo, Facebook
GIS: Google Maps, Google Earth
Preference matching: Amazon
Meta-clustering: digg, del.icio.us
Information publishing: Flickr
An army of curators, a world of information.

The Four Levels of e-Science Enlightenment
1) Resources: providing access to a larger and wider diversity of resources
2) Automation: increasing the automation and repeatability of experimentation
3) Collaboration: allowing intra- and cross-disciplinary collaboration through enabling networks
4) Participation: increasing access to a wider set of users, and increasing knowledge in a domain by bringing new people to the subject

From Data Services (DSs) to VREs
Virtual Research Environments (VREs) bridge the gap between middleware and users, and integrate functionality and facilities.
They harness interest in communities and make it easy to contribute and easy to benefit.
[Diagram: infrastructure; annotation tools; graphical environment]

SEE-GEO: Geolinking
[Diagram: components Census DB, Borders DB, OGSA-DAI, WFS (getFeature), GDAS (getData), GLS (geoLink), Feature Portrayal Service (FPS), Map Server and Portal. Labelled interactions: send parameterised query; receive ticket for results; request attributes; request features; cache attributes; stream polygons; run algorithm; stream relevant annotated polygons; call out to existing FP service; store image on server; retrieve annotated image.]
Concentrate on the algorithm. Access domain-specific data sets. Utilise existing services. Use efficient delivery methods.

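The geolinking pattern on this slide (request attributes and features, join them, run an algorithm, hand back results against a ticket) can be sketched as follows. This is a simplified, assumed model: the in-memory `ATTRIBUTES` and `FEATURES` tables stand in for the census and boundary sources, the threshold filter stands in for "run algorithm", and a thread pool stands in for asynchronous delivery. It does not reproduce the OGC GDAS, WFS or OGSA-DAI interfaces used by SEE-GEO.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple

Polygon = List[Tuple[int, int]]

# Hypothetical stand-ins for the two sources: an attribute value per area code,
# and a boundary polygon per area code.
ATTRIBUTES: Dict[str, int] = {"A01": 1200, "A02": 560, "A03": 9800}
FEATURES: Dict[str, Polygon] = {
    "A01": [(0, 0), (1, 0), (1, 1)],
    "A02": [(1, 0), (2, 0), (2, 1)],
    "A03": [(2, 0), (3, 0), (3, 1)],
}


def geolink(threshold: int) -> List[Tuple[str, int, Polygon]]:
    """Join attributes to features on the shared area code; keep areas above threshold."""
    linked = []
    for code, polygon in FEATURES.items():
        value = ATTRIBUTES.get(code)
        if value is not None and value > threshold:  # the "run algorithm" step
            linked.append((code, value, polygon))    # an annotated polygon
    return linked


class GeolinkingService:
    """Accepts a parameterised query, returns a ticket, and hands results back later."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: Dict[str, Future] = {}

    def submit(self, threshold: int) -> str:
        ticket = uuid.uuid4().hex
        self._jobs[ticket] = self._pool.submit(geolink, threshold)
        return ticket

    def retrieve(self, ticket: str) -> List[Tuple[str, int, Polygon]]:
        return self._jobs.pop(ticket).result()  # blocks until the job has finished

    def close(self) -> None:
        self._pool.shutdown(wait=True)


service = GeolinkingService()
ticket = service.submit(threshold=1000)
results = service.retrieve(ticket)
print([code for code, _, _ in results])  # ['A01', 'A03']
service.close()
```

Returning a ticket rather than the result lets the portal stay responsive while a long-running join executes, which is the motivation behind the "receive ticket for results" step in the diagram.
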
Virtual Workspace for the Study of Ancient Documents
An interface allowing browsing and searching of multiple image collections, including tools to compare and annotate the researcher’s personal collection.

Data 2.0: From Silos to Sharing
Choose data based on stored metadata, and bring it together for each user.
Build a community by providing tools to contribute back.
[Diagram: Manc, Soton and Edin data sources, each behind an OD service, feed a VRE Portal where users choose a dataset and add annotations; dataset annotations are held in Amy's, Bob's and a central annotation store.]

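Choosing datasets by stored metadata and layering annotations per user might look like the following sketch. The catalogue entries, dataset names and the `AnnotationStore` class are hypothetical; the point is that each user's view combines their own annotations with shared ones, while other users' private notes stay separate.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue of stored metadata, one entry per data source.
CATALOGUE = [
    {"name": "source_a", "subject": "health"},
    {"name": "source_b", "subject": "ocean"},
    {"name": "source_c", "subject": "health"},
]


def choose(subject: str) -> List[str]:
    """Select datasets by their stored metadata rather than by location."""
    return [d["name"] for d in CATALOGUE if d["subject"] == subject]


class AnnotationStore:
    """One owner's annotations, keyed by dataset name."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.notes: Dict[str, List[str]] = {}

    def add(self, dataset: str, text: str) -> None:
        self.notes.setdefault(dataset, []).append(text)

    def get(self, dataset: str) -> List[str]:
        return self.notes.get(dataset, [])


def annotations_for(dataset: str, stores: List[AnnotationStore]) -> List[Tuple[str, str]]:
    """Merge annotations from every store a user can see, tagged with their owner."""
    return [(store.owner, note) for store in stores for note in store.get(dataset)]


amy, bob, central = AnnotationStore("amy"), AnnotationStore("bob"), AnnotationStore("central")
amy.add("source_a", "check the units")
central.add("source_a", "verified by curator")
bob.add("source_c", "private note")

# Amy's personalised view: her own notes plus the shared central store, not Bob's.
amy_view = {d: annotations_for(d, [amy, central]) for d in choose("health")}
print(amy_view)
# {'source_a': [('amy', 'check the units'), ('central', 'verified by curator')], 'source_c': []}
```

Keeping annotations in separate stores, rather than writing them into the source data, leaves each source with its owner while still letting users contribute back.
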
Data 2.0: a new way of integrating data?
Many diverse data sources, independently owned and curated.
Many diverse users, each sharing and utilising multiple datasets.
A personalised, virtual data warehouse: bring together many sources to appear as one.
Allow shared, distributed, centralised and replicated annotation to build a community.

What is the future of data?
Data must be available to all to be useful.
Individuals must be able to harness the data to make it important to them.
The work you have seen today will help this happen.
Data 2.0 is not as far away as you think!
