Model-Driven Cloud Data Storage
Juan Castrejón, Genoveva Vargas-Solar, Christine Collet, Rafael Lozano
Université de Grenoble, CNRS, Grenoble INP, Tecnológico de Monterrey




CloudMDE 2012




Background
•  Cloud computing (NIST-2011)
   •  Utility computing model for enabling ubiquitous, convenient,
      on-demand network access to a shared pool of configurable resources

•  Cloud data storage (Ruiz-2011, Armbrust-2009)
   •  Store, retrieve and manage large amounts of data, using highly
      scalable distributed infrastructures


•  Polyglot persistence (Fowler-2011)
   •  Different data storage technologies for different kinds of data
   •  Each storage mechanism introduces a new interface to be learned
   •  To get decent performance, you have to understand a lot about
      how the technology works




Background
•  Variety of data storage models and implementations
 (Cattell-2011, Edlich-2012)
  •  Models: key-value, document, extensible record, graph, blob,
     object, queue, XML, relational
  •  Implementations: Redis, Voldemort, MongoDB, CouchDB,
     Cassandra, Neo4j, db4o, eXist-db, etc. (over 120 options as of 2012)


•  Cloud deployment environments (Ruiz-2011)
   •  Different combinations of pricing, support, service level
      agreements, and management APIs
   •  Public providers (Amazon, Windows Azure, Xeround, etc.)
   •  Private providers (Eucalyptus, OpenNebula, etc.)

Use the right tool for the right job…
How do I know which is the right tool for the right job?
(Katsov-2012)




Problem
•  How to specify data requirements for cloud environments?


•  For a set of data requirements, how to choose an
 appropriate combination of cloud storage system
 implementation and deployment provider?

•  How to generate/manage everything that’s required to
 work with the selection that I make?




Existing solutions
•  Integration of cloud storage platforms (Livenson-2011)
    •  Cloud Data Management Interface (CDMI) (SNIA-2011) proxy to
       integrate blob and queue data stores
•  Data integration over NoSQL stores (Curé-2011)
   •  Integration of relational and NoSQL databases (document, column)
   •  Focus on efficient answering of queries
•  Storage provider selection (Ruiz-2011, Ruiz-2012)
   •  Characterize storage providers' features (e.g., performance, cost)
   •  Specify requirements for application datasets (e.g., expected size,
      access latency, concurrent clients)
   •  Based on this information, propose an assignment of datasets to
      different storage systems
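
The provider-selection idea above can be illustrated with a minimal sketch. It is not the decision procedure of Ruiz-2011 or Ruiz-2012; the classes `Provider` and `Dataset`, their fields, the sample figures, and the "cheapest feasible provider" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    # Hypothetical provider description; field names are illustrative.
    name: str
    latency_ms: float    # access latency the provider can deliver
    max_clients: int     # concurrent clients it can serve
    cost_per_gb: float   # storage cost per GB


@dataclass(frozen=True)
class Dataset:
    # Hypothetical dataset requirements.
    name: str
    size_gb: float
    latency_ms: float    # required access latency
    clients: int         # expected concurrent clients


def assign(datasets, providers):
    """Map each dataset to the cheapest provider that meets its requirements.

    Datasets with no feasible provider are mapped to None.
    """
    plan = {}
    for ds in datasets:
        feasible = [
            p for p in providers
            if p.latency_ms <= ds.latency_ms and p.max_clients >= ds.clients
        ]
        if feasible:
            best = min(feasible, key=lambda p: p.cost_per_gb * ds.size_gb)
            plan[ds.name] = best.name
        else:
            plan[ds.name] = None
    return plan


providers = [
    Provider("blob-store", latency_ms=200.0, max_clients=1000, cost_per_gb=0.02),
    Provider("kv-store", latency_ms=10.0, max_clients=500, cost_per_gb=0.10),
]
datasets = [
    Dataset("logs", size_gb=500.0, latency_ms=500.0, clients=50),
    Dataset("sessions", size_gb=5.0, latency_ms=20.0, clients=300),
    Dataset("realtime", size_gb=1.0, latency_ms=1.0, clients=10),
]
plan = assign(datasets, providers)
print(plan)
```

A real procedure would weigh more features (pricing tiers, service level agreements, management APIs); the sketch only shows the shape of the input and output.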




Existing solutions
•  Modeling as a Service (Bruneliere-2010)
   •  Deploy and execute model-driven services over the Internet (SaaS)


•  Design and deploy applications in the cloud (Peidro-2011)
   •  Promotes graphical models to capture cloud requirements
   •  Models automatically deployed to PaaS and IaaS environments


•  Application design/execution in multiple clouds (Ardagna-2012)
  •  MDE quality-driven method for design, development and operation
  •  Monitoring and feedback system




Limitations of existing solutions
•  Support for a limited set of cloud storage interfaces


•  Data integration often relies heavily on the relational
 model

•  Limited information for the selection of data storage
 systems

•  Consideration for high-level cloud models (SaaS) but
 limited support for low-level models (PaaS and IaaS)




Objectives
1.  Provide adequate notations and environments to
   characterize cloud data storage requirements

2.  Support the selection of cloud data storage implementations
   and deployment providers

3.  Manage the artifacts required to work with different
   combinations of cloud storage implementations and providers




Objectives
[Figure: cloud requirements feed conceptual models (high level of abstraction: conceptual models and environments). The selection process and artifacts management act on several logical models, each with a corresponding physical model (low level of abstraction: storage implementations and providers).]




Proposed solution
•  Rely on Model-Driven Engineering (MDE) (Kent-2002) to:
   •  Characterize cloud storage requirements
   •  Encapsulate selection, administration and use of cloud data
      storage implementations


•  Why MDE?
   •  Avoid dependencies between high-level abstractions (data models)
      and low-level abstractions (storage implementations and providers)
   •  Emphasis on different levels of modeling notations
   •  Generation of low-level abstractions through automatic
      transformation procedures
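
As a rough illustration of why the levels are kept separate, the sketch below derives two different logical models from one conceptual entity. The names `ConceptualEntity`, `to_relational`, and `to_key_value`, and the shape of the resulting dictionaries, are hypothetical and are not taken from the proposed environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualEntity:
    # High-level description, independent of any storage technology.
    name: str
    attributes: tuple  # attribute names
    key: str           # identifying attribute


def to_relational(entity):
    """Logical model for a relational store: table, columns, primary key."""
    return {
        "table": entity.name.lower(),
        "columns": list(entity.attributes),
        "primary_key": entity.key,
    }


def to_key_value(entity):
    """Logical model for a key-value store: key pattern plus value fields."""
    return {
        "key_pattern": f"{entity.name.lower()}:{{{entity.key}}}",
        "value_fields": [a for a in entity.attributes if a != entity.key],
    }


customer = ConceptualEntity("Customer", ("id", "name", "email"), key="id")
relational = to_relational(customer)
key_value = to_key_value(customer)
print(relational)
print(key_value)
```

The conceptual entity never mentions tables or key patterns, so a storage implementation can be swapped by choosing a different transformation rather than by editing the data model.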




Objective 1: Data requirements for the cloud
•  Do traditional modeling notations (ER and UML diagrams)
 make sense for data storage in the cloud?
  •  Define or extend notations and environments for cloud data modeling
•  What requirements should a cloud data storage notation
 consider?
  •  Rely on quality standards (ISO/IEC SQuaRE, S-Cube) to guide this
    analysis. Example: performance, efficiency, portability, etc.
•  How to characterize the proposed requirements?
   •  Associate quality metrics relevant to (cloud) scenarios, based on
      the characteristics of the reference standard (Jureta-2010)
   •  Validate currently proposed metrics. For example: throughput, cost,
      access latency, etc.
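
One possible way to attach quality metrics to a data model is sketched below. The classes `QualityRequirement` and `AnnotatedEntity`, the metric names, the units, and the bounds are assumptions for illustration; they are not the notation proposed in this work.

```python
import operator
from dataclasses import dataclass, field

# Supported comparison operators for a requirement bound.
_OPS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class QualityRequirement:
    metric: str   # e.g. "access_latency", "throughput", "cost"
    op: str       # "<=" or ">="
    bound: float
    unit: str

    def satisfied_by(self, value):
        return _OPS[self.op](value, self.bound)


@dataclass
class AnnotatedEntity:
    # A data model element annotated with quality requirements.
    name: str
    attributes: dict
    requirements: list = field(default_factory=list)


def violations(entity, measured):
    """Return the metrics that are missing from `measured` or out of bounds."""
    return [
        r.metric
        for r in entity.requirements
        if r.metric not in measured or not r.satisfied_by(measured[r.metric])
    ]


order = AnnotatedEntity(
    "Order",
    {"id": "long", "total": "decimal"},
    [
        QualityRequirement("access_latency", "<=", 50.0, "ms"),
        QualityRequirement("throughput", ">=", 1000.0, "ops/s"),
        QualityRequirement("cost", "<=", 200.0, "USD/month"),
    ],
)
measured = {"access_latency": 35.0, "throughput": 800.0, "cost": 150.0}
result = violations(order, measured)
print(result)
```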




Objective 2: Data storage selection
•  Based on the analysis of historic data and usage patterns
   •  Both in test applications and within systems generated in our modeling
      environment
•  Monitoring data is gathered in a non-intrusive manner
   •  AOP monitoring
   •  Monitor the behaviour of the selected implementation/providers, based
      on the metrics specified in the modeling environment
   •  Compare expected values and actual performance
•  Monitoring data is shared in an open, collaborative manner
   •  Used by our decision process
   •  Available for external users
•  Users could work, at the same time, with multiple combinations
 of storage implementations and providers
  •  Test the performance of the different combinations




Objective 3: Cloud artifacts management
•  Generate the low-level artifacts to work with data storage
 implementations and deployment providers
  •  Configuration files for deployment providers
  •  Data management interfaces (CDMI, Spring Data, etc.)


•  Different levels of transformation procedures
   •  From the high-level data model to an intermediate Domain-Specific
      Language (DSL) (Liu-2011, SpringRoo-2012)
   •  From the intermediate DSL to configuration files, AOP monitoring
      aspects and data management interfaces (SpringData-2012)


•  MDE transformation techniques
   •  Model-to-Model (M2M), Model-to-Text (M2T)
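
A Model-to-Text step of this kind can be sketched as a function that walks a small data model and emits script lines. The model classes below are hypothetical, and the emitted lines only imitate the style of Spring Roo shell commands; the exact command syntax is an assumption that should be checked against the Spring Roo documentation. This is not the Model2Roo implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelAttribute:
    name: str
    kind: str            # field kind, e.g. "string" or "number"
    java_type: str = ""  # explicit type, used only when the kind needs one


@dataclass(frozen=True)
class ModelEntity:
    name: str
    attributes: tuple


def to_script(entities, package="~.domain"):
    """Model-to-Text: render each entity and its attributes as script lines."""
    lines = []
    for entity in entities:
        lines.append(f"entity jpa --class {package}.{entity.name}")
        for attr in entity.attributes:
            line = f"field {attr.kind} --fieldName {attr.name}"
            if attr.java_type:
                line += f" --type {attr.java_type}"
            lines.append(line)
    return "\n".join(lines)


model = [
    ModelEntity(
        "Customer",
        (
            ModelAttribute("name", "string"),
            ModelAttribute("age", "number", "java.lang.Integer"),
        ),
    )
]
script = to_script(model)
print(script)
```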




Proof of concept (work in progress…)
•  Extension - Model2Roo (http://code.google.com/p/model2roo/)
[Figure: 1. High-level abstractions: UML class diagram, Spring Roo, Java web app, Spring Data. 2. Low-level abstractions: graph database, relational database.]




Preliminary results
•  Castrejón, J., Vargas-Solar, G., Collet, C., Lozano, R.:
 “Model-Driven Cloud Data Storage”. In: First International
 Workshop on Model-Driven Engineering on and for the
 Cloud (CloudMDE 2012). Co-located with ECMFA ’12.
 July 2012

•  Castrejón, J., Vargas-Solar, G., Lozano, R.: “Model2Roo:
 Web Application Development based on the Eclipse
 Modeling Framework and Spring Roo”. In: First Workshop
 on Academics Modeling with Eclipse (ACME 2012).
 Co-located with ECMFA ’12. July 2012




Demonstration / Questions



  Contact: Juan.Castrejon@imag.fr




References
•  Ardagna, D., Di Nitto, E., Casale, G., et al.: MODACLOUDS, A Model-Driven Approach for the
   Design and Execution of Applications on Multiple Clouds. In: Models in Software Engineering
   Workshop (MiSE 2012). Co-located with ICSE ’12 (2012)
•  Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., et al.: Above the Clouds: A Berkeley View of
   Cloud Computing (2009)
•  Bruneliere, H., Cabot, J., Jouault, F.: Combining model-driven engineering and cloud
   computing. In: Modeling, Design, and Analysis for the Service Cloud Workshop.
   MDA4ServiceCloud ’10 (2010)
•  Cattell, R.: Scalable SQL and NoSQL data stores. SIGMOD Rec. 39, 12–27 (May 2011)
•  Curé, O., Hecht, R., Le Duc, C., Lamolle, M.: Data Integration over NoSQL Stores Using
   Access Path Based Mappings. In: Proceedings of the 22nd International Conference on
   Database and Expert Systems Applications (DEXA 2011). A. Hameurlain et al. (Eds.), Part I,
   LNCS 6860, pp. 481–495 (2011)
•  Edlich, S.: List of NoSQL databases. http://nosqldatabase.org/ (March 2012)
•  Fowler, M.: Polyglot persistence. http://martinfowler.com/bliki/PolyglotPersistence.html
   (November 2011)
•  Jureta, I., Borgida, A., Ernst, N., Mylopoulos, J.: Techne: Towards a New Generation of
   Requirements Modeling Languages with Goals, Preferences, and Inconsistency Handling. In:
   Proceedings of the 18th IEEE International Requirements Engineering Conference. pp.
   115–124. RE 2010. IEEE Computer Society (2010)
•  Katsov, I.: NoSQL data modeling techniques.
   http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (March 2012)
•  Kent, S.: Model driven engineering. In: Butler, M., Petre, L., Sere, K. (eds.) Integrated Formal Methods,
   LNCS, vol. 2335, pp. 286–298. Springer Berlin (2002)
•  Lenzerini, M.: Data integration is harder than you thought. In: Proceedings of the 9th International
   Conference on Cooperative Information Systems. pp. 22–26. CoopIS ’01, Springer-Verlag, London, UK
   (2001)
•  Livenson, I., Laure, E.: Towards Transparent Integration of Heterogeneous Cloud Storage Platforms. In:
   Fourth International Workshop on Data Intensive Distributed Computing. DIDC ’11. Co-located with
   HPDC ’11 (2011)
•  Liu, D., Zic, J.: Cloud#: A specification language for modeling cloud. In: Proceedings of the 2011 IEEE 4th
   International Conference on Cloud Computing. pp. 533–540. CLOUD ’11, IEEE Computer Society,
   Washington, DC, USA (2011)
•  Peidro, J.E., Muñoz-Escoí, F.D.: Towards the next generation of model driven cloud platforms. In: 1st
   International Conference on Cloud Computing and Services Science. pp. 494–500. CLOSER ’11 (2011)
•  Ruiz-Alvarez, A., Humphrey, M.: An automated approach to cloud storage service selection. In: Proceedings
   of the 2nd International Workshop on Scientific Cloud Computing. pp. 39–48. ScienceCloud ’11, ACM, New
   York, NY, USA (2011)
•  Ruiz-Alvarez, A., Humphrey, M.: A model and decision procedure for data storage in cloud computing. In:
   Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing. CCGrid ’12
   (2012)
•  Storage Networking Industry Association (SNIA): Cloud data management interface (CDMI).
   http://www.snia.org/cdmi (September 2011)
•  SpringSource: Spring Data projects. http://www.springsource.org/spring-data (March 2012)
•  SpringSource: Spring Roo. http://www.springsource.org/spring-roo (March 2012)
