This document discusses model-driven approaches to cloud data storage. It sets three objectives: 1) characterize cloud data storage requirements using conceptual models, 2) select appropriate cloud data storage implementations and providers based on those requirements, and 3) manage the artifacts needed to work with different storage solutions. Because existing solutions cover these needs only partially, the proposed approach uses model-driven engineering, with multiple levels of modeling and transformation, to map requirements to storage solutions.
Model-Driven Cloud Data Storage
1. Model-Driven
Cloud Data Storage
Juan Castrejón, Genoveva Vargas-Solar, Christine Collet, Rafael Lozano
Université de Grenoble, CNRS, Grenoble INP, Tecnológico de Monterrey
CloudMDE 2012
2. 2
Background
• Cloud computing (NIST-2011)
• Utility computing model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable resources
• Cloud data storage (Ruiz-2011, Armbrust-2009)
• Store, retrieve and manage large amounts of data, using highly
scalable distributed infrastructures
• Polyglot persistence (Fowler-2011)
• Different data storage technologies for different kinds of data
• Each storage mechanism introduces a new interface to be learned
• To get decent performance, you have to understand a lot about
how the technology works
3. 3
Background
• Variety of data storage models and implementations
(Cattell-2011, Edlich-2012)
• Models: key-value, document, extensible record, graph, blob,
object, queue, xml, relational
• Implementations: Redis, Voldemort, MongoDB, CouchDB,
Cassandra, Neo4J, db4o, eXist-db, etc. (over 120 options as of 2012)
• Cloud deployment environments (Ruiz-2011)
• Different combinations of pricing, support, service level
agreements, and management APIs
• Public providers (Amazon, Windows Azure, Xeround, etc.)
• Private providers (Eucalyptus, OpenNebula, etc.)
4. 4
Use the right tool for the right job…
How do I know which is the
right tool for the right job?
(Katsov-2012)
5. 5
Problem
• How to specify data requirements for cloud environments?
• For a set of data requirements, how to choose an
appropriate combination of cloud storage system
implementation and deployment provider?
• How to generate/manage everything that’s required to
work with the selection that I make?
6. 6
Existing solutions
• Integration of cloud storage platforms (Livenson-2011)
• Cloud Data Management Interface (CDMI) (SNIA-2011) proxy to
integrate blob and queue data stores
• Data integration over NoSQL stores (Curé-2011)
• Integration of relational and NoSQL databases (document, column)
• Focus on efficient answering of queries
• Storage provider selection (Ruiz-2011, Ruiz-2012)
• Characterize storage providers' features (e.g., performance, cost)
• Specify requirements for application datasets (e.g., expected size,
access latency, concurrent clients)
• Based on the previous information, an assignment of datasets to
different storage systems is proposed
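The selection idea above (provider features, dataset requirements, then an assignment) can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure of the cited work: the `Provider` and `Dataset` fields, the sample values, and the "cheapest feasible provider" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    # Hypothetical provider features (names and values are illustrative).
    name: str
    max_latency_ms: float  # worst-case access latency offered
    max_clients: int       # concurrent clients supported
    cost_per_gb: float     # storage cost per GB


@dataclass(frozen=True)
class Dataset:
    # Hypothetical dataset requirements.
    name: str
    size_gb: float
    latency_ms: float      # required access latency
    clients: int           # expected concurrent clients


def assign(datasets, providers):
    """Map each dataset name to the cheapest provider that meets its
    latency and concurrency requirements, or None if none qualifies."""
    result = {}
    for ds in datasets:
        feasible = [
            p for p in providers
            if p.max_latency_ms <= ds.latency_ms and p.max_clients >= ds.clients
        ]
        best = min(feasible, key=lambda p: p.cost_per_gb * ds.size_gb, default=None)
        result[ds.name] = best.name if best else None
    return result


PROVIDERS = [
    Provider("blob-store", 200.0, 1000, 0.10),
    Provider("key-value-store", 10.0, 5000, 0.50),
]
DATASETS = [
    Dataset("archive-logs", 500.0, 500.0, 10),
    Dataset("user-sessions", 2.0, 20.0, 2000),
    Dataset("realtime-feed", 1.0, 1.0, 100),
]
ASSIGNMENT = assign(DATASETS, PROVIDERS)
print(ASSIGNMENT)
```

With these sample values, the archive dataset goes to the cheaper blob store, the session dataset needs the faster key-value store, and the real-time feed has no feasible provider, which is the kind of gap a selection process has to report.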
7. 7
Existing solutions
• Modeling as a Service (Bruneliere-2010)
• Deploy and execute model-driven services over the Internet (SaaS)
• Design and deploy applications in the cloud (Peidro-2011)
• Promotes graphical models to capture cloud requirements
• Models automatically deployed to PaaS and IaaS environments
• Application design/execution in multiple clouds (Ardagna-2012)
• MDE quality-driven method for design, development and operation
• Monitoring and feedback system
8. 8
Limitations of existing solutions
• Support for a limited set of cloud storage interfaces
• Data integration often relies heavily on the relational
model
• Limited information for the selection of data storage
systems
• Consideration for high-level cloud models (SaaS) but
limited support for low-level models (PaaS and IaaS)
9. 9
Objectives
1. Provide adequate notations and environments to
characterize cloud data storage requirements
2. Select cloud data storage implementations and
deployment providers
3. Manage the artifacts required to work with
different combinations of cloud storage implementations
and providers
10. 10
Objectives
[Figure: layers of abstraction, from top to bottom. Cloud requirements are captured as conceptual models (high level of abstraction: conceptual models and environments). The selection process and artifacts management relate these to logical models and then to physical models (low level of abstraction: storage implementations and providers).]
11. 11
Proposed solution
• Rely on Model-Driven Engineering (MDE) (Kent-2002) to:
• Characterize cloud storage requirements
• Encapsulate selection, administration and use of cloud data
storage implementations
• Why MDE?
• Avoid dependencies between high-level abstractions (data models) and
low-level abstractions (storage implementations and providers)
• Emphasis on relying on different levels of modeling notations
• Generation of low-level abstractions by using automatic
transformation procedures
12. 12
Objective 1: Data requirements for the cloud
• Do traditional modeling notations (ER and UML diagrams)
make sense for data storage in the cloud?
• Define or extend notations and environments for cloud data modeling
• What requirements should a cloud data storage notation
consider?
• Rely on quality standards (ISO/IEC SQuaRE, S-Cube) to guide this
analysis. Example: performance, efficiency, portability, etc.
• How to characterize the proposed requirements?
• Associate quality metrics relevant to (cloud) scenarios, based on
the characteristics of the reference standard (Jureta-2010)
• Validate currently proposed metrics. For example: throughput, cost,
access latency, etc.
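One way to picture a data model annotated with quality metrics is sketched below. It is an assumption-laden illustration, not the notation proposed in the talk: the class names `MetricRequirement` and `Entity`, the metric names, units, and target values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetricRequirement:
    # A quality metric attached to a model element (illustrative fields).
    metric: str             # e.g. "throughput", "cost", "access_latency"
    unit: str
    target: float
    higher_is_better: bool

    def satisfied_by(self, observed: float) -> bool:
        """Check an observed value against the target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


@dataclass
class Entity:
    # A conceptual entity plus its storage-related quality requirements.
    name: str
    attributes: dict
    requirements: list = field(default_factory=list)

    def unmet(self, observed: dict) -> list:
        """Return the metrics that are missing or miss their target."""
        return [
            r.metric for r in self.requirements
            if r.metric not in observed or not r.satisfied_by(observed[r.metric])
        ]


order = Entity(
    "Order",
    {"id": "string", "total": "decimal"},
    [
        MetricRequirement("throughput", "ops/s", 1000, True),
        MetricRequirement("access_latency", "ms", 50, False),
    ],
)
UNMET = order.unmet({"throughput": 1200, "access_latency": 80})
print(UNMET)
```

Keeping the metric targets next to the entity they constrain is what lets a later selection or monitoring step check them without consulting a separate document.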
13. 13
Objective 2: Data storage selection
• Based on the analysis of historic data and usage patterns
• Both in test applications and within systems generated in our modeling
environment
• Monitoring data is gathered in a non-intrusive manner
• AOP monitoring
• Monitor the behaviour of the selected implementation/providers, based
on the metrics specified in the modeling environment
• Compare expected values and actual performance
• Monitoring data is shared in an open, collaborative manner
• Used by our decision process
• Available for external users
• Users could work, at the same time, with multiple combinations
of storage implementations and providers
• Test the performance of the different combinations
14. 14
Objective 3: Cloud artifacts management
• Generate the low-level artifacts to work with data storage
implementations and deployment providers
• Configuration files for deployment providers
• Data management interfaces (CDMI, Spring Data, etc.)
• Different levels of transformation procedures
• From the high-level data model to an intermediate Domain Specific
Language (DSL) (Liu-2011, SpringRoo-2012)
• From the intermediate DSL to configuration files, AOP monitoring
aspects and data management interfaces (SpringData-2012)
• MDE transformation techniques
• Model-to-Model (M2M), Model-to-Text (M2T)
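The two transformation levels can be illustrated with a small sketch: a model-to-model step from a conceptual entity to a logical one, followed by a model-to-text step that emits DSL lines. Everything here is an assumption for illustration: the classes, the `REPOSITORY_BY_STORE` mapping, and the emitted DSL syntax are made up and are not Spring Roo commands or the authors' metamodels.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    type: str


@dataclass
class ConceptualEntity:
    # High-level model element, as it might come from a class diagram.
    name: str
    attributes: list
    store: str  # intended kind of storage, e.g. "relational" or "graph"


@dataclass
class LogicalEntity:
    # Intermediate model element, closer to a storage technology.
    name: str
    fields: list      # list of (field name, normalized type) pairs
    repository: str


# Hypothetical mapping from storage kind to a repository style.
REPOSITORY_BY_STORE = {"relational": "jpa", "graph": "graph"}


def to_logical(entity):
    """Model-to-Model (M2M): conceptual entity -> logical entity."""
    fields = [(a.name, a.type.lower()) for a in entity.attributes]
    return LogicalEntity(entity.name, fields, REPOSITORY_BY_STORE[entity.store])


def to_dsl(logical):
    """Model-to-Text (M2T): logical entity -> lines of a made-up DSL."""
    lines = [f"entity {logical.name} using {logical.repository}"]
    lines += [f"  field {name}: {type_}" for name, type_ in logical.fields]
    return "\n".join(lines)


person = ConceptualEntity(
    "Person",
    [Attribute("name", "String"), Attribute("age", "Integer")],
    "graph",
)
DSL_TEXT = to_dsl(to_logical(person))
print(DSL_TEXT)
```

Splitting the work into two steps means the conceptual model never mentions a storage technology directly; changing the target store only changes the mapping used in the M2M step.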
15. 15
Proof of concept (work in progress)
• Extension - Model2Roo (http://code.google.com/p/model2roo/)
[Figure: (1) high-level abstractions: a UML class diagram is processed with Spring Roo and Spring Data to produce a Java web app; (2) low-level abstractions: graph database and relational database.]
16. 16
Preliminary results
• Castrejón, J., Vargas-Solar, G., Collet, C., Lozano, R.:
“Model-Driven Cloud Data Storage”. In: First International
Workshop on Model-Driven Engineering on and for the
Cloud (CloudMDE 2012). Co-located with ECMFA ’12.
July 2012
• Castrejón, J., Vargas-Solar, G., Lozano, R.: “Model2Roo:
Web Application Development based on the Eclipse
Modeling Framework and Spring Roo”. In: First Workshop
on Academics Modeling with Eclipse (ACME 2012).
Co-located with ECMFA ’12. July 2012
18. 18
References
• Ardagna, D., Di Nitto, E., Casale, G., et al. MODACLOUDS, A Model-Driven Approach for the
Design and Execution of Applications on Multiple Clouds. Models in Software Engineering
Workshop (MiSE 2012). Co-located with ICSE ’12. (2012)
• Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., et al.: Above the Clouds: A Berkeley View of
Cloud Computing (2009)
• Bruneliere, H., Cabot, J., Jouault, F.: Combining model-driven engineering and cloud
computing. In: Modeling, Design, and Analysis for the Service Cloud Workshop.
MDA4ServiceCloud ’10 (2010)
• Cattell, R.: Scalable SQL and NoSQL data stores. SIGMOD Rec. 39, 12–27 (May 2011)
• Curé, O., Hecht, R., Le Duc, C., Lamolle, M.: Data Integration over NoSQL Stores Using
Access Path Based Mappings. In: Proceedings of the 22nd International Conference on
Database and Expert Systems Applications (DEXA 2011). Hameurlain et al. (Eds.), Part I,
LNCS 6860, pp. 481–495 (2011)
• Edlich, S.: List of NoSQL databases. http://nosqldatabase.org/ (March 2012)
• Fowler, M.: Polyglot persistence. http://martinfowler.com/bliki/PolyglotPersistence.html
(November 2011)
• Jureta, I., Borgida, A., Ernst, N., Mylopoulos, J.: Techne: Towards a New Generation of
Requirements Modeling Languages with Goals, Preferences, and Inconsistency Handling. In:
Proceedings of the 18th IEEE International Requirements Engineering Conference. pp.
115-124. RE 2010. IEEE Computer Society (2010)
• Katsov, I.: NoSQL data modeling techniques.
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ (March 2012)
19. 19
References
• Kent, S.: Model driven engineering. In: Butler, M., Petre, L., Sere, K. (eds.) Integrated Formal Methods,
LNCS, vol. 2335, pp. 286–298. Springer Berlin (2002)
• Lenzerini, M.: Data integration is harder than you thought. In: Proceedings of the 9th International
Conference on Cooperative Information Systems. pp. 22–26. CoopIS ’01, Springer-Verlag, London, UK
(2001)
• Livenson, I., Laure, E.: Towards Transparent Integration of Heterogeneous Cloud Storage Platforms. In:
Fourth International Workshop on Data Intensive Distributed Computing. DIDC ’11. Co-located with HPDC
’11 (2011)
• Liu, D., Zic, J.: Cloud#: A specification language for modeling cloud. In: Proceedings of the 2011 IEEE 4th
International Conference on Cloud Computing. pp. 533–540. CLOUD ’11, IEEE Computer Society,
Washington, DC, USA (2011)
• Peidro, J.E., Muñoz-Escoí, F.D.: Towards the next generation of model driven cloud platforms. In: 1st
International Conference on Cloud Computing and Services Science. pp. 494–500. CLOSER ’11 (2011)
• Ruiz-Alvarez, A., Humphrey, M.: An automated approach to cloud storage service selection. In: Proceedings
of the 2nd international workshop on Scientific cloud computing. pp. 39–48. ScienceCloud ’11, ACM, New
York, NY, USA (2011)
• Ruiz-Alvarez, A., Humphrey, M.: A model and decision procedure for data storage in cloud computing. In:
Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing. CCGrid ’12
(2012)
• Storage Networking Industry Association (SNIA): Cloud data management interface (CDMI).
http://www.snia.org/cdmi (September 2011)
• SpringSource: Spring data projects. http://www.springsource.org/spring-data (March 2012)
• SpringSource: Spring roo. http://www.springsource.org/spring-roo (March 2012)