GigaSpaces Data Caching / Data Grid overview August 2009
Scaling Up Your Database by Adding a Data Grid
Scaling Up Your Database by Adding a Data Grid
  • To scale up your database, use the IMDG directly.
  • On the backend, the IMDG persists the data to your database using your existing Hibernate O/R mapping (Hibernate is used by the IMDG).
  • The application uses the native IMDG API, an object/SQL API very similar to Hibernate.
  • Gain the full power of the IMDG.
  • Good for both write and read scenarios.
Benefits of using GigaSpaces as the system of record
  • Decreasing database load through partitioning and data distribution: enables higher data volumes and higher throughput with low latency.
  • Better decoupling between your application and the database: no need to hard-wire Hibernate and database concepts into your code and runtime environment.
  • Event-driven model enables notifications when data is modified.
  • Database access can be synchronous or asynchronous: the GigaSpaces Mirror Service allows data to be persisted to the database asynchronously, without a performance penalty.
IMDG Access support
Main features:
  • Direct persistency (write/read through)
  • Asynchronous reliable persistency (write behind)
  • Fast data load once the IMDG has started
  • Lazy load in case of a cache miss
  • Delegating IMDG SQL queries to the database
  • Advanced Hibernate and nHibernate integration
  • Java, C++ and .NET object persistency
  • Custom persistency support
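The read-through and lazy-load features listed above can be pictured with a small plain-Java sketch. It does not use the GigaSpaces API: `ReadThroughCache`, its `loader` function, and the `HashMap` standing in for the database are hypothetical names chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration of read-through / lazy load; NOT the GigaSpaces API.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new HashMap<>(); // stands in for the IMDG
    private final Function<K, V> loader;              // stands in for a database read
    private int misses = 0;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the in-memory value; on a miss, load it from the backing store.
    V read(K key) {
        V value = memory.get(key);
        if (value == null) {
            misses++;
            value = loader.apply(key);
            if (value != null) {
                memory.put(key, value); // keep it in memory for later reads
            }
        }
        return value;
    }

    int misses() {
        return misses;
    }
}

class ReadThroughDemo {
    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "david");
        ReadThroughCache<Integer, String> cache = new ReadThroughCache<>(database::get);
        System.out.println(cache.read(1));   // first read goes to the "database"
        System.out.println(cache.read(1));   // second read is served from memory
        System.out.println(cache.misses());
    }
}
```

Only the first read of a key reaches the backing store; later reads are answered from memory, which is the effect the lazy-load bullet describes.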
Step 2: Access data via IMDG SQL Queries
Supported options and queries:
  • Operations: =, <>, <, >, >=, <=, [NOT] like, is [NOT] null, IN
  • GROUP BY: performs DISTINCT on the POJO properties
  • ORDER BY (ASC | DESC)
Regular-expression query (rlike):
    SQLQuery rquery = new SQLQuery(MyPojo.class, "firstName rlike '(a|c).*' or ago > 0 and lastName rlike '(d|k).*'");
    Object[] result = space.readMultiple(rquery);
Dynamic query support:
    SQLQuery query = new SQLQuery(MyClass.class, "firstName = ? or lastName = ? and ago > ?");
    query.setParameters("david", "lee", 50);
Supported options via the JDBC API:
  • COUNT, MAX, MIN, SUM, AVG, DISTINCT, Blob and Clob, rownum, sysdate, table aliases
  • Join with 2 tables
Not supported:
  • HAVING, VIEW, TRIGGERS, EXISTS, BETWEEN, NOT, CREATE USER, GRANT, REVOKE, SET PASSWORD, CONNECT USER, ON
  • NOT NULL, IDENTITY, UNIQUE, PRIMARY KEY, Foreign Key/REFERENCES, NO ACTION, CASCADE, SET NULL, SET DEFAULT, CHECK
  • Union, Minus, Union All
  • STDEV, STDEVP, VAR, VARP, FIRST, LAST
  • LEFT, RIGHT, [INNER] or [OUTER] JOIN
GigaSpaces In-Memory-Data-Grid
The IMDG – Runtime Modes – Embedded
  • An IMDG (space) instance that runs within the application's memory address space
  • Accessed by reference, without going through network or serialization calls
  • Most efficient configuration mode
  • Used as the primary space configuration setup
The IMDG – Runtime Modes – Remote
  • Accessing a remote space involves network calls and serialization/de-serialization of the cached objects between the client and the space process
  • Used only in cases where:
      • The client application cannot run an embedded space (due to memory capacity limitations, etc.)
      • There are a large number of concurrent updates on the same cached object from different remote processes
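The difference between the embedded and remote modes can be sketched in plain Java. This is an illustration of the slides' description only, not GigaSpaces code: `AccessModeSketch`, `Person`, `readEmbedded` and `readRemote` are hypothetical names, and standard Java serialization stands in for whatever wire format the product actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Plain-Java illustration; NOT the GigaSpaces API.
class AccessModeSketch {
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String firstName;

        Person(String firstName) {
            this.firstName = firstName;
        }
    }

    // Embedded access: the caller gets the stored reference itself.
    static Person readEmbedded(Person stored) {
        return stored;
    }

    // Remote access: the object is turned into bytes (the simulated wire)
    // and rebuilt on the client side as a separate copy.
    static Person readRemote(Person stored) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(stored);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person stored = new Person("david");
        System.out.println(readEmbedded(stored) == stored); // same object
        System.out.println(readRemote(stored) == stored);   // a rebuilt copy
    }
}
```

The extra serialization and copy on every remote read is the cost that makes the embedded mode the more efficient of the two.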
The IMDG – Runtime Modes – Master-Local Cache
A local 'cache':
  • Embedded with a client
  • The set of cached objects is a snapshot
  • No additional objects get added to the local space unless new queries are made
  • Writes should be made on the master only
Use when:
  • Many distributed clients access the same space
  • The workload is read-mostly
The IMDG – Runtime Modes – Master-Local View
A local 'view':
  • Embedded with a client
  • Contains updated and changing results based on a client-specified query
  • Writes can be made to the view
  • Contains a proxy to the master
Use when:
  • Clients want to get a streaming view of a subset of the 'main' space
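A local view that tracks a client-specified query can be sketched in plain Java as follows. This is not the GigaSpaces API: `LocalViewSketch`, `onMasterWrite` and `onMasterRemove` are hypothetical names, and the sketch assumes the master pushes every change to the client as a callback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java illustration of a continuously updated view; NOT the GigaSpaces API.
class LocalViewSketch {
    private final Map<String, Integer> view = new HashMap<>();
    private final Predicate<Integer> query; // the client-specified query

    LocalViewSketch(Predicate<Integer> query) {
        this.query = query;
    }

    // Assumed to be invoked for every write or update on the master.
    void onMasterWrite(String id, int value) {
        if (query.test(value)) {
            view.put(id, value);   // entry matches: add or refresh it
        } else {
            view.remove(id);       // entry no longer matches: drop it
        }
    }

    // Assumed to be invoked for every removal on the master.
    void onMasterRemove(String id) {
        view.remove(id);
    }

    Map<String, Integer> snapshot() {
        return new HashMap<>(view);
    }

    public static void main(String[] args) {
        LocalViewSketch view = new LocalViewSketch(age -> age > 50);
        view.onMasterWrite("a", 60);
        view.onMasterWrite("b", 10);
        System.out.println(view.snapshot().keySet());
    }
}
```

Unlike the snapshot held by a local cache, the view changes whenever a matching entry changes on the master, which is what makes it suitable for a streaming subset of the main space.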
The IMDG – Runtime Modes – Persistent
  • Stores data both in memory and on disk in a relational database
  • Can use a custom mapping or the built-in Hibernate/nHibernate plug-in
Asynchronous Reliable Persistency - Write Behind
  • The most common architecture
  • The database is out of the critical path of the transaction
  • IMDG operations and data are delegated to the database in a reliable, consistent manner
  • Supports read and write scenarios
(Diagram labels: Feeder, Hibernate.)
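The write-behind idea, where the database sits outside the critical path, can be sketched in plain Java. This is an illustration only and not the GigaSpaces Mirror Service: `WriteBehindSketch` and its methods are hypothetical, a `Map` stands in for the database, and `flush()` is called explicitly here, whereas a real system would drain the queue on a background thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Plain-Java illustration of write-behind persistence; NOT the GigaSpaces API.
class WriteBehindSketch {
    private final Map<String, String> memory = new HashMap<>();  // in-memory data
    private final Queue<String[]> pending = new ArrayDeque<>();  // ordered log of writes
    private final Map<String, String> database;                  // stand-in for the RDBMS

    WriteBehindSketch(Map<String, String> database) {
        this.database = database;
    }

    // The write completes as soon as memory is updated; the database write is only queued.
    void write(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    String read(String key) {
        return memory.get(key);
    }

    // Apply queued writes to the database in their original order.
    // Returns how many writes were applied.
    int flush() {
        int applied = 0;
        String[] op;
        while ((op = pending.poll()) != null) {
            database.put(op[0], op[1]);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        WriteBehindSketch grid = new WriteBehindSketch(database);
        grid.write("k1", "v1");
        System.out.println(database.containsKey("k1")); // not persisted yet
        grid.flush();
        System.out.println(database.get("k1"));         // persisted after the flush
    }
}
```

Replaying the queue in order is what keeps the database consistent with the in-memory state once the flush catches up.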
The Initial Load – Fast Data load from the Database
IMDG Deployment Topologies
IMDG Basic Deployment Topologies
  • Primary-Backup
  • Partitioned
  • Partitioned + Backup
(Diagram label on each topology: Feeder.)
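The "Partitioned + Backup" topology can be sketched in plain Java. This is not GigaSpaces code: `PartitionedGridSketch` and its methods are hypothetical, and the sketch assumes hash-based routing (hash code modulo the number of partitions) and synchronous replication from each primary to its backup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a partitioned grid with backups; NOT the GigaSpaces API.
class PartitionedGridSketch {
    private final List<Map<Object, Object>> primaries = new ArrayList<>();
    private final List<Map<Object, Object>> backups = new ArrayList<>();

    PartitionedGridSketch(int partitions) {
        for (int i = 0; i < partitions; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the routing key modulo the partition count.
    int partitionOf(Object routingKey) {
        return Math.floorMod(routingKey.hashCode(), primaries.size());
    }

    // Write to the owning primary and replicate to its backup.
    void write(Object key, Object value) {
        int p = partitionOf(key);
        primaries.get(p).put(key, value);
        backups.get(p).put(key, value);
    }

    Object read(Object key) {
        return primaries.get(partitionOf(key)).get(key);
    }

    // Simulate losing a primary: its backup takes over and a fresh backup is created.
    void failPrimary(int p) {
        primaries.set(p, backups.get(p));
        backups.set(p, new HashMap<>(primaries.get(p)));
    }

    public static void main(String[] args) {
        PartitionedGridSketch grid = new PartitionedGridSketch(3);
        grid.write(7, "order-7");
        grid.failPrimary(grid.partitionOf(7));
        System.out.println(grid.read(7)); // still readable after the failover
    }
}
```

Partitioning spreads the data and load across instances, while the backup copy is what lets a read succeed after a primary is lost.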
IMDG Operations
IMDG Basic Operations
Move into SBA
What is Space Based Architecture (SBA)
Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications, based on Yale's Tuple-Space Model (source: Wikipedia).
What is a Processing Unit:
  • Bundle of services, data, messaging
  • Collocation into a single VM
  • Unified messaging & data
  • In-memory
Cloud of Processing Units:
  • Scale through partitioning
  • Virtualized middleware
What is a Space:
  • Elegant: 4 APIs
  • Solves: data sharing, messaging, workflow, parallel processing
Move into SBA
Deploy application components as Processing Units:
  • Form a composite SOA application
Distributed data processing:
  • Use GigaSpaces event-driven and data processing components to process incoming data in real time
Collocate business logic and data:
  • Scale these as one entity to allow true linear scalability
Space Based Architecture – Business logic and data collocated
  • Pushing data into the backend system
  • In-Memory-Data-Grid and collocated Processing Units
  • Collects results / reporting
(Diagram labels: Service; Primary 1, Primary 2, Primary 3; Backup 1, Backup 2, Backup 3; Replication.)
Map-Reduce Approach to Perform a Parallel Query
How do the GigaSpaces Task Executors work?
  • Phase 1 - Sending the Task to be executed.
  • Phase 2 - Getting the results back to be reduced. The Task itself queries the IMDG instance and performs whatever calculations are needed.
Distributed Task Example
    import java.util.List;
    import com.gigaspaces.async.AsyncFuture;
    import com.gigaspaces.async.AsyncResult;
    import org.openspaces.core.executor.DistributedTask;

    public class MyDistTask implements DistributedTask<Integer, Long> {

        // The Task execute implementation: runs on the space side, once per primary.
        public Integer execute() throws Exception {
            return 1;
        }

        // The Task reducer implementation: runs on the client side.
        public Long reduce(List<AsyncResult<Integer>> results) throws Exception {
            long sum = 0;
            for (AsyncResult<Integer> result : results) {
                if (result.getException() != null) {
                    throw result.getException();
                }
                sum += result.getResult();
            }
            return sum;
        }
    }

    // The Task execution: called from the client side.
    AsyncFuture<Long> future = gigaSpace.execute(new MyDistTask());
    long result = future.get(); // result will be the number of primary spaces
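The two phases can also be simulated without a running grid. The sketch below is plain Java and deliberately avoids the GigaSpaces classes: `MapReduceSketch`, `execute`, `reduce` and `run` are hypothetical names, each `List<Integer>` stands in for the data of one partition, and a thread pool stands in for the space instances that would run the task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java simulation of the map-reduce flow; NOT the GigaSpaces API.
class MapReduceSketch {

    // "Map" step: runs against one partition and counts the values above the threshold.
    static int execute(List<Integer> partition, int threshold) {
        int count = 0;
        for (int value : partition) {
            if (value > threshold) {
                count++;
            }
        }
        return count;
    }

    // "Reduce" step: runs on the client and sums the partial counts.
    static long reduce(List<Integer> partialCounts) {
        long sum = 0;
        for (int partial : partialCounts) {
            sum += partial;
        }
        return sum;
    }

    // Phase 1 sends the task to every partition; phase 2 collects and reduces the results.
    static long run(List<List<Integer>> partitions, int threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                Callable<Integer> task = () -> execute(partition, threshold);
                futures.add(pool.submit(task));
            }
            List<Integer> partialCounts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                partialCounts.add(future.get());
            }
            return reduce(partialCounts);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 60), Arrays.asList(70, 80), Arrays.asList(5));
        System.out.println(run(partitions, 50));
    }
}
```

Each partition scans only its own data, so the work runs in parallel and only the small partial results travel back to the client for reduction.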
SBA Fundamentals
Space Based Architecture Fundamentals
The CAP properties:
  • Strong Consistency: all clients see the same view, even in the presence of updates
  • High Availability: all clients can find some replica of the data, even in the presence of failures
  • Partition-tolerance: the system properties hold even when the system is partitioned
(Diagram labels: Feeder; Partitioned Data Grid; Partitioned Data Grid with Backup; Integrating with existing database.)
Using SBA to Virtualize the Middleware = GigaSpaces XAP
Steps to virtualize the middleware:
  • Decouple the application from the deployment environment
  • Use partitioning to split the load and the data
  • Move manual processes to SLA-driven deployment
  • Inject dynamic scaling and self-healing
The result: a scale-out application server providing:
  • End-to-end scale-out middleware for Web, data, messaging and business logic
  • In-memory clustering
  • Unique database scalability
  • Automatic self-healing
Enterprise-grade and OEM-ready:
  • Supports open-source and standard development frameworks
  • Supports Java, .NET, C++ and scripting languages
The Service Grid
An automated, SLA-based application provisioning & management engine:
  • Continuous application availability, to achieve five nines (99.999%)
  • Automatically provision additional resources after failures
  • Maintain optimal application performance
  • Dynamically scale (or shrink) system resources based upon business demand
  • Dramatic reduction in enterprise server utilization rates
  • Dynamic provisioning eliminates the need to design for peak loads
  • Significant reduction in IT operations and system management costs
Typical Web Application Architecture
  • Dynamic LB configuration
  • Managed Jetty web containers, HTTP session on top of the Space
  • Business logic and data on top of the Data Grid
  • Interact with BL and data via the Space API, events, remoting or task executors
  • Partitioning and collocation for best performance and scalability
  • Async. persistency
  • Proactive administration

Editor's Notes

  1. XAP Features
     • Scale-out application server
     • End-to-end scale-out middleware for Web, data, messaging and business logic
     • Space Based Architecture, designed for scaling stateful applications
     • In-Memory Data Grid
     • O/R mapping support
     • Supports major enterprise languages: Java, .NET, C++