In-Memory Computing Essentials for Architects and Engineers

© 2017 GridGain Systems, Inc.
In-Memory Performance
Durability of Disk

In-Memory Computing Essentials
for Java Developers
Denis Magda
Ignite PMC Chair
GridGain Director of Product Management

Agenda
• Apache Ignite Overview
• Clustering and Deployment
• Distributed Storage
• Distributed SQL
• Distributed Computations
• Machine Learning
• Memory Architecture & Persistence

Apache Ignite In-Memory Computing Platform
• Use cases: IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
• APIs: SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
• Memory-Centric Storage
• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
• Third-Party Persistence (RDBMS, HDFS, NoSQL)

Clustering and Deployment

Clustering
• Server Nodes
  • Act as containers for data and computations
  • Generally started as standalone processes
• Client Nodes
  • Provide a cluster entry point to run operations
  • Embedded in application code

Deployment
• Nodes are logical entities
  • Each node runs in a JVM process
  • Many nodes can share a single JVM process
• On-Premise and Cloud
  • Physical server or VM
  • AWS, Azure, Google Compute Engine
  • Kubernetes, Mesos, YARN

Distributed Storage

Distributed Storage
• Distributed Key-Value Store: distributed partitioned hash map
• Dynamic Scaling
• ACID Transactions
• JCache & SQL
• 3rd party storage caching (RDBMS, NoSQL, HDFS)
[Diagram: JCache, Transactions, Compute and SQL APIs on top of three server nodes, each backed by durable memory]

Where Entry Goes?
[Diagram: put (key, value) issued against Ignite Node 1 and Ignite Node 2; the target node is still unknown]

Key to Node Mapping
Key → Partition → Server Node (ON-DISK)
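
The two-step mapping from key to partition to node can be sketched in plain Java. This is an illustration only, not Ignite's implementation: the class name `KeyToNodeSketch`, the partition count of 16, and the round-robin assignment of partitions to nodes are all assumptions made for the example. Ignite's real affinity function is more sophisticated.

```java
import java.util.List;

/** Illustrative only: two-step mapping key -> partition -> node. */
class KeyToNodeSketch {
    /** Step 1: hash the key into one of a fixed number of partitions. */
    static int partitionFor(Object key, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Step 2: look up the node that owns the partition (round-robin is an assumption). */
    static String nodeFor(int partition, List<String> nodes) {
        return nodes.get(partition % nodes.size());
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-1", "node-2"); // hypothetical node names
        for (int key = 0; key < 6; key++) {
            int p = partitionFor(key, 16);
            System.out.println("key " + key + " -> partition " + p + " -> " + nodeFor(p, nodes));
        }
    }
}
```

Because the key is mapped to a partition first, nodes can join or leave and only the partition-to-node table has to change; the key-to-partition step stays stable.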

Caches and Partitions
• Cache
  • Partition 1: (K1, V1), (K2, V2), (K3, V3), (K4, V4)
  • Partition 2: (K5, V5), (K6, V6), (K7, V7), (K8, V8), (K9, V9)

Partitions Distribution
• Node 1: partitions 0, 2, 4, 6, 8, 10, 12, 14
• Node 2: partitions 1, 3, 5, 7, 9, 11, 13, 15

Where Entry Goes?
[Diagram: put (key, value) against two nodes holding partitions 0, 2, 4 and 1, 3, 5; the key's partition determines the target node]

Backup Copies
[Diagram: partitions 0-3 distributed across two Ignite nodes, first as primary copies only, then with backup copies added]
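
A minimal sketch of how primary and backup owners of a partition might be chosen. The class `BackupSketch` and the "next node in the ring" placement rule are assumptions for illustration, not Ignite's actual assignment logic. The point is only that a backup copy must live on a different node than the primary.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: pick a primary node plus distinct backup nodes for a partition. */
class BackupSketch {
    /** Returns node indexes that hold the partition: primary first, then backups. */
    static List<Integer> ownersFor(int partition, int nodeCount, int backups) {
        // A cluster cannot hold more distinct copies than it has nodes.
        int copies = Math.min(backups + 1, nodeCount);
        List<Integer> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add((partition + i) % nodeCount);
        }
        return owners;
    }

    public static void main(String[] args) {
        // Four partitions, two nodes, one backup copy each.
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + " -> owners " + ownersFor(p, 2, 1));
        }
    }
}
```

With one backup on a two-node cluster, every partition ends up on both nodes, so losing either node loses no data.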

Distributed SQL

Distributed SQL
• DDL, DML Support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform Compatibility: JDBC, ODBC, SQL API; Java, .NET, C++, BI Tools
• Indexes in RAM or Disk
• Dynamic Scaling
[Diagram: Apache Ignite Cluster of three server nodes]

Connectivity
• JDBC
• ODBC
• REST
• Java, .NET and C++ APIs

import java.sql.Connection;
import java.sql.DriverManager;

// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");

# Connect from the command line with SQLLine.
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/

Data Definition Language
• CREATE/DROP TABLE
• CREATE/DROP INDEX
• ALTER TABLE
• Durability of changes
  • Via Ignite Native Persistence
CREATE TABLE `city` (
`ID` INT(11),
`Name` CHAR(35),
`CountryCode` CHAR(3),
`District` CHAR(20),
`Population` INT(11),
PRIMARY KEY (`ID`, `CountryCode`)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";

Data Manipulation Language
• ANSI SQL-99 compliant
• Fault-tolerant and consistent
• INSERT, UPDATE, DELETE
• SELECT
  • JOINs
  • Subqueries
SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;

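Affinity Collocation
• Tables: Country, Language, City
• Country key (country = 5) → partition 10
• City key (cityId = 10, countryId = 5) → partition 10
• City key (cityId = 11, countryId = 9) → partition 12
[Diagram: two server nodes with on-disk storage; entries with the same country ID land in the same partition]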
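
The idea behind affinity collocation can be sketched as follows: only the affinity field of a composite key is hashed, so a city always lands in the same partition as its country. The class `AffinitySketch`, the nested `CityKey` type, and the partition count of 16 are hypothetical; the partition numbers this sketch produces are not the ones shown on the slide.

```java
/** Illustrative only: hash the affinity field instead of the whole key. */
class AffinitySketch {
    static final int PARTITIONS = 16; // hypothetical partition count

    /** Composite city key; countryId plays the role of the affinity key. */
    static final class CityKey {
        final int cityId;
        final int countryId;

        CityKey(int cityId, int countryId) {
            this.cityId = cityId;
            this.countryId = countryId;
        }
    }

    /** A country is partitioned by its own ID. */
    static int partitionForCountry(int countryId) {
        return Math.floorMod(Integer.hashCode(countryId), PARTITIONS);
    }

    /** A city is partitioned by its countryId only, so cityId does not influence placement. */
    static int partitionForCity(CityKey key) {
        return partitionForCountry(key.countryId);
    }

    public static void main(String[] args) {
        CityKey city = new CityKey(10, 5);
        // Prints "true": the city shares a partition with country 5.
        System.out.println(partitionForCity(city) == partitionForCountry(5));
    }
}
```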

Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results in one

SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

[Diagram: one Ignite node holds Canada with Toronto, Ottawa, Montreal, Calgary; another holds India with Mumbai, New Delhi]
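
The three steps of a collocated join can be sketched in plain Java: every node joins only its own data, and the partial results are then merged. `CollocatedJoinSketch` and its in-memory maps are stand-ins invented for this example; a real cluster would run the local step on each node in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: per-node local join followed by a reduce step. */
class CollocatedJoinSketch {
    /** Step 2: join over one node's local data (country name -> its collocated cities). */
    static List<String> localJoin(Map<String, List<String>> nodeData, String country) {
        List<String> rows = new ArrayList<>();
        for (String city : nodeData.getOrDefault(country, List.of())) {
            rows.add(country + ", " + city);
        }
        return rows;
    }

    /** Steps 1 and 3: send the query to every node, then merge the partial results. */
    static List<String> query(List<Map<String, List<String>>> nodes, String country) {
        List<String> result = new ArrayList<>();
        for (Map<String, List<String>> node : nodes) {
            result.addAll(localJoin(node, country));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> node1 =
            Map.of("Canada", List.of("Toronto", "Ottawa", "Montreal", "Calgary"));
        Map<String, List<String>> node2 =
            Map.of("India", List.of("Mumbai", "New Delhi"));
        System.out.println(query(List.of(node1, node2), "Canada"));
    }
}
```

No city rows cross node boundaries here because every city already sits next to its country.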

Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results in one

SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';

[Diagram: one Ignite node holds Canada with Toronto, Calgary; another holds India with Montreal, Ottawa, Mumbai, New Delhi; Montreal and Ottawa are moved between nodes during the join]

Distributed Computations

Compute Grid
• Automatic Failover
• Load Balancing
• Zero Deployment
[Diagram: on an Ignite cluster with durable memory, computation C = C1 + C2 runs in parallel and yields result R = R1 + R2 in T/2 time (C = Compute, R = Result)]
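
The C = C1 + C2, R = R1 + R2 pattern can be sketched with a local thread pool standing in for the cluster. This is an assumption-laden illustration, not the Ignite compute API: `SplitComputeSketch` and `parallelSum` are made-up names, and threads play the role of server nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: split a job into sub-jobs, run them in parallel, add the results. */
class SplitComputeSketch {
    /** Sums 1..n by splitting the range into `parts` sub-jobs (C1, C2, ...). */
    static long parallelSum(int n, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (n + parts - 1) / parts; // ceiling division
            for (int p = 0; p < parts; p++) {
                final int from = p * chunk + 1;
                final int to = Math.min(n, (p + 1) * chunk);
                Callable<Long> job = () -> {
                    long sum = 0;
                    for (int i = from; i <= to; i++) sum += i;
                    return sum;
                };
                partials.add(pool.submit(job));
            }
            long result = 0; // R = R1 + R2 + ...
            for (Future<Long> partial : partials) result += partial.get();
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("sub-job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(100, 2)); // prints 5050
    }
}
```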

Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set

Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results in one

[Diagram: in both cases a client node talks to two server nodes with on-disk storage, holding Data 1 and Data 2]
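
The difference between the two processing styles comes down to how much data crosses the network. The sketch below counts transferred items for each style; `ProcessingModesSketch` and the list-of-lists "cluster" are hypothetical stand-ins, and each returned array holds the computed sum followed by the number of items moved.

```java
import java.util.List;

/** Illustrative only: compare how many items travel to the client in each style. */
class ProcessingModesSketch {
    /** Client-server: every entry is shipped to the client, which then sums them. */
    static long[] clientServer(List<List<Integer>> nodes) {
        long sum = 0;
        long moved = 0;
        for (List<Integer> node : nodes) {
            for (int value : node) {
                moved++;      // one entry crosses the network
                sum += value; // processed on the client
            }
        }
        return new long[] {sum, moved};
    }

    /** Co-located: each node sums its own data and ships a single partial result. */
    static long[] coLocated(List<List<Integer>> nodes) {
        long sum = 0;
        long moved = 0;
        for (List<Integer> node : nodes) {
            long partial = 0;
            for (int value : node) partial += value; // processed where the data lives
            moved++;                                 // only the partial result is shipped
            sum += partial;                          // reduce step on the client
        }
        return new long[] {sum, moved};
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = List.of(List.of(1, 2, 3), List.of(4, 5));
        System.out.println("client-server moved " + clientServer(nodes)[1] + " items");
        System.out.println("co-located moved " + coLocated(nodes)[1] + " items");
    }
}
```

Both styles compute the same sum; the co-located one moves one item per node instead of one item per entry.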

Machine Learning

Genetic Algorithm Grid
• Biological Evolution Simulation
• Chromosome and Genes Cluster
• Collocated Computation
[Diagram: on an Ignite cluster with durable memory, one node runs F1, C1, M1 and another runs F2, C2, M2, where F = F1 + F2, C = C1 + C2, M = M1 + M2 (F = Fitness Calculation, C = Crossover, M = Mutation)]
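
The three operations named above (fitness, crossover, mutation) can be sketched as a tiny single-process genetic algorithm. This is not the Ignite Genetic Algorithm Grid API: `GeneticSketch`, the bit-string chromosomes, and the "count the 1-genes" fitness function are assumptions chosen to keep the example small. In a cluster, each step would be spread over the nodes that hold the chromosomes.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: a minimal genetic algorithm over bit-string chromosomes. */
class GeneticSketch {
    /** F: fitness is the number of 1-genes in the chromosome. */
    static int fitness(boolean[] chromosome) {
        int ones = 0;
        for (boolean gene : chromosome) if (gene) ones++;
        return ones;
    }

    /** C: single-point crossover; genes before `point` come from a, the rest from b. */
    static boolean[] crossover(boolean[] a, boolean[] b, int point) {
        boolean[] child = Arrays.copyOf(a, a.length);
        System.arraycopy(b, point, child, point, b.length - point);
        return child;
    }

    /** M: mutation flips a single gene in a copy of the chromosome. */
    static boolean[] mutate(boolean[] chromosome, int index) {
        boolean[] copy = Arrays.copyOf(chromosome, chromosome.length);
        copy[index] = !copy[index];
        return copy;
    }

    /** Runs the loop and returns the best fitness found; needs populationSize >= 2. */
    static int evolve(int populationSize, int genes, int generations, long seed) {
        Random rnd = new Random(seed);
        boolean[][] pop = new boolean[populationSize][genes];
        for (boolean[] c : pop) for (int i = 0; i < genes; i++) c[i] = rnd.nextBoolean();
        int half = populationSize / 2;
        for (int g = 0; g < generations; g++) {
            // Sort fittest first, keep the top half, and rebuild the rest from it.
            Arrays.sort(pop, (x, y) -> Integer.compare(fitness(y), fitness(x)));
            for (int i = half; i < populationSize; i++) {
                boolean[] a = pop[rnd.nextInt(half)];
                boolean[] b = pop[rnd.nextInt(half)];
                pop[i] = mutate(crossover(a, b, rnd.nextInt(genes)), rnd.nextInt(genes));
            }
        }
        int best = 0;
        for (boolean[] c : pop) best = Math.max(best, fitness(c));
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best fitness: " + evolve(20, 16, 50, 42L));
    }
}
```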

Machine Learning Grid
• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Multi-Language Support: R, C++, Python, Java, Scala, REST
• Distributed Core Algebra
• Dense and Sparse Algebra
• Large Scale Parallelization
• No ETL
[Diagram: three server nodes]

Memory Architecture & Persistence

Durable Memory
• Off-heap: removes noticeable GC pauses
• Automatic Defragmentation
• Stores Superset of Data
• Predictable memory consumption
• Fully Transactional (Write-Ahead Log)
• Instantaneous Restarts
[Diagram: Ignite Cluster of three server nodes]

Regions and Segments
• Memory split into regions
• Regions split into segments
• Segments include pages

B+Tree
• Self-balancing tree
• Memory & Disk
• Sorted Index
  • Secondary Indexes
• Hash Index
  • Primary Keys
  • Hash code based sorting

Free Lists
• Tracks pages of ~ equal free space
  • 25% free
  • 75% free
• Essential for updates
  • Gives page with min size needed
• Reduces fragmentation
  • Lowers page compaction activity
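
A rough sketch of the free-list idea: pages are indexed by how much free space they have, and a request is served by the page with the smallest free space that still fits. `FreeListSketch` is a hypothetical class; for simplicity it keys on exact free bytes in a `TreeMap` rather than on the coarse "~ equal free space" buckets the slide describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: find the tightest-fitting page for a write of a given size. */
class FreeListSketch {
    // Free bytes -> IDs of pages that currently have exactly that much free space.
    private final TreeMap<Integer, Deque<Integer>> pagesByFreeSpace = new TreeMap<>();

    /** Registers a page together with its current free space. */
    void track(int pageId, int freeBytes) {
        pagesByFreeSpace.computeIfAbsent(freeBytes, k -> new ArrayDeque<>()).add(pageId);
    }

    /**
     * Removes and returns the page with the least free space that still fits
     * `neededBytes`, or -1 if no tracked page is large enough.
     */
    int take(int neededBytes) {
        Map.Entry<Integer, Deque<Integer>> entry = pagesByFreeSpace.ceilingEntry(neededBytes);
        if (entry == null) return -1;
        int pageId = entry.getValue().poll();
        if (entry.getValue().isEmpty()) pagesByFreeSpace.remove(entry.getKey());
        return pageId;
    }

    public static void main(String[] args) {
        FreeListSketch freeList = new FreeListSketch();
        freeList.track(1, 1024); // page 1 has 1024 bytes free
        freeList.track(2, 3072); // page 2 has 3072 bytes free
        System.out.println(freeList.take(2000)); // prints 2, the only page that fits
    }
}
```

After writing to the returned page, a caller would call `track` again with the page's reduced free space. Picking the tightest fit leaves larger gaps available for larger entries, which is how this approach limits fragmentation.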

Ignite Native Persistence
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 to Partition File N)
[Diagram: all four steps run inside a server node]
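
The update, persist, ack, checkpoint sequence can be sketched with in-memory collections standing in for RAM pages, the write-ahead log, and the partition files. `WalSketch` is a hypothetical class and a heavy simplification: real storage uses files, pages, and fsync, none of which are modeled here. The `recover` method shows why the log matters: state lost from RAM can be rebuilt from the last checkpoint plus the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: write-ahead logging with periodic checkpoints. */
class WalSketch {
    final Map<String, String> ram = new HashMap<>();            // in-memory data
    final List<String> wal = new ArrayList<>();                 // append-only log
    final Map<String, String> partitionFiles = new HashMap<>(); // checkpointed state
    private final Map<String, String> dirty = new HashMap<>();  // changed since last checkpoint

    /** Steps 1-3: update RAM, append the change to the log, then acknowledge. */
    boolean put(String key, String value) {
        ram.put(key, value);        // 1. Update
        wal.add(key + "=" + value); // 2. Persist
        dirty.put(key, value);
        return true;                // 3. Ack
    }

    /** Step 4: copy dirty entries to the partition files; the log can then be truncated. */
    void checkpoint() {
        partitionFiles.putAll(dirty);
        dirty.clear();
        wal.clear();
    }

    /** After a crash (RAM lost): start from the partition files and replay the log. */
    Map<String, String> recover() {
        Map<String, String> state = new HashMap<>(partitionFiles);
        for (String record : wal) {
            int sep = record.indexOf('=');
            state.put(record.substring(0, sep), record.substring(sep + 1));
        }
        return state;
    }

    public static void main(String[] args) {
        WalSketch node = new WalSketch();
        node.put("a", "1");
        node.checkpoint();
        node.put("b", "2"); // logged but not yet checkpointed
        System.out.println(node.recover());
    }
}
```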

Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite
#denismagda
