© 2017 GridGain Systems, Inc.
In-Memory Performance
Durability of Disk
In-Memory Computing Essentials
for Java Developers
Denis Magda
Ignite PMC Chair
GridGain Director of Product Management
Agenda
• Apache Ignite Overview
• Clustering and Deployment
• Distributed Storage
• Distributed SQL
• Distributed Computations
• Machine Learning
• Memory Architecture & Persistence
Apache Ignite In-Memory Computing Platform
• APIs: SQL, Transactions, Compute, Services, ML, Streaming, Key/Value
• Memory-Centric Storage
• Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
• Third-Party Persistence (RDBMS, HDFS, NoSQL)
• Industries: IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Clustering and Deployment
Clustering
• Server Nodes
  ◦ Act as containers for data and computations
  ◦ Generally started as standalone processes
• Client Nodes
  ◦ Provide a cluster entry point to run operations
  ◦ Embedded in application code
Deployment
• Nodes are logical entities
  ◦ Each node runs in a JVM process
  ◦ Many nodes can share a single JVM process
• On-Premise and Cloud
  ◦ Physical server or VM
  ◦ AWS, Azure, Google Compute Engine
  ◦ Kubernetes, Mesos, YARN
Distributed Storage
Distributed Storage
• Distributed Key-Value Store: a distributed partitioned hash map
• APIs: JCache, Transactions, Compute, SQL
• ACID Transactions
• JCache & SQL
• Dynamic Scaling
• Third-party storage caching (RDBMS, NoSQL, HDFS)
[Diagram: three server nodes, each with durable memory, in front of RDBMS, NoSQL, and HDFS stores]
Where Does an Entry Go?
[Diagram: put(key, value) has to be routed to either Ignite Node 1 or Ignite Node 2]
Key to Node Mapping
[Diagram: a key maps to a partition, and the partition maps to a server node with on-disk storage]
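The two-step mapping on this slide (key to partition, then partition to node) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `KeyToNodeSketch`, the modulo hashing, and the round-robin partition assignment are hypothetical and are not Ignite's actual affinity function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy key -> partition -> node mapper. Hypothetical; not Ignite's affinity function. */
final class KeyToNodeSketch {
    private final int partitions;
    private final List<String> nodes;

    KeyToNodeSketch(int partitions, List<String> nodes) {
        this.partitions = partitions;
        this.nodes = new ArrayList<>(nodes);
    }

    /** Step 1: hash the key into a fixed number of partitions. */
    int partitionFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Step 2: map the partition to a node (round-robin is an assumption). */
    String nodeFor(int partition) {
        return nodes.get(partition % nodes.size());
    }

    /** Full route taken by a put(key, value) call. */
    String route(Object key) {
        return nodeFor(partitionFor(key));
    }

    public static void main(String[] args) {
        KeyToNodeSketch mapper =
            new KeyToNodeSketch(16, Arrays.asList("Ignite Node 1", "Ignite Node 2"));
        for (int key = 0; key < 4; key++) {
            System.out.println("key " + key + " -> partition "
                + mapper.partitionFor(key) + " -> " + mapper.route(key));
        }
    }
}
```

Because the partition count is fixed, adding a node changes only the partition-to-node step; the key-to-partition step stays the same.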
Caches and Partitions
[Diagram: a cache is split into partitions. Partition 1 holds entries (K1, V1) to (K4, V4); Partition 2 holds entries (K5, V5) to (K9, V9)]
Partitions Distribution
[Diagram: 16 partitions spread over two nodes. Ignite Node 1 holds partitions 0, 2, 4, 6, 8, 10, 12, 14; Ignite Node 2 holds partitions 1, 3, 5, 7, 9, 11, 13, 15]
Where Does an Entry Go?
[Diagram: put(key, value) is routed by partition. Ignite Node 1 holds partitions 0, 2, 4; Ignite Node 2 holds partitions 1, 3, 5]
Backup Copies
[Diagram: four Ignite nodes hold primary partitions 0, 1, 2, and 3; backup copies of partitions 0 to 3 are also spread across the nodes]
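One way to picture backup placement is the sketch below. It assumes a hypothetical "next node" rule: the primary copy of partition p lives on node p mod N and each backup goes to the following node. Ignite's real assignment is different; the sketch only shows the invariant that no node holds two copies of the same partition.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy primary/backup placement. The "next node" rule is an assumption. */
final class BackupAssignmentSketch {
    /** Returns the node indexes that hold a partition: primary first, then backups. */
    static List<Integer> ownersOf(int partition, int nodeCount, int backups) {
        if (backups >= nodeCount) {
            throw new IllegalArgumentException("need more nodes than backups");
        }
        List<Integer> owners = new ArrayList<>();
        for (int copy = 0; copy <= backups; copy++) {
            // Each extra copy goes to the next node, so copies never share a node.
            owners.add((partition + copy) % nodeCount);
        }
        return owners;
    }

    public static void main(String[] args) {
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + " -> nodes " + ownersOf(p, 4, 1));
        }
    }
}
```

With one backup per partition, losing any single node still leaves one copy of every partition in the cluster.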
Distributed SQL
Distributed SQL
• Connectivity: JDBC, ODBC, SQL API
• Clients: Java, .NET, C++, BI Tools
• DDL, DML Support: SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• Cross-platform Compatibility
• Indexes in RAM or on Disk
• Dynamic Scaling
[Diagram: an Apache Ignite cluster of three server nodes, each with durable memory]
Connectivity
• JDBC
• ODBC
• REST
• Java, .NET and C++ APIs
// Register JDBC driver.
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.0.50");
# Connect from the command line with the SQLLine tool.
./sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/
Data Definition Language
• CREATE/DROP TABLE
• CREATE/DROP INDEX
• ALTER TABLE
• Changes Durability
  ◦ Ignite Native Persistence
CREATE TABLE City (
  ID INT(11),
  Name CHAR(35),
  CountryCode CHAR(3),
  District CHAR(20),
  Population INT(11),
  PRIMARY KEY (ID, CountryCode)
) WITH "template=partitioned, backups=1, affinityKey=CountryCode";
Data Manipulation Language
• ANSI-99 specification
• Fault-tolerant and consistent
• INSERT, UPDATE, DELETE
• SELECT
  ◦ JOINs
  ◦ Subqueries
SELECT country.name, city.name, MAX(city.population) as max_pop
FROM country JOIN city ON city.countrycode = country.code
WHERE country.code IN ('USA','RUS','CHN')
GROUP BY country.name, city.name ORDER BY max_pop DESC LIMIT 3;
Affinity Collocation
• Tables: Country, Language, City
• key (country = 5) → Partition 10
• key (cityId = 10, countryId = 5) → Partition 10
• key (cityId = 11, countryId = 9) → Partition 12
[Diagram: two server nodes with on-disk storage; entries that share a country ID land in the same partition]
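The idea on this slide is that a city entry is placed by its country ID rather than by its whole key. A minimal sketch follows, assuming a hypothetical `CityKey` class and modulo hashing; the partition numbers it produces will not match the ones on the slide.

```java
import java.util.Objects;

/** Toy affinity collocation: placement uses only the affinity field of the key. */
final class AffinitySketch {
    static final int PARTITIONS = 1024; // arbitrary partition count for the sketch

    /** City key: identity uses both fields, placement uses countryId only. */
    static final class CityKey {
        final int cityId;
        final int countryId; // the affinity key

        CityKey(int cityId, int countryId) {
            this.cityId = cityId;
            this.countryId = countryId;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CityKey)) return false;
            CityKey other = (CityKey) o;
            return cityId == other.cityId && countryId == other.countryId;
        }

        @Override public int hashCode() {
            return Objects.hash(cityId, countryId);
        }
    }

    static int partitionForCountry(int countryId) {
        return Math.floorMod(Integer.hashCode(countryId), PARTITIONS);
    }

    static int partitionForCity(CityKey key) {
        // Hash the affinity field only, not the whole key.
        return partitionForCountry(key.countryId);
    }

    public static void main(String[] args) {
        System.out.println(partitionForCountry(5) == partitionForCity(new CityKey(10, 5)));
    }
}
```

Since a country and all of its cities share a partition, a join between them never has to leave the node that owns that partition.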
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results in one
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
[Diagram: one Ignite node stores Canada with Toronto, Ottawa, Montreal, and Calgary; the other stores India with Mumbai and New Delhi]
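The three steps above can be sketched as a small map-reduce over in-process "nodes". The `Node` class and its layout are hypothetical stand-ins for cluster nodes, and the sketch assumes each country is stored together with its cities, as on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy collocated join: every node joins its own data, then results are merged. */
final class CollocatedJoinSketch {
    /** One node's local data: country name -> cities stored on the same node. */
    static final class Node {
        private final Map<String, List<String>> citiesByCountry = new LinkedHashMap<>();

        Node with(String country, String... cities) {
            citiesByCountry.put(country, Arrays.asList(cities));
            return this;
        }

        /** Step 2: join Country and City using local data only. */
        List<String> localJoin(String countryName) {
            List<String> rows = new ArrayList<>();
            List<String> cities = citiesByCountry.get(countryName);
            if (cities != null) {
                for (String city : cities) {
                    rows.add(countryName + ", " + city);
                }
            }
            return rows;
        }
    }

    /** Steps 1 and 3: send the query to every node and reduce the partial results. */
    static List<String> query(List<Node> cluster, String countryName) {
        List<String> result = new ArrayList<>();
        for (Node node : cluster) {
            result.addAll(node.localJoin(countryName));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node().with("Canada", "Toronto", "Ottawa", "Montreal", "Calgary"),
            new Node().with("India", "Mumbai", "New Delhi"));
        System.out.println(query(cluster, "Canada"));
    }
}
```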
Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results in one
SELECT ct.name, c.name
FROM Country as ct
JOIN City as c ON ct.id = c.countryId
WHERE ct.name = 'Canada';
[Diagram: one Ignite node stores Canada with Toronto and Calgary; the other stores India with Mumbai and New Delhi, plus Montreal and Ottawa, which have to move between nodes to complete the join]
Distributed Computations
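Compute Grid
• C = C1 + C2 (C = Compute)
• R = R1 + R2 (R = Result)
• Completes in T/2 time
• Automatic Failover
• Load Balancing
• Zero Deployment
[Diagram: an Ignite cluster of two nodes with durable memory; each node runs one half of the computation and returns a partial result]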
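The C = C1 + C2 and R = R1 + R2 split can be illustrated with threads standing in for cluster nodes. This is a sketch under that assumption only; `SplitComputeSketch` and `parallelSum` are hypothetical names, and a real compute grid also handles failover and load balancing, which are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy split/aggregate computation: worker threads play the role of nodes. */
final class SplitComputeSketch {
    /** Sums 1..n by splitting the range into one sub-computation per worker. */
    static long parallelSum(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            long chunk = n / workers;
            for (int w = 0; w < workers; w++) {
                final long from = w * chunk + 1;
                // The last worker also takes the remainder of the range.
                final long to = (w == workers - 1) ? n : (w + 1) * chunk;
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = from; i <= to; i++) {
                        sum += i;
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            long result = 0;
            for (Future<Long> partial : partials) {
                result += partial.get(); // R = R1 + R2 + ...
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000L, 2));
    }
}
```

The T/2 figure on the slide is the ideal case for two equal halves; in practice the speed-up depends on how evenly the work splits and on the cost of combining results.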
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set
Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results in one
[Diagram: a client node and two server nodes with on-disk storage, shown once for each approach]
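To make the difference between the two approaches concrete, the sketch below counts how many items cross the "network" in each case. Everything here is hypothetical: `ServerNode` is an in-process stand-in, each record is reduced to a city name, and the traffic counter is a simple field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy comparison of client-server and co-located processing. */
final class ProcessingModelsSketch {
    /** A server node that counts how many items it ships to the caller. */
    static final class ServerNode {
        final List<String> cities; // one entry per stored subscriber record
        long itemsSent = 0;

        ServerNode(String... cities) {
            this.cities = Arrays.asList(cities);
        }

        /** Client-server: ship every record to the caller. */
        List<String> fetchAll() {
            itemsSent += cities.size();
            return new ArrayList<>(cities);
        }

        /** Co-located: count next to the data and ship a single number. */
        long countLocally(String city) {
            itemsSent += 1;
            long n = 0;
            for (String c : cities) {
                if (c.equals(city)) n++;
            }
            return n;
        }
    }

    static long clientServerCount(List<ServerNode> cluster, String city) {
        long n = 0;
        for (ServerNode node : cluster) {
            for (String c : node.fetchAll()) {
                if (c.equals(city)) n++;
            }
        }
        return n;
    }

    static long colocatedCount(List<ServerNode> cluster, String city) {
        long n = 0;
        for (ServerNode node : cluster) {
            n += node.countLocally(city); // reduce step
        }
        return n;
    }

    static long itemsSent(List<ServerNode> cluster) {
        long total = 0;
        for (ServerNode node : cluster) {
            total += node.itemsSent;
        }
        return total;
    }
}
```

Both methods return the same count, but the client-server version moves one item per stored record while the co-located version moves one item per node.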
Machine Learning
Genetic Algorithm Grid
• Biological Evolution Simulation
• Chromosome and Genes Cluster
• Collocated Computation
• F = F1 + F2 (F = Fitness Calculation)
• C = C1 + C2 (C = Crossover)
• M = M1 + M2 (M = Mutation)
[Diagram: an Ignite cluster of two nodes with durable memory; one node runs F1, C1, M1 and the other runs F2, C2, M2]
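The three operations named on this slide (fitness calculation, crossover, mutation) can be shown on a single machine with a toy problem. The sketch below is not the GA Grid API; it is a hypothetical, self-contained example that maximizes the number of `true` genes in a chromosome, with tournament selection and elitism chosen as assumptions.

```java
import java.util.Random;

/** Toy single-JVM genetic algorithm for the "OneMax" problem. */
final class GeneticAlgorithmSketch {
    /** Fitness: the number of genes set to true. */
    static int fitness(boolean[] chromosome) {
        int score = 0;
        for (boolean gene : chromosome) {
            if (gene) score++;
        }
        return score;
    }

    /** Crossover: head of one parent, tail of the other. */
    static boolean[] crossover(boolean[] a, boolean[] b, int cut) {
        boolean[] child = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            child[i] = i < cut ? a[i] : b[i];
        }
        return child;
    }

    /** Mutation: flip a single gene. */
    static void mutate(boolean[] chromosome, int index) {
        chromosome[index] = !chromosome[index];
    }

    static boolean[] fittest(boolean[][] population) {
        boolean[] best = population[0];
        for (boolean[] c : population) {
            if (fitness(c) > fitness(best)) best = c;
        }
        return best;
    }

    /** Selection: the fitter of two random individuals. */
    static boolean[] tournament(boolean[][] population, Random rnd) {
        boolean[] a = population[rnd.nextInt(population.length)];
        boolean[] b = population[rnd.nextInt(population.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    /** Evolves a random population and returns the best fitness of the last generation. */
    static int evolve(int populationSize, int genes, int generations, long seed) {
        Random rnd = new Random(seed);
        boolean[][] population = new boolean[populationSize][genes];
        for (boolean[] c : population) {
            for (int g = 0; g < genes; g++) c[g] = rnd.nextBoolean();
        }
        for (int gen = 0; gen < generations; gen++) {
            boolean[][] next = new boolean[populationSize][];
            next[0] = fittest(population).clone(); // elitism: the best always survives
            for (int i = 1; i < populationSize; i++) {
                boolean[] child = crossover(tournament(population, rnd),
                    tournament(population, rnd), rnd.nextInt(genes));
                if (rnd.nextInt(10) == 0) mutate(child, rnd.nextInt(genes));
                next[i] = child;
            }
            population = next;
        }
        return fitness(fittest(population));
    }
}
```

In a distributed setting, the per-chromosome fitness, crossover, and mutation work is what gets spread across nodes, as the F1/F2, C1/C2, and M1/M2 labels suggest.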
Machine Learning Grid
• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Multi-Language Support: R, C++, Python, Java, Scala, REST
• Distributed Core Algebra
• Dense and Sparse Algebra
• Large Scale Parallelization
• No ETL
[Diagram: three server nodes, each with durable memory]
Memory Architecture & Persistence
Durable Memory
• Off-heap
• Removes noticeable GC pauses
• Automatic Defragmentation
• Predictable memory consumption
• Stores Superset of Data
• Fully Transactional (Write-Ahead Log)
• Instantaneous Restarts
[Diagram: an Ignite cluster of three server nodes, each with durable memory]
Regions and Segments
• Memory split into regions
• Regions split into segments
• Segments include pages
B+Tree
• Self-balancing tree
• Memory & Disk
• Sorted Index
  ◦ Secondary Indexes
• Hash Index
  ◦ Primary Keys
  ◦ Hash code based sorting
Free Lists
• Track pages of roughly equal free space
  ◦ 25% free
  ◦ 75% free
• Essential for updates
  ◦ Give the page with the minimum size needed
• Reduce fragmentation
  ◦ Lower page compaction activity
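A free list can be pictured as a set of buckets, each holding pages with a similar amount of free space. The sketch below is a simplified model under stated assumptions: the 4096-byte page size, the four free-space bands, and the `FreeListSketch` class are all hypothetical, and the real structure differs in its details.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy free list: pages are grouped into buckets by how much free space they have. */
final class FreeListSketch {
    static final int PAGE_SIZE = 4096; // assumed page size
    static final int BUCKETS = 4;      // free-space bands of 25% each

    static final class Page {
        final int id;
        int freeBytes = PAGE_SIZE;

        Page(int id) {
            this.id = id;
        }
    }

    private final List<Deque<Page>> buckets = new ArrayList<>();
    private int nextPageId = 0;

    FreeListSketch() {
        for (int b = 0; b < BUCKETS; b++) {
            buckets.add(new ArrayDeque<>());
        }
    }

    static int bucketOf(int freeBytes) {
        return Math.min(BUCKETS - 1, freeBytes * BUCKETS / PAGE_SIZE);
    }

    /** Stores size bytes and returns the id of the page that received them. */
    int allocate(int size) {
        if (size <= 0 || size > PAGE_SIZE) {
            throw new IllegalArgumentException("size must be in 1.." + PAGE_SIZE);
        }
        Page page = takePageFor(size);
        page.freeBytes -= size;
        if (page.freeBytes > 0) {
            // Re-file the page under its new free-space band.
            buckets.get(bucketOf(page.freeBytes)).push(page);
        }
        return page.id;
    }

    private Page takePageFor(int size) {
        // Start at the lowest band that could fit the entry, so fuller pages are reused first.
        for (int b = bucketOf(size); b < BUCKETS; b++) {
            Iterator<Page> it = buckets.get(b).iterator();
            while (it.hasNext()) {
                Page page = it.next();
                if (page.freeBytes >= size) {
                    it.remove();
                    return page;
                }
            }
        }
        return new Page(nextPageId++); // nothing fits: take a fresh page
    }
}
```

Searching from the fullest suitable band is what keeps fragmentation low: small entries fill the gaps in nearly full pages instead of opening new ones.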
Ignite Native Persistence
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 to Partition File N)
[Diagram: the four steps take place inside a server node]
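The four steps above can be sketched with in-memory collections standing in for RAM, the write-ahead log, and a partition file. This is a simplified model, not Ignite's implementation: `WalSketch` and its methods are hypothetical, and recovery here reloads everything into RAM, which a real node does not need to do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy write-ahead logging with checkpoints. "Disk" is simulated with collections. */
final class WalSketch {
    private final Map<String, String> ram = new HashMap<>();           // volatile state
    private final List<String[]> wal = new ArrayList<>();              // append-only log on "disk"
    private final Map<String, String> partitionFile = new HashMap<>(); // checkpointed state on "disk"

    /** Steps 1 to 3: update RAM, append to the WAL, then acknowledge. */
    boolean put(String key, String value) {
        ram.put(key, value);
        wal.add(new String[] {key, value});
        return true; // ack
    }

    /** Step 4: copy the current state to the partition file and truncate the log. */
    void checkpoint() {
        partitionFile.putAll(ram);
        wal.clear();
    }

    /** Simulates a restart: RAM is lost, the partition file and the WAL survive. */
    void crashAndRecover() {
        ram.clear();
        ram.putAll(partitionFile);
        for (String[] record : wal) {
            ram.put(record[0], record[1]); // replay updates made after the last checkpoint
        }
    }

    String get(String key) {
        return ram.get(key);
    }

    public static void main(String[] args) {
        WalSketch node = new WalSketch();
        node.put("a", "1");
        node.checkpoint();
        node.put("a", "2");
        node.crashAndRecover();
        System.out.println(node.get("a"));
    }
}
```

The acknowledgement can be sent before checkpointing because the WAL record alone is enough to rebuild the update after a crash.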
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite
#denismagda


In-Memory Computing Essentials for Architects and Engineers


Editor's Notes

  1. The Apache Ignite Platform: Apache Ignite is a memory-centric data platform used to build fast, scalable, and resilient solutions. At its heart lies a distributed memory-centric data store with ACID semantics and powerful processing APIs, including SQL, Compute, Key/Value, and transactions. The memory-centric approach lets Apache Ignite use memory for high throughput and low latency while using local disk or SSD for durability and fast recovery. The main difference between the memory-centric approach and the traditional disk-centric approach is that memory is treated as fully functional storage, not just as a caching layer, as in most databases. For example, Apache Ignite can function in a pure in-memory mode, in which case it can be treated as an In-Memory Database (IMDB) and an In-Memory Data Grid (IMDG) in one. When persistence is turned on, Ignite functions as a memory-centric system where most of the processing happens in memory, but the data and indexes are persisted to disk. The main difference from a traditional disk-centric RDBMS or NoSQL system is that Ignite is strongly consistent, horizontally scalable, and supports both SQL and key-value processing APIs. The Apache Ignite platform can be integrated with third-party databases and external storage media and can be deployed on any infrastructure. It provides linear scalability, built-in fault tolerance, comprehensive security and auditing, and advanced monitoring and management. It caters for a range of use cases, including core banking services, real-time product pricing, reconciliation and risk calculation engines, analytics, and machine learning.
  2. Ignite Data Grid is a distributed key-value store that enables storing data both in memory and on disk within distributed clusters and provides extensive APIs. Ignite Data Grid can be viewed as a distributed partitioned hash map with every cluster node owning a portion of the overall data. This way the more cluster nodes we add, the more data we can store.
  3. Apache Ignite incorporates distributed SQL database capabilities as a part of its platform. The database is horizontally scalable, fault tolerant, and ANSI-99 SQL compliant. It supports SQL DML commands, including SELECT, UPDATE, INSERT, MERGE, and DELETE queries, along with a subset of DDL commands relevant for distributed databases. Data sets as well as indexes can be stored both in RAM and on disk thanks to the durable memory architecture. This allows distributed SQL operations to run across different memory layers, achieving in-memory performance with the durability of disk. You can interact with Apache Ignite in SQL through the natively developed APIs for Java, .NET, and C++, or through the Ignite JDBC or ODBC drivers. The drivers provide true cross-platform connectivity from languages such as PHP, Ruby, and more.
  4. Ignite In-Memory Compute Grid executes distributed computations in parallel to gain high performance, low latency, and linear scalability. It provides a set of simple APIs that allow users to distribute computations and data processing across multiple computers in the cluster. Disk-centric systems, like RDBMS or NoSQL, generally use the classic client-server approach, where the data is brought from the server to the client side, processed there, and then usually discarded. This approach does not scale well because moving data over the network is the most expensive operation in a distributed system. A much more scalable approach is collocated processing, which reverses the flow by bringing the computations to the servers where the data actually resides. This approach allows you to execute advanced logic or distributed SQL with JOINs exactly where the data is stored, avoiding expensive serialization and network trips.
  5. https://ignite.apache.org/collocatedprocessing.html Collocating computations with data minimizes data serialization over the network and can significantly improve the performance and scalability of your application. Whenever possible, collocate your computations with the cluster nodes that cache the data to be processed. Suppose a blizzard is approaching New York City. As a telecommunications company, you have to warn everyone by sending a text message with precise instructions on how to behave in such weather. There are around 8 million New Yorkers in your database who have to receive the message. With the client-server approach, the company has to connect to the database and move all 8 million (!) records to a client application that texts everyone. This is highly inefficient and wastes the network and computational resources of the company's IT infrastructure. However, if the company collocates each city it covers with the people who live there, it can send a single computation (!) to the cluster node that stores information about all New Yorkers and send the text messages from there. This approach avoids moving 8 million records over the network and uses cluster resources for the computation. That's collocated processing in action!
  6. https://github.com/techbysample/gagrid GA Grid (Beta) is an in-memory Genetic Algorithm (GA) component for Apache Ignite. A GA is a method of solving optimization problems by simulating the process of biological evolution. GA Grid provides a distributed GA library built on top of the mature and scalable Apache Ignite platform. GAs are excellent for searching through large and complex data sets for an optimal solution. Real-world applications of GAs include automotive design, computer gaming, robotics, investments, traffic/shipment routing, and more. Glossary: • Chromosome: a sequence of Genes; a Chromosome represents a potential solution. • Crossover: the process in which the genes within chromosomes are combined to derive new chromosomes. • Fitness Score: a numerical score that measures the value of a particular Chromosome (i.e., solution) relative to the other Chromosomes in the population. • Gene: a discrete building block of a Chromosome. • Genetic Algorithm (GA): continuously enhances a population of potential solutions. With each iteration, a GA selects the 'best fit' individuals from the current population to create offspring for the next generation. Over successive generations, a GA "evolves" the population toward an optimal solution. • Mutation: the process in which genes within a chromosome are randomly updated to produce new characteristics. • Population: the collection of potential solutions, or Chromosomes. • Selection: the process of choosing candidate solutions (Chromosomes) for the next generation.
  7. DEMO: run several ML samples from the standard distribution. Main benefits: • No ETL: online "in place" ML • In-memory speed & scale • Large scale parallelization • Optimized ML/DL algorithms • Last-mile GPU optimization. The rationale for building ML Grid is quite simple. Many users employ Ignite as the central high-performance storage and processing system for various data sets. If they wanted to perform ML or Deep Learning (DL) on these data sets (i.e., training sets or model inference), they had to ETL them first into some other system like Apache Mahout or Apache Spark. The roadmap for ML Grid is to start with a core algebra implementation based on Ignite co-located distributed processing. The initial version was released with Ignite 2.0. Future releases will introduce custom DSLs for Python, R, and Scala, a growing collection of optimized ML algorithms such as Linear and Logistic Regression, Decision Tree/Random Forest, SVM, and Naive Bayes, as well as support for Ignite-optimized Neural Networks and integration with TensorFlow. The current beta version of Apache Ignite Machine Learning Grid (ML Grid) provides a distributed machine learning library built on top of the highly optimized and scalable Apache Ignite platform. It implements local and distributed vector and matrix algebra operations as well as distributed versions of widely used algorithms.
  8. The Apache Ignite memory-centric platform is based on the Durable Memory architecture, which allows storing and processing data and indexes both in memory and on disk when the Ignite Persistent Store feature is enabled. The memory architecture helps achieve in-memory performance with the durability of disk using all the available resources of the cluster. Ignite's durable memory is built and operates in a way similar to the Virtual Memory of operating systems such as Linux. However, one significant difference between the two architectures is that Durable Memory always keeps the whole data set and indexes on disk if the Ignite Persistent Store is used, while Virtual Memory uses the disk for swapping purposes only. In-Memory: • Off-Heap memory • Removes noticeable GC pauses • Automatic Defragmentation • Predictable memory consumption • Boosts SQL performance. On Disk: • Optional Persistence • Support of flash, SSD, Intel 3D XPoint • Stores superset of data • Fully Transactional ◦ Write-Ahead Log (WAL) • Instantaneous Cluster Restarts
  9. Ignite Native Persistence is a distributed ACID and SQL-compliant disk store that transparently integrates with Ignite's Durable Memory as an optional disk layer storing data and indexes on SSD, Flash, 3D XPoint, and other types of non-volatile storage. With Ignite Persistence enabled, you no longer need to keep all the data and indexes in memory or warm them up after a node or cluster restart, because the Durable Memory is tightly coupled with persistence and treats it as a secondary memory tier. This implies that if a subset of data or an index is missing in RAM, the Durable Memory will take it from the disk.
  10. A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B+Tree is a central part of the whole Ignite virtual memory architecture because even basic key-value operations (cache.get and cache.put) work through it. Move to the next slide.
  11. On the previous slide we explained how to look up a value inside of the virtual memory. However, how does the virtual memory know where to put a new value? In fact, Ignite uses a special data structure called Free List to support this. Basically, a free list is a doubly linked list that stores references to pages of approximately equal free space. For instance, there is a free list that stores all the data pages that have up to 75% free space and a list that keeps track of the index pages with 25% capacity left. Data and index pages are tracked in separate free lists.