How to use 23c AHF AIOPS to protect Oracle Databases 23c
- 1. Oracle Database 23c and AHF Insights to do
better AIOps
Aug 2023
Sandesh Rao
VP AIOps, Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4
- 3. AHF AIOps Platform
Stages: Detect, Collect, Create, Notify, Clarify, Rediscover, Analyze, Mitigate & Fix
Elements: Telemetry; Mini & SRDC; Bug & Jira; Page & Email; IC Service; Bug De-duplication; Issue Clustering; Expert Systems; Timeline; Dev Containers; Source Evaluators; Workaround & Patches
Copyright © 2023, Oracle and/or its affiliates
- 4. AIOps and Applied Machine Learning
How does Machine Learning play into AIOps?
Time-Series Metrics
Log/Trace Events
Precursor Metric(s)
Precursor Event(s)
Root-Cause Metric
Root-Cause Event
Prevent
Recover
Root-Cause Action
Root-Cause Action
Problem
Predictive
Reactive
- 5. What is AHF
AHF Compliance Manager
• Compliance management
• More than 4000 best practices
• Covers Exadata and security
• Constant cadence of features
AHF Root Cause Analyzer
• Log scanners for obvious issues
• ML models to root cause
• Eliminate non-defect issues
• Recommend patches
AHF AutoUpgrade
• Stack deployment
• RPMs, automated packaged installers
• Standard home locations
AHF Data Collectors
• First Failure Capture
• Telemetry capture, streaming
• Diagnostic log collection
• OS and Database metrics
• Collection standardization
• Rudimentary aggregation and analysis
AHF ABS
• Bug rediscovery
• Autoclose known issues
• ML based models
• Cloud scale deployment
AHF Service Console
• Front-end for analysis, cause and solution identification
• Unified Timeline
• Anomaly Detection
• Graphing for Time Series Data
• AHF Insights and Fleet Insights
- 6. Autonomous Health Cloud Platform
Machines
Smart Collectors
SRs
Expert
Input
Feedback &
Improvement
Bugs
1
SRs
Logs
Model
Generation
Model
Knowledge
Extraction
Applied Machine Learning
Cloud Ops
Object
Store
Admin UI in Control Plane
Oracle Support
Bug DB
SE UI in Support
Tenant
(CNS)
1. Deployed as part of cloud image, running from the start
2. Proactive regular health checking, real-time fault detection, automatic incident analysis, diagnostic collection & masking of sensitive data
3. Use real-time health dashboards for anomaly detection, root cause analysis & push of proactive, preventative & corrective actions. Auto bug search & auto bug & SR creation.
4. Auto SR analysis, diagnosis assistance via automatic anomaly detection, collaboration and one click bug creation
5. Cleansing, metadata creation & clustering
6. Model generation with expert scrubbing
Message
Broker
- 7. What is AHF
AHF Compliance Manager
EXAchk, ORAchk, DBSat, Autoupgrade, CVU, Collection Manager (Apex App)
The verification and compliance tools which support all the components across the stack
AHF Data Collectors
TFA, CHM, Data Plane Telemetry, OSWatcher
The different OS and Data Collectors
AHF Root Cause Analyzer
CHA, DT, Parsers
Automation which responds to the customer issues or makes it easier to slice and dice data
AHF Service Console
AHF Insights and Fleet Insights
The frontend which is visible to Customers and Support
- 8. Oracle’s AI Ops Cloud Platform Implementation
What does our platform look like implemented?
Machine View
- 9. What are some of the Operations areas that use AML?
AIOps Using Applied Machine Learning
Proactive Prevention
OS Data
Real-Time Performance Prognostics Engine
Alert &
Preventive
Action
DB Data
Rapid Recovery
Entry
Clustering
Knowledge
Base Indexing
Model
Generation
Log
Cleansing
1 3 4 5 6
Expert Input
Knowledge Base Creation
Feedback
Training
Real-time
Log File
Processing
Timestamp
Correlation &
Ranking
8 9
7
Entry Feature
Creation
2
Logs
Traces
Alert &
Preventive
Action
Logs
Traces
In Lab
- 10. Oracle Database Alert Log
Pros:
• Destination for Important DB Events
• Single file to monitor by DBAs
• Many tools available to parse
• Supported by TFA for generating alarms
Cons:
• Includes both critical and non-critical events
• Includes messages not intended for DBAs
• Inconsistently reports severity level
• Can report unintuitive cause and action
• New undocumented messages in every release
- 11. The Curated Solution - New 21c Attention Log
Contains only important events requiring customer attention
Includes documented set of messages and attributes
All Messages include these attributes:
• Type
• Urgency
• Scope
• Target User
• Cause and Action
• Additional debug information
- 12. Oracle Database Attention Log Message Flow
DB
Component
Diagnostic
Framework
alert/log.xml log/attention.log
attention.amb
(Message Definitions)
Attention Log
Message
Attention Curated
Message
- 13. Attention Log Curation - Message Attributes
TYPE: 1. Error, 2. Warning, 3. Notification
URGENCY: 1. Immediate, 2. Soon, 3. Deferable, 4. Info
SCOPE: 1. Session, 2. Process, 3. PDB-Instance, 4. CDB-Instance, 5. CDB-Cluster, 6. PDB-Persistent, 7. CDB-Persistent
TARGET USER: 1. App-Dev, 2. Sec-Admin, 3. Net-Admin, 4. Cluster-Admin, 5. PDB-Admin, 6. CDB-Admin, 7. Server-Admin, 8. Storage-Admin, 9. DataOps-Admin
- 14. Example Attention Message Definition – CDB Warning
// TYPE - 1 error, 2 warning, 3 notification
// URGENCY - 1 immediate, 2 soon, 3 deferable, 4 info
// SCOPE - 1 session, 2 process, 3 pdb-instance, 4 cdb-instance, 5 cdb-cluster, 6 pdb-persistent, 7 cdb-persistent
// TARGETUSER - 1 app-dev, 2 sec-admin, 3 net-admin, 4 cluster-admin, 5 pdb-admin, 6 cdb-admin, 7 server-admin, 8 storage-admin, 9 dataops-admin
ID::2000
TYPE::2
URGENCY::1
SCOPE::4
TARGETUSER::6
TEXT::Parameter %s specified is high
CAUSE::Memory parameter specified for this instance is high
ACTION::Check alert log or trace file for more information relating to instance configuration, reconfigure the parameter and restart the instance
STARTVERSION::21.1
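The KEY::value layout of the definition above can be read with a few lines of Python. This is only a sketch based on the single sample on the slide: the function names are hypothetical, the code tables are copied from the slide's comment lines, and treating a line without "::" as a continuation of the previous field is an assumption.

```python
# Code tables copied from the comment lines of the slide's definition.
TYPE = {1: "error", 2: "warning", 3: "notification"}
URGENCY = {1: "immediate", 2: "soon", 3: "deferable", 4: "info"}
SCOPE = {1: "session", 2: "process", 3: "pdb-instance", 4: "cdb-instance",
         5: "cdb-cluster", 6: "pdb-persistent", 7: "cdb-persistent"}
TARGETUSER = {1: "app-dev", 2: "sec-admin", 3: "net-admin", 4: "cluster-admin",
              5: "pdb-admin", 6: "cdb-admin", 7: "server-admin",
              8: "storage-admin", 9: "dataops-admin"}


def parse_definition(text):
    """Parse one block of KEY::value lines into a dict of raw strings."""
    fields, last_key = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comment lines
        if "::" in line:
            last_key, value = line.split("::", 1)
            fields[last_key] = value
        elif last_key is not None:
            # Assumption: a line without '::' continues the previous field.
            fields[last_key] += " " + line
    return fields


def decode(fields):
    """Translate the numeric attribute codes into their names."""
    return {
        "id": int(fields["ID"]),
        "type": TYPE[int(fields["TYPE"])],
        "urgency": URGENCY[int(fields["URGENCY"])],
        "scope": SCOPE[int(fields["SCOPE"])],
        "target_user": TARGETUSER[int(fields["TARGETUSER"])],
        "text": fields["TEXT"],
    }


SAMPLE = """\
// TYPE - 1 error, 2 warning, 3 notification
ID::2000
TYPE::2
URGENCY::1
SCOPE::4
TARGETUSER::6
TEXT::Parameter %s specified is high
CAUSE::Memory parameter specified for this instance is high
ACTION::Check alert log or trace file for more information relating to instance
configuration, reconfigure the parameter and restart the instance
STARTVERSION::21.1
"""

raw_fields = parse_definition(SAMPLE)
decoded = decode(raw_fields)
print(decoded)
```

Decoding the sample yields a warning of immediate urgency, scoped to the CDB instance and aimed at the CDB admin, which lines up with the curated AL-2000 message shown on the next slide.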
- 15. Example Attention Log Curated Message – CDB Warning
[
IMMEDIATE Parameter SGA_MAX_SIZE specified is high
CAUSE: Memory parameter specified for this instance is high
ACTION: Check alert log or trace file for more information relating to instance
configuration, reconfigure the parameter and restart the instance
CLASS: CDB Instance / CDB ADMINISTRATOR / WARNING / AL-2000
TIME: 2020-05-01T11:09:02.223-07:00
ADDITIONAL INFO: -
WARNING: SGA_MAX_SIZE (6144 MB) is too high - it should be less than 5634 MB (80
percent of physical memory).
]
- 16. Example Attention Log Curated Message – CDB Error
[
IMMEDIATE Shutting down ORACLE instance (abort) (OS id: 8394)
CAUSE: A command to shutdown the instance was executed
ACTION: Check alert log for progress and completion of command
CLASS: CDB Instance / CDB ADMINISTRATOR / ERROR / AL-1002
TIME: 2020-05-08T17:09:33.773-07:00
ADDITIONAL INFO: -
Shutdown is initiated by sqlplus@den02tlh (TNS V1-V3).
]
- 17. Example Attention Log Curated Message – Server Warning
[
SOON Heavy swapping observed on system
CAUSE: Memory usage by one more application is leading to heavy swapping
ACTION: Check alert log for more information, use tools to analyze memory
usage and take action
CLASS: CDB Instance / SERVER ADMINISTRATOR / WARNING / AL-2100
TIME: 2020-05-01T11:09:02.223-07:00
ADDITIONAL INFO: -
WARNING: Heavy swapping observed on system in last 15 mins.
Heavy swapping can lead to timeouts, poor performance, and instance eviction.
]
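Because every curated message carries the same labelled fields, a monitoring script can pick out the entries that need action first. The sketch below parses the bracketed layout exactly as it appears on these slides; the on-disk format of attention.log may differ, and the function names are hypothetical.

```python
from datetime import datetime

LABELS = ("CAUSE", "ACTION", "CLASS", "TIME", "ADDITIONAL INFO")


def parse_entries(text):
    """Split '[' ... ']' blocks into dicts keyed by the labels seen on the slides."""
    entries, current, field = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[":
            current, field = {}, None
        elif line == "]":
            if current:
                entries.append(finish(current))
            current = None
        elif current is None or not line:
            continue
        elif field is None:
            # First line of an entry: urgency keyword, then the message text.
            current["URGENCY"], _, current["TEXT"] = line.partition(" ")
            field = "TEXT"
        else:
            label, sep, value = line.partition(":")
            if sep and label in LABELS and field != "ADDITIONAL INFO":
                field = label
                current[field] = value.strip()
            else:
                # Wrapped line, or free text following ADDITIONAL INFO.
                current[field] = (current[field] + " " + line).strip()
    return entries


def finish(entry):
    """Break CLASS into its four parts and parse TIME, when present."""
    if "CLASS" in entry:
        parts = [p.strip() for p in entry["CLASS"].split("/")]
        if len(parts) == 4:
            entry["SCOPE"], entry["TARGET USER"], entry["TYPE"], entry["ID"] = parts
    if "TIME" in entry:
        entry["TIME"] = datetime.fromisoformat(entry["TIME"])
    return entry


SAMPLE = """\
[
IMMEDIATE Parameter SGA_MAX_SIZE specified is high
CAUSE: Memory parameter specified for this instance is high
ACTION: Check alert log or trace file for more information relating to instance
configuration, reconfigure the parameter and restart the instance
CLASS: CDB Instance / CDB ADMINISTRATOR / WARNING / AL-2000
TIME: 2020-05-01T11:09:02.223-07:00
ADDITIONAL INFO: -
WARNING: SGA_MAX_SIZE (6144 MB) is too high - it should be less than 5634 MB (80
percent of physical memory).
]
[
SOON Heavy swapping observed on system
CAUSE: Memory usage by one more application is leading to heavy swapping
ACTION: Check alert log for more information, use tools to analyze memory
usage and take action
CLASS: CDB Instance / SERVER ADMINISTRATOR / WARNING / AL-2100
TIME: 2020-05-01T11:09:02.223-07:00
ADDITIONAL INFO: -
WARNING: Heavy swapping observed on system in last 15 mins.
]
"""

entries = parse_entries(SAMPLE)
immediate_ids = [e["ID"] for e in entries if e["URGENCY"] == "IMMEDIATE"]
print(immediate_ids)
```

Filtering on URGENCY or TARGET USER in this way is the kind of routing the curated attributes are meant to enable, for example paging the CDB administrator only for IMMEDIATE entries.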
- 18. Attention Log Use Cases – AHF + OCI Integration
Autonomous Health
Framework
Trace File Analyzer
…
…
Attention Log
Repository
Management VCN
AHF Service
Cloud Ops
Object
Store
Runbooks
- 20. What is AHF
AHF Compliance Manager
• Compliance management
• More than 4000 best practices
• Covers Exadata and security
• Constant cadence of features
Components: Compliance Manager, Data Collection, Root Cause Analyzer, Service Tooling, Auto Upgrade, Bug Matching, Data Sanitizing, Resource Allocation, Issue Detection, Service Console
- 21. Building compliance with best practices
Development methodology
1. Idea: Reports from development, testing, support, etc.
2. Expert review: Weekly meetings to review and test
3. MOS Note 757552.1: Published Exadata best practices
4. Default deployment: Bake best practices back into the default deployment
5. AHF compliance check: Generation of new checks
- 22. Ways to run compliance checks
Limit checks
-profile
One or more of 40+ different component focused check categories
Upgrade readiness
-Database
-GI
-ODA
-Exadata
Limit targets
-cells
-clusternodes
-ibswitches
-dbnames
Security assessment
Default password for OS and database users
Database security checks using DBSAT
- 27. What is AHF
AHF Data Collectors
• First Failure Capture
• Telemetry capture, streaming
• Diagnostic log collection
• OS and Database metrics
• Collection standardization
• Rudimentary aggregation and analysis
- 29. SRDCs (Service Request Diagnostic Collection)
[Diagram labels: Oracle Grid Infrastructure & Databases, AHF, Object Store]
1. AHF detects a fault
2. Diagnostics are collected
3. Distributed diagnostics are consolidated and packaged
4. Notification of fault is sent
5. Diagnostic collection is uploaded to Oracle Storage Service for later analysis
- 30. Some problem areas covered in SRDCs
Around 100 problem types covered
• Database areas
• Errors / Corruption
• Performance
• Install / patching / upgrade
• RAC / Grid Infrastructure
• Import / Export
• RMAN
• Transparent Data Encryption
• Storage / partitioning
• Undo / auditing
• Listener / naming services
• Spatial / XDB
• Other Server Technology
• Enterprise Manager
• Data Guard
• GoldenGate
• Exalogic
Full list in documentation
tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]
- 31. Manual collection vs TFA SRDC for database performance
Manual method
1. Generate ADDM reviewing Document 1680075.1 (multiple steps)
2. Identify “good” and “problem” periods and gather AWR reviewing Document 1903158.1 (multiple steps)
3. Generate AWR compare report (awrddrpt.sql) using “good” and “problem” periods
4. Generate ASH report for “good” and “problem” periods reviewing Document 1903145.1 (multiple steps)
5. Collect OSWatcher data reviewing Document 301137.1 (multiple steps)
6. Collect Hang Analyze output at Level 4
7. Generate SQL Healthcheck for problem SQL id using Document 1366133.1 (multiple steps)
8. Run support provided sql scripts – Log File sync diagnostic output using Document 1064487.1 (multiple steps)
9. Check alert.log if there are any errors during the “problem” period
10. Find any trace files generated during the “problem” period
11. Collate and upload all the above files/outputs to SR
TFA SRDC
1. Run tfactl diagcollect -srdc dbperf [-sr <sr_number>]
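For scripted use, the single SRDC command can be assembled programmatically. The helper below is a hypothetical wrapper that only builds the argument list from the syntax shown on the slide (tfactl diagcollect -srdc <srdc_type> [-sr <sr_number>]); it does not run anything, and the SR number in the example is a made-up placeholder.

```python
import shlex


def build_srdc_command(srdc_type, sr_number=None):
    """Return the argv list for: tfactl diagcollect -srdc <type> [-sr <number>]."""
    if not srdc_type:
        raise ValueError("an SRDC type such as 'dbperf' is required")
    argv = ["tfactl", "diagcollect", "-srdc", srdc_type]
    if sr_number is not None:
        argv += ["-sr", str(sr_number)]
    return argv


# 'dbperf' comes from the slide; the SR number is a made-up placeholder.
plain = build_srdc_command("dbperf")
with_sr = build_srdc_command("dbperf", sr_number="12345")
print(shlex.join(plain))
print(shlex.join(with_sr))

# On a host where AHF is installed, the list could be passed to
# subprocess.run(plain, check=True) to start the collection.
```

Keeping the command as a list rather than a string avoids shell quoting problems when the SRDC type or SR number comes from user input.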
- 32. AHF OS Data Collector
Generates view of Cluster and Database diagnostic metrics
• Always on - Enabled by default
• Provides Detailed OS Resource Metrics
• Assists Node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC GI events
• Categorizes processes by type
• Supports plug-in collectors (ex. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis
[Diagram labels: GIMR, ologgerd (master), osysmond on each node, OS Data]
- 33. Automatic AHF upgrade
Automatic upgrade when AHF finds a new version
New versions can be found automatically at:
• The local file system
• REST locations
• Object store locations
On-demand via ahfctl upgrade
The latest version can be pulled on-demand from My Oracle Support
AHF will also prompt you to upgrade when it detects it’s older than 180 days
- 34. What is AHF
AHF Root Cause Analyzer
• Log scanners for obvious issues
• ML models to root cause
• Eliminate non-defect issues
• Recommend patches
- 35. Database Health - Applied Machine Learning
Discovers Potential Cluster & DB Problems
• Actual internal data drives model development
• Applies purpose-built ML for knowledge extraction
• Expert Dev team scrubs data
• Generates Bayesian Network-based diagnostic root-cause models
• Uses BN-based run-time models to perform real-time prognostics
AHF Dev Team
Log
ASH
Metrics
ML
Knowledge
Extraction
BN
Models
Expert
Supervision
DB+Node
Runtime
Models
Feedback
Scrub Data
AHF
AHF
- 36. AHF Anomaly Detection flow
CHA Operational Flow: Anomaly Detection -> Diagnostics -> Prognosis
Engines: Machine Learning, Pattern Recognition, Bayesian Network
For each data point:
• Is data valid? -> Data Validation
• Is behavior expected? -> Operating State Estimation
• Is there a problem? -> Fault Identification
• What is causing the problem? -> Diagnostic Decision
• Is a failure likely? -> Prognosis
- 37. Models Capture the Dynamic Behavior of all Normal Operation
Models Capture all Normal Operating Modes
[Chart: IOPS, user commits (/sec), log file parallel write (usec) and log file sync (usec) over time (10:00, 2:00, 6:00), y-axis 0 to 40000]
A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles.
During monitoring, any data point similar to one of the vectors is NORMAL.
One could say that the model REMEMBERS the normal operational dynamics over time.
In-Memory Reference Matrix (Part of “Normality” Model)
IOPS                       …   2500    4900     800   …
User Commits               …  10000   21000    4400   …
Log File Parallel Write    …   2350    4100   22050   …
Log File Sync              …   5100    9025    4024   …
…                          …      …       …       …   …
- 38. AHF Anomaly Detection flow
In-Memory Reference Matrix (Part of “Normality” Model)
IOPS                       …   2500    4900     800   …
User Commits               …  10000   21000    4400   …
Log File Parallel Write    …   2350    4100   22050   …
Log File Sync              …   5100    9025    4024   …
…                          …      …       …       …   …
Observed - Predicted = Residual (each value is part of a data point)
Metric                     Observed   Residual
IOPS                          10500       5600
User Commits                  20000      -1000
Log File Parallel Write        4050        -50
Log File Sync                 10250        325
…                                 …          …
Estimator/predictor (ESEE): “based on my normality model, the value of IOPS should be in the vicinity of ~ 4900, but it is reported as 10500; this is causing a residual of ~ 5600 in magnitude”.
Fault detector: “such high magnitude of residuals should be tracked carefully! I’ll keep an eye on the incoming sequence of this signal IOPS and if it remains deviant I’ll generate a fault on it”.
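The residual arithmetic on this slide (Observed - Predicted) can be reproduced with the slide's own numbers. How the estimator actually chooses its prediction is not described, so the sketch below assumes the simplest stand-in, picking the reference vector nearest to the observation; the 50 percent tolerance rule and all function names are likewise hypothetical.

```python
import math

METRICS = ["IOPS", "User Commits", "Log File Parallel Write", "Log File Sync"]

# Columns of the in-memory reference matrix on the slide, one vector per
# remembered normal operating mode.
REFERENCE_VECTORS = [
    [2500, 10000, 2350, 5100],
    [4900, 21000, 4100, 9025],
    [800, 4400, 22050, 4024],
]


def predict(observed):
    """Assumed estimator: the reference vector closest to the observation."""
    return min(REFERENCE_VECTORS, key=lambda ref: math.dist(ref, observed))


def residuals(observed):
    """Observed minus predicted, metric by metric."""
    return [o - p for o, p in zip(observed, predict(observed))]


def flag_deviations(observed, tolerance=0.5):
    """Hypothetical rule: flag a metric whose residual exceeds
    `tolerance` times its predicted value."""
    predicted = predict(observed)
    return [name for name, o, p in zip(METRICS, observed, predicted)
            if abs(o - p) > tolerance * p]


observed = [10500, 20000, 4050, 10250]
print(predict(observed))
print(residuals(observed))
print(flag_deviations(observed))
```

With the slide's observed values the nearest vector is (4900, 21000, 4100, 9025), giving residuals of 5600, -1000 and -50 for the first three metrics as on the slide. For log file sync the same arithmetic gives 10250 - 9025 = 1225 rather than the 325 printed on the slide, so one of the slide's figures for that metric appears to be off.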
- 39. AHF Anomaly Detection flow
Inline and Immediate Fault Detection and Diagnostic Inference
Engines: Machine Learning, Pattern Recognition, & BN Engines
Input: Data Point at Time t (15:16:00), followed by Fault Detection and Classification
Metric                     Value     Classification
CPU                        0.90      OK
ASM IOPS                   4100      OK
Network % util             88%       HIGH (1)
Network_Packets Dropped    105       HIGH (2)
Log file sync              2 ms      OK
Log file parallel write    600 us    OK
GC CR request              504 ms    HIGH (3)
GC current request         513 ms    HIGH (3)
GC current block 2-way     2 ms      HIGH (4)
GC current block busy      5.9 ms    HIGH (4)
Enq: CF - contention       0         OK
…
Diagnostic Inference (Diagnostic Inference Engine) at 15:16:00
Symptoms
1. Network Bandwidth Utilization
2. Network Packet Loss
3. Global Cache Requests Incomplete
4. Global Cache Message Latency
Root Cause (Target of Corrective Action): Network Bandwidth Utilization
- 40. AHF Anomaly Detection flow
Cross Node and Cross Instance Diagnostic Inference
Node 1, Node 2 and Node 3 each run a Diagnostic Inference Engine that reports at 15:16:00:
Root Cause (Target of Corrective Action): Network Bandwidth Utilization
Cross Target Diagnostic Inference -> Corrective Action Target
- 41. Some AIOps Use Cases
Identify Signatures
• Incidents
• Bugs
Detect anomalies
• Logs
• OS metrics
Predict
• Resource usage
• Maintenance window
• Performance issues
• Workload Stability
- 42. Anomaly Detection – High Level
[Diagram labels: Log Collection; File Type 1, File Type 2, ... File Type n; Log File Anomaly Timeline; Probable Anomalies]
Legend: Known normal log entry (discard); Probable anomalous line (collect)
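The discard/collect split in the legend can be illustrated with a small filter. This is a deliberately naive sketch: the known-normal patterns, the sample lines and the "timestamp then message" line layout are all invented for the example, and AHF's real models are learned rather than hand-written regular expressions.

```python
import re
from datetime import datetime

# Hypothetical patterns for entries known to be normal (discarded).
KNOWN_NORMAL = [re.compile(p) for p in (
    r"Thread \d+ advanced to log sequence \d+",
    r"Completed checkpoint",
)]


def probable_anomalies(lines):
    """Return a time-ordered list of (timestamp, message) for lines that
    match none of the known-normal patterns."""
    timeline = []
    for line in lines:
        stamp, _, message = line.partition(" ")
        if any(p.search(message) for p in KNOWN_NORMAL):
            continue  # known normal log entry: discard
        timeline.append((datetime.fromisoformat(stamp), message))  # collect
    return sorted(timeline)


# Invented sample lines, deliberately out of order.
SAMPLE_LINES = [
    "2021-05-07T10:00:09 Completed checkpoint up to RBA",
    "2021-05-07T10:00:05 ORA-00600: internal error code",
    "2021-05-07T10:00:01 Thread 1 advanced to log sequence 4711",
]

anomalies = probable_anomalies(SAMPLE_LINES)
for stamp, message in anomalies:
    print(stamp.isoformat(), message)
```

Merging the per-file results by timestamp is what produces a single anomaly timeline across several log file types.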
- 43. Trace File Analyzer – High Level Anomaly Detection Flow
[Diagram, steps numbered 1 to 9]
Training: Log Cleansing, Entry Feature Creation, Entry Clustering, Model Generation, Expert Input, Knowledge Base Creation, Knowledge Base Indexing, Feedback
Real-time: Log File Processing, Timestamp Correlation & Ranking, Batch Feedback
- 44. Drain Algorithm
• Drain is an online log template miner that can extract templates (clusters) from a stream of log
messages in a timely manner.
• It employs a parse tree with fixed depth to guide the log group search process, which effectively
avoids constructing a very deep and unbalanced tree.
• Drain continuously learns on-the-fly and extracts log templates from raw log entries.
• Drain Research Paper :
• Pinjia He, Jieming Zhu, Zibin Zheng, and Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed
Depth Tree, Proceedings of the 24th International Conference on Web Services (ICWS), 2017.
• Link : http://jiemingzhu.github.io/pub/pjhe_icws2017.pdf
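The idea behind Drain can be shown with a heavily simplified miner. The sketch below is not the published algorithm or any library's API: it keeps only a two-level lookup (token count, then first token) in place of the configurable fixed-depth tree, and merges a message into the most similar template when the share of matching tokens reaches a threshold. The class name and sample messages are invented.

```python
WILDCARD = "<*>"


def has_digit(token):
    return any(ch.isdigit() for ch in token)


class MiniDrain:
    """Simplified online template miner in the spirit of Drain."""

    def __init__(self, sim_th=0.5):
        self.sim_th = sim_th
        self.leaves = {}  # (token count, first-token key) -> list of templates

    def add(self, message):
        """Route one message to a template, creating or generalising it."""
        tokens = message.split()
        if not tokens:
            return None
        # Tokens containing digits are treated as variables when routing.
        first = WILDCARD if has_digit(tokens[0]) else tokens[0]
        leaf = self.leaves.setdefault((len(tokens), first), [])
        best, best_sim = None, -1.0
        for template in leaf:
            same = sum(1 for a, b in zip(template, tokens) if a == b)
            sim = same / len(tokens)
            if sim > best_sim:
                best, best_sim = template, sim
        if best is not None and best_sim >= self.sim_th:
            # Generalise the template: differing positions become wildcards.
            for i, token in enumerate(tokens):
                if best[i] != token:
                    best[i] = WILDCARD
            return best
        template = list(tokens)
        leaf.append(template)
        return template

    def templates(self):
        return [" ".join(t) for leaf in self.leaves.values() for t in leaf]


miner = MiniDrain(sim_th=0.5)
for msg in ("Connection from 10.0.0.1 closed",
            "Connection from 10.0.0.2 closed",
            "Disk sda1 is full"):
    miner.add(msg)
print(miner.templates())
```

Because each message is handled as it arrives and only a small set of templates in one leaf is compared, the miner can keep up with a log stream, which is the property the slide highlights.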
- 45. Drain Algorithm – Parameters for Tuning
• Drain Parameters for tuning to the log file type needs.
Parameter                             Description
[DRAIN]/sim_th                        similarity threshold
[DRAIN]/depth                         max depth levels of log clusters
[DRAIN]/max_children                  max number of children of an internal node
[DRAIN]/max_clusters                  max number of tracked clusters
[DRAIN]/extra_delimiters              delimiters to apply when splitting log message into words
[MASKING]/masking                     parameters masking
[SNAPSHOT]/snapshot_interval_minutes  time interval for new snapshots
[SNAPSHOT]/compress_state             whether to compress the state before saving it
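The [SECTION]/key notation suggests an INI-style file, so the parameters can be read with Python's standard configparser. The section and key names below come from the table; the INI layout itself, every value shown, and the use of JSON for list values are assumptions made only for illustration.

```python
import configparser
import json

# Illustrative values only; tune them per log file type.
SAMPLE_INI = """
[DRAIN]
sim_th = 0.4
depth = 4
max_children = 100
max_clusters = 1024
extra_delimiters = ["_"]

[MASKING]
masking = []

[SNAPSHOT]
snapshot_interval_minutes = 10
compress_state = True
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)

drain = config["DRAIN"]
settings = {
    "sim_th": drain.getfloat("sim_th"),
    "depth": drain.getint("depth"),
    "max_children": drain.getint("max_children"),
    "max_clusters": drain.getint("max_clusters"),
    # Assumption: list-valued options are stored as JSON text.
    "extra_delimiters": json.loads(drain.get("extra_delimiters")),
    "masking": json.loads(config["MASKING"].get("masking")),
    "snapshot_interval_minutes": config["SNAPSHOT"].getint("snapshot_interval_minutes"),
    "compress_state": config["SNAPSHOT"].getboolean("compress_state"),
}
print(settings)
```

A higher sim_th makes the miner split messages into more, narrower templates, while a lower value merges more aggressively, so it is usually the first parameter to adjust for a new log file type.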
- 46. Our Improvements over Drain
• Multi-level Drain signatures
• Association of source code with Drain signatures for more precise feature capturing
• Interface to tune auto-marking of signatures to view results of parameter changes in real-time.
- 47. CPU Usage and forecast
[Chart: CPU usage and forecast, Jan 2021]
- 48. Seasonality determination to window identification flow
1. Original observation data
2. Convolution filter & average
3. Calculate seasonality
4. Use seasonality to predict best maintenance window
START_TIME             CNT      ln(CNT)      Seasonality
2021-04-11 15:00:00    290      5.669881     -0.226098
2021-04-11 16:00:00    31120    10.345606    -0.069821
2021-04-11 17:00:00    21530    9.977203     -0.350088
2021-04-11 18:00:00    26240    10.175040    -0.187483
2021-04-11 19:00:00    40520    10.609551    -0.513240
2021-04-11 20:00:00    54270    10.901727    0.019737
2021-04-11 21:00:00    51460    10.848560    0.059213
2021-04-11 22:00:00    44310    10.698966    -0.011312
2021-04-11 23:00:00    25690    10.153857    -0.179156
[Charts: three panels covering 2021-04-11 to 2021-05-09]
Current Date : 2021-05-12 15:00:00
Current Position in Seasonality : -0.22609829742533585
Best Maintenance Period in next Cycle : 2021-05-12 19:00:00
Worst Maintenance Period in next Cycle : 2021-05-13 08:00:00
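The four steps can be sketched in plain Python. This is one classical way to do it, assumed here rather than taken from the slides: log-transform the hourly counts, estimate the trend with a centered moving average (the convolution filter), average the detrended values per position in a 24-hour cycle, and take the lowest and highest positions as the best and worst windows. The sample counts are synthetic and built so that the quiet and busy hours fall 4 and 17 hours into the cycle.

```python
import math
from datetime import datetime, timedelta


def seasonal_profile(counts, period):
    """Average detrended log-count for each position in the cycle."""
    if len(counts) < 2 * period:
        raise ValueError("need at least two full cycles of data")
    logs = [math.log(c) for c in counts]  # counts must be positive
    half = period // 2
    sums, hits = [0.0] * period, [0] * period
    for i in range(half, len(logs) - half):
        window = logs[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight each.
            trend = (sum(window) - 0.5 * (window[0] + window[-1])) / period
        else:
            trend = sum(window) / period
        sums[i % period] += logs[i] - trend
        hits[i % period] += 1
    profile = [s / h for s, h in zip(sums, hits)]
    centre = sum(profile) / period
    return [p - centre for p in profile]


def next_occurrence(start, now, offset, period_hours):
    """First hour at or after `now` that lies `offset` hours into a cycle
    anchored at `start` (both assumed to be aligned on the hour)."""
    elapsed = int((now - start).total_seconds() // 3600)
    return now + timedelta(hours=(offset - elapsed) % period_hours)


# Synthetic data: four identical days, quiet at offset 4, busy at offset 17.
hourly = [100] * 24
hourly[4], hourly[17] = 10, 1000
counts = hourly * 4

profile = seasonal_profile(counts, 24)
best = min(range(24), key=profile.__getitem__)
worst = max(range(24), key=profile.__getitem__)

start = datetime(2021, 4, 11, 15)
now = datetime(2021, 5, 12, 15)
best_window = next_occurrence(start, now, best, 24)
worst_window = next_occurrence(start, now, worst, 24)
print("Best maintenance period:", best_window)
print("Worst maintenance period:", worst_window)
```

Working on log counts keeps a few very busy hours from dominating the trend estimate, which is consistent with the ln(CNT) series shown on the slide.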
- 49. Identifying time periods with high z-score events across multiple metrics
[Chart: z-score events across metrics, 7 May 2021]
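Finding periods where several metrics deviate at once can be sketched with plain z-scores. The threshold of 2.0, the requirement of at least two metrics, the metric names and the sample series are all assumptions for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard score of each value against the series mean and deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def high_z_periods(metrics, threshold=2.0, min_metrics=2):
    """Return (index, metric names) for each time period in which at least
    `min_metrics` metrics have an absolute z-score >= `threshold`."""
    scored = {name: z_scores(series) for name, series in metrics.items()}
    length = len(next(iter(scored.values())))
    flagged = []
    for t in range(length):
        hot = [name for name, zs in scored.items() if abs(zs[t]) >= threshold]
        if len(hot) >= min_metrics:
            flagged.append((t, hot))
    return flagged


# Invented sample series: CPU and I/O spike together in period 7.
METRICS = {
    "cpu": [20, 21, 19, 20, 22, 20, 21, 95, 20, 19],
    "io": [100, 98, 102, 101, 99, 100, 103, 400, 100, 97],
    "mem": [50, 51, 50, 49, 50, 51, 50, 49, 50, 50],
}

flagged = high_z_periods(METRICS)
print(flagged)
```

Requiring agreement across metrics filters out single-metric noise and points at the periods most worth drilling into.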
- 50. What is AHF
AHF Service Console
• Front-end for analysis, cause and solution identification
• Unified Timeline
• Anomaly Detection
• Graphing for Time Series Data
• AHF Insights and Fleet Insights
- 51. AHF Insights Overview
AHF Insights provides a bird's eye view of the entire system with the ability to further drill down for root cause analysis.
Previously, results from different AHF components were not available in a single dashboard, making them challenging to combine and correlate.
To mitigate this, AHF Insights provides a web-based graphical user interface, which does not require a web server to host the web pages, for all diagnostic data collectors and analyzers that are part of the AHF kit.
AHF performs a diagnostic collection for a given period to analyze the performance of database systems from:
• Configuration
• Environment Topology
• Metrics
• Logs
The diagnostic data collected from the system passes through AHF Insights and produces an offline report.
- 52. Information Captured
System Topology
• Resource Information
• Resource Configuration
• Summarized viewing of resource data
Insights
• Major events happening on the system
• Operating system information and its analysis
• Best practice compliance issues
• Software Recommendation
• Software / Hardware alerts for Database Server
• System changes over last 14 days
• RPM details and RPM inconsistencies among hosts
• Database Parameters and differences among databases
• Kernel Parameters and differences among hosts
- 53. Prerequisites
• Latest AHF with AHF Insights code
• Feature available from AHF 22.3 for Exadata Systems
• Required AHF data sources (TFA, Exachk, CHM) should be enabled and running
• 23.4 and higher for RAC Linux and ODA Systems
- 54. How can I generate it ?
• Command : ahf analysis create --type insights --last 2h
• Takes around : 3 - 4 minutes (depending on the system)
• Size : 46MB zip (depending on the system)
- 55. AHF Insights Report
System Topology
• Cluster
• Databases
• Database Servers
• Storage Servers
• Fabric Switches
Insights
• Timeline
• Operating System Issues
• Best Practice issues
• System Change
• Recommended Software
• Database Server
• RPM List
• Database Parameters
• Kernel Parameters
- 56. Cluster
Cluster Summary
1. Showcase relevant system cluster information.
2. Get DB Home details by clicking on the dropdown button located inside the DB Home section.
3. Copy Cluster summary into user clipboard.
- 59. Oracle Cloud AI Ops Takeaways
How have Oracle and customers benefited from this AI Ops implementation?
✓ AI Ops has become an essential Cloud technology
✓ Understand the problem space
✓ Understand the environmental, technical and legal constraints
✓ Use appropriate ML algorithms for the task
✓ Spend quality time with your training sets
✓ Incorporate explainability into the results
✓ Provide a feedback mechanism for model evolution
✓ Look for opportunities to incorporate actuators
✓ Honor the culture and risk tolerance of your target audience
- 60. Thank you
Any Questions?
Sandesh Rao
VP AIOps Autonomous Database
@sandeshr
https://www.linkedin.com/in/raosandesh/
https://www.slideshare.net/SandeshRao4