Major Outages in Major Enterprises Payara Conference

Ram Lakshmanan | architect: yCrash
Major Outages in Major Enterprises

1. Unresponsiveness in middleware
Major Financial Institution in N. America

1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump (optional)
360-degree data
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
Open-source script: https://github.com/ycrash/yc-data-script
./yc –p <PROCESS_ID>

2019-12-26 17:13:23
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.7-b01 mixed mode):
"Reconnection-1" prio=10 tid=0x00007f0442e10800 nid=0x112a waiting on condition [0x00007f042f719000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x007b3953a98> (a java.util.concurrent.locks.AbstractQueuedSynchr)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.lang.Thread.run(Thread.java:722)
:
:
1
2
3
1 Timestamp at which thread dump was triggered
2 JVM Version info
3 Thread Details - <<details in following slides>>
Anatomy of thread dump
"InvoiceThread-A996" prio=10 tid=0x00002b7cfc6fb000 nid=0x4479 runnable [0x00002b7d17ab8000]
java.lang.Thread.State: RUNNABLE
at com.buggycompany.rt.util.ItinerarySegmentProcessor.setConnectingFlight(ItinerarySegmentProcessor.java:380)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.processTripType0(ItinerarySegmentProcessor.java:366)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.processItineraryByTripType(ItinerarySegmentProcessor.java:254)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.templateMethod(ItinerarySegmentProcessor.java:399)
at com.buggycompany.qc.gds.InvoiceGeneratedFacade.readTicketImage(InvoiceGeneratedFacade.java:252)
at com.buggycompany.qc.gds.InvoiceGeneratedFacade.doOrchestrate(InvoiceGeneratedFacade.java:151)
at com.buggycompany.framework.gdstask.BaseGDSFacade.orchestrate(BaseGDSFacade.java:32)
at com.buggycompany.framework.gdstask.BaseGDSFacade.doWork(BaseGDSFacade.java:22)
at com.buggycompany.framework.concurrent.BuggycompanyCallable.call(buggycompanyCallable.java:80)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

"InvoiceThread-A996" prio=10 tid=0x00002b7cfc6fb000 nid=0x4479 runnable [0x00002b7d17ab8000]
java.lang.Thread.State: RUNNABLE
at com.buggycompany.rt.util.ItinerarySegmentProcessor.setConnectingFlight(ItinerarySegmentProcessor.java:380)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.processTripType0(ItinerarySegmentProcessor.java:366)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.processItineraryByTripType(ItinerarySegmentProcessor.java:254)
at com.buggycompany.rt.util.ItinerarySegmentProcessor.templateMethod(ItinerarySegmentProcessor.java:399)
at com.buggycompany.qc.gds.InvoiceGeneratedFacade.readTicketImage(InvoiceGeneratedFacade.java:252)
at com.buggycompany.qc.gds.InvoiceGeneratedFacade.doOrchestrate(InvoiceGeneratedFacade.java:151)
at com.buggycompany.framework.gdstask.BaseGDSFacade.orchestrate(BaseGDSFacade.java:32)
at com.buggycompany.framework.gdstask.BaseGDSFacade.doWork(BaseGDSFacade.java:22)
at com.buggycompany.framework.concurrent.BuggycompanyCallable.call(buggycompanyCallable.java:80)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
1 2 3 4 5
6
7
1 Thread Name - InvoiceThread-A996
2 Priority - Can have values from 1 to 10
3
Thread Id - 0x00002b7cfc6fb000 – Unique ID assigned by JVM. It's returned by calling the Thread.getId() method.
4 Native Id - 0x4479 - This ID is highly platform dependent. On Linux, it's the pid of the thread. On Windows, it's simply the OS-level thread ID within
a process. On Mac OS X, it is said to be the native pthread_t value.
5 Address space - 0x00002b7d17ab8000 -
6 Thread State - RUNNABLE
7 Stack trace -

2. Poor response time
Major cloud service

1. GC Log
10. netstat
12. vmstat
2. Thread Dump
9. dmesg
3. Heap Dump
360-degree data
6. ps
8. Disk Usage
5. top 13. iostat
11. ping
14. Kernel Params
15. App Logs
16. metadata
4. Heap Substitute
7. top -H
Open-source script: https://github.com/ycrash/yc-data-script

What is Garbage
request
Objects
Application
Garbage

How are objects Garbage Collected?
Evolution: Manual -> Automatic
3 – 4 decades before Now
Developer
Writes code to Manually evict Garbage
JVM
Automatically evicts Garbage

Automatic GC sounds good right?
Yes, but for
Cpu consumption
gc pauses
Success Stories GC tuning

2019-08-31T01:09:19.397+0000: 1.606: [GC (Metadata GC Threshold) [PSYoungGen: 545393K->18495K(2446848K)] 545393K-
>18519K(8039424K), 0.0189376 secs] [Times: user=0.15 sys=0.01, real=0.02 secs]
2019-08-31T01:09:19.416+0000: 1.625: [Full GC (Metadata GC Threshold) [PSYoungGen: 18495K->0K(2446848K)] [ParOldGen: 24K-
>17366K(5592576K)] 18519K->17366K(8039424K), [Metaspace: 20781K->20781K(1067008K)], 0.0416162 secs] [Times: user=0.38
sys=0.03, real=0.04 secs]
2019-08-31T01:18:19.288+0000: 541.497: [GC (Metadata GC Threshold) [PSYoungGen: 1391495K->18847K(2446848K)] 1408861K-
2019-08-31T01:18:19.345+0000: 541.554: [Full GC (Metadata GC Threshold) [PSYoungGen: 18847K->0K(2446848K)] [ParOldGen:
17382K->25397K(5592576K)] 36230K->25397K(8039424K), [Metaspace: 34865K->34865K(1079296K)], 0.0467640 secs] [Times:
user=0.31 sys=0.08, real=0.04 secs]
2019-08-31T02:33:20.326+0000: 5042.536: [GC (Allocation Failure) [PSYoungGen: 2097664K->11337K(2446848K)] 2123061K-
Sample GC log

3. Sudden CPU spike
Major Automobile manufacturer

4. CPU spike
Major Trading App

How to troubleshoot CPU spike?
https://blog.fastthread.io/2020/04/23/troubleshooting-cpu-spike-in-a-major-trading-application/

What is ‘top -H’ data?
top –h –p <PROCESS_ID>

5. Degradation in response time
Major Travel Service Provider

public void synchronized getData() {
makeDBCall();
}
Thread 1: Runnable
Thread 2: BLOCKED
Thread 1: Runnable
BLOCKED thread state
BLOCKED THREADS

6. Intermittent HTTP 502 error
In AWS EBS service

Major Outages in Major Enterprises Payara Conference

If you want to learn more…
https://ycrash.io/java-performance-training
Online-Training:
Java Performance/Troubleshooting Master class
@tier1app
https://www.linkedin.com/company/ycrash
This deck will be published in: https://blog.ycrash.io

Major Outages in Major Enterprises Payara Conference

Related slideshows

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

More Related Content

Similar to Major Outages in Major Enterprises Payara Conference

Similar to Major Outages in Major Enterprises Payara Conference (20)

More from Tier1 app

More from Tier1 app (20)

Recently uploaded

Recently uploaded (20)

Major Outages in Major Enterprises Payara Conference

Editor's Notes