Oracle Performance on Linux X86 Systems
Baruch Osoveskiy, Senior Consultant, Brillix
Who Am I
• Senior Consultant at Brillix
• Unix/Linux system administrator since 1997
• Oracle and MySQL DBA since 2000
• Linux and security consultant
• Blogger at ildba.co.il
• Enterprise Distributions
• Kernel
• Packages
• Drivers
◦ Use the Unbreakable Enterprise Kernel (UEK), Red Hat, or SUSE. Why?
• Better optimized for large systems and workloads
• Better hardware support
• Modern Linux features
• Patches required for correct Oracle product operation
• Bug Fixes from upstream
• Most Bug Fixes originate Upstream and are backported to Enterprise Distributions
• More code change from upstream means more time before patches are backported -- if it’s even possible to do so
• More time for security patches to be backported to the Enterprise versions
• Bug Fixes in Upstream apply cleanly to UEK
• Better testing
• Code is tested by the whole Linux community, not dependent on one OS vendor and their customers
• You run the same code that’s in upstream
• No backporting/scaffolding to use the latest Linux kernel features (e.g. NFSv4, TCP Fast Open)
• Better contributions
• Largest number of developers and company contributions
• Major Backports not required to provide cutting edge features
• New features seamlessly used by Oracle products
• Updates contain critical security and bug fixes.
• Most Enterprise Distribution updates contain new features.
• “Security Errata” can contain new features
• Updates often contain 1000s of lines of new code from upstream.
• Includes features, bug fixes, enhancements and other tweaks
• Install only security and bug fixes to avoid downtime from new and untested features
• The DB server is not your laptop: do not install unknown or new software.
• Use Oracle Validated Configurations
• http://www.oracle.com/technetwork/server-storage/linux/validated-configurations-085828.html
• Install only base + Oracle validated packages
• Do not install games or other applications on production servers
# cd /etc/yum.repos.d
# wget http://public-yum.oracle.com/public-yum-ol6.repo
# yum list
# yum install oracle-rdbms-server-11gR2-preinstall
• BIOS
• CPU
• Memory type, swap
• Disk (the more disks the better)
• Virtualization
Motherboard
Handle 0x0003, DMI type 2, 16 bytes
Base Board Information
Manufacturer: Intel
Product Name: S5000PAL0
Processor
Processor Information
Version: Intel(R) Xeon(R) CPU X5355
Memory
Handle 0x0034, DMI type 17, 27 bytes
Memory Device
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: ONBOARD DIMM_A1
Bank Locator: Not Specified
Type: DDR2
Type Detail: Synchronous
Speed: 667 MHz (1.5 ns)
• Motherboard
4 memory channels (S5000PAL0), 8 slots
(A1/A2/B1/B2/D1/D2) channels
• CPU
Intel Clovertown CPUs
1333 MHz (Dual Independent FSB)
Bandwidth 10666 MB/s per FSB
21 GB/s maximum FSB bandwidth
• Memory
DDR2 667 = PC2-5300
4 memory channels at 5.3 GB/s each
Memory bandwidth of 21 GB/s from all 4 channels
16GB memory in total
http://ark.intel.com/
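The memory bandwidth figures above follow from simple arithmetic: transfers per second times bus width in bytes times the number of channels. The sketch below reproduces them; the function name peak_mb_per_s is illustrative, and the 64-bit width is taken from the "Data Width: 64 bits" line of the hardware listing.

```sh
# Theoretical peak bandwidth in MB/s.
# $1 = transfer rate in millions of transfers per second (667 for DDR2-667)
# $2 = bus width in bits (64, from "Data Width: 64 bits")
# $3 = number of channels (optional, defaults to 1)
peak_mb_per_s() {
    awk -v mt="$1" -v bits="$2" -v ch="${3:-1}" \
        'BEGIN { printf "%d\n", mt * (bits / 8) * ch }'
}

peak_mb_per_s 667 64      # one DDR2-667 channel, about 5.3 GB/s
peak_mb_per_s 667 64 4    # all four channels, about 21 GB/s
```

These are theoretical peaks; measured throughput will be lower.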
• Always check for appropriate BIOS settings. Look out for:
• CPU features
• Enable Maximum Performance in the BIOS
• Memory
• Enable NUMA
• Power Management
• Hyper-Threading (HT) can give you about 35% better performance (tested on OLTP).
• SMT: Simultaneous Multi-Threading
• Runs 2 threads at the same time per core
• Do I have HT?
• Ensure that HT is enabled in the BIOS.
• grep -e "model name" /proc/cpuinfo
• http://ark.intel.com/
• Do not enable HT on an I/O-bound server; it will only make things worse.
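One way to answer "Do I have HT?" from the running system is to compare the "siblings" and "cpu cores" fields of /proc/cpuinfo: when siblings is larger than cpu cores, each core exposes more than one hardware thread. This is a minimal sketch; the function name ht_status is illustrative and it reads cpuinfo-style text from standard input.

```sh
# Report whether Hyper-Threading appears to be active.
# Reads /proc/cpuinfo-style text from stdin.
ht_status() {
    awk -F: '
        /^siblings/  { siblings = $2 + 0 }   # logical CPUs per physical package
        /^cpu cores/ { cores = $2 + 0 }      # physical cores per package
        END {
            if (cores == 0)            print "unknown"
            else if (siblings > cores) print "enabled"
            else                       print "disabled"
        }'
}

# Usage on a live system:
#   ht_status < /proc/cpuinfo
```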
• cpufreq
You can dynamically scale processor frequencies through the CPUfreq subsystem.
◦ Enable Maximum Performance in the BIOS
◦ /sys/devices/system/cpu/cpu<n>/cpufreq/scaling_governor
◦ On Red Hat 5.x the default is performance
◦ On Red Hat 6.x the default is normal
◦ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
sudo modprobe cpufreq_conservative
sudo modprobe cpufreq_ondemand
sudo modprobe cpufreq_powersave
sudo modprobe cpufreq_stats
sudo modprobe cpufreq_userspace
/etc/init.d/cpuspeed status
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
grep -i mhz /proc/cpuinfo
cpu MHz : 2367.330
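The echo command above changes only cpu0. A minimal sketch of applying the same governor to every CPU follows; the function name set_governor and its optional second argument (an alternative base directory, useful for dry runs) are illustrative, and on a real system it has to run as root.

```sh
# Write the given governor to every CPU's scaling_governor file.
# $1 = governor name, e.g. performance
# $2 = sysfs base directory (optional, defaults to /sys/devices/system/cpu)
set_governor() {
    gov="$1"
    base="${2:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue     # skip CPUs without a writable cpufreq entry
        echo "$gov" > "$f"
    done
}

# Usage (as root):
#   set_governor performance
```

The setting does not survive a reboot, so it still needs to be applied at boot time.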
• A load of more than n on an n-CPU server is bad
• A load of n on an n-CPU server is good
[Figure: examples illustrating a load of 1.00, 0.50 and 1.70]
• Use the load average to find out whether the server is CPU bound
• Reported as one-, five-, and fifteen-minute averages
• On Linux, being CPU bound can also impact I/O
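The rule of thumb above amounts to dividing the load average by the number of CPUs and checking whether the result is above 1.00. A minimal sketch, with the illustrative function name load_per_cpu:

```sh
# Load average normalised by CPU count; values above 1.00 suggest the
# server is CPU bound.
# $1 = load average, $2 = number of CPUs
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus <= 0) { print "invalid CPU count"; exit 1 }
        printf "%.2f\n", load / cpus
    }'
}

# Usage on a live system (1-minute average):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
```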
• cat /proc/meminfo
◦ Memory that is effectively free = Cached + Free
◦ Linux uses all otherwise free memory for the page cache.
◦ This behavior can be controlled by cgroups.
◦ If PageTables is large, use HugePages.
• /dev/shm
◦ Implementation of traditional shared memory (tmpfs)
◦ Used by Automatic Memory Management (AMM, MEMORY_TARGET)
◦ Does not work with HugePages (ID 1134002.1)
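The /proc/meminfo points above can be pulled out with a short awk filter. This sketch sums MemFree and Cached, following the "Cached + Free" rule from the slide, and prints PageTables next to it; the function name mem_summary and the output field names are illustrative.

```sh
# Summarise /proc/meminfo-style input read from stdin.
mem_summary() {
    awk '
        /^MemFree:/    { free = $2 }      # kB not used at all
        /^Cached:/     { cached = $2 }    # kB of page cache, reclaimable
        /^PageTables:/ { pt = $2 }        # kB spent on page tables
        END { printf "free_plus_cached_kb=%d pagetables_kb=%d\n", free + cached, pt }'
}

# Usage on a live system:
#   mem_summary < /proc/meminfo
```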
• Oracle recognizes NUMA systems and adjusts memory and scheduling operations accordingly. NUMA allows faster communication between distributed memory in a multi-processor server (ID 759565.1).
• !! Disabling or enabling NUMA will change application performance. !!
• Systems with 8 sockets or more may see gains of approximately 5%
• Enable NUMA at the BIOS level and remove numa=off from grub.conf
• _enable_NUMA_optimization=TRUE (check for bugs in your version before enabling)
• dmesg | grep -i numa
NUMA: Initialized distance table, cnt=2
NUMA: Node 0 [0,c0000000) + [100000000,1040000000) -> [0,1040000000)
pci_bus 0000:00: on NUMA node 0 (pxm 0)
pci_bus 0000:80: on NUMA node 1 (pxm 1)
• Without HugePages, memory is divided into 4K pages
• With HugePages, the page size is increased to 2MB (configurable to 1G)
• HugePages reduce the total number of pages to be managed by the kernel
• This reduces the amount of memory required to hold the page table.
• Use HugePages (Oracle Doc ID 749851.1)
• Reduces the footprint of individual Oracle database connections.
• Increases performance and scalability with fewer TLB misses.
• Requires manual tuning after SGA changes, and does not work with AMM (/dev/shm).
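Because HugePages have to be re-tuned by hand after every SGA change, it helps to see the arithmetic: the number of pages is the SGA size divided by the huge page size, rounded up. The sketch below is a rough lower bound only, under the assumption that the SGA is the only consumer; the function name hugepages_needed is illustrative, and the result is what would go into the vm.nr_hugepages kernel parameter.

```sh
# Minimum number of huge pages needed to hold an SGA.
# $1 = SGA size in MB
# $2 = huge page size in kB (optional, defaults to 2048, i.e. 2MB pages;
#      the actual value is the Hugepagesize line in /proc/meminfo)
hugepages_needed() {
    awk -v sga_mb="$1" -v hp_kb="${2:-2048}" 'BEGIN {
        pages = sga_mb * 1024 / hp_kb
        if (pages > int(pages)) pages = int(pages) + 1   # round up
        printf "%d\n", pages
    }'
}

hugepages_needed 12288    # a 12GB SGA with 2MB pages
```

In practice a little headroom is usually added on top of this figure, since an SGA that does not fit completely cannot use the huge page pool as intended.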
Without HugePages
◦ 200 connections to a 12.9GB SGA
◦ Before DB startup, PageTables: 7400 kB
◦ After DB startup, PageTables: 652900 kB
◦ After 200 PQ slaves run a query, PageTables: 6189248 kB
◦ Time to complete: 00:10:23.60
With HugePages
◦ 200 connections to a 12.9GB SGA
◦ Before DB startup, PageTables: 7748 kB
◦ After DB startup, PageTables: 21288 kB
◦ After 200 PQ slaves run a query, PageTables: 80564 kB
◦ Time to complete: 00:00:18.77
• Use HugePages with VMs for non-swappable, shared page tables.
• HugePages must be allocated in both the guest VM and the hypervisor
• Oracle VM 3.2.6 contains support for pv-hugepages
• What about swap?
◦ Modern Linux distributions rarely use swap (swappiness is very low)
◦ Swap is for OS services only. I do not recommend swap = RAM.
◦ Check vmstat output to ensure swap is not being used
◦ Do not use swap as memory; buy more memory
◦ If you have free memory:
  • echo 10 > /proc/sys/vm/swappiness
  • Set vm.swappiness in /etc/sysctl.conf to make it persistent
• Disks: the more the better
• Do not mix.
• Use RAID
• Use hardware RAID
• RAID 1+0 is best for write performance (logs).
• RAID 5 is best for read performance.
RAID level  Configuration    Total array capacity  Fault tolerance  Read speed (4k)  Write speed (4k)
RAID-1+0    500GB x 4 disks  1000 GB               1 disk           2X               2X
RAID-5      500GB x 3 disks  1000 GB               1 disk           3X               Depends on the controller
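The capacity column in the RAID comparison follows from the standard formulas: RAID 1+0 keeps half of the raw capacity because every disk is mirrored, and RAID 5 gives up one disk's worth of space to parity. A minimal sketch that reproduces both 1000 GB figures; the function name raid_capacity_gb is illustrative.

```sh
# Usable capacity in GB for the two RAID levels discussed here.
# $1 = RAID level (10 or 5), $2 = number of disks, $3 = size of each disk in GB
raid_capacity_gb() {
    level="$1"; disks="$2"; size_gb="$3"
    case "$level" in
        10) echo $(( disks / 2 * size_gb )) ;;       # mirrored pairs, striped
        5)  echo $(( (disks - 1) * size_gb )) ;;     # one disk's worth of parity
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

raid_capacity_gb 10 4 500    # RAID 1+0, 4 x 500GB
raid_capacity_gb 5 3 500     # RAID 5, 3 x 500GB
```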
• High "log file sync" event time:
• Do not use RAID 5 for redo logs (low write performance).
• Upgrading the CPU can increase redo throughput (LGWR also requires CPU)
• Reducing the overall number of commits by batching transactions can have a very beneficial effect.
• See if any of the processing can use the COMMIT NOWAIT option.
• See if any activity can safely be done with the NOLOGGING / UNRECOVERABLE options.
• Enlarge the redo logs so that log switches happen every 15 to 20 minutes.
• ID 34592.1
• On Linux, use ASM (block/raw device, O_DIRECT)
• Raw devices are deprecated by OUI for Oracle 11.2 (ID 357492.1)
• Raw devices may still bring benefits for intensive redo and large redo log files
• Use udev or ASMLib to control devices
• If using a file system, bypass journaling when you create it: use ext2, or ext4 with journaling turned off.
• Turning journaling off eliminates double writes.
• The “noatime” mount option eliminates the writes the system would otherwise issue when objects are only being read.
• To create a partition with DOS compatibility disabled:
fdisk -c -u /dev/sda
• To turn off journaling, execute:
tune4fs -O ^has_journal /dev/sda1
mount -t ext4 -o noatime /dev/sda1 /oradata
iostat information recorded during the ASM tests (SSD/RAW vs. HDD/RAW, 50GB over a 5-minute period):
      Device  w/s       wMB/s   avgrq-sz  avgqu-sz  await  svctm  %util
SSD   sdb1    21357.33  167.86  16.10     1.51      0.07   0.02   44.53
HDD   sdd1    3343.00   130.68  80.06     3.25      0.97   0.25   83.97
The redo on the 8 x SSD drives is writing 1.28X more data per second and doing 6.4X the writes/second,
although avgrq-sz shows that the HDD configuration is writing more data per operation.
However, await, svctm and %util show that the HDD configuration is busier and responding more slowly.
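The 1.28X and 6.4X figures quoted for the SSD configuration are simply the ratios of the wMB/s and w/s columns between the two devices. A minimal sketch of the calculation, with the illustrative function name ratio:

```sh
# Ratio of two measurements, rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 167.86 130.68       # wMB/s, SSD vs. HDD
ratio 21357.33 3343.00    # w/s, SSD vs. HDD
```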
• Top 5 Timed Events (AWR) looked as follows:
SSD
Event          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                    19,832                  78.42
log file sync  6,700,242  4,059    1              16.05      Commit
HDD
Event          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                    14,255                  52.53
log file sync  5,366,376  12,709   2              46.83      Commit
• Database Smart Flash Cache is a new (11.2) extension of the buffer cache.
• It extends the SGA as an L2 cache (ID 1317950.1)
db_flash_cache_file = <+FLASH/filename>
db_flash_cache_size = <flash pool size>
alter [table|index] object_name storage (flash_cache keep);
Oracle Performance on Linux X86 Systems
• Look for high I/O wait (%wa in top, await in iostat)
• Look at %util for disk saturation.
• In the AWR most of DB Time is I/O.
• Virtualization performance is proportional to native performance
• VM drivers have ~16% overhead compared with native drivers
• top
• iostat -Nx 1 100
• sar
• kSar
• Oracle Orion Calibration Tool
http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#PFGRF95244
• Available from Red Hat 6.x (6.2 is best) and UEK 3
• cgroups: Control Groups for Linux Containers
• Provide fine-grained control over system resources
• Can be used to throttle page cache use by backup processes, which is often the reason why systems are slower after overnight backups
cgroup
cgroups: how to use
yum install libcgroup
/etc/init.d/cgconfig start
/etc/cgconfig.conf
mount {
    cpu = /cgroup/cpu;
    memory = /cgroup/memory;
}
group http {
    memory {
        memory.limit_in_bytes = 10M;
    }
}
Thank You!
Baruch Osoveskiy
054-4746164
baruch@brillix.co.il


Editor's Notes

  1. I will talk about two separate parts: the software and distributions, and the hardware. They meet, of course, but in my presentation they are two separate parts.
  2. Many people use CentOS or Fedora. The packages there were not tested and never went through QA, and it may be that whoever compiled the packages did a poor job. You will find that out on the first of the month, at three in the morning, on a Friday.
  3. The DB server is not your laptop: do not install unknown or new software.
  4. Use the Oracle validated configurations site. Install only base + Oracle validated packages.
  5. How much memory does the server have? What is /dev/shm?
  6. Use block devices; raw is also possible. Raw has advantages when there are large writes. Raw is supported but deprecated.
  7. We can load the SSD more heavily. The main problem with SSDs is rewriting: erasing takes a long time.
  8. 11.2 has an interesting capability: an extension of the SGA.
  9. Today we are on UEK 3.