Wednesday, October 20, 2010

Memory Tuning - How OS Page Sizes Can Impact Oracle Performance

Theory

Modern operating systems provide each process it's own virtual address space that is potentially larger than physical memory. It does this by mapping the virtual addresses onto actual physical memory pages. The following diagram shows how large virtual address spaces are mapped onto small physical memory. 



The virtual to physical relationship is maintained in mapping tables. Linux maintains a separate map for each process. The map contains one entry for each page where the size of the page is set by the operating system.  The CPU attempts to speed up the mapping of virtual memory to physical addresses by caching a subset of the mapping entries in its internal cache.  Modern large memory systems and the resulting large processes have increased the size of memory mapping tables faster than the cpu cache sizes have increased. The processor cache is usually very small with only maybe 100 entries in may processors. This mismatch causes faults that can have significant performance implications.

System Analysis

Linux uses physical memory to maintain the process memory mapping tables and swaps those mapping tables in and out of the processor cache on context switches and on memory page table faults. This physical memory is unavailable for other uses essentially reducing the amount of memory available to the rest of the system.  The amount of memory used for page tables is the sum of all the virtual process sizes / the page table size multiplied by the size of a page table entry.  So a system with a 20GB Oracle SGA uses 50MB of physical memory for page tables just for the SGA:
  • 20GB Process Size / 4K page size * 10 = 50Megabytes. (5 million entries)
Every connection to an Oracle database runs in its own process that has direct access to the Oracle DB through shared memory.  This doesn't seem too bad until you consider that page tables for the entire shared memory space are duplicated across each process.  So a 20GB Oracle process with 200 database connections would 5GB for the page tables just for the SGA portion of each process:
  • 20GB Process Size/ 4K page size * 10 entry size * 200 connections = 10GB.  (1 Billion page table entries)
This means a 32GB Oracle server with a 20GB SGA and 200 connections uses 30% of the physical memory just for the page tables.  It also means that the oracle processes, the OS and other processes no longer fit into memory resulting in possibly significant paging.    You cannot rely on ps commands or other process monitors when investigating this because Linux always uses all available memory allocating any "extra" to file system buffers and other needs.  You can get your actual page table usage in your system by dumping the contents of /etc/procinfo with cat /etc/procinfo.

Linux provides the ability to change the page size from 4KB to 2MB potentially returning large amounts of memory to the swappable pool.  Calculations show that converting  just the Oracle SGA in the previous example would save almost 10GB. 
  • 20GB Process size / 2MB page size * 10 entry size * 200 connections = 20MB (2 million entries)
Why Aren't Huge Pages the Default?

Huge (2MB) pages are different from the normal 4K pages in that they are locked into memory and do not swap.  This means that extreme care must be taken when selecting the number and size of the huge page space.  The Huge Page space should be large enough to hold the Oracle SGA while leaving enough other memory to run the non-SGA client code, the operating system and other processes. Teams must understand their memory needs so that htey can set the correct number of huge pages. 

Production Example
I was talking with a colleague about their system and we did some back of the napkin calculations.  They had an Oracle RAC with two 72GB servers each with 500 database connections. This ran a little slower than expected when both machines were up but their real problem was that  they were unable to fail over to a single node even though they had plenty of CPU capacity.  The following calculations show why.

Before
Page Table Entries (SGA) = 35GB / 4KB = 8.75MB
Page Table Entries (1000 connections) = 1000 * 8.75MB = 8.75GB
Memory (1000 connections) = 10B * 8.75G = 87.5GB

After
Page Table Entries (SGA) = 35GB / 2MB = 17.5KB
Page Table Entries (1000 connections) = 1000 * 17.5KB = 17.5MB
Memory (1000 connections) = 10B * 17.5M = 175MB

The team converted 35GB of memory to Huge Pages so that the entire SGA fit into the huge pages.  This returned 40GB of free memory to each machine and made it possible fall back to a single machine when they had planned or unplanned outages on a single server. 

A smaller system at another site recovered 7GB of physical memory on their 32GB servers by converting their 20GB oracle SGA to Huge Pages.  This ended all their system paging.

Recap

Intelligent conversion from small memory pages to large memory pages modern modern large RAM systems can have significant performance implications:
  • Larger page sizes result in fewer page table entries in the CPU cache.  This results in fewer cache misses and less CPU stalling.
  • The memory savings resulting from the 1/500 reduction in page table entries is multiplied by the number of processes sharing the same shared memory. 
Every team with large Oracle Databases should investigate the use of Huge Pages no matter what their operating system.

1 comment:

  1. Nice post. Oracle is a relational database management system produced by oracle corporation. Nowadays most of the multinational companies used this oracle database for storing and managing their data's and programs. So learning Oracle Training Chennai is one of the best idea to make a bright career.

    ReplyDelete