Tuesday, November 29, 2016

EBS 12.2 -- problem installing EBS on XFS -- no server JVM at libjvm.so

This will be a quick tip, as it is about a problem that one of my colleagues faced a few days ago.
During my research, I saw that the information related to this error is not publicly available or documented, so I thought it might be a good thing to write about it :)

The problem was encountered while doing an EBS 12.2.6 installation on Oracle Linux 7.3 (64 bit). The error was seen during very basic Rapidwiz executions, such as running RapidWizVersion or running wrapper scripts that call java, such as buildStage.sh.

Well, this error was actually a java/jvm error and it was displayed in the following format;

Error: no `server' JVM at `<your_stage_location/startCD/Disk1/rapidwiz/jre/Linux_x64/1.6.0/lib/i386/server/libjvm.so'

Note: There may be other errors as well.

When ignored or skipped somehow, the installation may get stuck in the next stages.

Well... This error is caused by the filesystem that the EBS stage and installation directories reside on being XFS.
I wasn't there when the problem was encountered, but based on my research, I can say that it is caused by the enhancements made in modern XFS filesystems.
The most significant feature that causes this issue is 64-bit inodes. Especially when the filesystem size is more than 1 TB, these 64-bit inodes become a problem for 32-bit java. (You see the Rapidwiz java path above; it says i386 there.)

Anyways, a workaround for this problem is to use another filesystem, like ext3 or ext4, on Linux. (tested & verified)

Note that ext4 also has a 64-bit inode version option, but it is off by default. The following is from the manual;

i_version -> Enable 64-bit inode version support. This option is off by default.

Relocating the stage and startCD location to a smaller filesystem (< 1 TB) can be a workaround, too.

Also, the mount point (with the XFS filesystem) can be unmounted and remounted using the "-o inode32" option. After this move, the stage directory should be recreated (recopied) on the mount point to guarantee that all its files are associated with 32-bit inodes.
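Here is a minimal sketch of that workaround (the mount point, device and stage directory names below are just examples; adapt them to your own environment);

umount /u01
mount -t xfs -o inode32 /dev/mapper/vg01-lv_u01 /u01
# make it persistent by adding the inode32 option to the relevant /etc/fstab entry, e.g.:
# /dev/mapper/vg01-lv_u01  /u01  xfs  defaults,inode32  0 0
# then recopy/recreate the stage under the remounted filesystem, for example:
cp -rp /backup/StageR122 /u01/StageR122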

As you see, this is an interesting topic and it is worth mentioning.

One final note on this: Use of XFS with Oracle Database is not supported. :)

Please comment and give alternative suggestions if you can.

Thursday, November 24, 2016

RAC -- ORA-01578: ORACLE data block corrupted, "computed block checksum: 0x0", Bad header found during dbv

Yesterday, an interesting issue was escalated to me.
It was a production 2 node RAC environment, and the instances on node 1 could not be started.
It was interesting, because the instances on node 2, which belong to the same databases as the instances on node 1, could be started and used without any errors.
The instance on node 1 could see the disks, but lots of corruptions were being reported.
dbv, when executed from node 1, was again reporting lots of corruptions, but the interesting thing was the corrupted block checksum.
The corrupted block checksum reported by dbv on node 1 had the value 0x0, which means the checksum was okay.
However, dbv was still reporting the corruptions.
The corruptions reported there were actually reported for the contents of the blocks.
So the checksum was okay, but the contents of the blocks were not as expected.
In other words, the Oracle database (or dbv) was thinking that the problematic blocks should not be in their current location, as they belong to some other place in the datafile(s).


Here is an example output that I gathered from the environment.
It is a dbv output, produced for a datafile which had corrupted blocks.

Page 3043 is marked corrupt
Corrupt block relative dba: 0x00400be3 (file 1, block 3043)
Bad header found during dbv:
Data in bad block:
type: 6 format: 2 rdba: 0x1f6179e3
last change scn: 0x0008.a95bf9e0 seq: 0x1 flg: 0x04
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0xf9e00601
check value in block header: 0xfe88
computed block checksum: 0x0

The important values and strings in this output were:

rdba: 0x1f6179e3 --> this is hex. When converted to binary, its first 10 bits correspond to the relative file number. In this case it seems to be file 125.
corrupt block relative dba (rdba): 0x00400be3 -> file 1, block 3043
computed block checksum: 0x0
-Bad header found during dbv-
So, again, the checksum was okay, but the rdba stored in the block was different from the rdba of the corrupted block location, and dbv reported "Bad header found" as well. So the placement issue was obvious. In other words, the blocks were healthy (computed block checksum is 0x0), but their contents were actually the contents of different blocks.
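For reference, the file# / block# encoded in an rdba can be derived with simple shell arithmetic; the top 10 bits of the 32-bit rdba are the relative file number and the remaining 22 bits are the block number. A minimal sketch using the rdba from the output above:

rdba=0x1f6179e3
echo "file#:  $(( rdba >> 22 ))"       # prints 125 (the file number mentioned above)
echo "block#: $(( rdba & 0x3FFFFF ))"  # prints 2193891 (the block number within that file)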

What I did to solve this was to check the OS layer of node 1 (since node 2 was not encountering the problem).


I first suspected the storage, as the redundancy of the diskgroups was "external". However, the problem turned out to be in the multipath configuration.

The problem was in the multipath layer, since there were conflicting paths.

What I mean by a conflicting path is this;

there was an ASM disk (the customer was using ASMLib) and its name was IBM19.

This disk was based on a multipath device called mpath13. That is, IBM19 was created using mpath13.

When I used multipath -ll, I saw that mpath13 was there.

However, when I checked the situation from the ASM perspective using oracleasm querydisk, I saw that the disk IBM19 was based on 2 devices, mpath13 and mpath40. mpath40 was displayed in the first line of the oracleasm querydisk output, and since ASMLib disks go through the first path that the OS gives them, Oracle was reaching the disk through the wrong path. (It should have gone through mpath13, but it was going through mpath40.)

Note that node 2 was also seeing the ASM disk IBM19 in the same way. The only difference was that, on node 2, mpath13 was displayed in the first line of the oracleasm querydisk output, so Oracle was reaching the disks through mpath13 and there were no problems on node 2.
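A minimal sketch of the checks described above (IBM19, mpath13 and mpath40 are the names from this incident; use your own disk labels and map names);

/usr/sbin/oracleasm querydisk -p IBM19     # which block device(s) does the ASMLib label resolve to?
multipath -ll mpath13                      # which physical paths make up each multipath map?
multipath -ll mpath40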

--mpath40 was based on newly added disks, and it was not included in any ASM diskgroup, although it was formatted using "oracleasm createdisk".

So the multipaths were conflicting; oracleasm was somehow mixing them up.

In other words, Oracle was using mpath40 to reach the disks that were supposed to be pointed to by mpath13, and thus Oracle was reaching the wrong disks.

What I did to fix this was to remove the devices that mpath40 was based on. I used echo "scsi remove-single-device" and then refreshed multipathd.

The problem went away. After that, I added the devices back using echo "scsi add-single-device" and the conflict didn't appear again.
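A heavily simplified sketch of that remove/re-add cycle (the host/channel/id/lun numbers below are placeholders; take the real values from lsscsi or from the multipath -ll output, and be careful, this removes a disk path from the OS);

echo "scsi remove-single-device 1 0 0 4" > /proc/scsi/scsi
multipath -f mpath40        # flush the stale multipath map
service multipathd reload   # let multipathd rebuild its maps

echo "scsi add-single-device 1 0 0 4" > /proc/scsi/scsi
multipath -r                # force a reload of the multipath maps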

At the end of the day, the instance on node 1 could be started. (I only recreated the undo tablespace of node 1, as after the database went into the open state, undo block corruptions were reported; probably caused by the earlier instance terminations.)

EBS 12.2 -- Using GSCC in 12.2 Upgrades, using frmcmp_batch and gscc.pl (frmf2xml.sh is already missing)

Nowadays, we have started doing EBS 12.2 upgrades. I'm sure about it, because I currently have ongoing EBS upgrade projects and there are upgrade related questions asked in my forum as well. (http://ermanarslan.blogspot.com.tr/p/forum.html)

This blog post is based on one of these questions, recently asked in my forum.
It is related to GSCC (Global Standards Compliance Checker), which we use in our upgrades to ensure that our filesystem and db objects are compatible with the new EBS 12.2 release.

The instructions for using GSCC are documented in "Using the Online Patching Readiness Report in Oracle E-Business Suite Release 12.2 (Doc ID 1531121.1)".

When we look at that note, we see that there are four steps given as instructions, and those 4 steps are all about running sql files.

However, if we look carefully, we see that the same note also gives us the readme of "R1222: STANDALONE READINESS REPORT AND GLOBAL STANDARDS COMPLIANCE CHECKER (GSCC)", which actually gives us the necessary info describing how to use GSCC (gscc.pl) for checking our filesystem objects.

Here;

Using GSCC
==========
The Global Standards Compliance Checker (GSCC) delivered in this patch consists
of the main, engine script $FND_TOP/bin/gscc.pl and a variety of standards
enforcement code (EFC) modules in $FND_TOP/perl/GSCC/OpenEFC/ that check for
common standards issues.
After applying this patch to install the GSCC code, source the applications
environment file and then run the GSCC on a set files under a directory tree
like this:
cd /home/yourdir
$FND_TOP/bin/gscc.pl -f '/home/yourdir/your-code/*'
In this example, gscc.pl will check all of the files located under the
your-code/ directory using the EFC modules located in
$FND_TOP/perl/GSCC/OpenEFC/ and generate a report named gscc-out.log in the
current working directory (/home/yourdir/ in this example). Invoking
gscc.pl without arguments will print a usage message with additional
information.


So, in addition to the 4 steps outlined in document 1531121.1, we need to use gscc.pl to check our filesystem objects (forms, reports etc.) to ensure that they are compatible with the new EBS release (that is, to ensure that custom code complies with the E-Business Suite (EBS) online-patching coding standards).

However, we can't use gscc.pl directly with forms (and maybe reports as well). That is, for instance, we can't give fmb files as input to gscc.pl, and this is why I'm writing this blog post.

While gscc.pl just doesn't do anything when given fmb files as input in some of the EBS 12.2 versions, it gets an error in some other versions, as shown below.

cd $XX_TOP/forms/US
$FND_TOP/bin/gscc.pl -f XXOEXXX.fmb
The log file is :
more gscc-out.log
-------------------------------------------------------------------------------
GSCC Compliance Test Results:
-------------------------------------------------------------------------------
File: XXOEXXX.fmb

Configuration Errors (1):
N/A
0 - File conversion failed for
XXOEXXX.fmb
using the command line
frmf2xml.sh OVERWRITE=YES XXOEXXX.fmb
-------------------------------------------------------------------------------
GSCC Totals:
Passes: 0
Fails: 0
Warnings: 0
Under Review: 0
GSCC Errors: 1

--* GSCC Compliance Test: Configuration Error *--

This error is caused by the missing frmf2xml.sh, as GSCC tries to use it internally. (as seen in the above output: using the command line frmf2xml.sh OVERWRITE=YES XXOEXXX.fmb)

Note that frmf2xml.sh is not delivered with EBS.

One workaround for this can be to take it from a client environment which has Oracle Forms installed. (frmf2xml is there when we have Oracle Forms.)
So, we take the contents of frmf2xml.bat (it is a bat file, since clients are generally on Windows) and convert it to Bash;

For example:
================
export PATH=$PATH:$ORACLE_HOME/bin
. $ORACLE_HOME/SID_host.env
export FORMS_API_TK_BYPASS=true
export CLASSPATH=$CLASSPATH:$ORACLE_HOME/forms/java/frmxmltools.jar:$ORACLE_HOME/forms/java/frmjdapi.jar:$ORACLE_HOME/lib/xmlparserv2.jar:$ORACLE_HOME/10.1.2/lib/xschema.jar
for i in $(ls *.fmb)
do
java oracle.forms.util.xmltools.Forms2XML OVERWRITE=yes $i
done
========

So we can run this script for the fmb files, get the xml files created, and then give those xml files to gscc.pl for validation.
However, this method has a problem. The problem is that this java converter can crash when the fmb files are big. (too big :)

Fortunately, there is another & more efficient way, and it is based on using frmcmp_batch.
The following script can be put in the directory where the fmb files that we want to check reside, and executed to get the whole gscc output in one go.

Note that this script can be modified for use with other file types (such as rdf) as well.
It processes the filesystem objects one by one and produces one final gscc_log which includes all the gscc checks for the fmb files.

# convert each .fmb to a text representation with frmcmp_batch, then run gscc.pl on the generated .txt file
for i in $(ls *.fmb);
do
echo $i
frmcmp_batch module=$i userid=apps/apps Script=YES Forms_Doc=YES module_type=FORM
$FND_TOP/bin/gscc.pl -f `echo $i | sed '{s/.fmb/.txt/g; }'` -o gscc_`echo $i | sed '{s/.fmb/.log/g; }'`
done
# concatenate the per-file gscc logs into a single report named gscc_log
echo "GENERATING GSCC OUTPUT --all FMBs together" > gscc_log
for i in $(ls *.log);
do
echo ---- GSCC OUTPUT FOR $i ----- >> gscc_log
cat $i >> gscc_log
done

Once we get it working, we will see violations in the gscc output, similar to the following.
When we get the violations, we send them to the developers so that they can fix them.

-------------------------------------------------------------------------------
File: XXXXX_fmb.xml

Errors (1):
File.Gen.41 (Code Reviewers: CODE_REVIEWER)
516 - Potential standards violation found in Query/DML. Please check whether
apps.xx_12_api_interface&amp
is a direct table reference. If so, change it to use the APPS synonym.
516 - Potential standards violation found in Query/DML. Please check whether
apps.xx_12_api_log&amp
is a direct table reference. If so, change it to use the APPS synonym.

-------------------------------------------------------------------------------
File: XXXX_fmb.xml

Errors (1):
File.Gen.41 (Code Reviewers: CODE_REVIEWER)
814 - Potential standards violation found in Query/DML. Please check whether
inv.org_freight
is a direct table reference. If so, change it to use the APPS synonym.

Again, this was an issue reported in my forum; thanks to "Linda" for pointing it out.

Friday, November 18, 2016

RDBMS-- Applying DB PSU and OJVM PSU to a Single Instance Oracle Database

In this blog post, I will show you the way and methodology for installing the DB PSU and OJVM PSU on a single instance 11.2.0.4 Oracle Database.

First, let's see how we can find the PSU and OJVM patches.
Again, this post is only for 11.2.0.4, but the method and the logic are the same for almost all releases.

In order to find the patches, we use Oracle Support.
We choose Patch Search > Recommended Patch Advisor. Then we choose our platform, release and product, which is the Oracle Database by the way, and click the search button, as depicted in the following screenshot;



That's all we need to do, as the advisor will list the DB PSU, CPU (SPU) and OJVM PSU for our database release.
Note that the SPU was formerly known as the Critical Patch Update (CPU); the program that delivers SPUs is still called the Critical Patch Update.

The only thing that we need to do at this point is to download the patches and apply them by following their readme files.

Note that if we apply the PSU, we can't apply the CPU (SPU), and we don't actually need to. (The CPU can be thought of as a subset of the PSU.)


Well, after learning how to find our PSU and CPU patches, let's take a look at the process of installing them.

Note that I recommend applying the DB PSU and OJVM PSU. These two are enough if we are on a stable and supported release. In addition, always apply the DB PSU and OJVM PSU after a release upgrade. If there are fixes for your newly upgraded release, you should apply them; why not upgrade to a better state, right?

So, the following procedure is written for 11.2.0.4 Oracle Databases which are running on Linux x86-64 platform.

Patch 24006111 - Database Patch Set Update 11.2.0.4.161018 (Includes CPUOct2016) ***PSU
-------------------------------------------------------------------------------------
*Shut down the listener and the databases running from the target ORACLE_HOME
*Add OPatch to the path ($PATH env variable)
export PATH=$PATH:$ORACLE_HOME/OPatch

*CD INTO PATCH DIRECTORY
/home/oracle/oracle_11204/psu/24006111

*PREREQ & CONFLICT CHECK
[oracle@coinactdbtst 24006111]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 11.2.0.3.4
Copyright (c) 2012, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/11.2.0.4/db_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/db_1/oraInst.loc
OPatch version    : 11.2.0.3.4
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/db_1/cfgtoollogs/opatch/opatch2016-11-18_09-34-37AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" passed.
OPatch succeeded.

--check the readme if conflicts are reported; the action plan (what to do) is in the readme.

*DOWNLOAD THE LATEST OPATCH and install it into the target ORACLE_HOME

The patch number of OPatch is 6880880.
We download it, delete (or move aside) the old OPatch directory in the ORACLE_HOME and then unzip the downloaded zip into the ORACLE_HOME. That's all for installing the new version of OPatch.
OPatch version 11.2.0.3.15 is enough for this work.
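A minimal sketch of this OPatch refresh (the zip file name is just an example; use the one downloaded for your platform and release);

cd $ORACLE_HOME
mv OPatch OPatch.old                  # keep the old OPatch aside instead of deleting it right away
unzip -q /home/oracle/p6880880_112000_Linux-x86-64.zip
$ORACLE_HOME/OPatch/opatch version    # verify the new version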

*WE APPLY PSU using Opatch apply
export PATH=$PATH:$ORACLE_HOME/OPatch
cd /home/oracle/oracle_11204/psu/24006111
opatch apply

The opatch should complete with the following messages:

Composite patch 24006111 successfully applied.
OPatch succeeded.

*RUN CATBUNDLE & UTLRP IN THE DATABASES THAT ARE SUPPOSED TO RUN FROM THE PATCHED ORACLE_HOME

cd $ORACLE_HOME/rdbms/admin
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> @catbundle.sql psu apply
SQL> @utlrp.sql
SQL> QUIT

*CHECK PSU VERSION

select *
from sys.registry$history
where bundle_series = 'PSU'
order by action_time;

opatch lsinventory -bugs_fixed | egrep -i 'PSU|DATABASE PATCH SET UPDATE|APPLIED'

Patch 24433711 - Database Security Patch Update 11.2.0.4.161018 (CPUOct2016)  ***SPU
-------------------------------------------------------------------------------------
No need for this patch, as 24006111 fixes the same issues.

Patch 24315821 - Oracle JavaVM Component 11.2.0.4.161018 Database PSU (Oct2016)
-------------------------------------------------------------------------------------
*CD INTO PATCH DIRECTORY
/home/oracle/oracle_11204/ojvm_psu/24315821

*PREREQ & CONFLICT CHECK
[oracle@coinactdbtst 24315821]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./
Oracle Interim Patch Installer version 11.2.0.3.15
Copyright (c) 2016, Oracle Corporation.  All rights reserved.

Oracle Home       : /u01/app/oracle/product/11.2.0.4/db_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0.4/db_1/oraInst.loc
OPatch version    : 11.2.0.3.15
OUI version       : 11.2.0.4.0
Log file location : /u01/app/oracle/product/11.2.0.4/db_1/cfgtoollogs/opatch/opatch2016-11-18_10-56-25AM_1.log

Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" passed.
OPatch succeeded.

--check the readme if conflicts are reported; the action plan (what to do) is in the readme.

*WE APPLY PSU using Opatch apply
export PATH=$PATH:$ORACLE_HOME/OPatch
cd /home/oracle/oracle_11204/ojvm_psu/24315821
opatch apply

*RUN THE SQL PORTION OF THE PATCH & UTLRP IN THE DATABASES THAT ARE SUPPOSED TO RUN FROM THE PATCHED ORACLE_HOME

cd $ORACLE_HOME/sqlpatch/24315821
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> startup upgrade  (shutdown first, if the database is already started)
SQL> @postinstall.sql
SQL> shutdown
SQL> startup
SQL> exit
cd $ORACLE_HOME/rdbms/admin
sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> @utlrp.sql

*CHECK PSU VERSION

select * from sys.registry$history
where version like '%OJVM%'
order by action_time;

opatch lsinventory -bugs_fixed | egrep -i 'PSU|DATABASE PATCH SET UPDATE|APPLIED'

Thursday, November 17, 2016

RDBMS -- 11.2.0.3 to 11.2.0.4 using DBUA /Linux

In this post, I will give you a quick example of upgrading an Oracle Database from 11.2.0.3 to 11.2.0.4 on the Linux 64-bit platform.
This blog post is written to show you the upgrade work required for upgrading a single instance Oracle Database 11.2.0.3, which is running on a cooked filesystem (non-ASM), to 11.2.0.4.

In this blog post, we'll go through an out-of-place upgrade process, because Oracle strongly recommends not upgrading Oracle Database to release 11.2.0.2 or 11.2.0.3 (or later) using an existing Oracle home.

Note that the upgrade method demonstrated here is for an example TEST environment; this method may change (at least minor changes may be required) according to your environment. However, this blog post is useful for taking a look at the upgrade process, which is actually pretty straightforward and easy once we know what we are doing.

1)
We download "11.2.0.4.0 PATCH SET FOR ORACLE DATABASE SERVER" using Patch 13390677 from Oracle Support.
Note that, this patch includes Oracle Database, ASM, Grid, Oracle RAC and so on.

We download 7 zip files and their descriptions are as follows;

p13390677_112040_platform_1of7.zip Oracle Database (includes Oracle Database and Oracle RAC)
p13390677_112040_platform_2of7.zip Oracle Database (includes Oracle Database and Oracle RAC)
p13390677_112040_platform_3of7.zip Oracle Grid Infrastructure (includes Oracle ASM, Oracle Clusterware, and Oracle Restart)
p13390677_112040_platform_4of7.zip Oracle Database Client
p13390677_112040_platform_5of7.zip Oracle Gateways
p13390677_112040_platform_6of7.zip Oracle Examples
p13390677_112040_platform_7of7.zip Deinstall

2)
We create a directory for installing our 11.2.0.4 Oracle home and configure its ownership/permissions accordingly (e.g. chown -R oracle:dba).

3)
We unzip the zip files and execute runInstaller from <unzipped_directory>/database via an X Window session.

4)
We follow the installer screens and install Oracle RDBMS as Software Only.

5)
We run utlu112i.sql in the database that needs to be upgraded. (We run the utlu112i.sql located in the new Oracle home, but we run it while our environment is still set to the old Oracle home.)
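A minimal sketch of this step (the new home path is just an example; the environment is still pointing to the old 11.2.0.3 home);

export NEW_ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/db_1
sqlplus -s "/ as sysdba" <<EOF
spool /tmp/utlu112i.log
@$NEW_ORACLE_HOME/rdbms/admin/utlu112i.sql
spool off
exit
EOF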

6)
We take corrective action by analyzing the output/spool file of utlu112i.sql.

i.e:

Corrective Actions reported in a demo environment:
PURGE DBA_RECYCLE_BIN
exec dbms_stats.gather_dictionary_stats;

7)
We back up our source database. (Only a database backup; an Oracle home backup is not required as we are doing an out-of-place upgrade.)
Note that DBUA provides backup options as well.

8)
Next, we run dbua from the new Oracle home and upgrade our database by following the screens, as shown in the following example screenshots.
Note that we shut down our source database before executing dbua, but dbua can handle these stop/start operations as well.

9)
We check the difference between the newly created spfile and the old spfile, and ensure nothing important has changed there.
--Note that DBUA creates the new spfile based on the old spfile, but it is still a good thing to check.

10)
Lastly, we review the Post-Upgrade section of the following document and take the necessary actions if our environment requires us to do so.

Oracle Database Upgrade Guide

Tuesday, November 15, 2016

Linux -- Huge Pages in real life, memory consumption, Huge pages on swap operations, using "overcommit" / nr_overcommit_hugepages

This blog post will be about Huge pages on Linux.
I actually wrote a comprehensive article about Linux Memory Optimization (including Hugepages) earlier, but this blog post will be a little different.
Today, I want to make a demo to show you the Hugepages in real life and the memory locking mechanism that we need to get used to, when we enable Hugepages.
The thing that made me write this article was a question that one of my colleagues asked last week.
My colleague realized that, after rebooting his database server, the memory directly becomes "used". Even before starting the database, he could see that the memory was in use when he executed the "free" command.
He asked me this question on the phone and I directly answered that "it is because of huge pages".
However, I wanted to make a demo and see this statement in real life.

Well let's revisit my earlier blog post and recall the general information about the Huge pages:
(I strongly recommend reading this blog post as well -> http://ermanarslan.blogspot.com.tr/2013/12/oracle-linux-memory-optimization.html)


When we use hugepages, we have smaller page tables in terms of size, because there are fewer pages to handle, as Hugepages are 2MB (or more, it depends on the system) in size. In addition, hugepages are never swapped out; they are locked in memory. The kernel also does less bookkeeping work for virtual memory because of the larger page sizes. Note that Hugepages are not compatible with the Automatic Memory Management (AMM) that Oracle does, if it is configured to do so.


Let's start our demo. (Note that, my demo env is an Oracle Linux 6.5 x86_64 and the kernel is an UEK 3.8.13-16.2.1.el6uek.x86_64)

HUGEPAGES OCCUPY MEMORY ONCE THEY ARE CONFIGURED (even though they are not used by any applications)

Firstly, I will show you the effect of hugepages. You will see that hugepages are never swapped out, and that once they are configured, they occupy memory even though they are not used at all.

Initially, our hugepages are not configured as seen below;

[root@jiratemp ~]# cat /proc/meminfo |grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB

Next, we sync and drop the filesystem caches to have a clean environment in terms of memory. (We do this because we will use the free command to see the effect of our actions.)

[root@jiratemp ~]# sync;echo 3 > /proc/sys/vm/drop_caches ; free -m

[root@jiratemp ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7985        609       7375          0          4         47
-/+ buffers/cache:        557       7427
Swap:         8015          0       8015

Afterwards, we configure 2048 hugepages directly using the proc fs and check the memory usage with the free command;
[root@jiratemp ~]# echo 2048 > /proc/sys/vm/nr_hugepages    (Hugepages are 2MB)
[root@jiratemp ~]#  free -m
                    total       used       free     shared    buffers     cached
Mem:          7985       4709       3275          0          4         49
-/+ buffers/cache:       4655       3330
Swap:         8015          0       8015

A quick explanation for the free command output:

Mem: total = total physical memory
Mem: used = MemTotal - MemFree
Mem: free = free memory
Mem: shared = meaningless nowadays, can be ignored
Mem: buffers = Buffers
Mem: cached = Cached memory
-/+ buffers/cache: used = MemTotal - (MemFree + Buffers + Cached)
-/+ buffers/cache: free = MemFree + Buffers + Cached
Swap: total = total swap
Swap: used = SwapTotal - SwapFree
Swap: free = free swap



You see 4709 MB are used. One page is 4 KB and one hugepage is 2 MB, so 2048 Hugepages make 4096 MB.
The free command reports MB values when used with the "-m" argument. See that the used value is 4709 (609 MB was already used before we configured the hugepages). 4709 - 4096 = 613, which is almost equal to 609. So these used megabytes are caused by the hugepages.
I remind you, we didn't use those hugepages, but once configured, they occupy memory as you see.

Well, it is certain that huge pages are reserved inside the kernel.

HUGEPAGES ARE NOT SWAPPED OUT EVEN UNDER PRESSURE (even when they are not used by any applications)

Hugepages cannot be swapped out. It is real. In order to test this, I wrote a python program. This program takes only one input: the amount of memory (in MB) that we want it to allocate.

So, we use this program to create memory pressure and to see whether the hugepages are given up or swapped out when there is memory pressure.

First, we configure 2048 hugepages;

[root@jiratemp ~]# echo 2048 > /proc/sys/vm/nr_hugepages    (Hugepages are 2MB)

[root@jiratemp ~]#  free -m
                    total       used       free     shared    buffers     cached
Mem:          7985       4709       3275          0          4         49
-/+ buffers/cache:       4655       3330
Swap:         8015          0       8015

As you see above, there are only 3275 MB free, so almost all of the used memory is occupied by the Hugepages.
Now, we execute our python program and try to allocate 4500 MB of memory.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                           
 3740 root      20   0 4615m 3.4g  424 S  0.0 44.2   1:09.38 python ./erm 4500 

While our program is running, we take the free command output every second to see the system-wide memory usage;

[root@jiratemp ~]# free -m -s 1
                     total       used       free     shared    buffers     cached
Mem:          7985       4701       3284          0          0         43
-/+ buffers/cache:       4656       3328
Swap:         8015          0       8015

                     total       used       free     shared    buffers     cached
Mem:          7985       4701       3284          0          0         43
-/+ buffers/cache:       4656       3328
Swap:         8015          0       8015

                    total       used       free     shared    buffers     cached
Mem:          7985       4703       3281          0          0         45
-/+ buffers/cache:       4657       3327
Swap:         8015          0       8015

                    total       used       free     shared    buffers     cached
Mem:          7985       5750       2234          0          0         46
-/+ buffers/cache:       5704       2281
Swap:         8015          0       8015

                    total       used       free     shared    buffers     cached
Mem:          7985       7928         56          0          0         46
-/+ buffers/cache:       7881        103
Swap:         8015          0       8015

....
.............
........................

You see, as our program allocates more memory every second, the free memory gets closer to 0 (zero).

Moreover, because of this pressure, our server starts to hang, and when we check the situation using the top command (with our limited cpu cycles), we see that kswapd is running aggressively..

*COMMAND HANGS..
*THE SWAP DAEMON IS RUNNING AND SWAP USAGE INCREASES EVERY SECOND!!

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND  
 60 root      20   0     0    0    0 R 11.2  0.0   0:16.84 kswapd0  

[root@jiratemp 3740]# cat status|grep Swap
VmSwap:  1018512 kB

Moreover, when we check our process, we see that its resident memory is 3.4 GB, as seen below;

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                           
 3740 root      20   0 4615m 3.4g  424 S  0.0 44.2   1:09.38 python ./erm 4500 

However, the virtual memory of our process is 4615m, as you see above.

(VIRTUAL MEMORY = 4615 MB but RES = 3.4 GB)

So, this is a little interesting, right? Because we requested 4500 MB of memory, but our resident memory is only 3.4 GB.

The situation is the same when we run a C program and try to allocate 5500 megabytes.

The program just slows down when it reaches about 3 GB of memory, and swap activity is triggered.

Our application gets stuck at this point, but if we wait for the swap daemon to swap memory out, we can see that our program can actually allocate the 5500 MB. Look, the program says it is allocating its 5549th MB;

Currently allocating 5535 MB
Currently allocating 5536 MB
Currently allocating 5537 MB
Currently allocating 5538 MB
Currently allocating 5539 MB
Currently allocating 5540 MB
Currently allocating 5541 MB
Currently allocating 5542 MB
Currently allocating 5543 MB
Currently allocating 5544 MB
Currently allocating 5545 MB
Currently allocating 5546 MB
Currently allocating 5547 MB
Currently allocating 5548 MB
Currently allocating 5549 MB

But when we look at the top output, we see that RES is only 3.6 GB while VIRT has increased. So swap is in play.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND   
 4191 root      20   0 5418m 3.6g  152 R 64.0 45.7   1:10.96 ./a.out  

You see, in the top output above, RES is only 3.6 GB while VIRT is about 5500 MB. So swap is in play; our pages are being swapped out! (Remember, VIRT = the total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out.)

MAN TOP ->
VIRT  --  Virtual Image (kb)
          The  total  amount  of  virtual  memory  used by the task.  It includes all code, data and shared libraries plus pages that have been swapped out. (Note: you can
          define the STATSIZE=1 environment variable and the VIRT will be calculated from the /proc/#/state VmSize field.)

Well, if we disable the hugepages, we can allocate "resident" memory using the same program... Here is an example top output for the same program (hugepages disabled):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 4333 root      20   0 6623m 6.5g  376 S 67.2 82.9   0:03.33 a.out        

You see RES = 6.5g. So it allocates resident memory. (a.out is a C program which continuously and endlessly allocates and uses memory.)

So, this proves that hugepages are not swapped out. They are not swapped out even in the case of a memory shortage. Also, when we have a memory shortage and almost all the memory is allocated by Hugepages, we can see that the pages recently allocated by our own program are swapped out to make room for the program to allocate more memory :).

Another interesting thing is that, if there is not enough free memory, Hugepages cannot be configured properly.
I mean, we can allocate regular pages from a self-written program and test this.
When we do such a test, we see that the hugepages are not allocated, although we issue the commands;

Well, we allocate all the memory using a self-written application and then configure 2048 hugepages.
The interesting thing is that our command doesn't encounter any errors, but the hugepages are not allocated at all.

[root@jiratemp ~]# echo 2048 > /proc/sys/vm/nr_hugepages
[root@jiratemp ~]# echo $?
0
[root@jiratemp ~]# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152        1        1        1        *
[root@jiratemp ~]# grep Huge /proc/meminfo
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

As you see, setting hugepages is a task that should be done carefully. As they are not swapped out, the system may hang in case of a memory shortage, and the risk of a memory shortage actually increases when we use hugepages, or let's say, when we configure (even unused) hugepages.

Well, there is an alternative way of configuring hugepages. Using the overcommit configuration, we can at least decrease the memory allocation of our hugepages when they are not used by any process.

OVERCOMMIT SETTING FOR HUGEPAGES:

Let's introduce the overcommit setting for hugepages first;

/proc/sys/vm/nr_overcommit_hugepages specifies how large the pool of
huge pages can grow, if more huge pages than /proc/sys/vm/nr_hugepages are
requested by applications. Writing any non-zero value into this file
indicates that the hugetlb subsystem is allowed to try to obtain that
number of "surplus" huge pages from the kernel's normal page pool, when the
persistent huge page pool is exhausted. As these surplus huge pages become
unused, they are freed back to the kernel's normal page pool.

So, if we set the hugepage count to a lower value and set the overcommit hugepages to a large value (large enough to meet our peak hugepage requests), then we can have dynamic hugepage allocation in our environments.

Let's make a demo and see how it is done and how it behaves;

We set 100 hugepages and we set 1000 overcommit hugepages

[root@jiratemp ~]# echo 100 > /proc/sys/vm/nr_hugepages
[root@jiratemp ~]# echo 1000 > /proc/sys/vm/nr_overcommit_hugepages

We check the /proc/meminfo and hugepage pool list and see only 100 hugepages are allocated (as no processes use any hugepages at the moment)

[root@jiratemp ~]# grep Huge /proc/meminfo 
HugePages_Total:     100
HugePages_Free:      100
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

[root@jiratemp ~]# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152      100      100     1100        *
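Note that the echo commands above do not survive a reboot. To make these settings persistent, the equivalent sysctl entries can be added; a minimal sketch with the values used in this demo;

cat >> /etc/sysctl.conf <<EOF
vm.nr_hugepages = 100
vm.nr_overcommit_hugepages = 1000
EOF
sysctl -p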

We sync and clear the caches to have a fresh start in terms of memory, and then allocate shared memory from the huge pages (just like an Oracle Database does :)

[root@jiratemp ~]# sync;echo 3 > /proc/sys/vm/drop_caches ; free -m
             total       used       free     shared    buffers     cached
Mem:          7985        736       7249          0          0         21
-/+ buffers/cache:        713       7271
Swap:         8015        125       7890

Note: for allocating shared memory from the hugepages, I use the following C program:

#include <stdio.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    int segment_id_1;
    char *shared_memory_1;
    const int shared_segment_size = 0x40000000;   /* 1 GB */

    /* Allocate a 1 GB shared memory segment backed by hugepages (SHM_HUGETLB) */
    segment_id_1 = shmget(IPC_PRIVATE, shared_segment_size, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
    /* Attach it and touch it, so the segment is actually used */
    shared_memory_1 = (char *) shmat(segment_id_1, 0, 0);
    sprintf(shared_memory_1, "ERMAN");
    /* Keep the segment attached for a while, so /proc/meminfo can be observed */
    sleep(100);
    return 0;
}
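To compile and run it (shm_huge.c is a hypothetical file name for the code above, and gcc is assumed to be installed);

gcc -o shm_huge shm_huge.c
./shm_huge &                 # allocates a 1 GB SHM_HUGETLB segment and sleeps for 100 seconds
grep Huge /proc/meminfo      # watch HugePages_Total / _Free / _Surp change while it runs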

0x40000000 means 1 GB, which corresponds to 512 Hugepages on Linux (with 2 MB hugepages).
So, we tell our program to allocate 1 GB (512 Hugepages) of shared memory from the hugepages.

Remember, our hugepage count was 100, so there were 100 hugepages in our hugepage pool as shown earlier. On the other hand, we set 1000 overcommit hugepages.

Well, when we execute this program, we see 512 pages are allocated. So our pool has enlarged :)

[root@jiratemp ~]# grep Huge /proc/meminfo 
HugePages_Total:     512
HugePages_Free:      511
HugePages_Rsvd:      511
HugePages_Surp:      412
Hugepagesize:       2048 kB

[root@jiratemp ~]#  hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152      100      512     1100        *

[root@jiratemp ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7985       1567       6417          0          3         25
-/+ buffers/cache:       1538       6446
Swap:         8015        125       7890

Now, our free memory has decreased by about 1024 MB.
So overcommit works perfectly. We had 100 hugepages at first, so our hugepages were occupying only 200 MB initially. However, when we needed more, we could allocate them (thanks to overcommit).
We got ourselves an environment which can do dynamic hugepage allocation.

WHAT ABOUT ORACLE DATABASE? CAN IT USE OVERCOMMIT HUGEPAGES?

Let's try with the Oracle database;

First of all, our limits.conf should be configured properly to use hugepages. In other words, the oracle OS user must be able to lock memory when the instance is instructed to use hugepages (especially Hugepages only!).

This can be done in 2 ways.

1) By setting a capability on the oracle binary using the root account
cd $ORACLE_HOME/bin
setcap cap_ipc_lock=+ep oracle

2) By adding the following (change the values according to your needs) to the limits.conf
oracle soft    memlock        unlimited
oracle hard    memlock        unlimited

If we don't do one of these configurations; we end up with the following ORA-27137 ;

[oracle@jiratemp ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 12.1.0.1.0 Production on Thu Nov 10 11:08:01 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup nomount;
ORA-27137: unable to allocate large pages to create a shared memory segment
Linux-x86_64 Error: 1: Operation not permitted
Additional information: 14680064
Additional information: 1

Well, suppose we configured our memory lock parameters or set the capability on the oracle binary,
and configured our memory related database parameters as follows;
--these parameters control the initial memory allocation of the Oracle Database when it is started (the sga_target parameter is actually for this)

sga_max_size = 1000M
sga_target = 500M
use_large_pages = ONLY --> this instructs oracle to use only hugepages.
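For reference, a minimal sketch of setting these parameters (assuming the instance uses an spfile; use_large_pages and sga_max_size are static parameters, so a restart is needed for them to take effect);

sqlplus -s "/ as sysdba" <<EOF
alter system set sga_max_size=1000M scope=spfile;
alter system set sga_target=500M scope=spfile;
alter system set use_large_pages=ONLY scope=spfile;
exit
EOF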

We set hugepage overcommit to 1000 and hugepage count to 100 ;

[root@jiratemp ~]# sync;echo 3 > /proc/sys/vm/drop_caches ; free -m

total used free shared buffers cached
Mem: 7985 205 7779 0 0 16
-/+ buffers/cache: 189 7796
Swap: 8015 30 7985

[root@jiratemp ~]# grep Huge /proc/meminfo

HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB

[root@jiratemp ~]# hugeadm --pool-list
Size Minimum Current Maximum Default
2097152 0 0 0 *

[root@jiratemp ~]# echo 100 > /proc/sys/vm/nr_hugepages
[root@jiratemp ~]# echo 1000 > /proc/sys/vm/nr_overcommit_hugepages

[root@jiratemp ~]# hugeadm --pool-list
Size Minimum Current Maximum Default
2097152 100 100 1100 *

So, we start up our Oracle database (note that starting the database in nomount mode is enough for this test) as follows;

oracle@jiratemp ~]$ sqlplus "/as sysdba"

SQL*Plus: Release 12.1.0.1.0 Production on Thu Nov 10 11:23:41 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved
Connected to an idle instance.
SQL> startup nomount;
ORACLE instance started.
Total System Global Area 1043886080 bytes
Fixed Size                  2296280 bytes
Variable Size             876611112 bytes
Database Buffers          159383552 bytes
Redo Buffers                5595136 bytes

We check our parameters are set.

SQL> show parameter sga_target   

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer     500M

SQL> show parameter sga_max_size  
sga_max_size                     big integer   1000M


Once our database is started, we directly check the huge page usage;

Remember, our overcommit hugepage count (nr_overcommit_hugepages) was set to 1000 (which corresponds to 2000 MB) and our hugepage count (nr_hugepages) was set to 100 (200 MB).

[root@jiratemp ~]#  grep Huge /proc/meminfo 
HugePages_Total:     501
HugePages_Free:      252
HugePages_Rsvd:      252
HugePages_Surp:      401
Hugepagesize:       2048 kB

You see? The total hugepages count is now 501 (it was 100 earlier), and the Hugepages surplus is 401. So Linux let Oracle allocate more hugepages than configured, respecting the overcommit configuration. In other words, Linux let Oracle allocate (reserve + allocate) around 500 Hugepages by enlarging the Hugepage pool automatically and dynamically.

As we set the sga_target parameter to 500M, Oracle actually backed about 250 Hugepages with memory (see that HugePages_Free is 252), and it also reserved about 250 more hugepages because sga_max_size was set to 1000M.

Now, we set the sga_target parameter to 1000M and see whether Linux will let Oracle use all of the hugepages it reserved.

SQL> alter system set sga_target=1000M scope=memory;

System altered.
[oracle@jiratemp ~]$ grep Huge /proc/meminfo 
HugePages_Total:     501
HugePages_Free:        2
HugePages_Rsvd:        2
HugePages_Surp:      401
Hugepagesize:       2048 kB

Yes. Oracle now uses almost 500 Hugepages to build its (roughly) 1000 MB SGA on top of these hugepages.

When we shut down our Oracle database, we see the hugepage pool dynamically shrink, and the space occupied by Oracle's Hugepages is freed as expected.

SQL> shu immediate;
ORA-01507: database not mounted
ORACLE instance shut down.

[root@jiratemp ~]#  grep Huge /proc/meminfo 
HugePages_Total:     100
HugePages_Free:      100
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

HOW DOES ORACLE DO IT?

Well, we see that Oracle can work with the hugepage overcommit setting (nr_overcommit_hugepages).
What Oracle actually does is honour sga_max_size at instance startup and commit to (reserve) the hugepages necessary to satisfy sga_max_size. However, it only actually uses the hugepages necessary to satisfy sga_target.
Although the configured number of hugepages is set to a lower value, Oracle can allocate the amount of hugepages needed to satisfy its sga_max_size, because a hugepage overcommit configuration is in place. Furthermore, when the Oracle database is shut down, the surplus hugepages are given back to the OS.

CONCLUSION:

So, the following are the conclusions;

  • Hugepages are not swapped out under any circumstances.
  • Hugepages are not given to any process that wants to allocate regular pages.
  • Hugepages occupy memory once they are configured, even though they are not used.
  • nr_overcommit_hugepages is a good thing in the relevant cases.
  • nr_overcommit_hugepages lets a process allocate more hugepages than configured.
  • When a process shuts down gracefully or releases its memory, the hugepages used by that process are given back to the system, and if that process was using surplus (overcommit) hugepages, those surplus pages are freed back to the kernel's normal page pool.
  • Oracle will work with the nr_overcommit_hugepages parameter.
  • Oracle will allocate (not use) shared memory using Huge Pages based on sga_max_size. (We can think of it as Oracle allocating shared memory with shmget at instance startup; in other words, the segment size argument given to shmget equals the value defined by sga_max_size.) So, when Oracle allocates shared memory at startup, it actually reserves the hugepages and the reserved hugepage count increases. On the other hand, Oracle will use a number of hugepages based on sga_target, and that's why HugePages_Free (seen in /proc/meminfo) decreases accordingly.
  • If we use hugepages (only), we should set sga_max_size equal to sga_target (in order not to waste our memory).

Well, after knowing all of this, I want to give an example where overcommit hugepages can be used to address a memory wastage problem.

QUESTION:

Suppose; 
  • We have 3 databases in a single server environment. Let's say these are TEST , DEV and UAT.
  • We sometimes work only in TEST, sometimes only in DEV, sometimes in 2 of these databases, and sometimes in all 3 of them.
  • We also have processes/sessions which do lots of PGA work.
  • We have 100 Gigabytes of RAM and we want to reserve 10 GB of it for the OS.
  • We want to have 20GB sized SGA for each of these databases.
  • We want to use sga_max_size/sga_target (ASMM -- not AMM) and use Hugepages as well.

Now, consider which one of the following settings is a better idea?

1)

 echo 10240 > /proc/sys/vm/nr_hugepages
 echo 30720 > /proc/sys/vm/nr_overcommit_hugepages

2)
 echo 30720 > /proc/sys/vm/nr_hugepages
 echo 30720 > /proc/sys/vm/nr_overcommit_hugepages
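For reference, here is where those page counts come from (2 MB hugepages, 20 GB of SGA per database, 3 databases);

echo $(( 20 * 1024 / 2 ))      # 10240 hugepages = one 20 GB SGA
echo $(( 3 * 20 * 1024 / 2 ))  # 30720 hugepages = all three 20 GB SGAs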

MY ANSWER:

*My answer is 1).
The reason is explained earlier. :)
What is yours? please feel free to comment...

Friday, November 11, 2016

EBS -- DST V28 for EBS "Apps Tier" released.

Until yesterday, DST V28 patches were not available for the EBS Apps Tier components.
For those who are not familiar with the latest DST changes, DST V28 includes the following changes;
  • Turkey switched from EET/EEST (+02/+03) to permanent +03, effective 2016-09-07
  • Renamed Asia/Rangoon to Asia/Yangon
Today, I'm here to announce that DST V28 patches for the EBS apps tier have been released. Both the Solaris and Linux DST patches have been uploaded and released.
Check out the MOS document for the patches: Complying with Daylight Saving Time (DST) and Time Zone Rule Changes in E-Business Suite 12 (Doc ID 563019.1)

That document actually contains the instructions from the DST document prepared for the RDBMS (2185562.1), but it also includes instructions specific to EBS, under the section named Application Server Tier Patches.

As I wrote in one of my previous blog posts; 

The interesting and disappointing thing was that the patches documented in those notes were not present for the EBS application tier components until yesterday. This was true for EBS 12.2 as well.
The patches were not available for EBS because the application server components used in EBS are actually old versions.
For example, the version to patch inside the FMW home was 11.1.0.7. The other Oracle Homes were old as well.
It didn't matter if we used EBS 12.2, 12.1 or 12.0... The patches were not there and we needed to log Service Requests in order to obtain them.
This was quite disappointing because it was even true for the latest EBS release, which is 12.2.

However, as of today, the patches have been released for the EBS Apps tier components. (I mean the patches prepared specifically for the EBS Apps Tier component versions.)


Note that the patch seen above is a prerequisite patch; the actual patch is Patch 24701840: RDBMS - DSTV28 UPDATE OCT 2016 - TZDATA2016G and it is available for Oracle Solaris and Linux x86.

For 12.2 -> Oracle E-Business Suite 12.2.x users must patch the Forms 10.1.2 and FMW Oracle Homes with the appropriate required support files (RSFs) to uptake the latest timezone information. The 10.1.2 Forms home should be patched with the 10.1.0.5 version of patch 24701840, and FMW must be patched with the 11.1.0.7 version of 24701840. (The 11.1.0.7 version of the patch can be applied on top of FMW 11.1.1.9 or 11.1.1.7.)

For 12.1 -> Oracle E-Business Suite 12.1.x and 12.0.x users should patch the 10.1.2 and 10.1.3 Oracle Homes. That is, they must patch the 10.1.2 and 10.1.3 Oracle Homes with the 10.1.0.5 (32-bit) version of the DST upgrade patch.

Thursday, November 3, 2016

ODA X6-2M, the First POC and the First look

I did my first POC with one of the newest products of the ODA family, the ODA X6-2 M (Medium).
I must say that the ODA X6 is not only an easy-to-deploy machine; it also gives us the opportunity to deploy our databases and Oracle Homes very easily.
Last month, I wrote an article about this new 1U ODA machine and I was already excited. However, working on it hands-on was a different pleasure...


Let's revisit the general specs of ODA X6-2 M;

The ODA X6-2 M is a 1U machine.
It has one server, serving both the compute and storage services.
The ODA X6-2 M has 6.4 TB of built-in, high speed NVMe storage.
It supports Oracle Database SE, SE One, SE2 or EE. (Earlier ODA environments, ODA X5 and so on, could only support Oracle Database EE, unless they were deployed with the virtualized environment option.)
The ODA X6-2 M cannot be expanded horizontally and it cannot be virtualized at the moment.
The ODA X6-2 M machine specifications are as follows;

CPU: 2x10 core (2.2 GHZ Intel Xeon E5-2630 V4)
Memory:  256 GB ( 8x32GB)
Storage: 6.4 TB(2x3.2TB) NVMe
Boot Disk : 2x480GB SSD (mirrored)
Ethernet : 4x10GBase-T
Fiber : 2x10GbE SFP+

The results of the POC are not delivered yet, but still; I will write what I have done and seen on this POC work.

First of all, the deployment of the new ODA was pretty simple. It already comes configured, but if we want to reimage the OS, we just do it using the ILOM.
The deployment of the Grid and Oracle Home infrastructure is done through the command line and it is very simple as well.

All the necessary information is available in Database Appliance X6-2S and X6-2M Deployment and User’s Guide (https://docs.oracle.com/cd/E75549_01/doc.121/e76082/toc.htm)

The databases can be created using the Oracle Database Appliance Manager Web Console, which is available at the following url: https://<ODA_IP>:7093/mgmt/index.html. (This manager interface becomes up & running just after the deployment.)
--note that the Web Console is available for ODA X6-2S, M and L; ODA X6-2 HA doesn't have a web console.

Anything that can be done using the web console, and maybe more, can also be done using the command line interface of the ODA, named "odacli" (which replaces the older oakcli on these models).

odacli can be used by cd'ing into the /opt/oracle/dcs/bin directory as the root user and executing the ./odacli command.

The databases deployed via the cli or the web console become immediately up & running. The databases are also registered in the cluster registry by default (we can directly administer them using srvctl commands).

The default listener is started from the Grid home, as has been the trend for the last few years.

One of the new things with this new ODA is that we can now deploy Standard Edition databases into the ODA environment as well.

The new ODA machine supports 11.2.0.4 and 12c databases and it is quite simple to deploy them, as said earlier.

The resources used by the databases (like the CPU count) can be configured during database creation. So the new ODA gives us the opportunity to configure cpu resource utilization even while creating a database, using a web interface.
ODA X6-2S and M do not provide a virtualization option at the moment.
ODA X6 also offers capacity on demand; however, this is only true for Enterprise Edition. Capacity on demand cannot be used with Standard Edition.

Example: Enabling 12 cores of an ODA X6-2 HA:

cd /opt/oracle/dcs/bin
[root@oak bin]# ./odacli describe-cpucore
Node  Cores  Modified                       Job Status     
----- ------ ------------------------------ ---------------
0     20     February 20, 2017 12:10:18 PM EET Configured   

[root@odademo bin]# cat /proc/cpuinfo|grep processor| wc -l
40

[root@oak bin]# ./odacli update-cpucore --cores 12
{
  "jobId" : "b53d8ab8-c59a-48bf-805f-fe1c477d6e6a",
  "status" : "Created",
  "message" : null,
  "reports" : [ ],
  "createTimestamp" : "February 20, 2017 14:57:44 PM EET",
  "description" : "CPU cores  service update",
  "updatedTime" : "February 20, 2017 14:57:44 PM EET"
}

[root@oak bin]# ./odacli list-jobs

ID             Description                                 Created                             Status    
---------------------------------------- --------------------------------------------------------------------------- 
...
4d8bb3d     Provisioning service creation   February 20, 2017 2:01:05 PM EET    Success   
b53d8ab8-c59a     CPU cores  service update  February 20, 2017 2:57:44 PM EET    Success   

[root@oak bin]# ./odacli describe-cpucore

Node  Cores  Modified                       Job Status     
----- ------ ------------------------------ ---------------
0     12     February 20, 2017 2:57:45 PM EET Configured   

[root@odademo ~]# cat /proc/cpuinfo|grep processor| wc -l

24

The filesystems that we can use on this new ODA machine are ACFS and ASM.

When ACFS is used, the filesystems come with an autoextend configuration, so it is pretty simple to administer.

Let's talk about my POC...

The customer was a big healthcare group and they were thinking of positioning this new 1U ODA machine in their new datacenter, located in one of their new branches.
(The 1U ODA X6-2 machine also has an advantage in terms of price.)

The customer wanted to have an engineered, easy-to-deploy and easy-to-manage solution, and they had a concern about their licenses (they had Standard Edition).

At this point, new ODA came in to play.

I did this POC using an ODA X6-2M.

I created a new 11.2.0.4 database using the Oracle Database Appliance web console, just to show the customer how easy it is to deploy Oracle databases on this machine.

I also upgraded one of their TEST databases (Standard Edition, sized 200 GB) from 11.2.0.3 to 11.2.0.4 and then migrated it to the ODA using the "rman duplicate" method, in order to make one of their own databases available on this machine and have a better POC.

Here is a more detailed look at the POC work;

  • Upgraded the source database in place.
  • Created an online RMAN backup, plus the archivelogs.
  • Exported the backup location using NFS.
  • Mounted the backup directory on the ODA using NFS.
  • Duplicated the database from the backup using RMAN on the ODA (see the sketch below).
  • Configured the initialization parameters.
  • Used the Oracle Home that was deployed with the ODA web console. As this Oracle Home was deployed with the latest PSU already applied, I executed the SQL layer of the PSU patch from this Oracle Home and installed the PSU into the newly migrated database. (I executed only the database tier of the PSU patch; the Oracle Home binaries were already patched, as the Oracle Home was deployed with the PSU by default.)
Done!
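Here is a heavily simplified sketch of the duplicate step mentioned above (TESTDB and /mnt/nfsbackup are example names; the auxiliary instance is assumed to be started in NOMOUNT with a minimal pfile before running this);

export ORACLE_SID=TESTDB
rman auxiliary / <<EOF
run {
  duplicate database to TESTDB
    backup location '/mnt/nfsbackup'
    nofilenamecheck;
}
EOF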

At the moment, the customer is running their application and performance tests on this newly migrated database, which is running on the ODA X6-2M.

I will revisit this blog post once the results of this POC are delivered to me.

Wednesday, November 2, 2016

Exadata -- Cell NTP problem, real life diagnostics

I faced an NTP (time synchronization) problem at one of my Exadata customers.
The problem was observed on the cells; in other words, the compute nodes had no trouble syncing time from the NTP server.

Before going forward, I want to give an overview of the Exadata network;

I will just give the difference between Compute Nodes and cell nodes in terms of network access paths.

The NTP related network path for the Exadata storage cells goes through the Ethernet switch, which comes built in with the Exadata rack. (This path is the only way of reaching the NTP server from a cell.)
The NTP related network path for the compute nodes, however, may go through either the Ethernet switch or the bonded client access interface, which is directly connected to the client's network.

So, let's say your client access network and your NTP server's network are 192.168.0.*; if your bonded client interfaces have IP addresses from the same network (192.168.0.*), then your compute node reaches the NTP server directly through the client network interfaces. (I have seen this; this is true.)

On the other hand, when your cell management interfaces (which provide the only network path for accessing the NTP server from the cell nodes) are on 10.10.10.* and your NTP server's network is 192.168.0.*, the connection is not direct anymore.

Here is a demo showing this difference;

Compute node with ip : 192.168.0.82 reaches the ntp server directly.
[root@osrvdb01 ~]# traceroute 192.168.0.14
traceroute to 192.168.0.14 (192.168.0.14), 30 hops max, 40 byte packets
 1  ntpserver (192.168.0.14)  0.086 ms  0.110 ms  0.103 ms

Cell node with IP 10.10.10.104 reaches the ntp server indirectly, through the gateway.
[root@osrvcel03 ~]# traceroute 192.168.0.14
traceroute to 192.168.0.14 (192.168.0.14), 30 hops max, 40 byte packets
 1  10.10.10.1 (10.10.10.1)  0.101 ms  0.124 ms  0.118 ms
 2  ntpserver (192.168.0.14)  0.188 ms  0.186 ms  0.181 ms

After seeing this difference and explaining the general situation briefly, let's take a look at the actual scenario that I faced at one of my clients.
I'm giving this scenario because the diagnostic work performed to determine the cause was important. That is, the network team did not accept that there was an underlying network/routing problem until I delivered the necessary diagnostics to them.

Scenario:
  • The cell nodes were not syncing time from NTP. This could be seen with the date command;

  • cellwall (the cell firewall) was not the cause, because the rules for reaching the NTP server were there in the cellwall rule list;
[root@osrvcel01 ~]# /etc/init.d/cellwall state|grep 192.168.0.14
    0     0 ACCEPT     udp  --  bondib0 *       192.168.0.14         0.0.0.0/0           udp spt:123 
    0     0 ACCEPT     tcp  --  bondib0 *       192.168.0.14         0.0.0.0/0           tcp spt:53 
    0     0 ACCEPT     udp  --  bondib0 *       192.168.0.14         0.0.0.0/0           udp spt:53 
    0     0 ACCEPT     udp  --  eth0   *       192.168.0.14         0.0.0.0/0           udp spt:123 
    0     0 ACCEPT     tcp  --  eth0   *       192.168.0.14         0.0.0.0/0           tcp spt:53 
   14  1591 ACCEPT     udp  --  eth0   *       192.168.0.14         0.0.0.0/0           udp spt:53 
  • The ntpq -p output was showing that LOCAL(0) was the selected (preferred) time source, rather than 192.168.0.14. There was a "*" sign in front of LOCAL(0), and this meant that the real NTP server was not being used for time sync.
  • The "ntpdate -dv 192.168.0.14" command was giving the following output (note that this is just a small piece of the full output, but it shows the problem);
host found : 192.168.0.14
transmit(192.168.0.14)
receive(192.168.0.3)
receive: server not found
transmit(192.168.0.14)
receive(192.168.0.3)
receive: server not found
transmit(192.168.0.14)
receive(192.168.0.3)
receive: server not found
  • "tracert" command to 192.168.0.14 was hanging...
[root@osrvcel01 ~]# tracert 192.168.0.14
traceroute to 192.168.0.14 (192.168.0.14), 30 hops max, 40 byte packets
1 (10.10.10.1) 0.121 ms 0.113 ms *
2 * * *
3 * * *
4 * * *
5 * * *
6 * *
  • The most important clue was in the nslookup output:
[root@osrvcel01 ~]# nslookup 10.10.10.1
;; reply from unexpected source: 192.168.0.3#53, expected 192.168.0.14#53

You see, the reply was coming from a weird place. This "unexpected source" message was saying something like: there is some device in the middle which is performing an improper NAT or something similar.

After I delivered these diagnostic outputs to the network team, the issue was resolved.

That is, they accepted that this was a network problem in the company network, and that the cause was in the server which was doing the IP routing (they added a static route to that IP routing machine).

After they implemented the fix, I restarted the NTP daemon on the cells and saw that the NTP server became the preferred (selected) source, and the time on the cell nodes was synced from the NTP server.

[root@osrvcel01 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
LOCAL(0) .LOCL. 10 l 12 64 37 0.000 0.000 0.001
*192.168.0.14 109.74.206.120 3 u 13 64 37 0.222 7.665 7.230
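For reference, the restart/verify step on a cell was as simple as the following (the NTP service name on the storage cell OS is assumed to be the standard ntpd);

service ntpd restart
ntpq -p      # the external server should now be marked with "*" as the selected peer
date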

So again, although we are DBAs or Apps DBAs or Exadata admins, and although our network knowledge is limited, we can still do network diagnostics from the OS layer and we can still pinpoint a cause in the network. This diagnostic work is important because, at the end of the day, we are responsible for the Exadata machine and all its problems, which may even be caused by things residing outside our scope.