Wednesday, April 22, 2015

EBS R12/12.2 -- HR ORG CHART domain name problem / sessionEBS is null

Recently, I discovered a constraint while working on an HR ORG CHART project. At first look, Oracle Development identified it as a bug; the problem seemed to be caused by the Java code used by the application, and Oracle Support concluded it was a bug after our long cooperation through a Severity 1 SR.
After a while, the issue was identified as a constraint rather than a bug. At the end of the day, it was concluded to be a limitation in the browser itself. The problem was caused by using short domain names in the server environment; using short domain names that contain only one dot leads to this problem. (For example: .erman)

The possible workarounds for this limitation were given as follows:

1) Use a longer FQDN for the servers, something like host.erman.com (not host.erman)
2) Implement a shared entry point, via proxying, for EBS and the external server where the ADF application is hosted.

Okay, let's take a closer look at the issue. Let's walk through the related documents, review the diagnostic methods, see the effects of the limitation, review the related specs referenced on MSDN, and draw the conclusion accordingly.

By this time, I have done several HR ORG CHART implementations and recorded them in this blog.

You can read my blog posts about HR ORG CHART at the following links:
http://ermanarslan.blogspot.com.tr/2014/05/ebs-122-implementing-hr-organization.html
http://ermanarslan.blogspot.com.tr/2014/07/oracle-adf-browser-certification.html
http://ermanarslan.blogspot.com.tr/2015/02/ebs-r12-hr-organization-chart-redirects.html

We have seen how to install it, how to upgrade it, how to deploy it and how to diagnose it.

We have also learnt some constraints about it, such as the requirement to have the same domain name for all the servers in the configuration. (HR Org Chart wants us to have the same domain name for all of the servers involved in the process of reaching HR Org Chart from EBS.)

While we were thinking that everything was stable, we hit this limitation when we changed the domain names of both the EBS and HR ORG CHART WebLogic servers.

The problem was triggered when the domain names had only one dot (.).

I mean, if the names of the servers are like ermanEBShost.domain.com and ermanHRORGhost.domain.com, then HR ORG Chart works well.

On the other hand, if the names of the servers are ermanhostEBS.domain and ermanHRORGCHARThost.domain, HR ORG Chart fails.
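A quick way to check which situation you are in is to look at the name each server reports for itself (the host names below are just illustrative):

hostname -f                  # should print a fully qualified name like ermanEBShost.domain.com, not ermanEBShost.domain
nslookup $(hostname -f)      # the FQDN should also resolve consistently from both the EBS and the HR ORG CHART servers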

In this condition, the WebLogic server log said that it could not see the EBS session: "sessionEBS is null".
It just failed to see the EBS session cookie and redirected us to the EBS login page again.

The problem seemed to be in the AppsRequestWrapper class, which is present in fndext.jar. So it was actually not directly an HR ORG CHART problem, but at the end of the day it affected HR ORG CHART directly.

To show this, we tried a lot of things: disabled the Load Balancers, gathered HTTP header traces and so on.

We even set FINE-level logging in the WebLogic Server, as we saw that the code writes to the logger only if FINE-level logging is configured in the WebLogic Server:
if(logger.isLoggable(Level.FINE)) 

To enable this kind of logging, we specified the logger level in a logging.properties file and supplied it to the ADFServer (HR ORG CHART's managed server) as a Java command-line argument; thus HR Org Chart started to write its logs into the ADFServer.out file.
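As a rough sketch, the setup looked like the following. The logger name and the file path below are assumptions for illustration (use the package that owns AppsRequestWrapper); the config file is then passed to the managed server JVM through the standard java.util.logging system property:

# logging.properties -- minimal sketch
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=FINE
# the logger name below is an assumption; point it at the package owning the fndext classes
oracle.apps.fnd.level=FINE

# JVM argument added to the ADFServer (managed server) startup:
-Djava.util.logging.config.file=/home/applmgr/logging.properties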

What was exactly happening was this: AppsRequestWrapper.getAppsSession() was able to set the cookies correctly when there were 2 or more dots in the domain names of the servers. On the other hand, it was failing to set the cookies when there was only one dot in the domain names.

So it clearly seemed to be a bug in AppsRequestWrapper.getAppsSession(), and Oracle Support started working on it as Bug 20812489 - APPSREQUESTWRAPPER.GETAPPSSESSION() IS RETURNING NULL.

Later, as I mentioned in the beginning, the issue was concluded to be a limitation in the browser.
During my research, I found the following on MSDN:

WinINET (the network stack below IE) has a cookie implementation based on the pre-RFC Netscape draft spec for cookies.
When we follow the reference to the Netscape draft spec for cookies (http://web.archive.org/web/20080205173011/wp.netscape.com/newsref/std/cookie_spec.html), we see the following:

domain=DOMAIN_NAME
When searching the cookie list for valid cookies, a comparison of the domain attributes of the cookie is made with the Internet domain name of the host from which the URL will be fetched. If there is a tail match, then the cookie will go through path matching to see if it should be sent. "Tail matching" means that domain attribute is matched against the tail of the fully qualified domain name of the host. A domain attribute of "acme.com" would match host names "anvil.acme.com" as well as "shipping.crate.acme.com".
Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that fails within one of the seven special top level domains listed below only require two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT".


So, the issue was caused by this rule: only hosts within the specified domain can set a cookie for a domain, and domains must have at least two (2) or three (3) periods in them to prevent domains of the form ".com", ".edu", and "va.us".
This was causing HR ORG Chart to be unable to read the EBS session cookie.
As a result, we changed the domain names of our EBS application servers, database servers, HR ORG CHART server and Load Balancer URLs to the form erman1212.domain.com.tr, and Oracle updated the HR ORG CHART FAQ document as follows:

Question 2: Can we configure organization chart if weblogic server and EBS instance using PQDN (Partially Qualified Domain Name) or short FQDN (Fully Qualified Domain Name)?

No, configuring organization chart requires both EBS instance and Weblogic Server to use fully qualified domain names only and not partially qualified domain name. Also it requires both EBS instance and Weblogic Server to be in same domain.

For Example : 
Host Name : orgchart
Partially Qualified Domain Name : .in
Fully Qualified Domain Name : .in.oracle.com

If server name (hostname.domainname) is "orgchart.in" then integration for HR organization chart is not possible as session cookies are not properly shared when using PQDN and browser is simply ignoring such cookie. Hence server names (hostname.domainname) should be using fully qualified domain names like "orgchart.in.oracle.com".

EBS R12 -- OID integrated EBS R12 -- ORA-20001: Unabled to call fnd_ldap_wrapper.update_user

If your EBS is integrated with OID, you may encounter the "ORA-20001: Unabled to call fnd_ldap_wrapper.update_user" error. This error can be seen when you are trying to update a user using the Define User form, for example when giving an end date to a user account.

The full error message will appear as something like the following;

Unabled to call fnd_ldap_wrapper.update_user due to the following reason:
ORA-20001: Unabled to call fnd_ldap_wrapper.update_user due to the following reason:
An unexpected error occurred. Please contact your system Administrator. (USER_NAME=XXXX). (USER_NAME=XXXX)

If that is the case, enable FND Debug logging for the user using the relevant profiles (at user level). After enabling the logging, log in to Oracle Apps again and reproduce the problem.
After reproducing the problem, query fnd_log_messages; you will see the underlying problem there.
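For example, a quick query like the following can be used to see the latest messages (the module filter is just an illustration; adjust it, or filter by timestamp, according to what you are chasing):

sqlplus apps/<apps_password> <<EOF
select log_sequence, module, message_text
  from fnd_log_messages
 where module like '%ldap%'
 order by log_sequence desc;
EOF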
While using FND Debug is the preferred way of dealing with this kind of problem, I can say that most of the time these errors are caused by EBS not being able to make OID connections.
I mean, if EBS cannot connect to OID, this error will be generated.

Here is a real-life example of it:
In this example, the problem was caused by the underlying error: LDAP error: ORA-31203: DBMS_LDAP: PL/SQL - Init Failed.
In this case, the cause of the ORA-31203 was a firewall problem.
I mean, as this is PL/SQL running in the EBS database, the corresponding database user must have the required access to connect to the OID host on its LDAP port.
The things that can prevent this connection are a missing ACL definition or a firewall.
Anyway, for this specific case the solution was in the firewall: we requested the System Admin to configure the firewall so that the EBS database could reach the OID server on its LDAP port.
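A simple way to test this kind of connectivity from the database node is shown below (the host name and port are placeholders; 3060 is a commonly used non-SSL OID port, yours may differ):

# run from the EBS database server
nc -zv oidhost.domain.com 3060
# or, if nc is not available:
telnet oidhost.domain.com 3060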

Another example of this problem can be a gap in the configuration.
I mean, we may get the same error (LDAP error: ORA-31203: DBMS_LDAP: PL/SQL - Init Failed), but the solution may be different.

Let's have a quick look at the process.
DBMS_LDAP tries to initialize an LDAP session using the OID host and OID port stored in the FND_USER_PREFERENCES table.
So, basically, if the host and port values are empty or wrong there, DBMS_LDAP will not be able to initialize the LDAP session and you will see Init Failed errors in the fnd_log_messages table.

If that's the case, the following SQL can be used to check the configuration. It should return the OID host and port.

select PREFERENCE_NAME, PREFERENCE_VALUE from FND_USER_PREFERENCES where user_name like '%INTERNAL%' and module_name = 'LDAP_SYNCH' and PREFERENCE_NAME in ('HOST','PORT');

If this SQL does not return the values, then I can say that there is a gap in the EBS-OID configuration.
In this case, running txkrun.pl will fix it; txkrun.pl will insert the HOST and PORT into the fnd_user_preferences table and the issue will be resolved. This was also a real-life example, by the way.

Friday, April 17, 2015

Rdbms -- Oracle Cloud Backup Service

Recently, I tested and verified Oracle Cloud backup for an EBS 12.2 environment.
In this work, we saw that Cloud backups are required to be encrypted and are heavily dependent on network speed.
Other than that, it uses RMAN and can use RMAN options like compression. Using compression makes the cloud backup faster, as it requires less data to flow between the machines.

In our tests, we used a 10 Mbit network and we could back up an EBS 12.2 database of 138 GB in 9 hours. It took a long time, although we used compression too.
Of course, the time required for backing up the database can be decreased using an incremental backup methodology: you can take a level 0 backup once a week, and continue with level 1 backups on the other days.

The list of the cloud backup features;
  • Unlimited Oracle Database backups
  • Automatic three-way data mirroring
  • Regional data isolation
  • Transparent access via Oracle Database Cloud Backup Module and Recovery Manager (RMAN)
  • RMAN encryption and compression
  • Note that: the trial account is limited to 10 GB.
To start the process, you first need to register for the Oracle Cloud Database Backup Service with your Oracle account. After your confirmation, you receive an email which contains a password (you use this password in the java command below).
Once you receive the password, you can follow the instructions below.
In general, the action plan consists of downloading opc_install.jar, creating the required directories, executing the java command to download the required files, preparing the RMAN scripts and scheduling/running those RMAN scripts.
Check this link for the details: http://docs.oracle.com/cloud/latest/dbbackup_gs/CSDBB/GUID-4E945356-F5B6-4267-8D33-ADB4C1D5413C.htm

Here is a real-life deployment example:

· Before the installation, use the queries below to estimate how long the backup will take.
· You can also check previous RMAN backups to see the compression ratio and the time taken for a backup.

Check the db size;


select round((a.data_size+b.temp_size+c.redo_size+d.cont_size )/1024/1024/1024,0) "Total Size GB"
from ( select sum(bytes) data_size
from dba_data_files ) a,
( select nvl(sum(bytes),0) temp_size
from dba_temp_files ) b,
( select sum(bytes) redo_size
from sys.v_$logfile lf, sys.v_$log l
where lf.group# = l.group#) c,
( select sum(block_size*file_size_blks) cont_size
from v$controlfile ) d;


or

SELECT ROUND (SUM (used.bytes) / 1024 / 1024 / 1024) || ' GB' "Database Size",
ROUND (free.p / 1024 / 1024 / 1024) || ' GB' "Free space"
FROM (SELECT bytes FROM v$datafile
UNION ALL
SELECT bytes FROM v$tempfile
UNION ALL
SELECT bytes FROM v$log) used,
(SELECT SUM (bytes) AS p FROM dba_free_space) free
GROUP BY free.p;


Check the previous RMAN backups and their input/output bytes, just to get an idea:

SELECT start_time,
end_time,
status,
input_type,
compression_ratio,
input_bytes_display,
output_bytes_display,
time_taken_display
FROM v$rman_backup_job_details
ORDER BY session_key DESC;



1. Create the directories below and upload the opc_install.jar file to the server

mkdir $ORACLE_HOME/OPC/lib
mkdir $ORACLE_HOME/OPC/wallet
mkdir $ORACLE_HOME/OPC/script >>> Optional

upload opc_install.jar to $ORACLE_HOME/OPC/

2. Check the permissions of the directories and files, and set the required permissions.
Note: To be on the safe side, I set chmod -R 777 on the lib, wallet and script folders (we used 777 here only because it was just a test instance).

> ll
drwxrwxrwx. 2 oracle oinstall    4096 Mar 30 18:47 lib
-rwxrwxrwx. 1 oracle oinstall 2576642 Sep  4  2014 opc_install.jar
drwxr-xr-x. 2 oracle oinstall    4096 Mar 30 18:49 script
drwxrwxrwx. 2 oracle oinstall    4096 Mar 30 18:47 wallet

3. Use the java command below to download the lib, wallet and ora files

Usage: java -jar opc_install.jar -serviceName hr -identityDomain abc -opcid joe@abc.com -opcPass 'Oracle$1' -libDir $ORACLE_HOME/lib -walletDir $ORACLE_HOME/dbs/opc_wallet

-- Supply the arguments according to your cloud account: serviceName, identityDomain, opcPass, etc. (You will receive these details after registration; they are provided in an email from Oracle.)

java -jar opc_install.jar -serviceName shdjısjdıs6757 -identityDomain yuıshfıshfuıos74 -opcId blabla@blala.com -opcPass password -libDir /u01/install/PROD/11.2.0/OPC/lib -walletDir /u01/install/PROD/11.2.0/OPC/wallet

Output of the java command:
Oracle Database Cloud Backup Module Install Tool, build 2014-09-04
Oracle Database Cloud Backup Module credentials are valid.
Oracle Database Cloud Backup Module wallet created in directory /u01/install/PROD/11.2.0/OPC/wallet.
Oracle Database Cloud Backup Module initialization file /u01/install/PROD/11.2.0/dbs/opcORATEST.ora created.

Downloading Oracle Database Cloud Backup Module Software Library from file opc_linux64.zip.
Downloaded 23169388 bytes in 121 seconds. Transfer rate was 191482 bytes/second.
Download complete.
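At this point it is worth verifying what the installer produced; in a typical run (file names may differ slightly between module versions) you should see something like the following:

ls -l /u01/install/PROD/11.2.0/OPC/lib          # the SBT library (libopc.so) downloaded by the installer
ls -l /u01/install/PROD/11.2.0/OPC/wallet       # the wallet holding your cloud credentials
cat /u01/install/PROD/11.2.0/dbs/opcORATEST.ora # OPC_* settings (host, wallet location) referenced by the RMAN channel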

4. Create your backup scripts and configure them to run as scheduled jobs using crontab.
Set the schedules according to the backup duration of your system.

An example:

crontab -l

00 19 * * 5 sh /u01/install/PROD/11.2.0/OPC/script/cloudbackup_l0
00 19 * * 0,1,2,3,4 sh /u01/install/PROD/11.2.0/OPC/script/cloudbackup_l1


Script_Level0 : cloudbackup_l0

su - oracle -c "/u01/install/PROD/11.2.0/OPC/script/rman_l0.sh" > /u01/install/PROD/11.2.0/OPC/script/cloudbackup_Level0_`date +%d%m%y`.log

Script_Level0 : rman_l0.sh
               
currentdate=`date '+%d%b%Y_%H%M'`
rman target/ <<EOF
configure channel device type sbt parms='SBT_LIBRARY=/u01/install/PROD/11.2.0/OPC/lib/libopc.so, ENV=(OPC_PFILE=/u01/install/PROD/11.2.0/dbs/opcORATEST.ora)';
configure device type sbt parallelism 4;
configure default device type to sbt;
SET encryption on identified by "dfsdfsdf" only;
CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
show all;
backup as compressed backupset incremental level 0 database plus archivelog delete input;
quit;
EOF


Script_Level1 : cloudbackup_l1

su - oracle -c "/u01/install/PROD/11.2.0/OPC/script/rman_l1.sh" > /u01/install/PROD/11.2.0/OPC/script/cloudbackup_Level1_`date +%d%m%y`.log

Script_Level1: rman_l1.sh

currentdate=`date '+%d%b%Y_%H%M'`
rman target/ <<EOF
configure channel device type sbt parms='SBT_LIBRARY=/u01/install/PROD/11.2.0/OPC/lib/libopc.so, ENV=(OPC_PFILE=/u01/install/PROD/11.2.0/dbs/opcORATEST.ora)';
configure device type sbt parallelism 4;
configure default device type to sbt;
SET encryption on identified by "cdfdfdfd" only;
CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
show all;
backup as compressed backupset incremental level 1 database plus archivelog delete input;
quit;
EOF
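Once the scheduled backups start running, it is a good idea to verify them from time to time. A minimal check could be something like the following (note that restore validate reads the backup pieces back over the network, so it takes a while on a slow link):

rman target/ <<EOF
list backup summary;
restore database validate;
EOF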

Wednesday, April 15, 2015

EBS 12.2 -- ADOP , Automated Patching

In this post, I will cover an automatic online patching scenario.


In this example, we will apply a patch without specifying the arguments and our inputs one by one. That is, we will run one command and our patch will be applied.
Note that this is not a supported or recommended approach. I have done this just to satisfy my curiosity, and now I can say that it can be done.
Okay, in order to get this to work, we will use bash techniques, a pipe and an adop input file.
Let's start with the input file.
Generally, the input file is read by adop, and adop behaves according to what is written in it.
To supply an input file to adop, we create it using our favorite editor and use the input_file command-line argument while running the adop utility.

So we basically supply our desired worker count, the patch we want to apply, the phases we want adop to complete and the patchtop in which our patch directory resides.
Here is an example of the input file :
The following is an input_file for applying patch 20863040, which resides in the /home/applmgr/20863040 directory.

workers=12
patches=20863040
phase=prepare,apply,finalize,cutover,cleanup
patchtop=/home/applmgr

Okay. Using input_file we supply some mandatory arguments, as well as the patch phases, to have a continuous adop execution. But what about the passwords? adop requires us to provide 3 passwords during its run. So let's see what we can do about this.

In earlier releases of EBS, a defaultsfile could be used to supply the passwords to adpatch, so we could apply patches without needing to supply the passwords at runtime.

On the other hand, in 12.2, even though the adop utility uses the same adpatch in the backend, it seems we don't have such an option, as stated in the "Oracle E-Business Suite Maintenance Guide": "Only one parameter, patchtop, can currently be defined in the defaultsfile."

So we need to do something else.
What about echoing and piping the passwords to the adop utility? That makes sense, but it is a little tricky.
I am talking about something like the following:

{ echo appspassword; echo systempassword; echo adminpassword; } | adop

We use the {} characters to group the three echo commands into a single output stream.
So we execute adop only once but make it read the output of our 3 echo commands.
The outputs will be read by adop one by one, so they must be in the order of Apps password, System password and WebLogic admin password, just like we supply them in a normal adop execution.
So we direct the stdout of these 3 echo commands to the stdin of adop.
By using braces and semicolons we pipe the 3 outputs to adop and let it read them one by one; adop thinks it is reading them from the keyboard via stdin, but we actually use a pipe for its stdin.
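You can see this grouping behavior with a harmless command before trying it with adop; the three echo outputs arrive on stdin as three ordered lines:

{ echo appspassword; echo systempassword; echo adminpassword; } | cat -n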

But in order to apply a patch, we need to supply some more things. At this point, input_file comes into play. As I mentioned above, we supply our input file to adop using the input_file argument, and that's it.
Our command will be like the following:

{ echo appspassword; echo systempassword; echo adminpassword; } | adop input_file=/home/applmgr/input_erman

Our one-liner is ready!
When we execute it, the patch specified in our input_file will be applied and all the adop phases will be completed (if there are no patch-related problems).

Current Patching Session ID: 70
Node Name       Node Type       Phase           Status          Started                        Finished                       Elapsed
--------------- --------------- --------------- --------------- ------------------------------ ------------------------------ ------------
ermansserver master          PREPARE         COMPLETED       14-APR-15 04:11:19 +03:00      14-APR-15 05:32:38 +03:00      1:21:19
v1

                                APPLY           COMPLETED       14-APR-15 05:33:03 +03:00      14-APR-15 05:35:32 +03:00      0:02:29
                                FINALIZE        COMPLETED       14-APR-15 05:35:42 +03:00      14-APR-15 05:36:07 +03:00      0:00:25
                                CUTOVER         COMPLETED       14-APR-15 05:36:17 +03:00      14-APR-15 05:50:51 +03:00      0:14:34
                                CLEANUP         COMPLETED       14-APR-15 05:50:53 +03:00      14-APR-15 05:51:34 +03:00      0:00:41

File System Synchronization Used in this Patching Cycle: Full

File System Information:
========================
File System: SINGLE NODE INSTANCE

Node Name   Node Type   Current Base                   Other Base                     Non Editioned Base             Inst Top
----------- ----------- ------------------------------ ------------------------------ ------------------------------ ------------------------------
ermansserver master      /u01/install/APPS/fs1          /u01/install/APPS/fs2          /u01/install/APPS/fs_ne        /u01/install/APPS/fs1/inst/app
ppsrv1                                                                                                               s/ORATEST_ermansserver

Custom File System Synchronization script: /u01/install/APPS/fs_ne/EBSapps/appl/ad/custom/adop_sync.drv

Patch Stage Top: /u01/install/APPS/fs_ne/EBSapps/patch

Online Patching Log base: /u01/install/APPS/fs_ne/EBSapps/log/adop/70

=========================================================================
=             Patches Applied in the Current Patching Cycle
=========================================================================

Node Name    Node Type Patch         Status         Run ID File System Patch Base    File System Applied Base  Adpatch Parameters   Session Type
------------ --------- ------------- ----------- --------- ------------------------- ------------------------- -------------------- ---------------
ermansserver master    20863040      Y               78344                           /u01/install/APPS/fs1                          ONLINE

As you can see above, no problems were encountered. All the phases were completed and the patch was applied.

Okay, we achieved our goal, but before finishing I want to add some more info, as follows.

Note that during the execution, our shell may display something like the following:
stty inappropriate ioctl for device

This is actually an ignorable error: it is caused by adop trying to use the stty command to set a terminal option, and because stdin is a pipe, stty reports "inappropriate ioctl for device".

In detail: adop uses stty -echo to hide our inputs. As you may recall, when using adop to apply patches, it requires us to supply the passwords, and when we supply them, we don't see the passwords displayed in the terminal. This is how adop hides the password inputs, and this is why we may see this ignorable error.

This can also be seen if we look at adop's perl scripts. The adop file is just a wrapper; it calls adzdoptl.pl, which has a lot of libraries attached to it.
The libraries come from the directories listed in the $PERL5LIB environment variable.
An example of PERL5LIB can be:
 echo $PERL5LIB 
/u01/install/APPS/fs2/FMW_Home/webtier/perl/lib/5.10.0:/u01/install/APPS/fs2/FMW_Home/webtier/perl/lib/site_perl/5.10.0:/u01/install/APPS/fs2/EBSapps/appl/au/12.0.0/perl:/u01/install/APPS/fs2/FMW_Home/webtier/ohs/mod_perl/lib/site_perl/5.10.0/x86_64-linux-thread-multi

/u01/install/APPS/fs2/EBSapps/appl/au/12.0.0/perl looks promising;

When we cd to it ; cd /u01/install/APPS/fs2/EBSapps/appl/au/12.0.0/perl
ls
ADOP  ADX  FND  MSI  TXK

So , as we see there is an ADOP directory.


cd ADOP
ls
AdopMain.pm    BackupUtil.pm    ClonePhase.pm              CommonUtil.pm    DatabaseManager.pm   GlobalVars.pm     OtherPhases.pm  PreparePhase.pm    SessionManager.pm  Singleton.pm
ApplyPhase.pm  CleanupPhase.pm  CommonBusinessRoutines.pm  CutoverPhase.pm  DefaultsFileUtil.pm  MultiTierUtil.pm  Phase.pm        ServiceManager.pm  SessionUtil.pm

Okay, we have our perl modules here. The adop perl scripts use them, and when we analyze them, we become more familiar with the patching process.
On the other hand, I will stop here :) This blog post is going beyond its objective :)

Monday, April 13, 2015

EBS R12, 12.2 -- About Dynamic Image Generation

There was a Dynamic Image Generation option in 11i.
We were using it to supply an environment for Framework applications that are capable of generating dynamic images. In order to work properly, Dynamic Image Generation requires the middle tier to have access to the graphical capabilities of a GUI environment (an X server provides this). That is, we had to configure an X server to accomplish this.
An example scenario for the usage of Dynamic Image Generation could be a need to have OAF buttons displayed as images; by having the buttons displayed as images, one can configure their sizes easily.

On the other hand, in EBS R12 and 12.2, we unfortunately don't have such an option.
The new release, specifically 12.2.4, uses the SKYROS look and feel.
There is no support for image buttons in SKYROS as used by 12.2.4.
In other words, there is no option to implement button images as in R11i using SKYROS.
For 12.1, the situation is pretty much the same: the SWAN look and feel used in 12.1 does not support image buttons.

So, knowing this, we can say that playing with the button sizes in OAF pages would have to be done as a customization.

Wednesday, April 8, 2015

Discoverer Desktop&Admin on 64 bit Windows

I am writing this blog post because I am still getting some requests about it, so I feel this subject needs clarification.

There is no 64-bit version of the Discoverer client software, and it seems there never will be (at least for versions 10 and 11).
In other words, there are no plans for 64-bit binary versions of Administrator and Desktop, because the 32-bit versions are sufficient.

To make Discoverer work stably on a 64-bit Windows PC (for example, Windows 7), you need to run it in 32-bit Windows XP Mode.

("This certification is with Windows 7 in XP mode only. To configure Windows 7 in XP mode, refer to http://www.microsoft.com/windows/virtual-pc/download.aspx (Select 32 bit system).")

Actually, Discoverer 11g Desktop/Admin can run on 64-bit Windows. I mean, you may install and run it, but you can't get support for any installation or runtime problems.
Again, you need to use it in Windows XP Mode to be certified.

Related Oracle Support Document: 233047.1

If you try to install on a Windows 64-bit operating system by running setup.exe from the /Disk1 directory, then you will receive a warning (shown as an image at the bottom of the original note).
You may work around this by installing from ..\Disk1\install\win32 and running setup.exe from there; however, as stated, at this point it is specifically not certified on 64-bit Windows client operating systems; therefore, Oracle cannot accept Service Requests for functional issues.

Tuesday, April 7, 2015

ZFS -- NFS shares unreachable , IPMP, Probe Based vs Link Based Failure detection

IPMP supplies a network multipathing mechanism for ZFS storage appliances. It is a feature of the Solaris Operating System, and to me it is analogous to device-mapper multipath, but for the network.
When there are two interfaces in your system, you can create an IPMP group in front of them and thus benefit from fault tolerance and load balancing.


IPMP detects problems in the interfaces and takes the necessary actions. For example, in an active-standby configuration, IPMP regularly checks the active interface and puts the standby into use if necessary. It is implemented at L3, so the switch does not know, or need to know, anything about it; we just plug the cables into the switch and don't do any configuration there.
So it differs from LACP.


IPMP can make this detection using 2 different methods: it can use probe-based detection to detect problems, or it may use link-based detection.

In probe-based detection, IPMP uses ICMP probes to check the interfaces.
So, in order to work properly, probe-based detection requires that at least one neighbour or a default gateway that can respond to ICMP probes is present in the same subnet. Probe-based detection gets activated when test addresses are used in the IPMP configuration. IPMP uses these test addresses to make the ICMP probes; that is, the IPMP daemon sends ICMP probes from the test address to one or more target systems on the same subnet. The target systems are determined dynamically. (First, the routing table is scanned for gateways (routers) on the same subnet as the IP interface's test address, and up to five are selected. If no gateways on the same subnet are found, the system sends a multicast ICMP probe (to 224.0.0.1 for IPv4 or ff02::1 for IPv6) and selects the first five systems on the same subnet that respond.)

In link-based detection, IPMP uses the interface kernel drivers and checks for changes in the IFF_RUNNING flag on the interfaces. So it does not send ICMP probes and does not require any IP addresses to be allocated for test addresses.

Link-based detection is actually the default failure detection in IPMP, and on the ZFS appliance it gets activated when "0.0.0.0/8" is used for the test addresses.

So both mechanisms can be chosen while implementing IPMP on ZFS Storage appliances. But I have to say that probe-based detection is a little delicate.

I also need to add that in the "Backup and Recovery Performance and Best Practices using the Sun ZFS Storage Appliance with the Oracle Exadata Database Machine" white paper, link-based detection is used for IPMP.

I have seen probing fail in an Exadata environment, and I think the following real-life example can increase our motivation for using link-based detection rather than probe-based detection in ZFS environments.

Note that for complex network architectures, link-based detection may be misleading.
Check the following link to understand what I mean: http://www.c0t0d0s0.org/archives/6294-Less-known-Solaris-features-IP-Multipathing-Part-3-Foundations-2.html

Environment:
Solaris ZFS Storage ZS3-2 connected to Exadata via Ethernet, through a Cisco switch

Change:
A recent network change in the infrastructure. Everything is working properly, except the ZFS shares.

Problem:
Unable to reach the NFS shares of the ZFS appliance.
IPMP is down on the ZFS appliance.
IPMP uses probe-based detection, so it is dependent on the other systems in the network, as it needs to send ICMP probes to them.
The network has ping problems.

64 bytes from 10.10.10.11: icmp_seq=13 ttl=255 time=0.120 ms
64 bytes from 10.10.10.11: icmp_seq=14 ttl=255 time=0.136 ms --- A gap.. from seq 14 to 27
64 bytes from 10.10.10.11: icmp_seq=27 ttl=255 time=0.123 ms
64 bytes from 10.10.10.11: icmp_seq=28 ttl=255 time=0.095 ms
64 bytes from 10.10.10.11: icmp_seq=29 ttl=255 time=0.147 ms
64 bytes from 10.10.10.11: icmp_seq=30 ttl=255 time=0.119 ms

Effect:
Cannot back up the PROD database to the ZFS appliance.

Diagnostics and Solution:
Check the logs using the CLI (ssh to the management IP)
and issue the command: "maintenance logs select alert show"

*Problem :
All Interfaces in group groupname have failed
Network connectivity via datalink ixgbe1 has been lost.
connectivity via interface ixgbe1 has been lost due to probe-based failure., Major alert
It seemed IPMP had marked all the interfaces as failed because of a single point of failure in probing. Specifying specific hosts for probing might solve this one, but there is no need. We have only one hope: the storage is connected to Exadata's Cisco switch, so link-based detection can be used here.

Reach the ZFS appliance using the BUI (the CLI may be used too).
URL: https://management_ip_address_of_ZFS:215
Check the IPMP configuration and change it to use the link-based failure detection mechanism:
go to the network configuration screen, click on the IPMP group, click on the interfaces, and update their test IP addresses to 0.0.0.0/8.


This action makes link-based detection become active, and this in turn brings the interfaces, and so the IPMP group, up and ready. As a result, the shares are available again.

In conclusion,
IPMP in the network stack of the ZFS appliance increases availability, and its failure detection mechanism provides proactivity. Probe-based failure detection in IPMP is more sophisticated than the link-based one.
On the other hand, sometimes using the advanced method can bring a disadvantage, as you see in this example. Although there was a problem with ICMP ping, mounts could still work and RMAN could still write to the related NFS shares, but the sophisticated probe-based failure detection algorithm sensed an error in the network and disabled the interfaces. This affected continuity, yet there really was a problem in the network, and the sophisticated mechanism made us realize it.
Anyway, I would like to say that choosing the sophisticated method is not the best way every time, but I can't say that either: it stopped us from mounting the NFS share, as it did in this case, but on the other hand it made us recognize a real network problem.

The bottom line:
If you have a complex network architecture, I still recommend using probe-based failure detection on ZFS storage appliances, but I also recommend falling back to link-based failure detection if probe-based detection runs into a failure like this one.
In a specific configuration like connecting ZFS to Exadata through a Cisco switch, just use link-based failure detection for IPMP, as it is already recommended by MAA.
Reference: An Oracle White Paper, April 2012 - Backup and Recovery Performance and Best Practices using the Sun ZFS Storage Appliance with the Oracle Exadata Database Machine

Monday, April 6, 2015

Linux -- A problem diagnostic - rsync, out_of_memory, oom_kill_process, vmmemctl, overcommit_memory, /proc/sysrq-trigger -- All in one Post

I recently delivered an application filesystem sync utility. It was using rsync to synchronize EBS filesystems. In such synchronizations, we usually proceed with Data Guard on the DB tier and rsync on the APPS tier.

In this article, I will focus on the synchronizations on the Apps tier.
As you may know, a production EBS instance produces several log and output files day by day. These logs and out files are the most important part of the daily sync activities. I mean, you can sync the application filesystem (APPL_TOP, INST_TOP, COMMON_TOP and the Apps tier Oracle Homes) once a week, but these log and out files need to be synchronized at least daily, because they are changing and being produced continuously.

The rsync approach is actually documented in Oracle's Business Continuity documentation as follows:

If you wish to synchronize your concurrent manager log and out files from the primary to the standby, create directories that correspond to the APPLCSF environment variables on the standby application tier server(s).
For example:
$ mkdir -p <APPLCSF from PRODUCTION>/log
$ mkdir -p <APPLCSF from PRODUCTION>/out
Repeat this on the primary server, creating directories matching the standby context name, to be ready for a switchover operation.
For UNIX systems, on the primary application tier(s), set up an rsync job in cron, to run every few minutes. The following example synchronizes the log directory:
$ rsync -av <APPLCSF>/log <standby_host>:<APPLCSF from PRODUCTION>/log --rsync-path=/usr/local/bin/rsync

Even the Business Continuity document for 11i suggests this:

Business Continuity for Oracle E-Business Release 11i Using Oracle 11g Release 2 and later Physical Standby Database - Single Instance and Oracle RAC (Doc ID 1068913.1)

Okay, but there is one thing: EBS 11i application servers are 32-bit. Actually, they must be 32-bit; the code does not work on 64-bit. This brings a challenge for memory, as a 32-bit system cannot stably handle more than 16 GB of memory. So we normally deal with a limited amount of memory when working with EBS 11i apps servers, and we must be careful.

Let's take a look at the following real-life story and be aware of the possible consequences.
The system in question was a 32-bit Oracle Linux server, running the 2.6.32-200.13.1.el5uek kernel.
It was a standby application server, which was the target of rsync.
One day it got locked up and could not answer any connection attempts; it required a hard reboot.
Note that the standby apps server had 8 GB of memory.

After rebooting, the first thing we noticed in /var/log/messages was an out-of-memory event at 6:19 PM. This was the last trace before the hang.
Unfortunately, we did not have OSWatcher running.

Just a little reminder:

OSWatcher collects the following system information:

ps
top
ifconfig
mpstat
iostat
netstat
traceroute
vmstat
meminfo (Linux Only)
slabinfo (Linux Only)

But we had SAR, so we consulted the SAR reports. However, they were empty for all the metrics after 6:10 PM.
Note that OSWatcher probably would not have been able to collect its information in this incident either, as it seemed that the system as a whole stopped working suddenly.

So we investigated the SAR reports and tried to comment accordingly.

Context Switch per second.
05:30:01 AM    147.21
05:40:01 AM    153.14
05:50:01 AM    150.48
06:00:01 AM    150.97
06:10:01 AM    215.15

                pgpgin/s pgpgout/s   fault/s  majflt/s
05:00:01 AM      0.00      0.19      4.21      0.00
05:10:01 AM      0.01      0.18      5.89      0.00
05:20:01 AM      0.00      0.21      4.20      0.00
05:30:01 AM      0.00      0.16      4.20      0.00
05:40:01 AM      0.00      0.32      5.25      0.00
05:50:01 AM      0.00      0.13      4.20      0.00
06:00:01 AM      0.72      0.39     19.77      0.01
06:10:01 AM   3737.28   3677.23     25.48      0.02
             
                  tps      rtps      wtps   bread/s   bwrtn/s
05:00:01 AM      0.11      0.00      0.11      0.00      1.12
05:10:01 AM      0.12      0.01      0.11      0.04      1.08
05:20:01 AM      0.12      0.00      0.12      0.00      1.28
05:30:01 AM      0.09      0.00      0.09      0.00      0.96
05:40:01 AM      0.17      0.00      0.17      0.00      1.92
05:50:01 AM      0.07      0.00      0.07      0.00      0.76
06:00:01 AM      0.36      0.15      0.21      4.31      2.34
06:10:01 AM   1095.44    113.58    981.85  22423.70  22063.37


                       IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
05:10:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:10:01 AM      eth0      4.59      0.24    636.79     38.50      0.00      0.00      1.75
05:20:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:20:01 AM      eth0      4.70      0.48    635.94     50.97      0.00      0.00      1.71
05:30:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:30:01 AM      eth0      4.39      0.22    610.48     35.23      0.00      0.00      1.69
05:40:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:40:01 AM      eth0      4.47      0.25    620.27     38.46      0.00      0.00      1.74
05:50:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:50:01 AM      eth0      4.37      0.24    605.46     38.13      0.00      0.00      1.72
06:00:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:00:01 AM      eth0      4.52      0.31    625.29     48.23      0.00      0.00      1.72
06:10:01 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:10:01 AM      eth0     13.95      7.89  12658.31   1526.62      0.00      0.00      1.81

From the SAR reports, it could be seen that exactly at 6:10 there was an increase in system activity: network, paging, context switches and I/O.
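For reference, the figures above can be pulled from the sysstat archives with commands like the following (the saDD file name corresponds to the day of the month of the incident):

sar -w -f /var/log/sa/sa29      # context switches per second
sar -B -f /var/log/sa/sa29      # paging activity (pgpgin/s, pgpgout/s, fault/s)
sar -b -f /var/log/sa/sa29      # I/O and transfer rates (tps, bread/s, bwrtn/s)
sar -n DEV -f /var/log/sa/sa29  # per-interface network statistics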

This was a standby application server, so it should not have had much activity on it. With this in mind, we consulted the crontab and saw the following:

0 8,10,12,14,18,20,22,0,2,4,6 * * * rsync -av --delete /oracle/PROD/R12/inst/apps/PROD_ebslvapp/logs/appl/conc/log/ 172.17.0.43:/oracle/PROD/R12/inst/apps/PROD_ebslvapp/logs/appl/conc/log/ --rsync-path=/usr/bin/rsync 

Yes, there was a cron job on the PROD application tier for synchronizing the log and out files of the application tiers. rsync was used with the "-a", "-v" and "--delete" arguments.

The cron job was saying "run rsync every day, every 2 hours".
The arguments mean the following:

-v : increase verbosity
-a : archive mode; equals -rlptgoD (no -H,-A,-X)

So, -a does;

-r, --recursive recurse into directories
-l, --links copy symlinks as symlinks
-p, --perms preserve permissions
-t, --times preserve modification times
-g, --group preserve group
-o, --owner preserve owner (super-user only)
-D same as --devices --specials
--devices preserve device files (super-user only)
--specials preserve special files
-a does not include the following:

-H, --hard-links preserve hard links
-A, --acls preserve ACLs (implies -p)
-X, --xattrs preserve extended attributes

--delete: delete extraneous files from dest dirs.

So, the cron job was logical.
We checked the crond log and saw that the cron job was triggered by the crond of the PROD application server at 06:00 PM.
Note that in this configuration rsync was contacting the remote system (the standby application server) using a remote-shell (ssh) program as the transport. This is the standard rsync mechanism: it is not the rsync daemon, so it uses the remote shell to reach the target node.

Mar 29 06:00:01 crond[10266]: () CMD (rsync -av --delete /oracle/PROD/R12/inst/apps/PROD_ebslvapp/logs/appl/conc/log/ 172.17.0.43:/oracle/PROD/R12/inst/apps/PROD_ebslvapp/logs/appl/conc/log/ --rsync-path=/usr/bin/rsync 

Okay, cron triggered the rsync job at 06:00 PM, and the system went down somewhere between 06:10 and 06:20 PM.
Could this hang be related to rsync?

Let's explore how rsync works.

Ref: rsync.samba.org.
By default rsync determines which files differ between the sending and receiving systems by checking the modification time and size of each file. As this only requires reading file directory information, it is quick, but it will miss unusual modifications which change neither.


To determine which parts of a file have changed, the recipient splits its copy of the file into chunks and computes two checksums for each chunk: the MD5 hash, and a weaker but easier to compute rolling checksum. It sends these checksums to the sender. The sender quickly computes the rolling checksum for each chunk in its version of the file; if they differ, it must be sent. If they're the same, the sender uses the more computationally expensive MD5 hash to verify the chunks are the same. The sender then sends the recipient those parts of its file that did not match, along with information on where to merge these blocks into the recipient's version. This makes the copies identical.

When Rsync communicates with a remote non-daemon server via a remote shell the startup method is to fork the remote shell which will start an Rsync server on the remote system. Both the Rsync client and server are communicating via pipes through the remote shell. As far as the rsync processes are concerned there is no network. In this mode the rsync options for the server process are passed on the command-line that is used to start the remote shell.

The first thing that happens once the startup has completed is that the sender will create the file list. "

**"So it started at 6:00 pm  in the Application Prod server , and probably took sometime"

While it is being built, each entry is transmitted to the receiving side in a network-optimised way.

When this is done, each side sorts the file list lexicographically by path relative to the base directory of the transfer. (The exact sorting algorithm varies depending on what protocol version is in effect for the transfer.) Once that has happened all references to files will be done by their index in the file list.

If necessary the sender follows the file list with id→name tables for users and groups which the receiver will use to do a id→name→id translation for every file in the file list.

After the file list has been received by the receiver, it will fork to become the generator and receiver pair completing the pipeline.
**"Well, after the file list, Application Standby server becomes the generator and receiver , and makes the serios computing.."

The generator process compares the file list with its local directory tree. Prior to beginning its primary function, if --delete has been specified, it will first identify local files not on the sender and delete them on the receiver.
The generator will then start walking the file list. Each file will be checked to see if it can be skipped. In the most common mode of operation files are not skipped if the modification time or size differs

The receiver will read from the sender data for each file identified by the file index number. It will open the local file (called the basis) and will create a temporary file.
The receiver will expect to read non-matched data and/or to match records all in sequence for the final file contents. When non-matched data is read it will be written to the temp-file. When a block match record is received the receiver will seek to the block offset in the basis file and copy the block to the temp-file. In this way the temp-file is built from beginning to end.
The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.
After the temp-file has been completed, its ownership and permissions and modification time are set. It is then renamed to replace the basis file.
Copying data from the basis file to the temp-file make the receiver the most disk intensive of all the rsync processes. Small files may still be in disk cache mitigating this but for large files the cache may thrash as the generator has moved on to other files and there is further latency caused by the sender. As data is read possibly at random from one file and written to another, if the working set is larger than the disk cache, then what is called a seek storm can occur, further hurting performance.

**"So according the info above(written in bold)  the standby application server do a lot of stuff in this rsync process."

Let's come back to our topic.
We had seen out-of-memory errors, right? In the same time interval that our rsync job was working.

Okay, in this example we were syncing the log and out files of a production EBS instance, so there were several log and out files to be synchronized; besides, there were a bunch of big log files (2 GB) in our synchronization directories.

Although there is a lot of activity in a heavy rsync operation, it would be a weak conclusion if we directly blamed rsync for this problem.

Let's explore the other way around; I mean the system side.
By looking at the error that syslogd recorded, we can say the following:

vmmemctl, a kernel driver for VMware environments that collaborates with the host to reclaim pages considered least valuable by the guest operating system, invoked the oom_killer. The oom_killer, a feature enabled by default, is a self-protection mechanism employed by the Linux kernel when under severe memory pressure.
So vmmemctl may be using /proc/sys/kernel/sysrq or /proc/sysrq-trigger to trigger the oom_killer.
For example: echo "f" > /proc/sysrq-trigger   -- to call oom_kill.

sysrq_trigger:
Using the echo command to write to this file, a remote root user can execute most System Request Key commands remotely as if at the local terminal. To echo values to this file, the /proc/sys/kernel/sysrq must be set to a value other than 0. 

‘k’ – Kills all the process running on the current virtual console.
‘s’ – This will attempt to sync all the mounted file system.
‘b’ – Immediately reboot the system, without unmounting partitions or syncing.
‘e’ – Sends SIGTERM to all process except init.
‘m’ – Output current memory information to the console.
‘i’ – Send the SIGKILL signal to all processes except init
‘r’ – Switch the keyboard from raw mode (the mode used by programs such as X11), to XLATE mode.
‘s’ – sync all mounted file system.
‘t’ – Output a list of current tasks and their information to the console.
‘u’ – Remount all mounted filesystems in readonly mode.
‘o’ – Shutdown the system immediately.
‘p’ – Print the current registers and flags to the console.
’0-9′ – Sets the console log level, controlling which kernel messages will be printed to your console.
‘f’ – Will call oom_kill to kill process which takes more memory.
‘h’ – Used to display the help. But any other keys than the above listed will print help


When running a kernel with SysRq compiled in, /proc/sys/kernel/sysrq controls the functions allowed to be invoked via the SysRq key. Here is the list of possible values in /proc/sys/kernel/sysrq:
  • 0 - disable sysrq completely
  • 1 - enable all functions of sysrq
  • >1 - bitmask of allowed sysrq functions (see below for detailed function description):
  • 2 - enable control of console logging level
  • 4 - enable control of keyboard (SAK, unraw)
  • 8 - enable debugging dumps of processes etc.
  • 16 - enable sync command
  • 32 - enable remount read-only
  • 64 - enable signalling of processes (term, kill, oom-kill)
  • 128 - allow reboot/poweroff
  • 256 - allow nicing of all RT tasks
Why does vmmemctl invoke the oom_killer?

vmmemctl works in conditions where some other VM guest requires more pages.

It allocates pages to build the required balloon.
It seems that, in this situation, vmmemctl tried to allocate pages in our standby application server, and on its last try it got a NULL back from its memory allocation call. That's why it invoked the oom-killer, and the oom-killer killed some important processes to reclaim memory, which resulted in a system crash.

Also, there is a known bug related to this:
The balloon driver, vmmemctl, is unaware of pinned pages (1003586)



Could the problem be with rsync memory usage then? Considering it again...

We monitored the subsequent runs of rsync, and I can say that it did not allocate any more than 20 MB of resident memory. It does populate the file cache though; when we looked at the memory consumption we saw almost 2 GB of file cache used by rsync's I/O, but this was not the issue.
The only effect of this may be negative performance, as large files may occupy the cache space and force other data blocks out of the cache. This can affect overall system performance, because the other processes running on the system will need to reread their data from disk.
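This is roughly how we watched it; the first command shows rsync's own resident memory, and the second shows how much of the used memory is just page cache that the kernel can reclaim:

ps -C rsync -o pid,rss,vsz,pmem,args     # resident/virtual memory of the running rsync processes
free -m                                  # the "cached" figure grows during the run; it is reclaimable memory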

Just in case, we created a cron job for freeing the file-based cache and executed it regularly around the rsync executions:
5,10,35 8,10,12,14,18,20,22,0,2,4,6 * * * sync; echo 3 > /proc/sys/vm/drop_caches

Alternatively, a patched version of rsync which has a --no-cache / --drop-cache option may be used in this situation.
Website of the patch: http://insights.oetiker.ch/linux/fadvise/
Patch file: http://tobi.oetiker.ch/patches/rsync-3.0.9-2-fadvise.patch
--drop-cache : do not cache rsync files (POSIX_FADV_DONTNEED)

What are the solutions for all of these possibilities?

I say possibilities because we didn't have the diagnostic data; it could not be generated.

1) Dropping the caches during rsync run (just in case)
5,10,35  8,10,12,14,18,20,22,0,2,4,6 * * * sync; echo 3 > /proc/sys/vm/drop_caches

2) set overcommit kernel parameters; (just in case)
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
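These can be applied on the fly with sysctl and made persistent in /etc/sysctl.conf, along the following lines:

sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80
# to make the settings persistent, add the same two lines to /etc/sysctl.conf and reload:
sysctl -p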

3) Set the vm parameter sched.mem.maxmemctl to an optimized value..

Ref: Vmware
The administrator of a system using pinned pages should specify the maximum balloon size (in megabytes) using the sched.mem.maxmemctl parameter in the virtual machine's .vmx configuration file. This overrides the system-wide default.

For example, consider a virtual machine with:
4GB guest memory
2GB pinned for a high-performance user application

In this example, you can specify sched.mem.maxmemctl = "1330" to limit the memory that the balloon driver will attempt to swap to the guest swap space. In this case, a value of 1330MB is used because it is about 65% of the 2GB that can safely be swapped to disk (this leaves some headroom for pinned pages belonging to the guest operating system).

This recommendation assumes you have followed the instructions in the Resource Management Guide for ESX Server. Specifically, "...be sure your guest operating systems have sufficient swap space." This swap space must be greater than or equal to:

(the virtual machine's configured memory size - its Reservation) + (the space required to accommodate the workload of the virtual machine)

Furthermore, the maximum balloon size set with sched.mem.maxmemctl must be less than or equal to the configured swap space in the guest operating system. See the Swap Space and Guest Operating System section in the Resource Management Guide for ESX Server for additional information.


4) Increasing the lowmem protection (just in case)
vm.lowmem_reserve_ratio = 256 256 250
Check out this link: http://ermanarslan.blogspot.com.tr/2014/12/ebs-r12linux-32-bit-application-tier.html

Optional / Disabling the ballooning driver

Ref: Vmware 
Warning: Disabling the balloon driver in a virtual machine results in performance issues with the ESXi/ESX host. For more information, see the vSphere Resource Management Guide.
Disabling ballooning via the vSphere Client

To set the maximum balloon size to zero:

Using the vSphere Client, connect to the vCenter Server or the ESXi/ESX host where the virtual machine resides.
Log into the ESXi/ESX host as a user with administrative rights.
Shut down the virtual machine.
Right-click the virtual machine listed on the Inventory panel and click Edit Settings.
Click the Options tab, then under Advanced, click General.
Click Configuration Parameters.
Click Add row and add the parameter sched.mem.maxmemctl in the text box.
Click on the row next to it and add 0 in the text box.
Click OK to save changes.

In conclusion;

This has become a tricky blog post, as we started with rsync and ended up with vmmemctl :)

As I already mentioned, this was a real-life story, so I have written the analysis part as it happened in real life :) That is, we first suspected rsync, and then thought the cause was applications allocating big amounts of memory although free memory was not available.
On the other hand, the answer was there in the syslogd messages: it all started with the vmmemctl kernel driver.
Anyway, I can say that it was a good experience. Coming down from rsync to vmmemctl made me explore different layers, which indirectly gave me a new perspective for dealing with this kind of tangled problem.

Friday, April 3, 2015

Exadata-- ZFS adminstration for DBAs

Nowadays, we see ZFS storage appliances in Exadata environments.
The Sun ZFS Storage Appliance can be connected to the Exadata Database Machine using InfiniBand or 10GigE infrastructure.
We see ZFS in Exadata environments because there are advantages to using it.
The key benefits of using ZFS in Exadata environments are:
  • Implementing highly available and highly performing backup and restore solution 
  • Implementing a Cost effective backup environment
  • Eliminating the configuration and operational guess work to achieve the above backup and restore rates  
  • Offloading backups from Exadata Database Machine so more DATA space is available for database growth or for consolidating additional database on to the Exadata system 
  • Fast backup and restore times that can meet most RPO and RTO requirements
Please read the following for detailed info:
Backup and Recovery Performance and Best Practices using the Sun ZFS Storage Appliance with the Oracle Exadata Database Machine
So, after this introduction, let's jump to our topic. As you may guess from its title, this blog post focuses on giving an introduction to ZFS management in an Exadata environment.
If you are an experienced DBA with general knowledge about SANs, then managing the ZFS storage is not a very big task for you.
When we examine a recent ZFS storage model like the ZFS ZS3-2, we see that we have a web user interface called the BUI, and a CLI, for managing the whole storage.
Although the storage uses Solaris as its operating system, we can't log in to the operating system directly; it seems not to be supported.
Of course we have ILOM to manage the device, as almost all of the latest Sun devices do.
We also have management interfaces to manage the components of the appliance. That is, we manage our ZFS storage using the BUI or the CLI.

ILOM is a well known interface; I don't see a need to go into its details..


In general, ILOM has an IP address associated with it and runs its own web server.. We reach the ILOM interface using the ILOM IP address over https and supply the root user and its password to log in.. From there we can check the memory, CPU, fans, cooling, network devices, PCI devices.. We also have the opportunity to manage the power of the storage, and to open a remote console.


Note that we can also reach the ILOM using ssh..

ssh root@192.168.0.8
Password: 
Oracle(R) Integrated Lights Out Manager
Version 3.1.2.18 r81429
Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
Warning: password is set to factory default.

Okay, let's continue with the BUI..
I can say that the BUI is our friend; it is a user friendly interface and it is not clumsy (unlike the Oracle VM console).. We log in to the BUI using the storage root account (its default password is changeme, as on all Sun systems)..

URL:  https://management_ip_address_of_ZFS:215


Once we log in, our dashboard is populated with information about components and services such as CPU, network, NFS, storage capacity, services, disk performance, iSCSI, SMB, FTP, etc..


So the ZFS console welcomes us with status information.. (Status tab)
As you can see in the picture above, we have Configuration, Maintenance, Shares and Analytics tabs on the console page, and in every tab there are several subtabs in which we can take a bunch of actions on the associated components..
We can manage shares, change the IP addresses, make the IPMP configuration, see the logs, display the system info, check the storage health and so on..


For the details about the BUI please read the following doc:

Okay, let's talk about the CLI.. It is an alternative way to manage the ZFS storage.. It is an admin-friendly CLI, so as DBAs and OS admins we easily get used to it.
To connect to the ZFS storage using the CLI, we ssh to the ZFS management IP, supply the root user and its password, and we are in...
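
For example, a minimal sketch (the IP below is the Mgmt_Interface address shown later in this post; your IP, hostname and prompt will differ);

ssh root@192.168.0.10
Password: 
exayedek:> 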

We can use help in the CLI to display the available commands.. 

help
Subcommands that are valid in this context:
   configuration        => Perform configuration actions
   maintenance          => Perform maintenance actions
   raw                  => Make raw XML-RPC calls
   analytics            => Manage appliance analytics
   status               => View appliance status
   shares               => Manage shares
   help [topic]         => Get context-sensitive help. If [topic] is specified,
                           it must be one of "builtins", "commands", "general",
                           "help", "script" or "properties".
   show                 => Show information pertinent to the current context
   get [prop]           => Get value for property [prop]. ("help properties"
                           for valid properties.) If [prop] is not specified,
                           returns values for all properties.
   set [prop]           => Set property [prop] to [value]. ("help properties"
                           for valid properties.) For properties taking list
                           values, [value] should be a comma-separated list of
                           values.

The logic in the CLI is like the logic of a directory tree structure.. 
In order to choose a command or subcommand, we just type its name, 
Like the following example:
exayedek:> configuration

Just like we use the "ls" command to list the contents of a directory on Linux/Unix systems, we use the "ls" command to see the available subcommands of a command..

exayedek:configuration> ls
Children:
                              net => Configure networking
                         services => Configure services
                          version => Display system version
                            users => Configure administrative users
                            roles => Configure administrative roles
                      preferences => Configure user preferences
                           alerts => Configure alerts
                          cluster => Configure clustering
                          storage => Configure Storage
                              san => Configure storage area networking

exayedek:configuration> net
exayedek:configuration net> ls
Children:
                        datalinks => Manage datalinks
                          devices => Manage physical devices
                       interfaces => Manage IP interfaces
                          routing => Manage routing configuration

exayedek:configuration net> interfaces

exayedek:configuration net interfaces> ls
Interfaces:

INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
ipmp1       up       ipmp  ixgbe1      10.10.10.10/24        exayedek
                           ixgbe2                             
ixgbe0      up       ip    ixgbe0      192.168.0.10/22       Mgmt_Interface
ixgbe1      up       ip    ixgbe1      0.0.0.0/8              ExaYedekInterface1
ixgbe2      up       ip    ixgbe2      0.0.0.0/8              ExaYedekInterface2

help
Subcommands that are valid in this context:

   help [topic]         => Get context-sensitive help. If [topic] is specified,
                           it must be one of "builtins", "commands", "general",
                           "help" or "script".

   show                 => Show information pertinent to the current context

   done                 => Finish operating on "interfaces"

   select [interface]   => Select the specified interface to get its
                           properties, set its properties, or run a subcommand

   list                 => List all interfaces

   destroy [interface]  => Destroy the specified interface

   ip                   => Create an IP interface

   ipmp                 => Create an IP multipath interface

Sometimes, when we need to choose something, we use the "select" command..
We need to use select when the object we want to operate on is not a command or subcommand..
For example: 
Here, we choose ipmp1 under configuration/net/interfaces;

exayedek:configuration net interfaces>  select ipmp1
exayedek:configuration net interfaces ipmp1> 
exayedek:configuration net interfaces ipmp1> ls
Properties:
                         state = up
                      curaddrs = 10.10.10.10/24
                         class = ipmp
                         label = exayedek
                        enable = true
                         admin = true
                         links = ixgbe1,ixgbe2
                       v4addrs = 10.10.10.10/24
                        v4dhcp = false
                       v6addrs = 
                        v6dhcp = false
                           key = 1
                      standbys = 

We can use the help command anywhere in the CLI to see our options..
For example: in ipmp1, we can show the details of the ipmp1 configuration.. We can also get and set its properties, as in the sketch following the help output below.

exayedek:configuration net interfaces ipmp1> help
Subcommands that are valid in this context:

   help [topic]         => Get context-sensitive help. If [topic] is specified,
                           it must be one of "builtins", "commands", "general",
                           "help", "script" or "properties".

   show                 => Show information pertinent to the current context

   commit               => Commit current state, including any changes

   done                 => Finish operating on "ipmp1"

   get [prop]           => Get value for property [prop]. ("help properties"
                           for valid properties.) If [prop] is not specified,
                           returns values for all properties.

   set [prop]           => Set property [prop] to [value]. ("help properties"
                           for valid properties.) For properties taking list
                           values, [value] should be a comma-separated list of
                           values.
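
Here is a minimal, illustrative get/set/commit sketch (the new label value is made up just to show the flow; don't change a production interface like this);

exayedek:configuration net interfaces ipmp1> get label
                          label = exayedek
exayedek:configuration net interfaces ipmp1> set label=exayedek_backup
                          label = exayedek_backup (uncommitted)
exayedek:configuration net interfaces ipmp1> commit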

The CLI can also be reached from the ILOM interface..
We can log in to ILOM and start /SP/console to connect to the CLI..

ssh root@192.168.0.8
Password: 

Oracle(R) Integrated Lights Out Manager
Version 3.1.2.18 r81429
Copyright (c) 2013, Oracle and/or its affiliates. All rights reserved.
Warning: password is set to factory default.
-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started.  To stop, type ESC (
exayedek:configuration net interfaces> 

Okay.. Enough for now.. My next blog post will be based on a real life example from a production ZFS environment in which we had a problematic NFS share.. We'll walk through the concepts used in ZFS networking and use the BUI to correct the problem..