Monday, December 28, 2015

EBS 12.0 -- Context file problems due to bug 4585869 / Windows

While migrating an EBS 12.0.4 environment from Windows to Linux, we encountered a weird problem while running adconfig.pl.
The error came from txkConfigDbOcm.pl; it was something like "BAD NAME... PATH".

The problem was in the CONTEXT_FILE. That is, some of the values in the database CONTEXT_FILE had been written incorrectly: XML tags such as xmlparser contained unresolved % placeholders.
The values looked like "D:\oracle\PROD\database\11.2.0\%appsutil%/%java%...................... "

The problem was due to bug 4585869 (XMLPrintDriver.close() does not close the handle properly).

To solve this problem;

We applied patch 4585869 to the 10.1.3 Oracle Home, recreated appsutil, and transferred the newly created appsutil.zip to the database server. On the database server, using the new adbldxml.pl delivered with the new appsutil.zip, we recreated the CONTEXT_FILE and executed autoconfig once again.

EBS -- Querying Workflow Timeouts from backend

Workflow timeout configurations are normally defined and viewed using Workflow Builder, but here is a query for reading the workflow timeout configuration directly from the database.

select pa.process_item_type, pa.process_name, pa.process_version,
       pa.activity_item_type, pa.activity_name,
       erm.value_type, erm.text_value, erm.number_value
from wf_process_activities pa, wf_activity_attr_values erm
where pa.process_item_type = '<PUT_YOUR_ITEM_TYPE_HERE>'
and erm.process_activity_id = pa.instance_id
and erm.name = '#TIMEOUT'
and pa.process_name = 'AME_REQUEST_APPROVAL_PROCESS'
order by pa.process_version;

This query produces one row per process version, including the #TIMEOUT attribute values.


The number_value column holds the timeouts we are interested in in this example (note that number_value is stored in minutes).
Also, we can go one step further and modify the timeout values from the database using the relevant APIs. Although it is an unsupported move, it seems applicable.
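As an illustration, here is a minimal, unsupported sketch that goes with a direct update rather than an API call; the process_activity_id (12345) and the new 60-minute value are hypothetical, so back up wf_activity_attr_values and test on a non-production environment first:

update wf_activity_attr_values
set number_value = 60 -- new timeout, in minutes (hypothetical value)
where name = '#TIMEOUT'
and process_activity_id = 12345; -- instance_id of the target process activity (hypothetical)
commit;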

Thursday, December 17, 2015

EBS/RAC/Exadata : Concurrent:TM Transport Type QUEUE

After RAC migrations, such as migrating your single-instance EBS database to a 2-node RAC environment, you may start encountering errors in PO-related workflows.
The error you may get is the following:
Wf_engine_Util.Function_Call(PO_REQAPPROVAL_ACTION.OPEN_DOC_STATE,POAPPRV..


The solution is already documented in 11.5.10: Intermittent Document Manager Error #3 in VERIFY_APPROVER_AUTHORITY in RAC Instance (Doc ID 376556.1)

The document says it is Error #3, which differs from the error we got (ours was Error #1), but it does not matter; the fix in the document is the solution.

The solution is basically setting the "Concurrent:TM Transport Type" profile to "QUEUE" at site level.

But why?

Why do we set Concurrent:TM Transport Type to "QUEUE"?

The answer is that the Transaction Manager normally uses DBMS pipes (when Concurrent:TM Transport Type is set to "PIPE") to communicate with the Service Manager and with any user process initiated by Forms or by Standard Manager requests. By design, DBMS_PIPE requires the sessions that need to communicate with each other to be in the same instance.
Since DBMS_PIPE is a package that only lets sessions in the same instance communicate using Oracle pipes, relying on it in a RAC environment is not workable. That's why we set Concurrent:TM Transport Type to "QUEUE" in RAC environments, unless we have multiple application nodes with concurrent manager services dedicated to their own database instances.
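To check the profile's current value from the backend, a query like the following sketch can be used; it assumes the internal name of this profile option is CONC_TM_TRANSPORT_TYPE:

-- value as resolved for the current session:
select fnd_profile.value('CONC_TM_TRANSPORT_TYPE') from dual;

-- site-level value (level_id 10001 corresponds to site level):
select pov.profile_option_value
from fnd_profile_options po, fnd_profile_option_values pov
where po.profile_option_id = pov.profile_option_id
and po.application_id = pov.application_id
and po.profile_option_name = 'CONC_TM_TRANSPORT_TYPE'
and pov.level_id = 10001;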

RDBMS/Exadata/RAC -- What do we recommend for recoverability?

Here is a list of technologies and configurations which we recommend for optimal recoverability, based on the general RPOs and RTOs defined in customer environments.
Of course, these are just titles; we have lots of things to say about each of these subjects as well.
On the other hand, we go into the details when we are doing recoverability assessments, so this is not the place for details :) A few of the items below can be verified quickly with the queries given after the list.
  • Implement Oracle RAC
  • Run services on multiple Oracle RAC instances
  • Use ASM for placing the Oracle Database files
  • Contact third party vendors for storage based replication technologies and ensure best practices are implemented
  • Place the redolog files in multiple mount points
  • Place the control files in multiple mount points
  • Implement Data Guard and utilize standby databases (one local, one remote)
  • Configure archive_lag_target (for standby)
  • Use Enterprise Manager
  • Use synchronous Data Guard in the local site
  • Use Data Guard Broker
  • Document the restore, recovery, standby failover and switchover operations
  • Enable the Flashback Database option
  • Use guaranteed restore points
  • Use retention guarantee for Flashback Queries
  • Use automated backups
  • Increase backup & recovery speed
  • Transfer database backups to multiple locations
  • Configure a recovery window based backup retention
  • Build the restore, recovery and failover task flow, define the communication methods and document everything
  • Test restore, recovery and disaster scenarios
  • Periodically run orachk (or exachk if it is an Exadata), and fix the reported recoverability related problems
  • For Exadata: ensure the Exadata best practices documented in the whitepaper named “Backup and Recovery Performance and Best Practices for Exadata Cell and Oracle Exadata Database Machine” are implemented, or at least make plans for implementing them
  • Examples of recoverability related problems (exachk):
o   One or more Ethernet network cables are not connected.
o   Database parameters log_archive_dest_n with Location attribute are NOT all set to recommended value
o   Storage Server alerts are not configured to be sent via email
o   Active system values should match those defined in configuration file "cell.conf"
o   System is exposed to Exadata Critical Issue DB18
o   System is exposed to Exadata Critical Issue DB23
o   System is exposed to Exadata Critical Issue DB24
o   System is exposed to Exadata Critical Issue EX19
o   Grid Version is older than the recommended
o   Storage Server Version is older than the recommended
o   Exadata bundle patch  Version is older than the recommended
o   Database parameter LOG_BUFFER is NOT set to recommended value
o   fast_start_mttr_target should be greater than or equal to 300
o   The data files should be recoverable
o   Flashback on PRIMARY is not configured
o   Primary database is NOT protected with Data Guard (standby database) for real-time data protection and availability
o   RMAN controlfile autobackup should be set to ON
o   Oracle database does not meet certified platinum configuration
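A few of the items above (Flashback Database, guaranteed restore points, retention guarantee, controlfile autobackup and the retention policy) can be verified quickly from SQL*Plus, as promised before the list. A minimal sketch using standard dictionary views:

-- Flashback Database and guaranteed restore points:
select flashback_on from v$database;
select name, guarantee_flashback_database from v$restore_point;

-- undo retention guarantee (for Flashback Queries):
select tablespace_name, retention from dba_tablespaces where contents = 'UNDO';

-- RMAN retention policy, controlfile autobackup and other persistent settings:
select name, value from v$rman_configuration;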

EBS -- Discoverer 4i and Redhat/Oracle Linux 5 and above

Discoverer 4i may still be in use in your environments.
Especially if you have EBS 11i, you may have Discoverer 4i, as it comes built in with the EBS 11i application tier.
One thing I want to remind you of is that Discoverer 4i has not been supported since 2006, and it simply does not
work with operating systems like RedHat 5 / Oracle Linux 5 and above.

There is nothing we can do about it, as it is already documented in Error When Trying To Start Discoverer (Addisctl.Sh) (Doc ID 854976.1).
The error is:
8.0.6 $ORACLE_HOME/jre1183o/lib/i686/green_threads/libzip.so: symbol errno, version GLIBC_2.0 not defined in file libc.so.6 with link time reference (libzip.so)
Unable to initialize threads: cannot find class java/lang/Thread
Could not create Java VM


Once you upgrade your OS to RedHat / Oracle Linux 5 or above, Discoverer 4i is not certified at all.
The certified version is Discoverer 10.1.2, and the solution is to upgrade Discoverer using the following note:
Note 313418.1 Using Discoverer 10.1.2 with Oracle E-Business Suite 11i.

This was the case in our last ODA X5 migration project, in which we migrated the EBS application tier from Oracle Linux 4 nodes to Oracle Linux 5 VM nodes on ODA X5.

Wednesday, December 16, 2015

RAC - Importance of file permissions in a RAC environment (Grid filesystem), CRS-5802, a use of the awk command in DBA life.

I recently solved a permission problem in a RAC environment.
The issue had started after a junior DBA changed the ownership of files in the root filesystem.
The impact of this operation showed up in srvctl:
from the node where the ownership of the files had been changed, the database could not be started using srvctl.
Also, the other node, which did not have any problems, could only start its own instance.
In other words, the "srvctl start database -d PROD" command could not start the instance on node1, where the permissions of the files had been changed.

The "srvctl start database -d PROD" command was failing with the following:

PRCR-1013 : Failed to start resource ora.PROD1.db
PRCR-1064 : Failed to start resource ora.PROD1.db on node1
CRS-2680: Clean of 'ora.PROD1.db' on 'node1' failed
CRS-5802: Unable to start the agent process

I analyzed the crsd log file and found that it was the oraagent process that failed to start.

The reason behind this was that the permissions of the files in the directory $ORACLE_BASE/crsdata/output/crsd_oraagent were not proper.

Then I logged in to node2, which did not have this problem.
I changed my directory to $ORACLE_BASE/crsdata/output/crsd_oraagent and ran the following awk command to generate the chown commands that needed to be executed on node1 to correct the permissions. There were several files, and I thought that using a dynamic method for generating the chown commands might ease my job; and actually, it did.

I used the following command on node2 and executed its output on node1 to correct the file permissions there.

cd $ORACLE_BASE/crsdata/output/crsd_oraagent
# skip the "total" header line of ls -l and print "chown owner:group file" for each entry
ls -l | awk 'NR > 1 {print "chown " $3 ":" $4 " " $9}'

EBS - Exadata - regarding db_block_checking, db_block_checksum and db_lost_write_protect parameters

It is OK to set the db_block_checking, db_block_checksum and db_lost_write_protect parameters for an EBS database.
That is, these parameters can be used to increase database-level protection, and they are supported for EBS databases as well.
On the other hand, it is good to know that db_block_checking and db_block_checksum may introduce a performance overhead between 1% and 10%, and db_lost_write_protect may increase redo generation as well as introduce a performance overhead of its own, which we cannot predict at the moment.
In Exachk reports, the "DATA CORRUPTION PREVENTION BEST PRACTICES" check may fail when these parameters are not set.
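Setting them is straightforward. A minimal sketch follows; the values below are the commonly recommended ones, but verify them against your own Exachk report and the relevant notes for your version, and measure the overhead on a test system first:

alter system set db_block_checksum = TYPICAL scope=both sid='*';
alter system set db_block_checking = MEDIUM scope=both sid='*';
alter system set db_lost_write_protect = TYPICAL scope=both sid='*';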

Exadata -- Database initialization parameter: cluster_interconnects

It is not recommended to use the HAIP infiniband IP addresses as the preferred cluster interconnects for the database. It is recommended to use the static ib0 and ib1 addresses (not ib0:1 and ib1:1).
Making an Oracle database running on Exadata use the static infiniband interconnect IP addresses relies on setting the cluster_interconnects parameter.
If it is not set, the Oracle database by default chooses the HAIP infiniband addresses for the cluster interconnect, which is not recommended.

This recommendation can also be seen in an Exachk report.
That is, if we don't set the cluster_interconnects parameter in the database and leave the Oracle database to use the default HAIP interconnects, then Exachk will report a failure saying "Database parameter CLUSTER_INTERCONNECTS is NOT set to the recommended value".

Also, when we click on the Details link in the Exachk report, here is what we see
(pay attention to the sentence about avoiding the Clusterware HAIP address):

Database parameter CLUSTER_INTERCONNECTS should be a colon delimited string of the IP addresses returned from /sbin/ifconfig for each cluster_interconnect interface returned by oifcfg. In the case of X2-2, it is expected that there would only be one interface and therefore one IP address. This is used to avoid the Clusterware HAIP address. For an X2-8, the 4 IP addresses should be colon delimited.

So, use ifconfig to determine the IP addresses assigned to the ib0 and ib1 interfaces (not ib0:1 or ib1:1) on all the RAC nodes, set these IP addresses as a colon-delimited string for each instance, and restart the database;

Ex:

alter system set cluster_interconnects='ib_ipaddress1ofnode1:ib_ipaddress2ofnode1' scope=spfile sid='SIDinst1';

alter system set cluster_interconnects='ib_ipaddress1ofnode2:ib_ipaddress2ofnode2' scope=spfile sid='SIDinst2';
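After the restart, the interconnect addresses actually in use can be verified per instance; when the parameter is set, the SOURCE column reflects that the addresses come from cluster_interconnects rather than from HAIP:

select inst_id, name, ip_address, source
from gv$cluster_interconnects
order by inst_id;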

Exadata - Exachk finding: The bundle patch version installed does not match the bundle patch version registered in the database

You may encounter a failed check in the Exachk report saying: "The bundle patch version installed does not match the bundle patch version registered in the database"

This is likely to be encountered after applying a Database Bundle Patch, like "Patch 20950328 (DATABASE BUNDLE PATCH: 12.1.0.2.9)".

This Exachk failure is based on the data that needs to be stored in "dba_registry_sqlpatch". Exachk just tries to match the patch info stored in the oraInventory with the patch info stored in dba_registry_sqlpatch. If it cannot match them, it marks this check as failed.

The manual check can be done by executing the following shell commands in sequence;

opatch_bp=$($ORACLE_HOME/OPatch/opatch lspatches 2>/dev/null|grep -iwv javavm|grep -wi database|head -1|awk -F';' '{print $1}') 

database_bp_status=$(echo -e "set heading off feedback off timing off \n select STATUS from dba_registry_sqlpatch where PATCH_ID = $opatch_bp;"|$ORACLE_HOME/bin/sqlplus -s " / as sysdba" | sed -e '/^ *$/d')

if [ "$database_bp_status" == SUCCESS ]
then
      echo "SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully."
else
      echo "FAILURE: Bundle patch installed in the database does not match the software home, or is installed with errors." 
 fi

The failure is actually caused by not running the post-installation instructions, as those instructions make you run datapatch to record the patch information into the SQL inventory.
On the other hand, the readmes of the bundle patches are a little misleading: running datapatch may require the database to be in upgrade mode, and if you are patching Exadata, which is generally a RAC-based environment, you need to set cluster_database=false before starting the database with the startup upgrade command.
Also note that you must have at least 1 job queue process (job_queue_processes >= 1) in the database environment.
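Here is a hedged sketch of those preparation steps for the RAC case, run from SQL*Plus as sysdba (shut down all other instances first, and run datapatch while only this instance is up in upgrade mode):

alter system set cluster_database=false scope=spfile sid='*';
shutdown immediate
startup upgrade
-- datapatch needs at least one job queue process:
alter system set job_queue_processes=1;
-- run datapatch from the OS at this point, then revert:
alter system set cluster_database=true scope=spfile sid='*';
shutdown immediate
startup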

Running datapatch is easy.

go to $ORACLE_HOME/OPatch directory and execute it as follows;

[oracle@exadatanode1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 on Tue Dec 15 18:15:04 2015
Copyright (c) 2015, Oracle.  All rights reserved.

Log file for this invocation: /oracle/PROD/proddb/cfgtoollogs/sqlpatch/sqlpatch_130061_2015_12_15_18_15_04/sqlpatch_invocation.log

Connecting to database...OK
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of SQL patches:
Patch 20415564 (Database PSU 12.1.0.2.3, Oracle JavaVM Component (Apr2015)):
  Installed in the binary registry only
Bundle series DBBP:
  ID 9 in the binary registry and not installed in the SQL registry

Adding patches to installation queue and performing prereq checks...
Installation queue:
  Nothing to roll back
  The following patches will be applied:
    20415564 (Database PSU 12.1.0.2.3, Oracle JavaVM Component (Apr2015))
    20950328 (DATABASE BUNDLE PATCH: 12.1.0.2.9 (20950328))

Installing patches...
Patch installation complete.  Total patches installed: 2

Validating logfiles...
Patch 20415564 apply: SUCCESS
  logfile: /oracle/PROD/proddb/cfgtoollogs/sqlpatch/20415564/18617752/20415564_apply_PROD_2015Dec15_18_15_39.log (no errors)
Patch 20950328 apply: SUCCESS
  logfile: /oracle/PROD/proddb/cfgtoollogs/sqlpatch/20950328/18903184/20950328_apply_PROD_2015Dec15_18_15_40.log (no errors)
SQL Patching tool complete on Tue Dec 15 18:16:24 2015
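After datapatch completes, the SQL-level patch registry can be double-checked directly; the bundle patch should now appear with STATUS = SUCCESS, and the Exachk check above should pass on the next run:

select patch_id, version, action, status, action_time
from dba_registry_sqlpatch
order by action_time;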

Tuesday, December 15, 2015

EBS 11i/R12 - Concurrent Manager, Target node/queue unavailable,target_node updated as "null", RCVOLTM

Normally, when we see "target node/queue unavailable" reported in the status field of an EBS concurrent manager, we as Apps DBAs apply the usual solution, which is setting the control code to null and setting the target_node of that concurrent queue to the application server on which the concurrent manager is supposed to run.

The action plan for this kind of scenario is as follows (a query for checking the queue's current state is given after the list);

Shut down the problematic concurrent manager (it should actually already be down), then:
  1. sqlplus apps/apps
  2. update FND_CONCURRENT_QUEUES set control_code = null where concurrent_queue_name like '%Concurrent Manager Name%'; -- e.g. RCVOLTM
  3. update FND_CONCURRENT_QUEUES set TARGET_NODE = '<correct node>' where concurrent_queue_name like '%Concurrent Manager Name%'; -- the name of the application server on which the manager is supposed to run
  4. commit;
  5. Start the problematic concurrent manager
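Before and after these updates, the current state of the queue can be checked with a query like the following sketch (the RCVOLTM filter is just this post's example):

select concurrent_queue_name, target_node, control_code, enabled_flag
from fnd_concurrent_queues
where concurrent_queue_name like '%RCVOLTM%';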
On the other hand, this may not solve your problem if the executable behind the concurrent manager is problematic.
This happened to me while starting the Receiving Transaction Manager in a newly cloned environment.
I saw the "target node/queue unavailable" status in the manager's status column and directly applied the above action plan, but it didn't solve my problem, and every time I restarted the concurrent manager, the target_node column was automatically updated back to null.
Then I relinked RCVOLTM (I could also have relinked all the application programs using adadmin), applied the above action plan once again, and the problem disappeared.

So, if a target_node is updated to null automatically while restarting a concurrent manager, then most probably the binary/underlying program bound to that concurrent manager is corrupt. Execute it by hand to see the problem; relink/compile to solve it.

So, that was the tip of the day. It is worth writing down, as there is no documentation for it.


Friday, December 4, 2015

EBS R12 (12.1) - ALECTC module - signal 11

You may encounter ALECTC signal 11 errors while EBS is executing the concurrent programs triggered by Oracle Alerts.
This happened to me in my last EBS R12 - Exadata migration.
The root document to follow was What To Do When Receiving A Signal 11 For One or All Oracle Alerts? (Doc ID 1354644.1)
The solution was applying the following patches.
Note that even though our issue was not related to password case sensitivity, it was useful to apply the patch for that bug as well.

12.1: Patch 13728376 - Alert fails with signal 11 error after password case sensitivity fix (aluapp.lc  120.1.12010000.4)
12.1: Patch 9908378 - RELEASE 12 BUG FOR SIGNAL 11 WITH ALPPWF  (alppwf.lc  120.0.12010000.3)
12.1: Patch 9817770 POST-R12.ATG_PF.B.DELTA.3 CONSOLIDATED PATCH (alssmn.lc 120.22.12010000.9)
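One way to confirm from the backend that these patches are recorded in the patch history is a query like the following sketch against the AD patch tables:

select bug_number, creation_date
from ad_bugs
where bug_number in ('13728376', '9908378', '9817770');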

Thursday, December 3, 2015

RDBMS -- Log Writer(LGWR), log file sync, log file parallel write and the Disk performance

In this post, you can find some quick tips about Log Writer (LGWR) I/O waits.
If the log file sync and log file parallel write times are high, this probably means that our I/O subsystem has performance problems. (A query for checking these wait averages is given at the end of this post.)

If that is the situation; we need to consider the following;
  • We need to work with the storage and system admins to analyze the filesystems where our redologs reside, because most probably there is a problem with performance at that level, and the performance of the storage device needs to be improved.
  • We need to check whether our redolog files are on RAID-5 or RAID-6, as that is not recommended; redolog files should not be placed on a RAID configuration that requires parities to be calculated. That is, we need to put our redolog files at least on RAID mirrors.
  • We should not use SSDs (Solid State Disks) for placing our redolog files unless our database is on Exadata, SuperCluster or Oracle Database Appliance. Note that Oracle Engineered Systems are optimized for using redologs on SSDs.
  • We need to be sure that no other processes create load on the disks where our redolog files reside, as those processes may consume the I/O bandwidth. If there is such a situation, we need to move those processes, or move the redolog files to other disks.
  • We need to ensure that our log_buffer is not very big. A big log buffer may introduce a performance problem, as LGWR will need a lot of I/O to flush the buffer when it fills up. Since LGWR has to write all the data into the redolog files and waits until all I/O is completed, a big log buffer may make LGWR aggressively write a big chunk of redo at once, and thus may introduce I/O waits, depending on the performance capacity of the underlying storage.
  • Even if our log file syncs are high while log file parallel writes are low, this may still be related to I/O performance, caused by I/O peaks which are not reflected in AWR averages. If that is the situation, we need to use OSWatcher to identify the I/O peaks.
  • If we see I/O peaks encountered at the same time as (or a little earlier than) the log file sync waits, then again we need to work with the sysadmins and storage admins and have them configure the I/O device to handle these kinds of peaks.
  • Lastly, we can use the following document to get traces from LGWR if there are log file related wait events: Document 601316.1 LGWR Is Generating Trace file with "Warning: Log Write Time 540ms, Size 5444kb" In 10.2.0.4 Database.
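For a quick look at the wait averages mentioned at the beginning of this post, the cumulative wait statistics can be queried as follows (instance-level averages since startup; AWR gives interval-based numbers):

select event, total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) avg_wait_ms
from v$system_event
where event in ('log file sync', 'log file parallel write');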

RDBMS -- Standby database analysis.

I was recently involved in a recovery assessment project led by EMC.
One of the assignments in this project was to gather standby-related information and analyze the Data Guard configuration. I had no connection to the databases, so I had to request the information via prepared queries, and here is the list of queries I used to gather the standby-related information.
Note that these queries need to be executed on the standby database.

generic information and the data protection mode.

Select database_role role, name, db_unique_name, platform_id, open_mode, log_mode, flashback_on, protection_mode, protection_level  FROM v$database;

Force logging enabled or not, dataguard broker used or not

Select force_logging, remote_archive, supplemental_log_data_pk, supplemental_log_data_ui, switchover_status, dataguard_broker FROM v$database;

Thread numbers and instance names

Select thread#, instance, status FROM v$thread;

Are the source db and standby db in the same oracle version or not

Select thread#, instance_name, host_name, version, archiver, log_switch_wait FROM gv$instance ORDER BY thread#;

Number and sizes of the redologs

Select thread#, group#, sequence#, bytes, archived ,status FROM v$log ORDER BY thread#, group#;

standby redolog information, and whether the size of the standby redologs equals the size of the online redologs

Select thread#, group#, sequence#, bytes, archived, status FROM v$standby_log order by thread#, group#;

archive log destinations

Select thread#, dest_id, destination, gvad.status, target, schedule, process, mountid mid FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;

Select thread#, dest_id, gvad.archiver, transmit_mode, affirm, async_blocks, net_timeout, delay_mins, reopen_secs reopen, register, binding FROM gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND destination is NOT NULL ORDER BY thread#, dest_id;

any errors that occurred the last time an archiving attempt to the destination was made

Select thread#, dest_id, gvad.status, error from gv$archive_dest gvad, gv$instance gvi WHERE gvad.inst_id = gvi.inst_id AND
destination is NOT NULL ORDER BY thread#, dest_id;

status of processes involved in the shipping redo on this system

Select thread#, process, pid, status, client_process, client_pid, sequence#, block#, active_agents, known_agents
from gv$managed_standby ORDER BY thread#, process;

the last sequence# received and the last sequence# applied on the standby database

Select al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied" FROM (select thread# thrd, MAX(sequence#) almax FROM v$archived_log WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al, (SELECT thread# thrd, MAX(sequence#) lhmax FROM v$log_history WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) lh WHERE al.thrd = lh.thrd;

how often and how far the apply lags

Select name, time, unit, count, TO_DATE(last_time_updated, 'MM/DD/YYYY HH24:MI:SS') FROM v$standby_event_histogram
ORDER BY unit DESC, time;

any archive gaps

Select * From v$archive_gap;

how much redo data generated by the primary database is not yet available on the standby database, and how much redo data could be lost if the primary database crashed at the time of the query

Select * from v$dataguard_stats WHERE name LIKE '%lag%';

Non default standby init parameters

SELECT num, '*' "THREAD#", name, value FROM v$PARAMETER WHERE NUM IN (SELECT num FROM v$parameter WHERE (isdefault = 'FALSE'
OR ismodified <> 'FALSE') AND name NOT LIKE 'nls%' MINUS
 SELECT num FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num FROM gv$parameter gvpa, gv$parameter gvpb
   WHERE gvpa.num = gvpb.num AND  gvpa.value <> gvpb.value AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE') AND gvpa.name
     NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id  AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') AND gvp.name
     NOT LIKE 'nls%')
    UNION
    SELECT num, TO_CHAR(thread#) "THREAD#", name, value FROM gv$parameter gvp, gv$instance gvi WHERE num IN (SELECT DISTINCT gvpa.num
     FROM gv$parameter gvpa, gv$parameter gvpb WHERE gvpa.num = gvpb.num AND gvpa.value <> gvpb.value AND (gvpa.isdefault = 'FALSE' OR gvpa.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%') AND gvi.inst_id = gvp.inst_id  AND (gvp.isdefault = 'FALSE' OR gvp.ismodified <> 'FALSE') AND gvp.name NOT LIKE 'nls%' ORDER BY 1, 2;