Monday, July 22, 2013

Standby Database: MRP restart script // WARN: ARCH: Terminating pid hung on an I/O operation

To work around hung RFS processes (probably caused by network problems), the following script can be used.
The known issues on Oracle Support do not help, because they state that this problem is fixed in 11.2.0.3.
I am writing this post because I hit the problem on 11.2.0.3 anyway.
I will briefly describe the problem first.
The script further down is a temporary workaround for cases where the usual or planned fix does not work (for example, if you encounter this error on 11.2.0.3).

STANDBY --> 
WARN: ARCH: Terminating pid 7993 hung on an I/O operation (7993 is the pid of RFS)
Note that ARCH kills the RFS processes again and again.

PROD --> 
ORA-03113: end-of-file on communication channel
WARN: ARC1: Terminating pid 8147 hung on an I/O operation..
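
To see how often this is happening, the warnings can be counted straight from the alert log. The following is only a minimal sketch: the function names are made up for illustration, and the alert log path is whatever your environment uses. It matches the same "Terminating pid ... hung on an I/O operation" text shown above.

```shell
# Hypothetical helper: count "hung on an I/O operation" terminations in an alert log.
# Usage: count_hung_kills /path/to/alert.log
count_hung_kills() {
    alert_log=$1
    # grep -c prints the number of matching lines; "|| true" hides the
    # non-zero exit status grep returns when the count is 0.
    grep -c 'Terminating pid [0-9][0-9]* hung on an I/O operation' "$alert_log" || true
}

# Hypothetical helper: list the distinct pids that were terminated, one per line.
list_hung_pids() {
    alert_log=$1
    sed -n 's/.*Terminating pid \([0-9][0-9]*\) hung on an I\/O operation.*/\1/p' "$alert_log" | sort -n -u
}
```

Running both on the standby and on the primary alert log gives a quick picture of whether the same pids are being killed repeatedly or new processes are hanging each time.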


RELATED BUGS:
-------------
1. Bug 12737862
- Recommendations there were:
a) Apply the patch for 12776906. This is a merge of 10222719 and BP7 and we already have that fix.
b) Set _redo_transport_max_stall_time=3600. Not sure if this is relevant.

2. Bug 11853815
- The problem here was that the RFS process kept getting killed.
- In our case it's ARCn processes that get killed.
- Need your advice if this bug is relevant.

3. Bug 13595410
- This is more like our issue. Logged recently and still under investigation.

# This ksh automates restarting the MRP process by periodically monitoring
# the V$ARCHIVE_GAP for redo log gaps. If any rows are returned (i.e. a
# gap has been detected) it stops the MRP process and starts a new one.
#
if [ $# -eq 0 ]; then
$0 300 /tmp
exit 0
elif [ $# -ne 2 ]; then
echo "Usage: $0 <monitor interval (s)> <log file directory>"
exit 1
fi
#
monInterval=$1
logDir=$2
#
while [ 1 ]
do
sqlplus -s "/ as sysdba" << eof >> ${logDir}/monGap.log
set echo off feedback off serveroutput on linesize 132 pages 20000
declare
v_isGap number;
begin
select count(*) into v_isGap from v\$archive_gap;
if ( v_isGap > 0 ) then -- We have a gap
dbms_output.put_line(to_char(sysdate,'HH24:MI:SS DD-MON-YYYY')||': Gap detected, restarting MRP');
execute immediate 'alter database recover managed standby database cancel';
execute immediate 'alter database recover managed standby database disconnect using current logfile';
end if;
end;
/
eof
sleep ${monInterval}
done
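
Since the script appends a "Gap detected, restarting MRP" line to monGap.log every time it restarts MRP, that log can be used to check how often the workaround fires. A minimal sketch, assuming the same log directory that was passed to the script (the function name is hypothetical):

```shell
# Hypothetical helper: count the MRP restarts recorded in <logDir>/monGap.log.
# Usage: mrp_restart_count /tmp
mrp_restart_count() {
    log_file=$1/monGap.log
    # No log file yet means the monitor has not recorded any restart.
    if [ ! -f "$log_file" ]; then
        echo 0
        return 0
    fi
    # Count the marker lines written by the monitoring script.
    grep -c 'Gap detected, restarting MRP' "$log_file" || true
}
```

If the count keeps growing quickly, the restarts are only hiding the symptom and the underlying transport problem still needs attention.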
