Wednesday, October 18, 2017

EBS 12.2 -- oacore server start problem -- java.util.zip.ZipException: error reading zip file

We encountered a strange problem in an EBS 12.2.6 environment built on Solaris 11 SPARC servers.
The problem started after the DBA restarted the application services.
It was directly related to oacore.
oacore_server1 and oacore_server2 could not be started. (It was a multi-node apps tier environment, built on a shared APPL_TOP.)

While all the other managed servers (like forms) and the Admin Server could be started without any problems, the oacore servers could not.

The oacore managed servers could not be started because of the following error:

java.util.zip.ZipException: error reading zip file
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
at weblogic.utils.io.DataIO.readFully(DataIO.java:351)
at weblogic.utils.io.DataIO.readFully(DataIO.java:328)
at weblogic.utils.classloaders.ZipSource.getBytes(ZipSource.java:76)
at weblogic.utils.classloaders.GenericClassLoader.defineClass(GenericClassLoader.java:330)
at weblogic.utils.classloaders.GenericClassLoader.findLocalClass(GenericClassLoader.java:302)
at weblogic.utils.classloaders.GenericClassLoader.findClass(GenericClassLoader.java:270)
at weblogic.utils.classloaders.ChangeAwareClassLoader.findClass(ChangeAwareClassLoader.java:64)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at weblogic.utils.classloaders.GenericClassLoader.loadClass(GenericClassLoader.java:179)
at weblogic.utils.classloaders.ChangeAwareClassLoader.loadClass(ChangeAwareClassLoader.java:43)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

This is an undocumented and interesting real-life case.

I will give the cause and the solution shortly, but first let's look at what we did to find the underlying cause of this problem.

It was obvious that, during the start of the oacore servers, the oacore application was deployed by EBS (on WebLogic).

The problem was in this deployment.

During the deployment, some archive files could not be read! (It could be a zip file, a jar file or a war file.)

The error stack told us this much, but it didn't give us the name of the problematic file.
  • So, we enabled debug on WLS. We enabled debug for the Deployer as well.
Enabling debug:
Environment > Servers > MyServer > Debug > weblogic
Then, enable the debug scope you need, e.g.: Deployment.
This change does not require a WebLogic Server restart.
Make sure the severity is set to Debug in the WebLogic console:
Environment > Servers > MyServer > Logging > Advanced > Minimum severity to log: Debug

Even after enabling the debug, the name of the problematic file could not be determined.
  • We executed ChkEBSDependencies to ensure that there was no dependency failure.
$FND_TOP/bin/txkrun.pl -script=ChkEBSDependencies -server=ALL_SERVER

This was successful, so dependencies were not the cause.
  • We executed truss to see which jar/zip file was having issues.
 truss -daefo /tmp/to_erman.log admanagedsrvctl.sh start oacore_server1

However, truss didn't give us the name. (Or the output file was too big to analyze properly.)
  • We modified the ulimits (especially the hard and soft limits for open files and processes). Again, not fixed.
  • We tried to create a new oacore server using "$AD_TOP/patch/115/bin/adProvisionEBS.pl ebs-create-managedserver" to work around the problem, in case it was related to a specific oacore server.
However, this command failed, because it tried to start the new server once it was created, and that new managed server (let's say oacore_server2) failed with the same zip error! So this was neither the solution nor a workaround.
  • We checked the AD and TXK patch levels, but they were already high.
SQL> select ABBREVIATION, NAME, codelevel FROM AD_TRACKABLE_ENTITIES where abbreviation in ('txk','ad');

ABBREVIATION NAME                                 CODELEVEL
ad           Applications DBA                     C.9
txk          Oracle Applications Technology Stack C.9
  • We did the following things as instructed by Oracle Support (although I found them unrelated to our zip issue):
1. Set the SITE level profile option "FND: Disable Inline Attachments" (FND_DISABLE_INLINE_ATTACHMENTS) to a value of "TRUE"
2. Restart the EBS middle tier services to ensure the profile option change is picked up
3. Monitor for any further recurrence of the issue

1. Set the s_jdbc_connect_descriptor_generation parameter to TRUE on the Target instance
2. Run autoconfig for the affected parameters to reference the Target instance
3. Re-test the issue

As I expected, these moves didn't solve the issue.
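
A side note on the truss attempt: a truss output file that is too big to read by eye can still be filtered mechanically. Here is a minimal sketch of that idea. It assumes that successful open calls and failed read calls appear in the log in the usual truss shape (shown in the comments) and it ignores process IDs, so file descriptor reuse across processes can mislead it. The function name and the sample lines are hypothetical, and whether the failing read shows up at all depends on how the JVM accesses the archive.

```python
import re

# Assumed truss line shapes (adjust the patterns to your own log):
#   1234:  12.1000 open("/path/file.zip", O_RDONLY)  = 87
#   1234:  12.3456 read(87, 0xFFFFFFFF7A000000, 8192)  Err#5 EIO
OPEN_OK = re.compile(r'\bopen(?:64|at)?\(.*?"(?P<path>[^"]+)".*\)\s+=\s+(?P<fd>\d+)')
READ_FAIL = re.compile(r'\bp?read(?:64)?\((?P<fd>\d+),.*\)\s+Err#\d+\s+(?P<err>\w+)')


def files_with_read_errors(lines, error_name="EIO"):
    """Map failed read() calls back to the path most recently opened on that fd."""
    fd_to_path = {}
    failures = []
    for line in lines:
        opened = OPEN_OK.search(line)
        if opened:
            # Remember which path this descriptor currently points to.
            fd_to_path[opened.group("fd")] = opened.group("path")
            continue
        failed = READ_FAIL.search(line)
        if failed and failed.group("err") == error_name:
            fd = failed.group("fd")
            failures.append(fd_to_path.get(fd, "<unknown fd %s>" % fd))
    return failures
```

To use it, open the truss output file (for example /tmp/to_erman.log) and pass the file object to files_with_read_errors; it iterates line by line, so the size of the log is not a problem.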

Okay.. Let's see how I found the problematic file and how I fixed the issue ->>

After trying the attempts above, I decided to regenerate the Jar files using adadmin.

I knew that those jar files were used by the oacore servers, but I wasn't expecting that zip files were read during the deployment/start of an oacore server, and I didn't expect that the same zip files would be read when we regenerate the jar files using adadmin.

So, I executed adadmin and tried to regenerate the jar files.
adadmin failed with an error, so I checked the adadmin.log file.

There they were! The I/O errors...

ERROR: I/O error while attempting to read /u01/app/fs1/EBSapps/comn/java/lib/DnBGlobalAccess.zip

ERROR: I/O or zip error while attempting to read entry oracle/dss/dataView/AdornmentLayout.class in zip file /u01/app/fs1/EBSapps/comn/java/lib/bipres.zip

ERROR: I/O or zip error while attempting to read entry oracle/apps/edr/security/server/EdrVpdRuleEOImpl.class in zip file /u01/app/fs1/EBSapps/comn/java/classes

So, the zip files and some class files in $JAVA_TOP could not be read due to I/O errors.

After seeing these errors, I directly jumped to the filesystem and tried to copy those problematic files using the cp command.

I/O errors, again! Solaris could not copy them due to I/O errors.

So, the files were corrupted at the OS/storage layer, that is, at the filesystem layer. (I sent this info to the OS team and requested a host and filesystem check from them.)
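
The cp test can be generalized: reading every archive under a directory from start to end will surface the same I/O errors. Below is a minimal sketch of such a scan. It is not something we ran at the time; the function name and the default suffix list are my assumptions, and you may want to add ".class" to the suffixes, since class files were affected in our case as well.

```python
import os


def find_unreadable_files(root, suffixes=(".zip", ".jar", ".war"), chunk_size=1024 * 1024):
    """Read every matching file under root end to end and collect those that raise OSError."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so that big archives do not fill the memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                # An I/O error from the OS ends up here, together with its message.
                bad.append((path, str(exc)))
    return bad
```

For example, find_unreadable_files(os.environ["JAVA_TOP"]) would return a list of (path, error message) pairs for the files that cannot be read.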

What I did for the fix was simple:

I renamed those files and copied them from the patch filesystem. (I checked the patch fs first; it held the same files as the run filesystem.)

The copy was successful, so the files in the patch fs were not corrupted.

After copying them from the patch fs, I executed adadmin again. (generate jar files)
This time, it completed successfully.

After that, I started the services using adstrtal.sh.

This time, oacore_server1 and oacore_server2 started successfully!

So, at the end of the day, I spent almost 6 hours solving this.
No sleep during the diagnostics work!

Unfortunately, the issue was undocumented and we found no way to see the problematic zip file other than running adadmin to regenerate the jar files.

Anyways, I hope you find this post useful.

Wednesday, October 11, 2017

Oracle Database Appliance / ODA X7-2 released!

Oracle released ODA X7-2, the new generation of the ODA machine. This new ODA has more CPU cores, more processing power and more disk capacity than its predecessor, the ODA X6-2.

What is more interesting than these improved system resources is that ODA X7-2 will support Standard Edition databases even in its HA model!

We will have S, M and HA models in ODA X7-2. So, there is no ODA Large (L) model in the ODA X7-2 family.
I think we will see these machines in several customer environments in the following days.
I'm already excited about it :)

You can read more on :

http://www.oracle.com/technetwork/database/database-appliance/overview/index.html
https://www.oracle.com/engineered-systems/database-appliance/x7-2m/index.html

ODA X6-2M -- virtualization with KVM -- a real life example and my first thoughts

Recently, I created a virtualized environment on an ODA X6-2M.
I used Kernel-based Virtual Machine (KVM) for virtualizing this new ODA Medium model, as instructed by Oracle.

The machine that I worked on looked like the following ->

[root@odax6 ~]# odacli describe-component
System Version  
---------------
12.1.2.11.0

Component                                Installed Version    Available Version
---------------------------------------- -------------------- --------------------
OAK                                      12.1.2.11.0          up-to-date
GI                                       12.1.0.2.170418      up-to-date
DB                                       11.2.0.4.170418      up-to-date
ILOM                                     3.2.7.26.a.r112632   3.2.9.23.r116695
BIOS                                     38050100             38070200
OS                                       6.8                  up-to-date


[root@odaX6 ~]# odacli describe-appliance

Appliance Information                                           
---------------------------------------------------------------- 
                     ID: xxxxxxxxxxxxxxxxxxxx
               Platform: OdaliteM
        Data Disk Count: 2
         CPU Core Count: 20

                Created: August 22, 2017 1:19:18 PM EET

The OS of this ODA machine was Oracle Linux 6.8.
I wanted to call it the new ODA, but ODA X7-2 has just been released :) It is hard to keep up with this ODA family :)

Anyways, ODA X6-2M is not configured with KVM out of the box.
So, I needed to enable KVM on this environment myself.
I must admit that it was pretty easy to enable KVM on this machine.

I just started libvirtd and installed virt-manager, which is the GUI for managing KVM.

[root@odax6 ~]# service libvirtd start
Starting libvirtd daemon: 
[root@odax6 ~]# service libvirtd status
libvirtd (pid  8943) is running...

[root@odax6 yum.repos.d]# wget http://yum.oracle.com/public-yum-ol6.repo
[root@odax6 yum.repos.d]# yum install virt-manager

That was it, the KVM enablement was done!

After this point, I continued with the storage pool and KVM network configurations.

In order to configure/create the storage pool; I first created an ACFS volume using asmca ->

[grid@odax6 asmca]$ asmca -silent -createVolume -volumeName kvm_repo1 -volumeDiskGroup DATA -volumeSizeGB 300 -sysAsmPassword welcome1
[grid@odax6 asmca]$ asmcmd volinfo -G DATA kvm_repo1 | grep -oE '/dev/asm/.*'
/dev/asm/kvm_repo1-33

Then, I created the ACFS filesystem on top of it and mounted it using a single command; --again using asmca silently ->

[grid@odax6 asmca]$ asmca -silent -createACFS -acfsVolumeDevice /dev/asm/kvm_repo1-33 -acfsMountPoint /kvm_repos/kvm_repo1

ASM Cluster File System created on /dev/asm/kvm_repo1-33 successfully. Run the generated ACFS registration script /u01/app/grid/cfgtoollogs/asmca/scripts/acfs_script.sh as privileged user to register the ACFS with Grid Infrastructure and to mount the ACFS. The ACFS registration script needs to be run only on this node: odax6.

-- I needed to run acfs_script.sh as root as a part of this ACFS creation.

[root@odax6 ~]# sh /u01/app/grid/cfgtoollogs/asmca/scripts/acfs_script.sh

ACFS file system /kvm_repos/kvm_repo1 is mounted on nodes odax6

Later on, I checked my mounts and saw that the new ACFS is there.

[root@odax6 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolRoot
                       30G   21G  7.7G  73% /
tmpfs                 126G  631M  126G   1% /dev/shm
/dev/sda1             477M   46M  406M  11% /boot
/dev/mapper/VolGroupSys-LogVolOpt
                       59G   11G   46G  18% /opt
/dev/mapper/VolGroupSys-LogVolU01
                       99G   21G   73G  23% /u01
/dev/asm/dattest-33   100G  1.7G   99G   2% /u02/app/oracle/oradata/test
/dev/asm/reco-481     149G  5.2G  144G   4% /u03/app/oracle
/dev/asm/commonstore-33
                      5.0G   49M  5.0G   1% /opt/oracle/dcs/commonstore
/dev/asm/kvm_repo1-33
                      300G  648M  300G   1% /kvm_repos/kvm_repo1

After the creation of the repo, I started the repo and also made it autostart.

[root@odax6 ~]# virsh pool-start kvm_repo1
Pool kvm_repo1 started
[root@odax6 ~]# virsh pool-autostart kvm_repo1
Pool kvm_repo1 marked as autostarted

Then, I checked it to see whether it was there and whether its size and everything else were configured properly.

Note that virsh is a command line management tool for KVM.

[root@odax6 ~]# virsh pool-info kvm_repo1
Name:           kvm_repo1
UUID:           97dda9d1-ca6b-c9e2-bbfc-901cc2274898
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       300.00 GiB
Allocation:     647.62 MiB
Available:      299.37 GiB
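
If you want to script this check instead of eyeballing it, the pool-info output is simple to parse. The following is a minimal sketch, assuming the "Key: value" layout shown above; the function names are hypothetical, and feeding it the output of the virsh command is left to you.

```python
def parse_pool_info(text):
    """Turn the 'Key:   value' lines printed by `virsh pool-info` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def pool_is_ready(info):
    """A pool is considered ready here when it is running and marked as autostart."""
    return info.get("State") == "running" and info.get("Autostart") == "yes"
```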

[root@odax6 ~]# virsh vol-list --pool kvm_repo1
Name                 Path                                    
-----------------------------------------
lost+found           /kvm_repos/kvm_repo1/lost+found  


At this point, my storage pool and volumes were configured properly.

I continued with the network stack.

I needed to arrange a network, a virtual interface for the virtual machines that would reside on this ODA environment. (The virtual machines, in my case, were EBS Application Tier nodes.)
I had 2 options. (Actually 3, if we count networking with MacVTap.)

Anyways, the first option was NAT forwarding. NAT forwarding could not be used in my case, because the network of the ODA X6-2M and the network of the virtual Application Tier nodes (that would reside on the ODA X6-2M) were the same. Their IPs were from the same block, so I had to use the other option, which was bridged networking ("shared physical device").

This method was actually full bridging, which lets the guests (EBS Application Tier nodes in my case) connect directly to the LAN.

In order to configure these network things, I used virt-manager.
However, virt-manager had some font problems, so I needed to fix them first.
Here is a little info, and the fix for it:


virt-manager is a management interface that eases the administration of a KVM environment (on ODA or anywhere else). It is called Virtual Machine Manager and it is started with the command virt-manager (as root).
As it is a GUI, it needs an X environment to run.
In the Oracle Linux world, as you may also agree, we mostly use vncserver for displaying X screens remotely.
So, we connect to the vncserver (or we can use the ILOM remote console or anything that does the same thing) and execute virt-manager to start the Virtual Machine Manager for KVM.
The issue starts here.
After deploying the ODA and enabling KVM, we run the virt-manager command and we see garbage characters.
We actually see little squares rather than characters.
So, in order to fix this, we basically need to install the fonts that Virtual Machine Manager needs.
A simple yum command can do this work and this little piece of information may save you time :)
Fix: yum install dejavu-lgc-sans-fonts

Well.. After the fix, I could use the virt-manager without any problems.

So, in order to configure the VM network, I did the following:

Opened virt-manager.

Connected to the KVM environment.

Created a bridge named br1 on btbond1 and activated it directly. (using the network interface tab)

-- I used this bridge for multiple machines. (I had 2 apps VMs on the ODA, so their virtual NICs were based on this bridge called br1.)

All done from GUI (virt-manager) and that was it..

My KVM network was configured.

The last thing to do was creating the virtual machines for my EBS Apps nodes and installing the operating systems (in my case, Linux) on them.

The virtual machine creation and OS installation were extremely straightforward.

Again , I used the virt-manager.

In order to create a virtual machine and configure it to be booted with OS installation media, I did the following ->

I clicked the "Create a new virtual machine" button to open the new vm wizard

Specified the installation type "Local install media (ISO image)" -- clicked next:)

Located the ISO image and configured the OS type and version (Linux, Red Hat 6 in my case; I had already downloaded the OS installation ISO and placed it on the ODA X6-2M earlier.) -- clicked next:)

Configured CPU and memory -- clicked next:)

Configured the VM's local disks and their sizes. (on the ACFS volume created previously) --clicked next:)

Lastly, selected the network device (br1, the bridge in my case) -- this time, clicked the Finish button:)

I did these things 2 times, because I had to have 2 apps virtual machines on the ODA.

After creating the virtual machines, I started them using virt-manager and they booted from the Oracle Linux 6 installation media. I used the console that comes with virt-manager to install the OS and then directly started using the Apps nodes without any problems. (after configuring the network and their IPs, of course)



At the end of the day, I got myself a virtualized ODA X6-2M.
This virtualization was a little different than the Oracle VM Server virtualization that we had in the earlier releases of ODA.

In ODA X6-2M, we use KVM ... So, there is no ODA_BASE, we just place our databases directly on ODA nodes and create our guest machines for the Apps Tier nodes.

In short, apps nodes are running on VMs on top of KVM, and databases are running directly on ODA nodes. (so they are running on Bare Metal) .. ("as instructed by Oracle")

We still have capacity on demand, both for databases and virtual machines.



This was an interesting piece of work for me. After all these years dealing with Oracle VM Server, I configured a new virtualized ODA with a different virtualization technology. Anyways, I liked it and found it to be a good and easy virtualization solution.

We will also see its performance in the next couple of days...

Monday, October 9, 2017

About my tech reviews

I like doing tech reviews and have already done a couple of them on ITCentralStation.com. (I reviewed Exadata and Oracle Linux a few months ago.)


Recently, ITCentralStation.com sent me my Top 5 contributors badge and it reminded me that it is time for another review :)

Currently, I'm considering reviewing ODA X6, ODA KVM or Oracle EBS 12.2, but we'll see.
Once it is ready, I will update you with the link...

Friday, September 29, 2017

FMW -- Starting/ Stopping a 2 Node Forms&Reports 12C Cluster with a single command. SCRIPTS.. Automated start/stop for High Available FMW environments

Hello everyone,

Today, I want to share 2 scripts that I have written for starting and stopping a 2 Node Forms&Reports 12C Cluster in one go. These scripts make our lives easier, as they provide an automated way of controlling FMW 12C cluster components.

Using these scripts, the admin can start all the services running across 2 nodes with a single command by connecting to a single node.

That is, an admin can connect to one node (the primary node) and start/stop all the Forms&Reports services (including OHS instances, managed servers, the admin server etc.) by running only one simple command.

In addition to that, I wrote these scripts by taking the dependencies between the components into account. That is, if a component depends on another component, the dependent component is started after the component that it depends on.

Likewise, if a component depends on another component, the dependent component is stopped before the component that it depends on.
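
The ordering rule itself can be sketched in a few lines. This is only an illustration of the idea, not how my scripts are written (they simply hardcode the order): the dependency map below is a hypothetical one built from the component names used in this post, and the sketch relies on graphlib, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

# Hypothetical dependency map: each component lists what must be up before it.
DEPENDS_ON = {
    "nodemanager1": [],
    "nodemanager2": [],
    "adminserver": [],
    "WLS_FORMS": ["adminserver"],
    "WLS_REPORTS": ["adminserver"],
    "WLS_FORMS1": ["adminserver"],
    "WLS_REPORTS1": ["adminserver"],
    "ohs1": ["nodemanager1"],
    "ohs2": ["nodemanager2"],
}


def start_order(depends_on):
    """Dependencies first: a component comes after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Dependents first: simply the start order reversed."""
    return list(reversed(start_order(depends_on)))
```

With a map like this, adding a new component only means adding one entry, and both the start and the stop order follow from it.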

Before running these scripts, we configure ssh equivalency between node1 and node2. The environment that I wrote these scripts in was Solaris and it was very easy to enable ssh equivalency between the FMW OS users. Anyways, the same method for enabling ssh equivalency works for Linux as well.

We enable ssh equivalency because the scripts connect from node1 to node2 using ssh.

Actually, one-way ssh equivalency is enough. (node1 should be able to connect to node2 using ssh without a password)

In addition to that, there is one other requirement. That is, we create a directory called /<your_mount_point>/startstop_scripts and put our script files there. (We create this directory on both node1 and node2.)

In order to start the services , we use the script called FRMRP_START.sh

For stopping the services, we use the script called FRMRP_STOP.sh

So, we only execute "sh FRMRP_START.sh" command to start all the services. (to start the services both on node1 and node2)

Similarly, we only execute " sh FRMRP_STOP.sh" command to stop all the services. (to stop the services both on node1 and node2)

Pretty handy, right? :) We just execute a script, wait a little, and our full WLS stack, including the highly available Forms and Reports services, is started/stopped. No need to remember the nohup commands, no need to create multiple ssh connections, no need to connect to the WebLogic console for starting the managed servers, no need to remember the command for starting the OHS instances and so on... No need to spend energy while starting/stopping multiple FMW components across multiple server nodes :)

In order to start all the services (across a 2 node - FMW Forms&Reports 12C cluster)
We connect to node 1 using FMW OS user
We cd to the directory where our scripts are located -> /u01/startstop_scripts
We execute FRMRP_START.sh

In order to stop all the services (across a 2 node - FMW Forms&Reports 12C cluster)
We connect to node 1 using FMW OS user
We cd to the directory where our scripts are located -> /u01/startstop_scripts
We execute FRMRP_STOP.sh

The code of the scripts is as follows:

-- Note that the directory paths used in these scripts should be modified according to your environment.

Alternatively, the scripts can be enhanced to make use of env variables or bash script variables rather than using direct directory paths. I was in the field and wrote these scripts there. I could actually have written them better, to take the direct path dependencies and the ssh equivalency requirements away, but still these scripts are okay and they are already tested & used in a production environment.

Also note that there are 3 Python scripts that you will see below. These Python scripts are executed internally by the FRMRP_START.sh and FRMRP_STOP.sh scripts. So these Python scripts should also be located in the script directory (/<your_mount_point>/startstop_scripts).

FRMRP_START.sh script:

#Set the domain env.

. /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/setDomainEnv.sh

# Starting NodeManager 1 on node1
echo Starting Node Manager 1
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startNodeManager.sh > /tmp/nohup_nodemanager.out 2>&1 &

# Starting NodeManager 2 on node2
echo Starting Node Manager 2
ssh <node2hostname> '/u01/FMWHOME/oracle_home/oracle_common/common/bin/wlst.sh /u01/startstop_scripts/startnodemgr2.py'



# Starting WebLogic Admin Server
echo Starting Admin Server
echo We just wait here for 60 secs
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startWebLogic.sh > /tmp/nohup_adminserver.out 2>&1 &
sleep 60

# Starting the managed servers on Node 1
echo Starting the managed servers on Node 1
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startManagedWebLogic.sh WLS_FORMS > /tmp/nohup_wlsforms.out 2>&1 &
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startManagedWebLogic.sh WLS_REPORTS > /tmp/nohup_wlsreports.out 2>&1 &

#Starting the managed servers on Node 2
echo Starting the managed servers on Node 2
ssh <node2hostname> 'nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startManagedWebLogic.sh WLS_FORMS1 > /tmp/nohup_wlsforms1.out 2>&1 &'
ssh <node2hostname> 'nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startManagedWebLogic.sh WLS_REPORTS1 > /tmp/nohup_wlsreports1.out 2>&1 &'

# Starting Web Tier OHS1
echo Starting Web Tier OHS1
/u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/startComponent.sh ohs1


# Starting Web Tier OHS2
echo Starting Web Tier OHS2
/u01/FMWHOME/oracle_home/oracle_common/common/bin/wlst.sh /u01/startstop_scripts/startohs2.py

echo Script completed.
echo The logs are under /tmp.. nohup_* files.
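
One possible refinement of the fixed 60 second wait for the Admin Server: poll its listen port and continue as soon as it accepts connections. Below is a minimal sketch of that idea in Python. It is not part of my scripts; the function name is hypothetical and the host and port of the Admin Server are something you need to supply (the port is not shown in this post). Keep in mind that an open port only shows that the server is listening, not that it has reached the RUNNING state.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or not reachable); wait and try again.
            time.sleep(interval)
    return False
```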

Note that the FRMRP_START.sh script needs 2 additional helper scripts in order to run successfully. See below ->

Helper Scripts for FRMRP_START.sh:

These scripts are written in Python, to be executed by WLST. They are for starting the Node Manager and OHS instances remotely. (for starting node2's Node Manager and OHS from node1)


startohs2.py script  (Located on node1)

nmConnect('nodemanager','xxxxx','node2.oracle.com','5556','base_domain','/u01/FMWHOME/oracle_home/user_projects/domains/base_domain','ssl');
nmStart(serverName='ohs2', serverType='OHS');
exit();


startnodemgr2.py script (Located on node2) 

startNodeManager(verbose='true',NodeManagerHome='/u01/FMWHOME/oracle_home/user_projects/domains/base_domain/nodemanager',ListenPort='5556',ListenAddress='xxxxx.node2.oracle.com')
exit()

FRMRP_STOP.sh script:

# Set the domain environment
. /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/setDomainEnv.sh

# Stopping Managed Servers on node1
echo Stopping Managed Servers on node1
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopManagedWebLogic.sh WLS_FORMS > /tmp/nohup_wlsforms.out 2>&1 &
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopManagedWebLogic.sh WLS_REPORTS > /tmp/nohup_wlsreports.out 2>&1 &

# Stopping Managed Servers on node2
echo Stopping Managed Servers on node2
ssh node2hostname 'nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopManagedWebLogic.sh WLS_FORMS1 > /tmp/nohup_wlsforms1.out 2>&1 &'
ssh node2hostname 'nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopManagedWebLogic.sh WLS_REPORTS1 > /tmp/nohup_wlsreports1.out 2>&1 &'

# Stopping Web Tier OHS1
echo Stopping Web Tier OHS1
/u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopComponent.sh ohs1

# Stopping Web Tier OHS2
echo Stopping Web Tier OHS2 using WLST in foreground.
/u01/FMWHOME/oracle_home/oracle_common/common/bin/wlst.sh /u01/startstop_scripts/stopohs2.py

# Stopping Node Manager 1 on node1
echo Stopping Node Manager 1 on node1
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopNodeManager.sh > /tmp/nohup_nodemanager.log 2>&1

# Stopping Node Manager 2 on node2
echo Stopping Node Manager 2 on node2
ssh node2hostname '/u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopNodeManager.sh'

# Stopping Weblogic Admin Server
echo Stopping Weblogic Admin Server
nohup /u01/FMWHOME/oracle_home/user_projects/domains/base_domain/bin/stopWebLogic.sh > /tmp/nohup_adminserver.out 2>&1

echo Script completed.
echo Check /tmp for the script logs.. nohup_* files.

Helper Script for FRMRP_STOP.sh:

This script is written in Python, to be executed by WLST. It is for stopping the OHS instance remotely. (for stopping node2's OHS instance from node1)

stopohs2.py script

nmConnect('nodemanager','xxxx','forms02.oracle.com','5556','base_domain','/u01/FMWHOME/oracle_home/user_projects/domains/base_domain','ssl');
nmKill(serverName='ohs2', serverType='OHS');
exit();

Thursday, September 28, 2017

EBS 11i -- Could not initialize class oracle.apps.fnd.common.Guest / a different kind of a problem and an easy fix.

Nowadays, I'm dealing with an EBS 11i-EXADATA migration. I have solved some pretty interesting issues along the way and wanted to share one of them with you.
It was encountered while we were migrating the TEST environment to Exadata.

The issue started at the point where the DBA applied 11i.ATG_PF.H.RUP7 (as a prerequisite for the migration).

Not only the Jserv logs (in EBS 11i, we have Jserv) and oacore logs, but even the login page was complaining about the Guest class.

The error that we saw was "Could not initialize class oracle.apps.fnd.common.Guest" and no matter what we did, we could not fix it. (The error was documented in MOS, but the solution documented there didn't fix the problem.)

So this issue was a little different, and that made me jump into the code and analyze the Guest class.

The error text made me think that there could be a classpath problem or a class permission problem, but the actual problem was surprisingly weird :)

I saw that, Guest class was written to execute fnd_web_sec.get_guest_username_pwd (it was enclosed with a begin-end).

So I checked the database and saw that the fnd_web_sec package had no function named get_guest_username_pwd.

The get_guest_username_pwd function seemed to be delivered with 11i.ATG_PF.H.RUP7 (or one of the other patches along the way), and I concluded that there was a synchronization problem between the apps code and the db code.

The apps code was expecting get_guest_username_pwd, but the db code had no function named get_guest_username_pwd.

At this point, I concluded that this was a db level problem, and I also concluded that the "could not initialize class" and "java.lang.NoClassDefFoundError" errors were misleading. (They were the results, not the cause.)
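
This kind of check (does the package in the database really have the subprogram that the Java code calls?) can be scripted. Below is a minimal sketch, assuming a Python DB-API cursor from an Oracle driver that accepts named bind variables, and assuming the package is owned by the APPS schema. The function name is hypothetical; the query uses the ALL_PROCEDURES dictionary view, where OBJECT_NAME holds the package name and PROCEDURE_NAME holds the subprogram name.

```python
CHECK_SQL = (
    "SELECT COUNT(*) FROM all_procedures "
    "WHERE owner = :owner AND object_name = :pkg AND procedure_name = :proc"
)


def package_has_subprogram(cursor, owner, package, subprogram):
    """Return True if the package exposes a procedure or function with the given name."""
    # The data dictionary stores unquoted identifiers in upper case.
    binds = {"owner": owner.upper(), "pkg": package.upper(), "proc": subprogram.upper()}
    cursor.execute(CHECK_SQL, binds)
    (count,) = cursor.fetchone()
    return count > 0
```

In our case, package_has_subprogram(cursor, "apps", "fnd_web_sec", "get_guest_username_pwd") would be expected to return False on the problematic environment and True on a healthy RUP7 environment.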

When I investigated the issue further by asking the DBA, I found out that, after the patch application, they had recreated this fnd_web_sec package with its former code.
They said "we did it, because we had another custom plsql which was dependent on fnd_web_sec and that custom plsql could not work with the new version of the fnd_web_sec."

At this point, I recreated fnd_web_sec by taking its code from another RUP7 environment, and the missing function was there. I also told them not to modify standard code.

I told them to modify their custom code to be aligned with the changes in the standard code.

At the end of the day, we dealt with a basic problem, but its cause could not be found easily. (a hard-to-solve basic problem, isn't it :)

The lessons learned for the customer and that DBA were:
  • Never touch the standard code.
  • Analyze patches before applying them, and test your customizations if you suspect that they can be affected.
  • Document your customizations and check them after applying any patches.
  • Modify your custom code when standard code that it depends on changes.

Wednesday, September 27, 2017

EBS R12 -- XML publisher -- java.lang.OutOfMemoryError, the definitions of recommended properties

For big reports, Oracle recommends setting the following properties for XML Publisher.
This applies especially when you encounter java.lang.OutOfMemoryError. (Usually the OPP gets it.)

Set the following properties from XML Publisher Administration:

Responsibility=>Administration UI 

General => Temporary directory => /tmp
This could be any directory with full read and write access 

FO Processing=> 
Use XML Publisher's XSLT processor =>true 
Enable scalable feature of XSLT processor=> true 
Enable XSLT runtime optimization=>true
The above properties can be set in "xdo.cfg" as well.

<property name="xslt-xdoparser">True</property>
<property name="xslt-scalable">True</property>

<property name="xslt-runtime-optimization">True</property>

Some of my followers asked about their definitions, and here they are:
  • Enable XSLT runtime optimization: When set to "true", the overall performance of the FO processor is increased and the size of the temporary FO files generated in the temp directory is significantly decreased. 
  • Use XML Publisher's XSLT processor: Controls XML Publisher's parser usage. If set to False, XSLT will not be parsed.
  • Enable scalable feature of XSLT processor: Controls the scalable feature of the XDO parser. The property "Use XML Publisher's XSLT processor" must be set to "true" for this property to be effective.

Tuesday, September 26, 2017

EBS 11i - compiling jsps, just a little info -> not a valid class_dir directory

We know that we can compile jsps in EBS 11i manually. (by using perl -x $JTF_TOP/admin/scripts/ojspCompile.pl --compile --quiet)

We also know that, in EBS 11i, we can clear the jsp cache by deleting the _pages directory located in $COMMON_TOP.

However, there is a small but important thing that we need to know when planning to take these 2 actions.

That is, you can't just clear the jsp cache and then directly compile the jsps.

This is because ojspCompile.pl wants the $COMMON_TOP/_pages/_oa__html directory to be present, as it is designed to use this directory as its class_dir.

So, if we clear the jsp cache (by running rm -fR $COMMON_TOP/_pages) and then run ojspCompile.pl immediately, we end up with the following:

identifying apache_top.../TEST/testora/iAS
identifying apache_config_top.../TEST/testora/iAS
identifying java_home.../usr/java/jdk1.6.0_23
identifying jsp_dir.../TEST/testcomn/html
identifying pages_dir.../TEST/testcomn
identifying classpath...file:///TEST/testora/iAS/Apache/Jserv/etc/jserv.properties
"not a valid class_dir directory: (/TEST/testcomn/_pages/_oa__html)"


Well, as seen above, we need to have the jsp cache in place to run ojspCompile.pl.

In order to get our jsp cache back, we start apache and then reach the login page using our browser (reaching it once is enough).

After that, we see that our $COMMON_TOP/_pages/_oa__html directory is created. At this point, we can run ojspCompile.pl without any errors.
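
The check can also be scripted, so that the compile is never started against a missing class_dir. Here is a minimal sketch; the function names jsp_class_dir_ready and compile_jsps_safely are made up for illustration, and the compile command itself is the one given at the top of this post.

```shell
#!/bin/sh
# Sketch only: run the jsp compile only when the class_dir already exists.
# Both function names are hypothetical helpers, not part of EBS.

jsp_class_dir_ready() {
    # $1 is the value of COMMON_TOP
    [ -d "$1/_pages/_oa__html" ]
}

compile_jsps_safely() {
    common_top="$1"
    jtf_top="$2"
    if ! jsp_class_dir_ready "$common_top"; then
        echo "class_dir $common_top/_pages/_oa__html is missing; start apache, open the login page once, then retry" >&2
        return 1
    fi
    perl -x "$jtf_top/admin/scripts/ojspCompile.pl" --compile --quiet
}
```

It would be called as compile_jsps_safely "$COMMON_TOP" "$JTF_TOP". When the directory is not there yet, it returns 1 with a reminder instead of letting the compile fail with the class_dir error.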

This was the tip of the day. I hope you will find it useful.

Wednesday, September 20, 2017

Problem installing Oracle FMW 12 - Error - CFGFWK-64254, ONS related error, oracle.jdbc.fanEnabled=false

Today, I was building a 2-node Forms & Reports 12.2.1.3 cluster on Solaris 11.3 SPARC 64-bit, and during the config.sh run, I encountered the error CFGFWK-64254 during the execution of the "OPSS Processing" phase.
The underlying error was "java.lang.IllegalArgumentException: ONS configuration failed".
It was clearly related to RDBMS ONS (Oracle Notification Service), but the database environment where I created the RCU schemas (the forms and reports schemas) was a single node db environment, and it was not configured with ONS.
So the error was unexpected, and it was probably a bug. It was not documented, and that motivated me to find the fix.
However, the installer of Forms 12.2.1.3 (or let's say FMW) wanted to use ONS, and it insisted on it.
In the earlier config.sh screens, I actually did find a workaround for it. Those screens had textboxes for supplying java arguments, so I could pass the FAN related argument there (oracle.jdbc.fanEnabled=false).

However, once you fill in all the config.sh installation forms and press the "Create" button, you cannot use this workaround, as there is nowhere to supply this java argument, and you end up with these ONS related errors.

The workaround for this (in my opinion, it is a fix; it is a patch) is to supply this argument in config_internal.sh (config.sh indirectly executes config_internal.sh).

What I did was to modify config_internal.sh to include -Doracle.jdbc.fanEnabled=false.
Of course, I wrote it in the right place/line in that script, so that java actually uses it.
This fixed the problem.
Tested and verified. :)
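
The exact line depends on the contents of config_internal.sh in your installation, so the following is only a generic sketch of the idea: the flag has to end up among the JVM options that are passed to java, before the main class. The helper name add_fan_flag and the JVM_ARGS variable mentioned in the note below are assumptions for illustration; check which variable your own config_internal.sh really hands to java.

```shell
#!/bin/sh
# Sketch only: append the FAN flag to a string of JVM options, once.
# add_fan_flag is a hypothetical helper; it is not part of config_internal.sh.

FAN_FLAG="-Doracle.jdbc.fanEnabled=false"

add_fan_flag() {
    args="$1"
    case " $args " in
        *" $FAN_FLAG "*)
            # Flag is already among the options; leave them untouched.
            echo "$args"
            ;;
        *)
            if [ -z "$args" ]; then
                echo "$FAN_FLAG"
            else
                echo "$args $FAN_FLAG"
            fi
            ;;
    esac
}
```

Inside a script, this would be used like JVM_ARGS=$(add_fan_flag "$JVM_ARGS"), where JVM_ARGS stands for whatever variable carries the java options in your copy of the script.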

Monday, September 18, 2017

EBS 12.2 -- NFS-based Shared Application Filesystem -- What to do when the first node is down?

I already wrote a blog post about an important point to be considered when building a Shared Application Filesystem using NFS. (http://ermanarslan.blogspot.com.tr/2017/08/ebs-122-important-point-to-be.html)
This point should be considered especially when we export the NFS shares from the first apps node and mount them from the second node (as instructed in Sharing The Application Tier File System in Oracle E-Business Suite Release 12.2 (Doc ID 1375769.1)).

That is, in such a multi node shared application filesystem configuration, when our 1st node, where the NFS shares are hosted, is down, our EBS apps tier services go down.
This is expected behaviour. It is caused by the first node being a single point of failure. So, if it goes down, the NFS shares go with it.

However, we should be able to start our EBS apps tier services on the surviving nodes, right?
This is an important thing, because the problem in the first node may not be resolved quickly.

Well, here are the things that we should do to start the EBS apps tier services on the second apps node in such a scenario:

Note : these steps are for NFS-based shared application filesystem.

1) Map the apps luns to the second (surviving) node: This is a storage and OS tier operation. The luns on which the apps filesystem resides should be mapped to and mounted on the second node.

2) Update the second node's apps tier context file and run autoconfig on the second apps node.
Three context value updates are necessary: s_webentryhost, s_login_page and s_external_url. This is because these context file attributes are set to appstier1 by default. However, if we have already implemented the Load Balancer configuration, then these updates are already done and there is no need to do anything in this step.

s_webentryhost  : appstier2
s_login_page : http://appstier2.company.com:8050/OA_HTML/AppsLogin on Application Server 2
s_external_url : http://appstier2.company.com:8050/

Note: modify the above apps node name (appstier2) according to your second apps node's hostname.

3) Start the apps tier services using adstrtal.sh, but with the msimode argument.
($ADMIN_SCRIPTS_HOME/adstrtal.sh -msimode)

msi means managed server independence. As the first node is down, our Admin Server is down, so the managed servers (like oacore) cannot be started on the second node without the msimode argument.
Without msimode, the managed servers try to reach the Admin Server to read their configuration, and they fail. In that case, we see errors like "ERROR: Skipping startup of forms_server2 since the AdminServer is down" while executing adstrtal.sh.
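
The context values from step 2 can be double-checked before the services are started. Below is a minimal sketch, assuming that the context file keeps each value as the text of an XML element carrying an oa_var attribute, with the whole element on a single line. The helper names ctx_value and ctx_vars_not_pointing_to are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read-only checks against an apps tier context file.
# Both helpers are hypothetical; they do not modify the context file.

ctx_value() {
    # Print the text of the element whose oa_var attribute equals $2.
    ctx_file="$1"
    var_name="$2"
    sed -n "s/.*oa_var=\"$var_name\"[^>]*>\([^<]*\)<.*/\1/p" "$ctx_file"
}

ctx_vars_not_pointing_to() {
    # Print "variable=value" for each of the three variables whose
    # value does not contain the expected host name ($2).
    ctx_file="$1"
    host="$2"
    for var in s_webentryhost s_login_page s_external_url; do
        value=$(ctx_value "$ctx_file" "$var")
        case "$value" in
            *"$host"*) ;;                  # value mentions the expected host
            *) echo "$var=$value" ;;
        esac
    done
}
```

For example, ctx_vars_not_pointing_to "$CONTEXT_FILE" appstier2 prints nothing when all three values already point to appstier2. Any variable that it prints still needs the update described in step 2, followed by autoconfig.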

Here is the definition of msi mode (from Oracle):
When a Managed Server starts, it tries to contact the Administration Server to retrieve its configuration information. If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading configuration and security files directly. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode.

Well, as you see, in an NFS-based shared application filesystem environment, there are at most 3 things to do to start the apps tier services on the second node (supposing the first one has crashed or is down).

I tested this approach and it took me 15 minutes to complete. Of course, it depends on the storage mapping and a bunch of other factors, but it is certain that there is downtime.

That's why I recommend using a non-shared APPL_TOP, or a shared APPL_TOP on an ACFS filesystem, or a shared APPL_TOP on NFS shares that come directly from the storage :)