Monday, September 18, 2017

EBS 12.2 -- NFS-based Shared Application Filesystem -- What to do when the first node is down?

I already wrote a blog post about an important point to consider when building a Shared Application Filesystem using NFS (http://ermanarslan.blogspot.com.tr/2017/08/ebs-122-important-point-to-be.html).
This point matters especially when we export the NFS shares from the first apps node and mount them on the second node, as instructed in Sharing The Application Tier File System in Oracle E-Business Suite Release 12.2 (Doc ID 1375769.1).

That is, in such a multi-node shared application filesystem configuration, when our first node, where the NFS shares are hosted, is down, our EBS apps tier services go down.
This is expected behaviour. The first node is a single point of failure, so if it goes down, the NFS shares go with it.

However, we should be able to start our EBS apps tier services on the surviving nodes, right?
This is important, because the problem on the first node may not be resolved quickly.

Well, here are the things we should do to start the EBS apps tier services on the second apps node in such a scenario:

Note: these steps are for an NFS-based shared application filesystem.

1) Map the apps LUNs to the second (surviving) node: This is a storage and OS tier operation. The LUNs on which the apps filesystem resides should be mapped to and mounted on the second node.
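Once the storage side is done, it is worth confirming that the apps filesystem is really mounted on the second node before going any further. Here is a minimal sketch, assuming a Linux host. The helper name is_mounted and the mount point /u01/app/ebs are hypothetical placeholders, and the optional second argument is only there so the check can be pointed at a different mount table:

```shell
# is_mounted PATH [MOUNT_TABLE]
# Succeeds if PATH is listed as a mount point in the mount table.
# The table defaults to /proc/mounts (Linux); the second column of
# each line is the mount point.
is_mounted() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example usage (hypothetical mount point, replace with your own):
# if is_mounted /u01/app/ebs; then
#     echo "apps filesystem is mounted"
# else
#     echo "apps filesystem is NOT mounted, stop here"
# fi
```

The path must be given exactly as it appears in the mount table (no trailing slash).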

2) Update the second node's apps tier context file and run autoconfig on the second apps node.
Three context value updates are necessary: s_webentryhost, s_login_page and s_external_url. This is because these context file attributes are set to appstier1 by default. However, if we have already implemented the Load Balancer configuration, then these updates are already done and there is nothing to do in this step.

s_webentryhost  : appstier2
s_login_page : http://appstier2.company.com:8050/OA_HTML/AppsLogin (on Application Server 2)
s_external_url : http://appstier2.company.com:8050/

Note: modify the apps node name above (appstier2) according to your second apps node's hostname.
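The context file edits in step 2 can also be scripted. The sketch below is only an illustration, not an Oracle-provided tool: set_ctx_var is a hypothetical helper, and it assumes that each variable sits on a single line of the context file in the form `<tag oa_var="NAME">value</tag>`. It keeps a .bak copy of the file it edits.

```shell
# set_ctx_var FILE OA_VAR VALUE
# Replaces the text content of the element carrying oa_var="OA_VAR".
# Assumes the element opens and closes on one line, and that VALUE
# contains no '|' or '&' characters (they are special to sed here).
set_ctx_var() {
    sed -i.bak "s|\(oa_var=\"$2\"[^>]*>\)[^<]*<|\1$3<|" "$1"
}

# Example usage on the second node, with the values from this post:
# set_ctx_var "$CONTEXT_FILE" s_webentryhost appstier2
# set_ctx_var "$CONTEXT_FILE" s_login_page http://appstier2.company.com:8050/OA_HTML/AppsLogin
# set_ctx_var "$CONTEXT_FILE" s_external_url http://appstier2.company.com:8050/
# "$ADMIN_SCRIPTS_HOME/adautocfg.sh"
```

Check the edited values in the context file before running autoconfig, since autoconfig propagates whatever is in there.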

3) Start the apps tier services using adstrtal.sh, but with the msimode argument.
($ADMIN_SCRIPTS_HOME/adstrtal.sh -msimode)

msi means Managed Server Independence. As the first node is down, our Admin Server is down, so the managed servers (like oacore) cannot be started on the second node unless we use the msimode argument.
Without msimode, the managed servers try to reach the Admin Server to read their configuration, and they fail. In that case, we see errors like "ERROR: Skipping startup of forms_server2 since the AdminServer is down" while executing adstrtal.sh.

Here is the definition of MSI mode (from Oracle):
When a Managed Server starts, it tries to contact the Administration Server to retrieve its configuration information. If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading configuration and security files directly. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode.
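The decision in step 3 can be sketched as a small wrapper: probe the Admin Server, and add -msimode only when it cannot be reached. This is just a sketch under assumptions: admin_server_up and start_args are hypothetical helper names, and the Admin Server host and port in the usage example (appstier1.company.com, 7001) are placeholders, so use the values from your own environment.

```shell
# admin_server_up HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
admin_server_up() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# start_args STATE
# Prints the extra adstrtal.sh argument: -msimode when STATE is "down",
# nothing otherwise.
start_args() {
    if [ "$1" = "down" ]; then
        echo "-msimode"
    fi
}

# Example usage (host and port are assumptions):
# if admin_server_up appstier1.company.com 7001; then state=up; else state=down; fi
# "$ADMIN_SCRIPTS_HOME/adstrtal.sh" $(start_args "$state")
```

An open TCP port only shows that something is listening there. It does not prove that the Admin Server is healthy, so treat the probe as a first check.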

Well, as you see, in an NFS-based shared application filesystem environment, there are at most 3 things to do to start the apps tier services on the second node (supposing the first one has crashed or is down).

I tested this approach and it took me 15 minutes to complete. Of course, it depends on the storage mapping and a bunch of other factors, but it is certain that there is downtime.

That's why I recommend using a non-shared APPL_TOP, a shared APPL_TOP on an ACFS filesystem, or a shared APPL_TOP with NFS shares that come directly from the storage :)

4 comments :

  1. Dear Erman,

    Can I know the expected behavior in a shared APPL_TOP with a load balancer? If the primary goes down, will the application also go down?

  2. Yes. If you follow MOS document 1375769.1, then your NFS shares will be hosted by the primary apps node. If the primary apps node is down, then the EBS services are down. In such a case, you need to take some manual actions to make the services start again on the second apps node.

    However, if you build your own environment using a cluster filesystem, or using NFS shares which come directly from the storage or from a fault-tolerant NFS server, then this is another case. In this case, if you have a load balancer setup, your apps services will still be alive even if your primary apps node goes down.

  3. Again, if you follow note 1375769.1 and build a shared application filesystem based on what is written there, then if the primary apps node is down, the EBS services are down. If that's the case, you can use the things explained in the blog post above to make the apps services start on the secondary node.
