We just finished installing ODA Patch Bundle 220.127.116.11.0 on one of our ODAs... There goes one of the key features of the ODA: simple one-button patch installation. As a DBA (and I think this goes for all administrators), I am always very reserved when these kinds of statements are made.
First of all, applying Patch Bundle 18.104.22.168.0 (and Patch Bundle 22.214.171.124.1, but that one is very small and doesn't require downtime) took 3.5 hours for one ODA. That is a pretty long maintenance window for the kind of patches in this bundle (ILOM/BIOS and Grid Infrastructure patches).
Unfortunately, we ran into several problems when installing Patch Bundle 126.96.36.199.0 on our ODAs. This post describes what went wrong:
- ILOM/BIOS firmware update failed (cause: a firewall between the public network interface and the ILOM interface)
- Cluster Ready Services (CRS) got into an undefined state, running but not knowing it was running (cause: a reduced number of enabled CPU cores, set with a Core Configuration Key)
- GI (Grid Infrastructure) patch failed (cause: a running Oracle Enterprise Manager Grid Control agent)
ILOM/BIOS firmware update failed
The ILOM/BIOS firmware is updated over the LAN (using ipmitool), which means you must be able to connect from the public network interface of each SC to its ILOM interface. If there is a firewall in between (as in our case), you need a firewall rule that allows UDP traffic from the public network IP address of the SC to port 623 on the ILOM interface of that same SC (your ILOM IP address for that SC). Create this rule for both SCs.
If this connection is not possible, patching the ILOM/BIOS (as part of Patch Bundle 188.8.131.52.0) will fail. What makes it worse is the very long timeout (an hour or so) before the oakcli patch process decides that it isn't going to work.
(If you cannot get the firewall issue fixed, MOS note 888888.1 describes a workaround, although you still have to sit through the long timeout.)
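To verify the firewall rule on its own, before any ILOM credentials are involved, one option is to send an RMCP/ASF "presence ping" to UDP port 623 and wait for the "presence pong". The Python sketch below assumes that the ILOM answers presence pings. Most IPMI-over-LAN controllers do, but we have not verified this on the ODA ILOM. The function names are our own. Run it on each SC, and keep in mind that the packet only leaves through the public interface if that is how the SC routes to its ILOM address.

```python
import socket
import struct

RMCP_PORT = 623        # UDP port used by IPMI over LAN (RMCP/RMCP+)
ASF_IANA = 4542        # ASF IANA enterprise number, 0x000011BE
PRESENCE_PING = 0x80   # ASF message type: presence ping
PRESENCE_PONG = 0x40   # ASF message type: presence pong


def build_presence_ping(tag: int) -> bytes:
    """Build a 12-byte RMCP ASF presence ping with the given message tag."""
    # RMCP header: version 0x06, reserved, sequence 0xFF (no ACK), class 0x06 (ASF)
    rmcp = bytes([0x06, 0x00, 0xFF, 0x06])
    # ASF header: IANA number, message type, message tag, reserved, data length 0
    asf = struct.pack(">IBBBB", ASF_IANA, PRESENCE_PING, tag & 0xFF, 0x00, 0x00)
    return rmcp + asf


def is_presence_pong(data: bytes, tag: int) -> bool:
    """Check that a reply is an ASF presence pong answering our tag."""
    if len(data) < 12 or data[0] != 0x06 or data[3] != 0x06:
        return False
    iana, msg_type, msg_tag = struct.unpack(">IBB", data[4:10])
    return iana == ASF_IANA and msg_type == PRESENCE_PONG and msg_tag == (tag & 0xFF)


def ilom_udp623_reachable(ilom_host: str, timeout_s: float = 3.0, tag: int = 0x42) -> bool:
    """Send one presence ping to the ILOM and wait up to timeout_s for the pong."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(build_presence_ping(tag), (ilom_host, RMCP_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts, refused ports and name-resolution errors
            return False
    return is_presence_pong(data, tag)
```

A pong proves that UDP 623 gets through in both directions. No answer within a few seconds is a strong hint that the firewall rule is missing, and you learn this without waiting for the oakcli timeout.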
You can prevent the first problem by checking beforehand whether you can connect to the ILOM interface from an ODA SC, using the following sample command:
ipmitool -I lanplus -H <ILOM IP address> -U root chassis status
You have to supply the ILOM root password. If the command returns the chassis status, the network connection is OK.
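To run the same ipmitool check from a script on both SCs, with a hard time limit so a blocked connection cannot hang it, a minimal Python sketch could look like this. It relies on ipmitool's `-E` option, which reads the password from the `IPMI_PASSWORD` environment variable. It also assumes the chassis status output contains a "System Power : ..." line. The function names and the 30-second default are our own choices.

```python
import os
import subprocess


def parse_chassis_status(output: str) -> dict:
    """Turn the 'Key : value' lines of `ipmitool chassis status` into a dict."""
    status = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def ilom_reachable(ilom_host: str, password: str, timeout_s: int = 30) -> bool:
    """Run the chassis status check, giving up after timeout_s seconds."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", ilom_host, "-U", "root",
           "-E", "chassis", "status"]
    # -E makes ipmitool read the password from IPMI_PASSWORD, so it stays
    # off the command line (and out of `ps` output).
    env = dict(os.environ, IPMI_PASSWORD=password)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s, env=env)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ipmitool not installed, or no answer within timeout_s
    return result.returncode == 0 and "System Power" in parse_chassis_status(result.stdout)


if __name__ == "__main__":
    import getpass
    import sys

    host = sys.argv[1]  # the ILOM IP address belonging to this SC
    ok = ilom_reachable(host, getpass.getpass("ILOM root password: "))
    print("ILOM reachable" if ok else "ILOM NOT reachable, check the firewall rule")
```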
Cluster Ready Services (CRS) got into an undefined state, running but not knowing it was running
If you have limited the number of enabled CPU cores on your ODA (using a generated “Core Configuration Key”), this patch will reboot each SC twice. The first reboot is part of the patch procedure. The second one is initiated by the oakd process a couple of minutes after the first. The reason oakd reboots your system is that, after the BIOS has been patched (flashed), oakd finds that the number of enabled CPU cores on the SC (node) doesn't match the allowed number of cores (as specified by the “Core Configuration Key”). It then updates the BIOS with the correct number of enabled CPU cores and reboots the SC again.

Besides the extra 5 to 10 minutes this second restart takes (twice, because the same thing happens on the second SC), the real problem is timing. After the first reboot, oakd starts and does a lot of initialization before it checks the number of enabled CPU cores. Meanwhile Cluster Ready Services (and all of its resources) is starting, and somewhere during the CRS startup oakd updates the enabled CPU cores and reboots the SC.
Unfortunately, CRS didn't like the timing of this restart. After the second reboot it failed to start properly: it ended up in an undefined state where it was running but didn't know it. It even crashed all of our databases on the second SC.
You cannot predict whether you will run into this problem, because it depends on the moment during the CRS startup at which oakd reboots the system. For example, CRS only failed to start after the second reboot on our first node (SC0). The problem didn't occur on the second node (SC1). If you haven't limited the number of enabled CPU cores (with a “Core Configuration Key”), you won't run into this problem at all.
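Because you cannot predict this, it is worth checking each SC yourself after it has come back up. The Python sketch below is hypothetical and is not something Oracle ships. It does two things. First, it counts the enabled physical cores in `/proc/cpuinfo`. If that count still differs from what your key allows, expect oakd to reboot the SC again. Second, once the count matches, it polls `crsctl check crs` until the daemons report as online. We assume the 11.2 output format, with one "CRS-nnnn: <daemon> is online" line per healthy daemon. `EXPECTED_CORES` and the crsctl path are placeholders for your own values. The script only reports what it sees and does not repair anything.

```python
import subprocess
import time

# Daemons that `crsctl check crs` reports on (assumed 11.2 output format).
EXPECTED_DAEMONS = (
    "Oracle High Availability Services",
    "Cluster Ready Services",
    "Cluster Synchronization Services",
    "Event Manager",
)


def count_physical_cores(cpuinfo_text: str) -> int:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text."""
    cores = set()
    physical_id = core_id = None
    # The extra empty line flushes the last processor block.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        if key.strip() == "physical id":
            physical_id = value.strip()
        elif key.strip() == "core id":
            core_id = value.strip()
    return len(cores)


def crs_healthy(crsctl_output: str) -> bool:
    """True only if every expected daemon is reported as online."""
    online = set()
    for line in crsctl_output.splitlines():
        _, _, message = line.strip().partition(": ")
        if message.endswith(" is online"):
            online.add(message[: -len(" is online")])
    return all(name in online for name in EXPECTED_DAEMONS)


def wait_for_crs(crsctl_path: str, timeout_s: int = 1800, interval_s: int = 60) -> bool:
    """Poll `crsctl check crs` until all daemons are online or timeout_s expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            result = subprocess.run([crsctl_path, "check", "crs"],
                                    capture_output=True, text=True, timeout=120)
            output = result.stdout
        except (OSError, subprocess.TimeoutExpired):
            output = ""
        if crs_healthy(output):
            return True
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    EXPECTED_CORES = 4                            # hypothetical: cores per SC allowed by your key
    CRSCTL = "/u01/app/11.2.0/grid/bin/crsctl"    # hypothetical GI home path
    with open("/proc/cpuinfo") as f:
        enabled = count_physical_cores(f.read())
    if enabled != EXPECTED_CORES:
        print(f"{enabled} cores enabled, key allows {EXPECTED_CORES}: "
              "expect oakd to reboot this SC again")
    else:
        print("CRS healthy" if wait_for_crs(CRSCTL) else "CRS NOT healthy, investigate")
```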
GI (Grid Infrastructure) patch failed
During installation of the oracle component of Patch Bundle 184.108.40.206.0 (Oracle separated the OS/hardware patches from the Oracle RDBMS/GI patches), the Grid Infrastructure PSU is installed on both the Oracle RDBMS home and the Oracle GI home. Installation on the RDBMS home works fine, but patching the GI home fails.
One of the required steps before patching the Oracle stack (-component oracle) is to stop dbconsole, which is configured automatically for the database that you installed during deployment of your ODA. This step is important because dbconsole keeps executing crsctl commands while it runs (keeping the file crsctl.bin in use), and that can get you into trouble when the GI home is patched.
What the note doesn't mention is that the same goes for Oracle Enterprise Manager Grid Control agents. It is very important to stop the OEM Grid Control agents on both nodes (SCs) before you start patching the Oracle stack. Otherwise you will most likely get the error message that oakcli wasn't able to patch the GI home.
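You stop the processes themselves with `emctl stop dbconsole` (from the database home) and `emctl stop agent` (from the Grid Control agent home). Before you start oakcli, a quick pre-check on each SC can confirm that nothing is left running. The Python sketch below assumes that a running 10g/11g agent or dbconsole shows up in `ps` as the `emagent` binary or its `emwd.pl` perl watchdog. Treat these patterns as our assumption, and check them against `ps` on your own SCs. The function names are hypothetical.

```python
import subprocess
import sys

# Assumed process signatures of a running EM agent or dbconsole:
# the emagent binary and its perl watchdog script.
EM_PATTERNS = ("emagent", "emwd.pl")


def find_em_processes(ps_lines):
    """Return the ps lines that look like EM agent or dbconsole processes."""
    return [line for line in ps_lines if any(p in line for p in EM_PATTERNS)]


def running_em_processes():
    """List candidate EM processes on this SC using `ps -eo pid,args`."""
    out = subprocess.run(["ps", "-eo", "pid,args"], capture_output=True,
                         text=True, check=True).stdout
    return find_em_processes(out.splitlines()[1:])  # skip the ps header line


if __name__ == "__main__":
    leftovers = running_em_processes()
    for line in leftovers:
        print("still running:", line.strip())
    # Exit non-zero so a wrapper script can refuse to start the GI patch.
    sys.exit(1 if leftovers else 0)
```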