Grid Infra – ASM 11.2.0.3 bug on Solaris – slice 2 CANDIDATE

We recently ran into a bug (at least that's how I look at it) that was introduced with ASM 11.2.0.3. As of this version, ASM shows slice 2 (the complete disk, starting at cylinder 0 and thus including the VTOC) as a CANDIDATE disk. In previous versions of ASM (Grid Infra) only slices that started at cylinder 3 (or higher) were shown as CANDIDATE disks, so you couldn't accidentally select slice 2 for a diskgroup.

In ASM 11.2.0.3 it is possible to add slice 2 (for example c0t0d0s2) to an ASM diskgroup while another slice of the same disk, starting at cylinder 3, is already part of the same or even another diskgroup. You can imagine what a mess this can cause!
Below is an example partition table of a Solaris disk, where until 11.2.0.3 you would only see partition (slice) 0, starting at cylinder 3, in ASM (see MOS note: ASM Does Not Discover Disk on Solaris [ID 368840.1]).

Current partition table (original):
Total disk cylinders available: 13651 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       3 - 13650       49.98GB    (13648/0/0) 104816640
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wu       0 - 13650       49.99GB    (13651/0/0) 104839680
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0

ASM (11.2.0.3) now discovers the following disks. Take a good look at the slice numbers of the MEMBER disks and the CANDIDATE disks:

SQL> select path,header_status from v$asm_disk order by 1,2;

PATH                                               HEADER_STATUS
-------------------------------------------------- ---------------
/dev/rdsk/c3t60060E8005492800000049280000319Dd0s0  MEMBER
/dev/rdsk/c3t60060E8005492800000049280000319Dd0s2  CANDIDATE
/dev/rdsk/c3t60060E8005492800000049280000319Ed0s0  MEMBER
/dev/rdsk/c3t60060E8005492800000049280000319Ed0s2  CANDIDATE
/dev/rdsk/c3t60060E8005492800000049280000319Fd0s0  MEMBER
/dev/rdsk/c3t60060E8005492800000049280000319Fd0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031A0d0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031A0d0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031A6d0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031A6d0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031A7d0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031A7d0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031A8d0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031A8d0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031A9d0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031A9d0s2  CANDIDATE
/dev/rdsk/c3t60060E800549280000004928000031CBd0s0  MEMBER
/dev/rdsk/c3t60060E800549280000004928000031CBd0s2  CANDIDATE

I have created an SR for this problem, but until it is resolved you have the following options:

  1. Set the ASM_DISKSTRING parameter to /dev/rdsk/*s0, or whatever slice you normally use (see the example below this list)
  2. Make sure the user grid does not have any privileges on the /dev/rdsk/*s2 devices.
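For the first option, a minimal sketch of changing the discovery string from the ASM instance (assuming all your ASM disks are presented as slice 0 under /dev/rdsk and ASM runs from an spfile):

SQL> alter system set asm_diskstring='/dev/rdsk/*s0' scope=both sid='*';

After this, v$asm_disk should no longer list the s2 devices at all.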

Update 26-02-2013:
Oracle Support finally came back with an answer and a patch. The most important thing is that they agree that this is indeed a bug and have logged it as bug 14577881. Support tells me that the problem is "really" fixed in Oracle 12.2 (while even 12.1 hasn't been released yet ;-)), but they have created a backport request for it. I just heard back that the backport is available as a one-off patch, downloadable as patch 14577881 for 11.2.0.3.


Oracle Database Appliance – Installing Bundle Patch 2.3.0.0.0

On 23 July 2012 ODA bundle patch 2.3.0.0.0 was released, a long-awaited version which should (and indeed does) include multi-home support. As with each new bundle patch, the patch process has become more robust, and the long-awaited support for rolling upgrades of the Grid Infrastructure and RAC databases is implemented. Unfortunately the installation of the infrastructure part still requires downtime of the complete cluster stack and a reboot of both nodes (although you can skip the reboot part of the infra patch and reboot the nodes at a more convenient time).

At the beginning I was very surprised how well this new bundle patch installed on one of our test ODAs. It installed without any problems and in a couple of hours I brought this ODA from 2.1.0.3.1 to 2.3.0.0.0 (2.1.0.3.1 -> 2.2.0.0.0 – only infra + GI, and 2.2.0.0.0 -> 2.3.0.0.0), including database upgrades. Unfortunately, after this successful installation and the upgrade of 7 test databases, the first node of the second ODA I patched crashed during patching of the 2.3 infra component. After cleaning up and restarting I was able to fix this ODA, but while patching the third ODA, the second node crashed during the installation of the 2.3 infra patch component. This time cleaning up and restarting the patch didn't help.

On this ODA I now have one node with an old BIOS version and, even worse, the first node has a core configuration key applied while OAK on the second node no longer knows about the key applied on the first node.

Solution for the core_config_key problem: run oakcli apply core_config_key <key file> again on the second node. After the reboot (it will only reboot the second node) OAK knows about the key again.

My guess is that the problem lies in the combination of the use of core configuration keys and the BIOS upgrade that is done during the installation of the infra component/phase of the 2.3 bundle patch. At the moment I have an SR open with Oracle Support and I can't patch my other ODAs until the problem is solved.

Although I ran into problems with the installation of bundle patch 2.3.0.0.0, I will describe the patching process here, to give you an idea of how long things take and what questions you can expect.

Patch installation – component infra

The first component that needs to be patched is the infra component, which for bundle patch 2.3.0.0.0 consists of firmware patches (BIOS, ILOM, SAS disks) and of course a new version of OAK. As mentioned, the installation of the infra component requires downtime of the complete cluster (the complete Grid Infrastructure will be shut down).

Mandatory requirements before installing the 2.3 infra component:

  • Copy the zipfile containing the 2.3 bundle patch (p13982331_23000_Linux-x86-64.zip) to BOTH nodes of the ODA
  • Unpack this patch using the oakcli unpack -package command on BOTH nodes of the ODA (see the example below this list)
  • You need the passwords (just sudo isn’t enough) for OS user root
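
A minimal sketch of the unpack step, run as user root on both nodes (the /tmp staging location is an assumption; use whatever directory you copied the zipfile to):

 oakcli unpack -package /tmp/p13982331_23000_Linux-x86-64.zip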

After that start the installation of the infra component using the command (run as user root):

 oakcli update -patch 2.3.0.0.0 --infra

It will take around 45 minutes from starting the patch process until both ODA nodes are rebooted and the complete CRS stack and databases are up and running again (time may differ a little depending on the number of databases).

Also make sure to do the checks mentioned in the 2.3 bundle patch release notes on BOTH NODES (run as user root):

dmidecode -t 1 | grep "Serial Number"
fwupdate list disk | grep -A5 CONTROLLER

For the dmidecode command, the serial number of your ODA should be returned on both nodes. For the fwupdate command the output should contain a value in the “BIOS version” column for controllers c1 and c2 on both nodes. If any of these checks fail, reboot the node and check again.

Patch installation – component GI

As of ODA bundle patch 2.3.0.0.0, patching the Grid Infrastructure (installation of a PSU) is done in a rolling fashion. So unlike with the infra component, all RAC databases remain available during the patch process. The process will first patch the Grid Infra environment on ODA node 1 (SC0) and, when it is finished and all instances are started again, it will start patching the Grid Infrastructure on ODA node 2 (SC1).

Requirements before installing the 2.3 GI component:

  • You need the passwords (just sudo isn’t enough) for OS users root and grid

After that start the installation of the gi component using the command (run as user root):

 oakcli update -patch 2.3.0.0.0 --gi

It will take around 30 minutes for the Grid Infrastructure patch to complete and, as mentioned, it doesn't require downtime for RAC databases.

Patch installation – creating 11.2.0.3.3 dbhome

Because I didn't have my ODAs running on ODA version 2.2.0.0.0 yet, mainly because that bundle patch introduced a very nasty bug with crashing 11.2.0.2.x (RAC) databases, I had to install the mandatory parts of that bundle patch first to be able to install the 2.3 bundle patch. I decided to install only the infra and GI components of the 2.2 bundle patch, use the new 2.3 oakcli create dbhome command to create a new 11.2.0.3 Oracle home, and use the oakcli upgrade database command to upgrade my databases from 11.2.0.2.5 (ODA 2.1.0.3.1) to 11.2.0.3.3 (ODA 2.3.0.0.0). Using the database upgrade functionality of ODA 2.3 also avoids the known issue of ODA 2.2 where databases with capitals in their names couldn't be upgraded.

Requirements for creating an Oracle home (11.2.0.3.3):

  • You need the passwords (just sudo isn’t enough) for OS users root and oracle
  • You need the password for user SYS on the ASM instances

To create a new 11.2.0.3.3 Oracle database home use the following command (run as user root):

oakcli create dbhome -version 11.2.0.3.3

It takes around 6 minutes to create this new Oracle home.

Patch installation – upgrading databases

After creating the new 11.2.0.3.3 Oracle home for databases I upgraded all databases running Oracle 11.2.0.2.5 to Oracle 11.2.0.3.3 using the oakcli upgrade database command.

Requirements for upgrading databases using the oakcli upgrade database command:

  • You need the passwords (just sudo isn’t enough) for OS users root and oracle
  • A list of all available dbhomes
    Use the command oakcli show dbhomes
  • A list of all available databases
    Use the command oakcli show databases

To upgrade 1 database use the following command (run as user root):

oakcli upgrade database -db <dbname> -to <destination home name>

Example:
oakcli upgrade database -db mcldbts1 -to OraDb11203_home1

To upgrade all databases running from one Oracle home (serial process – run as user root):

oakcli upgrade database -from <source home name> -to <destination home name>

Example:
oakcli upgrade database -from testdbb1 -to OraDb11203_home1

The database(s) get upgraded by oakcli using DBUA in the background. The logfiles of the upgrade process can be found under the following directory structure:

/u01/app/oracle/cfgtoollogs/dbua


Oracle Database Appliance – cleanupDeploy X-windows failure

Whenever something goes wrong during the deployment of an ODA, you can try to fix the problem and restart the deployment from the step where it failed (using the GridInst.pl script). Sometimes, however, when the deployment fails, restarting it doesn't work and the only (or fastest) solution is to clean up and start all over. There is a script called cleanupDeploy.pl which does this for you.

A couple of days ago a colleague of mine needed to use this cleanup script because of an error somewhere in deployment step 15 (RunRootScripts), and the deployment couldn't be restarted. After using the cleanupDeploy.pl script to clean up the failed ODA deployment, the ODA was brought back to the pre-deployment state, but he wasn't able to open an xterm (a problem with X-windows) to start the graphical ODA deployment process.

It took some time to find the cause, but it turned out that the cleanupDeploy.pl script had removed the localhost entry from the /etc/hosts file. Manually adding this line to the /etc/hosts file on both ODA nodes fixed the problem.
So just add the following line to the /etc/hosts file:

127.0.0.1    localhost.localdomain localhost

The problem seems to exist only in the 2.1.0.0.0 cleanupDeploy.pl script. Running the cleanupDeploy.pl from the 2.1.0.3.1 image did not remove this line from the /etc/hosts file.


Oracle Database Appliance – odachk

With the release of ODA patch bundle 2.2.0.0.0, a tool that was previously available on Exadata only is now available for the ODA too. This tool, the "Exadata Configuration Audit Tool – exachk", has been modified, renamed to odachk and is located in /opt/oracle/oak/odachk.

With this tool you can check the configuration of your complete ODA environment (numerous checks for OS, GI and RDBMS); it will show you where there are, or might be, problems.

Running the tool

Run the following command as user oracle (full check with verbose option) to start the tool. You will be asked some questions before the checks are executed.

cd /opt/oracle/oak/odachk
./odachk -a -o verbose

There are more command line options that can be found in the User Guide.

Example output

Screendump of the "odachk" command execution: odachk_testdb_output.txt
HTML file that is generated by “odachk” command: odachk_MCLDB_081512_092757.html

Bugs

At the moment there is at least one annoying bug in the odachk tool. The check "One or more software or firmware versions are NOT up to date with OAK repository" will FAIL because odachk uses the software and firmware versions of the OAK 2.1.0.0.0 repository instead of the latest (for now 2.2.0.0.0). I've raised an SR for this bug, and Oracle Support mentioned that it will be fixed as part of the next ODA patch bundle and that this is the reason why they didn't mention odachk in MOS note 888888.1.

With the release of ODA bundle patch 2.3.0.0.0 the odachk utility is officially released. I did some testing with it and the bug mentioned above is indeed fixed. I have uploaded a new sample HTML report created by this utility and the output of running the command.

Userguide

There is no specific user guide for the odachk tool; the readme for this tool will direct you to My Oracle Support note 1070954.1 (Oracle Exadata Database Machine exachk or HealthCheck), from where you can download the exachk tool including the ExchUserGuide.pdf file.

Oracle Exadata Database Machine exachk or HealthCheck [ID 1070954.1]


Oracle Database Appliance – Installing Patch Bundle 2.2

On 17 April 2012 ODA patch bundle 2.2.0.0.0 was released, which includes a Linux kernel upgrade (2.6.18-194.32.1.0.1.el5 to 2.6.32-300.11.1.el5uek – Unbreakable Enterprise Kernel), Oracle Grid Infrastructure patchset 11.2.0.3.2 and Oracle RDBMS patchset 11.2.0.3.2.
Oracle made the installation of the various parts of the patch bundle a lot more flexible. Now you can install the infrastructure (OS, firmware, OAK, etc.), GI and RDBMS parts independently of each other. You can even choose to just install the Oracle RDBMS software without upgrading the existing databases and keep them running on the current 11.2.0.2.x patchset. This way you can just install the "infra" part without upgrading the rest. Unfortunately, if you are using ACFS you have to upgrade to GI patchset 11.2.0.3.2 directly after upgrading the "infra" part, because ACFS in GI 11.2.0.2.x doesn't work with the new Linux kernel.

Patch installation – part 1 (infra)

As with the previous ODA patch bundles, the installation of patch bundle 2.2 isn't rolling and you will have downtime. The first part of the installation (infra) shuts down the complete CRS stack (cluster) and will reboot both nodes at the end (at the same time). Depending on how many databases are running on your ODA, it will take 45-60 minutes before everything is running again.

Patch installation – part 2 (GI & RDBMS)

Especially if you are using ACFS (the /cloudfs filesystem) you will have to install the Grid Infrastructure (GI) patchset 11.2.0.3.2 as soon as possible. Installation of the GI patchset isn't rolling, so again the complete CRS stack (cluster) will be shut down. I chose to install the GI patchset and RDBMS patchset (software only) in one "oakcli update -patch 2.2.0.0.0 --gi --database" run, which took around 45 minutes to complete (the cluster resources, like the databases, aren't available during this period).
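
For reference, that combined run is started the same way as the other component patches (run as user root):

 oakcli update -patch 2.2.0.0.0 --gi --database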

Known issues

There are some "known issues" that come with this patch bundle. The most annoying ones are listed here:

  • The privileges on the "oracle" executable in the old (11.2.0.2) home become incorrect after installing this patch bundle, so you have to fix them as noted in the patch readme (see the sketch below this list).
  • With the installation of the 2.2.0.0.0 patch bundle you can choose to have all, or a selection of, databases upgraded to 11.2.0.3. However, there is a bug which prevents databases with capitalized database names from being upgraded automatically.
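
A minimal sketch of the permissions fix for the first issue (the home path is an assumption; check the patch readme for the exact steps on your ODA):

 chmod 6751 /u01/app/oracle/product/11.2.0.2/dbhome_1/bin/oracle

This restores the usual setuid/setgid bits (-rwsr-s--x) on the oracle binary in the old home.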

Problems I ran into

Although the installation of ODA patch bundle 2.2.0.0.0 went without visible problems, and it fixed a problem with a shared disk (state details for a disk: PredictiveFail), it introduced a new and, I think, much bigger problem with the availability of our RAC databases. Whenever I reboot one of the ODA nodes, the RAC instances on the remaining node crash with an ORA-07445 (generated by the LMD0 daemon – Global Enqueue Service Daemon), see the error below. I have a Service Request open for this and I will update this post whenever the problem is solved.

SKGXP: ospid 16245: network interface with IP address 169.254.117.116 no longer running (check cable)
Exception [type: SIGSEGV, Invalid permissions for mapped object] [ADDR:0x7FEA51FBB592] [PC:0x7FEA548AA5E7, skgxp_local_status_change()+191] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/tstdb1/tstdb11/trace/tstdb11_lmd0_16245.trc  (incident=121697):
ORA-07445: exception encountered: core dump [skgxp_local_status_change()+191] [SIGSEGV] [ADDR:0x7FEA51FBB592] [PC:0x7FEA548AA5E7] [Invalid permissions for mapped object] []

Problem solved!

Finally the problem with the crashing RAC instance (the surviving one) has been solved. You have to upgrade your databases (RDBMS) to 11.2.0.3, or if you need to stay on 11.2.0.2 you will have to apply patch 12628521 (SKGXP V3.4 – CUMULATIVE FIXES PATCH 6.1; for a description of the bug fixed see MOS note 11711682.8).
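
A minimal sketch of applying such a one-off to the 11.2.0.2 database home with OPatch (the zipfile name and home path are assumptions; always follow the patch readme and stop the instances running from the home first):

 su - oracle
 export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/dbhome_1
 cd /tmp
 unzip p12628521_112020_Linux-x86-64.zip
 cd 12628521
 $ORACLE_HOME/OPatch/opatch apply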

The problem is described as a generic 11.2.0.2 problem, but it doesn't occur on ODA 2.1.0.3.1, so it has to be some combination with either the new GI (11.2.0.3) that comes with ODA 2.2.0.0.0 or the kernel/OS upgrade. I did ask Oracle about this, but they didn't know (they wanted to look into it further).


Oracle Database Appliance – Safely usable ASM diskgroup size

Not long ago I got a warning from Oracle Enterprise Manager that the REDO diskgroup on one of our ODAs exceeded the warning threshold of 75%. Looking at the number of database instances and the configured online redo log sizes I couldn't understand how this was possible, while the ODA documentation states that the +REDO diskgroup size is 97.3 GB (4 * 73 GB / 3 – high redundancy is used on the ODA diskgroups).

I know the divide-by-3 is a rough calculation, but I only had 39 GB of redo logs in the REDO diskgroup. Within ASM the diskgroup showed that 158 GB was still free (FREE_MB column of v$asm_diskgroup), which after triple mirroring leaves 52 GB, but then the column REQUIRED_MIRROR_FREE_MB came into sight, which showed that 140 GB was required for mirroring.
ASM documentation is clear about the calculation of the required size for mirroring: The value is the total raw space for all of the disks in the two largest failure groups.

This means that for the ODA the actual "safely" usable REDO diskgroup size can be calculated as: total_mb - required_mirror_free_mb = 280016 - 140008 = 140008 MB (raw size), which after triple mirroring leaves (140008/3) 45.6 GB, a difference of 51.7 GB from the value that Oracle notes in its documentation.
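
A quick way to check these numbers yourself is to query v$asm_diskgroup; a minimal sketch (the divide-by-3 only applies to the high-redundancy ODA diskgroups):

SQL> select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb,
            round((total_mb - required_mirror_free_mb)/3) safe_usable_mb
       from v$asm_diskgroup;

USABLE_FILE_MB is what ASM itself reports as the space that is still safely usable (it is based on FREE_MB rather than TOTAL_MB).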

So here is a table with the “safely” usable diskgroup sizes:

Actual diskgroup size (safely usable)

diskgroup   internal backup   external backup
+DATA       1.41 TB           2.81 TB
+RECO       1.87 TB           0.46 TB
+REDO       45.6 GB           45.6 GB

Oracle documentation size

diskgroup   internal backup   external backup
+DATA       1.6 TB            3.2 TB
+RECO       2.4 TB            0.8 TB
+REDO       97.3 GB           97.3 GB

Of course you can use more space, effectively resulting in a negative value for USABLE_FILE_MB, but the question is whether you want that to happen!


Oracle Database Appliance – Installing Patch Bundle 2.1.0.3.0

We just finished installing ODA Patch Bundle 2.1.0.3.0 on one of our ODAs… There goes one of the key features of the ODA: simple one-button patch installation. As a DBA (and I think this goes for all administrators) I am very reserved when these kinds of statements are made.

First of all, applying Patch Bundle 2.1.0.3.0 (and Patch Bundle 2.1.0.3.1, but that one is very small and doesn't require downtime) on one ODA took 3.5 hours to complete, so a pretty long maintenance window for the kind of patches that are part of the patch bundle (ILOM/BIOS and Grid Infrastructure patch).

Unfortunately we ran into different problems when installing Patch Bundle 2.1.0.3.0 on our ODAs. This post will give you a description of what went wrong:

  • ILOM/BIOS firmware update failed
    Cause: firewall between the public network interface and the ILOM interface
  • Cluster Ready Services (CRS) got into an undefined state: running, but not aware that it was running
    Cause: reduced number of enabled CPU cores (Core Configuration Key)
  • GI (Grid Infrastructure) patch failed
    Cause: Oracle Enterprise Manager Grid Control agent was running



Oracle Database – 11.1.0.7 in extended support

As of 1 September 2012 Oracle Database release 11.1 (terminal release 11.1.0.7) will be in Extended Support. Here is a list of some database releases with the start dates for Extended Support and Sustaining Support:

Release   Extended Support Start                  Sustaining Support Start
10.2      August 2010 (fee waived for 1 year)     August 2013
11.1      September 2012                          September 2015
11.2      February 2015 (fee waived for 1 year)   February 2018

Oracle Database Appliance – Critical Patch Bundle 2.1.0.3.1

Oracle found a bug (a Seagate firmware issue) for ODAs with Patch Bundle 2.1.0.3.0 applied. They highly recommend installing patch 13817532 after you've applied Patch Bundle 2.1.0.3.0. The problem is described in MOS note 1438089.1, but in short: a disk failure could trigger a complete system (ODA) shutdown.

One good thing about this patch (it comes in the form of a Patch Bundle with version 2.1.0.3.1, which you can only apply AFTER you've applied Patch Bundle 2.1.0.3.0 – it is not a cumulative patch) is that it is the first ODA Patch Bundle that doesn't require downtime (it only patches OAK)!
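A minimal sketch of applying it, assuming the same unpack/update flow as the other bundles (the zipfile name and staging location are assumptions; check the 2.1.0.3.1 readme for the exact steps and on which node(s) to run it):

 oakcli unpack -package /tmp/p13817532_21031_Linux-x86-64.zip
 oakcli update -patch 2.1.0.3.1

Run as user root.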


Oracle Database Appliance – Applying one-off patches

Until now, one of the biggest problems with the ODA was applying one-off patches to the Oracle Homes for high-impact problems. The only supported way of patching an ODA was installing an ODA patch bundle, which (could) include Oracle Database PSUs. If you run into a bug that is fixed by a one-off patch, it is not guaranteed that this one-off will be part of the PSU included in the next ODA patch bundle.

Oracle has created a workaround for this problem by allowing the application of one-off patches for problems with a very high impact. This does not mean you are allowed to just install every one-off you find critical, but Oracle Support (after internal discussion with ODA support and/or ODA development) can allow you to install certain one-offs.

The following impacts should be noted if you are considering applying a patch which does not exist in the ODA patchset:

  • Applying a one-off patch may overwrite existing code with unintended consequences
  • One-offs are currently not supported: Test the steps and impact if considering one-offs
  • The patch may be overwritten in future patchset applications
  • The patch may not successfully apply
  • Support may suggest waiting for the full OneCommand patchset

For a full description of applying one-offs on your ODA, see MOS document 1399055.1.
