11 September 2015

Upgrade from 11.2 to 12.1 with just 24 seconds downtime

A rolling upgrade with a Transient Logical Standby is an MAA (Maximum Availability Architecture) technique for minimizing downtime during an upgrade of an Oracle database.

The white paper Database Rolling Upgrade Using Transient Logical Standby: Oracle Data Guard 11g has been available for quite some time. But testing this technique requires a lot of skill, patience, hardware and experience, aka blood, sweat and tears.

  • An important limitation is that you need to be able to install both the old and the new Oracle software on both nodes. For instance, you can install Oracle 10.2 on Oracle Linux 6.4, but that is not supported (follow OraToolkit if you need to do this). So look for a platform that supports both versions.
  • You might have unsupported data types; read the white paper (above) to check this.
  • You cannot use Data Guard Broker during this setup.

Last week I finally got this working in VirtualBox on my laptop. Below are several hints on how to do it.

Create two nodes in VirtualBox
Create two Oracle Linux 6.4 hosts, ‘london’ and ‘paris’. Make sure they can reach each other over the network and via ssh. Set up ssh keys so that copying files is easy.
Set up Oracle 11.2 and Oracle 12.1 on both nodes.
Set up a listener and a demo database on ‘london’.

Set up the standby

Test the physical standby and make sure that logs are being applied (in my experience, this may take a few minutes to start). This blog may help: http://sys-admin.wikidot.com/check-dataguard
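To make "logs are applied" a bit more concrete, here is a minimal Python sketch that summarizes the apply status. It assumes you have already fetched (sequence#, applied) pairs from v$archived_log on the standby with whatever client you prefer; the function name apply_status and the sample rows are hypothetical, and the sketch assumes a single-instance database (one redo thread).

```python
from typing import Dict, Iterable, Tuple


def apply_status(rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Summarize redo apply progress on a physical standby.

    rows: (sequence#, applied) pairs, e.g. the result of
          SELECT sequence#, applied FROM v$archived_log
          run on the standby (single redo thread assumed).
    """
    rows = list(rows)
    # Highest log sequence that has arrived on the standby.
    last_received = max((seq for seq, _ in rows), default=0)
    # Highest log sequence that is flagged as fully applied.
    last_applied = max((seq for seq, flag in rows if flag == "YES"), default=0)
    return {
        "last_received": last_received,
        "last_applied": last_applied,
        "lag": last_received - last_applied,
    }


# Hypothetical sample: logs 101 and 102 applied, 103 received but not yet applied.
sample = [(101, "YES"), (102, "YES"), (103, "NO")]
print(apply_status(sample))
```

A lag that stays at 0 or 1 while you switch logs on the primary suggests that redo apply is keeping up; a lag that keeps growing means apply has not started or has stopped.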

Make sure db_recovery_file_dest_size is large on both instances. The guaranteed restore points will need this space during the database upgrade.

Test the switchover (and back). This blog may help: http://www.oracledistilled.com/oracle-database/data-guard-switchover-to-a-physical-standby

Run preupgrd.sql from $ORACLE12/rdbms/admin and resolve any problems it reports.

Use the physru.sh script
You can download the physru.sh script via the note Oracle11g Data Guard: Database Rolling Upgrade Shell Script (Doc ID 949322.1). How to use it in practice is explained in the blog post Minimal downtime rolling database upgrade to 12c Release 1 by Gavin Soorma. Follow that post and you will execute physru.sh three times (from the ‘london’ primary host). Gavin explains in detail how this works (the following text is copied from his blog):

First execution
  • Creates control file backups for both the primary and the target physical standby database
  • Creates Guaranteed Restore Points (GRP) on both the primary database and the physical standby database that can be used to flashback to the beginning of the process or any other intermediate steps along the way.
  • Converts a physical standby into a transient logical standby database.

Second execution
  • Uses SQL Apply to synchronize the transient logical standby database and make it current with the primary
  • Performs a switchover to the upgraded 12c transient logical standby and the standby database becomes the primary
  • Performs a flashback on the original primary database to the initial Guaranteed Restore Point and converts the original primary into a physical standby

Third execution
  • Starts Redo Apply on the new physical standby database (the original primary database) to apply all redo that has been generated during the rolling upgrade process, including any SQL statements that have been executed on the transient logical standby as part of the upgrade.
  • When synchronized, the script offers the option of performing a final switchover to return the databases to their original roles of primary and standby, but now on the new 12c database software version.
  • Removes all Guaranteed Restore Points
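The three runs above use the same command line each time; the script itself keeps track of how far it has come, and the upgrade of the transient logical standby is done by hand between the first and the second run. As a rough sketch, the command can be assembled in Python like this. The argument order (user, primary TNS alias, standby TNS alias, primary database name, standby database name, target version) is my reading of the script's usage message, so check it against your copy of physru.sh. The names ‘london’ and ‘paris’ used as TNS aliases and database names, and the version string 12.1.0.1.0, are assumptions for illustration.

```python
import shlex
from typing import List


def physru_command(
    user: str,
    primary_tns: str,
    standby_tns: str,
    primary_name: str,
    standby_name: str,
    target_version: str,
    script: str = "./physru.sh",
) -> List[str]:
    """Build the argv list for one physru.sh run.

    The argument order is an assumption based on the script's usage
    message; verify it against the copy you downloaded.
    """
    return [
        script,
        user,
        primary_tns,
        standby_tns,
        primary_name,
        standby_name,
        target_version,
    ]


# Hypothetical values: TNS aliases and database names equal to the host names.
cmd = physru_command("sys", "london", "paris", "london", "paris", "12.1.0.1.0")

# The same command line is used for the first, second and third execution.
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string makes it safe to hand to subprocess.run later without shell quoting problems, although in my test I simply typed it in the terminal on ‘london’.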

The results are displayed after the third execution of physru.sh. As you can see, the whole process took a lot of time (about 6 hours), mainly because my laptop was running out of space. In the end, the steps were successfully completed, with a service downtime of just 24 seconds.

Second attempt
A few days later, I retried the technique. The upgrade went much more smoothly, and this time I also switched back at the end. The switchback gives you additional downtime (another switchover), seen as 19 seconds in the screenshot below. The whole upgrade procedure took just over 1 hour, which might be useful for situations that require maximum availability.
