Systems With WriteBack Smart Flash Cache (WBFC) Enabled Running Into Unnecessary Block Repair During Resilvering Could Cause Data Loss
Oracle Exadata Storage Server Software - Version 11.2.3.2.1 to 11.2.3.2.1 [Release 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
DESCRIPTION
After a flashdisk fails on Exadata Storage Servers with Write-back Smart Flash
Cache (WBFC) enabled, ASM resilvering takes too long, creating extended exposure
to a second flashdisk failure (or a third flashdisk failure if using ASM high
redundancy), which may result in data loss.
OCCURRENCE
Systems with the following configuration are exposed to this behavior and should
take immediate action to apply the required fixes:
SYMPTOMS
With WBFC enabled, upon flashdisk failure, the griddisks cached by the failed
flashdisk will have stale blocks. Exadata Storage Server Software initiates a
resilvering operation in order to resynchronize the stale blocks from the content on
other storage servers. Block repair operations initiated by the database for the stale
blocks should be suppressed while the resilvering operation is in progress.
The expected duration of the resilvering process depends on the number of dirty
blocks stored on the failed flashdisks. When experiencing the behavior described
above, database-initiated block repair interferes with the resilvering operation,
substantially extending the time the resilver takes to complete. If a second
flashdisk failure (or a third flashdisk failure if using ASM high redundancy) occurs
on a different Exadata Storage Server during the extended resynchronization time,
data can be lost.
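The relationship between dirty blocks, resilver time, and exposure can be sketched with simple arithmetic. This is only an illustration: the function name, the resilver rate, and the slowdown factor are hypothetical values chosen for the example, not figures from Oracle or measurements from any system.

```python
def resilver_seconds(dirty_blocks, blocks_per_second, slowdown=1.0):
    """Estimate how long a resilver takes, and so how long the exposure lasts.

    dirty_blocks      -- dirty blocks held by the failed flashdisk(s)
    blocks_per_second -- hypothetical undisturbed resilver rate
    slowdown          -- hypothetical factor (>= 1.0) by which interfering
                         block repair stretches the operation
    """
    if blocks_per_second <= 0:
        raise ValueError("blocks_per_second must be positive")
    if slowdown < 1.0:
        raise ValueError("slowdown must be at least 1.0")
    return dirty_blocks * slowdown / blocks_per_second


# Illustrative numbers only (assumptions, not measurements).
baseline = resilver_seconds(1_000_000, 5_000)
extended = resilver_seconds(1_000_000, 5_000, slowdown=10.0)
print(f"baseline window: {baseline:.0f} s, extended window: {extended:.0f} s")
```

Under these assumptions the window in which a further flashdisk failure can cause data loss grows in direct proportion to the slowdown.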
Messages in alert.log on the storage cell indicating the failure of the FDOM(s)
and the initiation of resilvering (creation of resilvering tables). These messages
are EXPECTED as part of the initialization of resilvering.
or
SUCCESS: extent 16667 of file 286 group 1 repaired by relocating to a different AU on the same disk or the disk is offline
NOTE: repairing group 1 file 286 extent 11654
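To gauge how much block repair activity is taking place, an alert log can be scanned for the two repair messages quoted above. The following is a minimal sketch: the regular expressions are modeled only on those two sample lines, so the wording in other releases is an assumption, and the function name `count_block_repairs` is hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on the two sample messages quoted in this note;
# message wording in other releases is an assumption.
REPAIR_START = re.compile(r"NOTE: repairing group (\d+) file (\d+) extent (\d+)")
REPAIR_DONE = re.compile(r"SUCCESS: extent (\d+) of file (\d+) group (\d+) repaired")


def count_block_repairs(lines):
    """Count repair messages per (group, file) pair in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = REPAIR_START.search(line)
        if match:
            group, file_no, _extent = match.groups()
            counts[(int(group), int(file_no))] += 1
            continue
        match = REPAIR_DONE.search(line)
        if match:
            _extent, file_no, group = match.groups()
            counts[(int(group), int(file_no))] += 1
    return counts


# The two sample lines from this note, plus one unrelated line.
sample = [
    "SUCCESS: extent 16667 of file 286 group 1 repaired by relocating "
    "to a different AU on the same disk or the disk is offline",
    "NOTE: repairing group 1 file 286 extent 11654",
    "unrelated log line",
]
repairs = count_block_repairs(sample)
for (group, file_no), total in sorted(repairs.items()):
    print(f"group {group} file {file_no}: {total} repair message(s)")
```

To scan a real log, pass an open file object instead of the sample list. The path to the log is site-specific and is not given here.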
WORKAROUND
None
PATCHES
Note the minimum requirements for using WBFC, in addition to the fixes listed above.