Monday, March 24, 2008

WAFL Check


WAFL check (WAFL_check) is a Data ONTAP utility that scans a filer's volumes and repairs inconsistencies in the WAFL file system.
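Before taking the filer down, it is worth confirming which volumes are actually flagged as inconsistent, so you know what to answer at the per-volume WAFL_check prompts later. A quick sketch from the normal admin prompt (the volume names and layout below are only illustrative, not taken from the session in this post):

Filer1> vol status
         Volume State      Status                Options
           vol0 online     raid_dp, flex         root
          vol03 online     raid_dp, flex,
                           wafl inconsistent

Volumes showing "wafl inconsistent" in the Status column are the ones WAFL_check needs to repair.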

WAFL check steps:

Data ONTAP (test.com)

login: root
Password:


Filer1> reboot

Total number of connected SCSI clients: 1 Number of r/w, online, mapped LUNs: 4

Warning: Rebooting with clustering disabled will terminate SCSITarget services and might cause data loss and application visible errors, or other OS failures on storage clients!!

CIFS local server on vfiler vfiler11 is shutting down...
CIFS local server on vfiler Vfiler12 is shutting down...
CIFS local server on vfiler vfiler11 has shut down...
CIFS local server on vfiler vfiler12 has shut down...
CIFS local server on vfiler vfiler0 is shutting down...
CIFS local server on vfiler vfiler0 has shut down...

Enter the number of minutes to wait before disconnecting [5]:
11 minute left until termination (^C to abort)...[LCD:info] REBOOTING

*** Warm reboot...

Clearing memory
Probing devices
Finding image...
Loading /pc-card:1,\x86\kernel\primary.krn

Starting Press CTRL-C for special boot menu
.................................................................................................................................................................................................................
Special boot options menu will be available.
Mon Feb 26 21:43:52 GMT [cf.ic.linkEstablished:info]: The Cluster Interconnect link has been established.
Mon Feb 26 21:43:53 GMT [cf.nm.nicTransitionUp:info]: Interconnect link 0 is UP

NetApp Release 7.0.5: Wed Aug 9 00:27:38 PDT 2006
Copyright (c) 1992-2006 Network Appliance, Inc.
Starting boot on Mon Feb 26 21:43:47 GMT 2007
Mon Feb 26 21:44:02 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
LUN Ownership using low range


Please choose one of the following:


(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize owned disks (65 disks are owned by this filer).
(4a) Same as option 4, but create a flexible root volume.
(5) Maintenance mode boot.

Selection (1-5)? WAFL_check


In a cluster, you MUST ensure that the partner is (and remains) down,
or that takeover is manually disabled on the partner node,
because clustering software is not started or fully enabled
in WAFL_check mode.

FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED
Continue with boot? y
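In practice, before answering y here on a clustered pair, make sure takeover is disabled on the partner node and stays that way for the whole run. A hedged example issued on the partner (Filer2, per the log below; the exact status wording varies by Data ONTAP release):

Filer2> cf disable
Filer2> cf status
Cluster disabled.

Re-enable it with cf enable only after WAFL_check has finished and this node has booted normally.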

add net 127.0.0.0: gateway 127.0.0.1
Mon Feb 26 21:48:01 GMT [cf.noDiskownShelfCount:info]: Disk shelf count functionality is not supported on software based disk ownership configurations.
Mon Feb 26 21:48:04 GMT [fmmbx_instanceWorke:info]: Disk disk1:0-4.125L1 is a primary mailbox disk
Mon Feb 26 21:48:04 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on primary side
Mon Feb 26 21:48:10 GMT [fmmbx_instanceWorke:info]: Disk disk1:0-4.125L0 is a backup mailbox disk
Mon Feb 26 21:48:10 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on backup side
Mon Feb 26 21:48:10 GMT [cf.fm.partner:info]: Cluster monitor: partner 'filer2'
Mon Feb 26 21:48:10 GMT [cf.fm.timeMasterStatus:info]: Acting as cluster time slave
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.launch:info]: Launching cluster monitor
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.partner:info]: Cluster monitor: partner 'filer2'
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.notkoverClusterDisable:warning]: Cluster monitor: cluster takeover disabled (restart)
Mon Feb 26 21:48:15 GMT [localhost: cf.fsm.takeoverOfPartnerDisabled:notice]: Cluster monitor: takeover of Filer2 disabled (cluster takeover disabled)
Mon Feb 26 21:48:15 GMT [localhost: raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Mon Feb 26 21:48:15 GMT [localhost: raid.stripe.replay.summary:info]: Replayed 0 stripes.
Check vol01? y
Check vol02? n
Check vol03? y
Check vol04? y
Check vol0? n
Check vol05? y

Checking vol01...

WAFL_check NetApp Release 7.0.5

Starting at Mon Feb 26 21:50:32 GMT 2007

Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 6
Phase 3b: Scan inode file normal files.
(inodes 5%)
(inodes 10%)
(inodes 15%)
(inodes 20%)
(inodes 25%)
(inodes 30%)
(inodes 35%)
(inodes 41%)
(inodes 46%)
(inodes 51%)
(inodes 56%)
(inodes 61%)
(inodes 66%)
(inodes 71%)
(inodes 76%)
(inodes 82%)
(inodes 87%)
(inodes 92%)
(inodes 97%)
(inodes 99%)
(inodes 99%)
Phase 3b time in seconds: 2989
Phase 3 time in seconds: 2995
Phase 4: Scan directories.
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents

Checking volume vol03...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 20
Phase [5.3b]: Scan inode file normal files.
(inodes 5%)
(inodes 10%)
(inodes 15%)
(inodes 20%)
(inodes 25%)
(inodes 30%)
(inodes 35%)
(inodes 40%)
(inodes 45%)
(inodes 50%)
(inodes 55%)
(inodes 60%)
(inodes 65%)
(inodes 70%)
(inodes 75%)
(inodes 80%)
(inodes 85%)
(inodes 90%)
(inodes 95%)
Phase [5.3b] time in seconds: 5
Phase [5.3] time in seconds: 26
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 6
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 5
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 16
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 722
Phase [5.6] time in seconds: 744

Clearing inconsistency flag on volume vol03.

Volume vol03 WAFL_check time in seconds: 776
Inconsistent vol vol03 marked clean.
WAFL_check output will be saved to file /vol/vol03/etc/crash/WAFL_check

Checking volume vol04...
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 55
Phase [5.3b]: Scan inode file normal files.
(inodes 5%)
(inodes 10%)
(inodes 15%)
(inodes 20%)
(inodes 25%)
(inodes 30%)
(inodes 35%)
(inodes 40%)
(inodes 45%)
(inodes 50%)
(inodes 55%)
(inodes 60%)
(inodes 65%)
(inodes 70%)
(inodes 75%)
(inodes 80%)
(inodes 85%)
(inodes 90%)
(inodes 95%)
Phase [5.3b] time in seconds: 1419
Phase [5.3] time in seconds: 1474
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 13
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 10
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 42
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 4
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 1166

FS info: setting total inodes used to be 1692 (was 1693).
Phase [5.6] time in seconds: 1222
Clearing inconsistency flag on volume vol04.
Volume vol04 WAFL_check time in seconds: 2710
1530 error messages discarded
Indirect blocks cleared: 3
Block counts corrected: 1
Total inodes corrected.
1530 lost blocks collected into 1530 files in lost+found.
Inconsistent vol vol04 marked clean.
WAFL_check output will be saved to file /vol/vol04/etc/crash/WAFL_check

Checking volume vol05...
Phase [5.1]: Verify fsinfo blocks.
Volume vol05 contains snapmirrored qtrees. Committing these changes will break the synchronization between these qtrees and their sources. Use snapmirror resync or initialize to re-establish the mirror(s).
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 798
Phase [5.3b]: Scan inode file normal files.
(inodes 5%)
(inodes 10%)
(inodes 15%)
(inodes 20%)
(inodes 25%)
(inodes 30%)
(inodes 35%)
(inodes 40%)
(inodes 45%)
(inodes 50%)
(inodes 55%)
(inodes 60%)
(inodes 65%)
(inodes 70%)
(inodes 75%)
(inodes 80%)
(inodes 85%)
(inodes 90%)
(inodes 95%)
(inodes 100%)
Phase [5.3b] time in seconds: 3512
Phase [5.3] time in seconds: 4310
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 15
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 12
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 49
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 707
Phase [5.6] time in seconds: 768
Clearing inconsistency flag on volume vol05.
Volume vol05 WAFL_check time in seconds: 5094
Inconsistent vol vol05 marked clean.
Volume vol05 is a snapmirrored volume. Committing these changes will break the synchronization between this volume and its source, forcing you to restart the snapmirror process. A better alternative may be to WAFL_check the source volume and reinitialize snapmirror by issuing a "snapmirror initialize" command on this volume.
WAFL_check output will be saved to file /vol/vol05/etc/crash/WAFL_check
Phase 5b time in seconds: 9697
Phase 6: Clean up.
Phase 6a: Find lost nt streams.
Phase 6a time in seconds: 0
Phase 6b: Find lost files.
Phase 6b time in seconds: 4
Phase 6c: Find lost blocks.
Phase 6c time in seconds: 17
Phase 6d: Check blocks used.
Phase 6d time in seconds: 51
Phase 6 time in seconds: 72
Clearing inconsistency flag on aggregate vol05.
WAFL_check total time in seconds: 12764
Commit changes for aggregate aggr1 to disk? y
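Once the changes are committed, the filer carries on with a normal boot. At that point it is sensible to confirm the volumes are no longer flagged inconsistent, read the saved report, and repair any SnapMirror relationships the check broke. A sketch of the follow-up commands, assuming vol05 is the SnapMirror destination as the warning above suggests; the source path filer2:vol05_src is hypothetical and should be taken from your own snapmirror.conf:

Filer1> vol status vol03
Filer1> priv set advanced
Filer1*> rdfile /vol/vol03/etc/crash/WAFL_check
Filer1*> priv set admin
Filer1> snapmirror status
Filer1> snapmirror resync -S filer2:vol05_src Filer1:vol05

If the resync cannot find a common Snapshot copy, a full snapmirror initialize on the destination volume is the fallback, as the message in the log above suggests.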

___________________________________________________________________________

2 comments:

Unknown said...

Hello Swami,

A million thanks to you for this very useful post!

I recently had a strange situation where the NetApp filer just crashed in the middle of the day, and all the terabytes of data and aggr0 on the filer were corrupted.
It was a truly unexpected "single point of failure", and only with the help of your note and the WAFL_check command was I able to restore all the data!

Thanks again!

Best regards,
Serge.

Salinux said...

Thank you, this procedure was very helpful to me.


Salinux.