Wednesday, July 24, 2013

NetApp Bug (BURT) 393877 - Spike

High Read Latency Spike Causes Outage to Underlying Virtual Infrastructure.

a. These variables are set at the loader prompt (note: these settings are platform dependent).
b. LOADER-A> setenv raid_vol_readio_res_count 6230
c. LOADER-A> setenv raid_mirror_readio_res_count 7702
d. LOADER-A> setenv wafl_pw_limit_percent 12

The 'raid_vol_readio_res_count' and 'raid_mirror_readio_res_count' settings increase the resource count available to RAID.  By default the partial-write limit is 50% of the resource count.

Setting 'wafl_pw_limit_percent' reduces this value to prevent a partial-write workload from dominating the system.

This is only a partial fix and may not completely eliminate the issue seen (random read latency spikes). For this reason, an upgrade to DOT 8.1.3 upon release is still the long-term recommendation.
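To double-check the values before booting, here is a minimal sketch of the follow-up at the same prompt; it assumes printenv and boot_ontap are available at this platform's LOADER and that the system is halted:

    LOADER-A> printenv raid_vol_readio_res_count
    LOADER-A> printenv raid_mirror_readio_res_count
    LOADER-A> printenv wafl_pw_limit_percent
    LOADER-A> boot_ontap

The printenv calls simply confirm that each value took before booting back into Data ONTAP; on an HA pair the same values would presumably have to be set at the partner node's loader as well.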

VM Client Disconnecting Issue:
a.       http://kb.vmware.com/kb/51306
b.      https://kb.netapp.com/support/index?page=content&id=2010823

Wednesday, January 2, 2013

Nutanix and MapReduce

So Nutanix released the latest NOS 3.0 with a rich set of features. One that actually caught my eye was offline compression.

It uses MapReduce techniques to essentially figure out which data has gone cold and then compress it. The compression only occurs when the system is idle and the data is cold, and determining when data counts as cold is left up to the user (e.g. after one day or after one hour). The technique is based on Google's Snappy compression library and hits a nice balance between high speed and a reasonable compression rate.
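Purely as an illustration of where that user-defined delay would go, something like the ncli call below could enable offline compression on a container with a one-day cold threshold; the container name, entity, and parameter names here are assumptions for the sketch, not verified NOS 3.0 syntax:

    # Hypothetical example: enable post-process (offline) compression on container
    # "ctr1" and treat data as cold after 1440 minutes (one day). The parameter
    # names are assumed; check the NOS documentation for the exact ncli syntax.
    ncli container edit name=ctr1 enable-compression=true compression-delay=1440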

Thursday, November 29, 2012

DFM Options

Handy one. NetApp OnCommand monitoring stops collecting performance data if there is not enough free space on the disk, and sometimes it won't even start up. The option below sets the minimum free-space percentage and can be adjusted so monitoring starts again:

dfm option set monMinFreePercent=5
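A minimal sketch of checking the option before and after the change, assuming the standard dfm CLI subcommands option list, option set, and service list; the trailing notes are just annotations:

    dfm option list monMinFreePercent      # show the current minimum free-space threshold
    dfm option set monMinFreePercent=5     # require only 5% free space for monitoring
    dfm service list                       # confirm the DFM services are running again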

Wednesday, November 14, 2012

Yahoo Storage Outage

http://www.sportsgrid.com/nfl/were-sorry-yahoo-just-emailed-all-of-their-fantasy-football-players-this-apology/

"So what happened? At Yahoo!, we have giant machines called “filers” that process a lot of the real-time data and stats for us and for you. We do millions of calculations every hour for our games, and normally our machines can handle this with no problem. Recently, we discovered a hardware issue in one of the filers that caused the other one to overload. We replaced some hardware, re-configured the setup, and did some testing. However this Sunday – at approximately 12:15 p.m. Eastern – the new configuration failed. This created an overload on storage capacity and took the Fantasy part of our site down."

Thursday, October 18, 2012

IBM XIV

Interesting Read.
http://storagegorilla.com/2010/03/26/7-reasons-why-ibms-xiv-isnt-perfect/


Tuesday, November 8, 2011

ESXi Storage Add Issue: Call fails for “HostDatastoreSystem.QueryVmfsDatastoreCreateOptions” for object “ha-datastoresystem”

Solution
Source: VMware.

Clearing partitioning information in ESXi using the DD utility

Due to differences between ESX classic and ESXi, the parted utility is not available in ESXi. These steps describe how to clear partitioning information for a LUN under ESXi.
Warning: This process will destroy data on the target device. The steps outlined here are potentially hazardous for your environment if they are not followed exactly. If you are not comfortable performing these steps, contact VMware Technical Support and work with them to resolve the issue.
  1. Open a console to the ESX or ESXi host.

  2. Identify the disk device in question from the log messages. For example:

    /vmfs/devices/disks/vml.0200030000600508b30093fcf0a05b5b8cc739002f4d5341313531
     
  3. Use the fdisk command to obtain the exact size of the target disk device in bytes:

    fdisk -l "/vmfs/devices/disks/DeviceName"

    The output appears similar to:

    Disk /vmfs/devices/disks/DeviceName: 429.5 GB, 429491220480 bytes
    255 heads, 63 sectors/track, 52216 cylinders, total 838850040 sectors
    Units = sectors of 1 * 512 = 512 bytes

                            Device Boot    Start          End     Blocks   Id  System
    /vmfs/devices/disks/DeviceName           128    838850039  419424956   ee  EFI GPT

  4. Use the dd command to erase the first 34 sectors (34x512 bytes) of the disk device with zeros:

    dd if=/dev/zero of="/vmfs/devices/disks/DeviceName" bs=512 count=34 conv=notrunc

    Caution: This operation deletes the partition table and master boot record on the disk device. The changes take effect immediately. This results in data loss and cannot be reverted.
     
  5. The GPT partition scheme stores a backup of the partition table at the end of the disk device. Erase the last 34 sectors of the device as well:

    1. Determine the offset at which the last 34 sectors begin (a shell sketch combining steps 4 and 5 appears after this list):

      (SizeInBytes / 512) - 34 = SeekOffset

      For example, using the values in step 3:

      (429491220480 / 512) - 34 = 838850006

    2. Use the dd command to erase the last 34 sectors of the disk, starting at the offset found in step 5a:

      dd if=/dev/zero of="/vmfs/devices/disks/DeviceName" bs=512 count=34 seek=SeekOffset conv=notrunc

      Caution: This operation deletes the partition table and master boot record on the disk device. The changes take effect immediately. This results in data loss and cannot be reverted.

  6. Depending on the original contents of the disk device, it may be necessary to erase a larger amount of data on the disk.

  7. Retry the storage operation.
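Putting steps 4 and 5 together, here is a minimal sketch for the ESXi busybox shell; DeviceName is the placeholder from the steps above, and the size in bytes comes from the fdisk output in step 3:

    # Device to wipe and its size in bytes, taken from "fdisk -l" (example values).
    DEV="/vmfs/devices/disks/DeviceName"
    SIZE_BYTES=429491220480

    # Offset of the last 34 sectors: (bytes / 512) - 34.
    SEEK_OFFSET=$(( SIZE_BYTES / 512 - 34 ))
    echo $SEEK_OFFSET    # 838850006 for the example size

    # Zero the protective MBR and primary GPT at the start of the device...
    dd if=/dev/zero of="$DEV" bs=512 count=34 conv=notrunc
    # ...and the backup GPT at the end. Both writes destroy data immediately.
    dd if=/dev/zero of="$DEV" bs=512 count=34 seek=$SEEK_OFFSET conv=notrunc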
A few more links that I stumbled on via Google:
http://www.eversity.nl/blog/2010/08/call-fails-for-hostdatastoresystem-queryvmfsdatastore-createoptions-for-object-ha-datastoresystem/

http://www.digital52.com/help/gptremoval.html


ESXi 5.0 Issue with ISP2432 HBA

So if you are upgrading to ESXi 5.0 and have these HBAs, they are not supported. Doing a rescan will not pick up newly presented LUNs.
We have to issue a LIP (loop initialization primitive) to get the LUNs seen, which is disruptive to I/O.

~ # echo "scsi-qlalip" > /proc/scsi/qla2xxx/5
~ # echo "scsi-qlalip" > /proc/scsi/qla2xxx/6
~ #
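For context, a minimal sketch of the surrounding workflow; the per-HBA numbers under /proc/scsi/qla2xxx/ are the instances to target, and the vmhba name passed to esxcfg-rescan at the end is just an example:

    # List the qla2xxx instances on this host; the numbers map to the HBAs.
    ls /proc/scsi/qla2xxx/
    # Issue a LIP on each instance (disruptive to in-flight I/O).
    echo "scsi-qlalip" > /proc/scsi/qla2xxx/5
    echo "scsi-qlalip" > /proc/scsi/qla2xxx/6
    # Rescan the adapter afterwards so the new LUNs show up.
    esxcfg-rescan vmhba1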