Storage, storage STORAGE!

Since last month, our Nexenta-based storage cluster has been deployed, and I have now moved production data onto it.

A bump and bruise occurred last weekend (I had already sent out an announcement), and yesterday things burped again.

The problem? It looks like the two mirrored boot drives of the first head unit (each head manages its own volumes, and HA is used to make sure a single head failure doesn’t cause an outage) are … bad.

One drive has a full-on SMART failure reported via the BIOS. Interesting…so replace that drive with another from the shelf…and now the other drive is showing something ‘odd’. Yank it out, replace it, and move it to another machine for testing.

And it is failing as well! Four hard disks, two per head unit, and the two that fail are in the same head unit. What are the chances of that? (The second head unit does not exhibit any of the same symptoms.)

Now, how do failing boot drives really cause problems? I don’t know exactly, but if a scrub is started on the boot volume (the one with the bad disks), the system stalls intermittently, causing delays in I/O to the NFS-mounted volumes.
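
For anyone wanting to poke at the same thing, the scrub side of it is just the standard ZFS commands. The default NexentaStor boot pool is called syspool; adjust if yours differs:

    # start a scrub of the boot pool and watch its progress / error counters
    zpool scrub syspool
    zpool status -v syspool

    # cancel the scrub if I/O to the data volumes starts stalling
    zpool scrub -s syspool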

VMware ESX handles this gracefully, but some older operating system releases don’t like to wait on I/O to complete. FreeBSD 6-STABLE (6.x with x > 2) is not doing so well here, while 7-STABLE and 8-STABLE are doing great. Ubuntu 8.04 and 10.04 also weathered the issue just fine; one Windows Server 2003 system was burping, while the rest of the Windows Server-based systems were unaffected.

A very frustrating issue to run into, and quite difficult to track down.

Log entries from the VMware virtual machines helped immensely, but it took a while of correlating things to see that the issue was coming from that head unit, and then even more time to find what was wrong with the head unit itself.

Now the issue is resolved, and I’ll need to rebuild the now-‘bad’ head unit with new disks (I don’t trust the data on the replacement disks right now, given the multiple problems on the previous drives) and bring it fully back into the cluster.

Statement: Nexenta is not at fault at all; it is mentioned only because that is what we are using. The issue is hardware, not software, and no blame should be attributed to Nexenta.

9 Replies to “Storage, storage STORAGE!”

  • Were you able to do any testing (i.e., even pulling the SMART data) to determine if the issue was with the hard drives themselves, or if the ZFS filesystem on top somehow developed corruption on its own? Curious minds and all that… ;)

    • One had just plain failed, and it failed miserably on another system as well.

      The other had a SMART failure, and it showed up that way on the other system as well.

      Just horrendous luck it seems :(

      ZFS would normally correct itself, but when both drives are doing funky things, funky things happen and the system becomes unstable.
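
      For the curious, pulling the SMART data off a suspect drive on another box is roughly the following with smartmontools; the device name is just a placeholder:

        # overall health verdict and the raw SMART attribute table
        smartctl -H /dev/sda
        smartctl -A /dev/sda

        # run the drive's own long self-test, then read back the result
        smartctl -t long /dev/sda
        smartctl -l selftest /dev/sda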

      • Yeah, horrible luck, sheesh!

        Glad to hear it was plain ol’ physical failures and not ZFS corruption though… I’ve heard a lot about ZFS corruption with semi-failing disks in previous versions, but it’s all supposed to be fixed now… :)

  • I was wondering how you are finding the reliability and performance? We are considering building a clustered Nexenta solution. Ours would have 144GB of RAM in each head, with 3 x 45-disk JBODs each holding 43 x 1TB SAS 7.5 krpm disks in a 3-way mirror (1 disk in each JBOD), plus 4 STEC 8GB ZeusRAM SSDs (2 in one JBOD and 1 in each of the others), and then 8 x C300 SSDs, 4 in each head. I was hoping you could give a little info on your performance. Thanks

    • Hiya.

      First, I don’t think you need that much RAM, but if you have it, flaunt it!

      I’m going to assume the ‘8GB ZeusRAM SSDs’ are for the ZIL (intent log), i.e. write acceleration – make sure they are mirrored.

      I’m going to assume the 8 x C300 SSDs are the Crucial RealSSD C300 models and that these are for read caching (L2ARC). No RAID needs to be done – the blocks are spread over them and used as needed.

      Using the large mirror setup should give some fantastic read I/O, upwards of 20K concurrent I/Os per second (depending on how random the workload is), because of the 3-way mirroring and the front-end SSDs caching the requested blocks.
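
      Once it is built, you can sanity-check that the log and cache devices are actually taking the I/O with the per-vdev stats (‘tank’ is just a placeholder pool name):

        # per-vdev I/O, refreshed every 5 seconds; the 'logs' and 'cache'
        # sections show whether the ZeusRAMs and C300s are being hit
        zpool iostat -v tank 5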

      Our setup does not use any SSDs for offloading reads; ours are just used in a mirrored configuration for the ZIL.

      I have one volume using raidz1 with 7-drive groups (3 of them) and a second volume using 10 x 2-way mirror sets. Both volumes use a ZIL on mirrored SSDs.

      Performance? Everything is quite good for our needs. We only store VMware VMDKs on this over NFS; we do not use iSCSI or the CIFS implementation.
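
      To make the layouts concrete, here is a rough zpool sketch of both your proposed setup and ours. Device names are made up and this is illustration only, not our actual build script:

        # your proposal: 3-way mirrors with one disk per JBOD, a mirrored
        # ZeusRAM pair as the ZIL, and the C300s as plain (unmirrored) cache
        zpool create tank \
          mirror c1t0d0 c2t0d0 c3t0d0 \
          mirror c1t1d0 c2t1d0 c3t1d0 \
          log mirror c4t0d0 c4t1d0 \
          cache c5t0d0 c5t1d0 c5t2d0 c5t3d0
        # ...and so on for the remaining 3-way mirror vdevs

        # ours, roughly: one volume of three 7-drive raidz1 groups and one
        # volume of 2-way mirrors, each with a mirrored SSD ZIL and no cache
        zpool create vol1 \
          raidz1 c10t0d0 c10t1d0 c10t2d0 c10t3d0 c10t4d0 c10t5d0 c10t6d0 \
          log mirror c12t0d0 c12t1d0
        zpool create vol2 \
          mirror c11t0d0 c11t1d0 \
          mirror c11t2d0 c11t3d0 \
          log mirror c12t2d0 c12t3d0
        # ...plus the remaining raidz1 groups and mirror pairs via 'zpool add'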

      • Hi,

        Thanks for replying. Yep, the 8GB ZeusRAM drives will be used for the ZIL, mirrored, with the mirror halves in different JBODs (so a JBOD can fail). Same reason for the 3-way mirror: so a disk or JBOD can go down and we still have redundancy.
        With the L2ARC, yep, it’s the Crucial, 4 x 256GB for each head.
        There is that much RAM in it because we have a defined budget for the project and this comes in under it, so we maxed out the RAM – more RAM is better :) Also, this will be used for multiple VM hosts plus a few DB servers, 2 of which have a requirement of 3000+ IOPS with a DB size of 2TB+, along with needing it to be as reliable as possible. All accessed via iSCSI.
        We have been trying to decide whether to go for this type of solution or a traditional SAN. Compellent was out of budget, EqualLogic could be within it, NetApp is within it (not counting support :) ) with a fair number of disks, and Coraid is within it but doesn’t come with anywhere near as much hardware as this. All have their advantages and disadvantages. Difficult choice :)

        So I was hoping to get a little insight into how Nexenta has been for you: reliability-wise, performance, and usability. We are currently testing it on some older hardware, a Dell PowerEdge 6950 with an MD1000 (14 x 15 krpm 146GB disks), with no dedicated ZIL or L2ARC.

        • So far, so good, relatively speaking.

          I haven’t posted about it yet – we just had an issue with the duplicate request cache filling up (NFS-related), which caused the VMware hosts to get all pissy and go through cyclic remounts of the NFS mounts. When I went to fail over the volume onto the other head, a new problem reared its head: we use dedupe, and the time to load the DDT (deduplication tables) is longer than the HA timeout, so the pool was importing and then aborting. After 3 cycles of this I restarted things from scratch by setting the volume to manual, doing the import by hand, waiting the 90-ish minutes, and then telling HA to go automatic, and things have been fine since.

          I am in the process of moving all the data from this volume onto another one without dedupe turned on, and then I’ll move it back. Thankfully we have full VMware licensing, so I can do it via VMware’s Storage vMotion.
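
          If anyone else hits the dedupe/DDT import problem above, the size of the table is visible from the pool itself, and the manual step boils down to a plain import at the ZFS level (pool name is a placeholder; on NexentaStor you may drive this through the appliance shell instead):

            # one-line summary of dedup table entries and the space they occupy
            zpool status -D tank

            # fuller DDT histogram, i.e. how much has to be read in at import
            zdb -DD tank

            # import the volume's pool by hand, let the DDT load, then flip the
            # HA service back to automatic through the cluster plugin
            zpool import tank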
