Storage Cluster; a year in review

It has been 17 months since I deployed the Nexenta cluster for our VMware hosting platform at ipHouse.

Unfortunately this post will not be positive.

Storage system related problems on our Nexenta HA cluster:

  • Thanksgiving weekend, 2010
  • February into March, 2011
  • October 22nd, 2011

Thanksgiving weekend of 2010 was not a good weekend. I later found that a customer virtual server was swapping inside the VM itself in a way that was thoroughly crushing the backend storage system. The problem was visible on only a few of the VMs, though, and it took me 2 days to find the badly behaving virtual server. In the end this was not a Nexenta problem.

And now we move on to the issues I trace back to Nexenta, and to Nexenta's failures to notify me of problems.

First, my screw-ups…

February into March of 2011 was not a fun 30 days.

A volume went offline and I did not know why. Bringing it back online was taking forever and this affected ~50% of our customers on the VMware cluster.

Finally, when the volume was mounted, I started to figure out what was going on.

The problem was simple: deduplication. The solution was just as simple, but the execution was very time-consuming.

It starts with the issue that for every 1 TiB of storage space you need ~8 GiB of RAM to hold the dedupe tables. My head units have 24 GiB of RAM in them and more than 10 TiB of usable storage each. I should have known this, but I did not, and googling around showed that this was not a very well-known piece of information at the time. (It is now, of course.)
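
If you want to estimate this on your own pool, zdb will dump the dedupe table statistics. A rough check; the exact output varies by ZFS build, and the ~320 bytes per in-core DDT entry is the commonly quoted estimate, not an official number:

(dump dedupe table statistics for the pool)
# zdb -DD volume01
(multiply the reported DDT entry count by ~320 bytes to estimate
the RAM needed to keep the tables in core)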

The solution was to turn off deduplication on the other head (and volume), Storage vMotion all customer virtual machines onto that storage system, then turn off deduplication on the source volume and reverse the process. This takes a very long time.
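
For reference, turning deduplication off is a single command, but it only affects newly written blocks; everything already on disk stays deduplicated until it is rewritten, which is exactly why the Storage vMotion shuffle is required:

(check the current setting, then disable dedupe for new writes)
# zfs get dedup volume01
# zfs set dedup=off volume01
(Storage vMotion the VMs away and back so every block is rewritten un-deduped)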

Now for the Nexenta items…

The problem with this is that there was a secondary issue that wasn't being reported: a failure of a ZIL device (mirrored). Removing the ZIL took care of many performance problems, and I ended up removing the ZIL from both head units and their volumes. They were never added back in.
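
For reference, getting rid of a log device is quick once you know it is the culprit. Log device removal needs pool version 19 or newer, and 'mirror-1' below is a placeholder; use whatever name zpool status shows under the logs section:

(find the log vdev in the pool layout)
# zpool status volume01
(remove the mirrored ZIL by its top-level vdev name)
# zpool remove volume01 mirror-1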

My issue with Nexenta in dealing with these problems? Nothing was reported in the GUI or the command line system about drive failure(s).

I was convinced that the problem was our SAS HBAs, so I replaced them with a different model (both HBAs are on the Solaris HCL) and rebuilt each head unit (again going through the long process of Storage vMotion) and the volumes. One of the things I like about the newer HBA is that the devices now use the WWN instead of just cxtxdx or sd nomenclature, though now I have to maintain a spreadsheet of which drive is in which slot.
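
One way to build that spreadsheet, for what it's worth: on a Solaris-based system, iostat -E prints vendor, product, and serial number per device, and the serial can be matched against the label on each drive carrier. For example:

(show vendor/product/serial for one WWN-named device)
% iostat -En c4t5000C500104EEE9Fd0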

Everything ran fine until October 22nd, 2011, when I was alerted about high I/O latency to one of the volumes on our Nexenta storage cluster (yep, full HA, bought and paid for).

So, taking everything I learned (and using lsiutil to look at things), I found, again, drives failing. Two of them this time, each in a different RAID group. As soon as I issued the commands to offline the devices, everything returned to normal in terms of I/O latency, but now I was missing drives from 2 different RAID groups and would need to replace them.
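
For the record, offlining them looks like this (the device names are the two failing drives that get replaced in the remediation further down):

(stop all I/O to the failing drives; each mirror keeps running on its surviving side)
# zpool offline volume01 c4t5000C500104EEE9Fd0
# zpool offline volume01 c4t5000C50020CB0F97d0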

My beef with Nexenta is that I should not have to manually go and look for drives failing out.

Nexenta should be reporting this to me.

Nexenta should be able to do this same work automatically without human interaction and then alert if failures are imminent or happening.

I bought Nexenta not because I don't know how to support my own systems but because it offered a simple way for my employees to help out in the storage realm (they aren't necessarily as hip to Solaris as I am) and to get high availability. My NetApp systems are more than capable of telling me when a drive is failing. Old systems using 3WARE controllers can tell me when drives are failing. Old systems using Areca controllers – the same!

Why can't Nexenta? I spent 5 figures on a storage management software suite that really is only doing HA for me. Issues with the underlying volumes and devices are now on my head to manage, monitor, and maintain. That's an expensive HA system, in my opinion.

Here is an example of what I did (not much of an example, as it is exactly what I did, but you can use it to do your own remediation):

(remove failure)
# zpool detach volume01 c4t5000C500104EEE9Fd0
(remove spare)
# zpool remove volume01 c4t5000C500104F191Bd0
(attach spare to mirror)
# zpool attach volume01 c4t5000C500104F6313d0 c4t5000C500104F191Bd0

(and the same sequence for the second failed drive)
# zpool detach volume01 c4t5000C50020CB0F97d0
# zpool remove volume01 c4t5000C500104F99C3d0
# zpool attach volume01 c4t5000C500104F67D3d0 c4t5000C500104F99C3d0

and in under 5 hours I was fully redundant again.

Want to do your own checking?

% iostat -en `zpool status volume01 | grep c4t | awk '{print $1}'`
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 c4t5000C500104EF0D3d0
    0   0   0   0 c4t5000C500104F6DD3d0
    0   0   0   0 c4t5000C50010330173d0
    0   0   0   0 c4t5000C500104EF433d0
    0   0   0   0 c4t5000C500104F99C3d0
    0   0   0   0 c4t5000C500104F6533d0
    0   0   0   0 c4t5000C500104F67D3d0
    0   0   0   0 c4t5000C500104F6313d0
    0   0   0   0 c4t5000C50020CAF9E3d0
    0   0   0   0 c4t5000C500104EF1D3d0
    0   0   0   0 c4t5000C50020CB1557d0
    0   0   0   0 c4t5000C50020CAFC57d0
    0   0   0   0 c4t5000C500104F6607d0
    0   0   0   0 c4t5000C500104F191Bd0
    0   0   0   0 c4t5000C500104F6C9Bd0
    0   0   0   0 c4t5000C500104F6DABd0
    0   0   0   0 c4t5000C50020C6102Bd0
    0   0   0   0 c4t5000C500104F190Bd0
    0   0   0   0 c4t5000C5000439FB2Bd0
    0   0   0   0 c4t5000C500104EE6EFd0
    0   0   0   0 c4t5000C50020CAFAFFd0
    0   0   0   0 c4t5000C500104F6D1Fd0

Replace ‘volume01’ with your volume name and ‘c4t’ with the starting characters of the device names in your volume.

You should see zeros in every column; if you do not, you should look further for hardware problems.
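
Want it checked automatically (the thing Nexenta should be doing out of the box)? Here is a minimal cron-able sketch; it assumes mailx is installed, and POOL and ADDR are placeholders to change for your setup:

#!/bin/sh
# alert when iostat -en shows a non-zero total error count ('tot', $4)
# for any device in the pool; NR > 2 skips the two header lines
POOL=volume01
ADDR=you@example.com
DEVS=`zpool status $POOL | grep c4t | awk '{print $1}'`
BAD=`iostat -en $DEVS | awk 'NR > 2 && $4 > 0'`
if [ -n "$BAD" ]; then
  echo "$BAD" | mailx -s "disk errors on $POOL" "$ADDR"
fi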

Good luck.

Sad to say but ipHouse won’t be purchasing further Nexenta licensing for our production network.

13 Replies to “Storage Cluster; a year in review”

  • Thanks for the writeup, Mike. This is the kind of information I need to hear before I purchase my Nexenta-based setup (which is why I have been googling around, doing my due diligence, and happened upon your blog).

    I would be interested to hear what they have to say about your problems.

    If you were going to do a solution like this again, would you use a ZFS-based setup, or would you just spend the extra coin for a proprietary appliance?

    Thanks again for sharing!

    • I love ZFS, I just don’t like spending money on something to manage my storage and have it fail out on me :(

      I am talking to multiple non-Nexenta based vendors now that are using ZFS but doing things their own way. I have asked specifically about this problem for each of them.

      One of them is actually giving me budgetary numbers better than my hand built Nexenta solution. We’ll see how that pans out.

      Yes, I’ll post again when I decide on our new storage vendor(s) and why they were chosen.

      • Sweet, looking forward to the update. The only other ZFS vendor we've talked to is the makers of FreeNAS (and their enterprise version, TrueNAS), and they were very competitive price-wise, but I wasn't able to find out much via Google as to real experiences with using them for VM storage in an ESX environment. Good luck and thanks again.

      • No need to make public (unless you want to).

        Still choosing vendor options myself; if you feel like throwing some more business towards your particular ZFS vendor, feel free to give them my email addy (it's in my profile).

        Good luck with your new solution!

        • Thanks!

          I’ll be posting stuff.

          I have 2 companies ready to deliver me some evaluation units. Both are set for a little later in the year with one towards the end of November and the other mid-December.

          The mid-December one is a ZFS-based system but they did all their own work on front-end management and all that jazzy jazz.

          The November one is a company building upon their own filesystem and made for virtualization solutions only. It is not a general purpose NFS appliance.

  • Thanks for posting this follow-up, Mike. I’m starting a new gig on Monday, and I’ll likely need to build some sort of storage appliance like this. I’ll be very interested in hearing about what post-Nexenta vendor you end up choosing.

  • I know you’ve moved on already to another storage platform, but I didn’t see in any of the above where you called in and opened a ticket.

    The other thing that jumps out at me is that if you’re buying equipment based on the Solaris HCL, and not the Nexenta HSL, you can, and probably will, run into issues, and this may have contributed to your lack of notifications.

    I wish I had been with the company when you were having these issues, but it looks like you’re well on a new path.

    Good luck!

    • Hiya and thanks for responding after all this time.

      This system is still on 3.0.4. The server equipment was on the Solaris HCL and I don’t really care about the Nexenta HSL considering that there was no such thing when I started this project. More on that in a bit.

      I, too, wish you would have been at the company and reached out. It isn’t like I haven’t been in touch in the past or vocal on Twitter and these blog posts. Thank you for calling today as well, I’ll try to get back to you via phone later but may be Monday.

      As far as tickets are concerned what would I have said? “Hey, a drive has gone into error mode, I can see the errors via iostat -en but Nexenta doesn’t notice.” Sure, I’d have loved to have thought that it would have resolved any issues or potentially helped someone else out. I just don’t have the faith in the process. Drives fail. Period. It doesn’t matter who the vendor is.

      Upgrading to the 3.1 series will be done as soon as I can finish getting the new storage in place, so I can see if 3.1 takes my issues into account. I'd even post about it. The lack of a non-disruptive upgrade path from 3.0 to 3.1 does suck for a minor point release.

      Back to the magical Nexenta HSL: I'd suggest you read my post (http://www.iphouse.com/blog/mike/2010/05/a-storage-cluster-is-born/) and find me something in my list of items that isn't on the HSL (http://www.nexenta.com/corp/images/stories/pdfs/nexenta_hsl.pdf) today. There was a change after last year's issues, as I swapped out the SuperMicro USAS HBA for an LSI SAS 9200-8e in both heads. The server is correctly listed as a SYS-6016T-NTRF4+ 1U chassis in my post while the HSL lists the 6026T edition, but we're talking almost 2 years since purchase. It does have the X8DTU-LN4F+ motherboard, which is contained in the HSL. And the SSDs are gone gone, toast, destroyed.

      Performance is horrible using 22 drives in mirrored pairs (11 mirrors) per volume. Each head has the same configuration and both are giving some icky latency. I'll post more about this later on.

  • Mike,

    I spent a lot of the weekend reading this and following up on case files. One of the best aspects of NexentaStor is our support. We can and do help our customers. We currently have on our roadmap tools that will help identify disk failures so that you can have your junior admins monitor the GUI without needing deep skills. You are right in thinking that is a core value of NexentaStor, that and an excellent support team. If you are seeing performance problems, we have probably seen the issues before; I suggest you file a case on the latency issues.

    We are always listening to our customers to see what is the next thing they need. I noticed you are in my city, and I would like to buy you a cup of coffee and hear what you think should be next. If you are interested, please email me at [email protected].

    As far as the HSL goes, we publish a new version every 2 weeks. We run on commodity hardware, and sometimes the vendors change firmware or BIOS settings, so modifications have to be made in real time. We do our best to support as many configs as possible and keep up with the industry to ensure we have a robust set of certified hardware. We need to recertify all the time as the industry moves quickly.

    Nexenta grew 400% last year. We hope to do that again this year and the year after that. With that growth, engineering and resources will grow too. Our products get better and better all the time. I look forward to working with you.
