Friday, May 20, 2011

Followup: ZFS Data Gone

In February I blogged about a nasty data loss event with ZFS:

http://christopher-technicalmusings.blogspot.com/2011/02/zfs-data-gone-data-gone-love-is-gone.html

I've been quite busy since then and haven't followed up on the results. As a few people have been asking if I was able to get the data back, here is my answer:

Yes, I did get most of it back, thanks to a lot of effort from George Wilson (great guy, and I'm very indebted to him). However, any data that was in play at the time of the fault was irreversibly damaged and couldn't be restored. Any data that wasn't active at the time of the crash was perfectly fine; it just needed to be copied out of the pool into a new pool. George had to mount my pool for me, as it was beyond my non-ZFS-programmer skills to mount. Unfortunately Solaris would dump after about 24 hours, requiring a second mounting by George. It was also slower than cold molasses to copy anything in its faulted state. If I was getting 1 MB/sec, I was lucky. You can imagine the issue that creates when you're trying to evacuate a few TB of data through a slow pipe.

After it dumped again, I didn't bother George for a third remounting (or rather, I tried only half-heartedly; he had already put a lot of time into this, and we all have our day jobs), and abandoned the data that was still stranded on the faulted pool. I had copied my most-wanted data first, so what I abandoned was a personal collection of movies that I could always re-rip.


I was still experimenting with ZFS at the time, so I wasn't using snapshots for backup, just conventional image backups of the VMs that were running. Snapshots would have had a good chance of protecting my data from the fault that I ran into.


I was originally blaming my Areca 1880 card, as I was working with Areca tech support on a more stable driver for Solaris, and was on the 3rd revision of a driver with them. However, in the end it wasn't the Areca, as I was very familiar with its tricks: the Areca would hang (about once every day or two), but it wouldn't take out the pool. After removing the Areca and going with just LSI 2008 based controllers, I had one final fault about 3 weeks later that corrupted another pool (luckily it was just a backup pool). At that point, the swearing in the server room reached a peak, I booted back into FreeBSD, and haven't looked back.

Originally when I used the Areca controller with FreeBSD, I didn't have any problems with it during the 2 month trial period.

I've had only small FreeBSD issues since then, nothing else has changed on my hardware. So the only claim I can make is that in my environment, on my hardware, I've had better stability with FreeBSD than I did with Solaris.

Interesting note: One of the slow-downs with FreeBSD compared to Solaris in my tests was the O_SYNC method that ESX uses to mount an NFS store. I edited the FreeBSD NFS source to always do an async write, regardless of the O_SYNC from the client, and that perked FreeBSD up a lot for speed, making it fairly close to what I was getting on Solaris.

I'm not sure why this makes such a difference, as I'm sure Solaris is also honoring the O_SYNC flag. I do know that the NFSv3 code in FreeBSD is very old, and a bit cluttered; it could just be inefficiencies there, and the async hack gives it back speed it loses in other areas.
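To see why forcing async writes helps so much, here's a minimal sketch (in Python rather than the FreeBSD kernel C code) that times the same workload with and without O_SYNC on a local file. O_SYNC forces every write to hit stable storage before returning, which is exactly the guarantee the client asks for and the guarantee my hack threw away. The chunk counts and sizes here are arbitrary illustration values, not anything from my NFS setup.

```python
import os
import tempfile
import time

def write_chunks(flags, count=200, size=4096):
    """Write `count` chunks of `size` bytes and return the elapsed time.
    Passing os.O_SYNC makes each write block until it reaches the disk,
    mimicking a synchronous NFS write; flags=0 lets the OS cache them."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    fd = os.open(path, os.O_WRONLY | flags)
    data = b"x" * size
    start = time.time()
    try:
        for _ in range(count):
            os.write(fd, data)
    finally:
        os.close(fd)
        os.unlink(path)
    return time.time() - start

if __name__ == "__main__":
    async_time = write_chunks(0)
    sync_time = write_chunks(os.O_SYNC)
    print(f"async: {async_time:.3f}s  sync: {sync_time:.3f}s")
```

On spinning disks the O_SYNC run is typically much slower, since each 4 KB write waits for a head seek and platter rotation; that per-write latency is what ESX pays on every NFS write, and what the async hack avoids (at the cost of losing data in the write cache if the server crashes).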

FreeBSD switched to an NFS 4.1 server by default as of the last month, and I'm just starting my stability tests with a new FreeBSD-9 build to see if I can run newer code. I'll do speed tests again, and will probably make the same hack to the 4.1 NFS code to force async writes. I'll post an update when I get that far.

I do like Solaris. After some initial discomfort about the different way things were being done, I do see the overall design and idea, and I now have a wish list of features I'd like to see ported to FreeBSD. I think I'll have a Solaris-based box set up again for testing. We'll see what time allows.

Monday, May 9, 2011

Captchas for everyone

I sometimes wonder if there is something wrong with me.

A captcha is a quick little test to make sure that a human, not a bot or script, is requesting information from a website.

reCAPTCHA seems to be a very popular captcha that is backed by Google. It's on a lot of the sites that I use, and I can almost never complete the damn thing without multiple tries.

Do we need letters so cryptic that even humans have problems reading them? Can't we use other tricks that require a bit of reasoning instead of a distorted glyph that could equally be a c, an e, an a, or an o? Maybe two words that are related, instead of "word-like" prompts?

Audio really isn't much better. I'm at 5 attempts from the audio prompts, and I was just asked to spell out "Their". Or was it "There"? I don't know; must we use homophones in an audio captcha? Isn't that just toying with the poor user who's trying to decipher these crazy voices? I think this is what insanity must be like: strange sounds swirling around my head, punctuated by occasional nonsense words.

In my mind, the deductive questions I see on some sites are the best. They ask questions like "If Tom has two apples and George has five apples, how many apples does Tom have?" Questions like this are easy to assemble from random names and numbers, and are just as secure against automated attacks as a crazy jumbled set of letters in a proto-word that no one recognizes.
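A sketch of what I mean by assembling these from random values, with made-up names and a trivial answer checker (none of this is from any real captcha library):

```python
import random

# Hypothetical name pool for generated questions.
NAMES = ["Tom", "George", "Alice", "Maria"]

def make_question():
    """Build a simple word-problem captcha and its expected answer."""
    a, b = random.sample(NAMES, 2)
    x, y = random.randint(1, 9), random.randint(1, 9)
    question = (f"If {a} has {x} apples and {b} has {y} apples, "
                f"how many apples do they have together?")
    return question, x + y

def check_answer(expected, reply):
    """Accept the user's reply as a digit string, ignoring whitespace."""
    return reply.strip().isdigit() and int(reply.strip()) == expected

if __name__ == "__main__":
    q, answer = make_question()
    print(q)
```

A bot would need to parse the sentence and do the arithmetic, which is a higher bar than OCR on wobbly letters, and the question stays perfectly readable to a human.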

The current captcha is probably decent for non-English speakers, but then again only if they recognize the Latin alphabet. If they have trouble there, being presented with English audio cues won't help them at all.

I just want to use the site. Can someone come up with a better method?