Friday, December 31, 2010

OpenBSD 4.8 vs FreeBSD 8.1 em0 Network Performance

As mentioned in one of my earlier postings, I have been looking into OpenBSD as a possible firewall OS.

I'll post a more opinion-based article shortly on what I think about OpenBSD vs FreeBSD, but for now, I thought I'd report this tidbit of info;

When benchmarking with netperf, OpenBSD 4.8 wasn't as fast as FreeBSD 8.1 until I tweaked some sysctls.

OpenBSD 4.8 - 520.49 Mbits/sec
FreeBSD 8.1 - 941.49 Mbits/sec

I then applied the OpenBSD speed tweaks found here: https://calomel.org/network_performance.htm and was able to match FreeBSD's performance in netperf.

While this was with the Intel em driver, I believe the difference would show up with other drivers as well, since the sysctls adjusted aren't specific to em.
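For the curious, the tweaks amount to a handful of /etc/sysctl.conf entries along these lines - the values below are only illustrative, so check the calomel.org page for their current recommendations:

# larger TCP send/receive buffers
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
# deeper IP input queue for sustained gigabit traffic
net.inet.ip.ifq.maxlen=512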

Conclusion? Make sure you're applying these tuning tips if you're running OpenBSD 4.8.

Friday, December 17, 2010

Building a Ghetto-SAN - Part 1 - Basic Considerations

What is a Ghetto-SAN?

It's a SAN built on the cheap with whatever you can get your hands on.

In my situation, that doesn't mean sacrificing any reliability or speed; it simply means putting together a lot of parts that may not typically show up in enterprise SAN deployments.


I built a 6 TB FreeBSD/Samba server roughly two years ago when 1.5 TB drives first came out, but we've long since outgrown that. I could build another 12 TB easily enough, but that may not last me the year, and I don't want to start piling more servers around the office. I need consolidation as well as massive storage.

I project that we need at least 24 TB to make it two more years, and 36 TB may be preferable. Some of my data recovery projects are chewing up 3-5 TB, and the complex ones involving law firms as clients sometimes mean I need to hold on to that data for three months before I can delete it.

I need a SAN, but I can't afford even the "affordable" SANs from EMC, EqualLogic, etc. These things start at $100k for the features and storage density that I need today.

So I decided to build my own, basing it on open-source software (thanks guys!), ZFS, SATA drives, and whatever else I needed to make this thing the foundation of our little data centre. I'm flip-flopping between FreeBSD/Solaris/Nexenta as my SAN OS, but that's another topic.

It's been over six months since I started building and testing the environment, and I've learned a lot. This is not a task for the faint of heart, nor for those who don't like to do a lot of testing.

I plan on posting more information about what I've learned, my SAN specs, etc. over the next few months, but first I wanted to quickly comment on what I feel are two dead-end paths for some people trying to design their own Ghetto-SANs.

Mistake #1 - Getting hung up on controller/disk bandwidth when you don't have the network for it

You need to realize that a 6-drive FreeBSD/ZFS raidz on a 3-year-old Core 2 system with Intel's ICH7 chipset and 4 gigs of RAM can saturate a 1 Gb network port - at least mine could.

This was my test bench for early SAN performance stats. It's when I knew I needed to look into faster network performance (I eventually settled on 10 Gb Ethernet rather than Fibre Channel or InfiniBand, but that's also another post).

If you're not talking about a 10 GbE network as the backbone for your SAN, don't worry about your drive performance. There's no need to buy separate SAS controllers and SAS drives and make sure everything is connected end-to-end.

Just get a simple SAS HBA for ~$200, hook it into a SAS backplane (the old 3.0 Gb/s ones are dropping nicely in price now), and you're going to be able to run that 1 Gb network port into the ground... and it will probably be faster than a basic 4-drive RAID-10 Direct Attached Storage (DAS) setup for a lot of things, thanks to ZFS having a great caching system. It beats the hell out of a PERC6 or H700 4-drive RAID-10 with 300 gig 6.0 Gb/s SAS drives. You do have more latency, but the throughput is at least double in my tests, which more than makes up for the 100 ms access time. Every VM I've moved to my test SAN has performed better, and "felt" better, than the DAS units.

So you have two 1 GbE ports that you're going to team? Same story - I bet you'll be running Ethernet into the ground before your disks are starving for I/O.

With 4 Ethernet ports, you may have cause - but you still need to do some testing. I decided on a single SAS 6.0 Gb/s HBA and SAS 6.0 Gb/s backplane for my SAN, and it's working out great for me. 


Mistake #2 - Putting all your eggs in one basket

Everything fails. Everything.

If you have just one of anything that you depend on, you need to go read that last line again. I've got over 20 years of experience in IT, and trust me when I say that the only thing you can count on with technology is its eventual failure.

I have dual everything in my data centre. Dual UPSes connected to dual battery banks, feeding dual PDUs to servers that all have dual power supplies, so each server is fed from two UPSes in case one of them fails under load when the power cuts out and the generator hasn't started yet. Dual firewalls, dual ESX servers, etc. You get the picture.

Same for your SAN. It's far better to have two cheaper SAN boxes than one more expensive one.

I've gone with a fairly expensive Primary SAN (redundant backplanes, power, network, etc.). It will be my main go-to box, and will deliver the day-to-day performance that I need to stay sane.

I will also have a Secondary SAN that will be run off an older server that will save my ass if the primary goes down. It won't be very fast at all, will be tight on storage, and low on RAM - BUT it will hold a copy of all my critical data, and a very recent snapshot backup of the main critical servers so I can get my network up and running within 15 minutes in case of failure in the main SAN.

This is completely redundant hardware, done on the cheap - and you need something like this if you care about your data. I just need to stuff more drives into an old case, set up some zfs send/recv and snapshot jobs, and I'm done.
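For reference, the replication job would be little more than a snapshot plus an incremental zfs send piped to the secondary box - a minimal sketch, with made-up pool/dataset names and hostname:

# take today's snapshot on the primary
zfs snapshot tank/vms@2010-12-17
# send the changes since yesterday's snapshot to the secondary SAN
zfs send -i tank/vms@2010-12-16 tank/vms@2010-12-17 | ssh backup-san zfs recv -F backup/vms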

I'm currently lucky - Not losing data is my primary concern. Quick repair is my second, but my clients will tolerate a rare incident that may result in their data access being interrupted for 30 minutes, followed by a day of slow service until we get the primary back on-line. Not everyone may be that lucky, so make sure you take the worst-case scenario into account when you start designing your SAN.


--

I hope to follow up on this more regularly as my SAN timidly transitions from testing into full production. Comments and opinions welcome.

Intel X520-DA 10 GbE Network Card and FreeBSD 8.2-PRE and 9.0-CURRENT

I love my Intel X520-DA. It's two SFP+ 10 GbE network ports on one PCIe x4 card.

Out of the box, it works great with FreeBSD 8.1, even if you ignore Intel's optimization advice.

Currently at $380 a card (retail close to $500), it's part of my Ghetto-SAN foundation. No need for an expensive 10 GbE switch when you can afford to put two of these in each machine and make a simple point-to-point SAN network.

However, under FreeBSD 8.2-PRE and 9.0-CURRENT (neither is an official release yet) it won't work properly if you set the card to an MTU of 9000 or higher.

The reason is that it runs out of buffers to handle traffic of that size.

The quick fix is to put these settings into your /etc/sysctl.conf

hw.intr_storm_threshold=9000
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbop=262144
kern.ipc.nmbjumbo9=48000

The kicker is the last line - nmbjumbo9 - By default FreeBSD allocates 6400, which isn't enough to handle what this card can produce. This isn't some little 100 Mbit card.. that's 10,000 Mbit that it needs to push. Expect to require deep buffers.

Remember to check your netstat -m output to see if you are close to exhaustion with nmbjumbo9 at 48000 - Mine was using ~36000 after some load testing.
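Something like this pulls out the relevant line (the four numbers are current/cache/total/max for the 9k clusters):

netstat -m | grep "9k jumbo"

If the total is creeping toward the nmbjumbo9 max you set, raise it further.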

Here's some additional information on settings for the card, and drivers, etc.

http://www.intel.com/support/network/sb/CS-025829.htm

http://downloadcenter.intel.com/SearchResult.aspx?lang=eng&ProductFamily=Network+Connectivity&ProductLine=Intel%C2%AE+Server+Adapters&ProductProduct=Intel%C2%AE+Ethernet+Server+Adapter+X520+Series

Wednesday, December 15, 2010

lagg performance penalty between ESXi 4.1 and FreeBSD 8.1

While doing some benchmarks of my ZFS/NFS/ESX setups, I started fiddling with the lagg driver to create a fail-over connection between my ESX server and the ZFS server.

Both boxes have the Intel X520-DA 10 GbE adapter, which has two 10 Gig ports.

ESXi doesn't offer proper load balancing between two ports unless you get into the Cisco Nexus vSwitch, which I currently don't have.
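For what it's worth, the failover lagg on the FreeBSD side is only a few rc.conf lines - a sketch assuming the X520 ports show up as ix0/ix1 and using a placeholder address:

# /etc/rc.conf
ifconfig_ix0="up"
ifconfig_ix1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport ix0 laggport ix1 192.168.88.2 netmask 255.255.255.0"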


I ran a few quick tests with my usual Windows 2003 32-bit VM and Performance Test 6.1, and here's what I found;



Type of Test     With lagg    Without lagg    % Diff
File Server        478.16          491          2.6
Webserver           84.19           86.46       2.6
Workstation          0.52            0.6       13
Database           127.88          136.15       6

Every test was slower with lagg being used.
This is something interesting to think about, as a lot of admins enable lagg or similar failover technology without thinking that there may be a performance impact. 

At this stage, I don't know if it's on the lagg side, or ESX's side - I'd have to run further tests. I'll add it to the list of things that I'd like to know if I had time.

But for now, I know that I'll take a small performance hit to have this fail-over setup.

Sunday, December 5, 2010

Netperf and SMP - Oddness (Part 2 of 2)

To continue my previous post.. http://christopher-technicalmusings.blogspot.com/2010/12/netperf-and-smp-oddness.html

I was concerned that the poor performance of netperf when it was run on a multi-processor system was due to SMP overhead.

The only way to know for sure would be to run the test with netperf and netserver on separate (but identical) machines. That way I could ensure data was transferring across the network cable, and not being accelerated by any buffer-copy process within the processors or system.

I once again set up my netperf and netserver test, using FreeBSD 8.1 AMD64 on two Dell PowerEdge 1950s.

In one set of tests, I restricted both ends to a single CPU. In the other, I let the system choose which CPU to run on (which results in the process jumping CPUs a number of times as the scheduler moves the threads to whichever CPU is most available - as seen with top -P).


The command was;

netperf -H 192.168.88.1 -L 192.168.88.2 -t TCP_STREAM -l 300 -f  m
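For the single-CPU runs, both ends were pinned with cpuset the same way as in Part 1 - roughly this, on the receiving and sending box respectively:

cpuset -c -l 0 netserver -L 192.168.88.1 -n 2
cpuset -c -l 0 netperf -H 192.168.88.1 -L 192.168.88.2 -t TCP_STREAM -l 300 -f m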

The results show no difference between the multi-processor and single-processor runs. In either test, I saw 941.49 Megabits/sec, regardless of single or multiple CPUs.

Which means that there are some serious optimizations at work when moving data between two network cards in the same physical machine, as seen in my previous tests. The same tests run from one internal NIC to the other result in 4500 - 9000 Mbits/sec, depending on whether it's all through one CPU or across multiple CPUs.

I obviously need to spend more time understanding this, as it would make a large difference when you're considering a firewall application, or anything that needs to move a lot of data between interfaces.

I welcome any insights..

Danger with Dell's H700 RAID card and FreeBSD 8.1-9.0

My new Dell PowerEdge T710 came with a fairly decent Dell PERC RAID card - the H700 with 512 MB of battery-backed cache. It's an LSI 2108-based solution, and was providing far better performance than their older PERC5 or PERC6 solutions. It's even 6.0 Gb/s.

However - It's still a Dell RAID card, which are known for their slow speed. What do you expect, they need to protect their profit margins. :-)

Slow speed was something I was prepared to deal with - but that wasn't my issue. Specifically, I could never get a stable system using the mirrors I made with the H700.

It would take anywhere from a few hours to a day, and I'd start to see device timeout errors popping up on the console for one of the RAID mirror devices. I knew something was up because my NFS connection to this box would die.

Oddly enough, the FreeBSD system was up, responsive, and I could browse the directories. I could not for the life of me get ESX to see my NFS shares on this box anymore. It didn't matter what services I restarted, it was down.

A reboot, and everything would be fine, until the next set of errors on the console.  Very frustrating as you can imagine.

After some research, I've found that the LSI drivers are not the best in FreeBSD due to corporate disclosure issues. I found this to be a shame, as I've loved LSI products for years in a Windows environment, and have even used their older chipsets without issue in FreeBSD as well.

If LSI would work with the open-source community a bit more, maybe we'd have a stable driver for it. At this stage, I'd say it's not a good idea in a production FreeBSD box.

So I needed a new brand to depend on, and this time around I was going to choose a company that had solid FreeBSD support.

Research returned good opinions about the Areca brand of controllers. Not only are they often at the top of the benchmarks against other brands of RAID card, but they have native FreeBSD drivers - source or a precompiled loadable kernel module.

I dove in and purchased an Areca ARC-1880ix-12, shown below.
It may be a bit of overkill, but this thing can be expanded to 4 gigs of RAM, has a dedicated RJ-45 port on the back for full web management, and it's currently at the top of the SAS 6.0 RAID HBA speed charts.

I also really like the 4 SAS 6.0 ports. I'll explain why in a future blog article when I detail my SAN build.


After a month of running this card, I've yet to have an odd panic, disk issue, or other crash with my SAN. It's rock solid with no complaints.

It's been so long that I don't even have the logs of the original error message that the H700 would throw.. sorry.. but you'll see it if you hook one up with FreeBSD. :-)

If anyone gets an H700 running cleanly with a RAID mirror, let me know. I still have my H700; it's destined for eBay, or possibly a spare for a Windows-only box.

Friday, December 3, 2010

Netperf and SMP - Oddness (Part 1 of 2)

Netperf seems to still be a fairly standard network performance tool. I see iperf out there as well, and interestingly enough, it generates very different numbers from netperf.

I've decided to go with netperf for my benchmarking needs, and have started running some simple tests with it to become more familiar with its operation.

The first oddness I notice is with SMP. I get wildly different results depending on whether multiple CPUs are used.

Let's start with what I'm running on: FreeBSD 8.1 AMD64 on a Dell PowerEdge 1850 with two Xeon 5340 CPUs. These machines have two Intel 1000MT NICs built into the board, which I have given 192.168.88.1 and 192.168.88.2. They use the em driver. I've connected them together with a crossover cable.

FreeBSD is setup as stock, installed from the CD, no changes.

I'm using cpuset to drive the applications to one CPU or the other. Here's what I'm executing;

cpuset -c -l 0 netserver -L 192.168.88.1 -n 2

cpuset -c -l 0 netperf -H 192.168.88.1 -L 192.168.88.2 -t TCP_STREAM -l 300

By specifying the different IPs, I'm forcing the data to move from one NIC to the other. It's running IPv4, not IPv6.

This combination drives both the receive server (netserver) and the test program (netperf) from the same CPU. If I want to make them run on different CPUs, I change one of the '-l 0' arguments to '-l 1'. If I want to leave it up to the kernel to schedule, I leave out the cpuset command entirely.

All hyperthreading is turned off. These are two standalone CPUs.

Here's what I'm getting, expressed in gigabytes per second:

Same CPU:  1.05 GB/sec
Different CPU: 0.47 GB/sec
No Preference: 0.88 GB/sec.



Very interesting..

We're looking at half speed when we run it on different CPUs. When we don't set a preference for the CPU, it will flip-flop between the two, sometimes both on one, other times separate. The no-preference speed is almost exactly a split of the single-CPU and dual-CPU speeds.

I've researched this online, and found a few other people mentioning similar issues, but the threads never come to a conclusion.

There are two reasons I can think of for this wide spread between single- and dual-CPU speeds;

1) Because all the work is happening on one CPU, there is some sort of cache/memory/buffer combining that allows for a faster transfer of data on the PCI bus. Maybe the data isn't transferring - but I do see the little link lights blinking away furiously when I run the tests.

2) There is significant overhead between the processors for SMP.


I do have a second identical PowerEdge 1850 that I plan on bringing into this equation shortly to try and figure out where this is coming from.  By sending to a separate machine, I'm going to eliminate the possibility that the CPU is combining something.

However, if you're looking to make a firewall run quickly, it looks from this first small test like a single-CPU firewall will outperform a dual. That's an early conclusion, and I'll post more shortly when I know more.

If anyone has more info on this, that would be great.

Continued here..


http://christopher-technicalmusings.blogspot.com/2010/12/netperf-and-smp-oddness-part-1-of-2.html

Thursday, December 2, 2010

Switching to OpenBSD from FreeBSD for the new pf syntax

There is a new syntax available for pf in OpenBSD 4.7 and 4.8 that is quite interesting. You can read a bit more about it here;

http://marc.info/?l=openbsd-misc&m=125181847818600

The item that has me most interested is the new NAT feature set. They've changed it so that when you do NAT in the firewall rules, it appears to change the address on the fly to the new IP.

This makes matching rules further down in the ruleset more interesting and, in my mind, clearer, because as you run through the ruleset you're not juggling pass rules for both the WAN and LAN addresses - the LAN address becomes the WAN address, so there really is just the one rule.
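To give a feel for the change, here's the same outbound NAT written both ways (the interface and network macros are placeholders):

# old syntax (pre-4.7)
nat on $ext_if from $lan_net to any -> ($ext_if)

# new syntax (4.7/4.8) - NAT is now an option on a match or pass rule
match out on $ext_if from $lan_net to any nat-to ($ext_if)
pass out on $ext_if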


It looks like it will be a while before FreeBSD picks up the new syntax. Currently plans are to update FreeBSD-9.0 to OpenBSD 4.5's version of pf. If you want to play with it, you'll need OpenBSD for now.

Since I'm designing a pretty hefty dual redundant firewall from scratch, complete with ALTQ, pfsync, OpenVPN, load balancing, failover, and some monitoring tools, I'm firing up an OpenBSD 4.8 box now to check it out and see if it's really as good as it seems.

BTW, here is a link to a conversion script that should help you convert to the new format:

http://jim-code-rand.blogspot.com/2010/05/openbsd-47-release-pfconf-conversion.html


I'll report back as I make progress. Since I have two identical Xeon machines to act as firewalls, I may have a chance to do a small performance test between OpenBSD and FreeBSD. I'm not sure where the advantage will be. I have a lot of faith in FreeBSD, but I also know that pf and ALTQ are ported over in FreeBSD, whereas in OpenBSD they are built in and have a few more features.


Time will tell.

If anyone else has recently made the switch, I'd love to hear about it..

Thursday, October 7, 2010

Porting ICC to FreeBSD

I'm intrigued by the possibility of more speed out of FreeBSD by using Intel's C compiler. It has demonstrated some impressive benchmarks against gcc in the past, and we all know the version of gcc (4.2.1) that is included in FreeBSD 8.x (due to GPL license issues) is quite old, and doesn't properly recognize a Core 2, let alone an i7 or Atom processor.

Here's my posting to the freebsd-ports mailing list;

--

My first stage on this will be feasibility. I'm a pretty busy guy, so I don't want to waste time on a task that I won't complete. I'll either do it all, or I won't bother. It should take me a few weeks to get enough of a feel for things to make that call.

First stage is working with ICC on Linux to make sure I understand how it behaves. I don't think the distro will matter too much, but if anyone has a suggestion, I'm open to it. I'll be able to do some basic comparisons with a newer GCC at the same time... or perhaps I'll go back in time to an older distro that has GCC 4.2.1 for a closer comparison to what FreeBSD has.

I'll be focusing on 64 bit FreeBSD only, and I'll work it from the 8.1 base. 9 is too much of a moving target.

At this stage, I'm not sure if I should bother with the kernel or not. Unfortunately, unless someone has benchmarks showing what difference a FreeBSD kernel/world compiled with ICC makes on a modern Intel Xeon or i7, I really don't know if it's worth it. I'll probably have to do it to find out.

Personally, I'm interested in anything that can help make my ZFS installations move faster. I'm building them with 96+ gigs of RAM for L2ARC, so memory allocation and the like are important, along with checksum and compression calculations - items that I know are improved in newer Intel chips. Is GCC/LLVM able to exploit them as well as ICC? Does it matter to real-world speed if I can get 20% more out of them? We'll see.

My processor will probably be an i7; ultimately I can probably pull a spare production X5660 for a few days' worth of testing later on.

I feel a modern processor is necessary so its new features can be properly exploited. I expect there will be less of a difference between ICC and GCC on an older base P4-type architecture, but once again, that's something I can test and report on.

Any comments or suggestions are welcome.

Thursday, September 16, 2010

ESXi 4.0.0 vs 4.1 Speed - 4.1 Not as fast

I've just upgraded my 4.0.0 ESXi box to 4.1.

...Well, let's not call it upgrading. The process I used was to pull my existing boot drive, install a new 8 gig USB stick inside my Dell T710, and install a fresh copy of the Dell Installable 4.1 ESXi. This gives me a fresh copy, and all I have to do is reattach my storage, import my .VMX files, and I'm back in business.

I once had a bit of a mess with the upgrade process from 3.0 to 3.5, so I settled on this method as being the safest. I even pull my drives when I do the upgrade, so there is no chance the installer will become confused and format my main datastore. Am I paranoid? It's happened before, and you only have to burn me once..

The process isn't that much more time consuming than an in-place upgrade. When your ESXi files are all on one drive/flash stick, and your datastores are all on another, you've got plenty of flexibility. I was able to do it within 30 minutes, because I really don't edit my ESXi configuration that much from default.

 I'm very interested in 4.1 because it has some neat power management interfaces. Look here, I can now track my wattage being burned on this server:


This is for a Dell T710 with 24 gigs of DDR3 memory, two Xeon 5660s, four 600 gig SAS drives, and four 1 TB nearline SAS drives. I've yet to check how accurate this is with a Kill A Watt or similar meter, but it sounds about right.


I'm also interested in memory compression - when you're building redundancy by having two ESX servers, you need very similar processor/memory configurations on your backup server, or you take a large performance hit. That's not always easy to budget for. If I can get away with my Exchange servers still running for a couple of hours on a server with only 1/2 or 1/4 the memory of my main ESX box, then I'll be happy. It's got to be quicker to compress memory than to swap it to disk - we'll see. I'll be testing that later on.

Service Console is now no longer an unsupported hack.

But really, the bit that gets me the most interested is that VMware is now putting its full weight behind ESXi - there won't be any more ESX! And to make this transition easier, you can now access vMotion from an ESXi 4.1 server. You still need licenses, but now it's nearly $10k cheaper to access this technology.

With my Ghetto-SAN coming online any week now, I'm very excited about this development.

BUT - there seems to be a bit of a trade-off for the new things that 4.1 brings - it's just a tad slower than 4.0.0.

Once again, I do quick and dirty benchmarks to get a feel for things - Performance Test 6.1 isn't the best tool, but it's quick and makes for nice, easy graphs to compare. After poring over reams of iozone stats in Excel, I sometimes like quick, easy, and pretty.

My process was simple: take a Performance Test run from a running 2008 R2 server before the upgrade to 4.1, and another after (with the new VMware Tools loaded).

While 4.1 had better graphic performance (who cares!), it was around 2% slower for memory and CPU performance.

That's a small price to pay for new features, and I'm hoping it's just a result of new technology being focused on stability first, performance second.

Anyone else run benchmarks that can confirm or deny this?

Tuesday, September 14, 2010

ZFS and NFS performance, with the ZIL disabled and/or cache flushing disabled

I'm building my new FreeBSD 8.1 SAN, and one of the first tasks is trying to pull decent NFS performance from the box.

As you may be aware, NFS and ZFS don't mix well. That's because NFS asks for a flush of the ZFS ZIL after each write, which is incredibly slow. It destroys all the caching and other speed enhancements that ZFS can bring to the table.

The quick and dirty response is to disable the ZIL. While this won't lead to corruption, it's removing one of ZFS's protection mechanisms, and as I'm building a very large SAN, I really don't want to hamper ZFS's ability to save my ass on data corruption issues.

This is basically opening NFS in async mode, which is what iSCSI does. I want iSCSI performance with the stability of NFS sync.

I thought I'd play around with disabling the ZIL (makes me nervous), using loader.conf tunables, and adding an SSD as the ZIL.
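For reference, the knobs involved look roughly like this on FreeBSD 8.1 - pool and device names are placeholders:

# /boot/loader.conf - turn the ZIL off entirely (the part that makes me nervous)
vfs.zfs.zil_disable=1

# or keep the ZIL and move it onto the SSD instead
zpool add tank log ada2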


NFS Tests on a ZFS RAID10

Notes:
All numbers are MB/sec, tests run twice (thus 2 numbers) then rebooted.
Tests are from PerformanceTest 6.1, which is easy for quick-n-dirty testing.

Tests are on a Windows Server 2003 32 Bit, going to FreeBSD 8.1, but who cares? They are only valid as a comparison on my own machine.  

Without ZIL
Fileserver: 64.25, 65.97 MB/sec
Workstation:  9.52, 12.99
Database: 56.31, 56.98




Decent speed for a RAID-10 ZFS on SATA drives without any tweaking. It beats my C: drive, which is a SAS RAID-10 under ESXi, and is around the same speed as I was getting from iSCSI.

With ZIL
Fileserver:  8.37, 6.53  
Workstation: 2.51
Database:

Basically: Much, Much slower.  I gave up after a few tests, as it was so tedious I didn’t want to continue. I did these tests before, so I know it’s slower across the board. At this speed iSCSI kicks NFS's ass. 


NOTE: I did try with the ZIL and vfs.zfs.cache_flush_disable=1, but the speed is basically just as bad. Besides, why wouldn't you want your ZIL on an SSD?


With ZIL on Intel X25-M SSD (32 Gig)
Fileserver: 61.38, 62.08
Workstation: 8.05, 7.66
Database:  23.07, 23.05


Hmm, this is faster. I wouldn't be too unhappy with this type of performance. Database still suffers though.


With ZIL on Intel X25-M SSD (32 Gig), vfs.zfs.cache_flush_disable=1
Fileserver 54.69, 62.57
Workstation 12.43, 9.54
Database  54.2, 54.69

Hey - That's pretty good. Just a tiny tad under ZIL-less operation. 


Notes for SSD tests: The SSD as ZIL stayed around 50% busy. The ZIL does work.

So all we have to do is make ZFS lie and say it flushed the cache when it didn't. Edit your /boot/loader.conf to include the vfs.zfs.cache_flush_disable=1 setting, and you're off and running.

I believe this is an enhancement in newer ZFS pools anyway, so I'm really not too worried about it. If it's on the ZIL, why do we need to flush it to the drive? A crash at this point will still have the transactions recorded on the ZIL, so we're not losing anything.

BTW - it looks like ZFS v23 is coming to FreeBSD sooner than we expected - so this may all be moot, as I seem to recall this behaviour was included around v18.

Final thoughts: never, never run a ZIL that isn't mirrored. If it dies, lots of bad things happen... although I was able to shut down the system, turn off my ZIL in loader.conf, and boot without the SSD, so I think you could recover. I'll be testing how nasty things are with a destroyed ZIL during transactions tomorrow.
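For what it's worth, mirroring the log device is the same one-liner with two devices (placeholder names again):

zpool add tank log mirror ada2 ada3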

Tuesday, September 7, 2010

FreeBSD, pf, and fwanalog

I think I've fixed this - if anyone was trying to run it and getting nonsense. There were some issues with the tcpdump date format, a couple small bugs, etc.

I'll eventually post a patch, but for now just ask if you're having the same problems, I'll send you the updated fwanalog script.

Tuesday, August 24, 2010

OpenSolaris: So close...

I posted earlier about my love for ZFS dedup, and how I was experimenting with OpenSolaris as a new base Unix instead of FreeBSD, because FreeBSD is so far behind in its ZFS port.

Well, I'm switching back. It's been a few months, and we still don't see any new OpenSolaris code. Will Oracle release a new OpenSolaris? There's talk about it being dumped. There are too many questions, so I've decided to sit on the sidelines and wait - I'm back to FreeBSD until I see an OpenSolaris 2010 - but it's quite possible we'll have dedup in FreeBSD before we have another solid OpenSolaris release.

Quick and Dirty: FreeBSD 8.1 AHCI vs ATA (CAM)

While building and testing what will be a 24-drive SAN running FreeBSD 8.1, ZFS, and NFS/iSCSI, I discovered a problem with hot-swapping the SATA drives.

It seems FreeBSD 8.1 AMD64 was running the SATA drives in ATA mode, so they really didn't hot-swap. If I pulled a drive and reinserted it, I couldn't make it understand that the drive was back, even if I fooled with atacontrol's attach and detach. I had to reboot, which is no way to run a SAN, no matter how ghetto it may be.

A bit of poking showed that while my BIOS was set to AHCI (Intel ICH7R chipset), FreeBSD was still running in ATA mode. (Type 'atacontrol list' - if you see drives listed, you are too.) camcontrol is the program you use once you're in CAM mode.

The answer was to put ahci_enable="YES" in /boot/loader.conf.

This changes your drive names from ad to ada, which causes a boot problem, but that's easily fixed.
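The fix is just updating /etc/fstab to the new device names - something along these lines, though your device and slice numbers will differ:

# old ATA naming
/dev/ad4s1a    /    ufs    rw    1    1
# becomes the new CAM naming
/dev/ada0s1a   /    ufs    rw    1    1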

I did a quick and dirty test with 'raidtest' to show the speed difference. Here's my raidtest creation command:

raidtest genfile -s 128 -S 512 -n 50000

Here's the results:

With CAM: (ahci_enable="YES" in /boot/loader.conf)

iscsi# raidtest test -d /dev/zvol/tank/vol
Read 50000 requests from raidtest.data.
Number of READ requests: 24831.
Number of WRITE requests: 25169.
Number of bytes to transmit: 3286713344.
Number of processes: 1.
Bytes per second: 37162958
Requests per second: 565


Without CAM

iscsi# raidtest test -d /dev/zvol/tank/vol
Read 50000 requests from raidtest.data.
Number of READ requests: 24831.
Number of WRITE requests: 25169.
Number of bytes to transmit: 3286713344.
Number of processes: 1.
Bytes per second: 6069384
Requests per second: 92



Big difference, eh? It's night and day. It then led to some ZFS issues when I turned CAM on, because ZFS likes to hit write/read stalls when you move a lot of data, but I cleared that up with some further ZFS tweaks that I will detail another day.

If you're not running with ahci_enable="YES" in your loader.conf, you may want to look at enabling it.

I"ll be doing more tests over the new few days on a few different FreeBSD machines, and we'll see what the general results are.

Monday, August 2, 2010

ZFS on FreeBSD 8.1

I've been using ZFS on my FreeBSD box for some time now, and it's been serving out gigs and gigs of data faithfully for some time.

I am looking into OpenSolaris, but with the slow (or perhaps never-coming) release of 2010.x, I'm still fully a FreeBSD shop for Unix at this stage.

I upgraded to 8.1-RELEASE a few days ago, and I'm liking the new release - I took the opportunity to upgrade my Samba to 3.4.8 at the same time, so the system seems perkier. I didn't have the time to do any real speed tests, so take that for what it's worth.

BUT - I've been getting some "kmem_map too small" panics when I tried to copy down a 160 gig file. I can't recall if I've ever moved such a large file before, so I don't think this is an 8.1 issue.

Some digging around and I found this information on the FreeBSD mailing list that I wanted to pass along.


From: Steve Polyack
Subject: Re: Freebsd 8.0 kmem map too small
Date: Wednesday, May 5, 2010 - 7:46 am


On a system here with 8GB of RAM and a very large zpool consisting of
multiple zdevs, little tuning was needed. The system is running
8-STABLE(amd64) as of a month or two ago. The only things I have set in
/boot/loader.conf are:
vm.kmem_size="12G"
vfs.zfs.arc_max="4G"

Setting kmem_size to 12G came from a combination of recommendations I
saw in various mailing list posts (you can probably dig them up).
Setting it to the physical memory size was the initial recommendation,
while others recommended 1.5x physical memory size to help prevent
fragmentation/wasted space in kmem.

Regardless, this has served us quite well for the ~6 months the system
has been in use. It has never crashed, even under intensive
multi-threaded benchmarking.


Other people recommend setting vm.kmem_size to 1/2 of available RAM, and then vfs.zfs.arc_max to 512M less than that, but that doesn't make sense to me - what's the point of RAM if you can't use it? This box only runs ZFS and Samba; let these things have some memory.

I've now set those same values as Steve, since my FreeBSD system in question is running 8 gigs as well. I'm going to start my 160 gig copy again (on which, BTW, I get ~34 MB/sec according to Windows 7).
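In other words, my /boot/loader.conf now carries the same two lines from Steve's post:

vm.kmem_size="12G"
vfs.zfs.arc_max="4G"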

If you don't hear back from me, it's working well for me. :-)

Wednesday, June 9, 2010

ZFS Deduplication

Oh my god - Is anyone else as excited about this as I am?

Are you thinking of a SAN running a massive ZFS raidz2 in deduplication mode?

I already built a 6 TB raidz2 array last year, and I'm finding I need more space - the next build will be 12 or 16 TB, and I'd love to use dedup to help me out.

I'm so curious about it that I'm considering switching to Solaris as my disk host. Normally I'd use FreeBSD, but its ZFS is too far behind (version 13, while Sun is at version 22) to support dedup in the next two years.

I'll be posting some test results soon.. just experimenting with some 4 gig .OST files under a ZFS dedup scenario to get a feel for compression.

Early tests with straight MP3s showed 23% compression. These are very, very promising times.
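If you want to poke at it yourself on an OpenSolaris box, turning dedup on and checking the savings is about this simple (pool and dataset names are placeholders):

zfs set dedup=on tank/recovery
zpool get dedupratio tank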

Here's a few links you can drool over until you get the time to do something, or I post my results:

http://blogs.sun.com/jsavit/entry/deduplication_now_in_zfs

http://blogs.sun.com/bonwick/entry/zfs_dedup

http://en.wikipedia.org/wiki/Data_deduplication

Sunday, January 10, 2010

ConnectWise - Taking the Plunge

Okay, after a little more than a year on CommitCRM as our main CRM/PSA package, I've taken the plunge and bought ConnectWise.

CW was never really considered when I was looking at CommitCRM and other packages last year - it was just too much (my current 4-user license with the features I want is $15k USD). CommitCRM was just $3k, and it did everything that I needed last year.

CommitCRM is still a great little package - but that's just it - it's a little package. It will continue to grow as it's developed, but I need more (like project management) and I need it now, so it's time to switch.

I was considering TigerPaw for a while, and almost bought it - but CW has more of the capabilities that I need - and you pay for them. TigerPaw was cheaper by nearly half.

In the end, the ability to work efficiently is worth more to me than holding on to the $$. We can make so much more if we have focused and properly utilized staff.

It's taking some getting used to - CommitCRM is a small, tight, well-written program. It doesn't require much in the way of power to run, and its client auto-installs when you run it - all very easy.

ConnectWise is the exact opposite - it's a full Microsoft .NET application, requiring a full MS SQL Server, a dedicated Windows server, and a client that runs via a .NET install (meaning you'd better have some workstation power).

I added a $14k Dell server to my CW cost, so I'm now up to nearly $30k for this package. You need a honking dedicated server, because they won't let you run it on anything else.

I'll write more about ConnectWise as I learn - I'm still in the initial training and getting used to the program - It's very complex, and it's taken me nearly a week to get to the stage where I can dribble out an invoice or two.