Disclaimer: We are a Nimble Storage partner. At the time of this writing we are not a Silver Peak partner.
Summary – Our environment saw a 60% data reduction when using Silver Peak to optimize Nimble Storage to Nimble Storage replication traffic. In addition, our low-latency (10 ms) 100 Mbps WAN could not push more than 60 Mbps of replication traffic without Silver Peak. With Silver Peak we can now push 80 Mbps.
In researching Silver Peak, much like my experience researching Nimble Storage last year, I found an absence of real-world data on the internet that isn’t Silver Peak literature. Accordingly, I thought I would share our results in the hope that others find them valuable. I also like to share good technology when I find it.
Much like the rest of the IT world, I thought there was only one real player in WAN optimization (Riverbed) and a bunch of smaller players that all did the same thing. Silver Peak is one of those smaller players. I had heard of Silver Peak before but didn’t know much about them and, thanks to Riverbed’s marketing, had thought no one else was important in this space.
My prior experience with WAN optimization had been with Riverbed and the Juniper WX product (which used to be Peribit) that no longer exists. Three things about the Silver Peak story resonated with me compared to my prior experiences with Riverbed and Juniper WX.
- Software, not hardware: they specialize in virtualized (software) solutions and provide an OVF for install. This is their recommended deployment model rather than a supplement to their hardware.
- Simple: they optimize at layer 3 and for the most part are application unaware. This means much simpler configuration and a “set it and forget it” mindset.
- Focused: they only do WAN optimization. Riverbed’s current story, post-IPO, feels a little like “public market pressure to grow meets marketing spin”, i.e. all the acquisitions and new products SOUND like they are related, but once you really think about the new Riverbed technologies and how they interrelate, it feels like Riverbed might be trying to fit a square peg in a round hole. To give a perfect example, how is running a virtual machine on top of a Riverbed hardware platform related to WAN optimization? If I purchase Riverbed just for WAN optimization, do I really want Riverbed’s R&D budget going to things like this? I won’t even mention how confusing their product now called Granite is.
My goal was simple: if I could bring up two Silver Peak virtual appliances, I could route my existing Nimble Storage replication traffic through them and measure before and after.
Our environment consists of
- 20 TB of storage used on a Nimble CS460 X2 at our datacenter replicating to a CS260 at our office. Keep in mind this is compressed data, as Nimble natively compresses everything it stores.
- The datacenter has a 1 Gbps internet connection billed at the 95th percentile
- Our office has a fixed 100 Mbps fiber internet connection
- Latency between the datacenter and our office is 10-15 ms.
- We used the VRX-4 with 32 GB of RAM.
- The Nimble Storage CS460 X2 is running about 130 virtual machines on VMware. The VMs are all Windows VMs and a mix of file, app, SQL and XenApp.
I used the following methodology
- Verify an increase in data sent by the Nimble Storage
- Verify data reduction at the Silver Peak appliance
- Verify overall bandwidth reduction at the firewalls
- Verify reduction in costs based on the 95th percentile and/or look at possible ROIs.
Putting more data on the wire
The first step was to verify an increase in replication throughput from the Nimble’s point of view. This is an easy stat to get from the Nimble GUI. The first graph shows a 7-day view, with the red arrow pointing to when we started the optimization. You can see the higher peaks, meaning more throughput, when the link is optimized by Silver Peak. The second graph shows a 60-minute view and shows that the Nimble is consistently sending more than 100 Mbps, which is the bottleneck on the far end. I don’t have the graph, but I have seen bursts on a 5-minute view showing that we can burst up to 300 Mbps.
KEY TAKEAWAY: The Nimble Storage is putting between 100 Mbps and 300 Mbps on the wire, which is much more than the far-side internet connection can carry. This means optimization is working!
Reducing Data on the WAN
I also wanted to see how Silver Peak views the world. The first graph is from the Silver Peak GUI and corresponds to the same 60-minute time period. You can see we are getting more than 1.0 GB of data transferred per minute. This works out to (1 GB × 1024 MB per GB × 8 bits per byte) / 60 seconds ≈ 136 Mbps. Said another way, the average replication rate for those 60 seconds is 136 Mbps.
KEY TAKEAWAY: We see between 600 MB and 2 GB a minute, which works out to between 80 Mbps and 273 Mbps. We are typically seeing 50 GB+ in an hour window when replication is running for the whole window.
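The conversion used above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names are made up for this example, and it assumes binary units (1 GB = 1024 MB), which is what the 136 Mbps figure implies.

```python
def mb_per_minute_to_mbps(mb_per_minute: float) -> float:
    """Convert megabytes transferred in one minute to megabits per second."""
    megabits = mb_per_minute * 8      # 8 bits per byte
    return megabits / 60              # spread over 60 seconds


def gb_per_minute_to_mbps(gb_per_minute: float) -> float:
    """Same conversion for gigabytes, assuming 1 GB = 1024 MB."""
    return mb_per_minute_to_mbps(gb_per_minute * 1024)


if __name__ == "__main__":
    print(round(gb_per_minute_to_mbps(1.0), 1))   # 136.5
    print(round(mb_per_minute_to_mbps(600), 1))   # 80.0
    print(round(gb_per_minute_to_mbps(2.0), 1))   # 273.1
```

The three printed values line up with the 136, 80 and 273 Mbps figures quoted in the text.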
This next set of graphs is also from Silver Peak’s perspective for the same time period. I circled the two most relevant points. The latency and packet loss graphs are also of interest but are not the focus of this case study. The main point is that the LAN side was sending ~150 Mbps and the WAN side slightly more than 50 Mbps during replication. For the 60-minute window of the graph, the Nimble sent a total of 18 GB while the WAN only sent 6 GB. That’s at least a 60% reduction in WAN bandwidth used (taken at face value, these rounded figures work out to about two thirds). This 60-minute view is representative of what we have been experiencing with Silver Peak and Nimble, i.e. a ~60% reduction in WAN bandwidth. What’s amazing about this is that the Nimble data is already compressed, as Nimble Storage compresses all data natively, implying that this reduction is not due to compression but to other data reduction techniques.
To get to this level of data reduction we did need to make two changes to the Silver Peak platform. First, we added more memory to the VRX-4s, setting the memory on each virtual machine to 32 GB rather than the 8 GB they start with. Second, we changed the bandwidth optimization policy to Maximize Reduction rather than Minimize Latency (which is supposed to be better for real-time traffic).
KEY TAKEAWAY: The Silver Peak software is providing a 60% data reduction for the Nimble replication traffic.
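For anyone who wants to check a reduction figure against their own LAN and WAN counters, here is a minimal sketch of the calculation. The function name is hypothetical, and the inputs are the rounded numbers read off the graphs above, so the result is only approximate.

```python
def reduction_percent(lan_amount: float, wan_amount: float) -> float:
    """Percentage of LAN-side traffic that never had to cross the WAN.

    Both arguments must use the same unit (GB, Mbps, ...).
    """
    if lan_amount <= 0:
        raise ValueError("lan_amount must be positive")
    return (1 - wan_amount / lan_amount) * 100


if __name__ == "__main__":
    # Totals for the 60-minute window: 18 GB on the LAN, 6 GB on the WAN.
    print(round(reduction_percent(18, 6), 1))     # 66.7
    # Rates during replication, treating "slightly more than 50" as 50.
    print(round(reduction_percent(150, 50), 1))   # 66.7
```

The rounded inputs come out a little above the ~60% headline number, which is expected given that the WAN rate was described as slightly more than 50 Mbps rather than exactly 50.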
FINDING ANOTHER WITNESS – AKA OLD RELIABLE PALO ALTO
The other goal was to demonstrate a reduction in used bandwidth from the firewall’s perspective. To accomplish this I used the data from our monitoring tool (LogicMonitor), which provides me with datapoints every 5 minutes. To smooth out the data I used a 50-datapoint window, i.e. about 4 hours of data averaged together, to provide a graph that’s easier to interpret. I used Excel to handle the smoothing, which really just takes an average of the 50 prior datapoints. The red arrow on the left side of the graph shows when the Silver Peak was installed. The bandwidth used prior to Silver Peak averaged 30-40 Mbps, but after Silver Peak it averages 10-20 Mbps. The second red line, i.e. the one on the right, marks a power outage that halted the far-side Silver Peak appliance. We had a 13-hour power outage at our office, which is why there is a peak on the graph: it represents sustained replication catching up once the power came back on.
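The smoothing described above was done in Excel, but the same idea can be sketched in Python as a trailing moving average. This is an assumption-laden illustration rather than the exact spreadsheet formula: the function name is invented, the window here includes the current datapoint, and the sample data is made up purely to show the effect.

```python
def trailing_average(samples, window=50):
    """Average each sample with the samples before it, up to `window` in total.

    Early points, where fewer than `window` samples exist, are averaged
    over however many samples are available so far.
    """
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(samples):
        running_sum += value
        if i >= window:
            running_sum -= samples[i - window]   # drop the sample leaving the window
        count = min(i + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


if __name__ == "__main__":
    # Hypothetical 5-minute samples in Mbps: 50 points at 35, then 50 points at 15.
    raw = [35] * 50 + [15] * 50
    smooth = trailing_average(raw, window=50)
    print(smooth[49])   # 35.0  (window still entirely "before")
    print(smooth[74])   # 25.0  (window is half before, half after)
    print(smooth[99])   # 15.0  (window entirely "after")
```

With 5-minute datapoints a 50-point window covers 250 minutes, a little over 4 hours, so a step change in usage takes about that long to show up fully in the smoothed line.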
Keep in mind that since this is a smoothed graph, you cannot draw any conclusions except that on average the WAN uses less bandwidth when optimized with Silver Peak. This is an important point, especially when we talk about the 95th percentile. To demonstrate the difference between smoothed and raw results, I included the graph from our monitoring tool over the same period. The raw data from the firewall makes it very hard to see the average bandwidth used, while the smoothed graph makes it hard to see the peak utilization. On the raw data graph you can actually see an increase in max bandwidth used due to optimization, BUT we use that extra bandwidth much less often, resulting in a lower average utilization.
I can’t emphasize this point enough, as it is ultimately one of the main value propositions of WAN optimization. Said another way, the 100 Mbps internet line can only sustain 60 Mbps without WAN optimization, but with WAN optimization it can get to 80 Mbps. This function of WAN optimization lets you get the most out of your internet line but doesn’t reduce the data actually on the line by a material amount; you could turn off data reduction completely on the Silver Peak and still see this type of utilization increase. What’s more, the greater the latency, the less you can utilize the internet pipe, i.e. if our latency moved from 10 ms to 50 ms, our bandwidth without optimization would be materially less than 60 Mbps. The second function that WAN optimization provides is data reduction, which is what allows the bandwidth on the firewall to be lower on average over a 4-hour period.
KEY TAKEAWAY: The Silver Peak optimization provides greater WAN usage while replicating and less WAN usage overall for the same amount of data to transfer.
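One common reason latency caps throughput is the TCP window: a single flow can have at most one window of data in flight per round trip. The sketch below illustrates that general bound only. The 64 KiB window is a hypothetical value chosen for the example, not something measured on the Nimble or Silver Peak, and real replication may use larger windows or several parallel flows.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for a single TCP flow: one window of data per round trip."""
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    WINDOW_BYTES = 64 * 1024   # hypothetical 64 KiB window, for illustration only
    for rtt_ms in (10, 15, 50):
        ceiling = max_tcp_throughput_mbps(WINDOW_BYTES, rtt_ms)
        print(f"{rtt_ms} ms RTT -> {ceiling:.1f} Mbps")
    # 10 ms RTT -> 52.4 Mbps
    # 15 ms RTT -> 35.0 Mbps
    # 50 ms RTT -> 10.5 Mbps
```

Whatever the real window size is, the shape of the result is the same: for a fixed window, five times the latency means one fifth of the achievable rate, independent of how big the pipe is.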
Reducing my Datacenter Internet Usage Bill
The last item I wanted to address is the 95th percentile and ensuring my bill from the datacenter actually is reduced. This is a bit more complicated than I originally expected and I haven’t had the chance to fully validate it. The reality is this is probably best addressed by either 1) applying rate limiting to the Nimble Storage replication, although doing so limits the LAN speed rather than the WAN speed, as the Nimble Storage device is blind to the data reduction going on, or 2) setting up bandwidth limits on the tunnel in Silver Peak. Without one of these I think my bill will actually go up, as the data points thrown out by the 95th percentile may not cover all of my highest usage points.
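To make the billing concern concrete, here is a minimal sketch of one common way 95th percentile billing is calculated: sort the samples, throw away the top 5%, and bill on the highest sample that remains. Providers differ in the details, the function name is invented, and every traffic number below is made up for illustration.

```python
def billable_95th(samples_mbps):
    """Return the highest sample left after discarding the top 5% of samples."""
    ordered = sorted(samples_mbps)
    discarded = len(ordered) // 20          # top 5%, using integer arithmetic
    return ordered[len(ordered) - discarded - 1]


def average(samples_mbps):
    return sum(samples_mbps) / len(samples_mbps)


if __name__ == "__main__":
    # Hypothetical billing period reduced to 100 samples for readability.
    steady = [35] * 100                     # unoptimized: constant 35 Mbps
    long_bursts = [15] * 90 + [80] * 10     # optimized, bursts in 10% of samples
    short_bursts = [15] * 96 + [80] * 4     # optimized, bursts in 4% of samples

    for name, data in (("steady", steady),
                       ("long bursts", long_bursts),
                       ("short bursts", short_bursts)):
        print(name, average(data), billable_95th(data))
    # steady 35.0 35
    # long bursts 21.5 80
    # short bursts 17.6 15
```

In the second case the average drops from 35 to 21.5 Mbps while the billable rate rises from 35 to 80 Mbps, because the bursts occupy more than 5% of the samples. Capping the WAN rate, as in either option above, is what keeps those bursts from setting the bill.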
However, I am weighing the possibility of saving money against the extra value of having replication jobs finish much quicker. In our cloud services model we have steady state, i.e. normal churn in data, for our existing customers, but we also have the new data that bringing in a new customer adds to the system. Historically this new data puts the other customers’ offsite replication at risk, i.e. bring on 1 TB of new data and all my customers must wait for that first 1 TB of replication to finish. It might be that the extra value of always having my replication up to date is worth more than saving money through less bandwidth.
I think this concept, i.e. ROI from a WAN optimization device like Silver Peak vs. the value from increased performance along with easier-to-hit RPOs/RTOs, is one that a lot of IT shops struggle with. Said another way, is the value from WAN optimization obtained through cost reduction (not having to buy more bandwidth) or performance gains (lower RPO/RTO, better user experience)? In addition to ROI, you can also make the TCO argument, i.e. less time that your IT staff has to spend managing and worrying about WAN performance. For my company in particular I like the TCO and feature benefit argument, as now my CTO loses sleep over other things, NOT WAN replication or bandwidth management! Ultimately, cost/benefit justification is the largest hurdle for the WAN optimization industry, as conventional wisdom suggests that people will normally choose the path of least resistance, i.e. simply throwing bandwidth at the problem. There are times when this strategy will yield a better result than the current bandwidth does, but there are also many times, especially with higher-latency links, where this mindset will not address your soft cost problems (your best IT people wasting time thinking about WANs) and will only marginally improve your use of the link.