Why didn’t you call? …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

I can’t remember exactly where that line is from (possibly a TV ad from my younger days), but it sums up a common story in the RAID industry. I’m continually getting customers who (a) tell me what they are doing, then (b) end up asking me how to put it all together because “they are not really experts at this”. Considering that these people are normally doing something pretty bizarre or high-end, this is somewhat strange.

So …

Here are some basic tips for working out how to build a system. They are very, very generic, but there are some truisms in them, with one basic proviso – IF YOU ARE NOT 100% SURE OF WHAT YOU ARE DOING, CALL YOUR RAID VENDOR (in this case Adaptec, of course).

Capacity – performance – redundancy (paranoia) – cost

These are the basic building blocks of storage – and you can’t have them all. Of course, if you disregard cost you can get whatever you want in this world, but not many customers actually mean it when they say “we are not worried about the cost, we just want the best” (they pretty quickly change their tune), so I’ll base everything below with a strong eye on the cost of the system.

Capacity – it’s pretty easy to get big storage these days (3TB drives are commonplace), but keep an eye on performance while you are calculating this variable. A general rule of thumb is that the more spindles you have working for you, the quicker the box will be. For example, 4 drives in a RAID 10 will be quicker than 2 drives in a mirror. So keeping cost down too much (for example by buying fewer, larger drives) can often hurt your performance, though it generally doesn’t hurt your redundancy.

So think carefully about capacity, but don’t let it be the only consideration.

Performance – what exactly are you doing with your data, and what sort of data is it? Notice that we haven’t even looked at hardware yet; the data type will determine most of the configuration. Data can basically be broken down into streaming and random. Streaming can be in either direction (capturing or delivering content), while random is generally read (with a lesser degree of write, eg a database).

So if you are doing streaming data, you will go for a parity RAID (5, 6, 50, 60), as the parity calculations won’t hurt your performance and you’ll get a lot more capacity for your dollar. If, on the other hand, you are doing random data like a database, you’ll be more interested in a non-parity RAID such as 1 or 10, because every small random write to a parity RAID also means reading and rewriting parity.
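To make the spindle and parity trade-offs above a little more concrete, here is a minimal Python sketch built on the common textbook rules of thumb: every spindle can serve reads, and each small random write costs 2 disk I/Os on a mirror, 4 on RAID 5 and 6 on RAID 6. The names `RAID_RULES` and `estimate`, the penalties and the per-drive figures (2TB, 150 IOPS) are illustrative assumptions, not Adaptec numbers. The model ignores controller cache and the full-stripe writes that let streaming workloads sidestep the parity penalty.

```python
# Rule-of-thumb model only: usable drives and the disk I/Os one small random write costs.
RAID_RULES = {
    "0":  (lambda n: n,      1),
    "1":  (lambda n: n // 2, 2),  # write both copies
    "10": (lambda n: n // 2, 2),
    "5":  (lambda n: n - 1,  4),  # read data + parity, write data + parity
    "6":  (lambda n: n - 2,  6),  # as RAID 5, but with two parity blocks
}


def estimate(level, drives, drive_tb, drive_iops):
    """Return (usable TB, random read IOPS, random write IOPS) for an array."""
    usable_drives, write_penalty = RAID_RULES[level]
    capacity_tb = usable_drives(drives) * drive_tb
    read_iops = drives * drive_iops                   # every spindle can serve reads
    write_iops = drives * drive_iops / write_penalty  # small random writes only
    return capacity_tb, read_iops, write_iops


# Hypothetical drives: 2TB and 150 random IOPS each.
for level, n in [("1", 2), ("10", 4), ("10", 8), ("5", 8), ("6", 8)]:
    cap, reads, writes = estimate(level, n, drive_tb=2.0, drive_iops=150)
    print(f"RAID {level:>2} x {n}: {cap:4.1f}TB usable, ~{reads:.0f} read / ~{writes:.0f} write IOPS")
```

In this model the 4-drive RAID 10 doubles the 2-drive mirror on both reads and writes, while the 8-drive RAID 5 trades random write IOPS for extra capacity.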

Streaming data works fine on SATA drives (7200rpm), but random data (database) tends to work much better on SAS (10K or 15K drives). Of course SSDs come into play here, but their cost generally rules them out for anything except small specialised installations.

So if you are doing streaming data you are probably looking at SATA drives in a RAID 5, 6 or 50, while if you are running a database you are probably looking at SAS drives in a RAID 10, 1E or 1 (in that order, please).
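The rules of thumb above boil down to a small lookup table. This is only a sketch: the workload labels "streaming" and "database" and the `suggest` helper are hypothetical, and anything outside the table falls through to the post’s standing advice.

```python
# Hypothetical lookup table encoding only the rules of thumb from this post.
SUGGESTIONS = {
    "streaming": ("SATA (7200rpm)", ["RAID 5", "RAID 6", "RAID 50"]),
    "database": ("SAS (10K or 15K)", ["RAID 10", "RAID 1E", "RAID 1"]),  # in that order
}


def suggest(workload):
    """Return (drive type, RAID levels in order of preference) for a workload."""
    if workload not in SUGGESTIONS:
        raise ValueError(f"not sure about {workload!r}? Call your RAID vendor")
    return SUGGESTIONS[workload]


drive_type, levels = suggest("database")
print(drive_type, "in", levels[0])
```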

Of course, after all that, you need to work out which card can give you the RAID levels and drive connections you are looking for, but that becomes the easy part.

So … back to the heading … if you’re not sure, call :-)

Ciao
Neil

So how do you put that together?

Posted in General by Neil

Had a call from a customer who had already purchased a bunch of hardware and wanted to know how to put it together. While this is not my ideal way to go about things (generally I like to talk to people “before” they spend any money), the deed had been done so I needed to explain my best thoughts on how to put this all together.

The explanation was not going well. I wanted the customer to create a hybrid RAID, and my explanation just wasn’t sinking in. I thought to myself – I’ve written something about this somewhere … now where did I put those files?

It finally occurred to me that I had written some material for marketing, who put it all in a whitepaper and stuck it on our website. So I took the customer past ET (see if you can tell me what I mean by that) and through to the “new whitepaper” on hybrid RAID.

I completely ignored the first page (because I didn’t write that) and moved to the following pages and their neat little diagrams of how to mix and match SSD and spinning media to get the best combinations of performance, capacity, cost etc. Worked like a treat. As the old saying goes, a picture is worth a thousand words.

So take a look at the whitepaper – I’m not going to give you the link because I want you to work out my ET joke (if in fact you are old enough).

Ciao
Neil

Just how fast can things go?

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

I spent some time last month at Intel’s IDF in Beijing, doing the meet and greet with all the great Chinese customers, resellers, integrators etc. They really are a nice bunch of people, and I love Beijing at that time of year (it’s considerably warmer than December).

On the Adaptec stand we had all of our current products, and something a little special: a PCIe 3 prototype product demonstrating PMC’s SRCv chip (24 native SAS ports, PCIe Gen 3.0 etc). We borrowed 16 SATA SSDs (Pulsar XT) from Seagate and scrounged a server from our friends at Karmen (or should I say my good friends at Karmen did the scrounging for me), and we cobbled the whole thing together in 20 minutes just (and I mean just) before the show.

We built a simple RAID 0 across the 16 drives using all defaults (256KB stripe size etc), slapped Windows Server 2008 R2 SP1 on it, formatted the disk and ran up Iometer. I like to do this because it shows real-world speed – yes, I can get better performance running on a raw disk, but that’s just hocus pocus when it comes to numbers.

So … slap the whole thing together, run up Iometer on a 100% 256KB sequential read – simple. Then sit back and watch whether this is in fact going to work at all. Holy suffering catfish Batman – 6030MB per second read speed. Blink, double-check, run Iometer again – yep, 6030MB per second.
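A quick back-of-envelope split of that result across the 16 SSDs puts it in perspective. The 6Gb/s link speed in the second half is an assumption (the post doesn’t say what speed the drives negotiated); the 600MB/s ceiling follows from 8b/10b encoding, where every 10 line bits carry 8 data bits.

```python
# Figures from the IDF demo: one Iometer result across 16 SSDs in RAID 0.
total_mb_s = 6030
ssd_count = 16

per_ssd_mb_s = total_mb_s / ssd_count
print(f"~{per_ssd_mb_s:.0f}MB/s from each SSD")

# Assumption, not stated in the post: each SSD ran on a 6Gb/s link. With 8b/10b
# encoding, 10 line bits carry 8 data bits, so 6000Mb/s carries 600MB/s of data.
link_ceiling_mb_s = 6000 / 10
print(f"{per_ssd_mb_s / link_ceiling_mb_s:.0%} of a 6Gb/s link per SSD")
```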

Is this a secret? The product running the whole show certainly is. The chip powering it is not (it’s a publicly released product), and as for the results – to every person who picked their jaw up off the ground and stopped to question us intently about just what we were doing – no, it’s not a secret.

It’s just one hell of a performance number.

Ciao
Neil

Will this tin can work with my raid card?

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

The world has gone more than a little crazy in the hard drive department these days. Unless you are a major OEM, you are probably having difficulty getting hold of your preferred hard drives to build your server. Nearline enterprise SATA drives seem especially hard to come by.

This has resulted in many people using just about anything they can lay their hands on to connect to their RAID cards. The problem is that while hard drives have become hard to find, RAID cards haven’t changed. The same old rules apply to connecting desktop drives to RAID cards: the drives are not designed for it, the drive vendors don’t support them in those environments, and you are going to get problems down the track.

I can see a busy time coming up for tech support teams in a couple of years’ time, when servers being built today start to age and all the fun and games of older desktop drives on RAID cards start to surface. Maybe that’s why I decided it was time to go and do a management course at night :-)

Ciao
Neil

The dangers of performance testing …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

I spend a lot of my time talking to people about RAID performance numbers. Almost everyone has their own bent on this subject, and many organisations have specific tests they run against hardware to “check” performance.

The real problem is … what exactly do the numbers mean, and do they relate to your real data? An example comes to mind from when I sat with a customer in a foreign land, using a Linux performance testing tool that I was not particularly familiar with. The customer had an absolute bent on IOPs, and was very keen to see the performance of our card under the one particular test they commonly use.

The answer: 1.9 million IOPs

I smiled and should have left there and then. That’s possibly the highest number of IOPs any machine has ever produced, let alone a single RAID card. I could sell a million of these things!

However … common sense took over and I asked the customer how much RAM was in their system. Due to language barriers this took a bit of time, but finally they cottoned on to what I was asking, and the penny dropped as to why I was asking it.

Answer – 96GB (in a machine with two 8-core processors).

So where do you think all those IOPs came from? The simple answer is that the system RAM had cached the entire performance test file, so all IOPs were being read from system RAM. The test was not going anywhere near the hard disks, because the test file was only 60GB in size.
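A simple sanity check before trusting any IOPs figure is to compare the test file size with system RAM. This sketch assumes the common rule of thumb of making the test file at least twice the size of RAM (the factor of 2 is an assumption, not a standard). Using direct/unbuffered I/O, where the test tool supports it, is another way to keep the page cache out of the picture.

```python
def likely_cache_bound(test_file_gb, system_ram_gb, factor=2.0):
    """Flag a benchmark whose test file could sit entirely in the OS page cache.

    factor=2.0 is an assumed rule of thumb: size the test file at least twice
    the system RAM so that most reads have to come from the disks.
    """
    return test_file_gb < factor * system_ram_gb


print(likely_cache_bound(60, 96))   # True: the 60GB file fits in 96GB of RAM
print(likely_cache_bound(200, 96))  # False: 200GB exceeds 2 x 96GB
```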

While this might seem like a pretty funny example (I certainly thought it was), it was a simple example of the fact that performance testing is fraught with dangers and variables. Add to that the fact that you really need to be testing a data pattern that is as close as possible to your real-world application and it all gets very messy from here on in.

I have had plenty of customers quoting MB/sec speeds to me when I know they are using a SQL database, and MB/sec is meaningless for that particular application. IOPs are the go here. I also come across plenty of customers who play with the variables of their testing software, doing such things as testing at a queue depth of 128 when their application is never going to get anywhere near that number (very few applications are so well written, or so parallel in nature, that they even reach a queue depth of 16) … again, a meaningless test that will produce all sorts of numbers that will never be matched in the real world.

Therefore my recommendation to most customers is to study their applications first, and try to work out what the software will be doing under heavy load (and this one is totally out of my field – I’m not going to study every customer’s individual application) – then they can set relevant metrics in their testing software to determine what sort of performance they “should” get from any given hardware configuration. Be careful of what you ask for – you might just get numbers that look good but don’t mean anything. The trick is knowing “what” to ask for.

There are “lies, damned lies and statistics”, a quote popularly attributed (by Mark Twain, among others) to Benjamin Disraeli, the famous English politician.

I tend to agree.

Ciao
Neil

When the small things in life matter …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

Quite some time ago Adaptec released a technology called “Hybrid RAID”. This is the combination of an SSD and a spinning disk in a RAID1 or RAID10 array.

Initially it was a little unclear who would use this technology, with gamers, workstations and entry-level servers seeming like the immediate candidates. However, as the technology has progressed, it has become clear that the top end of town (datacenters and corporates) is extremely interested in it. It takes a little bit of explanation to see why …

If you are building a 16-drive server, and want to put the OS on fast disks (eg SSD), there are not many users who will run their server boot drives on just one disk – therefore a mirror is generally the accepted way of protecting this portion of the data puzzle on a server (no matter whether datacenter or home user).

The problem becomes one of capacity and slots. If you use up two slots for SSDs for your boot drive, then you only have (in this example) 14 drive slots left for data storage.

So let’s look at some maths.

Server 1
2 x 30GB SSDs for OS (RAID1)
14 x 2TB drives for data (RAID5)
(to keep the maths easy for this example, we’ll say that 2TB drives are in fact 2000GB – which of course they are not)

Server 2
1 x 30GB SSD for OS (Hybrid RAID1 with the first hard drive)
15 x 2TB drives for data (RAID5)

Capacity of Server 1 is:
30GB for OS
13 x 2000GB for data (losing one disk for RAID5 parity) = 26000GB

Capacity of Server 2 is:
30GB for OS
(this mirror is made of 30GB on the SSD and 30GB on the first 2000GB hard drive, leaving 1970GB usable on that drive)
14 x 1970GB for data (a RAID5 uses the same amount from every member, so all 15 drives contribute 1970GB, less one drive’s worth for parity) = 27580GB

That’s a 6.08% increase in capacity.

Of course, you could always make a RAID5 array of the 30GB chunks left unused on the other 14 hard drives, which would give an additional 390GB of space (14 x 30GB, less one chunk for parity) – might be handy for swap files etc, but we’ll discount this usage for the moment.
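The capacity maths for both servers can be checked in a few lines, using the post’s simplification that a 2TB drive holds 2000GB. The helper `raid5_usable` is a hypothetical name for illustration; it applies the usual RAID 5 rule of giving up one member’s worth of space to parity.

```python
SSD_GB = 30
HDD_GB = 2000  # the post's simplification for a "2TB" drive


def raid5_usable(members, per_member_gb):
    """RAID 5 gives up one member's worth of space to parity."""
    return (members - 1) * per_member_gb


# Server 1: OS on a RAID 1 of two SSDs, data on a RAID 5 of 14 hard drives.
server1_gb = raid5_usable(14, HDD_GB)

# Server 2: Hybrid RAID 1 of the SSD plus a 30GB slice of one hard drive.
# RAID 5 uses the same amount of every member, so all 15 drives give 1970GB.
server2_gb = raid5_usable(15, HDD_GB - SSD_GB)

gain = (server2_gb - server1_gb) / server1_gb
spare_slices_gb = raid5_usable(14, SSD_GB)  # RAID 5 of the leftover 30GB slices

print(server1_gb, server2_gb, f"{gain:.2%}", spare_slices_gb)  # 26000 27580 6.08% 390
```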

Price:
Server 1 costs … 2 x 30GB SSD + 14 x 2TB HDD
Server 2 costs … 1 x 30GB SSD + 15 x 2TB HDD

A quick look at pricing on Google indicates that Server 2 will be cheaper than Server 1, especially if a good SLC SSD is used for the boot drive.

Now a 6.08% increase in capacity might not sound like much if you are just buying one server, but when you are tasked with purchasing large amounts of storage, it makes a big difference. If you can reduce the number of physical servers in your datacenter, you reduce rack space, running costs and cooling costs – everything is impacted by having less hardware in your datacenter.
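To see how a few percent per box turns into fewer boxes, this sketch counts whole servers needed for a capacity target, using the per-server figures from the worked example. The 1PB target and the `servers_needed` helper are hypothetical, chosen only for illustration.

```python
import math

# Usable data capacity per server, from the worked example above.
USABLE_GB = {"Server 1": 26_000, "Server 2": 27_580}


def servers_needed(target_gb, usable_gb):
    """Whole servers needed to reach a capacity target (you can't buy part of one)."""
    return math.ceil(target_gb / usable_gb)


TARGET_GB = 1_000_000  # hypothetical requirement: 1PB (decimal) of usable storage
for name, usable in USABLE_GB.items():
    print(name, "needs", servers_needed(TARGET_GB, usable), "servers")
```

For this hypothetical target, the Server 2 layout needs 37 boxes against 39 for Server 1, and every box saved also saves its rack space, power and cooling.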

As far as cost is concerned, this can add up to massive savings for large organisations. Each server is cheaper in the first place, fewer servers are required, there are fewer servers to power, and cooling costs are lower with fewer servers – it’s win, win, win when it comes to the financial side of these calculations.

Now that’s a smart use for this simple technology.

Ciao
Neil

How to make things fast …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

I recently returned from one of my overseas jaunts (the country will remain unnamed to protect the innocent) … where I found a common attitude to making things go fast.

That was … use 15K SAS drives.

Simple as that. Don’t worry about which RAID card you use, or which RAID type you use for your different data sets, just use SAS drives for everything and that’s how you make the fastest server you can make.

In principle I agree that you should use SAS drives for fileservers (they are fast, reliable and getting larger by the day), but using these drives for every server type misses the point when it comes to building a server to suit your needs.

10K and 15K SAS drives have a great advantage over SATA drives when it comes to seek time, so general file serving and database work will benefit greatly from these drives – but for video work I don’t personally see the need to use them. SATA drives (enterprise, of course) are not far behind SAS drives in sequential throughput, and are dramatically cheaper and larger, meaning you can use more spindles in a SATA environment than you can afford to use in a SAS environment – and spindles are what will really give you the speed in a streaming environment.

(While re-reading this article before posting, I actually find I’m disagreeing with myself on this point (slightly). 7200rpm SAS drives (commonly called “nearline”) seem to be the best go for this sort of work. The SAS interface gives these drives a slight performance improvement over their SATA counterparts, and they are therefore my favoured drives for many types of storage.)

Add to that the fact that certain RAID types are suited to different data environments and you can make a big difference to the speed of the system by using the correct RAID level for your data (eg don’t use RAID5 with database).

When I showed some of these customers results from a couple of my crazy Australian customers achieving unbelievable speeds on video with nearline SAS drives, people were stunned. They had been told, and firmly believed, that to make something fast you needed 15K SAS drives.

Not so (not all the time at least).

Therefore, think about the drive types you are using, and about building a server to suit your data – not just filling the box up with expensive fast drives and hoping that will do the job.

Ciao
Neil

5EE – is anyone actually using this RAID level? …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

Several years ago Adaptec adopted the 5EE RAID level as a standard component of our hardware RAID cards. It seemed like a very good idea – making use of the idle drive that was traditionally a hot spare.

However, as I’ve come to know this RAID level better, I’ve been advising people against using it for one particular reason.

Compare: System 1 – 5 drives in a RAID5 and 1 x hot spare, System 2 – 6 drives in a RAID5EE.

System 2 should run faster than system 1 – after all it has more spindles doing the day to day work … and yes, it does run faster. In theory it’s around 15% faster but I’m yet to see that in practice.

So what’s wrong with this RAID level? My problem lies when a drive dies. With System 1 when a drive dies, the hot spare kicks in, the RAID rebuilds and all is good again.

With System 2, when a drive dies, the RAID5EE compacts itself into a standard RAID5. That’s all good, except when you replace the drive – that’s when I have a problem. The array then expands itself back out to a RAID5EE, which is another lengthy process that I (now) believe is not required.

So 5EE probably makes sense when you are running just 4 drives in a small JBOD or 1U server, but when the drive count increases I think (again, now) that it’s probably better to run just a RAID5 with a hot spare than to run 5EE.
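The comparison above can be laid out as a small model. This is a sketch, not a description of the controller firmware: the `Layout` class is hypothetical, it only counts active spindles and usable drives, and the list of long background tasks per failure is taken straight from the post (rebuild onto the spare for System 1; compaction, then expansion after replacement, for System 2).

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    drives: int
    hot_spares: int          # idle drives waiting for a failure
    distributed_spare: bool  # RAID5EE spreads the spare's space across all members

    @property
    def active_spindles(self):
        return self.drives - self.hot_spares

    @property
    def usable_drives(self):
        # RAID5 loses one drive to parity; 5EE also reserves one drive's worth of spare space
        return self.active_spindles - 1 - (1 if self.distributed_spare else 0)

    def failure_tasks(self):
        # Long background operations from one drive failure to full recovery, per the post
        if self.distributed_spare:
            return ["compact 5EE into RAID5", "expand back to 5EE after replacement"]
        return ["rebuild onto hot spare"]


system1 = Layout("5 x RAID5 + 1 hot spare", drives=6, hot_spares=1, distributed_spare=False)
system2 = Layout("6 x RAID5EE", drives=6, hot_spares=0, distributed_spare=True)
for s in (system1, system2):
    print(f"{s.name}: {s.active_spindles} spindles, {s.usable_drives} drives usable, "
          f"{len(s.failure_tasks())} long task(s) per failure")
```

Both layouts end up with the same usable capacity; 5EE buys one extra working spindle at the cost of a second lengthy background task every time a drive fails.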

So is anyone actually using this RAID level and if so, how do you find it?

Ciao
Neil

Measuring performance (in the modern server world) …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

In days gone by, measuring performance was easy – MB/sec (megabytes per second) was all anyone was really interested in. However, measuring performance this way only suits systems where you are pushing large amounts of data through, not database or cloud-computing type environments.

For databases we measure performance in IOPs – I/Os per second. This measures how quickly small amounts of data can be transferred to and from the storage subsystem. In general the amount of data may not add up to many MB/sec at all, so the traditional measurement method has little bearing here.
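The link between the two measurements is simply throughput = IOPs × block size. In this sketch the workload figures (20,000 IOPs of 8KB I/O, 1,000 IOPs of 256KB reads) are hypothetical, picked to show why a busy database can post a modest MB/sec figure.

```python
def mb_per_sec(iops, block_kb):
    """Throughput implied by an I/O rate at a fixed block size (1MB = 1024KB here)."""
    return iops * block_kb / 1024


# Hypothetical workloads: small random database I/O vs large streaming reads.
print(mb_per_sec(20_000, 8))    # 156.25
print(mb_per_sec(1_000, 256))   # 250.0
```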

Now many customers are fully aware of these two measurement systems – and certainly anyone who has sat through one of my boring presentations is aware of where and when to use these two measurement systems, but there’s a new kid on the block that’s proving a little tougher to nail down.

Latency.

What is latency? It’s basically the amount of time it takes for information to be delivered to an end user, generally in a web or cloud environment. This metric is measured in milliseconds and for many organisations is the difference between profit and loss, or life and death.

I recently sat through a presentation with one of the industry’s leading experts in measuring data, and working out what is important to a customer. In that presentation I saw some pretty amazing statistics regarding latency:

Amazon consider that an additional 100ms of latency costs 1% in sales (and that’s a very, very large amount of money), while Google consider that an additional 500ms of latency reduces page views by 20% (in other words, people won’t wait).

Those are just a few of the amazing statistics that customers attribute to high latency. For most people, it’s a case of just getting sick of waiting for a page of information to turn up on their screens, but there are big dollars behind that frustration.

So how do we measure latency? That’s the 64-dollar question, because it’s not easy, or sometimes even possible, to measure at the server. When you take into account the many network factors in between the server and the end user, it adds up to a complex environment to measure.

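IOPs don’t necessarily relate to latency. High IOPs don’t always mean low latency – in fact it can be quite the opposite, since pushing more outstanding requests at the storage can raise IOPs while each individual request waits longer.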
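Little’s law makes the IOPs-versus-latency trade-off concrete for the storage side of the path: average latency equals the average number of I/Os in flight divided by the completion rate. The two measurements below are hypothetical. Note that this only covers storage I/O latency, not the end-to-end network latency an end user sees.

```python
def mean_latency_ms(outstanding_ios, iops):
    """Little's law: average latency = average requests in flight / completion rate."""
    return outstanding_ios / iops * 1000


# Hypothetical measurements of the same array at two queue depths.
for qd, iops in [(4, 20_000), (128, 60_000)]:
    print(f"QD {qd:>3}: {iops} IOPs, ~{mean_latency_ms(qd, iops):.2f} ms per I/O")
```

In this example, tripling IOPs by going from a queue depth of 4 to 128 raises the average wait per I/O roughly tenfold, from 0.20ms to 2.13ms.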

Sounds complicated, doesn’t it? Well yes, it is. Adaptec have many years of experience in sorting out problems for customers – whether it be streaming data speeds, database IOPs or now cloud latency – and it seems our work is never finished. So the question is … what sort of data do you have, in what sort of environment, and how do you measure the actual, real performance of that data?

Ciao
Neil

Changing the default task priority …

Posted in Advisor - Neil, Application Environments, General, Platforms, Storage Applications, Storage Interconnects & RAID, Storage Management by Neil

Have you ever wondered why rebuilds take so long? Maybe it’s because you have 24 x 3TB drives in a RAID6 with 2 failed drives, or maybe it’s because the default task priority is set to low.

You can set the default priority by right-clicking on the controller in ASM (Adaptec Storage Manager) and changing the value. Note that this won’t affect tasks that are already running (you can change those by right-clicking on the array and changing the task priority there), but I recommend setting the default priority to high.

This brings about lots of questions … won’t it impact performance? Will my users be affected? Possibly yes, but the real question is … what is important here: getting the server rebuilt back to optimal as quickly as possible, or not having people complain? Since people complain all the time, I don’t take much notice of that. Their complaints will be much louder if the server goes down because something else goes wrong before the array is optimal again, so I tend to go for the lesser of two evils and get things back working correctly as quickly as possible.

It all depends on your priorities, but it’s worth considering setting the default to high – it makes a considerable difference in such things as rebuilding RAID 6 arrays.
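Roughly speaking, rebuild time is the drive’s capacity divided by the effective rebuild rate the controller is allowed to use, which is exactly what task priority trades against user I/O. In this minimal sketch the two rates (30 and 120MB/s) are purely hypothetical stand-ins for low and high priority; real rates depend on the card, the RAID level, the drive count and how busy the array is.

```python
def rebuild_hours(drive_tb, rebuild_mb_s):
    """Rough estimate: time to rewrite one whole drive at a given effective rate."""
    drive_mb = drive_tb * 1_000_000  # decimal TB -> MB, as drive vendors count
    return drive_mb / rebuild_mb_s / 3600


# Hypothetical effective rebuild rates for low vs high task priority.
for label, rate in [("low priority", 30), ("high priority", 120)]:
    print(f"{label}: ~{rebuild_hours(3, rate):.1f} hours for a 3TB drive")
```

Under these assumed rates, the same 3TB rebuild drops from roughly 28 hours to under 7, which is a long time to be running without full redundancy in either case.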

Food for thought.

Ciao
Neil