Why one Software Defined Storage solution will rule them all!!!

In my travels across Asia I’ve spoken with many customers, showcasing the merits of running their datacenters with software-defined storage. If webscalers like Google and Facebook can run their datacenters on commodity x86 servers with software-defined storage, why can’t you? When was the last time you saw a 404 in your browser when accessing Google or Facebook? In this blog post I’ll describe why you should consider Dell EMC ScaleIO as the ONLY software-defined block storage solution that not only replaces a traditional SAN but can also run in hyperconverged environments. No other solution on the market today can do both at the same time, in the same cluster, for any application at any scale!

Software-defined storage has been around for a relatively short time, yet it has garnered a great deal of attention. There are many solutions on the market that claim to be the leader, and that’s OK. What you need to understand first and foremost is how these software-defined storage and hyperconverged systems are architected (from here on I’ll refer to both as SDS). Only once you understand how the architecture works can you decide what’s right for you.

Firstly, relying on one or two SSDs as caching devices is not a smart approach. Having an SSD cache other SSDs sitting behind it is, to me, a true waste of SSD resources – you should be able to use all the SSDs together as a single pool of storage for both reads and writes and leverage their true combined potential. Have you thought about what happens when traffic is bursty, the caching SSD fills up, and it cannot de-stage the data quickly enough? How do you think that affects the VMs sitting on that volume? What happens if one of the VMs on that same volume is busier than the rest? That will undoubtedly cause hot spots and drain performance. Yes, you can set QoS and so forth, but you’re still limited to those one or two SSDs, and that is just not wise. How are you going to predict bursty traffic? The answer is you can’t, and typically admins will need to move that VM to another host to give it the performance it needs, which in turn causes more unnecessary backend network traffic.

ScaleIO, on the other hand, abstracts all the devices from all the servers and puts them together in a single storage pool. Imagine the performance you’d get with 100 SSDs serving every volume you create from that pool, with all of this processing done in parallel! Who needs data locality?! That is why we are seeing insane performance figures, and no wonder storagereview.com has said it is the fastest thing they’ve ever tested. You can read the review (be sure to read the first sentence of the 2nd paragraph in the Conclusion section ;-) ) at http://www.storagereview.com/emc_vxrack_node_powered_by_scaleio_review. You can also see the other test results they’ve published, such as Sysbench OLTP, VMmark and SQL. By the way, can you find performance reports for the other leaders on there?
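
To put some rough numbers behind that pooling argument, here’s a back-of-the-envelope sketch. Every figure in it is an assumption I’ve made up purely for illustration, not a benchmark result:

```python
# Toy comparison of the two designs discussed above. All device counts
# and IOPS figures are made-up assumptions for illustration only.

SSD_IOPS = 50_000     # assumed steady-state IOPS per SSD
CACHE_SSDS = 2        # the "one or two caching SSDs" design
POOL_SSDS = 100       # SSDs pooled together across the whole cluster

# Cache design: a volume's burst is bounded by its node's cache devices.
cache_ceiling = CACHE_SSDS * SSD_IOPS

# Pooled design: every volume is chunked across all devices in the pool,
# so each volume can draw on every SSD in parallel.
pool_ceiling = POOL_SSDS * SSD_IOPS

print(f"Burst ceiling behind a cache:  {cache_ceiling:,} IOPS")   # 100,000
print(f"Burst ceiling across the pool: {pool_ceiling:,} IOPS")    # 5,000,000
```

The exact numbers don’t matter; the point is that the ceiling of the cache design is fixed by one or two devices, while the pooled design’s ceiling grows with every device you add.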

Secondly, there is the software-defined storage appliance approach. This one bothers me all the time, as we see these other leaders position themselves for the datacenter. An appliance-based approach = one cluster. Multiple clusters = multiple separate appliances. By the way, the average cluster size is between 12 and 16 nodes, and I’ve never seen or heard of a customer going up to 64 nodes. With the appliance approach there is no sharing of resources from a performance or storage perspective. It’s like managing multiple arrays again, so imagine the pain of having to manage multiple appliances! Oh, and don’t forget data migrations from one appliance to another, or the fact that you cannot add storage and compute independently. Yes, these leaders say you can, but if you actually read their documentation, they state that upgrades should be symmetric so that performance is predictable; otherwise they cannot guarantee performance. Well, what happens if I don’t need the extra compute and just need storage? My CapEx has just gone up because of their rigid design. And I can’t even add an all-flash node to an existing hybrid cluster; I have to start a new cluster!? This siloed approach should never be considered for the datacenter.
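
To make that CapEx point concrete, here’s a toy calculation. The node prices and capacities are hypothetical placeholders I’ve invented for illustration, not quotes for any real product:

```python
# Rough CapEx sketch: adding 100 TB of raw capacity to a cluster whose
# compute is already sufficient. All prices/capacities are hypothetical.

NEEDED_TB = 100
TB_PER_NODE = 20

STORAGE_ONLY_NODE = 15_000   # assumed: disks plus a thin storage server
SYMMETRIC_HCI_NODE = 40_000  # assumed: disks plus CPU/RAM you don't need

nodes = -(-NEEDED_TB // TB_PER_NODE)  # ceiling division: 5 nodes here

print(f"Independent scaling:  {nodes} storage nodes = ${nodes * STORAGE_ONLY_NODE:,}")
print(f"Symmetric appliances: {nodes} full nodes    = ${nodes * SYMMETRIC_HCI_NODE:,}")
```

With made-up figures like these, the symmetric requirement more than doubles the cost of a capacity-only expansion, and you’re paying for compute that sits idle.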

At Dell EMC World 2017 it was announced that ScaleIO is now part of the Dell EMC Enterprise Storage Family. That means ScaleIO is classified as a datacenter solution alongside VMAX and XtremIO. Not bad for a software-defined storage solution that is only 6 years old (https://en.wikipedia.org/wiki/EMC_ScaleIO), compared to VMAX, which has been around for 25 years (https://en.wikipedia.org/wiki/EMC_Symmetrix). You start with a minimum of 3 nodes and can scale compute and storage independently, all the way up to 1,024 servers in a single cluster! It has all the features you would expect from an enterprise-grade solution: multi-tenancy, snapshots, thin provisioning, QoS, etc. Yes, it lacks certain data services such as native replication and data reduction, but those features are coming. (By the way, why not let the application do the replication? Shouldn’t the application know it has multiple copies of the data? ScaleIO supports any application-level replication solution today, such as Oracle Data Guard.) Once ScaleIO has its own native replication and compression, what else is left?

Yes, there is the notion of active-active, but is the application itself aware that it is active-active? And one thing that makes me laugh is when the leaders run POCs of their metro-clustering ability with only one VM; that doesn’t simulate a real-life workload! Or when they do performance testing, they only test for a short period and see phenomenal results. Little do customers realise that the workload is still hitting the cache! A performance test should run for a minimum of 30 minutes, not 5.
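
For what it’s worth, here’s a minimal sketch of what a sustained test could look like, driven from Python and assuming the fio tool is installed; the device path is an illustrative assumption for your own setup, not a recommendation:

```python
# A minimal sketch of a sustained performance test, assuming fio is
# installed and /dev/scinia is the block device under test (ScaleIO
# volumes typically appear as /dev/sciniX on the client, but treat the
# path as an assumption). WARNING: raw-device writes are destructive.
import subprocess

subprocess.run([
    "fio",
    "--name=sustained-randwrite",
    "--filename=/dev/scinia",  # assumed device under test
    "--ioengine=libaio",
    "--direct=1",              # bypass the host page cache
    "--rw=randwrite",
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--time_based",
    "--runtime=1800",          # 30 minutes, not 5
    "--ramp_time=60",          # discard the warm-up period from results
    "--group_reporting",
], check=True)
```

The two details that matter for the point above are --time_based with a long --runtime, so the workload outlives any cache, and --ramp_time, so the warm-up numbers don’t flatter the result.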


Lastly, I like it when these leaders say they can do everything from storage to backup to disaster recovery. Remember that by doing everything in one appliance you are literally putting all your eggs in one basket. Yes, it is one throat to choke, but in my mind you need solutions that were designed to solve a particular problem, not a band-aid approach. For example, when it comes to backup, these leaders actually perform a snapshot. A snapshot is a point-in-time copy of the data; it is not a true secondary copy, which is what a backup is. You must also keep primary data and backup data separate, i.e. they must be isolated, redundant and resilient. Think of a photo and a cassette tape: if I take multiple photos, that’s multiple point-in-time copies, and usually the storage system won’t be able to handle that many snaps kept for weeks or months on end. And what happens if data corruption occurs after the 9am snap? When the next snap is taken, say 10 minutes later, you’re actually snapshotting the corruption as well. The right strategy here is to define your actual RPO/RTO requirements for each application; doing so will allow you to select the right data protection strategy. Take Dell EMC’s RecoverPoint (RP) and the snapshot example above. With RP I can roll back in time to just before the corruption and recover from there, whereas with the snapshot I would lose up to 10 minutes’ worth of data by restoring to the previous snapshot. Not an ideal situation to be in. A backup strategy should include dedicated backup software and a backup appliance, like Avamar and Data Domain, designed to back up your most important asset: your data.
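
If you want to see that trade-off in numbers, here’s a tiny sketch of the RPO arithmetic, with made-up timestamps:

```python
# Toy RPO arithmetic for the 9am snapshot example above. Timestamps are
# illustrative. With snapshots every 10 minutes, restoring to the last
# clean snapshot loses everything written since it was taken; a
# continuous journal (the RecoverPoint-style approach) can roll back to
# just before the corruption.
from datetime import datetime, timedelta

last_clean_snap = datetime(2017, 6, 1, 9, 0)   # the 9am snap
corruption_at = datetime(2017, 6, 1, 9, 7)     # corruption strikes at 9:07

# Snapshot strategy: everything after the last clean snap is lost.
snapshot_loss = corruption_at - last_clean_snap

# Journal strategy: roll back to the instant before the corruption.
journal_loss = timedelta(0)

print(f"Data lost restoring the 9:00 snapshot: {snapshot_loss}")   # 0:07:00
print(f"Data lost rolling back the journal:   ~{journal_loss}")    # ~0:00:00
```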

Well, that was my first blog post. I hope you found it educational and interesting, and I look forward to your comments below and to any suggestions for making this blog better for you.

Next blog post: Would you like de-dupe and auto-tiering with that? 
