Object Storage in 2020

Jul 27, 2020

Object Storage is not exactly new. It’s been around for a long time, but it has usually been talked about in hushed tones by geeks, web developers and certain content provider solutions. It had a place and it’s a great technology, but it wasn’t exactly mainstream.

But things are changing. Object is no longer just for the geeks: many enterprise solutions now come out of the box with Object storage support. We’ll get to that in a moment, but first:

What’s all this Object storage stuff about?

Many people are aware of File storage. This is your standard FAT/NTFS/NFS or NAS type file system that has been around forever. You have a file system with a directory structure, and you have files. These files can be created, deleted, opened, closed, modified, renamed and so on. Each file has data and a fixed set of attributes relating to it.

Block storage, on the other hand, is lower level. It’s what the hard drives in your desktop or laptop work on. It’s SCSI / SATA. These devices don’t know anything about files, only blocks of data. You can read, write, delete, seek, move and so on. Many enterprise-class arrays operate in this space. We present block storage from high-performance arrays to servers or hypervisor clusters using protocols like iSCSI or Fibre Channel, or my new favorite, NVMe-oF. The server or cluster then overlays a filesystem like NTFS/ReFS/XFS/ext4 or VMFS, and we have a traditional storage model.

Object is different. It doesn’t care about files and structure. Anything can be an Object, or a blob. An Object store is typically accessed over HTTP(S) through an API rather than mounted like a disk. Every object just sits in a bucket, and instead of fixed attributes you can attach a number of metadata tags to describe and search those objects. This means you can describe your data in a way that is suitable to you. There is also a different way of accessing it. You can create, read or delete an Object, but importantly you cannot modify one in place: objects are immutable, so a change means writing a whole new object. Access is usually an all-or-nothing operation, although some APIs, S3 included, let you request a byte range when reading. Object storage platforms are also designed to scale to many petabytes of data, something that traditional filesystems struggle with, and they are designed to be resilient. You hear figures like eleven 9s (99.999999999%) of durability floating around.
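
To make that model concrete, here is a minimal in-memory sketch in Python. It is not any vendor's API: the `Bucket` and `StoredObject` names and their methods are hypothetical, chosen only to illustrate a flat namespace, whole-object put/get/delete, and free-form metadata tags.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """A blob plus free-form metadata tags; frozen, so fields cannot be reassigned."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """Hypothetical flat namespace: keys map straight to objects, no directories."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # A put always writes a whole new object; an existing key is replaced, not edited.
        self._objects[key] = StoredObject(bytes(data), dict(metadata))

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

    def search(self, **tags):
        # Return the keys whose metadata matches every requested tag.
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in tags.items())
        )


bucket = Bucket()
bucket.put("photos/cat.jpg", b"...", kind="image", owner="phil")
bucket.put("logs/2020-07-27.txt", b"...", kind="log", owner="phil")
print(bucket.search(kind="image"))  # ['photos/cat.jpg']
```

Note that "photos/" is just part of the key, not a directory. The slash means nothing to the store itself.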

Another important fact to remember about Object is that many Object stores are “eventually” consistent. This can be somewhat tricky to get your head around. If you are used to writing a file and then reading it, you expect to get back the version you just wrote, right? That is not always the case with Object: it may take some time for all nodes to be updated and for the new version to be available for read. With Object – last write wins. Cloud Native Apps expect this, but unless you are aware of it – you may get caught out.
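
A toy simulation can help here. The sketch below is an assumption-laden model, not how any particular Object store is implemented: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, a write is acknowledged after reaching one node, and the other nodes only catch up when `sync()` runs.

```python
class Replica:
    """One storage node holding a (timestamp, value) pair per key."""

    def __init__(self):
        self.store = {}

    def apply(self, key, timestamp, value):
        # Last write wins: keep the update only if it is newer than what we hold.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet delivered to the other replicas
        self.clock = 0

    def put(self, key, value):
        self.clock += 1
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0].apply(key, self.clock, value)
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, self.clock, value))

    def get(self, key, replica_index):
        return self.replicas[replica_index].read(key)

    def sync(self):
        # Deliver every outstanding update; afterwards all replicas agree.
        for replica, key, timestamp, value in self.pending:
            replica.apply(key, timestamp, value)
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("report.pdf", "v1")
store.sync()
store.put("report.pdf", "v2")
stale = store.get("report.pdf", replica_index=2)  # replica 2 has not caught up yet
store.sync()
fresh = store.get("report.pdf", replica_index=2)
print(stale, fresh)  # v1 v2
```

The read that lands on a lagging node returns the old version even though the write has already succeeded, which is exactly the behaviour that catches people out.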

So why is Object a good thing?

Well, if you look at some of the original uses of Object, images would be written to a bucket and then served up over the internet. This is great: you write once, and it’s read many times. But it quickly became clear that, with today’s expanding data requirements, Object had a number of other benefits.

Operationally, Object also has its advantages. You no longer deal with files being too big; you move the space management to a scalable backend. This is great if you have ever tried to expand an NTFS partition past 16TB only to find the thing was formatted with the wrong allocation unit size, so that you need to copy and move data to new locations. Resiliency is also a problem at larger sizes. How do you back it up? How do you restore? How do you make sure your data is safe? Object addresses all of these; you just need to change your way of thinking.
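
The 16TB wall comes from simple arithmetic. The sketch below assumes the commonly documented Windows limit of 2^32 - 1 clusters per NTFS volume; the function name is just for illustration.

```python
# Assumption: Windows NTFS volumes are limited to 2**32 - 1 clusters.
MAX_CLUSTERS = 2**32 - 1


def max_volume_tib(cluster_size_bytes):
    """Largest volume, in TiB, that fits within the cluster-count limit."""
    return MAX_CLUSTERS * cluster_size_bytes / 2**40


for size_kib in (4, 16, 64):
    limit = max_volume_tib(size_kib * 1024)
    print(f"{size_kib} KiB clusters -> about {limit:.0f} TiB")
```

With the default 4 KiB allocation unit the ceiling is about 16 TiB, and because the allocation unit size is fixed when the volume is formatted, growing past it means reformatting and moving the data.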

There are different versions of Object stores, all based on the same principles but with slightly different flavours. There was Centera CAS (Content Addressable Storage), and there are the ubiquitous Amazon S3 and Azure Blob storage, to name but a few. Standards differ, but as with the old Betamax vs VHS battle, the Amazon S3 protocol appears to have emerged victorious in the de facto standards race.

This leads us back to the enterprise type solutions that are now using Object.

Veeam can leverage Object for backup, either as a native backend for its Office 365 backup solution or as part of a Scale-Out Backup Repository where older data is tiered out to an S3-compatible backend. This functionality has great promise as an alternative to tape for long-term retention whilst still supporting cool features like Instant VM Restore, and Object Lock for ransomware protection.

Analytics is another area where Object is gaining strength. The unstructured nature of Object, along with metadata tags, makes analyzing large data sets practical.

Pure CloudSnap is another great example of a storage vendor leveraging Object.

The number of enterprise tools that can use an Object backend for storage is growing on a daily basis. Object is no longer only for Cloud Native Applications; it really is becoming ubiquitous.

But not all Object stores are created equal….

Many times Object is synonymous with ‘cheap’. But you can’t have it all ways. AWS Glacier is cheap for storage, but expensive for operations. AWS also charges for egress data and for put/get/delete operations. This all means that the cost of the service is hard to determine up front.
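
A rough model shows why the bill is hard to predict. Everything in this sketch is an assumption: the `PriceList` fields are a simplification of how providers charge, and the rates are made-up placeholders, not real prices from AWS or anyone else.

```python
from dataclasses import dataclass


@dataclass
class PriceList:
    """All rates are illustrative placeholders, not real prices from any provider."""
    storage_per_gb_month: float
    egress_per_gb: float
    per_1000_requests: float


def monthly_cost(prices, stored_gb, egress_gb, requests):
    """Break one month's bill into storage, egress and operations."""
    storage = stored_gb * prices.storage_per_gb_month
    egress = egress_gb * prices.egress_per_gb
    operations = requests / 1000 * prices.per_1000_requests
    return {
        "storage": storage,
        "egress": egress,
        "operations": operations,
        "total": storage + egress + operations,
    }


# Hypothetical "cheap and deep" tier: low storage rate, paid egress and requests.
cheap_deep = PriceList(storage_per_gb_month=0.004, egress_per_gb=0.09, per_1000_requests=0.05)

quiet_month = monthly_cost(cheap_deep, stored_gb=50_000, egress_gb=0, requests=10_000)
restore_month = monthly_cost(cheap_deep, stored_gb=50_000, egress_gb=50_000, requests=5_000_000)

print(f"quiet month:   {quiet_month['total']:.2f}")
print(f"restore month: {restore_month['total']:.2f}")
```

With these placeholder rates, the month in which you pull everything back costs more than twenty times a quiet month, even though the amount of data stored has not changed.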

Also, if your data is far away on cheap/slow storage – how usable is it?  Will it be there for you in times of need?  Will you be able to restore your data in time to meet RTO commitments? Will you be able to afford to retrieve it?
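
The RTO question in particular can be sanity-checked with back-of-the-envelope arithmetic. This is a sketch under stated assumptions: the function names, the 20 TB data set, the throughput figures and the retrieval delay are all hypothetical, and real restores have more moving parts.

```python
def restore_hours(data_tb, throughput_mb_per_s, retrieval_delay_hours=0.0):
    """Estimated hours to get data back: wait for retrieval, then transfer."""
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    transfer_hours = data_mb / throughput_mb_per_s / 3600
    return retrieval_delay_hours + transfer_hours


def meets_rto(data_tb, throughput_mb_per_s, rto_hours, retrieval_delay_hours=0.0):
    """True if the estimated restore time fits inside the RTO."""
    return restore_hours(data_tb, throughput_mb_per_s, retrieval_delay_hours) <= rto_hours


# Hypothetical scenario: 20 TB to restore against a 24 hour RTO.
scenarios = [
    ("cold, remote tier", 100, 12),  # 100 MB/s, 12 h wait before data is readable
    ("fast, local tier", 1000, 0),   # 1000 MB/s, no retrieval delay
]
for label, throughput, delay in scenarios:
    hours = restore_hours(20, throughput, delay)
    ok = meets_rto(20, throughput, 24, delay)
    print(f"{label}: {hours:.1f} h, meets RTO: {ok}")
```

In this made-up case the cold tier misses a 24 hour RTO by a wide margin while the fast tier clears it comfortably, which is the trade-off worth working through before choosing on price alone.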

If you need to run analytics on your data, how fast is fast enough? If you race to the bottom in the cheap/deep price war, will you be able to use the service for your expected workload?

If you buy a turnkey solution from a Vendor, how will you know what performance or IO demands it will ask of your storage?

Determining what your workload profile is can be really difficult, and many get caught out.

S3 Object solutions can range from the high-performance (Pure FlashBlade, for example) to the slow and cold (Amazon Glacier), and choosing the wrong solution up front can lead to unfortunate consequences.

Where does vBridge fit into this? Object Storage has cemented its place alongside Block and File as one of the three pillars of storage. Watch this space…

Phil Snowdon

Phil is the Technical Operations Manager at vBridge. Loves all things infrastructure. Network/Security/Storage/Compute and Virtualization.