
OpenStack Object Storage (Swift) Essentials

About this book
Publication date: May 2015
Publisher: Packt
Pages: 174
ISBN: 9781785283598

 

Chapter 1. Cloud Storage – Why Can't I Be Like Google?

If you could build your IT systems and operations from scratch today, would you recreate what you have? That's the question Geir Ramleth, CIO of the construction giant Bechtel, asked himself in 2005. The answer was obviously not, and Bechtel ended up using the best practices from four Internet forerunners of that time—YouTube, Google, Amazon, and Salesforce—to create their next set of data centers.

This is exactly the same question CIOs and IT administrators around the world are asking themselves! In this book, you will learn about a revolutionary new storage system called cloud storage that uses the best practices (though not the exact technologies) of these web giants. This will cut the total cost of ownership (TCO) of storage by more than 10 times compared to traditional enterprise block or file storage.

This book will show you how you can implement cloud storage using a leading open source storage software stack called OpenStack Swift. Let's first explore some key elements that constitute cloud storage:

  • Dramatic reduction in TCO

  • Unlimited scalability

  • Elasticity achieved by virtualization

  • On-demand, that is, pay for what you use

  • Universal, that is, access from anywhere

  • Multitenancy, which means sharing storage hardware with other departments or companies

  • Data durability and availability, even with partial failures of the storage system

 

What constitutes cloud storage?


Let's review each of these elements of cloud storage in more detail.

Reduced TCO

Reduced TCO is the crux of cloud storage. Unless this new storage cuts storage cost by more than 10 times, it is not worth switching from block or file storage and dealing with something new and different. By total cost of ownership, we mean the sum of capital expenditure (CAPEX), which covers equipment, and operational expenditure (OPEX), which covers IT storage administrators, electricity, cooling, and so on. This TCO reduction must be achieved without sacrificing durability (keeping data intact) or availability.

Unlimited scalability

Whether the cloud storage offering is public (that is, offered by a service provider) or private (that is, offered by central IT), it must have unlimited scalability. As we will see, cloud storage is built on distributed systems, which means that it scales very well. Traditional storage systems typically have an upper limit, making them unsuitable for cloud storage.

Elastic

Storage virtualization decouples and abstracts the storage pool from its physical implementation. This means that you can get an elastic (grow and shrink as required) and unified storage pool, when in reality, the underlying hardware is neither. IT professionals who have spent endless hours forecasting data growth and then waiting for their equipment will appreciate the magnitude of this benefit.

On-demand

Consumers do not reserve blocks of electricity and pay for it upfront, yet we routinely pay for storage upfront, whether we use it or not. Cloud storage uses a pay-as-you-go model, where you pay only for the data stored and the data accessed. For a private cloud, there is a minimal cluster to start with, beyond which it is on-demand. This can result in huge cost savings for the storage user.

Universal access

Existing enterprise storage has limitations in terms of access. Block storage is very limiting: a server has to be on the same storage area network, and storage volumes cannot be shared. Network-attached storage (NAS) must be mounted before it can be accessed. This limits the number of clients and requires LAN access.

Cloud storage is extremely flexible—there is no limit on the number of users or from where you can access it. This is possible since cloud storage systems usually use a REST API over HTTP (GET, PUT, POST, and DELETE) instead of the traditional SCSI or CIFS/NFS protocols.
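As a rough illustration of what access over HTTP looks like, the following Python sketch builds (but does not send) one request per verb using only the standard library. The endpoint `https://storage.example.com` and the object path are hypothetical placeholders, and the mapping of verbs to operations is a common one rather than a rule; individual systems differ in the details.

```python
from urllib.request import Request

# Hypothetical endpoint and object path, used only for illustration.
BASE_URL = "https://storage.example.com"
OBJECT_PATH = "/photos/cat.jpg"

def build_request(method, data=None):
    """Build an HTTP request for the object without sending it."""
    return Request(BASE_URL + OBJECT_PATH, data=data, method=method)

# The four verbs named in the text, each paired with a typical storage operation.
requests_by_operation = {
    "upload": build_request("PUT", data=b"...image bytes..."),
    "download": build_request("GET"),
    "update metadata": build_request("POST"),
    "delete": build_request("DELETE"),
}

for operation, request in requests_by_operation.items():
    print(f"{operation}: {request.get_method()} {request.full_url}")
```

Because every operation is an ordinary HTTP request against a URL, any client that can reach the endpoint can use the storage, with no volume to mount and no storage area network to join.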

Multitenancy

Cloud storage is typically multi-tenant. The tenants may be different organizations in a public cloud or different departments in a private cloud. The benefit is centralized management and higher storage utilization, which reduces costs. Security, often an issue with multi-tenant systems, is addressed comprehensively in cloud storage through strong authentication, access controls, and various encryption options.

Data durability and availability

Cloud storage is able to run on commodity hardware, yet it is highly durable and available. This is even more impressive in that durability and availability are maintained in the face of a partial system failure. As with many modern distributed systems, the burden of data durability and availability is on the software layer rather than the underlying hardware layer.

 

Limitations of cloud storage


While cloud storage has numerous benefits, there are some limitations in the areas of performance and new APIs.

Performance

Storage systems have struggled to balance reliability, cost, and performance. Generally, you can get two out of these three. Cloud storage optimizes reliability and cost, but not performance. In fact, as we will see later, reliability in cloud storage is better than that of traditional RAID at large scale. Because of the way RAID works, you run a very high risk of an unrecoverable failure during a RAID rebuild when operating at scale. Cloud storage uses different techniques, such as replication or erasure coding, to provide high reliability.

This means that cloud storage is well suited for applications such as web servers and application servers, but not for databases or high-performance computing. It is also suitable for tier 2/3 storage, for example, backup, archival (photos, documents, videos, logs, and so on), and creating an additional copy for disaster recovery.
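The RAID rebuild risk mentioned above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an unrecoverable read error (URE) rate of one per 10^14 bits read, a figure often quoted for consumer drives, and assumes that errors are independent. Both are simplifying assumptions chosen for illustration, not figures from this book.

```python
import math

def rebuild_failure_probability(bytes_read, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading
    bytes_read bytes, assuming independent errors at ure_per_bit per bit."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 10**12
for capacity_tb in (1, 4, 12.5, 50):
    p = rebuild_failure_probability(capacity_tb * TB)
    print(f"Reading {capacity_tb} TB during a rebuild: {p:.1%} chance of a URE")
```

Under these assumptions, the chance of hitting an error grows quickly with the amount of data that must be read to rebuild an array, which is why the risk becomes significant at scale.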

New APIs

Cloud storage affects applications in two ways: their interface with storage and their behavior. Firstly, applications need to be ported to a new and different storage interface that uses HTTP instead of SCSI or CIFS/NFS. Secondly, applications need to handle an eventually consistent storage system. The second point requires explanation.

Cloud storage is built using distributed systems that are governed by a theorem called the CAP theorem, which states that out of the following three points, it is impossible to guarantee more than two:

  • Consistency: For cloud storage, this means that a request to any region or node returns the same data

  • Availability: For cloud storage, this signifies that every request receives a successful response rather than an error or no response at all

  • Tolerance to partial failures (partition tolerance): For cloud storage, this implies that the architecture is able to withstand failures in connectivity or parts of the system

Most cloud storage systems guarantee availability and tolerance to partial failures at the expense of consistency, making the system eventually consistent. This means that an operation such as an update may not be reflected to all nodes at the same time. Traditional applications expect strict consistency and may need to be modified.
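To see what eventual consistency means for an application, consider the toy model below. It is a deliberately simplified sketch: the `EventuallyConsistentStore` class and its `sync` method are hypothetical names invented for this illustration and do not describe how any particular storage system replicates data.

```python
class Replica:
    """One storage node holding its own copy of each object."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: a write is acknowledged after reaching one replica, and
    the remaining replicas catch up later when sync() runs."""
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # updates not yet applied to every replica

    def put(self, key, value):
        self.replicas[0].data[key] = value   # acknowledged immediately
        self.pending.append((key, value))    # propagated in the background

    def get(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        """Apply pending updates, in order, to all replicas."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.put("report.txt", "v1")
store.sync()
store.put("report.txt", "v2")

print(store.get("report.txt", 0))  # v2: this replica has the update
print(store.get("report.txt", 2))  # v1: stale until replication catches up
store.sync()
print(store.get("report.txt", 2))  # v2: replicas have converged
```

An application that assumes strict consistency would treat the stale read in the middle as a bug, which is why such applications may need to be modified.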

If an application has not been ported to cloud storage, is that a dead end? Fortunately not. There is a class of devices called cloud gateways that provide file or block interfaces to an application (for example, CIFS, NFS, iSCSI, or FTP/SFTP) and perform the protocol conversion to cloud storage. These gateways provide other functions as well, such as caching, WAN optimization, optional compression, encryption, and deduplication. They also eliminate the need for an application to handle the eventual consistency problem.

 

Object storage


How do you build a cloud storage system? The most suitable underlying technology is object storage.

Object storage is different from block or file storage as it allows a user to store data in the form of objects (essentially files) in a flat namespace using REST HTTP APIs. Object storage completely virtualizes the physical implementation from the logical presentation. It is similar to check-in luggage versus carry-on luggage, where once you put your check-in luggage in the system, you really don't know where it is. You simply get it back at your destination. With carry-on luggage, you have to know exactly where you have kept it at all times.

Object storage is built using scale-out distributed systems. Most often, each node stores its data on a local filesystem. As we will see, object storage architectures allow for the use of commodity hardware, as opposed to the specialized, expensive hardware used by traditional storage systems. The most critical tasks of an object storage system are as follows:

  • Data placement

  • Automating management tasks, including durability and availability

Typically, a user sends an HTTP GET, PUT, POST, HEAD, or DELETE request to any one of a set of nodes, and the object storage software maps the request to the physical nodes that hold the data. The software also takes care of the durability model by doing any one of the following: creating multiple copies of the object, chunking it, creating erasure codes, or a combination of these.
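The idea of mapping an object to physical nodes and keeping multiple copies can be sketched in a few lines of Python. This is a simplified stand-in that hashes the object name to pick nodes; it is not the placement algorithm used by Swift or by any other product, and the node names are hypothetical.

```python
import hashlib

def place_object(object_name, nodes, replica_count=3):
    """Pick replica_count distinct nodes for an object from a hash of its name."""
    if replica_count > len(nodes):
        raise ValueError("not enough nodes for the requested number of replicas")
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(nodes)
    # Walk the node list from the hashed position so replicas land on distinct nodes.
    return [nodes[(start + i) % len(nodes)] for i in range(replica_count)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical node names
for name in ("photos/cat.jpg", "logs/2015-05-01.log"):
    print(name, "->", place_object(name, nodes))
```

Because placement is computed from the name alone, any node that receives a request can work out where the copies live without consulting a central index.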

The durability model is not RAID because, as discussed earlier, RAID simply does not scale beyond hundreds of terabytes. The second critical task deals with management, such as periodic health checks, self-healing, and data migration. Management is also made easy by using a single flat namespace, which means that a storage administrator can manage the entire cluster as a single entity.

Let's evaluate through the following table how object storage meets the aforementioned cloud storage benefits:

Criteria | Ability to meet
Low TCO | Storage nodes have no special requirements such as high availability, management, or special hardware such as RAID. This means that commodity hardware can be used to cut capital expenses (CAPEX). A single flat namespace with automated management features allows you to cut operational expenses (OPEX). A full analysis of how this cuts the TCO by 10 times or more is beyond the scope of this book.
Unlimited scalability | A distributed architecture allows capacity and performance to scale.
Elasticity | A fully virtualized approach allows data to grow and shrink as necessary.
On-demand | A fully virtualized approach with centralized management allows storage to be offered as an on-demand self-service resource.
Universal access | REST HTTP APIs provide access from wherever the user is, with no restriction on the number of users.

 

The importance of being open


Although the need for software to be open is not a technical requirement, it is increasingly becoming a business requirement. Open means three things:

  • Open source: While there are numerous benefits of open source software, the key advantages are the users' ability to influence the direction of the project, the velocity of innovation, reduced license fees, and the ability to switch vendors.

  • Open APIs: To avoid vendor lock-in, the APIs must be open. Often, proprietary APIs are enticing upfront but lock users in.

  • Agnostic to underlying hardware choices: To reduce hardware costs and maintain users' preferences, the software needs to be hardware agnostic.

 

OpenStack Swift


OpenStack Swift is a leading open source object storage project that meets the mentioned object storage and open technology requirements, and is the topic of this book. Let us first look at what the OpenStack project is about, and then specifically what OpenStack Swift (also referred to as just Swift) is.

OpenStack, a project launched by NASA and Rackspace in 2010, is currently the fastest growing open source project, and its mission is to produce a cloud computing platform useful for both public and private implementations. Its two core principles are simplicity and scalability. OpenStack has numerous subprojects under its umbrella, ranging from computing and storage to networking, among others. The object storage project is called Swift and is a highly available, durable, distributed, masterless, and eventually consistent software stack.

The Swift project, in particular, came out of Rackspace's Cloud Files platform. The project was unique because it utilized a DevOps methodology, where the engineers and ops professionals worked together to create and operate it. This resulted in a storage system that is very powerful, yet simple and easy to manage. Rackspace made Swift open source in 2010, and the leading contributors include SwiftStack, Rackspace, Red Hat, HP, Intel, IBM, and others.

In addition to sharing the mentioned generic object storage characteristics, OpenStack Swift has some unique additional functionality, as follows:

  • Open source: Comes with no license fees, as mentioned previously.

  • Open standards: Using HTTP REST APIs with SSL for optional encryption. The combination of open source and open standards eliminates any potential vendor lock-in.

  • Account container object structure: OpenStack Swift incorporates rich naming and organization capacity, unlike a number of object storage systems that offer a primitive interface, where the user gets a key upon submitting an object. In these other systems, the burden of mapping names to keys and organizing them in a reasonable manner is left to the user. Swift, on the other hand, handles the organization of data along with multitenancy.

  • Global cluster capability: This allows replication and distribution of data around the world. This functionality helps with disaster recovery, distribution of hot data, and so on.

  • Storage policies: This feature allows sets of data (stored in separate containers) to be optionally stored on different types of underlying storage using different durability models. For example, a valuable set of digital assets can be stored on high-quality hardware using triple replication, while less important assets can be stored on lower quality hardware with a lower level of replication. Hot data could be stored on SSDs.

  • Partial object retrieval: This lets you retrieve just a portion of an object, for example, part of a movie or a TAR file.

  • Middleware architecture: This allows users to add functionality. A great example of this is integrating with an authentication system.

  • Large object support: Objects of any size can be stored.

  • Additional functionality: This includes object versioning, causing objects to expire, rate limiting, temporary URL support, CNAME lookup, domain remap, account-to-account data copy, quota support, and static web mode. This list is constantly growing as a consequence of Swift being an open source project.
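Two of the features listed above, the account/container/object structure and partial object retrieval, can be illustrated together. The sketch below assumes the `/v1/<account>/<container>/<object>` path layout and the `X-Auth-Token` header of the Swift API, plus a standard HTTP `Range` header. The endpoint, account name, and token are hypothetical, and the request is built but never sent.

```python
from urllib.parse import quote
from urllib.request import Request

def object_url(endpoint, account, container, obj):
    """Compose the URL of an object from its account, container, and object name."""
    return f"{endpoint}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"

def partial_get(url, token, first_byte, last_byte):
    """Build (without sending) a GET for an inclusive byte range of an object."""
    headers = {
        "X-Auth-Token": token,
        "Range": f"bytes={first_byte}-{last_byte}",
    }
    return Request(url, headers=headers, method="GET")

# Hypothetical endpoint, account, and token, for illustration only.
url = object_url("https://swift.example.com", "AUTH_demo", "movies", "trailers/intro.mp4")
request = partial_get(url, token="example-token", first_byte=0, last_byte=1048575)

print(request.full_url)
print(request.get_header("Range"))
```

The account and container in the path give each tenant its own namespace, while the byte range asks for only the first mebibyte of the object rather than the whole file.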

 

Summary


In this chapter, we saw why cloud storage is a new way of building storage systems that cuts the total cost of ownership significantly. It uses a technology called object storage. A high-quality and open source object storage software stack to consider is OpenStack Swift. OpenStack Swift uses a dramatically different architecture from traditional enterprise storage systems by using a distributed architecture on commodity servers. The next chapter explains this architecture in detail.
