Implementing Cloud Storage with OpenStack Swift

By Amar Kapadia , Sreedhar Varma , Kris Rajana

About this book

Swift, OpenStack's object storage project, allows users to build cloud storage, a method used widely to slash costs and improve usability. With Swift, not only can users build storage using inexpensive commodity hardware, but they can also use public cloud storage that is built using the same technology. This book will provide you with the skills to build and operate your own cloud storage or use a third-party cloud.

You will start with the fundamentals of cloud storage, how OpenStack Swift is useful for cloud storage, and a review of Swift's architecture. Next, learn about installing, using, and managing Swift with step-by-step instructions and ample screenshots. Perform basic data transfers and access-control-list management using REST APIs. Hardware choice, Swift tuning, and use cases will round off your skills. This book is an invaluable tool if you want to get a head start in the world of cloud storage using OpenStack Swift.

Publication date:
May 2014
Publisher
Packt
Pages
140
ISBN
9781782168058

 

Chapter 1. Cloud Storage: Why Can't I be like Google?

If you could build your IT systems and operations from scratch today, would you recreate what you have? That's the question Geir Ramleth, CIO of construction giant Bechtel, asked himself in 2005. The answer was obviously not, and Bechtel ended up using best practices from four Internet forerunners of the time, YouTube, Google, Amazon.com, and Salesforce.com, to create their next set of datacenters. This is exactly the same question CIOs around the world are asking themselves, and that's what cloud storage is about! Through this book, you will learn how to implement a storage system that uses the best practices of these web giants rather than those of a traditional enterprise, thus cutting Total Cost of Ownership (TCO) by a factor of more than 10. This type of storage is called cloud storage.

The following are some key elements that constitute cloud storage:

  • Benefits:

    • Dramatic reduction in TCO

    • Unlimited scalability

    • Elasticity achieved by virtualization

    • On-demand; that is, pay for what you use

    • Universal access from anywhere

  • Limitations:

    • Sharing storage with other departments or companies

    • Is not a high-performance option

    • Requires a cloud gateway or an application change

 

Elements of cloud storage


Let us review the benefits and limitations of cloud storage in more detail.

Reduced TCO

Reduced TCO is the crux of cloud storage. Unless this new storage cuts storage cost by a factor of more than 10, it is not worth switching from block or file storage and dealing with something new and different. By total cost of ownership, we mean the total of capital expenditures (CAPEX) in the form of equipment, and operational expenditures (OPEX) in the form of IT storage administrators, electricity, cooling, and so on. This TCO reduction must be achieved without sacrificing durability (keeping data intact) or availability.

Unlimited scalability

Whether the cloud storage offering is public (offered by a service provider) or private (offered by central IT), it must have unlimited scalability. As we will see, cloud storage is built on distributed systems, meaning that it scales very well. Traditional storage systems typically have an upper limit, so this is a huge benefit.

Elastic

Storage virtualization decouples and abstracts the storage pool from its physical implementation. This means that you can get an elastic (grow and shrink as required) and unified storage pool, when in reality the underlying hardware is neither. IT professionals who have spent endless hours forecasting data growth and then waiting for their equipment will appreciate the magnitude of this benefit.

On-demand

Consumers do not reserve blocks of electricity and pay for it upfront in countries such as the United States. Yet we routinely pay for storage upfront whether we use it or not. Cloud storage uses a pay-as-you-go model, where you only pay for the data stored and the data accessed. This can result in huge cost savings for the storage user.

Universal access

The existing enterprise storage has limitations in terms of access. Block storage is very limiting; a server has to be on the same storage-area network, and LUNs (storage pools) cannot be shared. Network-attached storage (NAS) must be mounted to access it. This creates limitations on the number of clients and requires LAN access. Cloud storage is extremely flexible; there is no limit on the number of users or on where you access it from. This is possible since cloud storage systems usually use a REST API over HTTP (GET, PUT, POST, and DELETE) instead of traditional SCSI or CIFS/NFS protocols.

Multitenancy

This is both a benefit and a potential limitation. Cloud storage is typically multitenant. Tenants may be different organizations in a public cloud or different departments in a private cloud. The benefit is centralized management that reduces costs. The potential limitation is security: strong authentication, access controls, and various encryption options mean that it is not a real concern, but it is certainly a perceived one.

Use cases

Storage systems have struggled to balance reliability, cost, and performance. Generally, you can get two out of the three mentioned aspects. Cloud storage optimizes reliability and cost, but not performance. In fact, as we will see later, reliability in cloud storage is better than traditional RAID when you reach a large scale. The way RAID works, you are at a very high risk of having a failure during a RAID rebuild. Cloud storage uses different techniques such as replication or erasure coding to provide high reliability even when scaled.
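To make the trade-off between replication and erasure coding more concrete, the following is a minimal sketch that compares the raw capacity each scheme consumes and the number of lost copies or fragments it can survive. The function names and the parameters (three replicas, a 10+4 erasure code) are illustrative assumptions and are not taken from any particular product.

```python
def replication_profile(copies):
    """Full copies: raw capacity is `copies` times the data size,
    and the data survives the loss of all but one copy."""
    return {"overhead": float(copies), "failures_tolerated": copies - 1}


def erasure_profile(data_fragments, parity_fragments):
    """Erasure coding: the object is split into data fragments plus
    parity fragments; any `data_fragments` of them rebuild the object."""
    total = data_fragments + parity_fragments
    return {
        "overhead": total / data_fragments,
        "failures_tolerated": parity_fragments,
    }


# Hypothetical settings chosen only for illustration.
three_copies = replication_profile(3)
ten_plus_four = erasure_profile(10, 4)
print(three_copies)   # overhead 3.0, tolerates 2 lost copies
print(ten_plus_four)  # overhead 1.4, tolerates 4 lost fragments
```

Under these assumptions, erasure coding uses less raw capacity for a comparable level of protection, at the price of extra computation when encoding and rebuilding objects.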

This means cloud storage is good for primary storage for applications such as web servers and application servers, but not for databases or high-performance computing. It is also well suited to tier 2/3 storage, for example, backup, archival (photos, documents, videos, logs, and so on), and creating an additional copy for disaster recovery.

Application impact

Cloud storage affects applications in two ways: their interface to storage and their behavior. First, applications need to be ported to a new and different storage interface. Second, applications need to handle an eventually consistent storage system. The second part requires explanation. Cloud storage is built using distributed systems, which are governed by the CAP theorem. The theorem states that out of the following three properties, it is impossible to guarantee more than two:

  • Consistency: For cloud storage, this means that a request to any region/node returns the same data

  • Availability: For cloud storage, this signifies that a request is successfully acknowledged with a response

  • Partition tolerance: For cloud storage, this implies that the architecture is able to withstand failures in connectivity or parts of the system

Most cloud storage systems guarantee availability and partition tolerance at the expense of consistency, making the system eventually consistent. This means that an operation such as a write or a delete may not be reflected on all nodes at the same time. Traditional applications expect strict consistency and must be modified.
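The effect of eventual consistency on an application can be illustrated with a toy model. The sketch below is not how any real object store is implemented; the class name, the `via` parameter, and the explicit `replicate()` step are assumptions made purely to show a read returning stale data until replication catches up.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        # Updates that have been acknowledged but not yet applied everywhere.
        self.pending = deque()

    def put(self, key, value, via=0):
        # Acknowledge as soon as one replica holds the data (availability),
        # and queue the update for every other replica.
        self.replicas[via][key] = value
        for index in range(len(self.replicas)):
            if index != via:
                self.pending.append((index, key, value))

    def get(self, key, via=0):
        # A read is answered by whichever replica the request reaches.
        return self.replicas[via].get(key)

    def replicate(self):
        # Background replication: apply all queued updates in order.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value


store = EventuallyConsistentStore()
store.put("photo.jpg", "v1", via=0)
print(store.get("photo.jpg", via=1))  # None: replica 1 has not seen the write
store.replicate()
print(store.get("photo.jpg", via=1))  # v1: the replicas have converged
```

An application that reads immediately after writing therefore has to tolerate a stale or missing result, for example by retrying, which is the kind of change traditional applications require.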

Cloud gateways

If an application has not been ported to cloud storage, is that a dead end? Fortunately not; there is a class of devices called cloud gateways that provide file or block interfaces to an application (for example, CIFS, NFS, iSCSI, or FTP/SFTP) and perform protocol conversion to the cloud. These gateways also provide other functionality such as caching, WAN optimization, optional compression, encryption, and deduplication. In addition, they eliminate the need for an application to handle the eventual consistency problem.

 

Object storage


How do you build a cloud storage system? The most suitable underlying technology is object storage.

Object storage is different from block or file storage and it allows a user to store data in the form of objects (essentially files) in a flat namespace using REST HTTP APIs. Object storage completely virtualizes the physical implementation from the logical presentation. It is similar to check-in luggage versus carry-on luggage, where once you put your check-in luggage in the system, you really don't know where it is. You simply get it back at your destination. With carry-on luggage, you have to know exactly where you have kept it at all times.

Object storage is built using scale-out distributed systems. Each node, most often, stores its data on a local file system. As we will see, object storage architectures allow for the use of commodity hardware as opposed to the expensive specialized hardware used by traditional storage systems. You could argue that object storage is a higher-level storage system than file systems. The two most critical tasks of an object storage system are:

  • Data placement

  • Automating management tasks

Typically, a user sends their HTTP GET, PUT, POST, or DELETE request to any one of a set of nodes, and the request is translated to physical nodes by the object storage software. The software also takes care of the durability model by either creating multiple copies of the object, chunking it, creating erasure codes, or a combination. The durability model is not RAID because RAID simply does not scale beyond hundreds of terabytes. The second critical task deals with management, such as periodic health checks, self-healing, and data migration. Management is also made easy by having a single flat namespace, which means that a storage administrator can manage the entire cluster as a single entity.
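The data placement task described above can be sketched in a few lines. The following is a deliberately simplified illustration, not the algorithm used by Swift or any other product: it hashes the object's name and picks consecutive nodes from a list. The function name, node names, and replica count are all hypothetical.

```python
import hashlib


def place_object(path, nodes, replicas=3):
    """Return the nodes that should hold copies of the object at `path`.

    Simplified scheme: hash the name, use the hash to pick a starting
    node, then take the next nodes in the list for the other copies.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + offset) % len(nodes)] for offset in range(replicas)]


# Hypothetical five-node cluster.
cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(place_object("/AUTH_demo/photos/cat.jpg", cluster))
```

Because the placement is computed from the name alone, any node that receives a request can work out where the copies live without consulting a central directory. A plain modulo scheme like this one moves most objects whenever the node list changes, so production systems generally rely on more elaborate mappings.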

Let's evaluate, through the following table, how object storage meets the mentioned cloud storage benefits:

| Criteria | Ability to meet |
|---|---|
| Low TCO | Storage nodes have no special requirements such as high availability, management, or special hardware such as RAID; this means commodity hardware can be used to cut capital expenses (CAPEX). A single flat namespace with automated management features allows you to cut operational expenses (OPEX). A full analysis of how this cuts the TCO by 10 times or more is outside the scope of this book. |
| Unlimited scalability | A distributed architecture allows capacity and performance to scale. |
| Elasticity | A fully virtualized approach allows data to grow and shrink as necessary. |
| On-demand | A fully virtualized approach with centralized management allows storage to be offered as an on-demand service. |
| Universal access | REST HTTP APIs provide access from wherever the user is, with no restriction on the number of users. |
| Multitenancy | A combination of multiple accounts, strong authentication, and access controls ensures multitenancy with adequate security. |

 

OpenStack Swift


Is there an object storage stack best suited for our purposes? We believe the right choice is OpenStack Swift. Let us first look at what the OpenStack project is about, what OpenStack Swift (also referred to as just Swift) is, and then answer the preceding question about its choice.

OpenStack, a project launched by NASA and Rackspace in 2010, is currently the fastest growing open source project, and its mission is to produce a cloud computing platform useful for both public and private implementations. The two core principles are simplicity and scalability. OpenStack has numerous subprojects under its umbrella, ranging from compute and storage to networking, among others. The object storage project is called Swift and is a highly available, distributed, masterless, and eventually consistent software stack.

Why Swift when there are several vendors selling proprietary object storage software? The answer is in the first few sentences of this chapter; if you want to be like the web giants, the only option is open source. Open source cuts the total cost of ownership dramatically and provides access to a vibrant community that can provide technical support. Open source projects also provide longevity since open source has been shown to outlast proprietary projects. Moreover, open source projects allow users to benefit from the work done by bigger players and create an ecosystem of tools and know-how. Finally, open source projects add functionality at a much faster rate than proprietary projects. All this makes Swift the right choice.

The Swift project, in particular, came out of Rackspace's Cloud Files platform. The project was unique because the engineers and DevOps folks worked together to create it. This resulted in a very powerful storage system that is simple and easy to manage. Rackspace open-sourced Swift in 2010, and numerous organizations, including Seagate, EVault, IBM, HP, Internap, Korea Telecom, Intel, SwiftStack, CloudScaling, and Mirantis, have joined the project since then.

In addition to sharing the mentioned generic object storage characteristics, OpenStack Swift has some unique additional functionality, as follows:

  • Open source: With no license fees, as mentioned previously.

  • Open standards: Using HTTP REST APIs with SSL for optional encryption. The combination of open source and open standards eliminates any potential vendor lock-in.

  • Account / container / object structure: OpenStack Swift incorporates rich naming and organization capability. By contrast, a number of object storage systems offer a primitive interface where the user gets a key upon submitting an object, and the burden of mapping names to keys and organizing them in a reasonable manner is left to the user.

  • Global cluster capability: This allows replication and distribution of data around the world. This functionality helps with disaster recovery, distribution of hot data, and so on.

  • Partial object retrieval: For example, if you want just a portion of a movie object or a TAR file.

  • Middleware architecture: Allows you to add functionality. A great example of this is integrating with an authentication system.

  • Large object support: For objects over 5 GB.

  • Additional functionality: This includes object versioning, expiring objects, rate limiting, temporary URL support, CNAME lookup, domain remap, and static web mode. This list is constantly growing as a consequence of Swift being an open source project.
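The account / container / object structure and partial object retrieval from the list above can be illustrated by building a request without sending it. This is a minimal sketch: the endpoint, account, container, object name, and token values are hypothetical placeholders, and the helper function names are our own. The `/v1/<account>/<container>/<object>` path layout, the `X-Auth-Token` header, and the standard HTTP `Range` header are what a Swift client would normally use.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(endpoint, account, container, obj):
    """Build the URL that names an object: /v1/<account>/<container>/<object>."""
    return "{}/v1/{}/{}/{}".format(
        endpoint.rstrip("/"),
        quote(account, safe=""),
        quote(container, safe=""),
        quote(obj),  # slashes inside object names are kept as-is
    )


def ranged_get(url, token, first_byte, last_byte):
    """Prepare (but do not send) a GET for one byte range of an object."""
    headers = {
        "X-Auth-Token": token,
        "Range": "bytes={}-{}".format(first_byte, last_byte),
    }
    return Request(url, method="GET", headers=headers)


# Hypothetical values, for illustration only.
url = object_url("https://swift.example.com", "AUTH_demo",
                 "movies", "trailers/intro.mp4")
request = ranged_get(url, "example-token", 0, 1023)
print(request.get_method(), request.full_url)
print(request.get_header("Range"))
```

Passing the prepared request to `urllib.request.urlopen` against a real cluster would fetch only the first 1,024 bytes of the object, assuming the token is valid.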

 

Summary


In this chapter, we covered why cloud storage is a new way to build storage systems that cuts the total cost of ownership significantly. It uses a technology called object storage. A high-quality open source object storage software stack to consider is OpenStack Swift. OpenStack Swift uses a dramatically different architecture than traditional enterprise storage systems by using a distributed architecture on commodity servers. The next chapter explains this architecture in detail.

About the Authors

  • Amar Kapadia

    Amar Kapadia is a storage technologist and blogger based in the San Francisco Bay Area. He is currently the senior director of product marketing for Mirantis, the #1 pure-play OpenStack company. Prior to Mirantis, he was the senior director of strategy for EVault's Long-Term Storage Service, a public cloud storage offering based on OpenStack Swift. He has over 20 years of experience in storage, server, and I/O technologies at Emulex, Philips, and HP. Amar's current passion is in cloud and object storage technologies. He holds a master's degree in electrical engineering from the University of California, Berkeley.

    When not working on OpenStack Swift, Amar can be found working on technologies such as Kubernetes, MongoDB, PHP, or jQuery. His blogs can be found at www.buildcloudstorage.com.

  • Sreedhar Varma

    Sreedhar Varma has more than 15 years of experience in the storage industry, and has worked on various storage technologies such as SCSI, SAS, SATA, FC HBA drivers (Adaptec, Emulex, Qlogic, Promise, and so on), RAID, storage stacks of various operating systems, and system software for fault-tolerant and high-availability systems. He has good experience with SAN, NAS, and iSCSI networks; various storage arrays (Dothill, IBM, EMC, Netapp, Oracle Pillar, and so on); object storage implementations (Swift and Ceph); and software development using the corresponding REST APIs.

    Sreedhar is currently working for Vedams software providing storage engineering services. In the past, he has worked for Stratus Technologies, Compaq, Digital Equipment Corp, and IBM. He has a master's degree in computer science from the University of Massachusetts.

  • Kris Rajana

    Kris Rajana is a technologist and serial entrepreneur, passionate about building globally distributed teams to deliver innovative infrastructure solutions. His areas of interest include data infrastructure and fast-emerging open source cloud storage technologies, such as OpenStack, Cloud Foundry, Dockers/Containers, and big data. As the CEO of Vedams and Biarca (an offshoot of Vedams), he takes immense pride in his team and its development, which leads to excellence in execution. Kris has over 20 years of experience in managing engineering teams in fields such as space, aviation, and storage at BFGoodrich Aerospace, Snap Appliance (currently Overland Storage), Adaptec, Xyratex, and Sullego. His current passion is DevOps, and he likes to leverage leading open source cloud technologies to make enterprises more agile, speed up the development and deployment of modern enterprise applications, and make IT operations more efficient. Kris earned his doctorate in engineering science from Pennsylvania State University.

    He is a member of the board of the Pratham Bay Area Chapter. Along with the Vedams team, he is a sponsor of an urban learning center in Hyderabad. He is a student and sevak of the San Jose Chinmaya mission.
