Most developers would describe a modern database as relational with stored procedures and cross-table functions such as join. So why would you use a database that has none of these capabilities? The answer is scalability.
This morning, CNN ran a story on your new web application. Yesterday you had 10 concurrent users, and now your site is viral with 50,000 users signing on. Which database will handle 50,000 concurrent users without a complex expensive cluster? The answer is SimpleDB.
Pay only for your use
Access from any web-based system
No fixed schema
New metaphor—write seldom, read many
SimpleDB is one of the core Amazon Web Services, which include Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2). Amazon SimpleDB stores your structured data as key-value pairs in the Amazon Web Services (AWS) cloud and lets you run real-time queries against this data. You can scale it easily in response to increased load from your successful applications without the need for a costly cluster database server complex.
SimpleDB, as illustrated in the following diagram, is designed to be used either as an independent data storage component in your applications or in conjunction with some of the other services from Amazon's stable of Cloud Services, such as Amazon S3 and Amazon EC2.
The biggest challenge in SimpleDB is learning to think in its unique metaphor. Like speaking a new language, you need to stop translating and start thinking in that language. Rather than thinking of SimpleDB as a database, approach it as a spreadsheet with some XML characteristics.
SimpleDB functionality can be accessed from almost any programming language (such as Python, Ruby, Java, PHP, Erlang, and Perl) using super simple HTTP-based requests. You can get started anytime you like, and you pay for it based on how much you use it. It is very different from a relational database, and takes a completely different approach toward storing and querying data. It follows the convention of eventual consistency. Think of it as a single master database for updates and a large collection of read database slaves. Any changes made to your data will need to be propagated across all the different copies. This can sometimes take a few seconds depending upon the system load at that time and network latency, which means that a consumer of your domain and data may not see the changes immediately. The changes will eventually be propagated throughout SimpleDB, but this is an important consideration you need to think about when designing your application.
As SimpleDB is so different, it helps to have a tool for manipulating and exploring the database. When developing with a MySQL database, phpMyAdmin allows the developer to work directly on the database. SimpleDB has a similar free Firefox plugin called sdbtool (http://code.google.com/p/sdbtool/). Another Firefox plugin used in the more advanced examples is S3Fox (http://www.s3fox.net/) for administering the Amazon S3 storage. In this book, we cover several basic sample applications. We have also provided code to show each SimpleDB application using three languages: Java, PHP, and Python.
As access to SimpleDB can be from any site on the Web, the PHP samples can be downloaded and run directly from your site. To run any sample, an Amazon account is required. These samples let you explore most of the SimpleDB API, as well as some of the S3 API capabilities.
You can both download and try the PHP samples from http://www.webmasterinresidence.ca/simpledb/.
The best way to wrap your head around the way SimpleDB works is to picture a spreadsheet that contains your structured data. For instance, a contact database that stores information on your customers can be represented in SimpleDB as follows:
The entire customers table will be represented as the domain Customers. Domains group similar data for your application, and you can have up to 100 domains per AWS account. If required, you can increase this limit further by filling out a form on the SimpleDB website. The data stored in these domains is retrieved by making queries against the specific domain. There is no concept of joins as in the relational database world; therefore, queries run within a specific domain and not across domains.
Each customer is represented by a unique Customer ID. Items are similar to rows in a database table. Each item identifies a single object and contains data for that individual item as a number of key-value attributes. Each item is identified by a unique key or identifier, or in traditional terminology, the primary key. SimpleDB does not support the concept of auto-incrementing keys, and most people use a generated key such as the unix timestamp combined with the user identifier or something similar as the unique identifier for an item. You can have up to one billion items in each domain.
Each customer item will have distinguishing characteristics that are represented by an attribute. A customer will have a name, a phone number, an address, and other such attributes, which are similar to the columns in a table in a database. SimpleDB even enables you to have different attributes for each item in a domain. This kind of schema independence lets you mix and match items within a domain to satisfy the needs of your application easily, while at the same time enables you to take advantage of the benefits of the automatic indexing provided by SimpleDB. If your company suddenly decides to start marketing using Twitter, you can simply add a new attribute to your customer domain for the customers who have a Twitter ID! In traditional database terminology, there is no need to add a new column to the table.
Each customer attribute will be associated with a value, which is the same as a cell in a spreadsheet or the value of a column in a database. A relational database or a spreadsheet supports only a single value for each cell or column, while SimpleDB allows you to have multiple values for a single attribute. This lets you do things such as store multiple e-mail addresses for a customer while taking advantage of automatic indexing, without the need for you to manually create new and separate columns for each e-mail address, and then index each new column. In a relational DB, a separate table with a join would be used to store the multiple values. Unlike a delimited list in a character field, the multiple values are indexed, enabling quick searching.
It is a simple way of modeling your data, but at the same time, it is different from a relational database model that is familiar to most users. The following table compares SimpleDB components with a spreadsheet and a relational database:
You interact with SimpleDB by making authenticated HTTP requests along with the desired parameters. There are several libraries available in different programming languages that encapsulate this entire process and make it even easier to interact with SimpleDB by removing some of the tedium of manually constructing the HTTP requests. The next chapter explores these libraries and the advantages provided by them.
The date and time the metadata was last updated
The number of all items in the domain
The number of attribute name/value pairs in the domain
The number of unique attribute names in the domain
The total size of all item names in the domain, in bytes
The total size of all attribute values, in bytes
The total size of all unique attribute names, in bytes
You can create or modify the data stored within your domains by using the following operations:
You can retrieve items that match your criteria from the dataset stored in your domains using the following operations:
The following diagram illustrates the different components of SimpleDB and the operations that can be used for interacting with them:
Amazon provides a free tier for SimpleDB along with pricing for usage above the free tier limit. The charges are based on the machine utilization of each SimpleDB request along with the amount of machine capacity that is utilized for completing the specified request normalized to the hourly capacity of a 1.7 GHz Xeon processor.
As of the publication date of this book, there are no charges on the first 25 machine hours, 1 GB of data transfer, and 1 GB of storage that you consume every month. This is a significant amount of usage being provided for free for this limited time period by Amazon, and there are many kinds of applications that can operate entirely within this free tier. Pricing details are available at http://aws.amazon.com/simpledb/.
While a credit card is required to sign up, usage can be checked at any time with the Amazon Account Activity web page. Amazon estimates about 2,000,000 GET or SELECT API calls per month without any charge.
The pricing details might make it a bit daunting to figure out what your costs may be initially, but the free tier provided by Amazon goes a long way toward getting you more comfortable using the service and also putting SimpleDB through its paces without significant cost. There is also a nice calculator provided on the AWS site that is very helpful for computing the monthly usage costs for SimpleDB and the other Amazon web services. You can find the Amazon web services cost calculator at http://calculator.s3.amazonaws.com/calc5.html.
You now have an overview of the service, and you are reasonably familiar with what SimpleDB can do. It is a great piece of technology that enables you to create scalable applications that are capable of using massive amounts of data, and you can put this power and simplicity to use in your own applications.
You can leverage SimpleDB in your applications to quickly add, edit, and retrieve data using a simple set of API calls. The well-thought-out API and simplicity of usage will make your applications easier to design, architect, and maintain in the long run, while removing the burdens of data modeling, index maintenance, and performance tuning.
You no longer have to either know or pre-define every single piece of data that you will possibly "need to store" for your application. You expand your data set as you go and add the new attributes only when they are absolutely needed. SimpleDB enables you to do this easily, and even the indexing for these newly-added attributes is automatically handled behind the scenes without any need for your intervention.
High-performance web applications need the ability to store and retrieve data in a fast and efficient way. Amazon SimpleDB provides your applications with this ability while removing a lot of the administrative and maintenance complexities, leaving you free to focus on what's important to you—your application.
You pay only for the SimpleDB resources that you actually consume, and you no longer need to lay out significant expenditures up front for database software licenses or even hardware. The capacity planning and handling of any spikes in load and traffic are automatically handled by Amazon, freeing valuable resources that can be deployed in other areas. SimpleDB pricing passes on to you the cost savings achieved by Amazon's economies of scale.
And last but most importantly, you can easily handle traffic and load spikes on your applications, as SimpleDB will be doing all of the heavy lifting and scaling for you. You can even handle the massive and tsunami-like increases in traffic that can result from being mentioned on the front page of Yahoo or Digg, or becoming a trendy topic on Twitter.
SimpleDB is designed to integrate easily and work well with the other cloud services from Amazon such as Amazon EC2 and Amazon S3. This enables you to take full advantage of these other services and offload data processing and file storage needs to the cloud, while still using SimpleDB for your structured data storage needs. Web-scale computing for your application needs along with cost-effectiveness is easier thanks to these cloud services.