Amazon SimpleDB Developer Guide

Chapter 1. Getting to Know SimpleDB

Most developers would describe a modern database as relational with stored procedures and cross-table functions such as join. So why would you use a database that has none of these capabilities? The answer is scalability.

This morning, CNN ran a story on your new web application. Yesterday you had 10 concurrent users, and now your site is viral with 50,000 users signing on. Which database will handle 50,000 concurrent users without a complex expensive cluster? The answer is SimpleDB.

Why SimpleDB?

Scalability
Pay only for your use
Access from any web-based system
No fixed schema

Challenges?

New metaphor—write seldom, read many
Eventual consistency

SimpleDB is one of the core Amazon Web Services, which include Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2). Amazon SimpleDB stores your structured data as key-value pairs in the Amazon Web Services (AWS) cloud and lets you run real-time queries against this data. You can scale it easily in response to increased load from your successful applications without the need for a costly cluster database server complex.

SimpleDB, as illustrated in the following diagram, is designed to be used either as an independent data storage component in your applications or in conjunction with some of the other services from Amazon's stable of Cloud Services, such as Amazon S3 and Amazon EC2.

The biggest challenge in SimpleDB is learning to think in its unique metaphor. Like speaking a new language, you need to stop translating and start thinking in that language. Rather than thinking of SimpleDB as a database, approach it as a spreadsheet with some XML characteristics.

SimpleDB functionality can be accessed from almost any programming language (such as Python, Ruby, Java, PHP, Erlang, and Perl) using super simple HTTP-based requests. You can get started anytime you like, and you pay for it based on how much you use it. It is very different from a relational database, and takes a completely different approach toward storing and querying data. It follows the convention of eventual consistency. Think of it as a single master database for updates and a large collection of read database slaves. Any changes made to your data will need to be propagated across all the different copies. This can sometimes take a few seconds depending upon the system load at that time and network latency, which means that a consumer of your domain and data may not see the changes immediately. The changes will eventually be propagated throughout SimpleDB, but this is an important consideration you need to think about when designing your application.

Chapter 1. Getting to Know SimpleDB

Why SimpleDB?

Scalability
Pay only for your use
Access from any web-based system
No fixed schema

Challenges?

New metaphor—write seldom, read many
Eventual consistency

How does SimpleDB work?

The best way to wrap your head around the way SimpleDB works is to picture a spreadsheet that contains your structured data. For instance, a contact database that stores information on your customers can be represented in SimpleDB as follows:

As SimpleDB is a different database metaphor, new terms have been introduced. The use of this new terminology by Amazon stresses that the traditional assumptions may not be valid.

Domain

The entire customers table will be represented as the domain Customers. Domains group similar data for your application, and you can have up to 100 domains per AWS account. If required, you can increase this limit further by filling out a form on the SimpleDB website. The data stored in these domains is retrieved by making queries against the specific domain. There is no concept of joins as in the relational database world; therefore, queries run within a specific domain and not across domains.

Item

Each customer is represented by a unique Customer ID. Items are similar to rows in a database table. Each item identifies a single object and contains data for that individual item as a number of key-value attributes. Each item is identified by a unique key or identifier, or in traditional terminology, the primary key. SimpleDB does not support the concept of auto-incrementing keys, and most people use a generated key such as the unix timestamp combined with the user identifier or something similar as the unique identifier for an item. You can have up to one billion items in each domain.

Attributes

Each customer item will have distinguishing characteristics that are represented by an attribute. A customer will have a name, a phone number, an address, and other such attributes, which are similar to the columns in a table in a database. SimpleDB even enables you to have different attributes for each item in a domain. This kind of schema independence lets you mix and match items within a domain to satisfy the needs of your application easily, while at the same time enables you to take advantage of the benefits of the automatic indexing provided by SimpleDB. If your company suddenly decides to start marketing using Twitter, you can simply add a new attribute to your customer domain for the customers who have a Twitter ID! In traditional database terminology, there is no need to add a new column to the table.

Values

Each customer attribute will be associated with a value, which is the same as a cell in a spreadsheet or the value of a column in a database. A relational database or a spreadsheet supports only a single value for each cell or column, while SimpleDB allows you to have multiple values for a single attribute. This lets you do things such as store multiple e-mail addresses for a customer while taking advantage of automatic indexing, without the need for you to manually create new and separate columns for each e-mail address, and then index each new column. In a relational DB, a separate table with a join would be used to store the multiple values. Unlike a delimited list in a character field, the multiple values are indexed, enabling quick searching.

It is a simple way of modeling your data, but at the same time, it is different from a relational database model that is familiar to most users. The following table compares SimpleDB components with a spreadsheet and a relational database:

Relational Database	Spreadsheet	SimpleDB
Table	Worksheet	Domain
Row	Row	Item
Column	Cell	Attribute
Value	Value	Value(s)

How do I interact with SimpleDB?

You interact with SimpleDB by making authenticated HTTP requests along with the desired parameters. There are several libraries available in different programming languages that encapsulate this entire process and make it even easier to interact with SimpleDB by removing some of the tedium of manually constructing the HTTP requests. The next chapter explores these libraries and the advantages provided by them.

There are three main types of actions that you will need to do when you are working with SimpleDB—create, modify, and retrieve information about your domains by using the following operations:

CreateDomain: Create a new domain that contains your dataset.
DeleteDomain: Delete an existing domain.
ListDomains: List all the domains.
DomainMetadata: Retrieve information that gives you a general picture of the domain and the items that are stored within it, such as:
- The date and time the metadata was last updated
- The number of all items in the domain
- The number of attribute name/value pairs in the domain
- The number of unique attribute names in the domain
- The total size of all item names in the domain, in bytes
- The total size of all attribute values, in bytes
- The total size of all unique attribute names, in bytes

You can create or modify the data stored within your domains by using the following operations:

PutAttributes: Create or update an item and its attributes. Items will automatically be indexed by SimpleDB as they are added.
BatchPutAttributes: Create or update multiple attributes (up to 25) in a single call for improved overall throughput of bulk write operations.
DeleteAttributes: Delete an item, an attribute, or an attribute value.

You can retrieve items that match your criteria from the dataset stored in your domains using the following operations:

GetAttributes: Retrieve an item and all or a subset of its attributes and values matching your criteria.
Select: Retrieve an item and all or a subset of its attributes and values matching your criteria, using the SELECT syntax that is popular in the SQL world.

The following diagram illustrates the different components of SimpleDB and the operations that can be used for interacting with them:

Why should I use SimpleDB?

You now have an overview of the service, and you are reasonably familiar with what SimpleDB can do. It is a great piece of technology that enables you to create scalable applications that are capable of using massive amounts of data, and you can put this power and simplicity to use in your own applications.

Make your applications simpler to architect

You can leverage SimpleDB in your applications to quickly add, edit, and retrieve data using a simple set of API calls. The well-thought-out API and simplicity of usage will make your applications easier to design, architect, and maintain in the long run, while removing the burdens of data modeling, index maintenance, and performance tuning.

Build flexibility into your applications

You no longer have to either know or pre-define every single piece of data that you will possibly "need to store" for your application. You expand your data set as you go and add the new attributes only when they are absolutely needed. SimpleDB enables you to do this easily, and even the indexing for these newly-added attributes is automatically handled behind the scenes without any need for your intervention.

Create high-performance web applications

High-performance web applications need the ability to store and retrieve data in a fast and efficient way. Amazon SimpleDB provides your applications with this ability while removing a lot of the administrative and maintenance complexities, leaving you free to focus on what's important to you—your application.

Take advantage of lower costs

You pay only for the SimpleDB resources that you actually consume, and you no longer need to lay out significant expenditures up front for database software licenses or even hardware. The capacity planning and handling of any spikes in load and traffic are automatically handled by Amazon, freeing valuable resources that can be deployed in other areas. SimpleDB pricing passes on to you the cost savings achieved by Amazon's economies of scale.

Scale your applications on demand

And last but most importantly, you can easily handle traffic and load spikes on your applications, as SimpleDB will be doing all of the heavy lifting and scaling for you. You can even handle the massive and tsunami-like increases in traffic that can result from being mentioned on the front page of Yahoo or Digg, or becoming a trendy topic on Twitter.

Architect for the cloud

SimpleDB is designed to integrate easily and work well with the other cloud services from Amazon such as Amazon EC2 and Amazon S3. This enables you to take full advantage of these other services and offload data processing and file storage needs to the cloud, while still using SimpleDB for your structured data storage needs. Web-scale computing for your application needs along with cost-effectiveness is easier thanks to these cloud services.

Patric Ojala Feb 18, 2013

The different scripting languages take up quite a bit of realestate, but I cannot think of much else negative to say about this. Great read for learning NoSQL basics, NoSQL vs SQL differences, how to use SimpleDB and what to use it for.

Amazon Verified review

Michael Bulla May 22, 2011

Das Buch bietet einen guten allumfassenden Einblick in das Thema Amazon Webserverices. Es wird angesprochen wozu Cloud Computing überhaupt gut ist. Es werden die ersten Schritte aufgezeigt, um mit AWS durchstarten zu können. Es werden alle Dienst (Stand Ende 2010) erklärt. Insofern ein super Buch, für alle, die noch keine Schnittpunkte hatten und gucken wollen, was AWS überhaupt ist.Ich hatte mich vor dem Kauf des Buches schon intensiv mit der AWS-Homepage auseinander gesetzt. Daher gab es für mich nicht viel Neues. Ein weiterer Kritikpunkt ist, dass der Autor es mit dem Beispielcode zu gut macht. Da wird über Seiten hinweg PHP-Code abgedruckt, der imho keinen Mehrwert bietet. Ohne diesen könnte das Buch bestimmt 25% dünner sein.

Amazon SimpleDB Developer Guide: Scale your application's database on the cloud using Amazon SimpleDB

What do you get with Print?

Amazon SimpleDB Developer Guide