In this article by Chukri Soueidi, the author of the book Microsoft Azure Storage Essentials, we learn about Table storage, one of the storage services offered by Microsoft Azure. Tables provide scalable, structured data stores that allow you to store huge amounts of data. With the growing needs of modern high-performance applications, choosing the right data store is a major factor in the success of any application.
The Table storage basics
Azure Table storage is a key/value NoSQL database that allows you to store non-relational, structured, and semi-structured data that does not have to conform to a specific schema. It is ideal for storing huge amounts of data that is ready for simple and quick read/write operations, and is interfaced by the Table REST API using the OData protocol.
A table can contain one or many entities (rows), each up to 1 MB in size. The whole table cannot exceed a 100 TB limit. Each entity can hold up to 252 custom properties (columns) in addition to its system properties. Rows in the same table can have different schemas, unlike relational database rows, which must all comply with a single strict schema.
Table storage is structured as a hierarchical relationship between storage accounts, tables, and entities. A storage account can have zero or more unrelated tables, each containing one or more entities.
The components of the service are:
- Storage account: Governs access to the underlying service
- Table: A collection of entities
- Entity: A set of properties, a row in the table
- Properties: Key/value pairs that can hold scalar data such as string, byte array, guid, dateTime, integer, double, boolean
- Table address: http://<account-name>.table.core.windows.net/<table-name>
The address differs when using the Storage Emulator, which emulates the Table storage service locally without incurring any cost. The default address will be the local machine loopback address followed by a predefined default storage account name. It would look as follows:
http://127.0.0.1:10002/devstorageaccount1/<table-name>
For more on the Storage Emulator, you can check out the Use the Azure Storage Emulator for Development and Testing article in the Azure documentation.
Entities
Entities, in tables, are associated with two key properties: the PartitionKey and the RowKey, which together form the entity's primary key.
The PartitionKey must be a string with a maximum allowed size of 1,024 characters. As its name suggests, it divides the table logically into partitions. Entities with the same partition key will be always stored in the same physical node, promoting performance and scalability.
When a table scales up, the Azure platform may automatically move different partitions of a table into separate physical storage nodes for load balancing purposes. Entities with the same partition key stay on the same node no matter how much their table scales; hence, the key selection is essential in determining the general performance of the table.
The second key property is the RowKey, which is also of the string type, with a maximum allowed size of 1,024 characters. It has to be unique inside a partition, since the partition key and row key combine to uniquely identify an entity within the table.
In addition to that, each entity has a read-only mandatory Timestamp property, which is essential for managing concurrent updates on records.
Naming rules
When naming tables and properties, there are some naming rules to follow:
- Tables: A table name should be unique per storage account, ranging from 3 to 63 characters long. It can only contain alphanumeric characters and cannot begin with a number. Table names are case-insensitive, but will reflect the case that was used upon creation. (Mytable and MyTable are logically the same name, but if you create a table called Mytable then Mytable—not MyTable—will be displayed when you retrieve table names from the account.)
- Properties: A property name is case-sensitive with a size of up to 255 characters. It should follow C# identifier naming rules. When passed within a URL, certain characters must be percent-encoded, which is automatically done when using the .NET Library.
Some characters are disallowed in the values of the row and partition keys, including the backslash (\), the forward slash (/), the number sign (#), and the question mark (?).
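If key values come from user input or external data, you may want to strip these characters before using them. The following is a minimal sketch; the SanitizeKey helper is hypothetical and not part of the storage library:

// Hypothetical helper: removes the characters that are disallowed in
// PartitionKey and RowKey values before they are sent to Table storage.
static readonly char[] DisallowedKeyChars = { '\\', '/', '#', '?' };

static string SanitizeKey(string value)
{
    foreach (char c in DisallowedKeyChars)
    {
        value = value.Replace(c.ToString(), string.Empty);
    }
    return value;
}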
Using the Azure Storage Client Library
The Azure Storage Client Library for .NET has classes to manage and maintain Table storage objects. After creating a storage account and getting the appropriate storage access keys, we have everything we need to use the storage service.
To keep the demo setup simple, we will be using a C# Console Application template. To create a Console Application, open Visual Studio and navigate to New Project | Templates | Visual C# | Console Application.
By default, the application does not have a reference to the storage client library, so we need to add it using the Package Manager by typing Install-Package WindowsAzure.Storage. This will download and install the library along with all dependencies required.
You can navigate through the library by expanding References in Solution Explorer in Visual Studio, right-clicking on Microsoft.WindowsAzure.Storage, and selecting View in Object Browser. This will show you all library classes and methods.
After getting the storage library, we need to create the connection string by providing the storage account name and access key. The connection string would look like this:
<appSettings>
  <add key="StorageConnectionString"
       value="DefaultEndpointsProtocol=https;AccountName=account name;AccountKey=access key" />
</appSettings>
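If you are working against the Storage Emulator mentioned earlier, the account name and key can be replaced with the development storage shortcut. A minimal sketch, assuming the same StorageConnectionString key:

<appSettings>
  <add key="StorageConnectionString" value="UseDevelopmentStorage=true" />
</appSettings>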
In program.cs we need to reference the following libraries:
using System.Configuration; // a reference to System.Configuration should be added to the project
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
Starting with a table
Using the connection string defined in the App.config file, we will proceed to create a CloudStorageAccount object, which represents the storage account where the tables will be created. After that, we use the CloudTableClient class, defined under the Microsoft.WindowsAzure.Storage.Table namespace, to create an object that acts as the entry point for working with Table storage directly.
After creating the table object we will call its CreateIfNotExists() method that will actually call the Table REST API to create a table. The method is idempotent: if the table already exists, the method will do nothing no matter how many times you call it.
The following is the code to create a table:
string connectionString = ConfigurationManager.AppSettings["StorageConnectionString"];
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("Weather");
table.CreateIfNotExists();
A table can be deleted using the following code:
table.DeleteIfExists();
We created the Weather table to demonstrate a broad overview of Table storage features. Our sample is a table for saving the weather history of all the cities around the world. For each city, the table holds a weather entry per day and time. Assuming there are more than 200,000 cities and each city has 24 entries per day, the table will contain millions of records in no time.
Adding entities to a table
After creating the table, Weather, we need to populate it with entities. In the .NET library, entities are mapped to C# objects. The library defines the ITableEntity interface which should be implemented by all classes representing entities.
The ITableEntity interface defines the following properties:
- PartitionKey: A string holding the partition key of the entity
- RowKey: A string holding the row key
- Timestamp: A DateTimeOffset recording the last update on the entity
- ETag: An entity tag used for managing concurrency
The library also contains two classes that implement ITableEntity.
TableEntity is a base class that custom entity classes can inherit from. DynamicTableEntity is a class that exposes an IDictionary<string, EntityProperty> property named Properties, in which the entity's properties are stored.
Now, let's define our custom entity by inheriting from the TableEntity base class:
public class WeatherEntity : TableEntity
{
public WeatherEntity(string cityId, string daytime)
{
this.PartitionKey = cityId;
this.RowKey = daytime;
}
public WeatherEntity() { }
public string CityName { get; set; }
public string CountryCode { get; set; }
public string Description { get; set; }
public string Temperature { get; set; }
public string Humidity { get; set; }
public string Wind { get; set; }
}
In the preceding code, we have defined the WeatherEntity class with a constructor that takes a cityId argument and a string representing the day and time, which we chose to use as the PartitionKey and RowKey, respectively. Since the entity inherits from the TableEntity class, we assign the keys to the base class properties using this.
The partition and row key selection was based on the table's querying needs. The PartitionKey is selected to be the cityId, and the RowKey is a string representation of the date and time in the format "yyyyMMddHHmm".
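The following is a minimal sketch of how the two keys could be built from the raw data; the cityId value and the use of UTC are assumptions made for illustration:

// Build the keys from the source data; UTC avoids ambiguity across time zones.
string cityId = "5809844"; // assumed city identifier
DateTime observedAt = new DateTime(2015, 3, 15, 12, 0, 0, DateTimeKind.Utc);
string partitionKey = cityId;
string rowKey = observedAt.ToString("yyyyMMddHHmm"); // "201503151200"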
Now, after defining the entity as an object, we will proceed to add the entity to the Weather table. First, we create an instance of this object and populate its properties, then hand it to a TableOperation object to insert it into our table. The following is the code for this:
WeatherEntity weather = new WeatherEntity("5809844", "201503151200")
{
CityName = "Seattle",
CountryCode = "US",
Description = "Clear",
Temperature = "25.5",
Humidity = "44",
Wind = "11 km/h",
};
TableOperation insertOperation = TableOperation.Insert(weather);
table.Execute(insertOperation);
The TableOperation class represents a single table operation. In addition to the insert operation, this class defines static methods for merge, replace, delete, and other operations that we will see later on. After creating the operation, we pass it to the Execute method of the CloudTable object, which represents the Weather table.
The Timestamp property of the entity is required, but we did not set it in our code. This is because the server fills this property automatically; even if we set it, the server will overwrite it with its own timestamp.
Entity Group Transactions
In case we have several entities to insert into the table, we have the option to perform a batch operation that can insert many records in one group transaction. We can create a TableBatchOperation, add the entities to it, and then pass it to the CloudTable.ExecuteBatch method.
The following is some sample code to add two weather entries at the same time:
TableBatchOperation batchOperation = new TableBatchOperation();
WeatherEntity weather1 = new WeatherEntity("5809844", "201503151300")
{
CityName = "Seattle",
CountryCode = "US",
Description = "Light Rain",
Temperature = "23",
Humidity = "82",
Wind = "16 km/h",
};
WeatherEntity weather2 = new WeatherEntity("5809844", "201503151400")
{
CityName = "Seattle",
CountryCode = "US",
Description = "Heavy Rain",
Temperature = "20",
Humidity = "95",
Wind = "16 km/h",
};
batchOperation.Insert(weather1);
batchOperation.Insert(weather2);
table.ExecuteBatch(batchOperation);
The following is a sample of how the table will look when it contains records for several days and cities; we omitted the timestamp for the clarity of the sample:
| PartitionKey | RowKey | City | Country | Description | Temperature | Humidity | Wind |
| --- | --- | --- | --- | --- | --- | --- | --- |
| … | … | … | … | … | … | … | … |
| 276781 | 201503151300 | Beirut | LB | Clear | 25 | 65 | 5 km/h |
| 276781 | 201503151400 | Beirut | LB | Clear | 26 | 65 | 5 km/h |
| 5128638 | 201503151300 | New York | US | Sunny | 25 | 46 | 11 km/h |
| 5128638 | 201503151400 | New York | US | Few clouds | 25 | 46 | 11 km/h |
| 5809844 | 201503151300 | Seattle | US | Light rain | 23 | 82 | 16 km/h |
| 5809844 | 201503151400 | Seattle | US | Heavy rain | 24 | 95 | 16 km/h |
| 6173331 | 201503151300 | Vancouver | CA | Scattered clouds | 6 | 84 | 12 km/h |
| 6173331 | 201503151400 | Vancouver | CA | Haze rain | 5 | 84 | 12 km/h |
| 7284824 | 201503151300 | Budapest | HU | Light rain | 4 | 80 | 14 km/h |
| 7284824 | 201503151400 | Budapest | HU | Light rain | 4 | 81 | 14 km/h |
| … | … | … | … | … | … | … | … |
Batch operations are quite an interesting feature of Tables, as they provide quasi-ACID transactions in the Table storage. Batch operations require all the entities included to have the same partition key or else an exception will be thrown.
Batch operations can combine inserts, updates, and deletes in the same operation, and can include up to 100 entities with a total payload size of no more than 4 MB.
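If more than 100 entities need to be inserted for the same city, one option is to split them into chunks before executing the batches. The following is a minimal sketch, assuming all entities in the list share the same partition key and that the usual System and System.Collections.Generic namespaces are referenced:

// Inserts a list of entities that share one partition key, 100 entities per batch.
static void InsertInBatches(CloudTable table, List<WeatherEntity> entities)
{
    for (int i = 0; i < entities.Count; i += 100)
    {
        TableBatchOperation batch = new TableBatchOperation();
        int end = Math.Min(i + 100, entities.Count);
        for (int j = i; j < end; j++)
        {
            batch.Insert(entities[j]);
        }
        table.ExecuteBatch(batch);
    }
}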
Updating entities
In order to update an entity, we need to retrieve it first, then modify its properties to save it back again. TableOperation supports two types of updates: Replace and Merge. If the entity was modified between the retrieval and saving, an error will be generated and the execution of the operation will fail.
The Replace method will replace the values of the properties of an existing item with the object we provide. Have a look at the following code that changes an existing record for Seattle city for a specific day and hour:
TableOperation retrieveOperation = TableOperation.Retrieve<WeatherEntity>("5809844", "201503151300");
TableResult result = table.Execute(retrieveOperation);
WeatherEntity updateEntity = (WeatherEntity)result.Result;
if(updateEntity != null)
{
    updateEntity.Temperature = "26";
    TableOperation updateOperation = TableOperation.Replace(updateEntity);
    table.Execute(updateOperation);
}
Notice in the preceding code that the TableOperation.Retrieve method takes the partition key and the row key as arguments, and retrieves the entity into a TableResult object whose Result property can be cast into a WeatherEntity object.
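Because the ETag retrieved with the entity is sent back with the update, a change made by another client in the meantime will cause the Execute call to fail. The following is a minimal sketch of how such a failure could be detected, assuming the StorageException type from the client library and an HTTP 412 (Precondition Failed) response:

try
{
    table.Execute(updateOperation);
}
catch (StorageException ex)
{
    if (ex.RequestInformation.HttpStatusCode == 412)
    {
        // Another client updated the entity after we retrieved it;
        // re-read the entity and retry the update if appropriate.
    }
    else
    {
        throw;
    }
}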
But that's not all: changing an existing property in an entity is not the only type of update that may be required. As we have discussed previously, tables are schema-less, allowing entities in the same table to have different properties, which also means that you can add or remove columns even after a record has been created.
Suppose you want to add more properties to a weather record; you can achieve this by modifying the WeatherEntity class and adding the new properties to it.
But what if you don't want to change the class because the data you are adding is rarely collected? This is where Merge steps in: it allows you to merge changes into an existing entity without overwriting the old data.
TableOperation retrieveOperation = TableOperation.Retrieve<WeatherEntity>("5809844", "201503151300"); // the Seattle record for March 15, 2015 at 13:00
TableResult result = table.Execute(retrieveOperation);
WeatherEntity originalEntity =(WeatherEntity)result.Result;
if (originalEntity != null)
{
DynamicTableEntity dynamicEntity = new DynamicTableEntity()
{
PartitionKey = originalEntity.PartitionKey,
RowKey = originalEntity.RowKey,
ETag = originalEntity.ETag,
Properties = new Dictionary<string, EntityProperty>
{
{"Visibility", new EntityProperty("16093")},
{"Pressure", new EntityProperty("1015")}
}
};
TableResult results = table.Execute(TableOperation.Merge(dynamicEntity));
}
The preceding code gets an existing weather entity and assigns it to the originalEntity object. Then it creates a new DynamicTableEntity object, copies the PartitionKey, RowKey, and ETag into it, and adds the two new properties, Visibility and Pressure. Finally, TableOperation.Merge is executed over the newly created object.
What this code does is that it adds two properties to the weather record without changing other existing properties. If we had used Replace instead, the original WeatherEntity properties CityName, CountryCode, Description, Temperature, Humidity, and Wind would have lost their values.
Copying the three properties PartitionKey, RowKey, and ETag (used for optimistic concurrency) to the newly created DynamicTableEntity is mandatory; without them the operation would fail.
Our Weather table now looks like this:
| PartitionKey | RowKey | City | Country | Description | Temperature | Humidity | Wind | Visibility | Pressure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| … | … | … | … | … | … | … | … | | |
| 5128638 | 201503151400 | New York | US | Few clouds | 25 | 46 | 11 km/h | | |
| 5809844 | 201503151300 | Seattle | US | Light rain | 23 | 82 | 16 km/h | 16093 | 1015 |
| 5809844 | 201503151400 | Seattle | US | Heavy rain | 24 | 95 | 16 km/h | | |
| 6173331 | 201503151300 | Vancouver | CA | Scattered clouds | 6 | 84 | 12 km/h | | |
| … | … | … | … | … | … | … | … | | |
As mentioned earlier, operations will fail if the entity has changed between the retrieval and the update. Sometimes you are not sure if the object already exists on the server or not. You might need to overwrite or insert your values regardless; to do that you can use the InsertOrReplace operation. This will overwrite your values even if they were changed, and if the record doesn't exist it will be inserted.
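A minimal sketch, reusing the weather object created earlier:

// Insert the entity if it does not exist, or overwrite it unconditionally if it does.
TableOperation upsertOperation = TableOperation.InsertOrReplace(weather);
table.Execute(upsertOperation);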
Deleting an entity can be achieved after retrieving it, as follows:
TableOperation deleteOperation = TableOperation.Delete(originalEntity);
table.Execute(deleteOperation);
Summary
In this article, we have discussed the basics of Table storage, the key/value NoSQL storage option offered by the Azure platform, and performed basic operations on tables and entities.