Putting Your Database at the Heart of Azure Solutions

In this article by Riccardo Becker, author of the book Learning Azure DocumentDB, we will build a realistic Internet of Things scenario around DocumentDB. The result is a basic Internet of Things platform that can help to accelerate building your own.

In this article, we will cover the following:

  • Have a look at a fictitious scenario
  • Learn how to combine Azure components with DocumentDB
  • Demonstrate how to migrate data to DocumentDB


Introducing an Internet of Things scenario

Before we start exploring different capabilities to support a real-life scenario, we will briefly explain the scenario we will use throughout this article.

IoT, Inc.

IoT, Inc. is a fictitious start-up company that is planning to build solutions in the Internet of Things domain. The first solution they will build is a registration hub, where IoT devices can be registered. These devices can be diverse, ranging from home automation devices to devices that control traffic lights and street lights. The main use case for this solution is offering the capability for devices to register themselves against a hub. The hub will be built with DocumentDB as its core component and a Web API to expose this functionality. Before devices can register themselves, they need to be whitelisted in order to prevent malicious devices from registering.

In the following screenshot, we see the high-level design of the registration requirement:

The first version of the solution contains the following components:

  • A Web API containing methods to whitelist, register, unregister, and suspend devices
  • DocumentDB, containing all the device information including information regarding other Microsoft Azure resources
  • Event Hub, a Microsoft Azure asset that provides a scalable publish-subscribe mechanism to ingress and egress millions of events per second
  • Power BI, Microsoft’s online offering to expose reporting capabilities and the ability to share reports

Obviously, we will focus on the core of the solution, which is DocumentDB, but it is worthwhile to touch on some of the other Azure components as well, to see how well they cooperate and how easy it is to set up a demonstration for IoT scenarios. The devices on the left-hand side are chosen randomly and will be mimicked by an emulator written in C#.

The Web API will expose the functionality required to let devices register themselves with the solution and start sending data afterwards (which will be ingested into the Event Hub and reported using Power BI).

Technical requirements

To be able to service potentially millions of devices, every registration request from a device must be stored in a separate collection based on the country where the device is located or manufactured. The other technical requirements are as follows:

  • Every device is modeled in the same way, while additional metadata can be provided upon registration or afterwards when updating.
  • To achieve country-based partitioning, we will create a custom PartitionResolver.
  • To extend the basic security model, we reduce the amount of sensitive information in our configuration files.
  • To enhance searching capabilities, we want to support full-text search across all device information, because we service multiple types of devices, each with its own metadata and device-specific information. This enables users to quickly search for and find their devices.

Designing the model

Every device is modeled in the same way so that we can service multiple types of devices. The device model contains at least the device ID and a location. Furthermore, the device model contains a dictionary where additional device properties can be stored. The next code snippet shows the device model:

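using System.Collections.Generic;
using Microsoft.Azure.Documents.Spatial;
using Newtonsoft.Json;

public class DeviceModel
{
    // Serialized as "id", the unique document identifier within a collection
    [JsonProperty("id")]
    public string DeviceId { get; set; }
    [JsonProperty("location")]
    public Point Location { get; set; }
    // Practically any metadata for this device can be stored here
    [JsonProperty("metadata")]
    public IDictionary<string, object> MetaData { get; set; }
}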

The Location property is of type Microsoft.Azure.Documents.Spatial.Point because we want to run spatial queries later on, for example, getting all the devices within 10 kilometers of a building.
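As a rough illustration of such a spatial query, the following sketch builds the DocumentDB SQL text for "all devices within a given distance of a point" using the built-in ST_DISTANCE function, which returns a distance in meters. The class and method names are hypothetical, the alias d is arbitrary, and the sketch assumes that spatial indexing is enabled on the collection being queried.

```csharp
using System.Globalization;

public static class SpatialQueryBuilder
{
    // Hypothetical helper: returns DocumentDB SQL that selects the documents whose
    // "location" property lies within radiusInMeters of the given point.
    // GeoJSON coordinates are written longitude first, then latitude.
    public static string DevicesWithin(double longitude, double latitude, double radiusInMeters)
    {
        // InvariantCulture guarantees a decimal point regardless of the server locale
        return string.Format(
            CultureInfo.InvariantCulture,
            "SELECT * FROM d WHERE ST_DISTANCE(d.location, " +
            "{{\"type\": \"Point\", \"coordinates\": [{0}, {1}]}}) < {2}",
            longitude, latitude, radiusInMeters);
    }
}
```

The resulting string could be passed as the SQL expression to CreateDocumentQuery<DeviceModel>, for example with SpatialQueryBuilder.DevicesWithin(4.8951679, 52.3702157, 10000) for a 10 kilometer radius around Amsterdam.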

Building a custom partition resolver

To meet the first technical requirement (partition data based on the country), we need to build a custom partition resolver. To build one, we need to implement the IPartitionResolver interface and add some logic. The resolver takes the Location property of the device model and retrieves the country that corresponds to the latitude and longitude provided upon registration.

In the following code snippet, you see the full implementation of the GeographyPartitionResolver class:

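using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public class GeographyPartitionResolver : IPartitionResolver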
    {
        private readonly DocumentClient _client;
        private readonly BingMapsHelper _helper;
        private readonly Database _database;
 
        public GeographyPartitionResolver(DocumentClient client, Database database)
        {
            _client = client;
            _database = database;
            _helper = new BingMapsHelper();
        }
        public object GetPartitionKey(object document)
        {
            //get the country for this document
            //document should be of type DeviceModel
            var device = document as DeviceModel;
            if (device != null)
            {
                //get the Location and translate to country
                return _helper.GetCountryByLatitudeLongitude(
                    device.Location.Position.Latitude,
                    device.Location.Position.Longitude);
            }
            return String.Empty;
        }
 
        public string ResolveForCreate(object partitionKey)
        {
            //get the country for this partitionkey
            //check if there is a collection for the country found

            var countryCollection = _client.CreateDocumentCollectionQuery(_database.SelfLink)
                .ToList()
                .FirstOrDefault(cl => cl.Id.Equals(partitionKey.ToString()));
            if (null == countryCollection)
            {
                countryCollection = new DocumentCollection { Id = partitionKey.ToString() };
                countryCollection =
                    _client.CreateDocumentCollectionAsync(_database.SelfLink, countryCollection).Result;
            }
            return countryCollection.SelfLink;
        }
 
        /// <summary>
        /// Returns a list of collectionlinks for the designated partitionkey (one per country)
        /// </summary>
        /// <param name="partitionKey"></param>
        /// <returns></returns>
        public IEnumerable<string> ResolveForRead(object partitionKey)
        {
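            var countryCollection = _client.CreateDocumentCollectionQuery(_database.SelfLink)
                .ToList()
                .FirstOrDefault(cl => cl.Id.Equals(partitionKey.ToString()));

            // No collection exists yet for this country, so there is nothing to read
            if (countryCollection == null)
            {
                return new List<string>();
            }

            return new List<string>
            {
                countryCollection.SelfLink
            };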
        }
    }

In order to have the DocumentDB client use this custom PartitionResolver, we need to assign it. The code is as follows:

GeographyPartitionResolver resolver = new GeographyPartitionResolver(docDbClient, _database);
 
docDbClient.PartitionResolvers[_database.SelfLink] = resolver;
// Add a typical device and have the resolver sort out which country is
// involved and whether or not the collection already exists
// (a collection for the country is created if needed).
var deviceInAmsterdam = new DeviceModel
            {
                DeviceId = Guid.NewGuid().ToString(),
                Location = new Point(4.8951679, 52.3702157)
            };
 
Document modelAmsDocument = docDbClient.CreateDocumentAsync(_database.SelfLink,
                deviceInAmsterdam).Result;
            //get all the devices stored in the same country partition as the Amsterdam device
        var doc = docDbClient.CreateDocumentQuery<DeviceModel>(
                _database.SelfLink, null, resolver.GetPartitionKey(deviceInAmsterdam));

Now that we have created a country-based PartitionResolver, we can start working on the Web API that exposes the registration method.

Building the Web API

A Web API is an online service that can be used by any client running any framework that supports the HTTP programming stack. REST is currently the most common way of interacting with such APIs, so we will build a REST API. A good API should aim for platform independence. A well-designed API should also be able to extend and evolve without affecting existing clients.

First, we need to whitelist the devices that should be able to register themselves against our device registry. The whitelist should at least contain a device ID, a unique identifier for a device that is used to match during the whitelisting process. A good candidate for a device ID is the MAC address of the device or some random GUID.
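The whitelisting check itself can be sketched in a few lines. The following is a minimal in-memory version, assuming device IDs are compared case-insensitively; the DeviceWhitelist class and its members are hypothetical, and a real registry would persist the IDs (for example, in a DocumentDB collection) instead of holding them in memory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory whitelist, for illustration only
public class DeviceWhitelist
{
    // Device IDs such as MAC addresses or GUIDs are compared case-insensitively
    private readonly HashSet<string> _allowedIds =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Adds a device ID to the whitelist; adding the same ID twice has no effect
    public void Allow(string deviceId)
    {
        if (string.IsNullOrWhiteSpace(deviceId))
        {
            throw new ArgumentException("A device ID is required.", "deviceId");
        }
        _allowedIds.Add(deviceId.Trim());
    }

    // Returns true only for device IDs that were whitelisted earlier
    public bool IsAllowed(string deviceId)
    {
        return !string.IsNullOrWhiteSpace(deviceId) && _allowedIds.Contains(deviceId.Trim());
    }
}
```

A registration endpoint could call IsAllowed(value.DeviceId) first and reject the request when it returns false.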

Registering a device

The registration Web API contains a POST method that does the actual registration. First, it creates access to an Event Hub (not explained here) and stores the credentials needed inside the DocumentDB document. The document is then created inside the designated collection (based on the location). To learn more about Event Hubs, please visit https://azure.microsoft.com/en-us/services/event-hubs/.

        [Route("api/registration")]
        [HttpPost]
        public async Task<IHttpActionResult> Post([FromBody]DeviceModel value)
        {
            //add the device to the designated documentDB collection (based on country)
            try
            {
                //build the publisher-specific Event Hub URI for this device
                var serviceUri = ServiceBusEnvironment.CreateServiceUri("sb", serviceBusNamespace,
                    String.Format("{0}/publishers/{1}", "telemetry", value.DeviceId))
                    .ToString()
                    .Trim('/');
                var sasToken = SharedAccessSignatureTokenProvider.GetSharedAccessSignature(EventHubKeyName,
                    EventHubKey, serviceUri, TimeSpan.FromDays(365 * 100)); // hundred years will do
                //this token can be used by the device to send telemetry
                //this token and the eventhub name will be saved with the metadata of the document to be saved to DocumentDB
                //metadata is optional in the request body, so make sure the dictionary exists
                if (value.MetaData == null)
                {
                    value.MetaData = new Dictionary<string, object>();
                }
                value.MetaData.Add("Namespace", serviceBusNamespace);
                value.MetaData.Add("EventHubName", "telemetry");
                value.MetaData.Add("EventHubToken", sasToken);
                var document = await docDbClient.CreateDocumentAsync(_database.SelfLink, value);
                return Created(document.ContentLocation, value);
            }
            catch (Exception ex)
            {
                return InternalServerError(ex);
            }
        }

After this registration call, the right credentials on the Event Hub have been created for this specific device. The device is now able to ingress data to the Event Hub and have consumers like Power BI consume the data and present it.
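To make the device side more concrete, the following sketch builds the HTTPS request a device could use to publish one event to its own publisher endpoint, using the namespace, Event Hub name, and token stored in the document's metadata. It assumes the Event Hubs REST "Send Event" endpoint; the TelemetryRequestFactory class is hypothetical, and the request is only created here, not sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class TelemetryRequestFactory
{
    // Hypothetical helper: builds (but does not send) the POST request that
    // publishes a single JSON event on behalf of one registered device.
    public static HttpRequestMessage Create(
        string serviceBusNamespace, string eventHubName,
        string deviceId, string sasToken, string jsonPayload)
    {
        // Same publisher path that was used when the SAS token was generated
        var uri = string.Format(
            "https://{0}.servicebus.windows.net/{1}/publishers/{2}/messages",
            serviceBusNamespace, eventHubName, Uri.EscapeDataString(deviceId));

        var request = new HttpRequestMessage(HttpMethod.Post, uri);

        // The SAS token from the registration document is sent as-is
        request.Headers.TryAddWithoutValidation("Authorization", sasToken);

        // Content type expected by the Send Event operation
        request.Content = new StringContent(jsonPayload, Encoding.UTF8, "application/atom+xml");
        request.Content.Headers.ContentType.Parameters.Add(
            new NameValueHeaderValue("type", "entry"));
        return request;
    }
}
```

The request can then be sent with HttpClient.SendAsync; a device emulator would repeat this for every telemetry reading.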

Event Hubs is a highly scalable publish-subscribe event ingestor. It can collect millions of events per second so that you can process and analyze the massive amounts of data produced by your connected devices and applications. Once collected into Event Hubs, you can transform and store the data by using any real-time analytics provider or with batching/storage adapters.

At the time of writing, Microsoft announced the release of Azure IoT Suite and IoT Hubs. These solutions offer Internet of Things capabilities as a service and are well suited to building our scenario as well.

Enhancing search capabilities

We have seen how to query our documents and retrieve the information we need. For this approach, we need to understand the DocumentDB SQL language. Microsoft has an online offering that enables full-text search, called the Azure Search service. This service enables us to perform full-text searches and also includes search behaviors similar to those of search engines. We could also benefit from so-called type-ahead query suggestions based on the input of a user. Imagine a search box on our IoT Inc. portal that offers free-text searching while the user types and searches on the fly for devices that include any of the search terms. Azure Search runs on Azure; therefore, it is scalable and can easily be upgraded to offer more search and storage capacity.

Azure Search stores all your data inside an index, offering full-text search capabilities on your data.

Setting up Azure Search

Setting up Azure Search is pretty straightforward and can be done by using the REST API it offers or on the Azure portal. We will set up the Azure Search service through the portal and later on, we will utilize the REST API to start configuring our search service.

In the Azure portal (http://portal.azure.com), find the Search service and fill out the required information. In the following screenshot, we can see how we have created the free tier for Azure Search:

You can see that we use the Free tier for this scenario and that there are no datasources configured yet. We will do that now by using the REST API.

We use the REST API because it offers more insight into how the whole concept works. We use Fiddler to create a new datasource inside our search environment. The following screenshot shows how to use Fiddler to create a datasource and add a DocumentDB collection:

In the Composer window of Fiddler, you can see that we need to POST a payload to the Search service we created earlier. The Api-Key header is mandatory, and the content type must be set to JSON. The body of the request needs to contain the connection information for our DocumentDB environment and the collection we want to add (in this case, Netherlands).
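For readers who prefer code over Fiddler, the same request can be sketched in C#. The following is only an illustration under a few assumptions: the SearchDataSourceRequest class is hypothetical, the API version is the one used elsewhere in this article, and the values are inserted into the JSON without escaping, so they must not contain quotes or backslashes. The request is built but not sent.

```csharp
using System.Net.Http;
using System.Text;

public static class SearchDataSourceRequest
{
    // API version used in this article; adjust it if your service requires another one
    private const string ApiVersion = "2015-02-28";

    // Hypothetical helper: builds the POST that registers a DocumentDB collection
    // as an Azure Search data source.
    public static HttpRequestMessage Create(
        string searchServiceName, string apiKey,
        string dataSourceName, string docDbConnectionString, string collectionName)
    {
        // Values are inserted as-is (no JSON escaping), which is fine for
        // simple names and DocumentDB connection strings
        var body = @"{
  ""name"": ""%NAME%"",
  ""type"": ""documentdb"",
  ""credentials"": { ""connectionString"": ""%CONNECTION%"" },
  ""container"": { ""name"": ""%COLLECTION%"" }
}"
            .Replace("%NAME%", dataSourceName)
            .Replace("%CONNECTION%", docDbConnectionString)
            .Replace("%COLLECTION%", collectionName);

        var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://" + searchServiceName + ".search.windows.net/datasources?api-version=" + ApiVersion);
        request.Headers.Add("api-key", apiKey);
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");
        return request;
    }
}
```

The connection string has the AccountEndpoint=...;AccountKey=...;Database=... form, and the data source name should be lowercase.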

Now that we have added the collection, it is time to create an Azure Search index. Again, we use Fiddler for this purpose. Since we use the free tier of Azure Search, we can only add five indexes at most. For this scenario, we add an index with fields for the ID (device ID), location, and metadata. At the time of writing, Azure Search does not support complex types. Note that the metadata node is therefore represented as a collection of strings.
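An index definition along these lines might look as follows. This is a sketch rather than the exact payload used for the screenshot: the index name devices and the SearchIndexRequest class are assumptions, while the field types (Edm.String for the key, Edm.GeographyPoint for the location, and Collection(Edm.String) for the metadata) follow the description above.

```csharp
using System.Net.Http;
using System.Text;

public static class SearchIndexRequest
{
    // Hypothetical index definition with the three fields described in the text
    public const string IndexDefinition = @"{
  ""name"": ""devices"",
  ""fields"": [
    { ""name"": ""id"", ""type"": ""Edm.String"", ""key"": true, ""searchable"": true },
    { ""name"": ""location"", ""type"": ""Edm.GeographyPoint"", ""filterable"": true },
    { ""name"": ""metadata"", ""type"": ""Collection(Edm.String)"", ""searchable"": true }
  ]
}";

    // Builds (but does not send) the POST that creates the index
    public static HttpRequestMessage Create(string searchServiceName, string apiKey)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://" + searchServiceName + ".search.windows.net/indexes?api-version=2015-02-28");
        request.Headers.Add("api-key", apiKey);
        request.Content = new StringContent(IndexDefinition, Encoding.UTF8, "application/json");
        return request;
    }
}
```

Exactly one field must be marked as the key, and it has to be of type Edm.String, which is why the device ID is a natural choice.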

We could check in the portal to see if the creation of the index was successful. Go to the Search blade and select the Search service we have just created. You can check the indexes part to see whether the index was actually created.

The next step is creating an indexer. An indexer connects the index with the provided data source.
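The indexer request is the smallest of the three. The sketch below builds it in C#, assuming a data source and an index already exist under the names you pass in; the SearchIndexerRequest class is hypothetical, and the names are inserted without JSON escaping.

```csharp
using System.Net.Http;
using System.Text;

public static class SearchIndexerRequest
{
    // Hypothetical helper: builds (but does not send) the POST that creates an
    // indexer linking an existing data source to an existing index.
    // The names are inserted as-is, so they must not contain quotes or backslashes.
    public static HttpRequestMessage Create(
        string searchServiceName, string apiKey,
        string indexerName, string dataSourceName, string targetIndexName)
    {
        var body = "{ \"name\": \"" + indexerName + "\", " +
                   "\"dataSourceName\": \"" + dataSourceName + "\", " +
                   "\"targetIndexName\": \"" + targetIndexName + "\" }";

        var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://" + searchServiceName + ".search.windows.net/indexers?api-version=2015-02-28");
        request.Headers.Add("api-key", apiKey);
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");
        return request;
    }
}
```

Without a schedule in the payload, the indexer is expected to run once right after it has been created.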

Creating this indexer takes some time. You can check in the portal whether the indexing process was successful. In our case, the portal shows that the documents are now part of the index.

If your indexer needs to process thousands of documents, it might take some time for the indexing process to finish. You can check the progress of the indexer using the REST API again.

https://iotinc.search.windows.net/indexers/deviceindexer/status?api-version=2015-02-28

This REST call returns the result of the indexing process, indicates whether it is still running, and shows any errors that occurred. Errors could be caused by documents that do not have the id property available.

The final step involves testing to check whether the indexing works. We will search for a device ID, as shown in the next screenshot:

In the Inspector tab, we can check the results. The call returns the correct document, including the location field. The metadata is missing because complex JSON is not supported (yet) at the time of writing.
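The search request itself is a plain GET against the documents of an index. As a small sketch, the following helper builds the URL for a simple full-text query; the DeviceSearchUri class and the index name passed in are assumptions, and the api-key header still has to be added when the request is sent.

```csharp
using System;

public static class DeviceSearchUri
{
    // Hypothetical helper: builds the GET URL for a simple full-text query
    // against one index. The search text is URL-encoded.
    public static Uri Create(string searchServiceName, string indexName, string searchText)
    {
        return new Uri(string.Format(
            "https://{0}.search.windows.net/indexes/{1}/docs?search={2}&api-version=2015-02-28",
            searchServiceName, indexName, Uri.EscapeDataString(searchText)));
    }
}
```

Searching for a device ID then comes down to DeviceSearchUri.Create("iotinc", "devices", deviceId), where devices stands for whatever name the index was given.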

Although indexing complex JSON types is not supported yet, it is possible to add a SQL query to the data source. We could explicitly add a SELECT statement that surfaces the properties of our complex JSON, such as the metadata or the Point property.

Try adding additional queries to your data source to enable querying complex JSON types.

Now that we have created an Azure Search service that indexes our DocumentDB collection(s), we can build a nice query-as-you-type field on our portal. Try this yourself.

Enhancing security

Microsoft Azure offers a capability to move your secrets away from your application and into Azure Key Vault. Azure Key Vault helps to protect cryptographic keys, secrets, and other information you want to store in a safe place outside your application boundaries (connection strings are also good candidates). Key Vault can help us to protect the DocumentDB URI and its key.

DocumentDB has no (in-place) encryption feature at the time of writing, although a lot of people have already asked for it to be on the roadmap.

Creating and configuring Key Vault

Before we can use Key Vault, we need to create and configure it first. The easiest way to achieve this is by using PowerShell cmdlets. Please visit https://msdn.microsoft.com/en-us/mt173057.aspx to read more about PowerShell.

The following PowerShell cmdlets demonstrate how to set up and configure a Key Vault:

Command: Get-AzureSubscription
Description: This command will prompt you to log in using your Microsoft Account. It returns a list of all Azure subscriptions that are available to you.

Command: Select-AzureSubscription -SubscriptionName "Windows Azure MSDN Premium"
Description: This tells PowerShell to use this subscription for the next steps.

Command: Switch-AzureMode AzureResourceManager
Command: New-AzureResourceGroup -Name 'IoTIncResourceGroup' -Location 'West Europe'
Description: This switches PowerShell to Azure Resource Manager mode and creates a new Azure Resource Group with a name and a location.

Command: New-AzureKeyVault -VaultName 'IoTIncKeyVault' -ResourceGroupName 'IoTIncResourceGroup' -Location 'West Europe'
Description: This creates a new Key Vault inside the resource group and provides a name and location.

Command: $secretvalue = ConvertTo-SecureString '<DOCUMENTDB KEY>' -AsPlainText -Force
Description: This creates a secure string for my DocumentDB key.

Command: $secret = Set-AzureKeyVaultSecret -VaultName 'IoTIncKeyVault' -Name 'DocumentDBKey' -SecretValue $secretvalue
Description: This creates a secret named DocumentDBKey in the vault and assigns it the secure string we have just created.

Command: Set-AzureKeyVaultAccessPolicy -VaultName 'IoTIncKeyVault' -ServicePrincipalName <SPN> -PermissionsToKeys decrypt,sign
Description: This grants the application with the Service Principal Name <SPN> the rights to decrypt and sign.

Command: Set-AzureKeyVaultAccessPolicy -VaultName 'IoTIncKeyVault' -ServicePrincipalName <SPN> -PermissionsToSecrets Get
Description: This configures the application with the SPN to also be able to get a secret.

Key Vault must be used together with Azure Active Directory to work. The SPN we need in the PowerShell steps is actually the client ID of an application I have set up in my Azure Active Directory. Please visit https://azure.microsoft.com/nl-nl/documentation/articles/active-directory-integrating-applications/ to see how you can create an application.

Make sure to copy the client ID (which is retrievable afterwards) and the key (which is not retrievable afterwards). We use these two pieces of information to take the next step.

Using Key Vault from ASP.NET

In order to use the Key Vault we have created in the previous section, we need to install some NuGet packages into our solution and/or projects:

Install-Package Microsoft.IdentityModel.Clients.ActiveDirectory -Version 2.16.204221202
 
Install-Package Microsoft.Azure.KeyVault

These two packages enable us to use AD and Key Vault from our ASP.NET application. The next step is to add some configuration information to our web.config file:

    <!-- Replace both placeholder values with the values of your own Azure AD application -->
    <add key="ClientId" value="CLIENT ID OF THE APP CREATED IN AD" />
    <add key="ClientSecret" value="THE SECRET FROM AZURE AD PORTAL" />
 
    <!-- SecretUri is the URI for the secret in Azure Key Vault -->
    <add key="SecretUri" value="https://iotinckeyvault.vault.azure.net:443/secrets/DocumentDBKey" />

If you deploy the ASP.NET application to Azure, you could even configure these settings from the Azure portal itself, completely removing this from the web.config file. This technique adds an additional ring of security around your application.

The following code snippet shows how to use AD and Key Vault inside the registration functionality of our scenario:

            //no more keys in code or .config files, just an app ID, a secret, and the unique URL to our key (SecretUri).
            //When deploying to Azure, we could even skip this by setting the app ID and client secret in the Azure portal.
            var kv = new KeyVaultClient(new KeyVaultClient.AuthenticationCallback(Utils.GetToken));
            var sec = kv.GetSecretAsync(WebConfigurationManager.AppSettings["SecretUri"]).Result.Value;

The Utils.GetToken method is shown next. This method retrieves an access token from AD by supplying the ClientId and the secret. Since we configured Key Vault to allow this application to get secrets, the call to GetSecretAsync() will succeed. The code is as follows:

public async static Task<string> GetToken(string authority, string resource, string scope)
        {
            var authContext = new AuthenticationContext(authority);
            ClientCredential clientCred = new ClientCredential(WebConfigurationManager.AppSettings["ClientId"],
                        WebConfigurationManager.AppSettings["ClientSecret"]);
            AuthenticationResult result = await authContext.AcquireTokenAsync(resource, clientCred);
 
            if (result == null)
                throw new InvalidOperationException("Failed to obtain the JWT token");
            return result.AccessToken;
        }

Instead of storing the key to DocumentDB somewhere in code or in the web.config file, it is now moved away to Key Vault. We could do the same with the URI to our DocumentDB and with other sensitive information as well (for example, storage account keys or connection strings).

Encrypting sensitive data

The documents we created in the previous section contain sensitive data like namespaces, Event Hub names, and tokens.

We could also use Key Vault to encrypt those specific values to enhance our security. If someone gets hold of a document containing the device information, they are still unable to mimic this device, since the keys are encrypted.

Try to use Key Vault to encrypt the sensitive information that is stored in DocumentDB before it is saved in there.

Migrating data

This section discusses how to migrate data from an existing data source to our new DocumentDB environment. For this scenario, we assume that we already have a large datastore containing existing devices and their registration information (Event Hub connection information). We use the DocumentDB Data Migration Tool for this.

You can download this tool from the Microsoft Download Center (http://www.microsoft.com/en-us/download/details.aspx?id=46436) or from GitHub if you want to check the code.

The tool is intuitive and enables us to migrate from several datasources:

  • JSON files
  • MongoDB
  • SQL Server
  • CSV files
  • Azure Table storage
  • Amazon DynamoDB
  • HBase
  • DocumentDB collections

To demonstrate the use, we migrate our existing Netherlands collection to our United Kingdom collection.

Start the tool and enter the right connection string to our DocumentDB database. We do this for both our source and target information in the tool. The connection strings you need to provide should look like this:

AccountEndpoint=https://<YOURDOCDBURL>;AccountKey=<ACCOUNTKEY>;Database=<NAMEOFDATABASE>

You can click on the Verify button to make sure these are correct.

In the Source Information field, we provide the Netherlands as being the source to pull data from. In the Target Information field, we specify the United Kingdom as the target. In the following screenshot, you can see how these settings are provided in the migration tool for the source information:

The following screenshot shows the settings for the target information:

It is also possible to migrate data to a collection that is not created yet. The migration tool can do this if you enter a collection name that is not available inside your database. You also need to select the pricing tier. Optionally, setting the partition key could help to distribute your documents based on this key across all collections you add in this screen.

This information is sufficient to run our example. Go to the Summary tab and verify the information you entered. Press Import to start the migration process.

We can verify a successful import on the Import results pane.

This example is a simple migration scenario, but the tool is also capable of using complex queries to migrate only those documents that need to be moved.

Try migrating data from an Azure Table storage table to DocumentDB by using this tool.

Summary

In this article, we saw how to integrate DocumentDB with other Microsoft Azure features. We discussed how to set up the Azure Search service and how to create an index for our collection. We also covered how to use Azure Search to enable full-text search on our documents, which could enable users to query while typing. Next, we saw how to add additional security to our scenario by using Key Vault. We also discussed how to create and configure Key Vault by using PowerShell cmdlets, and we saw how to enable our ASP.NET scenario application to make use of the Key Vault .NET SDK. Then, we discussed how to retrieve the sensitive information from Key Vault instead of configuration files. Finally, we saw how to migrate an existing data source to our collection by using the DocumentDB Data Migration Tool.
