How-To Tutorials


Build intelligent interfaces with CoreML using a CNN [Tutorial]

Savia Lobo
03 Sep 2018
19 min read
Core ML gives devices the potential to better serve us, rather than us serving them. This adheres to a rule stated by developer Eric Raymond: a computer should never ask the user for any information that it can auto-detect, copy, or deduce. This article is an excerpt taken from Machine Learning with Core ML, written by Joshua Newnham. In today's post, we will implement an application that attempts to guess what the user is trying to draw and offers pre-drawn images (found via an image search) that the user can substitute for their sketch. We will explore two techniques: first, using a convolutional neural network (CNN), which we are becoming familiar with, to make the prediction, and then applying a context-based similarity sorting strategy to better align the suggestions with what the user is trying to sketch.

Reviewing the training data and model

We will be using a slightly smaller set, with 205 out of the 250 categories; the exact categories can be found in the CSV file /Chapter7/Training/sketch_classes.csv, along with the Jupyter Notebooks used to prepare the data and train the model. The original sketches are available in SVG and PNG formats. Because we're using a CNN, rasterized images (PNG) were used, but rescaled from 1111 x 1111 to 256 x 256; this is the expected input of our model. The data was then split into a training and a validation set, using 80% (64 samples from each category) for training and 20% (17 samples from each category) for validation. After 68 iterations (epochs), the model was able to achieve an accuracy of approximately 65% on the validation data. Not exceptional, but if we consider the top two or three predictions, then this accuracy increases to nearly 90%. The following diagram shows the plots comparing training and validation accuracy, and loss, during training.

With our model trained, our next step is to export it using the Core ML Tools made available by Apple (as discussed in previous chapters) and import it into our project.

Classifying sketches

Here we will walk through importing the Core ML model into our project and hooking it up, including using the model to perform inference on the user's sketch and searching for and suggesting substitute images that the user can swap their sketch with. Let's get started with importing the Core ML model into our project.

Locate the model in the project repository's folder /CoreMLModels/Chapter7/cnnsketchclassifier.mlmodel; with the model selected, drag it into your Xcode project, leaving the defaults for the import options. Once imported, select the model to inspect the details, which should look similar to the following screenshot.

As with all our models, we verify that the model is included in the target by checking that the appropriate Target Membership is ticked, and then we turn our attention to the inputs and outputs, which should be familiar by now. We can see that our model expects a single-channel (grayscale) 256 x 256 image and returns the dominant class via the classLabel property of the output, along with a dictionary of probabilities for all classes via the classLabelProbs property.

With our model now imported, let's discuss the details of how we will integrate it into our project. Recall that our SketchView emits the events UIControlEvents.editingDidStart, UIControlEvents.editingChanged, and UIControlEvents.editingDidEnd as the user draws.
If you inspect the SketchViewController, you will see that we have already registered to listen for the UIControlEvents.editingDidEnd event, as shown in the following code snippet:

override func viewDidLoad() {
    super.viewDidLoad()
    ...
    ...
    self.sketchView.addTarget(self,
        action: #selector(SketchViewController.onSketchViewEditingDidEnd),
        for: .editingDidEnd)

    queryFacade.delegate = self
}

Each time the user ends a stroke, we will start the process of trying to guess what the user is sketching and search for suitable substitutes. This functionality is triggered via the .editingDidEnd action method onSketchViewEditingDidEnd, but is delegated to the QueryFacade class, which is responsible for implementing it. This is where we will spend the majority of our time in this section and the next. It's also worth highlighting the statement queryFacade.delegate = self in the previous code snippet: QueryFacade performs most of its work off the main thread and notifies this delegate of the status and results once finished, which we will get to in a short while.

Let's start by implementing the functionality of the onSketchViewEditingDidEnd method, before turning our attention to the QueryFacade class. Within the SketchViewController class, navigate to the onSketchViewEditingDidEnd method and append the following code:

guard self.sketchView.currentSketch != nil,
    let sketch = self.sketchView.currentSketch as? StrokeSketch else {
    return
}
queryFacade.asyncQuery(sketch: sketch)

Here, we get the current sketch, returning early if no sketch is available or if it's not a StrokeSketch; otherwise, we hand it over to our queryFacade (an instance of the QueryFacade class).

Let's now turn our attention to the QueryFacade class; select the QueryFacade.swift file from the left-hand panel within Xcode to bring it up in the editor area. A lot of plumbing has already been implemented to allow us to focus on the core functionality of predicting, searching, and sorting. Let's quickly discuss some of the details, starting with the properties:

let context = CIContext()
let queryQueue = DispatchQueue(label: "query_queue")

var targetSize = CGSize(width: 256, height: 256)

weak var delegate : QueryDelegate?

var currentSketch : Sketch?{
    didSet{
        self.newQueryWaiting = true
        self.queryCanceled = false
    }
}

fileprivate var queryCanceled : Bool = false
fileprivate var newQueryWaiting : Bool = false
fileprivate var processingQuery : Bool = false

var isProcessingQuery : Bool{
    get{
        return self.processingQuery
    }
}

var isInterrupted : Bool{
    get{
        return self.queryCanceled || self.newQueryWaiting
    }
}

QueryFacade is only concerned with the most current sketch. Therefore, each time a new sketch is assigned via the currentSketch property, newQueryWaiting is set to true (and queryCanceled is reset to false). During each task (such as performing prediction, searching, and downloading), we check the isInterrupted property, and if it is true, we exit early and proceed to process the latest sketch.
When you pass the sketch to the asyncQuery method, it is assigned to the currentSketch property, and asyncQuery then calls queryCurrentSketch to do the bulk of the work, unless a query is currently being processed:

func asyncQuery(sketch:Sketch){
    self.currentSketch = sketch

    if !self.processingQuery{
        self.queryCurrentSketch()
    }
}

fileprivate func processNextQuery(){
    self.queryCanceled = false

    if self.newQueryWaiting && !self.processingQuery{
        self.queryCurrentSketch()
    }
}

fileprivate func queryCurrentSketch(){
    guard let sketch = self.currentSketch else{
        self.processingQuery = false
        self.newQueryWaiting = false
        return
    }

    self.processingQuery = true
    self.newQueryWaiting = false

    queryQueue.async {

        DispatchQueue.main.async{
            self.processingQuery = false
            self.delegate?.onQueryCompleted(
                status:self.isInterrupted ? -1 : -1,
                result:nil)
            self.processNextQuery()
        }
    }
}

Let's work bottom-up, implementing all the supporting methods before we tie everything together within the queryCurrentSketch method. Start by declaring an instance of our model; add the following variable near the top of the QueryFacade class:

let sketchClassifier = cnnsketchclassifier()

Now, with our model instantiated and ready, we will navigate to the classifySketch method of the QueryFacade class; it is here that we will make use of our imported model to perform inference, but let's first review what already exists:

func classifySketch(sketch:Sketch) -> [(key:String,value:Double)]?{
    if let img = sketch.exportSketch(size: nil)?
        .resize(size: self.targetSize).rescalePixels(){
        return self.classifySketch(image: img)
    }
    return nil
}

func classifySketch(image:CIImage) -> [(key:String,value:Double)]?{
    return nil
}

Here, we see that classifySketch is overloaded, with one method accepting a Sketch and the other a CIImage. The former, when called, obtains the rasterized version of the sketch using the exportSketch method. If successful, it resizes the rasterized image using the targetSize property and then rescales the pixels before passing the prepared CIImage along to the alternative classifySketch method.

Pixel values are in the range of 0-255 (per channel; in this case, it's just a single channel). Typically, you try to avoid having large numbers in your network, because they make it more difficult for your model to learn (converge); this is somewhat analogous to trying to drive a car whose steering wheel can only be turned hard left or hard right. These extremes would cause a lot of over-steering and make navigating anywhere extremely difficult.

The second classifySketch method is responsible for performing the actual inference. Add the following code within the classifySketch(image:CIImage) method:

if let pixelBuffer = image.toPixelBuffer(context: self.context, gray: true){
    let prediction = try? self.sketchClassifier.prediction(image: pixelBuffer)

    if let classPredictions = prediction?.classLabelProbs{
        let sortedClassPredictions = classPredictions.sorted(by: { (kvp1, kvp2) -> Bool in
            kvp1.value > kvp2.value
        })

        return sortedClassPredictions
    }
}

return nil

Here, we use the image's toPixelBuffer method, an extension we added to the CIImage class, to obtain a grayscale CVPixelBuffer representation of the image. Then, with a reference to the buffer, we pass it to the prediction method of our model instance, sketchClassifier, to obtain the probabilities for each label. Finally, we sort these probabilities from most likely to least likely before returning the sorted results to the caller.
Now, with some inkling as to what the user is trying to sketch, we will proceed to search and download the ones we are most confident about. The task of searching and downloading will be the responsibility of the downloadImages method within the QueryFacade class. This method will make use of an existing BingService that exposes methods for searching and downloading images. Let's hook this up now; jump into the downloadImages method and append the following highlighted code to its body: func downloadImages(searchTerms:[String], searchTermsCount:Int=4, searchResultsCount:Int=2) -> [CIImage]?{ var bingResults = [BingServiceResult]() for i in 0..<min(searchTermsCount, searchTerms.count){ let results = BingService.sharedInstance.syncSearch( searchTerm: searchTerms[i], count:searchResultsCount) for bingResult in results{ bingResults.append(bingResult) } if self.isInterrupted{ return nil } } } The downloadImages method takes the arguments searchTerms, searchTermsCount, and searchResultsCount. The searchTerms is a sorted list of labels returned by our classifySketch method, from which the searchTermsCount determines how many of these search terms we use (defaulting to 4). Finally, searchResultsCount limits the results returned for each search term. The preceding code performs a sequential search using the search terms passed into the method. And as mentioned previously, here we are using Microsoft's Bing Image Search API, which requires registration, something we will return to shortly. After each search, we check the property isInterrupted to see whether we need to exit early; otherwise, we continue on to the next search. The result returned by the search includes a URL referencing an image; we will use this next to download the image with each of the results, before returning an array of CIImage to the caller. Let's add this now. Append the following code to the downloadImages method: var images = [CIImage]() for bingResult in bingResults{ if let image = BingService.sharedInstance.syncDownloadImage( bingResult: bingResult){ images.append(image) } if self.isInterrupted{ return nil } } return images As before, the process is synchronous and after each download, we check the isInterrupted property to see if we need to exit early, otherwise returning the list of downloaded images to the caller. So far, we have implemented the functionality to support prediction, searching, and downloading; our next task is to hook all of this up. Head back to the queryCurrentSketch method and add the following code within the queryQueue.async block. Ensure that you replace the DispatchQueue.main.async block: queryQueue.async { guard let predictions = self.classifySketch( sketch: sketch) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } let searchTerms = predictions.map({ (key, value) -> String in return key }) guard let images = self.downloadImages( searchTerms: searchTerms, searchTermsCount: 4) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } guard let sortedImage = self.sortByVisualSimilarity( images: images, sketch: sketch) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:self.isInterrupted ? 
-1 : 1, result:QueryResult( predictions: predictions, images: sortedImage)) self.processNextQuery() } } It's a large block of code but nothing complicated; let's quickly walk our way through it. We start by calling the classifySketch method we just implemented. As you may recall, this method returns a sorted list of label and probability peers unless interrupted, in which case nil will be returned. We should handle this by notifying the delegate before exiting the method early (a check we apply to all of our tasks). Once we've obtained the list of sorted labels, we pass them to the downloadImages method to receive the associated images, which we then pass to the sortByVisualSimilarity method. This method currently returns just the list of images, but it's something we will get back to in the next section. Finally, the method passes the status and sorted images wrapped in a QueryResult instance to the delegate via the main thread, before checking whether it needs to process a new sketch (by calling the processNextQuery method). At this stage, we have implemented all the functionality required to download our substitute images based on our guess as to what the user is currently sketching. Now, we just need to jump into the SketchViewController class to hook this up, but before doing so, we need to obtain a subscription key to use Bing's Image Search. Within your browser, head to https://azure.microsoft.com/en-gb/services/cognitive-services/bing-image-search-api/ and click on the Try Bing Image Search API, as shown in the following screenshot: After clicking on Try Bing Image Search API, you will be presented with a series of dialogs; read, and once (if) agreed, sign in or register. Continue following the screens until you reach a page informing you that the Bing Search API has been successfully added to your subscription, as shown in the following screenshot: On this page, scroll down until you come across the entry Bing Search APIs v7. If you inspect this block, you should see a list of Endpoints and Keys. Copy and paste one of these keys within the BingService.swift file, replacing the value of the constant subscriptionKey; the following screenshot shows the web page containing the service key: Return to the SketchViewController by selecting the SketchViewController.swift file from the left-hand panel, and locate the method onQueryCompleted: func onQueryCompleted(status: Int, result:QueryResult?){ } Recall that this is a method signature defined in the QueryDelegate protocol, which the QueryFacade uses to notify the delegate if the query fails or completes. It is here that we will present the matching images we have found through the process we just implemented. We do this by first checking the status. If deemed successful (greater than zero), we remove every item that is referenced in the queryImages array, which is the data source for our UICollectionView used to present the suggested images to the user. Once emptied, we iterate through all the images referenced within the QueryResult instance, adding them to the queryImages array before requesting the UICollectionView to reload the data. 
Add the following code to the body of the onQueryCompleted method: guard status > 0 else{ return } queryImages.removeAll() if let result = result{ for cimage in result.images{ if let cgImage = self.ciContext.createCGImage(cimage, from:cimage.extent){ queryImages.append(UIImage(cgImage:cgImage)) } } } toolBarLabel.isHidden = queryImages.count == 0 collectionView.reloadData() There we have it; everything is in place to handle guessing of what the user draws and present possible suggestions. Now is a good time to build and run the application on either the simulator or the device to check whether everything is working correctly. If so, then you should see something similar to the following: There is one more thing left to do before finishing off this section. Remembering that our goal is to assist the user to quickly sketch out a scene or something similar, our hypothesis is that guessing what the user is drawing and suggesting ready-drawn images will help them achieve their task. So far, we have performed prediction and provided suggestions to the user, but currently the user is unable to replace their sketch with any of the presented suggestions. Let's address this now. Our SketchView currently only renders StrokeSketch (which encapsulates the metadata of the user's drawing). Because our suggestions are rasterized images, our choice is to either extend this class (to render strokes and rasterized images) or create a new concrete implementation of the Sketch protocol. In this example, we will opt for the latter and implement a new type of Sketch capable of rendering a rasterized image. Select the Sketch.swift file to bring it to focus in the editor area of Xcode, scroll to the bottom, and add the following code: class ImageSketch : Sketch{ var image : UIImage! var size : CGSize! var origin : CGPoint! var label : String! init(image:UIImage, origin:CGPoint, size:CGSize, label: String) { self.image = image self.size = size self.label = label self.origin = origin } } We have defined a simple class that is referencing an image, origin, size, and label. The origin determines the top-left position where the image should be rendered, while the size determines its, well, size! To satisfy the Sketch protocol, we must implement the properties center and boundingBox along with the methods draw and exportSketch. Let's implement each of these in turn, starting with boundingBox. The boundingBox property is a computed property derived from the properties origin and size. Add the following code to your ImageSketch class: var boundingBox : CGRect{ get{ return CGRect(origin: self.origin, size: self.size) } } Similarly, center will be another computed property derived from the origin and size properties, simply translating the origin with respect to the size. Add the following code to your ImageSketch class: var center : CGPoint{ get{ let bbox = self.boundingBox return CGPoint(x:bbox.origin.x + bbox.size.width/2, y:bbox.origin.y + bbox.size.height/2) } set{ self.origin = CGPoint(x:newValue.x - self.size.width/2, y:newValue.y - self.size.height/2) } } The draw method will simply use the passed-in context to render the assigned image within the boundingBox; append the following code to your ImageSketch class: func draw(context:CGContext){ self.image.draw(in: self.boundingBox) }   Our last method, exportSketch, is also fairly straightforward. Here, we create an instance of CIImage, passing in the image (of type UIImage). 
Then, we resize it using the extension method we implemented back in Chapter 3, Recognizing Objects in the World. Add the following code to finish off the ImageSketch class: func exportSketch(size:CGSize?) -> CIImage?{ guard let ciImage = CIImage(image: self.image) else{ return nil } if self.image.size.width == self.size.width && self.image.size.height == self.size.height{ return ciImage } else{ return ciImage.resize(size: self.size) } } We now have an implementation of Sketch that can handle rendering of rasterized images (like those returned from our search). Our final task is to swap the user's sketch with an item the user selects from the UICollectionView. Return to SketchViewController class by selecting the SketchViewController.swift from the left-hand-side panel in Xcode to bring it up in the editor area. Once loaded, navigate to the method collectionView(_ collectionView:, didSelectItemAt:); this should look familiar to most of you. It is the delegate method for handling cells selected from a UICollectionView and it's where we will handle swapping of the user's current sketch with the selected item. Let's start by obtaining the current sketch and associated image that was selected. Add the following code to the body of the collectionView(_collectionView:,didSelectItemAt:) method: guard let sketch = self.sketchView.currentSketch else{ return } self.queryFacade.cancel() let image = self.queryImages[indexPath.row]   Now, with reference to the current sketch and image, we want to try and keep the size relatively the same as the user's sketch. We will do this by simply obtaining the sketch's bounding box and scaling the dimensions to respect the aspect ratio of the selected image. Add the following code, which handles this: var origin = CGPoint(x:0, y:0) var size = CGSize(width:0, height:0) if bbox.size.width > bbox.size.height{ let ratio = image.size.height / image.size.width size.width = bbox.size.width size.height = bbox.size.width * ratio } else{ let ratio = image.size.width / image.size.height size.width = bbox.size.height * ratio size.height = bbox.size.height } Next, we obtain the origin (top left of the image) by obtaining the center of the sketch and offsetting it relative to its width and height. Do this by appending the following code: origin.x = sketch.center.x - size.width / 2 origin.y = sketch.center.y - size.height / 2 We can now use the image, size, and origin to create an ImageSketch, and replace it with the current sketch simply by assigning it to the currentSketch property of the SketchView instance. Add the following code to do just that: self.sketchView.currentSketch = ImageSketch(image:image, origin:origin, size:size, label:"") Finally, some housekeeping; we'll clear the UICollectionView by removing all images from the queryImages array (its data source) and request it to reload itself. Add the following block to complete the collectionView(_ collectionView:,didSelectItemAt:) method: self.queryImages.removeAll() self.toolBarLabel.isHidden = queryImages.count == 0 self.collectionView.reloadData() Now is a good time to build and run to ensure that everything is working as planned. If so then, you should be able to swap out your sketch with one of the suggestions presented at the top, as shown in the following screenshot: We learned how to build Intelligent interfaces using Core ML. 
If you've enjoyed reading this post, do check out Machine Learning with Core ML to further implement Core ML for visual-based applications using the principles of transfer learning and neural networks.

Introducing Intelligent Apps
5 examples of Artificial Intelligence in Web apps
Voice, natural language, and conversations: Are they the next web UI?


Getting started with Amazon Machine Learning workflow [Tutorial]

Melisha Dsouza
02 Sep 2018
14 min read
Amazon Machine Learning is useful for building ML models and generating predictions. It also enables the development of robust and scalable smart applications. The process of building ML models with Amazon Machine Learning consists of three operations: data analysis, model training, and evaluation. The code files for this article are available on GitHub. This tutorial is an excerpt from a book written by Alexis Perrier titled Effective Amazon Machine Learning. The Amazon Machine Learning service is available at https://console.aws.amazon.com/machinelearning/.

The Amazon ML workflow closely follows a standard data science workflow, with these steps:

1. Extract the data and clean it up. Make it available to the algorithm.
2. Split the data into a training and validation set, typically a 70/30 split with equal distribution of the predictors in each part.
3. Select the best model by training several models on the training dataset and comparing their performances on the validation dataset.
4. Use the best model for predictions on new data.

As shown in the following Amazon ML menu, the service is built around four objects: Datasource, ML model, Evaluation, and Prediction. The Datasource and ML model can also be configured and set up in the same flow by creating a new Datasource and ML model. Let us take a closer look at each one of these steps.

Understanding the dataset used

We will use the simple Predicting Weight by Height and Age dataset (from Lewis and Taylor, 1967) with 237 samples of children's age, weight, height, and gender, which is available at https://v8doc.sas.com/sashtml/stat/chap55/sect51.htm. This dataset is composed of 237 rows. Each row has the following predictors: sex (F, M), age (in months), and height (in inches); we are trying to predict the weight (in lbs) of these children. There are no missing values and no outliers. The variables are close enough in range, and normalization is not required. We do not need to carry out any preprocessing or cleaning on the original dataset. Age, height, and weight are numerical variables (real-valued), and sex is a categorical variable.

We will randomly select 20% of the rows as the held-out subset to use for prediction on previously unseen data and keep the other 80% as training and evaluation data. This data split can be done in Excel or any other spreadsheet editor:

1. Create a new column with randomly generated numbers.
2. Sort the spreadsheet by that column.
3. Select 190 rows for training and 47 rows for prediction (roughly an 80/20 split).

Let us name the training set LT67_training.csv and the held-out set that we will use for prediction LT67_heldout.csv, where LT67 stands for Lewis and Taylor, the creators of this dataset in 1967. As with all datasets, scripts, and resources mentioned in this book, the training and holdout files are available in the GitHub repository at https://github.com/alexperrier/packt-aml. It is important for the distribution in age, sex, height, and weight to be similar in both subsets. We want the data on which we will make predictions to show patterns that are similar to the data on which we will train and optimize our model.

Loading the data on S3

Follow these steps to load the training and held-out datasets on S3:

1. Go to your S3 console at https://console.aws.amazon.com/s3.
2. Create a bucket if you haven't done so already. Buckets are basically folders that are uniquely named across all of S3. We created a bucket named aml.packt.
Since that name has now been taken, you will have to choose another bucket name if you are following along with this demonstration.

3. Click on the bucket name you created and upload both the LT67_training.csv and LT67_heldout.csv files by selecting Upload from the Actions drop-down menu.

Both files are small, only a few KB, and hosting costs should remain negligible for this exercise. Note that for each file, by selecting the Properties tab on the right, you can specify how your files are accessed: which user, role, group, or AWS service may download, read, write, and delete the files, and whether or not they should be accessible from the open web. When creating the datasource in Amazon ML, you will be prompted to grant Amazon ML access to your input data. You can specify the access rules to these files now in S3 or simply grant access later on.

Our data is now in the cloud, in an S3 bucket. We need to tell Amazon ML where to find that input data by creating a datasource. We will first create the datasource for the training file LT67_training.csv.

Declaring a datasource

Go to the Amazon ML dashboard, and click on Create new... | Datasource and ML model. We will use the faster flow available by default. As shown in the following screenshot, you are asked to specify the path to the LT67_training.csv file ({S3://bucket}{path}{file}). Note that the S3 location field automatically populates with the bucket names and file names that are available to your user. Specifying a Datasource name is useful for organizing your Amazon ML assets.

By clicking on Verify, Amazon ML will make sure that it has the proper rights to access the file. If it needs to be granted access to the file, you will be prompted to do so, as shown in the following screenshot; just click on Yes to grant access. At this point, Amazon ML will validate the datasource and analyze its contents.

Creating the datasource

An Amazon ML datasource is composed of the following:

The location of the data file: the data file is not duplicated or cloned in Amazon ML but accessed from S3.
The schema, which contains information on the type of the variables contained in the CSV file: Categorical, Text, Numeric (real-valued), or Binary.

It is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML. At this point, Amazon ML has a pretty good idea of the type of data in your training dataset. It has identified the different types of variables and knows how many rows it has.

Move on to the next step by clicking on Continue, and see what schema Amazon ML has inferred from the dataset, as shown in the next screenshot. Amazon ML needs to know at this point which variable you are trying to predict. Be sure to tell Amazon ML the following:

The first line in the CSV file contains the column names.
The target is the weight.

We see here that Amazon ML has correctly inferred that sex is categorical and that age, height, and weight are numeric (continuous real values). Since we chose a numeric variable as the target, Amazon ML will use Linear Regression as the predictive model. For binary or categorical targets, it would have used Logistic Regression. This means that Amazon ML will try to find the best a, b, and c coefficients so that the weight predicted by the following equation is as close as possible to the observed real weight present in the data:

predicted weight = a * age + b * height + c * sex

Amazon ML will then ask you if your data contains a row identifier. In our present case, it does not.
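To make the equation above concrete, here is a minimal JavaScript sketch of the kind of model Amazon ML fits for this dataset. It is purely illustrative: the coefficients are made-up placeholders, not values learned by Amazon ML, and sex is encoded as 0/1, as described in the recipe step later on.

```javascript
// Illustrative only: a linear model of the form used by Amazon ML for this dataset.
// The coefficients a, b, c below are hypothetical placeholders, not learned values.
function predictWeightLbs(ageMonths, heightInches, sexBinary /* m = 0, f = 1 */) {
  const a = 0.2;  // lbs per month of age (made up)
  const b = 1.2;  // lbs per inch of height (made up)
  const c = -3.0; // adjustment for sex (made up)
  return a * ageMonths + b * heightInches + c * sexBinary;
}

// A 150-month-old girl who is 60 inches tall, with these placeholder coefficients:
console.log(predictWeightLbs(150, 60, 1)); // 30 + 72 - 3 = 99 (illustrative only)
```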
Row identifiers are useful when you want to understand the prediction obtained for each row or add an extra column to your dataset later on in your project. Row identifiers are for reference purposes only and are not used by the service to build the model. You will be asked to review the datasource. You can go back to each of the previous steps and edit the parameters for the schema, the target, and the input data. Now that the data is known to Amazon ML, the next step is to set up the parameters of the algorithm that will train the model.

Understanding the model

We select the default parameters for the training and evaluation settings. Amazon ML will do the following:

Create a recipe for data transformation based on the statistical properties it has inferred from the dataset.
Split the dataset (LT67_training.csv) into a training part and a validation part, with a 70/30 split. The split strategy assumes the data has already been shuffled and can be split sequentially.

The recipe will be used to transform the data in a similar way for the training and the validation datasets. The only transformation suggested by Amazon ML is to transform the categorical variable sex into a binary variable, where m = 0 and f = 1, for instance. No other transformation is needed.

The default advanced settings for the model are shown in the following screenshot. We see that Amazon ML will pass over the data 10 times, shuffling the data each time. It will use an L2 regularization strategy, based on the sum of the squares of the coefficients of the regression, to prevent overfitting. We will evaluate the predictive power of the model using our LT67_heldout.csv dataset later on.

Regularization comes in three levels, with a mild (10^-6), medium (10^-4), or aggressive (10^-2) setting, each value stronger than the previous one. The default setting is mild, the lowest, with a regularization constant of 0.000001 (10^-6), implying that Amazon ML does not anticipate much overfitting on this dataset. This makes sense when the number of predictors, three in our case, is much smaller than the number of samples (190 for the training set).

Clicking on the Create ML model button will launch the model creation. This takes a few minutes to resolve, depending on the size and complexity of your dataset. You can check its status by refreshing the model page; in the meantime, the model status remains pending. At that point, Amazon ML will split our training dataset into two subsets: a training and a validation set. It will use the training portion of the data to train several settings of the algorithm and select the best one based on its performance on the training data. It will then apply the associated model to the validation set and return an evaluation score for that model.

By default, Amazon ML will sequentially take the first 70% of the samples for training and the remaining 30% for validation. It's worth noting that Amazon ML will not create two extra files and store them on S3, but will instead create two new datasources out of the initial datasource we previously defined. Each new datasource is obtained from the original one via a data rearrangement JSON recipe such as the following:

{
    "splitting": {
        "percentBegin": 0,
        "percentEnd": 70
    }
}

You can see these two new datasources in the Datasource dashboard.
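The three regularization levels described above simply scale an L2 penalty added to the training objective. The following sketch shows the idea in JavaScript; it is a toy illustration of the math, not Amazon ML's internal code.

```javascript
// Toy illustration of an L2-regularized squared-error objective (not Amazon ML's code).
const L2 = { mild: 1e-6, medium: 1e-4, aggressive: 1e-2 };

function regularizedLoss(predictions, targets, weights, lambda) {
  const n = predictions.length;
  const meanSquaredError =
    predictions.reduce((sum, p, i) => sum + (p - targets[i]) ** 2, 0) / n;
  // The penalty grows with the sum of the squared regression coefficients,
  // so larger lambda values push the model toward smaller weights.
  const l2Penalty = lambda * weights.reduce((sum, w) => sum + w * w, 0);
  return meanSquaredError + l2Penalty;
}

// With the mild (default) setting, the penalty barely moves the loss for small weights:
console.log(regularizedLoss([60, 75], [58, 80], [0.2, 1.2, -3.0], L2.mild));
console.log(regularizedLoss([60, 75], [58, 80], [0.2, 1.2, -3.0], L2.aggressive));
```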
Three datasources are now available where there was initially only one, as shown in the following screenshot. While the model is being trained, Amazon ML runs the Stochastic Gradient Descent algorithm several times on the training data with different parameters:

Varying the learning rate in increments of powers of 10: 0.01, 0.1, 1, 10, and 100.
Making several passes over the training data, shuffling the samples before each pass.
At each pass, calculating the prediction error, the Root Mean Squared Error (RMSE), to estimate how much of an improvement was obtained over the last pass. If the decrease in RMSE is not significant, the algorithm is considered to have converged, and no further pass will be made.

At the end of the passes, the setting that ends up with the lowest RMSE wins, and the associated model (the weights of the regression) is selected as the best version. Once the model has finished training, Amazon ML evaluates its performance on the validation datasource. Once the evaluation is ready, you have access to the model's evaluation.

Evaluating the model

Amazon ML uses the standard RMSE metric for linear regression. RMSE is the square root of the mean of the squared differences between the predicted values and the real values:

RMSE = sqrt( (1/n) * sum( (ŷ_i - y_i)^2 ) )

Here, ŷ is the predicted value and y the real value we want to predict (the weight of the children in our case). The closer the predictions are to the real values, the lower the RMSE is. A lower RMSE means a better, more accurate prediction.

Making batch predictions

We now have a model that has been properly trained and selected among other models. We can use it to make predictions on new data. A batch prediction consists of applying a model to a datasource in order to make predictions on that datasource. We need to tell Amazon ML which model we want to apply to which data. Batch predictions are different from streaming predictions: with batch predictions, all the data is already available as a datasource, while with streaming predictions, the data is fed to the model as it becomes available, and the dataset is not available beforehand in its entirety.

In the main menu, select Batch Predictions to access the predictions dashboard and click on Create a New Prediction. The first step is to select one of the models available in your model dashboard; you should choose the one that has the lowest RMSE. The next step is to associate a datasource with the model you just selected. We uploaded the held-out dataset to S3 at the beginning of this chapter (under the Loading the data on S3 section) but did not use it to create a datasource. We will do so now. When asked for a datasource in the next screen, make sure to check My data is in S3, and I need to create a datasource, and then select the held-out dataset that should already be present in your S3 bucket. Don't forget to tell Amazon ML that the first line of the file contains the column names.

In our current project, the held-out dataset also contains the true values for the weight of the students. This would not be the case for "real" data in a real-world project, where the real values are truly unknown. In our case, however, this will allow us to calculate the RMSE score of our predictions and assess their quality. The final step is to click on the Verify button and wait for a few minutes.

Amazon ML will run the model on the new datasource and generate predictions in the form of a CSV file. Contrary to the evaluation and model-building phase, we now have real predictions.
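The training loop described at the start of this section (sweep the learning rate, pass over the shuffled data, stop when the RMSE stops improving, keep the setting with the lowest RMSE) can be sketched as follows. This is a schematic reconstruction, not Amazon ML's code; trainOnePass and the row format are assumptions made purely for illustration.

```javascript
// Schematic reconstruction of the parameter sweep described above (not Amazon ML's code).
// `trainOnePass(model, rows, learningRate)` is an assumed helper that performs one
// shuffled SGD pass and returns an updated model exposing a `predict(row)` method.
function rmse(model, rows) {
  const sumSq = rows.reduce((sum, row) => sum + (model.predict(row) - row.weight) ** 2, 0);
  return Math.sqrt(sumSq / rows.length);
}

function selectBestModel(trainOnePass, initialModel, rows, maxPasses = 10) {
  let best = { error: Infinity, model: null };
  for (const learningRate of [0.01, 0.1, 1, 10, 100]) {
    let model = initialModel;
    let previousError = Infinity;
    for (let pass = 0; pass < maxPasses; pass++) {
      model = trainOnePass(model, rows, learningRate);
      const error = rmse(model, rows);
      if (previousError - error < 1e-4) break; // converged: negligible improvement
      previousError = error;
    }
    const error = rmse(model, rows);
    if (error < best.error) best = { error, model }; // lowest RMSE wins
  }
  return best.model;
}
```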
We are also no longer given a score associated with these predictions. After a few minutes, you will notice a new batch-prediction folder in your S3 bucket. This folder contains a manifest file and a results folder. The manifest file is a JSON file with the path to the initial datasource and the path to the results file. The results folder contains a gzipped CSV file: Uncompressed, the CSV file contains two columns, trueLabel, the initial target from the held-out set, and score, which corresponds to the predicted values. We can easily calculate the RMSE for those results directly in the spreadsheet through the following steps: Creating a new column that holds the square of the difference of the two columns. Summing all the rows. Taking the square root of the result. The following illustration shows how we create a third column C, as the squared difference between the trueLabel column A and the score (or predicted value) column B: As shown in the following screenshot, averaging column C and taking the square root gives an RMSE of 11.96, which is even significantly better than the RMSE we obtained during the evaluation phase (RMSE 14.4): The fact that the RMSE on the held-out set is better than the RMSE on the validation set means that our model did not overfit the training data, since it performed even better on new data than expected. Our model is robust. The left side of the following graph shows the True (Triangle) and Predicted (Circle) Weight values for all the samples in the held-out set. The right side shows the histogram of the residuals. Similar to the histogram of residuals we had observed on the validation set, we observe that the residuals are not centered on 0. Our model has a tendency to overestimate the weight of the students: In this tutorial, we have successfully performed the loading of the data on S3 and let Amazon ML infer the schema and transform the data. We also created a model and evaluated its performance. Finally, we made a prediction on the held -out dataset. To understand how to leverage Amazon's powerful platform for your predictive analytics needs,  check out this book Effective Amazon Machine Learning. Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics Amazon Sagemaker makes machine learning on the cloud easy Amazon ML Solutions Lab to help customers “work backwards” and leverage machine learning    


Why use JavaScript for machine learning?

Sugandha Lahoti
01 Sep 2018
4 min read
Python has always been and remains the language of choice for machine learning, in part due to the maturity of the language, in part due to the maturity of the ecosystem, and in part due to the positive feedback loop of early ML efforts in Python. Recent developments in the JavaScript world, however, are making JavaScript more attractive to ML projects. I think we will see a major ML renaissance in JavaScript within a few years, especially as laptops and mobile devices become ever more powerful and JavaScript itself surges in popularity. This post is extracted from the book  Hands-on Machine Learning with JavaScript by Burak Kanber. The book is a  definitive guide to creating intelligent web applications with the best of machine learning and JavaScript. Advantages and challenges of JavaScript JavaScript, like any other tool, has its advantages and disadvantages. Much of the historical criticism of JavaScript has focused on a few common themes: strange behavior in type coercion, the prototypical object-oriented model, difficulty organizing large codebases, and managing deeply nested asynchronous function calls with what many developers call callback hell. Fortunately, most of these historic gripes have been resolved by the introduction of ES6, that is, ECMAScript 2015, a recent update to the JavaScript syntax. Con: Immature ecosystem for machine learning development Despite the recent language improvements, most developers would still advise against using JavaScript for ML for one reason: the ecosystem. The Python ecosystem for ML is so mature and rich that it's difficult to justify choosing any other ecosystem. But this logic is self-fulfilling and self-defeating; we need brave individuals to take the leap and work on real ML problems if we want JavaScript's ecosystem to mature. Fortunately, JavaScript has been the most popular programming language on GitHub for a few years running, and is growing in popularity by almost every metric. Pro #1: JavaScript is the most popular web development language with a mature npm ecosystem There are some advantages to using JavaScript for ML. Its popularity is one; while ML in JavaScript is not very popular at the moment, the language itself is. As demand for ML applications rises, and as hardware becomes faster and cheaper, it's only natural for ML to become more prevalent in the JavaScript world. There are tons of resources available for learning JavaScript in general, maintaining Node.js servers, and deploying JavaScript applications. The Node Package Manager (npm) ecosystem is also large and still growing, and while there aren't many very mature ML packages available, there are a number of well built, useful tools out there that will come to maturity soon. Pro #2: JavaScript is now a general purpose, cross-platform programming language Another advantage to using JavaScript is the universality of the language. The modern web browser is essentially a portable application platform which allows you to run your code, basically without modification, on nearly any device. Tools like electron (while considered by many to be bloated) allow developers to quickly develop and deploy downloadable desktop applications to any operating system. Node.js lets you run your code in a server environment. React Native brings your JavaScript code to the native mobile application environment, and may eventually allow you to develop desktop applications as well. 
JavaScript is no longer confined to just dynamic web interactions, it's now a general-purpose, cross-platform programming language. Pro #3: JavaScript makes Machine Learning accessible to web and front-end developers Finally, using JavaScript makes ML accessible to web and frontend developers, a group that historically has been left out of the ML discussion. Server-side applications are typically preferred for ML tools, since the servers are where the computing power is. That fact has historically made it difficult for web developers to get into the ML game, but as hardware improves, even complex ML models can be run on the client, whether it's the desktop or the mobile browser. If web developers, frontend developers, and JavaScript developers all start learning about ML today, that same community will be in a position to improve the ML tools available to us all tomorrow. If we take these technologies and democratize them, expose as many people as possible to the concepts behind ML, we will ultimately elevate the community and seed the next generation of ML researchers. Summary In this article, we've discussed the important moments of JavaScript's history as applied to ML. We’ve discussed some advantages to using JavaScript for machine learning, and also some of the challenges we’re facing, particularly in terms of the machine learning ecosystem. To begin exploring and processing the data itself, read our book  Hands-on Machine Learning with JavaScript. 5 JavaScript machine learning libraries you need to know V8 JavaScript Engine releases version 6.9! HTML5 and the rise of modern JavaScript browser APIs [Tutorial]


How to build a real-time data pipeline for web developers - Part 2 [Tutorial]

Sugandha Lahoti
30 Aug 2018
15 min read
Our previous post talked about two components to build a real-time data pipeline. To recap: Most data pipelines  contain these components: Data querying and event subscription Data joining or aggregation Transformation and normalization Storage and delivery In this post, we will talk about the last two components and introduce some tools and techniques that can achieve them. This post is extracted from the book Hands-on Machine Learning with JavaScript by Burak Kanber. The book is a  definitive guide to creating an intelligent web application with the best of machine learning and JavaScript. Transformation and normalization of a data pipeline As your data makes its way through a pipeline, it may need to be converted into a structure compatible with your algorithm's input layer. There are many possible transformations that can be performed on the data in the pipeline. For example, in order to protect sensitive user data before it reaches a token-based classifier, you might apply a cryptographic hashing function to the tokens so that they are no longer human readable. Types of Data transformations More typically, the types of transformations will be related to sanitization, normalization, or transposition. A sanitization operation might involve removing unnecessary whitespace or HTML tags, removing email addresses from a token stream, and removing unnecessary fields from the data structure. If your pipeline has subscribed to an event stream as the source of the data and the event stream attaches source server IP addresses to event data, it would be a good idea to remove these values from the data structure, both in order to save space and to minimize the surface area for potential data leaks. Similarly, if email addresses are not necessary for your classification algorithm, the pipeline should remove that data so that it interacts with the fewest possible servers and systems. If you've designed a spam filter, you may want to look into using only the domain portion of the email address instead of the fully qualified address. Alternately, the email addresses or domains may be hashed by the pipeline so that the classifier can still recognize them but a human cannot. Make sure to audit your data for other potential security and privacy issues as well. If your application collects the end user's IP address as part of its event stream, but the classifier does not need that data, remove it from the pipeline as early as possible. These considerations are becoming ever more important with the implementation of new European privacy laws, and every developer should be aware of privacy and compliance concerns. Data Normalization A common category of data transformation is normalization. When working with a range of numerical values for a given field or feature, it's often desirable to normalize the range such that it has a known minimum and maximum bound. One approach is to normalize all values of the same field to the range [0,1], using the maximum encountered value as the divisor (for example, the sequence 1, 2, 4 can be normalized to 0.25, 0.5, 1). Whether data needs to be normalized in this manner will depend entirely on the algorithm that consumes the data. Another approach to normalization is to convert values into percentiles. In this scheme, very large outlying values will not skew the algorithm too drastically. If most values lie between 0 and 100 but a few points include values such as 50,000, an algorithm may give outsized precedence to the large values. 
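A minimal sketch of the [0, 1] normalization just described, in plain JavaScript, also makes the outlier problem easy to see:

```javascript
// Normalize a numeric feature to [0, 1] using the maximum encountered value as the divisor.
function normalizeToUnitRange(values) {
  const max = Math.max(...values);
  return values.map(v => v / max);
}

console.log(normalizeToUnitRange([1, 2, 4]));
// [0.25, 0.5, 1]

console.log(normalizeToUnitRange([10, 50, 100, 50000]));
// [0.0002, 0.001, 0.002, 1] -- a single outlier squashes every other value toward zero
```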
If the data is normalized as a percentile, however, you are guaranteed to not have any values exceeding 100 and the outliers are brought into the same range as the rest of the data. Whether or not this is a good thing depends on the algorithm. Instagram Example The data pipeline is also a good place to calculate derived or second-order features. Imagine a random forest classifier that uses Instagram profile data to determine if the profile belongs to a human or a bot. The Instagram profile data will include fields such as the user's followers count, friends count, posts count, website, bio, and username. A random forest classifier will have difficulty using those fields in their original representations, however, by applying some simple data transformations, you can achieve accuracies of 90%. In the Instagram case, one type of helpful data transformation is calculating ratios. Followers count and friends count, as separate features or signals, may not be useful to the classifier since they are treated somewhat independently. But the friends-to-followers ratio can turn out to be a very strong signal that may expose bot users. An Instagram user with 1,000 friends doesn't raise any flags, nor would an Instagram user with 50 followers; treated independently, these features are not strong signals. However, an Instagram user with a friends-to-followers ratio of 20 (or 1,000/50) is almost certainly a bot designed to follow other users. Similarly, a ratio such as posts-versus-followers or posts-versus-friends may end up being a stronger signal than any of those features independently. Text content such as the Instagram user's profile bio, website, or username is made useful by deriving second-order features from them as well. A classifier may not be able to do anything with a website's URL, but perhaps a Boolean has_profile_website feature can be used as a signal instead. If, in your research, you notice that usernames of bots tend to have a lot of numbers in them, you can derive features from the username itself. One feature can calculate the ratio of letters to numbers in the username, another Boolean feature can represent whether the username has a number at the end or beginning, and a more advanced feature could determine if dictionary words were used in the username or not (therefore distinguishing between @themachinelearningwriter and something gibberish like @panatoe234). Derived features can be of any level of sophistication or simplicity. Another simple feature could be whether the Instagram profile contains a URL in the profile bio field (as opposed to the dedicated website field); this can be detected with a regex and the Boolean value used as the feature. A more advanced feature could automatically detect whether the language used in the user's content is the same as the language specified by the user's locale setting. If the user claims they're in France but always writes captions in Russian it may indeed be a Russian living in France, but when combined with other signals like a friends-to-followers ratio far from 1, this information may be indicative of a bot user. There are lower level transformations that may need to be applied to the data in the pipeline as well. If the source data is in an XML format but the classifier requires JSON formatting, the pipeline should take responsibility for the parsing and conversion of formats. Other mathematical transformations may also be applied. 
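The ratio and Boolean features discussed above are cheap to compute in a pipeline stage. The sketch below uses hypothetical profile field names (followersCount, friendsCount, and so on), not Instagram's actual API response format:

```javascript
// Sketch of second-order feature derivation for a profile record (field names are hypothetical).
function deriveProfileFeatures(profile) {
  const letters = (profile.username.match(/[a-z]/gi) || []).length;
  const digits = (profile.username.match(/[0-9]/g) || []).length;
  return {
    friendsToFollowers: profile.friendsCount / Math.max(profile.followersCount, 1),
    postsToFollowers: profile.postsCount / Math.max(profile.followersCount, 1),
    hasProfileWebsite: profile.website ? 1 : 0,
    hasUrlInBio: /https?:\/\//.test(profile.bio) ? 1 : 0,
    usernameDigitRatio: digits / Math.max(letters + digits, 1),
    usernameEndsWithNumber: /[0-9]$/.test(profile.username) ? 1 : 0,
  };
}

// 1,000 friends and 50 followers gives friendsToFollowers = 20, a far stronger bot
// signal than either raw count treated independently.
console.log(deriveProfileFeatures({
  username: 'panatoe234', friendsCount: 1000, followersCount: 50,
  postsCount: 3, website: '', bio: 'Click here http://example.com',
}));
```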
If the native format of the data is row-oriented but the classifier needs column-oriented data, the pipeline can perform a vector transposition operation as part of the processing. Similarly, the pipeline can use mathematical interpolation to fill in missing values. If your pipeline subscribes to events emitted by a suite of sensors in a laboratory setting and a single sensor goes offline for a couple of measurements, it may be reasonable to interpolate between the two known values in order to fill in the missing data. In other cases, missing values can be replaced with the population's mean or median value. Replacing missing values with a mean or median will often result in the classifier deprioritizing that feature for that data point, as opposed to breaking the classifier by giving it a null value. What to consider when transforming and normalizing In general, there are two things to consider in terms of transformation and normalization within a data pipeline. The first is the mechanical details of the source data and the target format: XML data must be transformed to JSON, rows must be converted to columns, images must be converted from JPEG to BMP formats, and so on. The mechanical details are not too tricky to work out, as you will already be aware of the source and target formats required by the system. The other consideration is the semantic or mathematical transformation of your data. This is an exercise in feature selection and feature engineering, and is not as straightforward as the mechanical transformation. Determining which second-order features to derive is both art and science. The art is coming up with new ideas for derived features, and the science is to rigorously test and experiment with your work. In my experience with Instagram bot detection, for instance, I found that the letters-to-numbers ratio in Instagram usernames was a very weak signal. I abandoned that idea after some experimentation in order to avoid adding unnecessary dimensionality to the problem. At this point, we have a hypothetical data pipeline that collects data, joins and aggregates it, processes it, and normalizes it. We're almost done, but the data still needs to be delivered to the algorithm itself. Once the algorithm is trained, we might also want to serialize the model and store it for later use. In the next section, we'll discuss a few considerations to make when transporting and storing training data or serialized models. Storing and delivering data in a data pipeline Once your data pipeline has applied all the necessary processing and transformations, it has one task left to do: deliver the data to your algorithm. Ideally, the algorithm will not need to know about the implementation details of the data pipeline. The algorithm should have a single location that it can interact with in order to get the fully processed data. This location could be a file on disk, a message queue, a service such as Amazon S3, a database, or an API endpoint. The approach you choose will depend on the resources available to you, the topology or architecture of your server system, and the format and size of the data. Models that are trained only periodically are typically the simplest case to handle. If you're developing an image recognition RNN that learns labels for a number of images and only needs to be retrained every few months, a good approach would be to store all the images as well as a manifest file (relating image names to labels) in a service such as Amazon S3 or a dedicated path on disk. 
The algorithm would first load and parse the manifest file and then load the images from the storage service as needed. Similarly, an Instagram bot detection algorithm may only need to be retrained every week or every month. The algorithm can read training data directly from a database table or a JSON or CSV file stored on S3 or a local disk. It is rare to have to do this, but in some exotic data pipeline implementations you could also provide the algorithm with a dedicated API endpoint built as a microservice; the algorithm would simply query the API endpoint first for a list of training point references, and then request each in turn from the API. Models which require online updates or near-real-time updates, on the other hand, are best served by a message queue. If a Bayesian classifier requires live updates, the algorithm can subscribe to a message queue and apply updates as they come in. Even when using a sophisticated multistage pipeline, it is possible to process new data and update a model in fractions of a second if you've designed all the components well. Storing and Delivering Data in a Spam Filter Returning to the spam filter example, we can design a highly performant data pipeline like so: first, an API endpoint receives feedback from a user. In order to keep the user interface responsive, this API endpoint is responsible only for placing the user's feedback into a message queue and can finish its task in under a millisecond. The data pipeline in turn subscribes to the message queue, and in another few milliseconds is made aware of a new message. The pipeline then applies a few simple transformations to the message, like tokenizing, stemming, and potentially even hashing the tokens. The next stage of the pipeline transforms the token stream into a hashmap of tokens and their counts (for example, from hey hey there to {hey: 2, there: 1}); this avoids the need for the classifier to update the same token's count more than once. This stage of processing will only require another couple of milliseconds at worst. Finally, the fully processed data is placed in a separate message queue which the classifier subscribes to. Once the classifier is made aware of the data it can immediately apply the updates to the model. If the classifier is backed by Redis, for instance, this final stage will also require only a few milliseconds. The entire process we have described, from the time the user's feedback reaches the API server to the time the model is updated, may only require 20 ms. Considering that communication over the internet (or any other means) is limited by the speed of light, the best-case scenario for a TCP packet making a round-trip between New York and San Francisco is 40 ms; in practice, the average cross-country latency for a good internet connection is about 80 ms. Our data pipeline and model is therefore capable of updating itself based on user feedback a full 20 ms before the user will even receive their HTTP response. Not every application requires real-time processing. Managing separate servers for an API, a data pipeline, message queues, a Redis store, and hosting the classifier might be overkill both in terms of effort and budget. You'll have to determine what's best for your use case. Storage and Delivery of the model The last thing to consider is not related to the data pipeline but rather the storage and delivery of the model itself, in the case of a hybrid approach where a model is trained on the server but evaluated on the client. 
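Before coming back to model storage and delivery, here is a rough sketch of the token-counting stage from the spam-filter pipeline described above, the step that turns hey hey there into {hey: 2, there: 1}. It is a toy illustration in Python, not the book's code: the stemmer is a crude stand-in for a real one, and hashing is optional:
from collections import Counter
import hashlib
import json

def crude_stem(token):
    # Stand-in for a real stemmer (for example, a Porter stemmer); it only strips
    # a few common suffixes, which is enough to illustrate the pipeline stage.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def tokens_to_counts(text, hash_tokens=False):
    tokens = [crude_stem(t) for t in text.lower().split()]
    if hash_tokens:
        tokens = [hashlib.md5(t.encode()).hexdigest()[:8] for t in tokens]
    return dict(Counter(tokens))

# The serialized result is what gets placed on the queue the classifier subscribes to.
print(json.dumps(tokens_to_counts("hey hey there")))   # {"hey": 2, "there": 1}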
The first question to ask yourself is whether the model is considered public or private. Private models should not be stored on a public Amazon S3 bucket, for instance; instead, the S3 bucket should have access control rules in place and your application will need to procure a signed download link with an expiration time (the S3 API assists with this). The next consideration is how large the model is and how often it will be downloaded by clients. If a public model is downloaded frequently but updated infrequently, it might be best to use a CDN in order to take advantage of edge caching. If your model is stored on Amazon S3, for example, then the Amazon CloudFront CDN would be a good choice. Of course, you can always build your own storage and delivery solution. In this post, I have assumed a cloud architecture, however if you have a single dedicated or collocated server then you may simply want to store the serialized model on disk and serve it either through your web server software or through your application's API. When dealing with large models, make sure to consider what will happen if many users attempt to download the model simultaneously. You may inadvertently saturate your server's network connection if too many people request the file at once, you might overrun any bandwidth limits set by your server's ISP, or you might end up with your server's CPU stuck in I/O wait while it moves data around. No one-size-fits-all solution for data pipelining As mentioned previously, there's no one-size-fits-all solution for data pipelining. If you're a hobbyist developing applications for fun or just a few users, you have lots of options for data storage and delivery. If you're working in a professional capacity on a large enterprise project, however, you will have to consider all aspects of the data pipeline and how they will impact your application's performance. I will offer one final piece of advice to the hobbyists reading this section. While it's true that you don't need a sophisticated, real-time data pipeline for hobby projects, you should build one anyway. Being able to design and build real-time data pipelines is a highly marketable and valuable skill that not many people possess, and if you're willing to put in the practice to learn ML algorithms then you should also practice building performant data pipelines. I'm not saying that you should build a big, fancy data pipeline for every single hobby project—just that you should do it a few times, using several different approaches, until you're comfortable not just with the concepts but also the implementation. Practice makes perfect, and practice means getting your hands dirty. In this two part series, we learned about data pipelines and various mechanisms that manage the collection, combination, transformation, and delivery of data from one system to the next. Next, to exactly choose the right ML algorithm for a given problem, read our book  Hands-on Machine Learning with JavaScript. Create machine learning pipelines using unsupervised AutoML [Tutorial] Top AutoML libraries for building your ML pipelines
How to build a real-time data pipeline for web developers - Part 1 [Tutorial]

Sugandha Lahoti
29 Aug 2018
12 min read
There are many differences between the idealized usage of ML algorithms and real-world usage. This post gives advice related to using ML in the real world, in real applications, and in production environments. Specifically, we will talk about how to build a real-time data pipeline. The article aims to answer the following questions How do you collect, store, and process gigabytes or terabytes of training data? How and where do you store and distribute serialized models to clients? How do you collect new training examples from millions of users? This post is extracted from the book Hands-on Machine Learning with JavaScript by Burak Kanber. The book is a  definitive guide to creating an intelligent web application with the best of machine learning and JavaScript. What are Data pipelines? When developing a production ML system, it's not likely that you will have the training data handed to you in a ready-to-process format. Production ML systems are typically part of larger application systems, and the data that you use will probably originate from several different sources. The training set for an ML algorithm may be a subset of your larger database, combined with images hosted on a Content Delivery Network (CDN) and event data from an Elasticsearch server. The process of ushering data through various stages of a life cycle is called data pipelining. Data pipelining may include data selectors that run SQL or Elasticsearch queries for objects, event subscriptions which allow data to flow in from event-or log-based data, aggregations, joins, combining data with data from third-party APIs, sanitization, normalization, and storage. In an ideal implementation, the data pipeline acts as an abstraction layer between the larger application environment and the ML process. The ML algorithm should be able to read the output of the data pipeline without any knowledge of the original source of the data, similar to our examples. As there are many possible data sources and infinite ways to architect an application, there is no one-size-fits-all data pipeline. However, most data pipelines will contain these components, which we will discuss in the following sections: Data querying and event subscription Data joining or aggregation Transformation and normalization Storage and delivery This article is a two-part post. In the first part, we will talk about Data Querying and event subscription and Data joining. Data querying Imagine an application such as Disqus, which is an embeddable comment form that website owners can use to add comment functionality to blog posts or other pages. The primary functionality of Disqus is to allow users to like or leave comments on posts, however, as an additional feature and revenue stream, Disqus can make content recommendations and display them alongside sponsored content. The content recommendation system is an example of an ML system that is only one feature of a larger application. A content recommendation system in an application such as Disqus does not necessarily need to interact with the comment data, but might use the user's likes history to generate recommendations similar to the current page. Such a system would also need to analyze the text content of the liked pages and compare that to the text content of all pages in the network in order to make recommendations. Disqus does not need the post's content in order to provide comment functionality, but does need to store metadata about the page (like its URL and title) in its database. 
The post content may therefore not reside in the application's main database, though the likes and page metadata would likely be stored there. A data pipeline built around Disqus's recommendation system needs first to query the main database for pages the user has liked—or pages that were liked by users who liked the current page—and return their metadata. In order to find similar content, however, the system will need to use the text content of each liked post. This data might be stored in a separate system, perhaps a secondary database such as MongoDB or Elasticsearch, or in Amazon S3 or some other data warehouse. The pipeline will need to retrieve the text content based on the metadata returned by the main database, and associate the content with the metadata. This is an example of multiple data selectors or data sources in the early stages of a data pipeline. One data source is the primary application data, which stores post and likes metadata. The other data source is a secondary server which stores the post's text content. The next step in this pipeline might involve finding a number of candidate posts similar to the ones the user has liked, perhaps through a request to Elasticsearch or some other service that can find similar content. Similar content is not necessarily the correct content to serve, however, so these candidate articles will ultimately be ranked by an (hypothetical) ANN in order to determine the best content to display. In this example, the input to the data pipeline is the current page and the output from the data pipeline is a list of, say, 200 similar pages that the ANN will then rank. If all the necessary data resides in the primary database, the entire pipeline can be achieved with an SQL statement and some JOINs. Even in this case, care should be taken to develop a degree of abstraction between the ML algorithm and the data pipeline, as you may decide to update the application's architecture in the future. In other cases, however, the data will reside in separate locations and a more considered pipeline should be developed. There are many ways to build this data pipeline. You could develop a JavaScript module that performs all the pipeline tasks, and in some cases, you could even write a bash script using standard Unix tools to accomplish the task. On the other end of the complexity spectrum, there are purpose-built tools for data pipelining such as Apache Kafka and AWS Pipeline. These systems are designed modularly and allow you to define a specific data source, query, transformation, and aggregation modules as well as the workflows that connect them. In AWS Pipeline, for instance, you define data nodes that understand how to interact with the various data sources in your application. The earliest stage of a pipeline is typically some sort of data query operation. Training examples must be extracted from a larger database, keeping in mind that not every record in a database is necessarily a training example. In the case of a spam filter, for instance, you should only select messages that have been marked as spam or not spam by a user. Messages that were automatically marked as spam by a spam filter should probably not be used for training, as that might cause a positive feedback loop that ultimately causes an unacceptable false positive rate. Similarly, you may want to prevent users that have been blocked or banned by your system from influencing your model training. 
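A data query embodying these selection rules might look roughly like the following sketch. The book's examples are in JavaScript, but SQL is SQL, so Python with sqlite3 is used here for brevity; the table and column names (messages, label_source, banned, and so on) are assumptions for illustration only:
import sqlite3

# Only user-labelled messages, and only from users in good standing.
TRAINING_QUERY = """
    SELECT m.body, m.label
    FROM messages m
    JOIN users u ON u.id = m.user_id
    WHERE m.label IN ('spam', 'not_spam')
      AND m.label_source = 'user'   -- skip labels the filter applied automatically
      AND u.banned = 0              -- keep blocked or banned users out of training
"""

def fetch_training_examples(db_path="app.db"):
    with sqlite3.connect(db_path) as conn:
        return conn.execute(TRAINING_QUERY).fetchall()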
A bad actor could intentionally mislead an ML model by taking inappropriate actions on their own data, so you should disqualify these data points as training examples. Alternatively, if your application is such that recent data points should take precedence over older training points, your data query operation might set a time-based limit on the data to use for training, or select a fixed limit ordered reverse chronologically. No matter the situation, make sure you carefully consider your data queries as they are an essential first step in your data pipeline. Not all data needs to come from database queries, however. Many applications use a pub/sub or event subscription architecture to capture streaming data. This data could be activity logs aggregated from a number of servers, or live transaction data from a number of sources. In these cases, an event subscriber will be an early part of your data pipeline. Note that event subscription and data querying are not mutually exclusive operations. Events that come in through a pub/sub system can still be filtered based on various criteria; this is still a form of data querying. One potential issue with an event subscription model arises when it's combined with a batch-training scheme. If you require 5,000 data points but receive only 100 per second, your pipeline will need to maintain a buffer of data points until the target size is reached. There are various message-queuing systems that can assist with this, such as RabbitMQ or Redis. A pipeline requiring this type of functionality might hold messages in a queue until the target of 5,000 messages is achieved, and only then release the messages for batch processing through the rest of the pipeline. In the case that data is collected from multiple sources, it most likely will need to be joined or aggregated in some manner. Let's now take a look at a situation where data needs to be joined to data from an external API. Data joining and aggregation Let's return to our example of the Disqus content recommendation system. Imagine that the data pipeline is able to query likes and post metadata directly from the primary database, but that no system in the applications stores the post's text content. Instead, a microservice was developed in the form of an API that accepts a post ID or URL and returns the page's sanitized text content. In this case, the data pipeline will need to interact with the microservice API in order to get the text content for each post. This approach is perfectly valid, though if the frequency of post content requests is high, some caching or storage should probably be implemented. The data pipeline will need to employ an approach similar to the buffering of messages in the event subscription model. The pipeline can use a message queue to queue posts that still require content, and make requests to the content microservice for each post in the queue until the queue is depleted. As each post's content is retrieved it is added to the post metadata and stored in a separate queue for completed requests. Only when the source queue is depleted and the sink queue is full should the pipeline move on to the next step. Data joining does not necessarily need to involve a microservice API. If the pipeline collects data from two separate sources that need to be combined, a similar approach can be employed. 
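The source-queue/sink-queue pattern just described can be sketched as follows (again in Python for brevity); the microservice call is faked with a local function, and the field names are assumptions for illustration:
from collections import deque

def fetch_post_content(post_id):
    # Stand-in for the content microservice; in reality this would be an HTTP
    # request to a (hypothetical) endpoint returning the sanitized page text.
    return "sanitized text for post %s" % post_id

def join_content(posts_metadata):
    source = deque(posts_metadata)   # posts that still need their text content
    sink = deque()                   # completed, joined records
    while source:                    # only move on once the source queue is depleted
        post = source.popleft()
        post["content"] = fetch_post_content(post["page_id"])
        sink.append(post)
    return list(sink)

print(join_content([{"page_id": 1, "title": "Post A"},
                    {"page_id": 2, "title": "Post B"}]))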
The pipeline is the only component that needs to understand the relationship between the two data sources and formats, leaving both the data sources and the ML algorithm to operate independently of those details. The queue approach also works well when a data aggregation is required. An example of this situation is a pipeline in which the input is streaming input data and the output is token counts or value aggregations. Using a message queue is desirable in these situations as most message queues ensure that a message can be consumed only once, therefore preventing any duplication by the aggregator. This is especially valuable when the event stream is very high frequency, such that tokenizing each event as it comes in would lead to backups or server overload. Because message queues ensure that each message is consumed only once, high-frequency event data can stream directly into a queue where messages are consumed by multiple workers in parallel. Each worker might be responsible for tokenizing the event data and then pushing the token stream to a different message queue. The message queue software ensures that no two workers process the same event, and each worker can operate as an independent unit that is only concerned with tokenization. As the tokenizers push their results onto a new message queue, another worker can consume those messages and aggregate token counts, delivering its own results to the next step in the pipeline every second or minute or 1,000 events, whatever is appropriate for the application. The output of this style of pipeline might be fed into a continually updating Bayesian model, for example. One benefit of a data pipeline designed in this manner is performance. If you were to attempt to subscribe to high-frequency event data, tokenize each message, aggregate token counts, and update a model all in one system, you might be forced to use a very powerful (and expensive) single server. The server would simultaneously need a high-performance CPU, lots of RAM, and a high-throughput network connection. By breaking up the pipeline into stages, however, you can optimize each stage of the pipeline for its specific task and load condition. The message queue that receives the source event stream needs only to receive the event stream but does not need to process it. The tokenizer workers do not necessarily need to be high-performance servers, as they can be run in parallel. The aggregating queue and worker will process a large volume of data but will not need to retain data for longer than a few seconds and therefore may not need much RAM. The final model, which is a compressed version of the source data, can be stored on a more modest machine. Many components of the data pipeline can be built of commodity hardware simply because a data pipeline encourages modular design. In many cases, you will need to transform your data from format to format throughout the pipeline. That could mean converting from native data structures to JSON, transposing or interpolating values, or hashing values. We talked about two data pipelines components. In the next post, we will discuss several types of data transformations that may occur in the data pipeline. We will also discuss a few considerations to make when transporting and storing training data or serialized models. Create machine learning pipelines using unsupervised AutoML [Tutorial] Top AutoML libraries for building your ML pipelines
Intelligent mobile projects with TensorFlow: Build a basic Raspberry Pi robot that listens, moves, sees, and speaks [Tutorial]

Bhagyashree R
27 Aug 2018
14 min read
According to Wikipedia, "The Raspberry Pi is a series of small single-board computers developed in the United Kingdom by the Raspberry Pi Foundation to promote the teaching of basic computer science in schools and in developing countries." The official site of Raspberry Pi describes it as "a small and affordable computer that you can use to learn programming." If you have never heard of or used Raspberry Pi before, just go its website and chances are you'll quickly fall in love with the cool little thing. Little yet powerful—in fact, developers of TensorFlow made TensorFlow available on Raspberry Pi from early versions around mid-2016, so we can run complicated TensorFlow models on the tiny computer that you can buy for about $35. In this article we will see how to set up TensorFlow on Raspberry Pi and use the TensorFlow image recognition and audio recognition models, along with text to speech and robot movement APIs, to build a Raspberry Pi robot that can move, see, listen, and speak. This tutorial is an excerpt from a book written by Jeff Tang titled Intelligent Mobile Projects with TensorFlow. Setting up TensorFlow on Raspberry Pi To use TensorFlow in Python, we can install the TensorFlow 1.6 nightly build for Pi at the TensorFlow Jenkins continuous integrate site (http://ci.tensorflow.org/view/Nightly/job/nightly-pi/223/artifact/output-artifacts): sudo pip install http://ci.tensorflow.org/view/Nightly/job/nightly-pi/lastSuccessfulBuild/artifact/output-artifacts/tensorflow-1.6.0-cp27-none-any.whl This method is quite common. A more complicated method is to use the makefile, required when you need to build and use the TensorFlow library. The Raspberry Pi section of the official TensorFlow makefile documentation (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/makefile) has detailed steps to build the TensorFlow library, but it may not work with every release of TensorFlow. The steps there work perfectly with an earlier version of TensorFlow (0.10), but would cause many "undefined reference to google::protobuf" errors with the TensorFlow 1.6. The following steps have been tested with the TensorFlow 1.6 release, downloadable at https://github.com/tensorflow/tensorflow/releases/tag/v1.6.0; you can certainly try a newer version in the TensorFlow releases page, or clone the latest TensorFlow source by git clone https://github.com/tensorflow/tensorflow, and fix any possible hiccups. After cd to your TensorFlow source root, we run the following commands: tensorflow/contrib/makefile/download_dependencies.sh sudo apt-get install -y autoconf automake libtool gcc-4.8 g++-4.8 cd tensorflow/contrib/makefile/downloads/protobuf/ ./autogen.sh ./configure make CXX=g++-4.8 sudo make install sudo ldconfig # refresh shared library cache cd ../../../../.. export HOST_NSYNC_LIB=`tensorflow/contrib/makefile/compile_nsync.sh` export TARGET_NSYNC_LIB="$HOST_NSYNC_LIB" Make sure you run make CXX=g++-4.8, instead of just make, as documented in the official TensorFlow Makefile documentation, because Protobuf must be compiled with the same gcc version as that used for building the following TensorFlow library, in order to fix those "undefined reference to google::protobuf" errors. 
Now try to build the TensorFlow library using the following command: make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI \ OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8 After a few hours of building, you'll likely get an error such as "virtual memory exhausted: Cannot allocate memory" or the Pi board will just freeze due to running out of memory. To fix this, we need to set up a swap, because without the swap, when an application runs out of the memory, the application will get killed due to a kernel panic. There are two ways to set up a swap: swap file and swap partition. Raspbian uses a default swap file of 100 MB on the SD card, as shown here using the free command: pi@raspberrypi:~/tensorflow-1.6.0 $ free -h total used free shared buff/cache available Mem: 927M 45M 843M 660K 38M 838M Swap: 99M 74M 25M To improve the swap file size to 1 GB, modify the /etc/dphys-swapfile file via sudo vi /etc/dphys-swapfile, changing CONF_SWAPSIZE=100 to CONF_SWAPSIZE=1024, then restart the swap file service: sudo /etc/init.d/dphys-swapfile stop sudo /etc/init.d/dphys-swapfile start After this, free -h will show the Swap total to be 1.0 GB. A swap partition is created on a separate USB disk and is preferred because a swap partition can't get fragmented but a swap file on the SD card can get fragmented easily, causing slower access. To set up a swap partition, plug a USB stick with no data you need on it to the Pi board, then run sudo blkid, and you'll see something like this: /dev/sda1: LABEL="EFI" UUID="67E3-17ED" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="622fddad-da3c-4a09-b6b3-11233a2ca1f6" /dev/sda2: UUID="E67F-6EAB" TYPE="vfat" PARTLABEL="NO NAME" PARTUUID="a045107a-9e7f-47c7-9a4b-7400d8d40f8c" /dev/sda2 is the partition we'll use as the swap partition. Now unmount and format it to be a swap partition: sudo umount /dev/sda2 sudo mkswap /dev/sda2 mkswap: /dev/sda2: warning: wiping old swap signature. Setting up swapspace version 1, size = 29.5 GiB (31671701504 bytes) no label, UUID=23443cde-9483-4ed7-b151-0e6899eba9de You'll see a UUID output in the mkswap command; run sudo vi /etc/fstab, add a line as follows to the fstab file with the UUID value: UUID=<UUID value> none swap sw,pri=5 0 0 Save and exit the fstab file and then run sudo swapon -a. Now if you run free -h again, you'll see the Swap total to be close to the USB storage size. We definitely don't need all that size for swap—in fact, the recommended maximum swap size for the Raspberry Pi 3 board with 1 GB memory is 2 GB, but we'll leave it as is because we just want to successfully build the TensorFlow library. With either of the swap setting changes, we can rerun the make command: make -f tensorflow/contrib/makefile/Makefile HOST_OS=PI TARGET=PI \ OPTFLAGS="-Os -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize" CXX=g++-4.8 After this completes, the TensorFlow library will be generated as tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a. Now we can build the image classification example using the library. Image recognition and text to speech There are two TensorFlow Raspberry Pi example apps (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/pi_examples) located in tensorflow/contrib/pi_examples: label_image and camera. We'll modify the camera example app to integrate text to speech so the app can speak out its recognized images when moving around. 
Before we build and test the two apps, we need to install some libraries and download the pre-built TensorFlow Inception model file: sudo apt-get install -y libjpeg-dev sudo apt-get install libv4l-dev curl https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015_stripped.zip -o /tmp/inception_dec_2015_stripped.zip cd ~/tensorflow-1.6.0 unzip /tmp/inception_dec_2015_stripped.zip -d tensorflow/contrib/pi_examples/label_image/data/ To build the label_image and camera apps, run: make -f tensorflow/contrib/pi_examples/label_image/Makefile make -f tensorflow/contrib/pi_examples/camera/Makefile You may encounter the following error when building the apps: ./tensorflow/core/platform/default/mutex.h:25:22: fatal error: nsync_cv.h: No such file or directory #include "nsync_cv.h" ^ compilation terminated. To fix this, run sudo cp tensorflow/contrib/makefile/downloads/nsync/public/nsync*.h /usr/include. Then edit the tensorflow/contrib/pi_examples/label_image/Makefile or  tensorflow/contrib/pi_examples/camera/Makefile file, add the following library, and include paths before running the make command again: -L$(DOWNLOADSDIR)/nsync/builds/default.linux.c++11 \ -lnsync \ To test run the two apps, run the apps directly: tensorflow/contrib/pi_examples/label_image/gen/bin/label_image tensorflow/contrib/pi_examples/camera/gen/bin/camera Take a look at the C++ source code,  tensorflow/contrib/pi_examples/label_image/label_image.cc and tensorflow/contrib/pi_examples/camera/camera.cc, and you'll see they use the similar C++ code as in our iOS apps in the previous chapters to load the model graph file, prepare input tensor, run the model, and get the output tensor. By default, the camera example also uses the prebuilt Inception model unzipped in the label_image/data folder. But for your own specific image classification task, you can provide your own model retrained via transfer learning using the --graph parameter when running the two example apps. In general, voice is a Raspberry Pi robot's main UI to interact with us. Ideally, we should run a TensorFlow-powered natural-sounding Text-to-Speech (TTS) model such as WaveNet (https://deepmind.com/blog/wavenet-generative-model-raw-audio) or Tacotron (https://github.com/keithito/tacotron), but it'd be beyond the scope of this article to run and deploy such a model. It turns out that we can use a much simpler TTS library called Flite by CMU (http://www.festvox.org/flite), which offers pretty decent TTS, and it takes just one simple command to install it: sudo apt-get install flite. If you want to install the latest version of Flite to hopefully get a better TTS quality, just download the latest Flite source from the link and build it. To test Flite with our USB speaker, run flite with the -t parameter followed by a double quoted text string such as  flite -t "i recommend the ATM machine". If you don't like the default voice, you can find other supported voices by running flite -lv, which should return Voices available: kal awb_time kal16 awb rms slt. Then you can specify a voice used for TTS: flite -voice rms -t "i recommend the ATM machine". To let the camera app speak out the recognized objects, which should be the desired behavior when the Raspberry Pi robot moves around, you can use this simple pipe command: tensorflow/contrib/pi_examples/camera/gen/bin/camera | xargs -n 1 flite -t You'll likely hear too much voice. 
To fine tune the TTS result of image classification, you can also modify the camera.cc file and add the following code to the PrintTopLabels function before rebuilding the example using make -f tensorflow/contrib/pi_examples/camera/Makefile: std::string cmd = "flite -voice rms -t \""; cmd.append(labels[label_index]); cmd.append("\""); system(cmd.c_str()); Now that we have completed the image classification and speech synthesis tasks, without using any Cloud APIs, let's see how we can do audio recognition on Raspberry Pi. Audio recognition and robot movement To use the pre-trained audio recognition model in the TensorFlow tutorial (https://www.tensorflow.org/tutorials/audio_recognition), we'll reuse a listen.py Python script from https://gist.github.com/aallan, and add the GoPiGo API calls to control the robot movement after it recognizes four basic audio commands: "left," "right," "go," and "stop." The other six commands supported by the pre-trained model—"yes," "no," "up," "down," "on," and "off"—don't apply well in our example. To run the script, first download the pre-trained audio recognition model from http://download.tensorflow.org/models/speech_commands_v0.01.zip and unzip it to /tmp for example, to the Pi board's /tmp directory, then run: python listen.py --graph /tmp/conv_actions_frozen.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0 Or you can run: python listen.py --graph /tmp/speech_commands_graph.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0 Note that plughw value 1,0 should match the card number and device number of your USB microphone, which can be found using the arecord -l command we showed before. The listen.py script also supports many other parameters. For example, we can use --detection_threshold 0.5 instead of the default detection threshold 0.8. Let's now take a quick look at how listen.py works before we add the GoPiGo API calls to make the robot move. listen.py uses Python's subprocess module and its Popen class to spawn a new process of running the arecord command with appropriate parameters. The Popen class has an stdout attribute that specifies the arecord executed command's standard output file handle, which can be used to read the recorded audio bytes. The Python code to load the trained model graph is as follows: with tf.gfile.FastGFile(filename, 'rb') as f: graph_def = tf.GraphDef() graph_def.ParseFromString(f.read()) tf.import_graph_def(graph_def, name='') A TensorFlow session is created using tf.Session() and after the graph is loaded and session created, the recorded audio buffer gets sent, along with the sample rate, as the input data to the TensorFlow session's run method, which returns the prediction of the recognition: run(softmax_tensor, { self.input_samples_name_: input_data, self.input_rate_name_: self.sample_rate_ }) Here, softmax_tensor is defined as the TensorFlow graph's get_tensor_by_name(self.output_name_), and output_name_,  input_samples_name_, and input_rate_name_ are defined as  labels_softmax, decoded_sample_data:0, decoded_sample_data:1, respectively. On Raspberry Pi, you can choose to run the TensorFlow models on Pi using the TensorFlow Python API directly, or C++ API (as in the label_image and camera examples), although normally you'd still train the models on a more powerful computer. For the complete TensorFlow Python API documentation, see https://www.tensorflow.org/api_docs/python. 
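As a rough illustration of that subprocess/Popen interaction (a sketch, not the actual listen.py source), the following reads raw audio from the microphone one second at a time; the device name and format flags are assumptions that should be matched to your own USB microphone (check arecord -l):
import subprocess

rec_cmd = ["arecord", "-D", "plughw:1,0", "-f", "S16_LE",
           "-r", "16000", "-c", "1", "-t", "raw"]
proc = subprocess.Popen(rec_cmd, stdout=subprocess.PIPE)

chunk_size = 16000 * 2   # one second of 16-bit mono samples at 16 kHz
try:
    while True:
        audio_bytes = proc.stdout.read(chunk_size)
        if not audio_bytes:
            break
        # listen.py buffers bytes like these and feeds them, together with the
        # sample rate, into the TensorFlow session's run() call shown above.
        print("read", len(audio_bytes), "bytes of audio")
finally:
    proc.terminate()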
To use the GoPiGo Python API to make the robot move based on your voice command, first add the following two lines to listen.py: import easygopigo3 as gpg gpg3_obj = gpg.EasyGoPiGo3() Then add the following code to the end of the def add_data method: if current_top_score > self.detection_threshold_ and time_since_last_top > self.suppression_ms_: self.previous_top_label_ = current_top_label self.previous_top_label_time_ = current_time_ms is_new_command = True logger.info(current_top_label) if current_top_label=="go": gpg3_obj.drive_cm(10, False) elif current_top_label=="left": gpg3_obj.turn_degrees(-30, False) elif current_top_label=="right": gpg3_obj.turn_degrees(30, False) elif current_top_label=="stop": gpg3_obj.stop() Now put your Raspberry Pi robot on the ground, connect to it with ssh from your computer, and run the following script: python listen.py --graph /tmp/conv_actions_frozen.pb --labels /tmp/conv_actions_labels.txt -I plughw:1,0 --detection_threshold 0.5 You'll see output like this: INFO:audio:started recording INFO:audio:_silence_ INFO:audio:_silence_ Then you can say left, right, stop, go, and stop to see the commands get recognized and the robot moves accordingly: INFO:audio:left INFO:audio:_silence_ INFO:audio:_silence_ INFO:audio:right INFO:audio:_silence_ INFO:audio:stop INFO:audio:_silence_ INFO:audio:go INFO:audio:stop You can run the camera app in a separate Terminal, so while the robot moves around based on your voice commands, it'll recognize new images it sees and speak out the results. That's all it takes to build a basic Raspberry Pi robot that listens, moves, sees, and speaks—what the Google I/O 2016 demo does but without using any Cloud APIs. It's far from a fancy robot that can understand natural human speech, engage in interesting conversations, or perform useful and non-trivial tasks. But powered with pre-trained, retrained, or other powerful TensorFlow models, and using all kinds of sensors, you can certainly add more and more intelligence and physical power to the Pi robot we have built. Google TensorFlow is used to train all the models deployed and running on mobile devices. This book covers 10 projects on the implementation of all major AI areas on iOS, Android, and Raspberry Pi: computer vision, speech and language processing, and machine learning, including traditional, reinforcement, and deep reinforcement. If you liked this tutorial and would like to implement projects for major AI areas on iOS, Android, and Raspberry Pi, check out the book Intelligent Mobile Projects with TensorFlow. TensorFlow 2.0 is coming. Here’s what we can expect. Build and train an RNN chatbot using TensorFlow [Tutorial] Use TensorFlow and NLP to detect duplicate Quora questions [Tutorial]
Unit testing Angular components and classes [Tutorial]

Natasha Mathur
27 Aug 2018
12 min read
Testing (and more specifically, unit testing) is meant to be carried out by the developer as the project is being developed. In this article, we will see how to implement testing tools to perform proper unit testing for your application classes and components. This tutorial is an excerpt taken from the book Learning Angular (Second Edition) written by Christoffer Noring and Pablo Deeleman. When venturing into unit testing in Angular, it's important to know which major parts it consists of. In Angular, these are:
Jasmine, the testing framework
Angular testing utilities
Karma, a test runner for running unit tests, among other things
Protractor, Angular's framework for E2E testing
Configuration and setting up of Angular CLI
In terms of configuration, when using the Angular CLI, you don't have to do anything to make it work. You can run your first test as soon as you scaffold a project, and it will work. The Angular CLI uses Karma as the test runner. What we need to know about Karma is that it uses a karma.conf.js configuration file, in which a lot of things are specified, such as:
The various plugins that enhance your test runner.
Where to find the tests to run. There is usually a files property in this file specifying where to find the application and the tests; for the Angular CLI, however, this specification is found in another file called src/tsconfig.spec.json.
The setup of your selected coverage tool, a tool that measures to what degree your tests cover the production code.
Reporters, which report every executed test to a console window, to a browser, or through some other means.
The browsers to run your tests in: for example, Chrome or PhantomJS.
Using the Angular CLI, you most likely won't need to change or edit this file yourself. It is good to know that it exists and what it does for you.
Angular testing utilities
The Angular testing utilities help to create a testing environment that makes writing tests for your various constructs really easy. They consist of the TestBed class and various helper functions, found under the @angular/core/testing namespace. Let's have a look at what these are and how they can help us to test various constructs. We will briefly introduce the most commonly used concepts so that you are familiar with them as we present them more deeply further on:
The TestBed class is the most important concept; it creates its own testing module. In reality, when you test a construct, you detach it from the module it resides in and reattach it to the testing module created by the TestBed. The TestBed class has a configureTestingModule() helper method that we use to set up the test module as needed. The TestBed can also instantiate components.
ComponentFixture is a class wrapping the component instance. This means that it has some functionality on it and it has a member that is the component instance itself.
The DebugElement, much like the ComponentFixture, acts as a wrapper. It, however, wraps the DOM element and not the component instance. It's a bit more than that though, as it has an injector on it that allows us to access the services that have been injected into a component.
This was a brief overview of our testing environment and the frameworks and libraries used. Now let's discuss component testing.
Introduction to component testing
A usual method of operation for doing anything Angular is to use the Angular CLI. Working with tests is no different.
The Angular CLI lets us create tests, debug them, and run them; it also gives us an understanding of how well our tests cover the code and its many scenarios. Component testing with dependencies We have learned a lot already, but let's face it, no component that we build will be as simple as the one we wrote in the preceding section. There will almost certainly be at least one dependency, looking like this: @Component({}) export class ExampleComponent { constructor(dependency:Dependency) {} } We have different ways of dealing with testing such a situation. One thing is clear though: if we are testing the component, then we should not test the service as well. This means that when we set up such a test, the dependency should not be the real thing. There are different ways of dealing with that when it comes to unit testing; no solution is strictly better than the other: Using a stub means that we tell the dependency injector to inject a stub that we provide, instead of the real thing. Injecting the real thing, but attaching a spy, to the method that we call in our component. Regardless of the approach, we ensure that the test is not performing a side effect such as talking to a filesystem or attempting to communicate via HTTP; we are, using this approach, isolated. Using a stub to replace the dependency Using a stub means that we completely replace what was there before. It is as simple to do as instructing the TestBed in the following way: TestBed.configureTestingModule({ declarations: [ExampleComponent] providers: [{ provide: DependencyService, useClass: DependencyServiceStub }] }); We define a providers array like we do with the NgModule, and we give it a list item that points out the definition we intend to replace and we give it the replacement instead; that is our stub. Let's now build our DependencyStub to look like this: class DependencyServiceStub { getData() { return 'stub'; } } Just like with an @NgModule, we are able to override the definition of our dependency with our own stub. Imagine our component looks like the following: import { Component } from '@angular/core'; import { DependencyService } from "./dependency.service"; @Component({ selector: 'example', template: ` <div>{{ title }}</div> ` }) export class ExampleComponent { title: string; constructor(private dependency: DependencyService) { this.title = this.dependency.getData(); } } Here we pass an instance of the dependency in the constructor. With our testing module correctly set up, with our stub, we can now write a test that looks like this: it(`should have as title 'stub'`, async(() => { const fixture = TestBed.createComponent(AppComponent); const app = fixture.debugElement.componentInstance; expect(app.title).toEqual('stub'); })); The test looks normal, but at the point when the dependency would be called in the component code, our stub takes its place and responds instead. Our dependency should be overridden, and as you can see, the expect(app.title).toEqual('stub') assumes the stub will answer, which it does. Spying on the dependency method The previously-mentioned approach, using a stub, is not the only way to isolate ourselves in a unit test. We don't have to replace the entire dependency, only the parts that our component is using. Replacing certain parts means that we point out specific methods on the dependency and assign a spy to them. 
A spy is an interesting construct; it has the ability to answer what you want it to answer, but you can also see how many times it is being called and with what argument/s, so a spy gives you a lot more information about what is going on. Let's have a look at how we would set a spy up: beforeEach(() => { TestBed.configureTestingModule({ declarations: [ExampleComponent], providers: [DependencyService] }); dependency = TestBed.get(DependencyService); spy = spyOn( dependency,'getData'); fixture = TestBed.createComponent(ExampleComponent); }) Now as you can see, the actual dependency is injected into the component. After that, we grab a reference to the component, our fixture variable. This is followed by us using the TestBed.get('Dependency') to get hold of the dependency inside of the component. At this point, we attach a spy to its getData() method through the spyOn( dependency,'getData') call. This is not enough, however; we have yet to instruct the spy what to respond with when being called. Let us do just that: spyOn(dependency,'getData').and.returnValue('spy value'); We can now write our test as usual: it('test our spy dependency', () => { var component = fixture.debugElement.componentInstance; expect(component.title).toBe('spy value'); }); This works as expected, and our spy responds as it should. Remember how we said that spies were capable of more than just responding with a value, that you could also check whether they were invoked and with what? To showcase this, we need to improve our tests a little bit and check for this extended functionality, like so: it('test our spy dependency', () => { var component = fixture.debugElement.componentInstance; expect(spy.calls.any()).toBeTruthy(); }) You can also check for the number of times it was called, with spy.callCount, or whether it was called with some specific arguments: spy.mostRecentCalls.args or spy.toHaveBeenCalledWith('arg1', 'arg2'). Remember if you use a spy, make sure it pays for itself by you needing to do checks like these; otherwise, you might as well use a stub. Spies are a feature of the Jasmine framework, not Angular. The interested reader is urged to research this topic further at http://tobyho.com/2011/12/15/jasmine-spy-cheatsheet/. Async services Very few services are nice and well-behaved, in the sense that they are synchronous. A lot of the time, your service will be asynchronous and the return from it is most likely an observable or a promise. If you are using RxJS with the Http service or HttpClient, it will be observable, but if using the fetch API, it will be a promise. These are two good options for dealing with HTTP, but the Angular team added the RxJS library to Angular to make your life as a developer easier. Ultimately it's up to you, but we recommend going with RxJS. Angular has two constructs ready to tackle the asynchronous scenario when testing: async() and whenStable(): This code ensures that any promises are immediately resolved; it can look more synchronous though fakeAsync() and tick(): This code does what the async does but it looks more synchronous when used Let's describe the async() and whenStable() approaches. Our service has now grown up and is doing something asynchronous when we call it like a timeout or an HTTP call. Regardless of which, the answer doesn't reach us straightaway. By using async() in combination with whenStable(), we can, however, ensure that any promises are immediately resolved. 
Imagine our service now looks like this: export class AsyncDependencyService { getData(): Promise<string> { return new Promise((resolve, reject) => { setTimeout(() => { resolve('data') }, 3000); }) } } We need to change our spy setup to return a promise instead of returning a static string, like so: spy = spyOn(dependency,'getData') .and.returnValue(Promise.resolve('spy data')); We do need to change inside of our component, like so: import { Component, OnInit } from '@angular/core'; import { AsyncDependencyService } from "./async.dependency.service"; @Component({ selector: 'async-example', template: ` <div>{{ title }}</div> ` }) export class AsyncExampleComponent { title: string; constructor(private service: AsyncDependencyService) { this.service.getData().then(data => this.title = data); } } At this point, it's time to update our tests. We need to do two more things. We need to tell our test method to use the async() function, like so: it('async test', async() => { // the test body }) We also need to call fixture.whenStable() to make sure that the promise will have had ample time to resolve, like so: import { TestBed } from "@angular/core/testing"; import { AsyncExampleComponent } from "./async.example.component"; import { AsyncDependencyService } from "./async.dependency.service"; describe('test an component with an async service', () => { let fixture; beforeEach(() => { TestBed.configureTestingModule({ declarations: [AsyncExampleComponent], providers: [AsyncDependencyService] }); fixture = TestBed.createComponent(AsyncExampleComponent); }); it('should contain async data', async () => { const component = fixture.componentInstance; fixture.whenStable.then(() => { fixture.detectChanges(); expect(component.title).toBe('async data'); }); }); }); This version of doing it works as it should, but feels a bit clunky. There is another approach using fakeAsync() and tick(). Essentially, fakeAsync() replaces the async() call and we get rid of whenStable(). The big benefit, however, is that we no longer need to place our assertion statements inside of the promise's then() callback. This gives us synchronous-looking code. Back to fakeAsync(), we need to make a call to tick(), which can only be called within a fakeAsync() call, like so: it('async test', fakeAsync() => { let component = fixture.componentInstance; fixture.detectChanges(); fixture.tick(); expect(component.title).toBe('spy data'); }); As you can see, this looks a lot cleaner; which version you want to use for async testing is up to you. Testing pipes A pipe is basically a class that implements the PipeTransform interface, thus exposing a transform() method that is usually synchronous. Pipes are therefore very easy to test. We will begin by testing a simple pipe, creating, as we mentioned, a test spec right next to its code unit file. The code is as follows: import { Pipe, PipeTransform } from '@angular/core'; @Pipe({ name: 'formattedpipe' }) export class FormattedPipe implements PipeTransform { transform(value: any, ...args: any[]): any { return "banana" + value; } } Our code is very simple; we take a value and add banana to it. Writing a test for it is equally simple. 
The only thing we need to do is to import the pipe and verify two things: that it exposes a transform() method, and that it produces the expected result. The following code writes a test for each of those points:
import { FormattedTimePipe } from './formatted-time.pipe';
import { TestBed } from '@angular/core/testing';

describe('A formatted time pipe', () => {
  let fixture;
  beforeEach(() => {
    fixture = new FormattedTimePipe();
  });
  // Specs with assertions
  it('should expose a transform() method', () => {
    expect(typeof fixture.transform).toEqual('function');
  });
  it('should produce expected result', () => {
    expect(fixture.transform('val')).toBe('bananaval');
  });
});
In our beforeEach() method, we set up the fixture by instantiating the pipe class. In the first test, we ensure that the transform() method exists. This is followed by our second test, which asserts that the transform() method produces the expected result. We saw how to code powerful tests for our components and pipes. If you found this post useful, be sure to check out the book Learning Angular (Second Edition) to learn about mocking HTTP responses and unit testing for routes, input, and output, directives, etc. Getting started with Angular CLI and build your first Angular Component Building Components Using Angular Why switch to Angular for web development – Interview with Minko Gechev
Build botnet detectors using machine learning algorithms in Python [Tutorial]

Melisha Dsouza
26 Aug 2018
12 min read
Botnets are connected computers that perform a number of repetitive tasks to keep websites going. Connected devices play an important role in modern life. From smart home appliances, computers, coffee machines, and cameras, to connected cars, this huge shift in our lifestyles has made our lives easier. Unfortunately, these exposed devices could be easily targeted by attackers and cybercriminals who could use them later to enable larger-scale attacks. Security vendors provide many solutions and products to defend against botnets, but in this tutorial, we are going to learn how to build novel botnet detection systems with Python and machine learning techniques. You will find all the code discussed, in addition to some other useful scripts, in the following repository: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter05 This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing We are going to learn how to build different botnet detection systems with many machine learning algorithms. As a start to a first practical lab, let's start by building a machine learning-based botnet detector using different classifiers. By now, I hope you have acquired a clear understanding about the major steps of building machine learning systems. So, I believe that you already know that, as a first step, we need to look for a dataset. Many educational institutions and organizations are given a set of collected datasets from internal laboratories. One of the most well known botnet datasets is called the CTU-13 dataset. It is a labeled dataset with botnet, normal, and background traffic delivered by CTU University, Czech Republic. During their work, they tried to capture real botnet traffic mixed with normal traffic and background traffic. To download the dataset and check out more information about it, you can visit the following link: https://mcfp.weebly.com/the-ctu-13-dataset-a-labeled-dataset-with-botnet-normal-and-background-traffic.html. The dataset is bidirectional NetFlow files. But what are bidirectional NetFlow files? Netflow is an internet protocol developed by Cisco. The goal of this protocol is to collect IP traffic information and monitor network traffic in order to have a clearer view about the network traffic flow. The main components of a NetFlow architecture are a NetFlow Exporter, a Netflow collector, and a Flow Storage. The following diagram illustrates the different components of a NetFlow infrastructure: When it comes to NetFlow generally, when host A sends an information to host B and from host B to host A as a reply, the operation is named unidirectional NetFlow. The sending and the reply are considered different operations. In bidirectional NetFlow, we consider the flows from host A and host B as one flow. Let's download the dataset by using the following command: $ wget --no-check-certificate https://mcfp.felk.cvut.cz/publicDatasets/CTU-13-Dataset/CTU-13-Dataset.tar.bz2 Extract the downloaded tar.bz2 file by using the following command: # tar xvjf CTU-13-Dataset.tar.bz2 The file contains all the datasets, with the different scenarios. For the demonstration, we are going to use dataset 8 (scenario 8). 
You can select any scenario or you can use your own collected data, or any other .binetflow files delivered by other institutions: Load the data using pandas as usual: >>> import pandas as pd >>> data = pd.read_csv("capture20110816-3.binetflow") >>> data['Label'] = data.Label.str.contains("Botnet") Exploring the data is essential in any data-centric project. For example, you can start by checking the names of the features or the columns: >> data.columns The command results in the columns of the dataset: StartTime, Dur, Proto, SrcAddr, Sport, Dir, DstAddr, Dport, State, sTos, dTos, TotPkts, TotBytes, SrcBytes, and Label. The columns represent the features used in the dataset; for example, Dur represents duration, Sport represents the source port, and so on. You can find the full list of features in the chapter's GitHub repository. Before training the model, we need to build some scripts to prepare the data. This time, we are going to build a separate Python script to prepare data, and later we can just import it into the main script. I will call the first script DataPreparation.py. There are many proposals done to help extract the features and prepare data to build botnet detectors using machine learning. In our case, I customized two new scripts inspired by the data loading scripts built by NagabhushanS: from __future__ import division import os, sys import threading After importing the required Python packages, we created a class called Prepare to select training and testing data: class Prepare(threading.Thread): def __init__(self, X, Y, XT, YT, accLabel=None): threading.Thread.__init__(self) self.X = X self.Y = Y self.XT=XT self.YT=YT self.accLabel= accLabel def run(self): X = np.zeros(self.X.shape) Y = np.zeros(self.Y.shape) XT = np.zeros(self.XT.shape) YT = np.zeros(self.YT.shape) np.copyto(X, self.X) np.copyto(Y, self.Y) np.copyto(XT, self.XT) np.copyto(YT, self.YT) for i in range(9): X[:, i] = (X[:, i] - X[:, i].mean()) / (X[:, i].std()) for i in range(9): XT[:, i] = (XT[:, i] - XT[:, i].mean()) / (XT[:, i].std()) The second script is called LoadData.py. You can find it on GitHub and use it directly in your projects to load data from .binetflow files and generate a pickle file. Let's use what we developed previously to train the models. After building the data loader and preparing the machine learning algorithms that we are going to use, it is time to train and test the models. First, load the data from the pickle file, which is why we need to import the pickle Python library. Don't forget to import the previous scripts using: import LoadData import DataPreparation import pickle file = open('flowdata.pickle', 'rb') data = pickle.load(file) Select the data sections: Xdata = data[0] Ydata = data[1] XdataT = data[2] YdataT = data[3] As machine learning classifiers, we are going to try many different algorithms so later we can select the best algorithm for our model. Import the required modules to use four machine learning algorithms from sklearn: from sklearn.linear_model import * from sklearn.tree import * from sklearn.naive_bayes import * from sklearn.neighbors import * Prepare the data by using the previous module build. Don't forget to import DataPreparation by typing import DataPreparation: >>> DataPreparation.Prepare(Xdata,Ydata,XdataT,YdataT) Now, we can train the models; and to do that, we are going to train the model with different techniques so later we can select the most suitable machine learning technique for our project. 
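One compact way to run that comparison (a sketch, not the chapter's code) is a small loop over the candidate classifiers, reusing the Xdata, Ydata, XdataT, and YdataT arrays prepared above; the individual models are then walked through one by one below:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

candidates = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(C=10000),
    "Gaussian Naive Bayes": GaussianNB(),
    "K-Nearest Neighbours": KNeighborsClassifier(),
}
for name, clf in candidates.items():
    clf.fit(Xdata, Ydata)                       # train on the training split
    score = clf.score(XdataT, YdataT) * 100     # accuracy on the held-out split
    print("The score of the %s classifier is %.1f" % (name, score))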
The steps are similar to what we learned in previous projects: after preparing the data and selecting the features, define the machine learning algorithm, fit the model, and print out its score. As machine learning classifiers, we are going to test several of them. Let's start with a decision tree:

Decision tree model:

>>> clf = DecisionTreeClassifier()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the Decision Tree Classifier is", Score * 100)

The score of the decision tree classifier is 99%.

Logistic regression model:

>>> clf = LogisticRegression(C=10000)
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the Logistic Regression Classifier is", Score * 100)

The score of the logistic regression classifier is 96%.

Gaussian Naive Bayes model:

>>> clf = GaussianNB()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the Gaussian Naive Bayes classifier is", Score * 100)

The score of the Gaussian Naive Bayes classifier is 72%.

k-Nearest Neighbors model:

>>> clf = KNeighborsClassifier()
>>> clf.fit(Xdata,Ydata)
>>> Prediction = clf.predict(XdataT)
>>> Score = clf.score(XdataT,YdataT)
>>> print("The Score of the K-Nearest Neighbours classifier is", Score * 100)

The score of the k-Nearest Neighbors classifier is 96%.

Neural network model:

To build a neural network model, use the following code:

>>> from keras.models import *
>>> from keras.layers import Dense, Activation
>>> from keras.optimizers import *

model = Sequential()
model.add(Dense(10, input_dim=9, activation="sigmoid"))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1))
sgd = SGD(lr=0.01, decay=0.000001, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='mse')
model.fit(Xdata, Ydata, epochs=200, batch_size=100)
Score = model.evaluate(XdataT, YdataT, verbose=0)
print("The Score of the Neural Network is", Score * 100)

With this code, we imported the required Keras modules, built the layers, compiled the model with an SGD optimizer, fit the model, and printed out its score.

How to build a Twitter bot detector

In the previous sections, we saw how to build a machine learning-based botnet detector. In this new project, we are going to deal with a different problem: instead of defending against botnet malware, we are going to detect Twitter bots, because they are also dangerous and can perform malicious actions. For the model, we are going to use the NYU Tandon Spring 2017 Machine Learning Competition: Twitter Bot classification dataset. You can download it from this link: https://www.kaggle.com/c/twitter-bot-classification/data.

Import the required Python packages:

>>> import pandas as pd
>>> import numpy as np
>>> import seaborn

Let's load the data using pandas and separate the bot and non-bot data:

>>> data = pd.read_csv('training_data_2_csv_UTF.csv')
>>> Bots = data[data.bot==1]
>>> NonBots = data[data.bot==0]

Visualization with seaborn

In every project, I want to help you discover new Python data visualization libraries because, as you saw, data engineering and visualization are essential to every modern data-centric project. This time, I chose seaborn to visualize the data and explore it before starting the training phase. Seaborn is a Python library for making statistical visualizations.
The following is an example of generating a plot with seaborn (note that we use a new variable name here so that we don't overwrite the Twitter data loaded earlier):

>>> demo = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
>>> demo = pd.DataFrame(demo, columns=['x', 'y'])
>>> for col in 'xy':
...     seaborn.kdeplot(demo[col], shade=True)

For example, in our case, if we want to identify the missing data:

import matplotlib.pyplot
matplotlib.pyplot.figure(figsize=(10,6))
seaborn.heatmap(data.isnull(), yticklabels=False, cbar=False, cmap='viridis')
matplotlib.pyplot.tight_layout()

The previous two code snippets were just examples of how to visualize data. Visualization helps data scientists to explore and learn more about the data. Now, let's go back and continue building our model.

Identify the bag of words by selecting some bad words used by Twitter bots. The following is an example of bad words used by a bot; of course, you can add more words. Note the trailing | at the end of each continued line, so that the concatenated pieces remain separate alternatives in the regular expression:

bag_of_words_bot = r'bot|b0t|cannabis|tweet me|mishear|follow me|updates every|gorilla|yes_ofc|forget|' \
    r'expos|kill|bbb|truthe|fake|anony|free|virus|funky|RNA|jargon|' \
    r'nerd|swag|jack|chick|prison|paper|pokem|xx|freak|ffd|dunia|clone|genie|bbb|' \
    r'ffd|onlyman|emoji|joke|troll|droop|free|every|wow|cheese|yeah|bio|magic|wizard|face'

Now, it is time to identify the training features:

data['screen_name_binary'] = data.screen_name.str.contains(bag_of_words_bot, case=False, na=False)
data['name_binary'] = data.name.str.contains(bag_of_words_bot, case=False, na=False)
data['description_binary'] = data.description.str.contains(bag_of_words_bot, case=False, na=False)
data['status_binary'] = data.status.str.contains(bag_of_words_bot, case=False, na=False)

Feature extraction: Let's select the features to use in our model:

data['listed_count_binary'] = (data.listed_count>20000)==False
features = ['screen_name_binary', 'name_binary', 'description_binary', 'status_binary', 'verified', 'followers_count', 'friends_count', 'statuses_count', 'listed_count_binary', 'bot']

Now, train the model with a decision tree classifier. We import some previously discussed modules:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_curve, auc
from sklearn.model_selection import train_test_split

We select the features and the target:

X = data[features].iloc[:,:-1]
y = data[features].iloc[:,-1]

We define the classifier:

clf = DecisionTreeClassifier(criterion='entropy', min_samples_leaf=50, min_samples_split=10)

We split the data into training and test sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

We fit the model:

clf.fit(X_train, y_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)

We print out the accuracy scores:

print("Training Accuracy: %.5f" %accuracy_score(y_train, y_pred_train))
print("Test Accuracy: %.5f" %accuracy_score(y_test, y_pred_test))

Our model detects Twitter bots with an 88% detection rate, which is a good accuracy rate. This technique is not the only possible way to detect bots. Researchers have proposed many other models based on different machine learning algorithms, such as linear SVMs and decision trees, with accuracies of around 90%. Most studies showed that feature engineering was a key contributor to improving machine learning models. To study a real-world case, check out the paper What we learn from learning - Understanding capabilities and limitations of machine learning in botnet attacks (https://arxiv.org/pdf/1805.01333.pdf) by David Santana, Shan Suthaharan, and Somya Mohanty.
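The imports above also pull in roc_curve and auc from sklearn.metrics, although the snippet never uses them. As a hedged illustration (this sketch is not part of the original excerpt), here is one way you might plot a ROC curve for the same clf decision tree, assuming the X_test and y_test variables defined above are still in scope:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Probability of the positive ("bot") class for each test sample
y_scores = clf.predict_proba(X_test)[:, 1]

# False positive rate, true positive rate, and area under the curve
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label="Decision tree (AUC = %.3f)" % roc_auc)
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

The area under the curve gives a threshold-independent view of the classifier, which complements the single accuracy figures printed above.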
Summary

In this tutorial, we learned how to build a botnet detector and a Twitter bot detector with different machine learning algorithms. To become a master at penetration testing using machine learning with Python, check out the book Mastering Machine Learning for Penetration Testing.

Cisco and Huawei Routers hacked via backdoor attacks and botnets
How to protect yourself from a botnet attack
Tackle trolls with Machine Learning bots: Filtering out inappropriate content just got easy
Understanding Amazon Machine Learning Workflow [ Tutorial ]

Natasha Mathur
24 Aug 2018
11 min read
This article presents an overview of the workflow of a simple Amazon Machine Learning (Amazon ML) project. Amazon Machine Learning is an online service by Amazon Web Services (AWS) that does supervised learning for predictive analytics. Launched in April 2015 at the AWS Summit, Amazon ML joins a growing list of cloud-based machine learning services, such as Microsoft Azure, Google prediction, IBM Watson, Prediction IO, BigML, and many others. These online machine learning services form an offer commonly referred to as Machine Learning as a Service or MLaaS following a similar denomination pattern of other cloud-based services such as SaaS, PaaS, and IaaS respectively for Software, Platform, or Infrastructure as a Service. The Amazon ML workflow closely follows a standard Data Science workflow with steps: Extract the data and clean it up. Make it available to the algorithm. Split the data into a training and validation set, typically a 70/30 split with equal distribution of the predictors in each part. Select the best model by training several models on the training dataset and comparing their performances on the validation dataset. Use the best model for predictions on new data. This article is an excerpt taken from the book 'Effective Amazon Machine Learning' written by Alexis Perrier. As shown in the following Amazon ML menu, the service is built around four objects: Datasource ML model Evaluation Prediction The Datasource and Model can also be configured and set up in the same flow by creating a new Datasource and ML model. We will take a closer look at the Datasource and ML model. Amazon ML  dataset For the rest of the article, we will use the simple Predicting Weight by Height and Age dataset (from Lewis Taylor (1967)) with 237 samples of children's age, weight, height, and gender, which is available at https://v8doc.sas.com/sashtml/stat/chap55/sect51.htm. This dataset is composed of 237 rows. Each row has the following predictors: sex (F, M), age (in months), height (in inches), and we are trying to predict the weight (in lbs) of these children. There are no missing values and no outliers. The variables are close enough in range and normalization is not required. In short, we do not need to carry out any preprocessing or cleaning on the original dataset. Age, height, and weight are numerical variables (real-valued), and sex is a categorical variable. We will randomly select 20% of the rows as the held-out subset to use for the prediction of previously unseen data and keep the other 80% as training and evaluation data. This data split can be done in Excel or any other spreadsheet editor: By creating a new column with randomly generated numbers Sorting the spreadsheet by that column Selecting 190 rows for training and 47 rows for prediction (roughly a 80/20 split) Let us name the training set LT67_training.csv and the held-out set that we will use for prediction LT67_heldout.csv, where LT67 stands for Lewis and Taylor, the creator of this dataset in 1967. Note that it is important for the distribution in age, sex, height, and weight to be similar in both subsets. We want the data on which we will make predictions to show patterns that are similar to the data on which we will train and optimize our model. Loading the data on Amazon S3 Follow these steps to load the training and held-out datasets on S3: Go to your s3 console at https://console.aws.amazon.com/s3. Create a bucket if you haven't done so already. Buckets are basically folders that are uniquely named across all S3. 
We created a bucket named aml.packt. Since that name has now been taken, you will have to choose another bucket name if you are following along with this demonstration. Click on the bucket name you created and upload both the LT67_training.csv and LT67_heldout.csv files by selecting Upload from the Actions drop-down menu:

Both files are small, only a few KB, and hosting costs should remain negligible for this exercise. Note that for each file, by selecting the Properties tab on the right, you can specify how your files are accessed: which user, role, group, or AWS service may download, read, write, and delete the files, and whether or not they should be accessible from the open web. When creating the datasource in Amazon ML, you will be prompted to grant Amazon ML access to your input data. You can specify the access rules to these files now in S3 or simply grant access later on.

Our data is now in the cloud, in an S3 bucket. We need to tell Amazon ML where to find that input data by creating a datasource. We will first create the datasource for the training file LT67_training.csv.

Declaring a datasource

Go to the Amazon ML dashboard, and click on Create new... | Datasource and ML model. We will use the faster flow available by default:

As shown in the following screenshot, you are asked to specify the path to the LT67_training.csv file {S3://bucket}{path}{file}. Note that the S3 location field automatically populates with the bucket names and file names that are available to your user:

The Datasource name is used to organize your Amazon ML assets. By clicking on Verify, Amazon ML will make sure that it has the proper rights to access the file. In case it needs to be granted access to the file, you will be prompted to do so, as shown in the following screenshot:

Just click on Yes to grant access. At this point, Amazon ML will validate the datasource and analyze its contents.

Creating the datasource

An Amazon ML datasource is composed of the following:

The location of the data file: the data file is not duplicated or cloned in Amazon ML but accessed from S3
The schema, which contains information on the type of the variables contained in the CSV file: Categorical, Text, Numeric (real-valued), or Binary

It is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML. At this point, Amazon ML has a pretty good idea of the type of data in your training dataset. It has identified the different types of variables and knows how many rows it has:

Move on to the next step by clicking on Continue, and see what schema Amazon ML has inferred from the dataset, as shown in the next screenshot:

Amazon ML needs to know at this point which variable you are trying to predict. Be sure to tell Amazon ML the following:

The first line in the CSV file contains the column names
The target is the weight

We see here that Amazon ML has correctly inferred the following:

sex is categorical
age, height, and weight are numeric (continuous real values)

Since we chose a numeric variable as the target, Amazon ML will use Linear Regression as the predictive model. For binary or categorical values, it would have used Logistic Regression. This means that Amazon ML will try to find the best a, b, and c coefficients so that the weight predicted by the following equation is as close as possible to the observed real weight present in the data:

predicted weight = a * age + b * height + c * sex

Amazon ML will then ask you if your data contains a row identifier. In our present case, it does not.
Row identifiers are used when you want to understand the prediction obtained for each row or add an extra column to your dataset later on in your project. Row identifiers are for reference purposes only and are not used by the service to build the model. You will be asked to review the datasource. You can go back to each one of the previous steps and edit the parameters for the schema, the target, and the input data. Now that the data is known to Amazon ML, the next step is to set up the parameters of the algorithm that will train the model.

The machine learning model

We select the default parameters for the training and evaluation settings. Amazon ML will do the following:

Create a step for data transformation based on the statistical properties it has inferred from the dataset
Split the dataset (LT67_training.csv) into a training part and a validation part, with a 70/30 split. The split strategy assumes the data has already been shuffled and can be split sequentially.

The transformation step will be used to transform the data in a similar way for the training and the validation datasets. The only transformation suggested by Amazon ML is to transform the categorical variable sex into a binary variable, where m = 0 and f = 1, for instance. No other transformation is needed. The default advanced settings for the model are shown in the following screenshot:

We see that Amazon ML will pass over the data 10 times, shuffling the data each time it splits it. It will use an L2 regularization strategy, based on the sum of the squares of the regression coefficients, to prevent overfitting. We will evaluate the predictive power of the model using our LT67_heldout.csv dataset later on. Regularization comes in three levels, with a mild (10^-6), medium (10^-4), or aggressive (10^-2) setting, each value stronger than the previous one. The default setting is mild, the lowest, with a regularization constant of 0.000001 (10^-6), implying that Amazon ML does not anticipate much overfitting on this dataset. This makes sense when the number of predictors, three in our case, is much smaller than the number of samples (190 for the training set).

Clicking on the Create ML model button will launch the model creation. This takes a few minutes to resolve, depending on the size and complexity of your dataset. You can check its status by refreshing the model page. In the meantime, the model status remains pending. At that point, Amazon ML will split our training dataset into two subsets: a training set and a validation set. It will use the training portion of the data to train several settings of the algorithm and select the best one based on its performance on the training data. It will then apply the associated model to the validation set and return an evaluation score for that model. By default, Amazon ML will sequentially take the first 70% of the samples for training and the remaining 30% for validation. It's worth noting that Amazon ML will not create two extra files and store them on S3; instead, it creates two new datasources out of the initial datasource we previously defined. Each new datasource is obtained from the original one via a data rearrangement JSON recipe such as the following:

{
  "splitting": {
    "percentBegin": 0,
    "percentEnd": 70
  }
}

You can see these two new datasources in the Datasource dashboard.
Three datasources are now available where there was initially only one, as shown in the following screenshot:

While the model is being trained, Amazon ML runs the Stochastic Gradient Descent algorithm several times on the training data with different parameters:

Varying the learning rate in increments of powers of 10: 0.01, 0.1, 1, 10, and 100.
Making several passes over the training data while shuffling the samples before each pass.
At each pass, calculating the prediction error, the Root Mean Squared Error (RMSE), to estimate how much of an improvement over the last pass was obtained. If the decrease in RMSE is not significant, the algorithm is considered to have converged, and no further passes are made.

At the end of the passes, the setting that ends up with the lowest RMSE wins, and the associated model (the weights of the regression) is selected as the best version. Once the model has finished training, Amazon ML evaluates its performance on the validation datasource. Once the evaluation itself is ready, you have access to the model's evaluation.

The Amazon ML flow is smooth and facilitates the inherent data science loop: data, model, evaluation, and prediction. We looked at an overview of the workflow of a simple Amazon Machine Learning (Amazon ML) project, and we discussed two objects of the Amazon ML menu: Datasource and ML model. If you found this post useful, be sure to check out the book 'Effective Amazon Machine Learning' to learn about evaluation and prediction in Amazon ML, along with other AWS ML concepts.

Integrate applications with AWS services: Amazon DynamoDB & Amazon Kinesis [Tutorial]
AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers
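Since every step of this workflow happens in the AWS console, it can help to see the same ideas expressed in code. The following is a minimal local sketch, not taken from the original article: it reproduces the 80/20 held-out split, the 70/30 training/validation split, and an SGD-trained linear regression evaluated with RMSE using scikit-learn. The file name LT67.csv and the column names are assumptions based on the dataset description above.

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

df = pd.read_csv("LT67.csv")  # assumed columns: sex, age, height, weight
df["sex"] = (df["sex"].str.upper() == "F").astype(int)  # binary encoding, as Amazon ML suggests

# 80/20 split between training/evaluation data and the held-out prediction set
shuffled = df.sample(frac=1.0, random_state=42)
train_eval, held_out = shuffled.iloc[:190], shuffled.iloc[190:]

# 70/30 sequential split into training and validation, mirroring the default data rearrangement recipe
split = int(len(train_eval) * 0.7)
train, validation = train_eval.iloc[:split], train_eval.iloc[split:]

features = ["sex", "age", "height"]
# In practice, you would also scale age and height before using SGD
model = SGDRegressor(max_iter=1000, tol=1e-3, penalty="l2", alpha=1e-6, random_state=42)
model.fit(train[features], train["weight"])

predictions = model.predict(validation[features])
rmse = np.sqrt(mean_squared_error(validation["weight"], predictions))
print("Validation RMSE:", rmse)

This is only a rough stand-in for what the service does internally, but it makes the split percentages, the L2 regularization constant, and the RMSE evaluation tangible.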
Setting up Jasmine for Unit Testing in Angular [Tutorial]

Natasha Mathur
24 Aug 2018
8 min read
Web developers work hard to build a working application that they can be proud of. But how can we ensure a painless maintainability in the future? A comprehensive automated testing layer will become our lifeline once our application begins to scale up and we have to mitigate the impact of bugs caused by new functionalities colliding with the already existing ones. In this article, we will see why we need to perform unit testing and the process of setting up Jasmine for our tests. This tutorial is an excerpt taken from the book 'Learning Angular - second edition' written by Christoffer Noring, Pablo Deeleman.  Why do we need unit tests? What is a unit test? Unit tests are part of an engineering philosophy that takes a stand for efficient and agile development processes, by adding an additional layer of automated testing to the code, before it is developed. The core concept is that each piece of code is delivered with its own test, and both pieces of code are built by the developer who is working on that code. First, we design the test against the module we want to deliver, checking the accuracy of its output and behavior. Since the module is still not implemented, the test will fail. Hence, our job is to build the module in such a way that it passes its own test. Unit testing is quite controversial. While there is a common agreement about how beneficial test-driven development for ensuring code quality and maintenance over time is, not everybody undertakes unit testing in their daily practice. Why is that? Well, building tests while we develop our code can feel like a burden sometimes, particularly when the test winds up being bigger in size than the piece of functionality it aims to test. However, the arguments favoring testing outnumber the arguments against it: Building tests contribute to better code design. Our code must conform to the test requirements and not the other way around. In that sense, if we try to test an existing piece of code and we find ourselves blocked at some point, chances are that the piece of code we aim to test is not well designed and shows off a convoluted interface that requires some rethinking. On the other hand, building testable modules can help with early detection of side effects on other modules. Refactoring tested code is the lifeline against introducing bugs in later stages. Any development is meant to evolve with time, and on every refactor the risk of introducing a bug, that will only pop up in another part of our application, is high. Unit tests are a good way to ensure that we catch bugs at an early stage, either when introducing new features or when updating existing ones. Building tests is a good way to document our code APIs and functionalities. And this becomes a priceless resource when someone not acquainted with the code base takes over the development endeavor. These are only a few arguments, but you can find countless resources on the web about the benefits of testing your code. If you do not feel convinced yet, give it a try. Otherwise, let's continue with our journey and see the overall form of a test. Unit testing in Jasmine There are many different ways to test a piece of code, but in this article, we will look at the anatomy of a test, what it is made up of. The first thing we need, for testing any code, is a test framework. The test framework should provide utility functions for building test suites, containing one or several test specs each. So what are these concepts? 
Test suite: A suite creates a logical grouping for a bunch of tests. A suite can, for example, contain all the tests for a product page.
Test spec: This is another name for a unit test.

The following shows what a test file can look like when we use a test suite and place a number of related tests inside it. The chosen framework for this is Jasmine. In Jasmine, the describe() function helps us define a test suite. The describe() method takes a name as the first parameter and a function as the second parameter. Inside the describe() function are a number of invocations of the it() method. The it() function is our unit test; it takes the name of the test as the first parameter and a function as the second parameter:

// Test suite
describe('A math library', () => {
  // Test spec
  it('add(1,1) should return 2', () => {
    // Test spec implementation goes here
  });
});

Each test spec checks out a specific functionality of the feature described in the suite description argument and declares one or several expectations in its body. Each expectation takes a value, which we call the expected value, and is compared against an actual value by means of a matcher function, which checks whether the expected and actual values match accordingly. This is what we call an assertion, and the test framework will pass or fail the spec depending on the result of such assertions. The code is as follows:

// Test suite
describe('A math library', () => {
  // Test spec
  it('add(1,1) should return 2', () => {
    // Test assertion
    expect(add(1,1)).toBe(2);
  });
  it('subtract(2,1) should return 1', () => {
    // Test assertion
    expect(subtract(2,1)).toBe(1);
  });
});

In the previous example, add(1,1) will return the actual value that is supposed to match the expected value declared in the toBe() matcher function. Worth noting from the previous example is the addition of a second test that tests our subtract() function. We can clearly see that this test deals with yet another mathematical operation, thus it makes sense to group both these tests under one suite.

So far, we have learned about test suites and how to group tests according to their function. Furthermore, we have learned about invoking the code you want to test and asserting that it does what you think it does. There are, however, more concepts to a unit test worth knowing about, namely setup and teardown functionality. A setup functionality is something that sets up your code before the test is run. It's a way to keep your code cleaner so you can focus on just invoking the code and asserting. A teardown functionality is the opposite of a setup functionality and is dedicated to tearing down what you set up initially; essentially, it's a way to clean up after the test. Let's see how this can look in practice with a code example, using the Jasmine framework. In Jasmine, the beforeEach() method is used for setup functionality; it runs before every unit test. The afterEach() method is used to run teardown logic. The code is as follows:

describe('a Product service', () => {
  let productService;
  beforeEach(() => {
    productService = new ProductService();
  });
  it('should return data', () => {
    let actual = productService.getData();
    expect(actual.length).toBe(1);
  });
  afterEach(() => {
    productService = null;
  });
});

We can see in the preceding code how the beforeEach() function is responsible for instantiating productService, which means the test only has to care about invoking the production code and asserting the outcome. This makes the test look cleaner.
It should be said, though, that in reality, tests tend to have a lot of setup going on, and having a beforeEach() function can really make the tests look cleaner; above all, it tends to make it easier to add new tests, which is great. What you want at the end of the day is well-tested code; the easier it is to write and maintain such code, the better for your software. Testing web applications in general, and Angular applications in particular, poses a myriad of scenarios that usually need a specific approach. Remember that if a specific test requires a cumbersome and convoluted solution, we are probably facing a good case for a module redesign instead. There are several paths to compound our knowledge of web application testing in Angular and enable us to become great testing ninjas.

In this article, we saw the importance of unit testing in our Angular applications, the basic shape of a unit test, and the process of setting up Jasmine for our tests. If you found this post useful, do check out the book 'Learning Angular - Second Edition' to learn more about unit testing and how to implement it for routes, inputs, outputs, directives, and more.

Everything new in Angular 6: Angular Elements, CLI commands and more
Angular 6 is here packed with exciting new features!
Getting started with Angular CLI and build your first Angular Component
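To tie the setup and teardown ideas back to Angular itself, the following is a hedged sketch of what a spec can look like when Angular's TestBed wires up dependency injection before each test. The ProductService class and its getData() method are hypothetical, carried over from the example above, and the import path is an assumption:

import { TestBed } from '@angular/core/testing';
import { ProductService } from './product.service'; // hypothetical service from the example above

describe('ProductService (with TestBed)', () => {
  let productService: ProductService;

  beforeEach(() => {
    // Configure a testing module and let the injector create the service
    TestBed.configureTestingModule({
      providers: [ProductService]
    });
    productService = TestBed.get(ProductService); // TestBed.inject() in newer Angular versions
  });

  it('should return data', () => {
    const actual = productService.getData();
    expect(actual.length).toBe(1);
  });
});

The shape is the same as the plain Jasmine example; the only difference is that the setup step delegates the instantiation to Angular's injector instead of calling the constructor directly.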
Using Decorators as higher-order functions in Python [Tutorial]

Aaron Lazar
24 Aug 2018
7 min read
The core idea of a decorator is to transform some original function into another form. A decorator creates a kind of composite function based on the decorator and the original function being decorated. In this tutorial, we'll understand how decorators can be used as higher-order functions in Python. This article is an extract from the 2nd edition of the bestseller, Functional Python Programming, authored by Steven Lott.

Working with decorator functions

A decorator function can be used in one of the two following ways:

As a prefix that creates a new function with the same name as the base function, as follows:

@decorator
def original_function():
    pass

As an explicit operation that returns a new function, possibly with a new name:

def original_function():
    pass
original_function = decorator(original_function)

These are two different syntaxes for the same operation. The prefix notation has the advantages of being tidy and succinct, and the prefix location is more visible to some readers. The suffix notation is explicit and slightly more flexible. While the prefix notation is common, there is one reason for using the suffix notation: we may not want the resulting function to replace the original function. We might want to execute the following command, which allows us to use both the decorated and the undecorated functions:

new_function = decorator(original_function)

This will build a new function, named new_function(), from the original function. Python functions are first-class objects. When using the @decorator syntax, the original function is no longer available for use.

A decorator is a function that accepts a function as an argument and returns a function as the result. This basic description is clearly a built-in feature of the language. The open question then is how do we update or adjust the internal code structure of a function? The answer is we don't. Rather than messing about with the inside of the code, it's much cleaner to define a new function that wraps the original function. It's easier to process the argument values or the result and leave the original function's core processing alone. We have two phases of higher-order functions involved in defining a decorator; they are as follows:

At definition time, a decorator function applies a wrapper to a base function and returns the new, wrapped function. The decoration process can do some one-time-only evaluation as part of building the decorated function. Complex default values can be computed, for example.
At evaluation time, the wrapping function can (and usually does) evaluate the base function. The wrapping function can pre-process the argument values or post-process the return value (or both). It's also possible that the wrapping function may avoid calling the base function. In the case of managing a cache, for example, the primary reason for wrapping is to avoid expensive calls to the base function.

Simple decorator function

Here's an example of a simple decorator:

from functools import wraps
from typing import Callable, Optional, Any, TypeVar, cast

FuncType = Callable[..., Any]
F = TypeVar('F', bound=FuncType)

def nullable(function: F) -> F:
    @wraps(function)
    def null_wrapper(arg: Optional[Any]) -> Optional[Any]:
        return None if arg is None else function(arg)
    return cast(F, null_wrapper)

We almost always want to use the functools.wraps() function to assure that the decorated function retains the attributes of the original function.
Copying the __name__, and __doc__ attributes, for example, assures that the resulting decorated function has the name and docstring of the original function. The resulting composite function, defined as the null_wrapper() function in the definition of the decorator, is also a type of higher-order function that combines the original function, the function() callable object, in an expression that preserves the None values. Within the resulting null_wrapper() function, the original function callable object is not an explicit argument; it is a free variable that will get its value from the context in which the null_wrapper() function is defined. The decorator function's return value is the newly minted function. It will be assigned to the original function's name. It's important that decorators only return functions and that they don't attempt to process data. Decorators use meta-programming: a code that creates a code. The resulting null_wrapper() function, however, will be used to process the real data. Note that the type hints use a feature of a TypeVar to assure that the result of applying the decorator will be a an object that's a type of Callable. The type variable F is bound to the original function's type; the decorator's type hint claims that the resulting function should have the same type as the argument function. A very general decorator will apply to a wide variety of functions, requiring a type variable binding. Creating composite function We can apply our @nullable decorator to create a composite function as follows: @nullable def nlog(x: Optional[float]) -> Optional[float]: return math.log(x) This will create a function, nlog(), which is a null-aware version of the built-in math.log() function. The decoration process returned a version of the null_wrapper() function that invokes the original nlog(). This result will be named nlog(), and will have the composite behavior of the wrapping and the original wrapped function. We can use this composite nlog() function as follows: >>> some_data = [10, 100, None, 50, 60] >>> scaled = map(nlog, some_data) >>> list(scaled) [2.302585092994046, 4.605170185988092, None, 3.912023005428146, 4.0943445622221] We've applied the function to a collection of data values. The None value politely leads to a None result. There was no exception processing involved. This type of example isn't really suitable for unit testing. We'll need to round the values for testing purposes. For this, we'll need a null-aware round() function too. Here's how we can create a null-aware rounding function using decorator notation: @nullable def nround4(x: Optional[float]) -> Optional[float]: return round(x, 4) This function is a partial application of the round() function, wrapped to be null-aware. In some respects, this is a relatively sophisticated bit of functional programming that's readily available to Python programmers. The typing module makes it particularly easy to describe the types of null-aware function and null-aware result, using the Optional type definition. The definition Optional[float] means Union[None, float]; either a None object or a float object may be used. We could also create the null-aware rounding function using the following code: nround4 = nullable(lambda x: round(x, 4)) Note that we didn't use the decorator in front of a function definition. Instead, we applied the decorator to a function defined as a lambda form.  This has the same effect as a decorator in front of a function definition. 
We can use this round4() function to create a better test case for our nlog() function as follows: >>> some_data = [10, 100, None, 50, 60] >>> scaled = map(nlog, some_data) >>> [nround4(v) for v in scaled] [2.3026, 4.6052, None, 3.912, 4.0943] This result will be independent of any platform considerations. It's very handy for doctest testing. It can be challenging to apply type hints to lambda forms. The following code shows what is required: nround4l: Callable[[Optional[float]], Optional[float]] = ( nullable(lambda x: round(x, 4)) ) The variable nround4l is given a type hint of Callable with an argument list of [Optional[float]] and a return type of Optional[float]. The use of the Callable hint is appropriate only for positional arguments. In cases where there will be keyword arguments or other complexities, see http://mypy.readthedocs.io/en/latest/kinds_of_types.html#extended-callable-types. The @nullable decorator makes an assumption that the decorated function is unary. We would need to revisit this design to create a more general-purpose null-aware decorator that works with arbitrary collections of arguments. If you found this tutorial useful and interested to learn more such techniques, grab the Steven Lott's bestseller, Functional Python Programming. Putting the Fun in Functional Python Why functional programming in Python matters: Interview with best selling author, Steven Lott Expert Python Programming: Interfaces
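As a closing illustration, and as my own sketch rather than the book's code, one way to lift the unary restriction mentioned above is to let the wrapper accept arbitrary positional and keyword arguments and short-circuit when any of them is None:

from functools import wraps
from typing import Any, Callable, Optional, TypeVar, cast
import math

F = TypeVar('F', bound=Callable[..., Any])

def nullable_any(function: F) -> F:
    """Return None if any argument is None; otherwise call the wrapped function."""
    @wraps(function)
    def null_wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
        if any(a is None for a in args) or any(v is None for v in kwargs.values()):
            return None
        return function(*args, **kwargs)
    return cast(F, null_wrapper)

@nullable_any
def nlog_scaled(x: float, base: float = math.e) -> float:
    return math.log(x, base)

print(nlog_scaled(100, 10))   # 2.0
print(nlog_scaled(None, 10))  # None

This keeps the same decorate-and-wrap structure as nullable(), at the cost of a slightly looser type hint for the wrapper.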
Creating Macros in Rust [Tutorial]

Aaron Lazar
23 Aug 2018
14 min read
Since Rust 1.0 has a great macro system, it allows us to apply some code to multiple types or expressions, as they work by expanding themselves at compile time. This means that when you use a macro, you are effectively writing a lot of code before the actual compilation starts. This has two main benefits, first, the codebase can be easier to maintain by being smaller and reusing code. Second, since macros expand before starting the creation of object code, you can abstract at the syntactic level. In this article, we'll learn how to create our very own macros in Rust. This Rust tutorial is an extract from Rust High Performance, authored by Iban Eguia Moraza. For example, you can have a function like this one: fn add_one(input: u32) -> u32 { input + 1 } This function restricts the input to u32 types and the return type to u32. We could add some more accepted types by using generics, which may accept &u32 if we use the Add trait. Macros allow us to create this kind of code for any element that can be written to the left of the + sign and it will be expanded differently for each type of element, creating a different code for each case. To create a macro, you will need to use a macro built into the language, the macro_rules!{} macro. This macro receives the name of the new macro as a first parameter and a block with the macro code as a second element. The syntax can be a bit complex the first time you see it, but it can be learned quickly. Let's start with a macro that does just the same as the function we saw before: macro_rules! add_one { ($input:expr) => { $input + 1 } } You can now call that macro from your main() function by calling add_one!(integer);. Note that the macro needs to be defined before the first call, even if it's in the same file. It will work with any integer, which wasn't possible with functions. Let's analyze how the syntax works. In the block after the name of the new macro (add_one), we can see two sections. In the first part, on the left of the =>, we see $input:expr inside parentheses. Then, to the right, we see a Rust block where we do the actual addition. The left part works similarly (in some ways) to a pattern match. You can add any combination of characters and then some variables, all of them starting with a dollar sign ($) and showing the type of variable after a colon. In this case, the only variable is the $input variable and it's an expression. This means that you can insert any kind of expression there and it will be written in the code to the right, substituting the variable with the expression. Creating Macro variants As you can see, it's not as complicated as you might think. As I wrote, you can have almost any pattern to the left of the macro_rules!{} side. Not only that, you can also have multiple patterns, as if it were a match statement, so that if one of them matches, it will be the one expanded. Let's see how this works by creating a macro which, depending on how we call it, will add one or two to the given integer: macro_rules! add { {one to $input:expr} => ($input + 1); {two to $input:expr} => ($input + 2); } fn main() { println!("Add one: {}", add!(one to 25/5)); println!("Add two: {}", add!(two to 25/5)); } You can see a couple of clear changes to the macro. First, we swapped braces for parentheses and parentheses for braces in the macro. This is because in a macro, you can use interchangeable braces ({ and }), square brackets ([ and ]), and parentheses (( and )). Not only that, you can use them when calling the macro. 
You have probably already used the vec![] macro and the format!() macro, and we saw the lazy_static!{} macro in the last chapter. We use brackets and parentheses here just for convention, but we could call the vec!{} or the format![] macros the same way, because we can use braces, brackets, and parentheses in any macro call. The second change was to add some extra text to our left-hand side patterns. We now call our macro by writing the text one to or two to, so I also removed the one redundancy to the macro name and called it add!(). This means that we now call our macro with literal text. That is not valid Rust, but since we are using a macro, we modify the code we are writing before the compiler tries to understand actual Rust code and the generated code is valid. We could add any text that does not end the pattern (such as parentheses or braces) to the pattern. The final change was to add a second possible pattern. We can now add one or two and the only difference will be that the right side of the macro definition must now end with a trailing semicolon for each pattern (the last one is optional) to separate each of the options. A small detail that I also added in the example was when calling the macro in the main() function. As you can see, I could have added one or two to 5, but I wrote 25/5 for a reason. When compiling this code, this will be expanded to 25/5 + 1 (or 2, if you use the second variant). This will later be optimized at compile time, since it will know that 25/5 + 1 is 6, but the compiler will receive that expression, not the final result. The macro system will not calculate the result of the expression; it will simply copy in the resulting code whatever you give to it and then pass it to the next compiler phase. You should be especially careful with this when a macro you are creating calls another macro. They will get expanded recursively, one inside the other, so the compiler will receive a bunch of final Rust code that will need to be optimized. Issues related to this were found in the CLAP crate that we saw in the last chapter, since the exponential expansions were adding a lot of bloat code to their executables. Once they found out that there were too many macro expansions inside the other macros and fixed it, they reduced the size of their binary contributions by more than 50%. Macros allow for an extra layer of customization. You can repeat arguments more than once. This is common, for example, in the vec![] macro, where you create a new vector with information at compile time. You can write something like vec![3, 4, 76, 87];. How does the vec![] macro handle an unspecified number of arguments? Creating Complex macros We can specify that we want multiple expressions in the left-hand side pattern of the macro definition by adding a * for zero or more matches or a + for one or more matches. Let's see how we can do that with a simplified my_vec![] macro: macro_rules! my_vec { ($($x: expr),*) => {{ let mut vector = Vec::new(); $(vector.push($x);)* vector }} } Let's see what is happening here. First, we see that on the left side, we have two variables, denoted by the two $ signs. The first makes reference to the actual repetition. Each comma-separated expression will generate a $x variable. Then, on the right side, we use the various repetitions to push $x to the vector once for every expression we receive. There is another new thing on the right-hand side. As you can see, the macro expansion starts and ends with a double brace instead of using only one. 
This is because, once the macro gets expanded, it will substitute the given expression with a new expression: the one that gets generated. Since what we want is to return the vector we are creating, we need a new scope where the last expression will be the value of the scope once it gets executed. You will be able to see it more clearly in the next code snippet. We can call this code from the main() function:

fn main() {
    let my_vector = my_vec![4, 8, 15, 16, 23, 42];
    println!("Vector test: {:?}", my_vector);
}

It will be expanded to this code:

fn main() {
    let my_vector = {
        let mut vector = Vec::new();
        vector.push(4);
        vector.push(8);
        vector.push(15);
        vector.push(16);
        vector.push(23);
        vector.push(42);
        vector
    };
    println!("Vector test: {:?}", my_vector);
}

As you can see, we need those extra braces to create the scope that will return the vector so that it gets assigned to the my_vector binding. You can have multiple repetition patterns in the left expression, and they will be repeated for every use, as needed, on the right:

macro_rules! add_to_vec {
    ($( $x:expr; [ $( $y:expr ),* ]);* ) => {
        &[ $($( $x + $y ),*),* ]
    }
}

In this example, the macro can receive one or more $x; [$y1, $y2,...] inputs. So, for each input, it will have one expression, then a semicolon, then a bracket with multiple sub-expressions separated by commas, and finally, a closing bracket and a semicolon separating it from the next group. But what does the macro do with this input? Let's check the right-hand side of it. As you can see, this will create multiple repetitions. We can see that it creates a slice (&[T]) of whatever we feed to it, so all the expressions we use must be of the same type. Then, it will start iterating over all $x variables, one per input group. So, if we feed it only one input, it will iterate once for the expression to the left of the semicolon. Then, it will iterate once for every $y expression associated with the $x expression, add them with the + operator, and include the result in the slice.

If this was too complex to understand, let's look at an example. Let's suppose we call the macro with 65; [22, 34] as input. In this case, 65 will be $x, and 22 and 34 will be the $y variables associated with 65. So, the result will be a slice like this: &[65+22, 65+34]. Or, if we calculate the results: &[87, 99]. If, on the other hand, we give two groups of variables by using 65; [22, 34]; 23; [56, 35] as input, in the first iteration, $x will be 65, while in the second one, it will be 23. The $y variables associated with 65 will be 22 and 34, as before, and the ones associated with 23 will be 56 and 35. This means that the final slice will be &[87, 99, 79, 58], where 87 and 99 work the same way as before and 79 and 58 come from adding 23 to 56 and 23 to 35.

This gives you much more flexibility than functions, but remember, all of this will be expanded at compile time, which can make your compilation much slower and the final codebase larger and slower still if the macro you use duplicates too much code. In any case, there is more flexibility to it yet. So far, all variables have been of the expr kind. We have used this by declaring $x:expr and $y:expr but, as you can imagine, there are other kinds of macro variables. The list follows:

expr: Expressions that you can write after an = sign, such as 76+4 or if a==1 {"something"} else {"other thing"}.
ident: An identifier or binding name, such as foo or bar.
path: A qualified path. This will be a path that you could write in a use sentence, such as foo::bar::MyStruct or foo::bar::my_func.
ty: A type, such as u64 or MyStruct. It can also be a path to the type. pat: A pattern that you can write at the left side of an = sign or in a match expression, such as Some(t) or (a, b, _). stmt: A full statement, such as a let binding like let a = 43;. block: A block element that can have multiple statements and a possible expression between braces, such as {vec.push(33); vec.len()}. item: What Rust calls items. For example, function or type declarations, complete modules, or trait definitions. meta: A meta element, which you can write inside of an attribute (#[]). For example, cfg(feature = "foo"). tt: Any token tree that will eventually get parsed by a macro pattern, which means almost anything. This is useful for creating recursive macros, for example. As you can imagine, some of these kinds of macro variables overlap and some of them are just more specific than the others. The use will be verified on the right-hand side of the macro, in the expansion, since you might try to use a statement where an expression must be used, even though you might use an identifier too, for example. There are some extra rules, too, as we can see in the Rust documentation (https://doc.rust-lang.org/book/first-edition/macros.html#syntactic-requirements). Statements and expressions can only be followed by =>, a comma, or a semicolon. Types and paths can only be followed by =>, the as or where keywords, or any commas, =, |, ;, :, >, [, or {. And finally, patterns can only be followed by =>, the if or in keywords, or any commas, =, or |. Let's put this in practice by implementing a small Mul trait for a currency type we can create. This is an adapted example of some work we did when creating the Fractal Credits digital currency. In this case, we will look to the implementation of the Amount type (https://github.com/FractalGlobal/utils-rs/blob/49955ead9eef2d9373cc9386b90ac02b4d5745b4/src/amount.rs#L99-L102), which represents a currency amount. Let's start with the basic type definition: #[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)] pub struct Amount { value: u64, } This amount will be divisible by up to three decimals, but it will always be an exact value. We should be able to add an Amount to the current Amount, or to subtract it. I will not explain these trivial implementations, but there is one implementation where macros can be of great help. We should be able to multiply the amount by any positive integer, so we should implement the Mul trait for u8, u16, u32, and u64 types. Not only that, we should be able to implement the Div and the Rem traits, but I will leave those out, since they are a little bit more complex. You can check them in the implementation linked earlier. The only thing the multiplication of an Amount with an integer should do is to multiply the value by the integer given. Let's see a simple implementation for u8: use std::ops::Mul; impl Mul<u8> for Amount { type Output = Self; fn mul(self, rhs: u8) -> Self::Output { Self { value: self.value * rhs as u64 } } } impl Mul<Amount> for u8 { type Output = Amount; fn mul(self, rhs: Amount) -> Self::Output { Self::Output { value: self as u64 * rhs.value } } } As you can see, I implemented it both ways so that you can put the Amount to the left and to the right of the multiplication. If we had to do this for all integers, it would be a big waste of time and code. And if we had to modify one of the implementations (especially for Rem functions), it would be troublesome to do it in multiple code points. Let's use macros to help us. 
We can define a macro, impl_mul_int!{}, which will receive a list of integer types and then implement the Mul trait back and forward between all of them and the Amount type. Let's see: macro_rules! impl_mul_int { ($($t:ty)*) => ($( impl Mul<$t> for Amount { type Output = Self; fn mul(self, rhs: $t) -> Self::Output { Self { value: self.value * rhs as u64 } } } impl Mul<Amount> for $t { type Output = Amount; fn mul(self, rhs: Amount) -> Self::Output { Self::Output { value: self as u64 * rhs.value } } } )*) } impl_mul_int! { u8 u16 u32 u64 usize } As you can see, we specifically ask for the given elements to be types and then we implement the trait for all of them. So, for any code that you want to implement for multiple types, you might as well try this approach, since it will save you from writing a lot of code and it will make it more maintainable. If you found this article useful and would like to learn more such tips, head on over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza. Perform Advanced Programming with Rust Rust 1.28 is here with global allocators, nonZero types and more Eclipse IDE's Photon release will support Rust
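To check that the expansions behave as described, a small test module can exercise the macros from this article. The following sketch is mine, not from the book, and assumes that the my_vec! and add_to_vec! macros are defined earlier in the same file:

#[cfg(test)]
mod tests {
    #[test]
    fn my_vec_builds_a_vector() {
        // Expands to a scope that pushes each element and returns the vector
        let v = my_vec![4, 8, 15, 16, 23, 42];
        assert_eq!(v, vec![4, 8, 15, 16, 23, 42]);
    }

    #[test]
    fn add_to_vec_adds_each_pair() {
        // 65+22, 65+34, 23+56, 23+35 => [87, 99, 79, 58]
        let result = add_to_vec!(65; [22, 34]; 23; [56, 35]);
        assert_eq!(result, &[87, 99, 79, 58]);
    }
}

Running cargo test on such a module is a cheap way to make sure a macro keeps expanding to what you expect while you refactor it.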
Train a convolutional neural network in Keras and improve it with data augmentation [Tutorial]

Amey Varangaonkar
23 Aug 2018
10 min read
In this article, we will see how convolutional layers work and how to use them. We will also see how you can build your own convolutional neural network in Keras to build better, more powerful deep neural networks and solve computer vision problems. We will also see how we can improve this network using data augmentation. For a better understanding of the concepts, we will be taking a well-known dataset CIFAR-10. This dataset was created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The following article has been taken from the book Deep Learning Quick Reference, written by Mike Bernico.  Adding inputs to the network The CIFAR-10 dataset is made up of 60,000 32 x 32 color images that belong to 10 classes, with 6,000 images per class. We'll be using 50,000 images as a training set, 5,000 images as a validation set, and 5,000 images as a test set. The input tensor layer for the convolutional neural network will be (N, 32, 32, 3), which we will pass to the build_network function. The following code is used to build the network: def build_network(num_gpu=1, input_shape=None): inputs = Input(shape=input_shape, name="input") Getting the output The output of this model will be a class prediction, from 0-9. We will use a 10-node softmax.  We will use the following code to define the output: output = Dense(10, activation="softmax", name="softmax")(d2) Cost function and metrics Earlier, we used categorical cross-entropy as the loss function for a multi-class classifier.  This is just another multiclass classifier and we can continue using categorical cross-entropy as our loss function, and accuracy as a metric. We've moved on to using images as input, but luckily our cost function and metrics remain unchanged. Working with convolutional layers We're going to use two convolutional layers, with batch normalization, and max pooling. This is going to require us to make quite a few choices, which of course we could choose to search as hyperparameters later. It's always better to get something working first though. As the popular computer scientist and mathematician Donald Knuth would say, premature optimization is the root of all evil. We will use the following code snippet to define the two convolutional blocks: # convolutional block 1 conv1 = Conv2D(64, kernel_size=(3,3), activation="relu", name="conv_1")(inputs) batch1 = BatchNormalization(name="batch_norm_1")(conv1) pool1 = MaxPooling2D(pool_size=(2, 2), name="pool_1")(batch1) # convolutional block 2 conv2 = Conv2D(32, kernel_size=(3,3), activation="relu", name="conv_2")(pool1) batch2 = BatchNormalization(name="batch_norm_2")(conv2) pool2 = MaxPooling2D(pool_size=(2, 2), name="pool_2")(batch2) So, clearly, we have two convolutional blocks here, that consist of a convolutional layer, a batch normalization layer, and a pooling layer. In the first block, I'm using 64 3 x 3 filters with relu activations. I'm using valid (no) padding and a stride of 1. Batch normalization doesn't require any parameters and it isn't really trainable. The pooling layer is using 2 x 2 pooling windows, valid padding, and a stride of 2 (the dimension of the window). The second block is very much the same; however, I'm halving the number of filters to 32. While there are many knobs we could turn in this architecture, the one I would tune first is the kernel size of the convolutions. Kernel size tends to be an important choice. 
In fact, some modern neural network architectures such as Google's inception, allow us to use multiple filter sizes in the same convolutional layer. Getting the fully connected layers After two rounds of convolution and pooling, our tensors have gotten relatively small and deep. After pool_2, the output dimension is (n, 6, 6, 32). We have, in these convolutional layers, hopefully extracted relevant image features that this 6 x 6 x 32 tensor represents. To classify images, using these features, we will connect this tensor to a few fully connected layers, before we go to our final output layer. In this example, I'll use a 512-neuron fully connected layer, a 256-neuron fully connected layer, and finally, the 10-neuron output layer. I'll also be using dropout to help prevent overfitting, but only a very little bit! The code for this process is given as follows for your reference: from keras.layers import Flatten, Dense, Dropout # fully connected layers flatten = Flatten()(pool2) fc1 = Dense(512, activation="relu", name="fc1")(flatten) d1 = Dropout(rate=0.2, name="dropout1")(fc1) fc2 = Dense(256, activation="relu", name="fc2")(d1) d2 = Dropout(rate=0.2, name="dropout2")(fc2) I haven't previously mentioned the flatten layer above. The flatten layer does exactly what its name suggests. It flattens the n x 6 x 6 x 32 tensor into an n x 1152 vector. This will serve as an input to the fully connected layers. Working with multi-GPU models in Keras Many cloud computing platforms can provision instances that include multiple GPUs. As our models grow in size and complexity you might want to be able to parallelize the workload across multiple GPUs. This can be a somewhat involved process in native TensorFlow, but in Keras, it's just a function call. Build your model, as normal, as shown in the following code: model = Model(inputs=inputs, outputs=output) Then, we just pass that model to keras.utils.multi_gpu_model, with the help of the following code: model = multi_gpu_model(model, num_gpu) In this example, num_gpu is the number of GPUs we want to use. 
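The fit() calls in this article expect a data dictionary with train_X, train_y, val_X, and val_y entries, which the excerpt never shows being built. As a hedged sketch (the dictionary layout is an assumption based on how it is used here, not the book's exact loader), you could assemble it from keras.datasets like this:

from keras.datasets import cifar10
from keras.utils import to_categorical

IMG_HEIGHT, IMG_WIDTH, CHANNELS = 32, 32, 3

def build_data():
    (train_X, train_y), (test_X, test_y) = cifar10.load_data()
    # Scale pixel values to [0, 1] and one-hot encode the 10 class labels
    train_X = train_X.astype("float32") / 255.0
    test_X = test_X.astype("float32") / 255.0
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
    # Carve the 10,000 Keras test images into 5,000 validation and 5,000 test samples
    val_X, val_y = test_X[:5000], test_y[:5000]
    test_X, test_y = test_X[5000:], test_y[5000:]
    return {"train_X": train_X, "train_y": train_y,
            "val_X": val_X, "val_y": val_y,
            "test_X": test_X, "test_y": test_y}

data = build_data()

This matches the 50,000/5,000/5,000 split described at the start of the article and leaves the fit() and callback calls below unchanged.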
Training the model

Putting the model together, and incorporating our new cool multi-GPU feature, we come up with the following architecture:

def build_network(num_gpu=1, input_shape=None):
    inputs = Input(shape=input_shape, name="input")

    # convolutional block 1
    conv1 = Conv2D(64, kernel_size=(3,3), activation="relu", name="conv_1")(inputs)
    batch1 = BatchNormalization(name="batch_norm_1")(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2), name="pool_1")(batch1)

    # convolutional block 2
    conv2 = Conv2D(32, kernel_size=(3,3), activation="relu", name="conv_2")(pool1)
    batch2 = BatchNormalization(name="batch_norm_2")(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2), name="pool_2")(batch2)

    # fully connected layers
    flatten = Flatten()(pool2)
    fc1 = Dense(512, activation="relu", name="fc1")(flatten)
    d1 = Dropout(rate=0.2, name="dropout1")(fc1)
    fc2 = Dense(256, activation="relu", name="fc2")(d1)
    d2 = Dropout(rate=0.2, name="dropout2")(fc2)

    # output layer
    output = Dense(10, activation="softmax", name="softmax")(d2)

    # finalize and compile
    model = Model(inputs=inputs, outputs=output)
    if num_gpu > 1:
        model = multi_gpu_model(model, num_gpu)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=["accuracy"])
    return model

We can use this to build our model:

model = build_network(num_gpu=1, input_shape=(IMG_HEIGHT, IMG_WIDTH, CHANNELS))

And then we can fit it, as you'd expect:

model.fit(x=data["train_X"], y=data["train_y"],
          batch_size=32,
          epochs=200,
          validation_data=(data["val_X"], data["val_y"]),
          verbose=1,
          callbacks=callbacks)

As we train this model, you will notice that overfitting is an immediate concern. Even with a relatively modest two convolutional layers, we're already overfitting a bit. You can see the effects of overfitting from the following graphs:

It's no surprise: 50,000 observations is not a lot of data, especially for a computer vision problem. In practice, computer vision problems benefit from very large datasets. In fact, Chen Sun et al. showed that the performance of computer vision models tends to improve linearly with the logarithm of the training data volume (https://arxiv.org/abs/1707.02968). Unfortunately, we can't really go and find more data in this case. But maybe we can make some. Let's talk about data augmentation next.

Using data augmentation

Data augmentation is a technique where we apply transformations to an image and use both the original image and the transformed images to train on. Imagine we had a training set with a cat in it:

If we were to apply a horizontal flip to this image, we'd get something that looks like this:

This is exactly the same image, of course, but we can use both the original and the transformation as training examples. This isn't quite as good as two separate cats in our training set; however, it does allow us to teach the computer that a cat is a cat regardless of the direction it's facing. In practice, we can do a lot more than just a horizontal flip. We can vertically flip, when it makes sense, shift, and randomly rotate images as well. This allows us to artificially amplify our dataset and make it seem bigger than it is. Of course, you can only push this so far, but it's a very powerful tool in the fight against overfitting when little data exists.

What is the Keras ImageDataGenerator?

Not so long ago, the only way to do image augmentation was to code up the transforms and apply them randomly to the training set, saving the transformed images to disk as we went (uphill, both ways, in the snow).
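For instance, a hand-rolled horizontal flip and dataset doubling is only a few lines of NumPy. This is purely an illustrative sketch of that older workflow (the function name is mine), not code from the book:

import numpy as np

def augment_with_flips(images, labels):
    # Hand-coded augmentation: append a horizontally flipped copy of every image
    # images is expected to have shape (n, height, width, channels)
    flipped = images[:, :, ::-1, :]
    return (np.concatenate([images, flipped], axis=0),
            np.concatenate([labels, labels], axis=0))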
Luckily for us, Keras now provides an ImageDataGenerator class that can apply transformations on the fly as we train, without us having to hand-code the transformations. We can create a data generator object from ImageDataGenerator by instantiating it like this:

def create_datagen(train_X):
    data_generator = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=0.02,
        height_shift_range=0.02,
        horizontal_flip=True)
    data_generator.fit(train_X)
    return data_generator

In this example, I'm using width and height shifts, rotation, and horizontal flips. I'm using only very small shifts. Through experimentation, I found that larger shifts were too much and my network wasn't actually able to learn anything. Your experience will vary as your problem does, but I would expect larger images to be more tolerant of shifting. In this case, we're using 32-pixel images, which are quite small.

Training with a generator

If you haven't used a generator before, it works like an iterator. Every time you call the ImageDataGenerator's .flow() method, it will produce a new training minibatch, with random transformations applied to the images it was fed. The Keras Model class comes with a .fit_generator() method that allows us to fit with a generator rather than a given dataset:

model.fit_generator(data_generator.flow(data["train_X"], data["train_y"], batch_size=32),
                    steps_per_epoch=len(data["train_X"]) // 32,
                    epochs=200,
                    validation_data=(data["val_X"], data["val_y"]),
                    verbose=1,
                    callbacks=callbacks)

Here, we've replaced the traditional x and y parameters with the generator. Most importantly, notice the steps_per_epoch parameter. You can sample with replacement any number of times from the training set, and you can apply random transformations each time. This means that we can use more minibatches each epoch than we have data. Here, I'm going to sample only as many batches as I have observations, but that isn't required. We can and should push this number higher if we can.

Before we wrap things up, let's look at how beneficial image augmentation is in this case:

As you can see, just a little bit of image augmentation really helped us out. Not only is our overall accuracy higher, but our network is also overfitting much more slowly. If you have a computer vision problem with just a little bit of data, image augmentation is something you'll want to do.

We saw the benefits and ease of training a convolutional neural network from scratch using Keras, and then improving that network using data augmentation. If you found the above article to be useful, make sure you check out the book Deep Learning Quick Reference for more information on modeling and training various different types of deep neural networks with ease and efficiency.

Top 5 Deep Learning Architectures
CapsNet: Are Capsule networks the antidote for CNNs kryptonite?
What is a CNN?
Working with Shared pointers in Rust: Challenges and Solutions [Tutorial]

Aaron Lazar
22 Aug 2018
8 min read
One of Rust's most criticized problems is that it's difficult to develop an application with shared pointers. It's true that, due to Rust's memory safety guarantees, it might be difficult to develop those kinds of algorithms, but as we will see now, the standard library gives us types we can use to safely allow that behavior. In this article, we'll understand how to overcome the issue of shared pointers in Rust to increase efficiency. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza.

Overcoming the issue with the cell module

The standard Rust library has one interesting module, the std::cell module, that allows us to use objects with interior mutability. This means that we can have an immutable object and still mutate it by getting a mutable borrow to the underlying data. This, of course, would not comply with the mutability rules we saw before, but the cells make sure this works by checking the borrows at runtime or by doing copies of the underlying data.

Cells

Let's start with the basic Cell structure. A Cell will contain a mutable value, but it can be mutated without having a mutable Cell. It has mainly three interesting methods: set(), swap(), and replace(). The first allows us to set the contained value, replacing it with a new value. The previous value will be dropped (its destructor will run). That last bit is the only difference from the replace() method. In the replace() method, instead of dropping the previous value, it will be returned. The swap() method, on the other hand, will take another Cell and swap the values between the two. All this without the Cell needing to be mutable. Let's see it with an example:

use std::cell::Cell;

#[derive(Copy, Clone)]
struct House {
    bedrooms: u8,
}

impl Default for House {
    fn default() -> Self {
        House { bedrooms: 1 }
    }
}

fn main() {
    let my_house = House { bedrooms: 2 };
    let my_dream_house = House { bedrooms: 5 };

    let my_cell = Cell::new(my_house);
    println!("My house has {} bedrooms.", my_cell.get().bedrooms);

    my_cell.set(my_dream_house);
    println!("My new house has {} bedrooms.", my_cell.get().bedrooms);

    let my_new_old_house = my_cell.replace(my_house);
    println!(
        "My house has {} bedrooms, it was better with {}",
        my_cell.get().bedrooms,
        my_new_old_house.bedrooms
    );

    let my_new_cell = Cell::new(my_dream_house);
    my_cell.swap(&my_new_cell);
    println!(
        "Yay! my current house has {} bedrooms! (my new house {})",
        my_cell.get().bedrooms,
        my_new_cell.get().bedrooms
    );

    let my_final_house = my_cell.take();
    println!(
        "My final house has {} bedrooms, the shared one {}",
        my_final_house.bedrooms,
        my_cell.get().bedrooms
    );
}

As you can see in the example, to use a Cell, the contained type must be Copy. If the contained type is not Copy, you will need to use a RefCell, which we will see next. Continuing with this Cell example, as you can see through the code, the output will be the following:

So we first create two houses, we select one of them as the current one, and we keep mutating the current and the new ones. As you might have seen, I also used the take() method, only available for types implementing the Default trait. This method will return the current value, replacing it with the default value. As you can see, you don't really mutate the value inside, but you replace it with another value. You can either retrieve the old value or lose it. Also, when using the get() method, you get a copy of the current value, and not a reference to it. That's why you can only use elements implementing Copy with a Cell. This also means that a Cell does not need to dynamically check borrows at runtime.
RefCell

RefCell is similar to Cell, except that it accepts non-Copy data. This also means that when modifying the underlying object, it cannot simply return a copy of it; it will need to return references. In the same way, when you want to mutate the object inside, it will return a mutable reference. This only works because it dynamically checks at runtime whether another borrow exists before returning a mutable borrow (or the other way around), and if one does, the thread will panic.

Instead of using the get() method as in Cell, RefCell has two methods to get the underlying data: borrow() and borrow_mut(). The first will get a read-only borrow, and you can have as many immutable borrows in a scope as you like. The second one will return a read-write borrow, and you will only be able to have one in scope, to follow the mutability rules. If you try to do a borrow_mut() after a borrow() in the same scope, or a borrow() after a borrow_mut(), the thread will panic.

There are two non-panicking alternatives to these borrows: try_borrow() and try_borrow_mut(). These two will try to borrow the data (the first read-only and the second read/write), and if there are incompatible borrows present, they will return a Result::Err, so that you can handle the error without panicking.

Both Cell and RefCell have a get_mut() method that will get a mutable reference to the element inside, but it requires the Cell / RefCell to be mutable, so it doesn't make much sense if you need the Cell / RefCell to be immutable. Nevertheless, if in a part of the code you can actually have a mutable Cell / RefCell, you should use this method to change the contents, since it will check all the rules statically at compile time, without runtime overhead.

Interestingly enough, RefCell does not return a plain reference to the underlying data when we call borrow() or borrow_mut(). You would expect them to return &T and &mut T (where T is the wrapped element). Instead, they will return a Ref and a RefMut, respectively. This is to safely wrap the reference inside, so that the lifetimes get correctly calculated by the compiler without requiring references to live for the whole lifetime of the RefCell. They implement Deref into references, though, so thanks to Rust's Deref coercion, you can use them as references.
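To make the borrow rules concrete, here is a small, self-contained sketch (my own example, not from the book) that reuses a House type without Copy and shows both a coexisting pair of read-only borrows and the non-panicking try_borrow_mut() path:

use std::cell::RefCell;

struct House {
    bedrooms: u8,
}

fn main() {
    let my_cell = RefCell::new(House { bedrooms: 2 });

    {
        // Any number of read-only borrows can coexist in the same scope
        let h1 = my_cell.borrow();
        let h2 = my_cell.borrow();
        println!("{} and {} bedrooms", h1.bedrooms, h2.bedrooms);

        // A borrow_mut() here would panic; try_borrow_mut() reports the conflict instead
        assert!(my_cell.try_borrow_mut().is_err());
    } // both read-only borrows end here

    // Now a mutable borrow is fine, even though my_cell itself is not declared mut
    my_cell.borrow_mut().bedrooms = 5;
    println!("Now the house has {} bedrooms", my_cell.borrow().bedrooms);
}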
Overcoming the issue with the rc module

The std::rc module contains reference-counted pointers that can be used in single-threaded applications. They have very little overhead, thanks to their counters not being atomic counters, but this means that using them in multithreaded applications could cause data races. Thus, Rust will stop you from sending them between threads at compile time.

There are two structures in this module: Rc and Weak. An Rc is an owning pointer to the heap. This means that it's the same as a Box, except that it allows for reference-counted pointers. When the Rc goes out of scope, it will decrease the number of references by 1, and if that count is 0, it will drop the contained object. Since an Rc is a shared reference, it cannot be mutated, but a common pattern is to use a Cell or a RefCell inside the Rc to allow for interior mutability.

An Rc can be downgraded to a Weak pointer, which will have a borrowed reference to the heap. When an Rc drops the value inside, it will not check whether there are Weak pointers to it. This means that a Weak pointer will not always have a valid reference, and therefore, for safety reasons, the only way to check the value of the Weak pointer is to upgrade it to an Rc, which could fail. The upgrade() method will return None if the reference has been dropped.

Let's check all this by creating an example binary tree structure:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Tree<T> {
    root: Node<T>,
}

struct Node<T> {
    parent: Option<Weak<Node<T>>>,
    left: Option<Rc<RefCell<Node<T>>>>,
    right: Option<Rc<RefCell<Node<T>>>>,
    value: T,
}

In this case, the tree will have a root node, and each of the nodes can have up to two children. We call them left and right, because they are usually represented as trees with one child on each side. Each node has a pointer to each of its children, and it owns the children nodes. This means that when a node loses all references, it will be dropped, and with it, its children.

Each child has a pointer to its parent. The main issue with this is that, if the child held an Rc pointer to its parent, neither would ever be dropped. This is a circular dependency, and to avoid it, the pointer to the parent will be a Weak pointer.
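Here is a minimal sketch of that parent/child relationship in action. The node layout is simplified from the Tree above (no generics, and a Vec of children instead of left/right), so treat it as an illustration of Rc::downgrade() and upgrade() rather than the book's implementation:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct SimpleNode {
    value: i32,
    parent: RefCell<Weak<SimpleNode>>,
    children: RefCell<Vec<Rc<SimpleNode>>>,
}

fn main() {
    let parent = Rc::new(SimpleNode {
        value: 1,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });

    let child = Rc::new(SimpleNode {
        value: 2,
        // Weak pointer back to the parent: no ownership, so no reference cycle
        parent: RefCell::new(Rc::downgrade(&parent)),
        children: RefCell::new(Vec::new()),
    });

    // The parent owns the child through a strong Rc
    parent.children.borrow_mut().push(Rc::clone(&child));

    // upgrade() only succeeds while the parent is still alive
    match child.parent.borrow().upgrade() {
        Some(p) => println!("child {} has parent {}", child.value, p.value),
        None => println!("the parent has already been dropped"),
    }
}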
So, you've finally understood how Rust manages shared pointers for complex structures, where the Rust borrow checker can make your coding experience much more difficult. If you found this article useful and would like to learn more such tips, head over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza.

Perform Advanced Programming with Rust
Rust 1.28 is here with global allocators, nonZero types and more
Say hello to Sequoia: a new Rust based OpenPGP library to secure your apps
Generative Adversarial Networks: Generate images using Keras GAN [Tutorial]

Amey Varangaonkar
21 Aug 2018
12 min read
You might have worked with the popular MNIST dataset before - but in this article, we will be generating new MNIST-like images with a Keras GAN. It can take a very long time to train a GAN; however, this problem is small enough to run on most laptops in a few hours, which makes it a great example. The following excerpt is taken from the book Deep Learning Quick Reference, authored by Mike Bernico.

The network architecture that we will be using here has been found, and optimized, by many folks, including the authors of the DCGAN paper and people like Erik Linder-Norén, whose excellent collection of GAN implementations, called Keras GAN, served as the basis of the code we use here.

Loading the MNIST dataset

The MNIST dataset consists of 70,000 hand-drawn digits, 0 to 9. Keras provides us with a built-in loader that splits it into 60,000 training images and 10,000 test images. We will use the following code to load the dataset:

from keras.datasets import mnist

def load_data():
    (X_train, _), (_, _) = mnist.load_data()
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5
    X_train = np.expand_dims(X_train, axis=3)
    return X_train

As you probably noticed, we're not returning any of the labels or the testing dataset. We're only going to use the training dataset. The labels aren't needed because the only labels we will be using are 0 for fake and 1 for real. These are real images, so they will all be assigned a label of 1 at the discriminator.

Building the generator

The generator uses a few new layers that we will talk about in this section. First, take a moment to skim through the following code:

def build_generator(noise_shape=(100,)):
    input = Input(noise_shape)
    x = Dense(128 * 7 * 7, activation="relu")(input)
    x = Reshape((7, 7, 128))(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D()(x)
    x = Conv2D(128, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D()(x)
    x = Conv2D(64, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(1, kernel_size=3, padding="same")(x)
    out = Activation("tanh")(x)
    model = Model(input, out)
    print("-- Generator -- ")
    model.summary()
    return model

We have not previously used the UpSampling2D layer. This layer increases the rows and columns of the input tensor, leaving the channels unchanged. It does this by repeating the values in the input tensor. By default, it will double the input. If we give an UpSampling2D layer a 7 x 7 x 128 input, it will give us a 14 x 14 x 128 output.

Typically, when we build a CNN, we start with an image that is very tall and wide and use convolutional layers to get a tensor that's very deep but less tall and wide. Here we will do the opposite. We'll use a dense layer and a reshape to start with a 7 x 7 x 128 tensor and then, after doubling it twice, we'll be left with a 28 x 28 tensor. Since we need a grayscale image, we can use a convolutional layer with a single unit to get a 28 x 28 x 1 output. This sort of generator arithmetic is a little off-putting and can seem awkward at first, but after a few painful hours, you will get the hang of it!
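The snippets in this excerpt assume a handful of Keras imports that aren't repeated here. A plausible set for the generator, plus a quick shape check to confirm the arithmetic described above (the check itself is my own sanity test, not the book's code), might look like this:

import numpy as np
from keras.layers import (Input, Dense, Reshape, BatchNormalization,
                          UpSampling2D, Conv2D, Activation)
from keras.models import Model

# Throwaway instance, just to confirm the noise-to-image arithmetic
gen_check = build_generator()
noise = np.random.normal(0, 1, (1, 100))
print(gen_check.predict(noise).shape)   # expected: (1, 28, 28, 1)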
Building the discriminator

The discriminator is really, for the most part, the same as any other CNN. Of course, there are a few new things that we should talk about. We will use the following code to build the discriminator:

def build_discriminator(img_shape):
    input = Input(img_shape)
    x = Conv2D(32, kernel_size=3, strides=2, padding="same")(input)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = ZeroPadding2D(padding=((0, 1), (0, 1)))(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(128, kernel_size=3, strides=2, padding="same")(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(256, kernel_size=3, strides=1, padding="same")(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    out = Dense(1, activation='sigmoid')(x)
    model = Model(input, out)
    print("-- Discriminator -- ")
    model.summary()
    return model

First, you might notice the oddly shaped ZeroPadding2D() layer. After the second convolution, our tensor has gone from 28 x 28 x 1 to 7 x 7 x 64. This layer just gets us back to an even number, adding zeros on one side of both the rows and columns so that our tensor is now 8 x 8 x 64.

More unusual is the use of both batch normalization and dropout. Typically, these two layers are not used together; however, in the case of GANs, they do seem to benefit the network.

Building the stacked model

Now that we've assembled both the generator and the discriminator, we need to assemble a third model that is the stack of both models together, which we can use for training the generator given the discriminator loss. To do that, we can just create a new model, this time using the previous models as layers in the new model, as shown in the following code:

discriminator = build_discriminator(img_shape=(28, 28, 1))
generator = build_generator()

z = Input(shape=(100,))
img = generator(z)

discriminator.trainable = False
real = discriminator(img)
combined = Model(z, real)

Notice that we're setting the discriminator's trainable attribute to False before building the model. This means that for this model we will not be updating the weights of the discriminator during backpropagation. We will freeze these weights and only move the generator weights with the stack. The discriminator will be trained separately.

Now that all the models are built, they need to be compiled, as shown in the following code:

gen_optimizer = Adam(lr=0.0002, beta_1=0.5)
disc_optimizer = Adam(lr=0.0002, beta_1=0.5)

discriminator.compile(loss='binary_crossentropy',
                      optimizer=disc_optimizer,
                      metrics=['accuracy'])

generator.compile(loss='binary_crossentropy',
                  optimizer=gen_optimizer)

combined.compile(loss='binary_crossentropy',
                 optimizer=gen_optimizer)

If you'll notice, we're creating two custom Adam optimizers. This is because many times we will want to change the learning rate for only the discriminator or the generator, slowing one or the other down so that we end up with a stable GAN where neither is overpowering the other. You'll also notice that we're using beta_1 = 0.5. This is a recommendation from the original DCGAN paper that we've carried forward and also had success with. A learning rate of 0.0002 is a good place to start as well, and was found in the original DCGAN paper.
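The training loop in the next section refers to X_train, batch_size, and epochs, which are set up elsewhere in the full script. A plausible, minimal setup is sketched below; the epoch count is my assumption, while the Adam import is simply the keras.optimizers class used in the compile step above:

from keras.optimizers import Adam

X_train = load_data()    # the MNIST loader defined earlier
batch_size = 32
epochs = 10              # even a few epochs produce recognizable digits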
The training loop

We have previously had the luxury of calling .fit() on our model and letting Keras handle the painful process of breaking the data apart into minibatches and training for us. Unfortunately, because we need to perform separate updates for the discriminator and the stacked model within a single batch, we're going to have to do things the old-fashioned way, with a few loops. This is how things used to be done all the time, so while it's perhaps a little more work, it does admittedly leave me feeling nostalgic. The following code illustrates the training technique:

num_examples = X_train.shape[0]
num_batches = int(num_examples / float(batch_size))
half_batch = int(batch_size / 2)

for epoch in range(epochs + 1):
    for batch in range(num_batches):
        # noise images for the batch
        noise = np.random.normal(0, 1, (half_batch, 100))
        fake_images = generator.predict(noise)
        fake_labels = np.zeros((half_batch, 1))

        # real images for batch
        idx = np.random.randint(0, X_train.shape[0], half_batch)
        real_images = X_train[idx]
        real_labels = np.ones((half_batch, 1))

        # Train the discriminator (real classified as ones and generated as zeros)
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        noise = np.random.normal(0, 1, (batch_size, 100))

        # Train the generator
        g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))

        # Plot the progress
        print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
              (epoch, batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss))

        if batch % 50 == 0:
            save_imgs(generator, epoch, batch)

There is a lot going on here, to be sure. As before, let's break it down block by block. First, let's see the code to generate noise vectors:

noise = np.random.normal(0, 1, (half_batch, 100))
fake_images = generator.predict(noise)
fake_labels = np.zeros((half_batch, 1))

This code is generating a matrix of noise vectors (called z in the GAN literature) and sending it to the generator. It's getting a set of generated images back, which we're calling fake images. We will use these to train the discriminator, so the labels we want to use are 0s, indicating that these are in fact generated images.

Note that the shape here is half_batch x 28 x 28 x 1. The half_batch is exactly what you think it is. We're creating half a batch of generated images because the other half of the batch will be real data, which we will assemble next. To get our real images, we will generate a random set of indices across X_train and use that slice of X_train as our real images, as shown in the following code:

idx = np.random.randint(0, X_train.shape[0], half_batch)
real_images = X_train[idx]
real_labels = np.ones((half_batch, 1))

Yes, we are sampling with replacement in this case. It does work out, but it's probably not the best way to implement minibatch training. It is, however, probably the easiest and most common.

Since we are using these images to train the discriminator, and because they are real images, we will assign them 1s as labels, rather than 0s. Now that we have our discriminator training set assembled, we will update the discriminator. Also, note that we aren't using soft labels. That's because we want to keep things as easy as they can be to understand. Luckily, the network doesn't require them in this case.
We will use the following code to train the discriminator:

# Train the discriminator (real classified as ones and generated as zeros)
d_loss_real = discriminator.train_on_batch(real_images, real_labels)
d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

Notice that here we're using the discriminator's train_on_batch() method. The train_on_batch() method does exactly one round of forward and backward propagation. Every time we call it, it updates the model once from the model's previous state.

Also, notice that we're making the update for the real images and the fake images separately. This is advice that is given on the GAN hack Git we had previously referenced in the Generator architecture section. Especially in the early stages of training, when real images and fake images come from radically different distributions, batch normalization will cause problems with training if we were to put both sets of data in the same update.

Now that the discriminator has been updated, it's time to update the generator. This is done indirectly by updating the combined stack, as shown in the following code:

noise = np.random.normal(0, 1, (batch_size, 100))
g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))

To update the combined model, we create a new noise matrix, and this time it will be as large as the entire batch. We will use that as an input to the stack, which will cause the generator to generate an image and the discriminator to evaluate that image. Finally, we will use the label of 1 because we want to backpropagate the error between a real image and the generated image.

Lastly, the training loop reports the discriminator and generator loss at the epoch/batch, and then, every 50 batches of every epoch, we will use save_imgs to generate example images and save them to disk, as shown in the following code:

print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
      (epoch, batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss))

if batch % 50 == 0:
    save_imgs(generator, epoch, batch)

The save_imgs function uses the generator to create images as we go, so we can see the fruits of our labor. We will use the following code to define save_imgs:

def save_imgs(generator, epoch, batch):
    r, c = 5, 5
    noise = np.random.normal(0, 1, (r * c, 100))
    gen_imgs = generator.predict(noise)
    gen_imgs = 0.5 * gen_imgs + 0.5
    fig, axs = plt.subplots(r, c)
    cnt = 0
    for i in range(r):
        for j in range(c):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
    fig.savefig("images/mnist_%d_%d.png" % (epoch, batch))
    plt.close()

It uses only the generator, creating a noise matrix and retrieving an image matrix in return. Then, using matplotlib.pyplot, it saves those images to disk in a 5 x 5 grid.

Performing model evaluation

Good is somewhat subjective when you're building a deep neural network to create images. Let's take a look at a few examples of the training process, so you can see for yourself how the GAN begins to learn to generate MNIST.

Here's the network at the very first batch of the very first epoch. Clearly, the generator doesn't really know anything about generating MNIST at this point; it's just noise, as shown in the following image:

But just 50 batches in, something is happening, as you can see from the following image:

And after 200 batches of epoch 0, we can almost see numbers, as you can see from the following image:

And here's our generator after one full epoch.
These generated numbers look pretty good, and we can see how the discriminator might be fooled by them. At this point, we could probably continue to improve a little bit, but it looks like our GAN has worked, as the computer is generating some pretty convincing MNIST digits, as shown in the following image:

Thus, we see the power of GANs in action when it comes to image generation using the Keras library. If you found the above article to be useful, make sure you check out our book Deep Learning Quick Reference, for more such interesting coverage of popular deep learning concepts and their practical implementation.

Keras 2.2.0 releases!
2 ways to customize your deep learning models with Keras
How to build Deep convolutional GAN using TensorFlow and Keras