Working with Shared Pointers in Rust: Challenges and Solutions [Tutorial]

Aaron Lazar
22 Aug 2018
8 min read
One of Rust's most criticized problems is that it's difficult to develop an application with shared pointers. It's true that, due to Rust's memory safety guarantees, it can be difficult to develop these kinds of algorithms, but as we will see now, the standard library gives us types we can use to safely allow that behavior. In this article, we'll understand how to overcome the issue of shared pointers in Rust to increase efficiency. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza.

Overcoming the issue with the cell module

The standard Rust library has one interesting module, the std::cell module, that allows us to use objects with interior mutability. This means that we can have an immutable object and still mutate it by getting a mutable borrow to the underlying data. This, of course, would not comply with the mutability rules we saw before, but the cells make sure this works, either by checking the borrows at runtime or by making copies of the underlying data.

Cells

Let's start with the basic Cell structure. A Cell will contain a mutable value, but it can be mutated without having a mutable Cell. It has three main methods of interest: set(), swap(), and replace(). The first allows us to set the contained value, replacing it with a new value. The previous value will be dropped (its destructor will run). That last bit is the only difference from the replace() method: in replace(), instead of being dropped, the previous value is returned. The swap() method, on the other hand, takes another Cell and swaps the values between the two. All this without the Cell needing to be mutable.
Let's see it with an example:

```rust
use std::cell::Cell;

#[derive(Copy, Clone)]
struct House {
    bedrooms: u8,
}

impl Default for House {
    fn default() -> Self {
        House { bedrooms: 1 }
    }
}

fn main() {
    let my_house = House { bedrooms: 2 };
    let my_dream_house = House { bedrooms: 5 };

    let my_cell = Cell::new(my_house);
    println!("My house has {} bedrooms.", my_cell.get().bedrooms);

    my_cell.set(my_dream_house);
    println!("My new house has {} bedrooms.", my_cell.get().bedrooms);

    let my_new_old_house = my_cell.replace(my_house);
    println!(
        "My house has {} bedrooms, it was better with {}",
        my_cell.get().bedrooms,
        my_new_old_house.bedrooms
    );

    let my_new_cell = Cell::new(my_dream_house);
    my_cell.swap(&my_new_cell);
    println!(
        "Yay! my current house has {} bedrooms! (my new house {})",
        my_cell.get().bedrooms,
        my_new_cell.get().bedrooms
    );

    let my_final_house = my_cell.take();
    println!(
        "My final house has {} bedrooms, the shared one {}",
        my_final_house.bedrooms,
        my_cell.get().bedrooms
    );
}
```

As you can see in the example, to use a Cell, the contained type must be Copy. If the contained type is not Copy, you will need to use a RefCell, which we will see next. Continuing with this Cell example, the output will be the following:

```
My house has 2 bedrooms.
My new house has 5 bedrooms.
My house has 2 bedrooms, it was better with 5
Yay! my current house has 5 bedrooms! (my new house 2)
My final house has 5 bedrooms, the shared one 1
```

So we first create two houses, we select one of them as the current one, and we keep mutating the current and the new ones. As you might have seen, I also used the take() method, which is only available for types implementing the Default trait. This method returns the current value, replacing it with the default value. As you can see, you don't really mutate the value inside: you replace it with another value. You can either retrieve the old value or lose it. Also, when using the get() method, you get a copy of the current value, not a reference to it. That's why you can only use elements implementing Copy with a Cell. This also means that a Cell does not need to dynamically check borrows at runtime.
RefCell

RefCell is similar to Cell, except that it accepts non-Copy data. This also means that, when modifying the underlying object, it cannot simply return copies: it needs to return references. In the same way, when you want to mutate the object inside, it returns a mutable reference. This only works because RefCell dynamically checks at runtime whether another borrow exists before returning a mutable borrow (or the other way around), and if the check fails, the thread panics.

Instead of using the get() method as in Cell, RefCell has two methods to get the underlying data: borrow() and borrow_mut(). The first gets a read-only borrow, and you can have as many immutable borrows in a scope as you want. The second returns a read-write borrow, and you can only have one in scope, to follow the mutability rules. If you try to do a borrow_mut() after a borrow() in the same scope, or a borrow() after a borrow_mut(), the thread will panic.

There are two non-panicking alternatives to these borrows: try_borrow() and try_borrow_mut(). These two will try to borrow the data (the first read-only and the second read/write), and if incompatible borrows are present, they will return a Result::Err, so that you can handle the error without panicking.

Both Cell and RefCell have a get_mut() method that gets a mutable reference to the element inside, but it requires the Cell / RefCell itself to be mutable, so it doesn't help if you need the Cell / RefCell to be immutable. Nevertheless, if in some part of the code you can actually have a mutable Cell / RefCell, you should use this method to change the contents, since it checks all the rules statically at compile time, without runtime overhead.

Interestingly enough, RefCell does not return a plain reference to the underlying data when we call borrow() or borrow_mut(). You would expect them to return &T and &mut T (where T is the wrapped element). Instead, they return a Ref and a RefMut, respectively.
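The runtime borrow checking just described can be exercised directly. Here is a minimal sketch (my own, not from the book) using the non-panicking variants:

```rust
use std::cell::RefCell;

fn main() {
    // The RefCell itself is immutable, yet its contents can be mutated.
    let cell = RefCell::new(vec![1, 2, 3]);

    {
        // A read-only borrow; many of these may coexist in a scope.
        let numbers = cell.borrow();
        println!("first element: {}", numbers[0]);

        // While a read-only borrow is alive, a mutable borrow is refused.
        // try_borrow_mut() reports this as an Err instead of panicking.
        assert!(cell.try_borrow_mut().is_err());
    } // the read-only borrow ends here

    // Now the mutable borrow succeeds and we can change the contents.
    cell.borrow_mut().push(4);
    assert_eq!(cell.borrow().len(), 4);
}
```

Had we called borrow_mut() directly inside the inner scope, the thread would have panicked instead of returning an Err.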
The Ref and RefMut types safely wrap the reference inside, so that the lifetimes get correctly calculated by the compiler without requiring the references to live for the whole lifetime of the RefCell. They implement Deref into references, though, so thanks to Rust's Deref coercion, you can use them as references.

Overcoming the issue with the rc module

The std::rc module contains reference-counted pointers that can be used in single-threaded applications. They have very little overhead, thanks to the counters not being atomic, but this means that using them in multithreaded applications could cause data races. Thus, Rust will stop you from sending them between threads at compile time.

There are two structures in this module: Rc and Weak. An Rc is an owning pointer to the heap. This means that it's the same as a Box, except that it allows for reference-counted pointers. When an Rc goes out of scope, it decreases the reference count by 1, and if that count reaches 0, it drops the contained object. Since an Rc is a shared reference, it cannot be mutated, but a common pattern is to use a Cell or a RefCell inside the Rc to allow for interior mutability.

An Rc can be downgraded to a Weak pointer, which holds a borrowed reference to the heap. When an Rc drops the value inside, it does not check whether there are Weak pointers to it. This means that a Weak pointer will not always have a valid reference, and therefore, for safety reasons, the only way to check the value behind a Weak pointer is to upgrade it to an Rc, which can fail: the upgrade() method returns None if the value has been dropped.

Let's check all this by creating an example binary tree structure:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Tree<T> {
    root: Node<T>,
}

struct Node<T> {
    parent: Option<Weak<Node<T>>>,
    left: Option<Rc<RefCell<Node<T>>>>,
    right: Option<Rc<RefCell<Node<T>>>>,
    value: T,
}
```

In this case, the tree will have a root node, and each of the nodes can have up to two children.
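Before wiring the tree up, the Rc/Weak mechanics can be seen in isolation. A minimal sketch (mine, not from the book):

```rust
use std::rc::{Rc, Weak};

fn main() {
    let strong = Rc::new(String::from("root"));

    // Downgrading creates a non-owning pointer: the strong count stays at 1.
    let weak: Weak<String> = Rc::downgrade(&strong);
    assert_eq!(Rc::strong_count(&strong), 1);

    // While an Rc is still alive, upgrade() hands us back a new Rc.
    assert!(weak.upgrade().is_some());

    // Dropping the last Rc drops the value; upgrade() now returns None.
    drop(strong);
    assert!(weak.upgrade().is_none());
    println!("the weak pointer can no longer be upgraded");
}
```

This is exactly why a Weak parent pointer cannot keep a dropped node alive, which is what the tree below relies on.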
We call them left and right, because trees are usually represented with one child on each side. Each node has a pointer to each of its children, and it owns them. This means that when a node loses all its references, it will be dropped, and with it, its children. Each child also has a pointer to its parent. The main issue here is that, if the child held an Rc pointer to its parent, the parent would never be dropped. This is a circular dependency, and to avoid it, the pointer to the parent is a Weak pointer.

So, you've finally understood how Rust manages shared pointers for complex structures, where the Rust borrow checker can make your coding experience much more difficult. If you found this article useful and would like to learn more such tips, head over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza.
Generative Adversarial Networks: Generate images using Keras GAN [Tutorial]

Amey Varangaonkar
21 Aug 2018
12 min read
You might have worked with the popular MNIST dataset before, but in this article, we will be generating new MNIST-like images with a Keras GAN. It can take a very long time to train a GAN; however, this problem is small enough to run on most laptops in a few hours, which makes it a great example. The following excerpt is taken from the book Deep Learning Quick Reference, authored by Mike Bernico.

The network architecture that we will be using here has been found, and optimized, by many folks, including the authors of the DCGAN paper and people like Erik Linder-Norén, whose excellent collection of GAN implementations, called Keras GAN, served as the basis of the code we use here.

Loading the MNIST dataset

The MNIST dataset consists of 70,000 hand-drawn digits, 0 to 9. Keras provides us with a built-in loader that splits it into 60,000 training images and 10,000 test images. We will use the following code to load the dataset:

```python
import numpy as np
from keras.datasets import mnist

def load_data():
    (X_train, _), (_, _) = mnist.load_data()
    # Scale the pixel values from [0, 255] into [-1, 1],
    # matching the generator's tanh output.
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5
    X_train = np.expand_dims(X_train, axis=3)
    return X_train
```

As you probably noticed, we're not returning any of the labels or the testing dataset. We're only going to use the training dataset. The labels aren't needed because the only labels we will be using are 0 for fake and 1 for real. These are real images, so they will all be assigned a label of 1 at the discriminator.

Building the generator

The generator uses a few new layers that we will talk about in this section.
First, take a moment to skim through the following code:

```python
def build_generator(noise_shape=(100,)):
    input = Input(noise_shape)
    x = Dense(128 * 7 * 7, activation="relu")(input)
    x = Reshape((7, 7, 128))(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D()(x)
    x = Conv2D(128, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D()(x)
    x = Conv2D(64, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(1, kernel_size=3, padding="same")(x)
    out = Activation("tanh")(x)
    model = Model(input, out)
    print("-- Generator -- ")
    model.summary()
    return model
```

We have not previously used the UpSampling2D layer. This layer increases the rows and columns of the input tensor, leaving the channels unchanged. It does this by repeating the values in the input tensor. By default, it will double the input: if we give an UpSampling2D layer a 7 x 7 x 128 input, it will give us a 14 x 14 x 128 output.

Typically, when we build a CNN, we start with an image that is very tall and wide and use convolutional layers to get a tensor that's very deep but less tall and wide. Here we will do the opposite. We'll use a dense layer and a reshape to start with a 7 x 7 x 128 tensor and then, after doubling it twice, we'll be left with a 28 x 28 tensor. Since we need a grayscale image, we can use a convolutional layer with a single unit to get a 28 x 28 x 1 output. This sort of generator arithmetic can seem a little off-putting and awkward at first, but after a few painful hours you will get the hang of it!

Building the discriminator

The discriminator is, for the most part, the same as any other CNN. Of course, there are a few new things that we should talk about.
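Before looking at the discriminator's code, a quick aside on the generator's UpSampling2D layer: because it just repeats values, its doubling arithmetic is easy to check by hand with plain numpy. This snippet is my own sketch, not the book's code, and no Keras is required:

```python
import numpy as np

# A dummy 7 x 7 x 128 tensor, like the one produced by the Dense + Reshape.
x = np.arange(7 * 7 * 128, dtype=np.float32).reshape(7, 7, 128)

# Repeat every row and every column twice; channels are left untouched.
# This mimics UpSampling2D's default (2, 2) behaviour.
up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

assert up.shape == (14, 14, 128)
# Each original value now fills a 2 x 2 block in the output.
assert up[0, 0, 0] == up[1, 1, 0] == x[0, 0, 0]
print(x.shape, "->", up.shape)
```

Doubling twice takes the 7 x 7 grid to 28 x 28, which is exactly the MNIST image size the generator has to produce.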
We will use the following code to build the discriminator:

```python
def build_discriminator(img_shape):
    input = Input(img_shape)
    x = Conv2D(32, kernel_size=3, strides=2, padding="same")(input)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = ZeroPadding2D(padding=((0, 1), (0, 1)))(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(128, kernel_size=3, strides=2, padding="same")(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(256, kernel_size=3, strides=1, padding="same")(x)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    out = Dense(1, activation='sigmoid')(x)
    model = Model(input, out)
    print("-- Discriminator -- ")
    model.summary()
    return model
```

First, you might notice the oddly shaped ZeroPadding2D() layer. After the second convolution, our tensor has gone from 28 x 28 x 1 to 7 x 7 x 64. This layer just gets us back to an even number, adding zeros on one side of both the rows and columns, so that our tensor is now 8 x 8 x 64.

More unusual is the use of both batch normalization and dropout. Typically, these two layers are not used together; however, in the case of GANs, they do seem to benefit the network.

Building the stacked model

Now that we've assembled both the generator and the discriminator, we need to assemble a third model: the stack of both models together, which we can use to train the generator given the discriminator's loss.
To do that, we can just create a new model, this time using the previous models as layers in the new model, as shown in the following code:

```python
discriminator = build_discriminator(img_shape=(28, 28, 1))
generator = build_generator()

z = Input(shape=(100,))
img = generator(z)
discriminator.trainable = False
real = discriminator(img)
combined = Model(z, real)
```

Notice that we're setting the discriminator's trainable attribute to False before building the model. This means that for this model we will not be updating the weights of the discriminator during backpropagation. We will freeze these weights and only move the generator weights with the stack. The discriminator will be trained separately.

Now that all the models are built, they need to be compiled, as shown in the following code:

```python
gen_optimizer = Adam(lr=0.0002, beta_1=0.5)
disc_optimizer = Adam(lr=0.0002, beta_1=0.5)

discriminator.compile(loss='binary_crossentropy',
                      optimizer=disc_optimizer,
                      metrics=['accuracy'])

generator.compile(loss='binary_crossentropy',
                  optimizer=gen_optimizer)

combined.compile(loss='binary_crossentropy',
                 optimizer=gen_optimizer)
```

If you look closely, you'll notice that we're creating two custom Adam optimizers. This is because many times we will want to change the learning rate for only the discriminator or only the generator, slowing one or the other down so that we end up with a stable GAN where neither is overpowering the other. You'll also notice that we're using beta_1=0.5. This is a recommendation from the original DCGAN paper that we've carried forward and also had success with. A learning rate of 0.0002 is a good place to start as well, and was also found in the original DCGAN paper.

The training loop

We have previously had the luxury of calling .fit() on our model and letting Keras handle the painful process of breaking the data apart into minibatches and training for us.
Unfortunately, because we need to perform separate updates for the discriminator and the stacked model within a single batch, we're going to have to do things the old-fashioned way, with a few loops. This is how things used to be done all the time, so while it's perhaps a little more work, it does admittedly leave me feeling nostalgic. The following code illustrates the training technique:

```python
num_examples = X_train.shape[0]
num_batches = int(num_examples / float(batch_size))
half_batch = int(batch_size / 2)

for epoch in range(epochs + 1):
    for batch in range(num_batches):
        # noise images for the batch
        noise = np.random.normal(0, 1, (half_batch, 100))
        fake_images = generator.predict(noise)
        fake_labels = np.zeros((half_batch, 1))

        # real images for the batch
        idx = np.random.randint(0, X_train.shape[0], half_batch)
        real_images = X_train[idx]
        real_labels = np.ones((half_batch, 1))

        # Train the discriminator
        # (real classified as ones and generated as zeros)
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        noise = np.random.normal(0, 1, (batch_size, 100))

        # Train the generator
        g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))

        # Plot the progress
        print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
              (epoch, batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss))

        if batch % 50 == 0:
            save_imgs(generator, epoch, batch)
```

There is a lot going on here, to be sure. As before, let's break it down block by block. First, let's see the code to generate noise vectors:

```python
noise = np.random.normal(0, 1, (half_batch, 100))
fake_images = generator.predict(noise)
fake_labels = np.zeros((half_batch, 1))
```

This code is generating a matrix of noise vectors (called z) and sending it to the generator. It gets a set of generated images back, which we're calling fake images.
We will use these to train the discriminator, so the labels we want to use are 0s, indicating that these are in fact generated images. Note that the shape here is half_batch x 28 x 28 x 1. The half_batch is exactly what you think it is: we're creating half a batch of generated images because the other half of the batch will be real data, which we will assemble next. To get our real images, we will generate a random set of indices across X_train and use that slice of X_train as our real images, as shown in the following code:

```python
idx = np.random.randint(0, X_train.shape[0], half_batch)
real_images = X_train[idx]
real_labels = np.ones((half_batch, 1))
```

Yes, we are sampling with replacement in this case. It does work out, but it's probably not the best way to implement minibatch training. It is, however, probably the easiest and most common. Since we are using these images to train the discriminator, and because they are real images, we will assign them 1s as labels, rather than 0s. Now that we have our discriminator training set assembled, we will update the discriminator. Also, note that we aren't using soft labels. That's because we want to keep things as easy as they can be to understand; luckily, the network doesn't require them in this case. We will use the following code to train the discriminator:

```python
# Train the discriminator (real classified as ones and generated as zeros)
d_loss_real = discriminator.train_on_batch(real_images, real_labels)
d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
```

Notice that here we're using the discriminator's train_on_batch() method. The train_on_batch() method does exactly one round of forward and backward propagation. Every time we call it, it updates the model once from the model's previous state. Also, notice that we're making the update for the real images and the fake images separately.
This is advice that is given in the GAN hacks repository we previously referenced in the Generator architecture section. Especially in the early stages of training, when real images and fake images come from radically different distributions, batch normalization would cause problems with training if we were to put both sets of data in the same update.

Now that the discriminator has been updated, it's time to update the generator. This is done indirectly, by updating the combined stack, as shown in the following code:

```python
noise = np.random.normal(0, 1, (batch_size, 100))
g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
```

To update the combined model, we create a new noise matrix, and this time it will be as large as the entire batch. We will use that as an input to the stack, which will cause the generator to generate an image and the discriminator to evaluate that image. Finally, we will use the label of 1, because we want to backpropagate the error between a real image and the generated image.

Lastly, the training loop reports the discriminator and generator loss at the epoch/batch, and then, every 50 batches of every epoch, we use save_imgs to generate example images and save them to disk, as shown in the following code:

```python
print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
      (epoch, batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss))

if batch % 50 == 0:
    save_imgs(generator, epoch, batch)
```

The save_imgs function uses the generator to create images as we go, so we can see the fruits of our labor.
We will use the following code to define save_imgs:

```python
def save_imgs(generator, epoch, batch):
    r, c = 5, 5
    noise = np.random.normal(0, 1, (r * c, 100))
    gen_imgs = generator.predict(noise)
    # Rescale from [-1, 1] back to [0, 1] for display.
    gen_imgs = 0.5 * gen_imgs + 0.5
    fig, axs = plt.subplots(r, c)
    cnt = 0
    for i in range(r):
        for j in range(c):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
    fig.savefig("images/mnist_%d_%d.png" % (epoch, batch))
    plt.close()
```

It uses only the generator, by creating a noise matrix and retrieving an image matrix in return. Then, using matplotlib.pyplot, it saves those images to disk in a 5 x 5 grid.

Performing model evaluation

Good is somewhat subjective when you're building a deep neural network to create images. Let's take a look at a few examples of the training process, so you can see for yourself how the GAN begins to learn to generate MNIST. At the very first batch of the very first epoch, the generator doesn't really know anything about generating MNIST: its output is just noise. But just 50 batches in, something is happening, and after 200 batches of epoch 0 we can almost see numbers. After one full epoch, the generated numbers look pretty good, and we can see how the discriminator might be fooled by them. At this point, we could probably continue to improve a little bit, but it looks like our GAN has worked, as the computer is generating some pretty convincing MNIST digits.

Thus, we see the power of GANs in action when it comes to image generation using the Keras library. If you found the above article to be useful, make sure you check out the book Deep Learning Quick Reference, for more such interesting coverage of popular deep learning concepts and their practical implementation.
Implementing Dependency Injection in Spring [Tutorial]

Natasha Mathur
21 Aug 2018
9 min read
Spring is a lightweight, open source enterprise framework created back in 2003. Modularity is at the heart of the Spring framework, and because of this, Spring can be used from the presentation layer to the persistence layer. The good thing is that Spring doesn't force you to use it in all layers. For example, if you use Spring in the persistence layer, you are free to use any other framework in the presentation or controller layer. In this article, we will look at implementing Dependency Injection (DI) in a Spring Java application. This tutorial is an excerpt taken from the book 'Java 9 Dependency Injection', written by Krunal Patel and Nilang Patel.

Spring is a POJO-based framework: a servlet container suffices to run your application, and a fully fledged application server is not required.

DI is a process of providing the dependent objects to the objects that need them. In Spring, the container supplies the dependencies. The flow of creating and managing the dependencies is inverted from client to container; that is the reason we call it an IoC (Inversion of Control) container. A Spring IoC container uses the Dependency Injection (DI) mechanism to provide dependencies at runtime. Now we'll talk about how we can implement constructor- and setter-based DI through Spring's IoC container.

Implementing Constructor-based DI

Constructor-based dependency injection is generally used where you want to pass mandatory dependencies before the object is instantiated. The dependencies are provided by the container through a constructor with different arguments, each representing a dependency. When the container starts, it checks whether any constructor-based DI is defined for a <bean>. It will create the dependency objects first, and then pass them to the current object's constructor. We will understand this by taking the classic example of logging. It is good practice to put log statements at various places in the code to trace the flow of execution.
Let's say you have an EmployeeService class where you need to put a log in each of its methods. To achieve separation of concerns, you put the log functionality in a separate class called Logger. To make sure that EmployeeService and Logger are independent and loosely coupled, you need to inject the Logger object into the EmployeeService object. Let's see how to achieve this with constructor-based injection:

```java
public class EmployeeService {

    private Logger log;

    // Constructor
    public EmployeeService(Logger log) {
        this.log = log;
    }

    // Service method.
    public void showEmployeeName() {
        log.info("showEmployeeName method is called ....");
        log.debug("This is a debugging point");
        log.error("Some exception occurred here ...");
    }
}

public class Logger {

    public void info(String msg) {
        System.out.println("Logger INFO: " + msg);
    }

    public void debug(String msg) {
        System.out.println("Logger DEBUG: " + msg);
    }

    public void error(String msg) {
        System.out.println("Logger ERROR: " + msg);
    }
}

public class DIWithConstructorCheck {

    public static void main(String[] args) {
        ApplicationContext springContext =
            new ClassPathXmlApplicationContext("application-context.xml");
        EmployeeService employeeService =
            (EmployeeService) springContext.getBean("employeeService");
        employeeService.showEmployeeName();
    }
}
```

As per the preceding code, when these objects are configured with Spring, the EmployeeService object expects the Spring container to inject the object of Logger through the constructor.
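It can help to see what the container does for us, stripped of the framework: construct the dependency first, then hand it to the dependent object's constructor. The following is a plain-Java sketch of that wiring with hypothetical class names of my own (ConsoleLogger, GreeterService); no Spring is involved:

```java
// The dependency: a tiny logger. Returning the formatted line as well as
// printing it makes the behaviour easy to check.
class ConsoleLogger {
    String info(String msg) {
        String line = "Logger INFO: " + msg;
        System.out.println(line);
        return line;
    }
}

// The dependent object: the mandatory dependency arrives through the
// constructor, and the class never constructs its own logger.
class GreeterService {
    private final ConsoleLogger log;

    GreeterService(ConsoleLogger log) {
        this.log = log;
    }

    String greet() {
        return log.info("greet method is called ....");
    }
}

public class ManualWiring {
    public static void main(String[] args) {
        // This is, in effect, what the container does at startup when the
        // configuration declares a constructor-arg dependency.
        ConsoleLogger logger = new ConsoleLogger();
        GreeterService service = new GreeterService(logger);
        service.greet();
    }
}
```

Spring's value is doing exactly this wiring for you, driven by configuration metadata instead of hand-written main() code.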
To achieve this, you need to set the configuration metadata as per the following snippet:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- All your beans and their configuration metadata go here -->
    <bean id="employeeService"
          class="com.packet.spring.constructor.di.EmployeeService">
        <constructor-arg ref="logger"/>
    </bean>

    <bean id="logger" class="com.packet.spring.constructor.di.Logger">
    </bean>

</beans>
```

In the preceding configuration, the Logger bean is injected into the employeeService bean through the constructor-arg element. It has a ref attribute, which is used to point to another bean with a matching id value. This configuration instructs Spring to pass the object of Logger into the constructor of the EmployeeService bean.

You can put the <bean> definitions in any order here; Spring will create the objects of each <bean> based on need, not in the order they are defined. For more than one constructor argument, you can pass additional <constructor-arg> elements. The order is not important as long as the object type (the class attribute of the referred bean) is not ambiguous.

Spring also supports DI with primitive constructor arguments, providing the facility to pass primitive values to a constructor from the application context (XML) file. Let's say you want to create an object of a Camera class with default values, as per the following snippet:

```java
public class Camera {

    private int resolution;
    private String mode;
    private boolean smileShot;

    // Constructor.
    public Camera(int resolution, String mode, boolean smileShot) {
        this.resolution = resolution;
        this.mode = mode;
        this.smileShot = smileShot;
    }

    // Public method.
    public void showSettings() {
        System.out.println("Resolution:" + resolution + "px mode:" + mode
            + " smileShot:" + smileShot);
    }
}
```

The Camera class has three properties: resolution, mode, and smileShot. Its constructor takes three primitive arguments to create a Camera object with default values. You need to give the configuration metadata in the following way, so that Spring can create instances of the Camera object with default primitive values:

```xml
<bean id="camera" class="com.packet.spring.constructor.di.Camera">
    <constructor-arg type="int" value="12" />
    <constructor-arg type="java.lang.String" value="normal" />
    <constructor-arg type="boolean" value="false" />
</bean>
```

We pass three <constructor-arg> elements under <bean>, corresponding to each constructor argument. Since these are primitives, Spring has no idea about their types when passing the values, so we need to explicitly pass the type attribute, which defines the type of the primitive constructor argument.

For primitives too, there is no fixed order in which to pass the constructor argument values, as long as the type is not ambiguous. In the previous case, all three types are different, so Spring intelligently picks the right constructor argument, no matter in which order you pass them. Now let's add one more attribute to the Camera class, called flash, as per the following snippet:

```java
    // Constructor.
    public Camera(int resolution, String mode, boolean smileShot,
                  boolean flash) {
        this.resolution = resolution;
        this.mode = mode;
        this.smileShot = smileShot;
        this.flash = flash;
    }
```

In this case, the constructor arguments smileShot and flash are of the same type (boolean), and you pass the constructor argument values from the XML configuration as per the following snippet:

```xml
<constructor-arg type="java.lang.String" value="normal"/>
<constructor-arg type="boolean" value="true" />
<constructor-arg type="int" value="12" />
<constructor-arg type="boolean" value="false" />
```

In the preceding scenario, Spring will pick up the following:

- the int value for resolution
- the String value for mode
- the first boolean value in sequence (true) for the first boolean argument, smileShot
- the second boolean value in sequence (false) for the second boolean argument, flash

In short, for similar types among the constructor arguments, Spring will pick the first value that comes in the sequence, so the sequence does matter in this case. This may lead to logical errors, since you may be passing the wrong values to the right arguments. To avoid such accidental mistakes, Spring provides the facility to define a zero-based index in the <constructor-arg> element, as per the following snippet:

```xml
<constructor-arg type="java.lang.String" value="normal" index="1"/>
<constructor-arg type="boolean" value="true" index="3"/>
<constructor-arg type="int" value="12" index="0"/>
<constructor-arg type="boolean" value="false" index="2"/>
```

This is more readable and less error-prone. Now Spring will pick up the last value (with index="2") for smileShot, and the second value (with index="3") for flash. The index attribute resolves the ambiguity of two constructor arguments having the same type. If the type you define in <constructor-arg> is not compatible with the actual type of the constructor argument at that index, Spring will raise an error, so just make sure of this while using the index attribute.
Implementing Setter-based DI Setter-based DI is generally used for optional dependencies. In setter-based DI, the container first creates an instance of your bean, either by calling a no-argument constructor or a static factory method, and then injects the dependencies by calling the corresponding setter methods. Dependencies injected through setter methods can be re-injected or changed at a later stage of the application. We will understand setter-based DI with the following code base:

public class DocumentBase {
    private DocFinder docFinder;

    //Setter method to inject dependency.
    public void setDocFinder(DocFinder docFinder) {
        this.docFinder = docFinder;
    }

    public void performSearch() {
        this.docFinder.doFind();
    }
}

public class DocFinder {
    public void doFind() {
        System.out.println(" Finding in Document Base ");
    }
}

public class DIWithSetterCheck {
    public static void main(String[] args) {
        ApplicationContext springContext = new ClassPathXmlApplicationContext("application-context.xml");
        DocumentBase docBase = (DocumentBase) springContext.getBean("docBase");
        docBase.performSearch();
    }
}

The DocumentBase class depends on DocFinder, which we pass in through the setter method. You need to define the configuration metadata for Spring as per the following snippet:

<bean id="docBase" class="com.packet.spring.setter.di.DocumentBase">
    <property name="docFinder" ref="docFinder" />
</bean>
<bean id="docFinder" class="com.packet.spring.setter.di.DocFinder">
</bean>

Setter-based DI is defined through the <property> element under <bean>. The name attribute denotes the name of the bean property to set. In our case, the name attribute of the property element is docFinder, so Spring will call the setDocFinder method to inject the dependency. The pattern used to find the setter method is to prepend set and capitalize the first character of the property name. The name attribute of the <property> element is case-sensitive.
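The setter-lookup naming rule is mechanical and can be expressed in a line or two. This short Python sketch only illustrates the naming convention described above (it is not Spring code):

```python
def setter_name(property_name):
    """Derive a JavaBean-style setter name: 'docFinder' -> 'setDocFinder'."""
    return "set" + property_name[0].upper() + property_name[1:]

print(setter_name("docFinder"))  # setDocFinder
print(setter_name("buildNo"))    # setBuildNo
# A lowercase 'f' produces a different, non-existent method name, which is
# why the name attribute is case-sensitive:
print(setter_name("docfinder"))  # setDocfinder
```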
So, if you set the name to docfinder, Spring will try to call the setDocfinder method and will show an error. Just like constructor DI, setter DI also supports supplying values for primitives and other simple types, as per the following snippet:

<bean id="docBase" class="com.packet.spring.setter.di.DocumentBase">
    <property name="buildNo" value="1.2.6" />
</bean>

Since a setter method takes only one argument, there is no scope for argument ambiguity. Whatever value you pass here, Spring will convert it to the actual type of the setter method's parameter, and if it's not compatible, it will show an error. We learned to implement DI with Spring and looked at different types of DI, such as setter-based injection and constructor-based injection. If you found this post useful, be sure to check out the book 'Java 9 Dependency Injection' to learn about the factory method in Spring and other concepts in dependency injection. Learning Dependency Injection (DI) Angular 2 Dependency Injection: A powerful design pattern

Build your first Reinforcement learning agent in Keras [Tutorial]

Amey Varangaonkar
20 Aug 2018
6 min read
Today there are a variety of tools available at your disposal to develop and train your own reinforcement learning agent. In this tutorial, we are going to build a Keras-RL agent for the CartPole environment. We will go through this example because it won't consume your GPU or your cloud budget to run. Also, this logic can be easily extended to other Atari problems. This article is an excerpt taken from the book Deep Learning Quick Reference, written by Mike Bernico. Let's talk quickly about the CartPole environment first:

CartPole: The CartPole environment consists of a pole balanced on a cart. The agent has to learn how to balance the pole vertically while the cart underneath it moves. The agent is given the position of the cart, the velocity of the cart, the angle of the pole, and the rotational rate of the pole as inputs. The agent can apply a force on either side of the cart. If the pole falls more than 15 degrees from vertical, it's game over for our agent.

The CartPole agent will use a fairly modest neural network that you should be able to train fairly quickly, even without a GPU. We will start by looking at the model architecture. Then we will define the network's memory, exploration policy, and finally, train the agent.

CartPole neural network architecture Three hidden layers with 16 neurons each are more than enough to solve this simple problem. We will use the following code to define the model:

def build_model(state_size, num_actions):
    input = Input(shape=(1, state_size))
    x = Flatten()(input)
    x = Dense(16, activation='relu')(x)
    x = Dense(16, activation='relu')(x)
    x = Dense(16, activation='relu')(x)
    output = Dense(num_actions, activation='linear')(x)
    model = Model(inputs=input, outputs=output)
    print(model.summary())
    return model

The input will be a 1 x state-space vector, and there will be an output neuron for each possible action, predicting the Q value of that action at each step.
By taking the argmax of the outputs, we can choose the action with the highest Q value, but we don't have to do that ourselves, as Keras-RL will do it for us.

Keras-RL Memory Keras-RL provides us with a class called rl.memory.SequentialMemory that provides a fast and efficient data structure that we can store the agent's experiences in:

memory = SequentialMemory(limit=50000, window_length=1)

We need to specify a maximum size for this memory object, which is a hyperparameter. As new experiences are added to this memory and it becomes full, old experiences are forgotten.

Keras-RL Policy Keras-RL provides an epsilon-greedy Q policy called rl.policy.EpsGreedyQPolicy that we can use to balance exploration and exploitation. We can use rl.policy.LinearAnnealedPolicy to decay our epsilon as the agent steps forward in the world, as shown in the following code:

policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.05, nb_steps=10000)

Here we're saying that we want to start with a value of 1 for epsilon, go no smaller than 0.1 during training, and use a value of 0.05 while testing. We set the number of steps over which epsilon decays from 1 to 0.1 to 10,000, and Keras-RL handles the decay math for us.

Agent With a model, memory, and policy defined, we're now ready to create a deep Q network agent and send those objects to it. Keras-RL provides an agent class called rl.agents.dqn.DQNAgent that we can use for this, as shown in the following code:

dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

Two of these parameters are probably unfamiliar at this point, target_model_update and nb_steps_warmup:

nb_steps_warmup: Determines how long we wait before we start doing experience replay, which, if you recall, is when we actually start training the network. This lets us build up enough experience to build a proper minibatch.
If you choose a value for this parameter that's smaller than your batch size, Keras-RL will sample with replacement.

target_model_update: The Q function is recursive, and when the agent updates its network for Q(s,a), that update also impacts the prediction it will make for Q(s', a). This can make for a very unstable network. The way most deep Q network implementations address this limitation is by using a target network, which is a copy of the deep Q network that isn't trained, but rather replaced with a fresh copy every so often. The target_model_update parameter controls how often this happens.

Keras-RL Training Keras-RL provides several Keras-like callbacks that allow for convenient model checkpointing and logging. We will use both of those callbacks below. If you would like to see more of the callbacks Keras-RL provides, they can be found here: https://github.com/matthiasplappert/keras-rl/blob/master/rl/callbacks.py. You can also find a Callback class that you can use to create your own Keras-RL callbacks. We will use the following code to train our model:

def build_callbacks(env_name):
    checkpoint_weights_filename = 'dqn_' + env_name + '_weights_{step}.h5f'
    log_filename = 'dqn_{}_log.json'.format(env_name)
    callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=5000)]
    callbacks += [FileLogger(log_filename, interval=100)]
    return callbacks

callbacks = build_callbacks(ENV_NAME)
dqn.fit(env, nb_steps=50000, visualize=False, verbose=2, callbacks=callbacks)

Once the agent's callbacks are built, we can fit the DQNAgent by using the .fit() method. Take note of the visualize parameter in this example. If visualize were set to True, we would be able to watch the agent interact with the environment as we went. However, this significantly slows down the training.
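Two of the schedules configured above reduce to simple arithmetic: the LinearAnnealedPolicy decay of epsilon, and the periodic hard refresh of the target network that this article describes. The following Python sketch is a hypothetical illustration of those schedules, not Keras-RL's internal code:

```python
def annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Linearly decay epsilon from value_max to value_min over nb_steps."""
    fraction = min(step / nb_steps, 1.0)
    return value_max - fraction * (value_max - value_min)

def target_sync_steps(total_steps, interval):
    """Steps at which a hard target-network update would copy the weights."""
    return [step for step in range(1, total_steps + 1) if step % interval == 0]

for step in (0, 2500, 5000, 10000, 20000):
    print(step, round(annealed_eps(step), 3))
# 0 1.0
# 2500 0.775
# 5000 0.55
# 10000 0.1
# 20000 0.1

print(target_sync_steps(1000, 250))  # [250, 500, 750, 1000]
```

Past nb_steps, epsilon stays clamped at value_min, so late in training the agent exploits far more than it explores.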
This means that the agent has learned to balance the pole on the cart until the environment ends at a maximum of 200 steps. It's of course fun to watch our success, so we can use the DQNAgent .test() method to evaluate for some number of episodes. The following code is used to call this method:

dqn.test(env, nb_episodes=5, visualize=True)

Here we've set visualize=True so we can watch our agent balance the pole, as shown in the following image: There we go, that's one balanced pole! Alright, I know, I'll admit that balancing a pole on a cart isn't all that cool, but it's a good enough demonstration of the process! Hopefully, you have now understood the dynamics behind the process, and as we discussed earlier, the solution to this problem can be applied to other similar game-based problems. If you found this article to be useful, make sure you check out the book Deep Learning Quick Reference to understand the other different types of reinforcement models you can build using Keras. Top 5 tools for reinforcement learning DeepCube: A new deep reinforcement learning approach solves the Rubik’s cube with no human help OpenAI builds reinforcement learning based system giving robots human like dexterity

Use App Metrics to analyze HTTP traffic, errors & network performance of a .NET Core app [Tutorial]

Aaron Lazar
20 Aug 2018
12 min read
App Metrics is an open source tool that can be plugged into ASP.NET Core applications. It provides real-time insights about how the application is performing and gives a complete overview of the application's health status. It provides metrics in a JSON format and integrates with Grafana dashboards for visual reporting. App Metrics is based on .NET Standard and runs cross-platform. It provides various extensions and reporting dashboards that can run on the Windows and Linux operating systems as well. In this article, we will focus on App Metrics and analyze HTTP traffic, errors, and network performance in .NET Core. This tutorial is an extract from the book C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan.

Setting up App Metrics with ASP.NET Core We can set up App Metrics in an ASP.NET Core application in a few easy steps, which are as follows:

Install App Metrics. App Metrics can be installed as NuGet packages. Here are the two packages that can be added through NuGet in your .NET Core project:

Install-Package App.Metrics
Install-Package App.Metrics.AspnetCore.Mvc

Add App Metrics in Program.cs. Add UseMetrics to Program.cs in the BuildWebHost method, as follows:

public static IWebHost BuildWebHost(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        .UseMetrics()
        .UseStartup<Startup>()
        .Build();

Add App Metrics in Startup.cs. Finally, we can add a metrics resource filter in the ConfigureServices method of the Startup class as follows:

public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc(options => options.AddMetricsResourceFilter());
}

Run your application. Build and run the application. We can test whether App Metrics is running well by using the URLs shown in the following table.
Just append the URL to the application's root URL:

/metrics: Shows metrics using the configured metrics formatter
/metrics-text: Shows metrics using the configured text formatter
/env: Shows environment information, which includes the operating system, machine name, assembly name, and version

Appending /metrics or /metrics-text to the application's root URL gives complete information about application metrics. /metrics returns a JSON response that can be parsed and represented in a view with some custom parsing.

Tracking middleware With App Metrics, we can manually define the typical web metrics which are essential to record telemetry information. However, for ASP.NET Core, there is a tracking middleware that can be used and configured in the project, which contains some built-in key metrics specific to the web application. The metrics recorded by the tracking middleware are as follows:

Apdex: This is used to monitor the user's satisfaction based on the overall performance of the application. Apdex is an open industry standard that measures the user's satisfaction based on the application's response time. We can configure the threshold time, T, for each request cycle, and the metrics are calculated based on the following conditions:

Satisfactory: If the response time is less than or equal to the threshold time (T)
Tolerating: If the response time is between the threshold time (T) and 4 times the threshold time (T) in seconds
Frustrating: If the response time is greater than 4 times the threshold time (T)

Response times: This provides the overall throughput of the requests being processed by the application and the duration it takes per route within the application.

Active requests: This provides the list of active requests which have been received on the server in a particular amount of time.
Errors: This provides the aggregated error results as percentages, which include the overall error request rate, the overall count of each uncaught exception type, the total number of error requests per HTTP status code, and so on.

POST and PUT sizes: This provides the request sizes for HTTP POST and PUT requests.

Adding tracking middleware We can add the tracking middleware as a NuGet package as follows:

Install-Package App.Metrics.AspNetCore.Tracking

Tracking middleware provides a set of middleware components, each added to record telemetry for a specific metric. We can add the following middleware in the Configure method to measure performance metrics:

app.UseMetricsApdexTrackingMiddleware();
app.UseMetricsRequestTrackingMiddleware();
app.UseMetricsErrorTrackingMiddleware();
app.UseMetricsActiveRequestMiddleware();
app.UseMetricsPostAndPutSizeTrackingMiddleware();
app.UseMetricsOAuth2TrackingMiddleware();

Alternatively, we can use the meta-pack middleware, which adds all the available tracking middleware at once so that we have information about all the different metrics shown in the preceding code:

app.UseMetricsAllMiddleware();

Next, we will add the tracking middleware services in our ConfigureServices method as follows:

services.AddMetricsTrackingMiddleware();

In the main Program.cs class, we will modify the BuildWebHost method and add the UseMetricsWebTracking method as follows:

public static IWebHost BuildWebHost(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        .UseMetrics()
        .UseMetricsWebTracking()
        .UseStartup<Startup>()
        .Build();

Setting up configuration Once the middleware is added, we need to set up the default threshold and other configuration values so that reporting can be generated accordingly. The web tracking properties can be configured in the appsettings.json file.
Here is the content of the appsettings.json file that contains the MetricsWebTrackingOptions JSON key:

"MetricsWebTrackingOptions": {
    "ApdexTrackingEnabled": true,
    "ApdexTSeconds": 0.1,
    "IgnoredHttpStatusCodes": [ 404 ],
    "IgnoredRoutesRegexPatterns": [],
    "OAuth2TrackingEnabled": true
},

ApdexTrackingEnabled is set to true so that the customer satisfaction report will be generated, and ApdexTSeconds is the threshold that decides whether the request response time was satisfactory, tolerating, or frustrating. IgnoredHttpStatusCodes contains the list of status codes to ignore; here, responses returning a 404 status are ignored. IgnoredRoutesRegexPatterns is used to ignore specific URIs that match a regular expression, and OAuth2TrackingEnabled can be set to monitor and record the metrics for each client, providing information specific to the request rate, error rate, and POST and PUT sizes per client. Run the application and do some navigation. Appending /metrics-text to your application URL will display the complete report in textual format. Here is a sample snapshot of what the textual metrics look like:

Adding visual reports There are various extensions and reporting plugins available that provide a visual reporting dashboard. Some of them are GrafanaCloud Hosted Metrics, InfluxDB, Prometheus, ElasticSearch, Graphite, HTTP, Console, and Text File. We will configure the InfluxDB extension and see how visual reporting can be achieved.

Setting up InfluxDB InfluxDB is an open source time series database developed by InfluxData. It is written in the Go language and is widely used to store time series data for real-time analytics. Grafana is the server that provides reporting dashboards that can be viewed through a browser. InfluxDB can easily be imported as an extension in Grafana to display visual reporting from the InfluxDB database.
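Before wiring up dashboards, it helps to be concrete about what the Apdex numbers on those dashboards mean. Apdex is an open standard, and with a threshold T such as the ApdexTSeconds value of 0.1 shown above, the usual score is (satisfied + tolerating / 2) / total. A small Python sketch of that rule (illustrative, not App Metrics' implementation):

```python
def apdex(response_times, t):
    """Classify each response time against threshold t and compute the score."""
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)

# Hypothetical response times in seconds, with T = 0.1:
# 3 satisfied (<= 0.1), 1 tolerating (<= 0.4), 1 frustrating (> 0.4)
times = [0.05, 0.08, 0.1, 0.3, 0.5]
print(apdex(times, 0.1))  # 0.7
```

A score of 1.0 means every request was satisfactory; scores near 0 mean most requests frustrated users.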
Setting up the Windows Subsystem for Linux In this section, we will set up InfluxDB on the Windows Subsystem for Linux. First of all, we need to enable the Windows Subsystem for Linux by executing the following command from PowerShell as an Administrator:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

After running the preceding command, restart your computer. Next, we will install a Linux distro from the Microsoft Store. In our case, we will install Ubuntu. Go to the Microsoft Store, search for Ubuntu, and install it. Once the installation is done, click on Launch. This will open up a console window, which will ask you to create a user account for the Linux OS; specify the username and password that will be used and hit Enter. Then, run the following command from the bash shell to update Ubuntu to the latest stable version. To run bash, open the command prompt, write bash, and hit Enter.

Installing InfluxDB Here, we will go through the steps to install the InfluxDB database on Ubuntu:

To set up InfluxDB, open a command prompt in Administrator mode and run the bash shell. Execute the following commands to add the InfluxDB repository on your local PC:

$ curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -
$ source /etc/lsb-release
$ echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list

Install InfluxDB by executing the following command:

$ sudo apt-get update && sudo apt-get install influxdb

Execute the following command to run InfluxDB:

$ sudo influxd

Start the InfluxDB shell by running the following command:

$ sudo influx

This will open up the shell where database-specific commands can be executed. Create a database by executing the following command.
Specify a meaningful name for the database. In our case, it is appmetricsdb:

> create database appmetricsdb

Installing Grafana Grafana is an open source tool used to display dashboards in a web interface. There are various dashboards available that can be imported from the Grafana website to display real-time analytics. Grafana can simply be downloaded as a zip file from http://docs.grafana.org/installation/windows/. Once it is downloaded, we can start the Grafana server by clicking on the grafana-server.exe executable in the bin directory. Grafana provides a website that listens on port 3000. If the Grafana server is running, we can access the site by navigating to http://localhost:3000.

Adding the InfluxDB dashboard There is an out-of-the-box InfluxDB dashboard available in Grafana which can be imported from the following link: https://grafana.com/dashboards/2125. Copy the dashboard ID and use it to import the dashboard into the Grafana website. We can import the InfluxDB dashboard by going to the Manage option on the Grafana website. From the Manage option, click on the + Dashboard button and hit the New Dashboard option. Clicking on Import Dashboard will lead to Grafana asking you for the dashboard ID. Paste the dashboard ID (for example, 2125) copied earlier into the box and hit Tab. The system will show the dashboard's details, and clicking on the Import button will import it into the system.

Configuring InfluxDB We will now configure the InfluxDB dashboard and add a data source that connects to the database we just created. To proceed, go to the Data Sources section on the Grafana website, click on the Add New Datasource option, and add the configuration that points the data source at the InfluxDB database.

Modifying the Configure and ConfigureServices methods in Startup Up to now, we have set up Ubuntu and the InfluxDB database on our machine. We also set up the InfluxDB data source and added a dashboard through the Grafana website.
Next, we will configure our ASP.NET Core web application to push real-time information to the InfluxDB database. Here is the modified ConfigureServices method that initializes the MetricsBuilder to define the attributes related to the application name, environment, and connection details:

public void ConfigureServices(IServiceCollection services)
{
    var metrics = new MetricsBuilder()
        .Configuration.Configure(
            options =>
            {
                options.WithGlobalTags((globalTags, info) =>
                {
                    globalTags.Add("app", info.EntryAssemblyName);
                    globalTags.Add("env", "stage");
                });
            })
        .Report.ToInfluxDb(
            options =>
            {
                options.InfluxDb.BaseUri = new Uri("http://127.0.0.1:8086");
                options.InfluxDb.Database = "appmetricsdb";
                options.HttpPolicy.Timeout = TimeSpan.FromSeconds(10);
            })
        .Build();

    services.AddMetrics(metrics);
    services.AddMetricsReportScheduler();
    services.AddMetricsTrackingMiddleware();
    services.AddMvc(options => options.AddMetricsResourceFilter());
}

In the preceding code, we have set the application name app to the assembly name and the environment env to stage. http://127.0.0.1:8086 is the URL of the InfluxDB server that listens for the telemetry being pushed by the application, and appmetricsdb is the database we created in the preceding section. Then, we added the AddMetrics middleware and passed in the metrics configuration. AddMetricsTrackingMiddleware is used to track the web telemetry information which is displayed on the dashboard, and AddMetricsReportScheduler is used to push the telemetry information to the database on a schedule. Here is the Configure method that contains UseMetricsAllMiddleware to use App Metrics.
UseMetricsAllMiddleware adds all the middleware available in App Metrics:

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseBrowserLink();
        app.UseDeveloperExceptionPage();
    }
    else
    {
        app.UseExceptionHandler("/Error");
    }
    app.UseStaticFiles();
    app.UseMetricsAllMiddleware();
    app.UseMvc();
}

Rather than calling UseMetricsAllMiddleware, we can also add individual middleware explicitly, based on our requirements. Here is the list of middleware that can be added:

app.UseMetricsApdexTrackingMiddleware();
app.UseMetricsRequestTrackingMiddleware();
app.UseMetricsErrorTrackingMiddleware();
app.UseMetricsActiveRequestMiddleware();
app.UseMetricsPostAndPutSizeTrackingMiddleware();
app.UseMetricsOAuth2TrackingMiddleware();

Testing the ASP.NET Core App and reporting on the Grafana dashboard To test the ASP.NET Core application and see visual reporting on the Grafana dashboard, we will go through the following steps: Start the Grafana server by running {installation_directory}\bin\grafana-server.exe. Start bash from the command prompt and run the sudo influxd command to start the InfluxDB server. Start another bash from the command prompt and run the sudo influx command. Run the ASP.NET Core application. Access http://localhost:3000 and click on the App Metrics dashboard. This will start gathering telemetry information and will display the performance metrics, as shown in the following screenshots: The following graph shows the total throughput in Requests Per Minute (RPM), the error percentage, and active requests: Here is the Apdex score, colorizing the user satisfaction into three different colors, where red is frustrating, orange is tolerating, and green is satisfactory.
The following graph shows the blue line being drawn on the green bar, which means that the application performance is satisfactory: The following snapshot shows the throughput graph for all the requests being made, and each request has been colorized with the different colors: red, orange, and green. In this case, there are two HTTP GET requests for the about and contact us pages: Here is the response time graph showing the response time of both requests: If you liked this article and would like to learn more such techniques, go and pick up the full book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Get to know ASP.NET Core Web API [Tutorial] How to call an Azure function from an ASP.NET Core MVC application ASP.NET Core High Performance

Best practices for C# code optimization [Tutorial]

Aaron Lazar
17 Aug 2018
9 min read
There are many factors that negatively impact the performance of a .NET Core application. Sometimes these are minor things that were not considered at the time of writing the code and are not addressed by the accepted best practices. As a result, to solve these problems, programmers often resort to ad hoc solutions. However, when bad practices are combined, they produce performance issues. It is always better to know the best practices that help developers write cleaner code and make the application performant. In this article, we will learn about the following topics:

Boxing and unboxing overhead
String concatenation
Exception handling
for versus foreach
Delegates

This tutorial is an extract from the book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan.

Boxing and unboxing overhead The boxing and unboxing methods are not always good to use, and they negatively impact the performance of mission-critical applications. Boxing is a method of converting a value type to an object type, and is done implicitly, whereas unboxing is a method of converting an object type back to a value type and requires explicit casting. Let's go through an example where we have two methods executing a loop of a million iterations, and in each iteration, they increment a counter by 1.
The AvoidBoxingUnboxing method uses a primitive integer and increments it on each iteration, whereas the BoxingUnboxing method boxes by assigning the numeric value to an object type first and then unboxes it on each iteration to convert it back to the integer type, as shown in the following code:

private static void AvoidBoxingUnboxing()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    int counter = 0;
    for (int i = 0; i < 1000000; i++)
    {
        counter = i + 1;
    }
    watch.Stop();
    Console.WriteLine($"Time taken {watch.ElapsedMilliseconds}");
}

private static void BoxingUnboxing()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    //Boxing
    object counter = 0;
    for (int i = 0; i < 1000000; i++)
    {
        //Unboxing
        counter = (int)i + 1;
    }
    watch.Stop();
    Console.WriteLine($"Time taken {watch.ElapsedMilliseconds}");
}

When we run both methods, we will clearly see the difference in performance. The BoxingUnboxing method executes seven times slower than the AvoidBoxingUnboxing method, as shown in the following screenshot:

For mission-critical applications, it's always better to avoid boxing and unboxing. However, in .NET Core, we have many other types that internally use objects and perform boxing and unboxing. Most of the types under System.Collections and System.Collections.Specialized use objects and object arrays for internal storage, and when we store primitive types in these collections, they perform boxing and convert each primitive value to an object type, adding extra overhead and negatively impacting the performance of the application. Other types under System.Data, namely DataSet, DataTable, and DataRow, also use object arrays under the hood. Types under the System.Collections.Generic namespace or typed arrays are the best approaches to use when performance is the primary concern. For example, HashSet<T>, LinkedList<T>, and List<T> are all generic collections.
For example, here is a program that stores integer values in an ArrayList:

private static void AddValuesInArrayList()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    ArrayList arr = new ArrayList();
    for (int i = 0; i < 1000000; i++)
    {
        arr.Add(i);
    }
    watch.Stop();
    Console.WriteLine($"Total time taken is {watch.ElapsedMilliseconds}");
}

Let's write another program that uses a generic list of the integer type:

private static void AddValuesInGenericList()
{
    Stopwatch watch = new Stopwatch();
    watch.Start();
    List<int> lst = new List<int>();
    for (int i = 0; i < 1000000; i++)
    {
        lst.Add(i);
    }
    watch.Stop();
    Console.WriteLine($"Total time taken is {watch.ElapsedMilliseconds}");
}

When running both programs, the differences are pretty noticeable. The code with the generic List<int> is over 10 times faster than the code with ArrayList.

String concatenation In .NET, strings are immutable objects. Two string variables refer to the same memory on the heap until the string value is changed. If either string is changed, a new string is created on the heap and allocated a new memory space. Immutable objects are generally thread-safe and eliminate race conditions between multiple threads: any change in a string value creates and allocates a new object in memory, avoiding conflicting scenarios with multiple threads. For example, let's initialize a string and assign the Hello World value to the a string variable:

String a = "Hello World";

Now, let's assign the a string variable to another variable, b:

String b = a;

Both a and b point to the same value on the heap, as shown in the following diagram. Now, suppose we change the value of b to Hope this helps:

b = "Hope this helps";

This will create another object on the heap, where a points to the same memory and b refers to the new memory space that contains the new text. With each change to a string, a new memory space is allocated.
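The same behavior can be observed interactively. Python strings are also immutable, so this short sketch (a cross-language analogy, not C# code) demonstrates the rebinding described above, along with a builder-style accumulation analogous to StringBuilder:

```python
import io

a = "Hello World"
b = a
print(a is b)        # True  -> both names refer to the same object
b = "Hope this helps"
print(a is b)        # False -> reassignment created a distinct object

# Builder-style accumulation: append parts to a buffer, materialize once,
# instead of allocating a new string on every concatenation.
builder = io.StringIO()
for word in ("Hope", " this", " helps"):
    builder.write(word)
print(builder.getvalue())  # Hope this helps
```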
In some cases, it may be overkill: where the frequency of string modification is high, each modification allocates a separate memory space and creates work for the garbage collector in collecting the unused objects and freeing up space. In such a scenario, it is highly recommended that you use the StringBuilder class.

Exception handling Improper handling of exceptions can also decrease the performance of an application. The following list contains some of the best practices for dealing with exceptions in .NET Core:

Always use a specific exception type, or a type that can catch the exception, for the code you have written in the method. Using the Exception type for all cases is not a good practice.

It is always a good practice to use try, catch, and finally blocks where the code can throw exceptions. The finally block is usually used to clean up resources and return the proper response that the calling code is expecting.

In deeply nested code, don't use a try catch block at every level; instead, let the exception propagate and handle it in the calling method or main method. Catching exceptions at multiple stack levels slows down performance and is not recommended.

Always use exceptions for fatal conditions that terminate the program. Using exceptions for noncritical conditions, such as converting a value to an integer or reading a value from an empty array, is not recommended and should be handled through custom logic. For example, converting a string value to the integer type can be done by using the Int32.Parse method rather than the Convert.ToInt32 method, and then failing at the point where the string is not represented as a digit.

While throwing an exception, add a meaningful message so that the user knows where the exception actually occurred, rather than having to go through the stack trace.
For example, the following code shows a way of throwing an exception and adding a custom message based on the method and class being called:

    static string GetCountryDetails(Dictionary<string, string> countryDictionary, string key)
    {
        try
        {
            return countryDictionary[key];
        }
        catch (KeyNotFoundException ex)
        {
            KeyNotFoundException argEx = new KeyNotFoundException(
                "Error occurred while executing GetCountryDetails method. Cause: Key not found", ex);
            throw argEx;
        }
    }

- Throw exceptions rather than returning custom messages or error codes, and handle them in the main calling method.
- When logging exceptions, always check the inner exception and read the exception message and stack trace. This is helpful and gives the actual point in the code where the error was thrown.

For vs foreach for and foreach are two alternative ways of iterating over a list of items, and each operates in a different way. The for loop actually loads all the items of the list in memory first and then uses an indexer to iterate over each element, whereas foreach uses an enumerator and iterates until it reaches the end of the list. The following table shows the types of collections that are good for for and foreach:

    Type                    For/Foreach
    Typed array             Good for both
    Array list              Better with for
    Generic collections     Better with for

Delegates Delegates are a type in .NET that holds a reference to a method. The type is equivalent to a function pointer in C or C++. When defining a delegate, we can specify both the parameters that the method can take and its return type. This way, the referenced methods will have the same signature. Here is a simple delegate that takes a string and returns an integer:

    delegate int Log(string n);

Now, suppose we have a LogToConsole method that has the same signature as the one shown in the following code.
This method takes the string and writes it to the console window:

    static int LogToConsole(string a)
    {
        Console.WriteLine(a);
        return 1;
    }

We can initialize and use this delegate like this:

    Log logDelegate = LogToConsole;
    logDelegate("This is a simple delegate call");

Suppose we have another method called LogToDatabase that writes the information to the database:

    static int LogToDatabase(string a)
    {
        Console.WriteLine(a);
        //Log to database
        return 1;
    }

Here is the initialization of the new logDelegate instance that references the LogToDatabase method:

    Log logDelegateDatabase = LogToDatabase;
    logDelegateDatabase("This is a simple delegate call");

The preceding delegates are unicast delegates, as each instance refers to a single method. On the other hand, we can also create a multicast delegate by adding LogToDatabase to the same logDelegate instance, as follows:

    Log logDelegate = LogToConsole;
    logDelegate += LogToDatabase;
    logDelegate("This is a simple delegate call");

The preceding code seems pretty straightforward and optimized, but under the hood, it has a huge performance overhead. In .NET, delegates are implemented by the MulticastDelegate class, which is optimized to run unicast delegates: it stores the reference of the method in the target property and calls the method directly. For multicast delegates, it uses the invocation list, which is a generic list that holds the references to each method that is added. With multicast delegates, each target property holds a reference to the generic list that contains the methods, which execute in sequence. This adds overhead for multicast delegates, which take more time to execute. If you liked this article and would like to learn more such techniques, grab this book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Behavior Scripting in C# and Javascript for game developers Exciting New Features in C# 8.0 Exploring Language Improvements in C# 7.2 and 7.3
Use Rust for web development [Tutorial]

Aaron Lazar
17 Aug 2018
13 min read
You might think that Rust is only meant to be used for complex system development, or that it should be used where security is the number one concern. Thinking of using it for web development might sound to you like huge overkill. We already have proven web-oriented languages that have worked until now, such as PHP or JavaScript, right? This is far from true. Many projects use the web as their platform and for them, it's sometimes more important to be able to receive a lot of traffic without investing in expensive servers rather than using legacy technologies, especially in new products. This is where Rust comes in handy. Thanks to its speed and some really well thought out web-oriented frameworks, Rust performs even better than the legacy web programming languages. In this tutorial, we'll see how Rust can be used for Web Development. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Rust is even trying to replace some of the JavaScript on the client side of applications, since Rust can compile to WebAssembly, making it extremely powerful for heavy client-side web workloads. Creating extremely efficient web templates We have seen that Rust is a really efficient language and metaprogramming allows for the creation of even more efficient code. Rust has great templating language support, such as Handlebars and Tera. Rust's Handlebars implementation is much faster than the JavaScript implementation, while Tera is a template engine created for Rust based on Jinja2. In both cases, you define a template file and then you use Rust to parse it. Even though this will be reasonable for most web development, in some cases, it might be slower than pure Rust alternatives. This is where the Maud crate comes in. We will see how it works and how it achieves orders of magnitude faster performance than its counterparts. To use Maud, you will need nightly Rust, since it uses procedural macros. 
As we saw in previous chapters, if you are using rustup you can simply run rustup override set nightly. Then, you will need to add Maud to your Cargo.toml file in the [dependencies] section:

    [dependencies]
    maud = "0.17.2"

Maud brings an html!{} procedural macro that enables you to write HTML in Rust. You will, therefore, need to import the necessary crate and macro in your main.rs or lib.rs file, as you will see in the following code. Remember to also add the procedural macro feature at the beginning of the crate:

    #![feature(proc_macro)]

    extern crate maud;
    use maud::html;

You will now be able to use the html!{} macro in your main() function. This macro will return a Markup object, which you can then convert to a String or return to Rocket or Iron for your website implementation (you will need to use the relevant Maud features in that case). Let's see what a short template implementation looks like:

    fn main() {
        use maud::PreEscaped;
        let user_name = "FooBar";
        let markup = html! {
            (PreEscaped("<!DOCTYPE html>"))
            html {
                head {
                    title { "Test website" }
                    meta charset="UTF-8";
                }
                body {
                    header {
                        nav {
                            ul {
                                li { "Home" }
                                li { "Contact Us" }
                            }
                        }
                    }
                    main {
                        h1 { "Welcome to our test template!" }
                        p { "Hello, " (user_name) "!" }
                    }
                    footer {
                        p { "Copyright © 2017 - someone" }
                    }
                }
            }
        };
        println!("{}", markup.into_string());
    }

It seems like a complex template, but it contains just the basic information a new website should have. We first add a doctype, making sure it will not be escaped (that is what the PreEscaped wrapper is for), and then we start the HTML document with two parts: the head and the body. In the head, we add the required title and the charset meta element to tell the browser that we will be using UTF-8. Then, the body contains the three usual sections, even though this can, of course, be modified: one header, one main section, and one footer.
I added some example information in each of the sections and showed you how to add a dynamic variable in the main section inside a paragraph. The interesting syntax here is that you can create elements with attributes, such as the meta element, even without content, by finishing it early with a semicolon. You can use any HTML tag and add variables. The generated code will be escaped, except if you ask for non-escaped data, and it will be minified so that it occupies the least space when being transmitted. Inside the parentheses, you can call any function or variable that returns a type that implements the Display trait and you can even add any Rust code if you add braces around it, with the last statement returning a Display element. This works on attributes too. This gets processed at compile time, so that at runtime it will only need to perform the minimum possible amount of work, making it extremely efficient. And not only that; the template will be typesafe thanks to Rust's compile-time guarantees, so you won't forget to close a tag or an attribute. There is a complete guide to the templating engine that can be found at https://maud.lambda.xyz/. Connecting with a database If we want to use SQL/relational databases in Rust, there is no other crate to think about than Diesel. If you need access to NoSQL databases such as Redis or MongoDB, you will also find proper crates, but since the most used databases are relational databases, we will check Diesel here. Diesel makes working with MySQL/MariaDB, PostgreSQL, and SQLite very easy by providing a great ORM and typesafe query builder. It prevents all potential SQL injections at compile time, but is still extremely fast. In fact, it's usually faster than using prepared statements, due to the way it manages connections to databases. Without entering into technical details, we will check how this stable framework works. The development of Diesel has been impressive and it's already working in stable Rust. 
It even has a stable 1.x version, so let's check how we can map a simple table. Diesel comes with a command-line interface program, which makes it much easier to use. To install it, run cargo install diesel_cli. Note that, by default, this will try to install it for PostgreSQL, MariaDB/MySQL, and SQLite. For this short tutorial, you need to have SQLite 3 development files installed, but if you want to avoid installing all MariaDB/MySQL or PostgreSQL files, you should run the following command: cargo install --no-default-features --features sqlite diesel_cli Then, since we will be using SQLite for our short test, add a file named .env to the current directory, with the following content: DATABASE_URL=test.sqlite We can now run diesel setup and diesel migration generate initial_schema. This will create the test.sqlite SQLite database and a migrations folder, with the first empty initial schema migration. Let's add this to the initial schema up.sql file: CREATE TABLE 'users' ( 'username' TEXT NOT NULL PRIMARY KEY, 'password' TEXT NOT NULL, 'email' TEXT UNIQUE ); In its counterpart down.sql file, we will need to drop the created table: DROP TABLE `users`; Then, we can execute diesel migration run and check that everything went smoothly. We can execute diesel migration redo to check that the rollback and recreation worked properly. We can now start using the ORM. We will need to add diesel, diesel_infer_schema, and dotenv to our Cargo.toml. The dotenv crate will read the .env file to generate the environment variables. If you want to avoid using all the MariaDB/MySQL or PostgreSQL features, you will need to configure diesel for it: [dependencies] dotenv = "0.10.1" [dependencies.diesel] version = "1.1.1" default-features = false features = ["sqlite"] [dependencies.diesel_infer_schema] version = "1.1.0" default-features = false features = ["sqlite"] Let's now create a structure that we will be able to use to retrieve data from the database. 
We will also need some boilerplate code to make everything work: #[macro_use] extern crate diesel; #[macro_use] extern crate diesel_infer_schema; extern crate dotenv; use diesel::prelude::*; use diesel::sqlite::SqliteConnection; use dotenv::dotenv; use std::env; #[derive(Debug, Queryable)] struct User { username: String, password: String, email: Option<String>, } fn establish_connection() -> SqliteConnection { dotenv().ok(); let database_url = env::var("DATABASE_URL") .expect("DATABASE_URL must be set"); SqliteConnection::establish(&database_url) .expect(&format!("error connecting to {}", database_url)) } mod schema { infer_schema!("dotenv:DATABASE_URL"); } Here, the establish_connection() function will call dotenv() so that the variables in the .env file get to the environment, and then it uses that DATABASE_URL variable to establish the connection with the SQLite database and returns the handle. The schema module will contain the schema of the database. The infer_schema!() macro will get the DATABASE_URL variable and connect to the database at compile time to generate the schema. Make sure you run all the migrations before compiling. We can now develop a small main() function with the basics to list all of the users from the database: fn main() { use schema::users::dsl::*; let connection = establish_connection(); let all_users = users .load::<User>(&connection) .expect("error loading users"); println!("{:?}", all_users); } This will just load all of the users from the database into a list. Notice the use statement at the beginning of the function. This retrieves the required information from the schema for the users table so that we can then call users.load(). As you can see in the guides at diesel.rs, you can also generate Insertable objects, which might not have some of the fields with default values, and you can perform complex queries by filtering the results in the same way you would write a SELECT statement. 
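Diesel's query builder composes filters much like Rust's iterator adapters compose over collections. As a plain-Rust analogy of that filtering style (no database involved; the User struct and sample data here are illustrative, not Diesel API calls):

```rust
// Illustrative stand-in for a row type that Diesel would map from the
// `users` table defined in the migration above.
#[derive(Debug, Clone)]
struct User {
    username: String,
    password: String,
    email: Option<String>,
}

// Keep only users that have an email set, the way a
// `SELECT ... WHERE email IS NOT NULL` filter would.
fn users_with_email(users: &[User]) -> Vec<User> {
    users
        .iter()
        .filter(|u| u.email.is_some())
        .cloned()
        .collect()
}

fn main() {
    let users = vec![
        User {
            username: "foo".into(),
            password: "secret".into(),
            email: Some("foo@example.com".into()),
        },
        User {
            username: "bar".into(),
            password: "hunter2".into(),
            email: None,
        },
    ];
    let with_email = users_with_email(&users);
    assert_eq!(with_email.len(), 1);
    println!("{:?}", with_email);
}
```

The difference with Diesel is that the equivalent filter is checked against the inferred schema at compile time, so a typo in a column name becomes a compilation error rather than a runtime SQL error.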
Creating a complete web server There are multiple web frameworks for Rust. Some of them work in stable Rust, such as Iron and Nickel Frameworks, and some don't, such as Rocket. We will talk about the latter since, even if it forces you to use the latest nightly branch, it's so much more powerful than the rest that it really makes no sense to use any of the others if you have the option to use Rust nightly. Using Diesel with Rocket, apart from the funny wordplay joke, works seamlessly. You will probably be using the two of them together, but in this section, we will learn how to create a small Rocket server without any further complexity. There are some boilerplate code implementations that add a database, cache, OAuth, templating, response compression, JavaScript minification, and SASS minification to the website, such as my Rust web template in GitHub if you need to start developing a real-life Rust web application. Rocket trades that nightly instability, which will break your code more often than not, for simplicity and performance. Developing a Rocket application is really easy and the performance of the results is astonishing. It's even faster than using some other, seemingly simpler frameworks, and of course, it's much faster than most of the frameworks in other languages. So, how does it feel to develop a Rocket application? We start by adding the latest rocket and rocket_codegen crates to our Cargo.toml file and adding a nightly override to our current directory by running rustup override set nightly. The rocket crate contains all the code to run the server, while the rocket_codegen crate is actually a compiler plugin that modifies the language to adapt it for web development. We can now write the default Hello, world! Rocket example: #![feature(plugin)] #![plugin(rocket_codegen)] extern crate rocket; #[get("/")] fn index() -> &'static str { "Hello, world!" 
} fn main() { rocket::ignite().mount("/", routes![index]).launch(); } In this example, we can see how we ask Rust to let us use plugins to then import the rocket_codegen plugin. This will enable us to use attributes such as #[get] or #[post] with request information that will generate boilerplate code when compiled, leaving our code fairly simple for our development. Also, note that this code has been checked with Rocket 0.3 and it might fail in a future version, since the library is not stable yet. In this case, you can see that the index() function will respond to any GET request with a base URL. This can be modified to accept only certain URLs or to get the path of something from the URL. You can also have overlapping routes with different priorities so that if one is not taken for a request guard, the next will be tried. And, talking about request guards, you can create objects that can be generated when processing a request that will only let the request process a given function if they are properly built. This means that you can, for example, create a User object that will get generated by checking the cookies in the request and comparing them in a Redis database, only allowing the execution of the function for logged-in users. This easily prevents many logic flaws. The main() function ignites the Rocket and mounts the index route at /. This means that you can have multiple routes with the same path mounted at different route paths and they do not need to know about the whole path in the URL. In the end, it will launch the Rocket server and if you run it with cargo run, it will show the following: If you go to the URL, you will see the Hello, World! message. Rocket is highly configurable. It has a rocket_contrib crate which offers templates and further features, and you can create responders to add GZip compression to responses. You can also create your own error responders when an error occurs. 
You can also configure the behavior of Rocket by using the Rocket.toml file and environment variables. As you can see in this last output, it is running in development mode, which adds some debugging information. You can configure different behaviors for staging and production modes and make them perform faster. Also, make sure that you compile the code in --release mode in production. If you want to develop a web application in Rocket, make sure you check https://rocket.rs/ for further information. Future releases also look promising. Rocket will implement native CSRF and XSS prevention, which, in theory, should prevent all XSS and CSRF attacks at compile time. It will also make further customizations to the engine possible. If you found this article useful and would like to learn more such tips, head over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza. Mozilla is building a bridge between Rust and JavaScript Perform Advanced Programming with Rust Say hello to Sequoia: a new Rust based OpenPGP library to secure your apps

Task parallel library for easy multi-threading in .NET Core [Tutorial]

Aaron Lazar
16 Aug 2018
11 min read
Compared to the classic threading model in .NET, the Task Parallel Library (TPL) minimizes the complexity of using threads and provides an abstraction through a set of APIs that help developers focus more on the application program instead of on how the threads will be provisioned. In this article, we'll learn how TPL improves on traditional threading techniques for concurrency and high performance. There are several benefits of using TPL over threads:

- It autoscales the concurrency to a multicore level
- It autoscales LINQ queries to a multicore level
- It handles the partitioning of the work and uses ThreadPool where required
- It is easy to use and reduces the complexity of working with threads directly

This tutorial is an extract from the book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Creating a task using TPL TPL APIs are available in the System.Threading and System.Threading.Tasks namespaces. They work around the task, which is a program or a block of code that runs asynchronously. An asynchronous task can be run by calling either the Task.Run or TaskFactory.StartNew method. When we create a task, we provide a named delegate, anonymous method, or lambda expression that the task executes. Here is a code snippet that uses a lambda expression to execute the ExecuteLongRunningTask method using Task.Run:

    class Program
    {
        static void Main(string[] args)
        {
            Task t = Task.Run(() => ExecuteLongRunningTask(5000));
            t.Wait();
        }

        public static void ExecuteLongRunningTask(int millis)
        {
            Thread.Sleep(millis);
            Console.WriteLine("Hello World");
        }
    }

In the preceding code snippet, we executed the ExecuteLongRunningTask method asynchronously using the Task.Run method. The Task.Run method returns a Task object that can be used to wait for the asynchronous piece of code to complete before the program ends. To wait for the task, we used the Wait method.
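The same pattern works when the background work produces a value, using Task<TResult> and its Result property. A minimal sketch (the method name and delay are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Task.Run can also wrap a delegate that returns a value,
        // giving back a Task<int> instead of a plain Task.
        Task<int> t = Task.Run(() => ComputeLength("Hello World", 500));

        // Reading Result blocks the caller until the task completes,
        // much like calling Wait() and then reading the value.
        Console.WriteLine($"Length is {t.Result}");
    }

    static int ComputeLength(string text, int millis)
    {
        Thread.Sleep(millis); // simulate a long running operation
        return text.Length;
    }
}
```

Note that reading Result (like Wait) blocks synchronously; in real code you would usually await the task instead, as the TAP section below discusses.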
Alternatively, we can also use the Task.Factory.StartNew method, which is more advanced and provides more options. While calling the Task.Factory.StartNew method, we can specify CancellationToken, TaskCreationOptions, and TaskScheduler to set the state, specify other options, and schedule tasks. TPL uses multiple cores of the CPU out of the box. When the task is executed using the TPL API, it automatically splits the task into one or more threads and utilizes multiple processors, if they are available. The decision as to how many threads will be created is calculated at runtime by CLR. Whereas a thread only has an affinity to a single processor, running any task on multiple processors needs a proper manual implementation. Task-based asynchronous pattern (TAP) When developing any software, it is always good to implement the best practices while designing its architecture. The task-based asynchronous pattern is one of the recommended patterns that can be used when working with TPL. There are, however, a few things to bear in mind while implementing TAP. Naming convention The method executing asynchronously should have the naming suffix Async. For example, if the method name starts with ExecuteLongRunningOperation, it should have the suffix Async, with the resulting name of ExecuteLongRunningOperationAsync. Return type The method signature should return either a System.Threading.Tasks.Task or System.Threading.Tasks.Task<TResult>. The task's return type is equivalent to the method that returns void, whereas TResult is the data type. Parameters The out and ref parameters are not allowed as parameters in the method signature. If multiple values need to be returned, tuples or a custom data structure can be used. The method should always return Task or Task<TResult>, as discussed previously. 
Here are a few signatures for both synchronous and asynchronous methods:

    Synchronous method                          Asynchronous method
    void Execute();                             Task ExecuteAsync();
    List<string> GetCountries();                Task<List<string>> GetCountriesAsync();
    Tuple<int, string> GetState(int stateID);   Task<Tuple<int, string>> GetStateAsync(int stateID);
    Person GetPerson(int personID);             Task<Person> GetPersonAsync(int personID);

Exceptions The asynchronous method should always throw exceptions that are assigned to the returning task. However, usage errors, such as passing null parameters to the asynchronous method, should be handled properly. Let's suppose we want to generate several documents dynamically based on a predefined templates list, where each template populates its placeholders with dynamic values and writes the result to the filesystem. We assume that this operation will take a sufficient amount of time to generate a document for each template. Here is a code snippet showing how the exceptions can be handled:

    static void Main(string[] args)
    {
        List<Template> templates = GetTemplates();
        IEnumerable<Task> asyncDocs = from template in templates
                                      select GenerateDocumentAsync(template);
        try
        {
            Task.WaitAll(asyncDocs.ToArray());
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
        }
        Console.Read();
    }

    private static async Task<int> GenerateDocumentAsync(Template template)
    {
        //To simulate a long running operation
        Thread.Sleep(3000);
        //Throwing exception intentionally
        throw new Exception();
    }

In the preceding code, we have a GenerateDocumentAsync method that performs a long-running operation, such as reading the template from the database, populating placeholders, and writing a document to the filesystem. To simulate this process, we used Thread.Sleep to sleep the thread for three seconds and then throw an exception that will be propagated to the calling method. The Main method loops over the templates list and calls the GenerateDocumentAsync method for each template. Each GenerateDocumentAsync call returns a task.
When calling an asynchronous method, the exception is actually hidden until Wait, WaitAll, WhenAll, or another such method is called. In the preceding example, the exception will be thrown once the Task.WaitAll method is called, and will be logged on the console. Task status The task object provides a TaskStatus that is used to know whether the task is currently running, has completed, has encountered a fault, or is in some other state. A task initialized using Task.Run initially has the status of Created, but when the Start method is called, its status changes to Running. When applying the TAP pattern, all methods return a Task object and, whether or not they use Task.Run inside the method body, the task should already be activated. That means that the status should be anything other than Created. The TAP pattern assures the consumer that the task is activated, so starting the task manually is not required. Task cancellation Cancellation is optional for TAP-based asynchronous methods. If the method accepts a CancellationToken as a parameter, it can be used by the caller to cancel the task. However, for a TAP method, the cancellation should be handled properly. Here is a basic example showing how cancellation can be implemented:

    static void Main(string[] args)
    {
        CancellationTokenSource tokenSource = new CancellationTokenSource();
        CancellationToken token = tokenSource.Token;
        Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token));
    }

    static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken)
    {
        if (cancellationToken.IsCancellationRequested)
        {
            Console.WriteLine("Cancellation is requested...");
            cancellationToken.ThrowIfCancellationRequested();
        }
        //Do some file save operation
        File.WriteAllBytes(path, fileBytes);
        return Task.FromResult<int>(0);
    }

In the preceding code, we have a SaveFileAsync method that takes a byte array and a CancellationToken as parameters.
In the Main method, we initialize the CancellationTokenSource that can be used to cancel the asynchronous operation later in the program. To test the cancellation scenario, we will just call the Cancel method of the tokenSource after the Task.Factory.StartNew method and the operation will be canceled. Moreover, when the task is canceled, its status is set to Cancelled and the IsCompleted property is set to true. Task progress reporting With TPL, we can use the IProgress<T> interface to get real-time progress notifications from the asynchronous operations. This can be used in scenarios where we need to update the user interface or the console app of asynchronous operations. When defining the TAP-based asynchronous methods, defining IProgress<T> in a parameter is optional. We can have overloaded methods that can help consumers to use in the case of specific needs. However, they should only be used if the asynchronous method supports them.  Here is the modified version of SaveFileAsync that updates the user about the real progress: static void Main(string[] args) { var progressHandler = new Progress<string>(value => { Console.WriteLine(value); }); var progress = progressHandler as IProgress<string>; CancellationTokenSource tokenSource = new CancellationTokenSource(); CancellationToken token = tokenSource.Token; Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token, progress)); Console.Read(); } static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken, IProgress<string> progress) { if (cancellationToken.IsCancellationRequested) { progress.Report("Cancellation is called"); Console.WriteLine("Cancellation is requested..."); } progress.Report("Saving File"); File.WriteAllBytes(path, fileBytes); progress.Report("File Saved"); return Task.FromResult<int>(0); } Implementing TAP using compilers Any method that is attributed with the async keyword (for C#) or Async for (Visual Basic) is called an asynchronous method. 
The async keyword can be applied to a method, anonymous method, or a Lambda expression, and the language compiler can execute that task asynchronously. Here is a simple implementation of the TAP method using the compiler approach: static void Main(string[] args) { var t = ExecuteLongRunningOperationAsync(100000); Console.WriteLine("Called ExecuteLongRunningOperationAsync method, now waiting for it to complete"); t.Wait(); Console.Read(); } public static async Task<int> ExecuteLongRunningOperationAsync(int millis) { Task t = Task.Factory.StartNew(() => RunLoopAsync(millis)); await t; Console.WriteLine("Executed RunLoopAsync method"); return 0; } public static void RunLoopAsync(int millis) { Console.WriteLine("Inside RunLoopAsync method"); for(int i=0;i< millis; i++) { Debug.WriteLine($"Counter = {i}"); } Console.WriteLine("Exiting RunLoopAsync method"); } In the preceding code, we have the ExecuteLongRunningOperationAsync method, which is implemented as per the compiler approach. It calls the RunLoopAsync that executes a loop for a certain number of milliseconds that is passed in the parameter. The async keyword on the ExecuteLongRunningOperationAsync method actually tells the compiler that this method has to be executed asynchronously, and, once the await statement is reached, the method returns to the Main method that writes the line on a console and waits for the task to be completed. Once the RunLoopAsync is executed, the control comes back to await and starts executing the next statements in the ExecuteLongRunningOperationAsync method. Implementing TAP with greater control over Task As we know, that the TPL is centered on the Task and Task<TResult> objects. We can execute an asynchronous task by calling the Task.Run method and execute a delegate method or a block of code asynchronously and use Wait or other methods on that task. 
However, this approach is not always adequate, and there are scenarios where we may have different approaches to executing asynchronous operations, and we may use an Event-based Asynchronous Pattern (EAP) or an Asynchronous Programming Model (APM). To implement TAP principles here, and to get the same control over asynchronous operations executing with different models, we can use the TaskCompletionSource<TResult> object. The TaskCompletionSource<TResult> object is used to create a task that executes an asynchronous operation. When the asynchronous operation completes, we can use the TaskCompletionSource<TResult> object to set the result, exception, or state of the task. Here is a basic example that executes the ExecuteTask method that returns Task, where the ExecuteTask method uses the TaskCompletionSource<TResult> object to wrap the response as a Task and executes the ExecuteLongRunningTask through the Task.StartNew method: static void Main(string[] args) { var t = ExecuteTask(); t.Wait(); Console.Read(); } public static Task<int> ExecuteTask() { var tcs = new TaskCompletionSource<int>(); Task<int> t1 = tcs.Task; Task.Factory.StartNew(() => { try { ExecuteLongRunningTask(10000); tcs.SetResult(1); }catch(Exception ex) { tcs.SetException(ex); } }); return tcs.Task; } public static void ExecuteLongRunningTask(int millis) { Thread.Sleep(millis); Console.WriteLine("Executed"); } So now, we've been able to use TPL and TAP over traditional threads, thus improving performance. If you liked this article and would like to learn more such techniques, pick up this book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Get to know ASP.NET Core Web API [Tutorial] .NET Core completes move to the new compiler – RyuJIT Applying Single Responsibility principle from SOLID in .NET Core
Getting started with F# for .Net Core application development [Tutorial]

Aaron Lazar
16 Aug 2018
17 min read
F# is Microsoft's purely functional programming language, that can be used along with the .NET Core framework. In this article, we will get introduced to F# to leverage .NET Core for our application development. This article is extracted from the book, .NET Core 2.0 By Example, written by Rishabh Verma and Neha Shrivastava. Basics of classes Classes are types of object which can contain functions, properties, and events. An F# class must have a parameter and a function attached like a member. Both properties and functions can use the member keyword. The following is the class definition syntax: type [access-modifier] type-name [type-params] [access-modifier] (parameter-list) [ as identifier ] = [ class ] [ inherit base-type-name(base-constructor-args) ] [ let-bindings ] [ do-bindings ] member-list [ end ] // Mutually recursive class definitions: type [access-modifier] type-name1 ... and [access-modifier] type-name2 ... Let’s discuss the preceding syntax for class declaration: type: In the F# language, class definition starts with a type keyword. access-modifier: The F# language supports three access modifiers—public, private, and internal. By default, it considers the public modifier if no other access modifier is provided. The Protected keyword is not used in the F# language, and the reason is that it will become object oriented rather than functional programming. For example, F# usually calls a member using a lambda expression and if we make a member type protected and call an object of a different instance, it will not work. type-name: It is any of the previously mentioned valid identifiers; the default access modifier is public. type-params: It defines optional generic type parameters. parameter-list: It defines constructor parameters; the default access modifier for the primary constructor is public. 
identifier: It is used with the optional as keyword, the as keyword gives a name to an instance variable which can be used in the type definition to refer to the instance of the type. Inherit: This keyword allows us to specify the base class for a class. let-bindings: This is used to declare fields or function values in the context of a class. do-bindings: This is useful for the execution of code to create an object member-list: The member-list comprises extra constructors, instance and static method declarations, abstract bindings, interface declarations, and event and property declarations. Here is an example of a class: type StudentName(firstName,lastName) = member this.FirstName = firstName member this.LastName = lastName In the previous example, we have not defined the parameter type. By default, the program considers it as a string value but we can explicitly define a data type, as follows: type StudentName(firstName:string,lastName:string) = member this.FirstName = firstName member this.LastName = lastName Constructor of a class In F#, the constructor works in a different way to any other .NET language. The constructor creates an instance of a class. A parameter list defines the arguments of the primary constructor and class. The constructor contains let and do bindings, which we will discuss next. We can add multiple constructors, apart from the primary constructor, using the new keyword and it must invoke the primary constructor, which is defined with the class declaration. The syntax of defining a new constructor is as shown: new (argument-list) = constructor-body Here is an example to explain the concept. 
In the following code, the StudentDetail class has two constructors: a primary constructor that takes two arguments and another constructor that takes no arguments: type StudentDetail(x: int, y: int) = do printfn "%d %d" x y new() = StudentDetail(0, 0) A let and do binding A let and do binding creates the primary constructor of a class and runs when an instance of a class is created. A function is compiled into a member if it has a let binding. If the let binding is a value which is not used in any function or member, then it is compiled into a local variable of a constructor; otherwise, it is compiled into a field of the class. The do expression executes the initialized code. As any extra constructors always call the primary constructor, let and do bindings always execute, irrespective of which constructor is called. Fields that are created by let bindings can be accessed through the methods and properties of the class, though they cannot be accessed from static methods, even if the static methods take an instance variable as a parameter: type Student(name) as self = let data = name do self.PrintMessage() member this.PrintMessage() = printf " Student name is %s" data Generic type parameters F# also supports a generic parameter type. We can specify multiple generic type parameters separated by a comma. The syntax of a generic parameter declaration is as follows: type MyGenericClassExample<'a> (x: 'a) = do printfn "%A" x The type of the parameter infers where it is used. In the following code, we call the MyGenericClassExample method and pass a sequence of tuples, so here the parameter type became a sequence of tuples: let g1 = MyGenericClassExample( seq { for i in 1 .. 10 -> (i, i*i) } ) Properties Values related to an object are represented by properties. In object-oriented programming, properties represent data associated with an instance of an object. The following snippet shows two types of property syntax: // Property that has both get and set defined. 
[ attributes ] [ static ] member [accessibility-modifier] [self- identifier.]PropertyName with [accessibility-modifier] get() = get-function-body and [accessibility-modifier] set parameter = set-function-body // Alternative syntax for a property that has get and set. [ attributes-for-get ] [ static ] member [accessibility-modifier-for-get] [self-identifier.]PropertyName = get-function-body [ attributes-for-set ] [ static ] member [accessibility-modifier-for-set] [self- identifier.]PropertyName with set parameter = set-function-body There are two kinds of property declaration: Explicitly specify the value: We should use the explicit way to implement the property if it has non-trivial implementation. We should use a member keyword for the explicit property declaration. Automatically generate the value: We should use this when the property is just a simple wrapper for a value. There are many ways of implementing an explicit property syntax based on need: Read-only: Only the get() method Write-only: Only the set() method Read/write: Both get() and set() methods An example is shown as follows: // A read-only property. member this.MyReadOnlyProperty = myInternalValue // A write-only property. member this.MyWriteOnlyProperty with set (value) = myInternalValue <- value // A read-write property. member this.MyReadWriteProperty with get () = myInternalValue and set (value) = myInternalValue <- value Backing stores are private values that contain data for properties. The keyword, member val instructs the compiler to create backing stores automatically and then gives an expression to initialize the property. The F# language supports immutable types, but if we want to make a property mutable, we should use get and set. 
As shown in the following example, the MyClassExample class has two properties: propExample1 is read-only and is initialized to the argument provided to the primary constructor, and propExample2 is a settable property initialized with the string value ".Net Core 2.0":

```fsharp
type MyClassExample(propExample1 : int) =
    member val propExample1 = propExample1
    member val propExample2 = ".Net Core 2.0" with get, set
```

Automatically implemented properties don't work efficiently with some libraries, for example, Entity Framework. In these cases, we should use explicit properties.

Static and instance properties

Properties can be further categorized as static or instance properties. Static properties, as the name suggests, can be invoked without any instance. The self-identifier is omitted in a static property, while it is necessary for an instance property. The following is an example of a static property:

```fsharp
static member MyStaticProperty
    with get() = myStaticValue
    and set(value) = myStaticValue <- value
```

Abstract properties

Abstract properties have no implementation and are fully abstract, although they can be virtual. An abstract property should not be private, and if one accessor is abstract, all the others must be abstract as well. The following is an example of the abstract property and how to use it:

```fsharp
// Abstract property in an abstract class.
// The property is an int type that has a get and a set method.
[<AbstractClass>]
type AbstractBase() =
    abstract Property1 : int with get, set

// Implementation of the abstract property.
type Derived1() =
    inherit AbstractBase()
    let mutable value = 10
    override this.Property1
        with get() = value
        and set(v : int) = value <- v

// A type with a "virtual" property.
type Base1() =
    let mutable value = 10
    abstract Property1 : int with get, set
    default this.Property1
        with get() = value
        and set(v : int) = value <- v

// A derived type that overrides the virtual property.
type Derived2() =
    inherit Base1()
    let mutable value2 = 11
    override this.Property1
        with get() = value2
        and set(v) = value2 <- v
```

Inheritance and casts

In F#, the inherit keyword is used while declaring a class. The following is the syntax:

```fsharp
type MyDerived(...) =
    inherit MyBase(...)
```

In a derived class, we can access all the non-private methods and members of the base class. To refer to base class instances in the F# language, the base keyword is used.

Virtual methods and overrides

In F#, the abstract keyword is used to declare a virtual member, so we can write a complete definition of the member, as we use abstract for virtual. F# differs from other .NET languages here. Let's have a look at the following example:

```fsharp
type MyClassExampleBase() =
    let mutable x = 0
    abstract member virtualMethodExample : int -> int
    default u.virtualMethodExample (a : int) =
        x <- x + a
        x

type MyClassExampleDerived() =
    inherit MyClassExampleBase()
    override u.virtualMethodExample (a : int) = a + 1
```

In the previous example, we declared a virtual method, virtualMethodExample, in a base class, MyClassExampleBase, and overrode it in a derived class, MyClassExampleDerived.

Constructors and inheritance

An inherited class's constructor must be called in the derived class. If a base class constructor takes arguments, then it receives them from parameters of the derived class.
In the following example, we will see how derived class arguments are passed in the base class constructor with inheritance: type MyClassBase2(x: int) = let mutable z = x * x do for i in 1..z do printf "%d " i type MyClassDerived2(y: int) = inherit MyClassBase2(y * 2) do for i in 1..y do printf "%d " i If a class has multiple constructors, such as new(str) or new(), and this class is inherited in a derived class, we can use a base class constructor to assign values. For example, DerivedClass, which inherits BaseClass, has new(str1,str2), and in place of the first string, we pass inherit BaseClass(str1). Similarly for blank, we wrote inherit BaseClass(). Let's explore the following example for more detail: type BaseClass = val string1 : string new (str) = { string1 = str } new () = { string1 = "" } type DerivedClass = inherit BaseClass val string2 : string new (str1, str2) = { inherit BaseClass(str1); string2 = str2 } new (str2) = { inherit BaseClass(); string2 = str2 } let obj1 = DerivedClass("A", "B") let obj2 = DerivedClass("A") Functions and lambda expressions A lambda expression is one kind of anonymous function, which means it doesn't have a name attached to it. But if we want to create a function which can be called, we can use the fun keyword with a lambda expression. We can pass the input parameter in the lambda function, which is created using the fun keyword. This function is quite similar to a normal F# function. Let's see a normal F# function and a lambda function: // Normal F# function let addNumbers a b = a+b // Evaluating values let sumResult = addNumbers 5 6 // Lambda function and evaluating values let sumResult = (fun (a:int) (b:int) -> a+b) 5 6 // Both the function will return value sumResult = 11 Handling data – tuples, lists, record types, and data manipulation F# supports many data types, for example: Primitive types: bool, int, float, string values. 
Aggregate type: class, struct, union, record, and enum
Array: int[], int[,], and float[,,]
Tuple: type1 * type2 * ...; for example, (a, 1, 2, true) has the type char * int * int * bool
Generic: list<'x>, dictionary<'key, 'value>

In an F# function, we can pass one tuple instead of multiple parameters of different types. Declaring a tuple is very simple, and we can assign the values of a tuple to different variables, for example:

```fsharp
let tuple1 = 1, 2, 3
// Assigning values to variables: v1 = 1, v2 = 2, v3 = 3
let v1, v2, v3 = tuple1
// If we want to assign only two values out of three, use "_" to skip
// the value. Assigned values: v1 = 1, v3 = 3
let v1, _, v3 = tuple1
```

As the preceding examples show, tuples support pattern matching. F# also provides option types, which express the idea that a value may or may not be present at runtime.

List

List is a generic type implementation. An F# list is similar to a linked list implementation in any other functional language. It has a special opening and closing bracket construct, a short form of the standard empty list ([ ]) syntax:

```fsharp
let empty = [] // An empty list of untyped (generic) type: 'a list
let intList = [10; 20; 30; 40] // An integer list
```

The cons operator, a double colon (::), is used to prepend an item to a list.
To append another list to one list, we use the append operator—@: // prepend item x into a list let addItem xs x = x :: xs let newIntList = addItem intList 50 // add item 50 in above list //“intlist”, final result would be- [50;10;20;30;40] // using @ to append two list printfn "%A" (["hi"; "team"] @ ["how";"are";"you"]) // result – ["hi"; "team"; "how";"are";"you"] Lists are decomposable using pattern matching into a head and a tail part, where the head is the first item in the list and the tail part is the remaining list, for example: printfn "%A" newIntList.Head printfn "%A" newIntList.Tail printfn "%A" newIntList.Tail.Tail.Head let rec listLength (l: 'a list) = if l.IsEmpty then 0 else 1 + (listLength l.Tail) printfn "%d" (listLength newIntList) Record type The class, struct, union, record, and enum types come under aggregate types. The record type is one of them, it can have n number of members of any individual type. Record type members are by default immutable but we can make them mutable. In general, a record type uses the members as an immutable data type. There is no way to execute logic during instantiation as a record type don't have constructors. A record type also supports match expression, depending on the values inside those records, and they can also again decompose those values for individual handling, for example: type Box = {width: float ; height:int } let giftbox = {width = 6.2 ; height = 3 } In the previous example, we declared a Box with float a value width and an integer height. When we declare giftbox, the compiler automatically detects its type as Box by matching the value types. We can also specify type like this: let giftbox = {Box.width = 6.2 ; Box.height = 3 } or let giftbox : Box = {width = 6.2 ; height = 3 } This kind of type declaration is used when we have the same type of fields or field type declared in more than one type. This declaration is called a record expression. 
Object-oriented programming in F# F# also supports implementation inheritance, the creation of object, and interface instances. In F#, constructed types are fully compatible .NET classes which support one or more constructors. We can implement a do block with code logic, which can run at the time of class instance creation. The constructed type supports inheritance for class hierarchy creation. We use the inherit keyword to inherit a class. If the member doesn't have implementation, we can use the abstract keyword for declaration. We need to use the abstractClass attribute on the class to inform the compiler that it is abstract. If the abstractClass attribute is not used and type has all abstract members, the F# compiler automatically creates an interface type. Interface is automatically inferred by the compiler as shown in the following screenshot: The override keyword is used to override the base class implementation; to use the base class implementation of the same member, we use the base keyword. In F#, interfaces can be inherited from another interface. In a class, if we use the construct interface, we have to implement all the members in the interface in that class, as well. In general, it is not possible to use interface members from outside the class instance, unless we upcast the instance type to the required interface type. To create an instance of a class or interface, the object expression syntax is used. 
We need to override virtual members if we are creating a class instance and need member implementation for interface instantiation: type IExampleInterface = abstract member IntValue: int with get abstract member HelloString: unit -> string type PrintValues() = interface IExampleInterface with member x.IntValue = 15 member x.HelloString() = sprintf "Hello friends %d" (x :> IExampleInterface).IntValue let example = let varValue = PrintValues() :> IExampleInterface { new IExampleInterface with member x.IntValue = varValue.IntValue member x.HelloString() = sprintf "<b>%s</b>" (varValue.HelloString()) } printfn "%A" (example.HelloString()) Exception handling The exception keyword is used to create a custom exception in F#; these exceptions adhere to Microsoft best practices, such as constructors supplied, serialization support, and so on. The keyword raise is used to throw an exception. Apart from this, F# has some helper functions, such as failwith, which throws a failure exception at F# runtime, and invalidop, invalidarg, which throw the .NET Framework standard type invalid operation and invalid argument exception, respectively. try/with is used to catch an exception; if an exception occurred on an expression or while evaluating a value, then the try/with expression could be used on the right side of the value evaluation and to assign the value back to some other value. try/with also supports pattern matching to check an individual exception type and extract an item from it. try/finally expression handling depends on the actual code block. Let's take an example of declaring and using a custom exception: exception MyCustomExceptionExample of int * string raise (MyCustomExceptionExample(10, "Error!")) In the previous example, we created a custom exception called MyCustomExceptionExample, using the exception keyword, passing value fields which we want to pass. 
Then we used the raise keyword to raise the exception, passing the values that we want to display while running the application or throwing the exception. However, if we run this code, we don't get our custom message in the error value; only the standard exception message is displayed, and it doesn't contain the message that we passed. In order to display our custom error message, we need to override the standard Message property on the exception type. We will use a pattern matching assignment to get the two values and upcast the actual type, due to the internal representation of the exception object. If we run this program again, we will get the custom message in the exception:

```fsharp
exception MyCustomExceptionExample of int * string
    with override x.Message =
            let (MyCustomExceptionExample(i, s)) = upcast x
            sprintf "Int: %d Str: %s" i s

raise (MyCustomExceptionExample(20, "MyCustomErrorMessage!"))
```

Now the error message contains our custom message, with the integer and string values included in the output. We can also use the helper function failwith to raise a failure exception, which includes our message as the error message:

```fsharp
failwith "An error has occurred"
```

An example of the invalidArg helper function follows. In this factorial function, we are checking that the value of x is greater than zero. For cases where x is less than 0, we call invalidArg, passing "x" as the name of the invalid parameter, along with an error message saying the value should be greater than zero.
The invalidArg helper function throws an ArgumentException from the standard System namespace in .NET:

```fsharp
let rec factorial x =
    if x < 0 then invalidArg "x" "Value should be greater than zero"
    match x with
    | 0 -> 1
    | _ -> x * (factorial (x - 1))
```

By now, you should be pretty familiar with the F# programming language and ready to use it in your application development, alongside C#. If you found this tutorial helpful and you're interested in learning more, head over to the book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava.
Multithreading in Rust using Crates [Tutorial]

Aaron Lazar
15 Aug 2018
17 min read
The crates.io ecosystem in Rust can make use of approaches to improve our development speed as well as the performance of our code. In this tutorial, we'll learn how to use the crates ecosystem to manipulate threads in Rust. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Using non-blocking data structures One of the issues we saw earlier was that if we wanted to share something more complex than an integer or a Boolean between threads and if we wanted to mutate it, we needed to use a Mutex. This is not entirely true, since one crate, Crossbeam, allows us to use great data structures that do not require locking a Mutex. They are therefore much faster and more efficient. Often, when we want to share information between threads, it's usually a list of tasks that we want to work on cooperatively. Other times, we want to create information in multiple threads and add it to a list of information. It's therefore not so usual for multiple threads to be working with exactly the same variables since as we have seen, that requires synchronization and it will be slow. This is where Crossbeam shows all its potential. Crossbeam gives us some multithreaded queues and stacks, where we can insert data and consume data from different threads. We can, in fact, have some threads doing an initial processing of the data and others performing a second phase of the processing. Let's see how we can use these features. First, add crossbeam to the dependencies of the crate in the Cargo.toml file. 
Then, we start with a simple example:

```rust
extern crate crossbeam;

use std::thread;
use std::sync::Arc;
use crossbeam::sync::MsQueue;

fn main() {
    let queue = Arc::new(MsQueue::new());
    let handles: Vec<_> = (1..6)
        .map(|_| {
            let t_queue = queue.clone();
            thread::spawn(move || {
                for _ in 0..1_000_000 {
                    t_queue.push(10);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let final_queue = Arc::try_unwrap(queue).unwrap();
    let mut sum = 0;
    while let Some(i) = final_queue.try_pop() {
        sum += i;
    }

    println!("Final sum: {}", sum);
}
```

Let's first understand what this example does. It will iterate 1,000,000 times in 5 different threads, and each time it will push a 10 to a queue. Queues are FIFO lists: first input, first output. This means that the first number entered will be the first one to pop() and the last one will be the last to do so. In this case, all of them are a 10, so it doesn't matter.

Once the threads finish populating the queue, we iterate over it and we add all the numbers. A simple computation should tell you that if everything goes perfectly, the final number should be 50,000,000. If you run it, that will be the result, and that's not all. If you run it by executing cargo run --release, it will run blazingly fast. On my computer, it took about one second to complete. If you want, try to implement this code with the standard library Mutex and a vector, and you will see that the performance difference is amazing.

As you can see, we still needed to use an Arc to control the multiple references to the queue. This is needed because the queue itself cannot be duplicated and shared; it has no reference count.

Crossbeam not only gives us FIFO queues. We also have LIFO stacks. LIFO comes from last input, first output, and it means that the last element you inserted in the stack will be the first one to pop().
Let's see the difference with a couple of threads:

```rust
extern crate crossbeam;

use std::thread;
use std::sync::Arc;
use std::time::Duration;
use crossbeam::sync::{MsQueue, TreiberStack};

fn main() {
    let queue = Arc::new(MsQueue::new());
    let stack = Arc::new(TreiberStack::new());

    let in_queue = queue.clone();
    let in_stack = stack.clone();
    let in_handle = thread::spawn(move || {
        for i in 0..5 {
            in_queue.push(i);
            in_stack.push(i);
            println!("Pushed :D");
            thread::sleep(Duration::from_millis(50));
        }
    });

    let mut final_queue = Vec::new();
    let mut final_stack = Vec::new();

    let mut last_q_failed = 0;
    let mut last_s_failed = 0;
    loop {
        // Get the queue
        match queue.try_pop() {
            Some(i) => {
                final_queue.push(i);
                last_q_failed = 0;
                println!("Something in the queue! :)");
            }
            None => {
                println!("Nothing in the queue :(");
                last_q_failed += 1;
            }
        }

        // Get the stack
        match stack.try_pop() {
            Some(i) => {
                final_stack.push(i);
                last_s_failed = 0;
                println!("Something in the stack! :)");
            }
            None => {
                println!("Nothing in the stack :(");
                last_s_failed += 1;
            }
        }

        // Check if we finished
        if last_q_failed > 1 && last_s_failed > 1 {
            break;
        } else if last_q_failed > 0 || last_s_failed > 0 {
            thread::sleep(Duration::from_millis(100));
        }
    }

    in_handle.join().unwrap();

    println!("Queue: {:?}", final_queue);
    println!("Stack: {:?}", final_stack);
}
```

As you can see in the code, we have two shared variables: a queue and a stack. The secondary thread will push new values to each of them, in the same order, from 0 to 4. Then, the main thread will try to get them back. It will loop indefinitely and use the try_pop() method. The pop() method could be used, but it would block the thread if the queue or the stack were empty. This will happen in any case once all the values have been popped, since no new values are being added, so the try_pop() method helps not to block the main thread, letting it end gracefully. The way it checks whether all the values were popped is by counting how many times it failed to pop a new value.
Every time it fails, it will wait for 100 milliseconds, while the push thread only waits for 50 milliseconds between pushes. This means that if it tries to pop new values two times and there are no new values, the pusher thread has already finished. It will add values to two vectors as they are popped and then print the result. In the meantime, it will print messages about pushing and popping new values. You will understand this better by tracing through one possible run. Note that the output can be different in your case, since threads don't need to be executed in any particular order.

In one example run, the main thread first tries to get something from the queue and the stack, but there is nothing there, so it sleeps. The second thread then starts pushing things, two numbers actually. After this, the queue and the stack will be [0, 1]. Then, the main thread pops the first item from each of them. From the queue, it will pop the 0 and from the stack it will pop the 1 (the last one), leaving the queue as [1] and the stack as [0]. It will go back to sleep and the secondary thread will insert a 2 in each variable, leaving the queue as [1, 2] and the stack as [0, 2]. Then, the main thread will pop two elements from each of them. From the queue, it will pop the 1 and the 2, while from the stack it will pop the 2 and then the 0, leaving both empty. The main thread then goes to sleep, and for the next two tries, the secondary thread will push one element and the main thread will pop it, twice.

It might seem a little bit complex, but the idea is that these queues and stacks can be used efficiently between threads without requiring a Mutex, and they accept any Send type. This means that they are great for complex computations, and even for multi-staged complex computations. The Crossbeam crate also has some helpers to deal with epochs and even some variants of the mentioned types. For multithreading, Crossbeam also adds a great utility: scoped threads.
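The pop ordering traced above is easy to reproduce single-threaded with standard library types. This sketch uses Vec as a LIFO stack and VecDeque as a FIFO queue purely to illustrate the ordering; it is not crossbeam's API:

```rust
use std::collections::VecDeque;

// Push 0..5 into a FIFO queue and a LIFO stack, then pop everything,
// returning the two pop orders.
fn pop_orders() -> (Vec<i32>, Vec<i32>) {
    let mut queue: VecDeque<i32> = VecDeque::new(); // FIFO
    let mut stack: Vec<i32> = Vec::new(); // LIFO
    for i in 0..5 {
        queue.push_back(i);
        stack.push(i);
    }
    // The queue yields items in insertion order, the stack in reverse.
    let q_order: Vec<i32> = std::iter::from_fn(|| queue.pop_front()).collect();
    let s_order: Vec<i32> = std::iter::from_fn(|| stack.pop()).collect();
    (q_order, s_order)
}

fn main() {
    let (q_order, s_order) = pop_orders();
    println!("Queue: {:?}", q_order); // [0, 1, 2, 3, 4]
    println!("Stack: {:?}", s_order); // [4, 3, 2, 1, 0]
}
```

In the multithreaded crossbeam version, the interleaving with the producer changes which items are present at each pop, but the FIFO and LIFO ordering guarantees are exactly these.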
Scoped threads

In all our examples, we have used standard library threads. As we have discussed, these threads have their own stack, so if we want to use variables that we created in the main thread, we will need to send them to the thread. This means that we will need to use things such as Arc to share non-mutable data. Not only that, having their own stack means that they will also consume more memory and eventually make the system slower if they use too much.

Crossbeam gives us some special threads that allow sharing stacks between them. They are called scoped threads. Using them is pretty simple and the crate documentation explains them perfectly; you will just need to create a Scope by calling crossbeam::scope(). You will need to pass a closure that receives the Scope. You can then call spawn() in that scope the same way you would do it in std::thread, but with one difference: you can share immutable variables among threads if they were created inside the scope or moved to it. This means that for the queues or stacks we just talked about, or for atomic data, you can simply call their methods without requiring an Arc! This will improve the performance even further. Let's see how it works with a simple example:

```rust
extern crate crossbeam;

fn main() {
    let all_nums: Vec<_> = (0..1_000_u64).into_iter().collect();
    let mut results = Vec::new();

    crossbeam::scope(|scope| {
        for num in &all_nums {
            results.push(scope.spawn(move || num * num + num * 5 + 250));
        }
    });

    let final_result: u64 = results.into_iter().map(|res| res.join()).sum();
    println!("Final result: {}", final_result);
}
```

Let's see what this code does. It will first just create a vector with all the numbers from 0 to 1,000. Then, for each of them, in a crossbeam scope, it will run one scoped thread per number and perform a supposedly complex computation. This is just an example, since it will just return the result of a simple second-order function.
Interestingly enough, though, the scope.spawn() method allows returning a result of any type, which is great in our case. The code will add each result to a vector. This won't directly add the resulting number, since it will be executed in parallel. It will add a result guard, which we will be able to check outside the scope. Then, after all the threads run and return the results, the scope will end. We can now check all the results, which are guaranteed to be ready for us. For each of them, we just need to call join() and we will get the result. Then, we sum it up to check that they are actual results from the computation. This join() method can also be called inside the scope and get the results, but it will mean that if you do it inside the for loop, for example, you will block the loop until the result is generated, which is not efficient. The best thing is to at least run all the computations first and then start checking the results. If you want to perform more computations after them, you might find it useful to run the new computation in another loop or iterator inside the crossbeam scope. But, how does crossbeam allow you to use the variables outside the scope freely? Won't there be data races? Here is where the magic happens. The scope will join all the inner threads before exiting, which means that no further code will be executed in the main thread until all the scoped threads finish. This means that we can use the variables of the main thread, also called parent stack, due to the main thread being the parent of the scope in this case without any issue. We can actually check what is happening by using the println!() macro. If we remember from previous examples, printing to the console after spawning some threads would usually run even before the spawned threads, due to the time it takes to set them up. In this case, since we have crossbeam preventing it, we won't see it. 
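As an aside, this scoped-thread pattern later made it into the standard library as std::thread::scope, stabilized in Rust 1.63. Here is a minimal std-only sketch of the same computation as the crossbeam example (with a smaller range to keep the thread count reasonable), relying on the same join-all-before-returning guarantee just described:

```rust
use std::thread;

// Same second-order function as the crossbeam example, computed by
// scoped threads that borrow `all_nums` from the parent stack.
fn scoped_sum() -> u64 {
    let all_nums: Vec<u64> = (0..100).collect();
    thread::scope(|scope| {
        // Every handle is joined before `scope` returns, so borrowing
        // `all_nums` is safe without an Arc.
        let handles: Vec<_> = all_nums
            .iter()
            .map(|&num| scope.spawn(move || num * num + num * 5 + 250))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    println!("Final result: {}", scoped_sum());
}
```

Note that here the results are collected inside the scope, since std's ScopedJoinHandle cannot outlive it; crossbeam's guards, as shown above, can be checked after the scope ends.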
Let's check it with an example:

```rust
extern crate crossbeam;

fn main() {
    let all_nums: Vec<_> = (0..10).into_iter().collect();

    crossbeam::scope(|scope| {
        for num in all_nums {
            scope.spawn(move || {
                println!("Next number is {}", num);
            });
        }
    });

    println!("Main thread continues :)");
}
```

If you run this code, you will see something similar to the following output (the order will vary between runs):

```
Next number is 1
Next number is 0
Next number is 2
...
Main thread continues :)
```

As you can see, scoped threads run without any particular order. In this run, it first printed the 1, then the 0, then the 2, and so on; your output will probably be different. The interesting thing, though, is that the main thread won't continue executing until all the scoped threads have finished. Therefore, reading and modifying variables in the main thread afterwards is perfectly safe.

There are two main performance advantages to this approach. First, Arc requires a call to malloc() to allocate memory in the heap, which takes time if it's a big structure and the memory is a bit full. That data is already in our stack, so if possible, we should avoid duplicating it in the heap. Second, as we saw, Arc keeps an atomic reference counter, which means that every time we clone the reference, the count must be atomically incremented. This takes time, even more than incrementing simple integers.

Most of the time, we might be waiting for some expensive computations to run, and it would be great if they just gave all the results when finished. We can still add further chained computations, using scoped threads, that will only be executed after the first ones finish. So we should prefer scoped threads over normal threads where possible.

Using thread pool

So far, we have seen multiple ways of creating new threads and sharing information between them. Nevertheless, the ideal number of threads we should spawn to do all the work should be around the number of virtual processors in the system.
This means we should not spawn one thread for each chunk of work. Nevertheless, controlling what work each thread does can be complex, since you have to make sure that all threads have work to do at any given point in time. Here is where thread pooling comes in handy.

The threadpool crate enables you to iterate over all your work and, for each of your small chunks, call something similar to thread::spawn(). The interesting thing is that each task will be assigned to an idle thread; no new thread will be created for each task. The number of threads is configurable, and you can get the number of CPUs with other crates. Not only that: if one of the threads panics, a new one will automatically be added to the pool.

To see an example, first add threadpool and num_cpus as dependencies in the Cargo.toml file. Then, let's look at some example code:

```rust
extern crate num_cpus;
extern crate threadpool;

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::with_name("my worker".to_owned(), num_cpus::get());
    println!("Pool threads: {}", pool.max_count());

    let result = Arc::new(AtomicUsize::new(0));

    for i in 0..1_000_000 {
        let t_result = result.clone();
        pool.execute(move || {
            t_result.fetch_add(i, Ordering::Relaxed);
        });
    }

    pool.join();

    let final_res = Arc::try_unwrap(result).unwrap().into_inner();
    println!("Final result: {}", final_res);
}
```

This code creates a thread pool with as many threads as there are logical CPUs in your computer. Then, it adds every number from 0 to 1,000,000 (exclusive) to an atomic usize, just to test parallel processing. Each addition is performed by one of the pooled threads. Doing this with one thread per operation (1,000,000 threads) would be really inefficient. Here, though, the appropriate number of threads is used, and the execution is really fast.

There is another crate that gives thread pools an even more interesting parallel processing feature: Rayon.
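The threadpool crate handles all of this for us, but it helps to see the mechanism. Below is a toy, standard-library-only sketch of what a pool does internally: a fixed set of worker threads pulling boxed jobs from a shared channel. The function name pooled_sum and the Mutex-around-Receiver design are our own; real pools such as threadpool are considerably more robust.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Distribute the numbers 0..n across `n_workers` threads that pull boxed
// jobs from a shared channel, and sum the results they send back.
fn pooled_sum(n: u64, n_workers: usize) -> u64 {
    let (task_tx, task_rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
    // A Receiver cannot be shared between threads directly, so guard it.
    let task_rx = Arc::new(Mutex::new(task_rx));
    let (res_tx, res_rx) = mpsc::channel::<u64>();

    let handles: Vec<_> = (0..n_workers)
        .map(|_| {
            let rx = Arc::clone(&task_rx);
            thread::spawn(move || loop {
                // Take one job off the queue; Err means the queue was closed.
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break,
                };
                job(); // run the job outside the lock
            })
        })
        .collect();

    for i in 0..n {
        let tx = res_tx.clone();
        task_tx.send(Box::new(move || tx.send(i).unwrap())).unwrap();
    }
    drop(task_tx); // closing the queue lets idle workers exit
    drop(res_tx); // so that res_rx.iter() below terminates

    let total = res_rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    println!("Total: {}", pooled_sum(1_000, 4));
}
```

The key idea is the same as in the crate: tasks outnumber threads, and each worker keeps taking pending work until the queue is empty, so thread startup cost is paid only once per worker rather than once per task.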
Using parallel iterators

If you look at the big picture in these code examples, you'll notice that most of the parallel work involves a long loop handing work to different threads. It happened with simple threads, and it happens even more with scoped threads and thread pools. It's usually the case in real life, too: you might have a bunch of data to process, and you can probably separate that processing into chunks, iterate over them, and hand them to various threads to do the work for you.

The main issue with that approach is that if you need multiple stages to process a given piece of data, you might end up with lots of boilerplate code that makes maintenance difficult. Not only that, you might sometimes find yourself not using parallel processing at all due to the hassle of having to write all that code.

Luckily, Rayon has multiple data-parallelism primitives around iterators that you can use to parallelize any iterative computation. You can almost forget about the Iterator trait and use Rayon's ParallelIterator alternative, which is as easy to use as the standard library trait!

Rayon uses a parallel iteration technique called work stealing. For each iteration of the parallel iterator, the new value or values get added to a queue of pending work. Then, when a thread finishes its work, it checks whether there is any pending work to do and, if there is, starts processing it. In most languages, this is a clear source of data races, but thanks to Rust, this is no longer an issue, and your algorithms can run extremely fast and in parallel.

Let's look at how to use it for an example similar to those we have seen in this chapter.
First, add rayon to your Cargo.toml file, and then let's start with the code:

```rust
extern crate rayon;

use rayon::prelude::*;

fn main() {
    let result = (0..1_000_000_u64)
        .into_par_iter()
        .map(|e| e * 2)
        .sum::<u64>();

    println!("Result: {}", result);
}
```

As you can see, this reads just as a sequential iterator would, yet it runs in parallel. Of course, running this particular example sequentially will be faster than running it in parallel thanks to compiler optimizations, but when you need to process data from files, for example, or perform very complex mathematical computations, parallelizing the input can give great performance gains.

Rayon implements these parallel iteration traits for all standard library iterators and ranges. Not only that, it can also work with standard library collections, such as HashMap and Vec. In most cases, if you are using the iter() or into_iter() methods from the standard library in your code, you can simply use par_iter() or into_par_iter() in those calls, and your code should now be parallel and work perfectly.

But beware: sometimes parallelizing something doesn't automatically improve its performance. Take into account that if you need to update some shared information between the threads, they will need to synchronize somehow, and you will lose performance. Therefore, multithreading is only great if workloads are completely independent and you can execute each one without any dependency on the rest.

If you found this article useful and would like to learn more such tips, head over to pick up this book, Rust High Performance, authored by Iban Eguia Moraza.

Rust 1.28 is here with global allocators, nonZero types and more
Java Multithreading: How to synchronize threads to implement critical sections and avoid race conditions
Multithreading with Qt
Understanding functional reactive programming in Scala [Tutorial]

Fatema Patrawala
15 Aug 2018
6 min read
Like OOP (Object-Oriented Programming), Functional Programming (FP) is a programming paradigm. It is a programming style in which we write programs in terms of pure functions and immutable data, treating computation as the evaluation of functions. Because we use pure functions and immutable data to write our applications, we get many benefits for free. For instance, with immutable data, we do not need to worry about shared mutable state, side effects, or thread safety.

FP follows a declarative programming style, which means programming is done in terms of expressions, not statements. In OOP or imperative programming paradigms, we use statements to write programs, whereas in FP everything is an expression.

In this Scala functional programming tutorial, we will understand the principles and benefits of FP and why functional Reactive programming is the best fit for Reactive programming in Scala. This Scala tutorial is an extract taken from the book Scala Reactive Programming written by Rambabu Posa.

Principles of functional programming

FP has the following principles:

- Pure functions
- Immutable data
- No side effects
- Referential transparency (RT)
- Functions as first-class citizens, including anonymous functions, higher-order functions, combinators, partial functions, partially-applied functions, function currying, and closures
- Tail recursion
- Function composability

A pure function is a function that always returns the same result for the same inputs, irrespective of how many times and where you run it. We get lots of benefits from immutable data. For instance: no shared data, no side effects, thread safety for free, and so on.

Just as an object is a first-class citizen in OOP, in FP a function is a first-class citizen. This means that we can use a function as any of these:

- An object
- A value
- A piece of data
- A data type
- An operation

In simple words, in FP, we treat both functions and data as the same.
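To make "functions as data" concrete, here is a minimal sketch of passing a function to another function and returning a function from one. It is written in Rust rather than Scala, for consistency with the Rust examples earlier on this page; the ideas translate directly, and the names apply_twice and adder are our own.

```rust
// Functions as values: take one function as a parameter...
fn apply_twice(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

// ...and return a new function as a result (a closure "factory").
fn adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

fn main() {
    let square = |x: i32| x * x; // an anonymous function bound to a name
    println!("{}", apply_twice(square, 3)); // square(square(3)) = 81
    let add_five = adder(5);
    println!("{}", add_five(10)); // 15
}
```

apply_twice is a higher-order function in exactly the sense described above: it treats the function f as just another argument.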
We can compose functions in sequential order so that we can solve even complex problems easily.

Higher-Order Functions (HOF) are functions that take one or more functions as parameters, return a function as their result, or do both. For instance, map(), flatMap(), and filter() are some of the important and frequently used higher-order functions. Consider the following example:

```scala
map(x => x * x)
```

Here, the map() function is an example of a Higher-Order Function because it takes an anonymous function as its parameter. This anonymous function, x => x * x, is of type Int => Int; it takes an Int as input and returns an Int as its result. An anonymous function is a function without a name.

Benefits of functional programming

FP provides us with many benefits:

- Thread-safe code
- Easy-to-write concurrent and parallel code
- Simple, readable, and elegant code
- Type safety
- Composability
- Support for declarative programming

As we use pure functions and immutability in FP, we get thread safety for free. One of the greatest benefits of FP is function composability: we can compose multiple functions one by one and execute them either sequentially or in parallel, which gives us a great approach to solving complex problems easily.

Functional Reactive programming

The combination of FP and RP is known as Functional Reactive Programming or, for short, FRP. It is a multi-paradigm approach that combines the benefits and best features of two of the most popular programming paradigms: FP and RP.

FRP is a new programming paradigm, or style of programming, that uses the RP paradigm to support asynchronous, non-blocking data streaming with backpressure, and uses the FP paradigm to utilize its features (such as pure functions, immutability, no side effects, and RT) and its HOFs or combinators (such as map, flatMap, filter, reduce, fold, and zip). In simple words, FRP is a new programming paradigm that supports RP using FP features and its building blocks.
FRP = FP + RP, as shown here:

Today, we have many FRP solutions, frameworks, tools, and technologies. Here's a list of a few of them:

- Scala, Play Framework, and Akka Toolkit
- RxJS
- Reactive-banana
- Reactive
- Sodium
- Haskell

This book is dedicated to discussing Lightbend's FRP technology stack: Lagom Framework, Scala, Play Framework, and Akka Toolkit (Akka Streams). FRP technologies are mainly useful for developing interactive programs, such as rich GUIs (graphical user interfaces), animations, multiplayer games, computer music, or robot controllers.

Types of Reactive Programming

Even though most projects and companies use the FP paradigm to develop their Reactive systems or solutions, there are a couple of ways to use RP. They are known as the types of RP:

- FRP (Functional Reactive Programming)
- OORP (Object-Oriented Reactive Programming)

However, FP is the best programming paradigm to conflate with RP, as we get all the benefits of FP for free.

Why FP is the best fit for RP

When we conflate RP with FP, we get the following benefits:

- Composability: we can compose multiple data streams using functional operations so that we can solve even complex problems easily
- Thread safety
- Readability
- Simple, concise, clear, and easy-to-understand code
- Easy-to-write asynchronous, concurrent, and parallel code
- Very flexible and easy-to-use operations
- Support for declarative programming
- Easy-to-write, more scalable, highly available, and robust code

In FP, we concentrate on what to do to fulfill a job, whereas in other programming paradigms, such as OOP or imperative programming (IP), we concentrate on how to do it. Declarative programming gives us the following benefits:

- No side effects
- Enforced immutability
- Concise and understandable code

The main property of RP is real-time data streaming, and the main property of FP is composability. If we combine these two paradigms, we get more benefits and can develop better solutions easily.
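The "streams plus pure functions" combination can be sketched in miniature. Here an ordinary iterator (in Rust, matching the other examples on this page) stands in for a reactive stream; real FRP libraries add asynchrony and backpressure on top of the same combinator shape, and the process function is a name of our own choosing.

```rust
// Treat a sequence of events as a stand-in for a reactive stream and
// process it with pure functions: filter drops events, map transforms
// them, and sum reduces the stream to a single value.
fn process(events: &[i32]) -> i32 {
    events
        .iter()
        .filter(|&&x| x > 2) // ignore events below our threshold
        .map(|&x| x * 10) // transform each event with a pure function
        .sum()
}

fn main() {
    let clicks = vec![3, 7, 2, 9, 4, 11]; // pretend: incoming events
    println!("{}", process(&clicks)); // (3 + 7 + 9 + 4 + 11) * 10 = 340
}
```

Because every stage is a pure function, the pipeline can be reasoned about, tested, and (with a framework such as Akka Streams) parallelized without data races.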
In RP, everything is a stream, while in FP everything is a function. We can use these functions to perform operations on data streams.

We have learnt the principles and benefits of Scala functional programming. To build fault-tolerant, robust, and distributed applications in Scala, grab the book Scala Reactive Programming today.

Introduction to the Functional Programming
Manipulating functions in functional programming
Why functional programming in Python matters: Interview with best selling author, Steven Lott

MongoDB Sharding: Sharding clusters and choosing the right shard key [Tutorial]

Fatema Patrawala
14 Aug 2018
9 min read
Sharding is a feature that MongoDB has offered from an early stage, since version 1.6 was released in August 2010. Sharding is the ability to horizontally scale out our database by partitioning our datasets across different servers: the shards. Foursquare and Bitly are two of the most famous early MongoDB customers that used sharding from its inception all the way to the general availability release. In this article, we will learn how to design a sharding cluster and how to make the single most important decision around it: choosing the right shard key. This article is a MongoDB shard tutorial taken from the book Mastering MongoDB 3.x by Alex Giamas.

Sharding setup in MongoDB

Sharding is performed at the collection level. We can have collections that we don't want or need to shard for several reasons, and we can leave these collections unsharded. These collections are stored in the primary shard. The primary shard is different for each database in MongoDB and is automatically selected by MongoDB when we create a new database in a sharded environment: MongoDB picks the shard that has the least data stored at the moment of creation. If we want to change the primary shard at any later point, we can issue the following command:

```
> db.runCommand( { movePrimary : "mongo_books", to : "UK_based" } )
```

This moves the database named mongo_books to the shard named UK_based.

Choosing the shard key

Choosing our shard key is the most important decision we need to make. The reason is that once we shard our data and deploy our cluster, it becomes very difficult to change the shard key. First, we will go through the process of changing the shard key.

Changing the shard key

There is no command or simple procedure to change the shard key in MongoDB. The only way to change the shard key involves backing up and restoring all of our data, something that may range from extremely difficult to impossible in high-load production environments.
The steps to change our shard key are as follows:

1. Export all data from MongoDB.
2. Drop the original sharded collection.
3. Configure sharding with the new key.
4. Presplit the new shard key range.
5. Restore our data back into MongoDB.

Of these steps, step 4 needs some more explanation. MongoDB uses chunks to split data in a sharded collection. If we bootstrap a MongoDB sharded cluster from scratch, chunks are calculated automatically by MongoDB. MongoDB then distributes the chunks across the different shards to ensure that each shard holds an equal number of chunks.

The one case in which this doesn't work well is when we want to load data into a newly sharded collection. The reasons are threefold:

- MongoDB creates splits only after an insert operation.
- Chunk migration copies all of the data in that chunk from one shard to another.
- Only floor(n/2) chunk migrations can happen at any given time, where n is the number of shards we have. Even with three shards, this is only floor(3/2) = 1 chunk migration at a time.

These three limitations combined mean that letting MongoDB figure it out on its own will definitely take much longer and may result in an eventual failure. This is why we want to presplit our data and give MongoDB some guidance on where our chunks should go. Considering our example of the mongo_books database and the books collection, this would be:

```
> db.runCommand( { split : "mongo_books.books", middle : { id : 50 } } )
```

The middle command parameter splits our key space into documents that have id<=50 and documents that have id>50. There is no need for a document with id=50 to exist in our collection, as 50 only serves as the guidance value for our partitions. In this example, we chose 50 assuming that our keys follow a uniform distribution (that is, the same count of keys for each value) in the range of values from 0 to 100.
We should aim to create at least 20-30 chunks to give MongoDB flexibility in potential migrations. We can also use bounds and find instead of middle if we want to manually define the partition key, but both parameters need data to exist in our collection before applying them.

Choosing the correct shard key

After the previous section, it's now self-evident that we need to give great consideration to the choice of our shard key, as it is something we have to stick with. A great shard key has three characteristics:

- High cardinality
- Low frequency
- Non-monotonically changing values

We will go over the definitions of these three properties to understand what they mean.

High cardinality means that the shard key must have as many distinct values as possible. A Boolean can take only the values true and false, and so it is a bad shard key choice. A 64-bit long field that can take any value from −(2^63) to 2^63 − 1 is a good example in terms of cardinality.

Low frequency relates directly to the argument about high cardinality. A low-frequency shard key has a distribution of values as close as possible to a perfectly random/uniform distribution. Using the example of our 64-bit long field, it is of little use to us if, despite being able to take values from −(2^63) to 2^63 − 1, we only ever observe the values 0 and 1. In fact, that is as bad as using a Boolean field, which can also take only two values. If we have a shard key with high-frequency values, we will end up with indivisible chunks: chunks that cannot be further divided and will grow in size, negatively affecting the performance of the shard that contains them.

Non-monotonically changing values mean that our shard key should not be, for example, an integer that always increases with every new insert.
If we choose a monotonically increasing value as our shard key, all writes will end up in the last of our shards, limiting our write performance. If we want to use a monotonically changing value as the shard key, we should consider using hash-based sharding. In the next sections, we will describe the different sharding strategies and their advantages and disadvantages.

Range-based sharding

The default and most widely used sharding strategy is range-based sharding. This strategy splits our collection's data into chunks, grouping documents with nearby values in the same shard. For our example database and collection, mongo_books and books respectively, we have:

```
> sh.shardCollection("mongo_books.books", { id: 1 } )
```

This creates a range-based shard key on id in ascending direction. The direction of our shard key determines which documents end up in the first shard and which in the subsequent ones. This is a good strategy if we plan to have range-based queries, as these will be directed to the shard that holds the result set instead of having to query all shards.

Hash-based sharding

If we don't have a shard key (or can't create one) that achieves the three goals mentioned previously, we can use the alternative strategy of hash-based sharding. In this case, we are trading data distribution for query isolation.

Hash-based sharding takes the values of our shard key and hashes them in a way that guarantees a close-to-uniform distribution. This way, we can be sure that our data will distribute evenly across shards. The downside is that only exact-match queries get routed to the exact shard that holds the value; any range query will have to go out and fetch data from all shards. For our example database and collection (mongo_books and books respectively), we have:

```
> sh.shardCollection("mongo_books.books", { id: "hashed" } )
```

Similar to the preceding example, we are now using the id field as our hashed shard key.
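To see why hashing fixes the monotonic-key hotspot, here is a small stand-alone simulation (written in Rust; the hash function is Rust's DefaultHasher, not MongoDB's actual hashing, so the exact bucket counts are illustrative only):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Range sharding: ids 0..=max_id are split into n_shards equal ranges.
fn range_shard(id: u64, n_shards: u64, max_id: u64) -> u64 {
    id * n_shards / (max_id + 1)
}

// Hash sharding: bucket by a hash of the key instead of its value.
fn hash_shard(id: u64, n_shards: u64) -> u64 {
    let mut h = DefaultHasher::new();
    id.hash(&mut h);
    h.finish() % n_shards
}

fn main() {
    let n_shards = 4u64;
    let mut range_counts = vec![0u64; n_shards as usize];
    let mut hash_counts = vec![0u64; n_shards as usize];

    // Simulate a burst of inserts with monotonically increasing ids,
    // in a cluster whose ranges were sized for ids 0..=3999.
    for id in 1_000u64..2_000 {
        range_counts[range_shard(id, n_shards, 3_999) as usize] += 1;
        hash_counts[hash_shard(id, n_shards) as usize] += 1;
    }

    // Every one of these writes lands on the same shard with range
    // sharding, but they spread roughly evenly with hash sharding.
    println!("range-based: {:?}", range_counts);
    println!("hash-based:  {:?}", hash_counts);
}
```

All 1,000 range-sharded writes hit a single shard (the one owning the 1000-1999 range), while the hashed assignment distributes them across all four shards at the cost of losing range locality.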
Suppose we use fields with float values for hash-based sharding: we will end up with collisions if the precision of our floats is more than 2^53. Such fields should be avoided where possible.

Coming up with our own key

Range-based sharding does not need to be confined to a single key. In fact, in most cases, we would like to combine multiple keys to achieve high cardinality and low frequency.

A common pattern is to combine a low-cardinality first part (but still with more than two times as many distinct values as the number of shards we have) with a high-cardinality key as the second field. This achieves both read and write distribution from the first part of the sharding key, and then cardinality and read locality from the second part.

On the other hand, if we don't have range queries, we can get away with using hash-based sharding on a primary key, as this will exactly target the shard and document we are going after.

To make things more complicated, these considerations may change depending on our workload. A workload that consists almost exclusively (say, 99.5%) of reads won't care about write distribution. We could use the built-in _id field as our shard key, and this would only add 0.5% load to the last shard, while our reads would still be distributed across shards. Unfortunately, in most cases, it is not this simple.

Location-based data

Due to government regulations and the desire to have our data as close to our users as possible, there is often a constraint and need to keep data in a specific data center. By placing different shards in different data centers, we can satisfy this requirement.

To summarize, we learned about MongoDB sharding and techniques for choosing the correct shard key. Get the expert guide Mastering MongoDB 3.x today to build fault-tolerant MongoDB applications.
MongoDB 4.0 now generally available with support for multi-platform, mobile, ACID transactions and more
MongoDB going relational with 4.0 release
Indexing, Replicating, and Sharding in MongoDB [Tutorial]

Modern Cloud Native architectures: Microservices, Containers, and Serverless - Part 2

Guest Contributor
14 Aug 2018
8 min read
This whitepaper is written by Mina Andrawos, an experienced engineer who has developed deep expertise in the Go language and modern software architectures. He regularly writes articles and tutorials about the Go language and also shares open source projects. Mina Andrawos has authored the book Cloud Native programming with Golang, which provides practical techniques, code examples, and architectural patterns required to build cloud native microservices in the Go language. He is also the author of the Mastering Go Programming and Modern Golang Programming video courses.

We published Part 1 of this paper yesterday, and here is Part 2, which covers containers and serverless applications. Let us get started.

Containers

The technology of software containers is the next key technology that needs to be discussed to practically explain cloud native applications. A container is simply the idea of encapsulating some software inside an isolated user space, or "container." For example, a MySQL database can be isolated inside a container where the environmental variables and the configurations that it needs will live. Software outside the container will not see the environmental variables or configuration contained inside the container by default.

Multiple containers can exist on the same local virtual machine, cloud virtual machine, or hardware server. Containers provide the ability to run numerous isolated software services, with all their configurations, software dependencies, runtimes, tools, and accompanying files, on the same machine. In a cloud environment, this ability translates into saved costs and efforts, as the need to provision and buy server nodes for each microservice diminishes, since different microservices can be deployed on the same host without disrupting each other. Containers combined with microservices architectures are powerful tools to build modern, portable, scalable, and cost-efficient software.
In a production environment, more than a single server node, combined with numerous containers, is needed to achieve scalability and redundancy.

Containers also add more benefits to cloud native applications beyond microservices isolation. With a container, you can move your microservice, with all the configuration, dependencies, and environmental variables that it needs, to fresh server nodes without the need to reconfigure the environment, achieving powerful portability. Due to the power and popularity of software container technology, some new operating systems, like CoreOS or Photon OS, are built from the ground up to function as hosts for containers.

One of the most popular software container projects in the software industry is Docker. Major organizations such as Cisco, Google, and IBM utilize Docker containers in their infrastructure as well as in their products.

Another notable project in the software containers world is Kubernetes. Kubernetes is a tool that allows the automation of deployment, management, and scaling of containers. It was built by Google to facilitate the management of their containers, which are counted by billions per week. Kubernetes provides some powerful features, such as load balancing between containers, restarts for failed containers, and orchestration of storage utilized by the containers. The project is part of the cloud native foundation, along with Prometheus.

Container complexities

The task of managing containers can get rather complex, for the same reasons as managing expanding numbers of microservices. As containers or microservices grow in number, there needs to be a mechanism to identify where each container or microservice is deployed, what its purpose is, and what resources it needs to keep running.

Serverless applications

Serverless architecture is a new software architectural paradigm that was popularized with the AWS Lambda service.
In order to fully understand serverless applications, we must first cover an important concept known as 'Function as a Service', or FaaS for short. Function as a Service is the idea that a cloud provider such as Amazon, or even a local piece of software such as Fission.io or funktion, provides a service where a user can request a function to run remotely in order to perform a very specific task; after the function concludes, the results are returned to the user. No services or stateful data are maintained, and the function code is provided by the user to the service that runs it.

The idea behind properly designed cloud native production applications that utilize the serverless architecture is that, instead of building multiple microservices expected to run continuously in order to carry out individual tasks, we build an application that has fewer microservices combined with FaaS, where FaaS covers tasks that don't need services running continuously. FaaS is a smaller construct than a microservice. For example, in the case of the event booking application we covered earlier, there were multiple microservices covering different tasks. If we use a serverless application model, some of those microservices would be replaced with a number of functions that serve the same purpose. Here is a diagram that showcases the application utilizing a serverless architecture:

In this diagram, the event handler microservice as well as the booking handler microservice are replaced with a number of functions that produce the same functionality. This eliminates the need to run and maintain the two existing microservices.

Serverless architectures have the advantage that no virtual machines and/or containers need to be provisioned to build the part of the application that utilizes FaaS. The computing instances that run the functions cease to exist, from the user's point of view, once their functions conclude.
Furthermore, the number of microservices and/or containers that need to be monitored and maintained by the user decreases, saving cost, time, and effort. Serverless architectures provide yet another powerful software-building tool in the hands of software engineers and architects for designing flexible and scalable software. Known FaaS offerings are AWS Lambda by Amazon, Azure Functions by Microsoft, Cloud Functions by Google, and many more.

Another definition of serverless applications is applications that utilize the BaaS, or backend as a service, paradigm. BaaS is the idea that developers only write the client code of their application, which then relies on several pre-built software services hosted in the cloud and accessible via APIs. BaaS is popular in mobile app programming, where developers rely on a number of backend services to drive the majority of the functionality of the application. Examples of BaaS services are Firebase and Parse.

Disadvantages of serverless applications

Similarly to microservices and cloud native applications, the serverless architecture is not suitable for all scenarios.

The functions provided by FaaS don't keep state by themselves, which means special considerations need to be observed when writing the function code. This is unlike a full microservice, where the developer has full control over the state. One approach to keeping state with FaaS, in spite of this limitation, is to propagate the state to a database or a memory cache like Redis.

The startup times for the functions are not always fast, since there is time allocated to sending the request to the FaaS service provider, plus, in some cases, the time needed to start a computing instance that runs the function. These delays have to be accounted for when designing serverless applications.

FaaS do not run continuously like microservices, which makes them unsuitable for any task that requires continuous running of the software.
Serverless applications share a limitation with other cloud-native applications: portability of the application from one cloud provider to another, or from the cloud to a local environment, becomes challenging because of vendor lock-in.

Conclusion

Cloud computing architectures have opened avenues for developing efficient, scalable, and reliable software. This paper covered some significant concepts in the world of cloud computing, such as microservices, cloud-native applications, containers, and serverless applications. Microservices are the building blocks for most scalable cloud-native applications; they decouple the application's tasks into various efficient services. Containers are how microservices can be isolated and deployed safely to production environments without polluting them. Serverless applications decouple application tasks into smaller constructs, mostly called functions, that can be consumed via APIs. Cloud-native applications make use of all of these architectural patterns to build scalable, reliable, and always available software.

You read Part 2 of Modern cloud native architectures, a white paper by Mina Andrawos. Also read Part 1, which covers microservices and cloud-native applications with their advantages and disadvantages. If you are interested in learning more, check out Mina's Cloud Native programming with Golang to explore practical techniques for building cloud-native apps that are scalable, reliable, and always available.

About the author: Mina Andrawos is an experienced engineer who has developed deep experience in Go from using it personally and professionally. He regularly authors articles and tutorials about the language, and also shares Go's open source projects. He has written numerous Go applications with varying degrees of complexity. Other than Go, he has skills in Java, C#, Python, and C++. He has worked with various databases and software architectures.
He is also skilled with the agile methodology for software development. Besides software development, he has working experience of scrum mastering, sales engineering, and software product management.

Build Java EE containers using Docker [Tutorial]
Are containers the end of virtual machines?
Why containers are driving DevOps
Aaron Lazar
14 Aug 2018
14 min read

Access application data with Entity Framework in .NET Core [Tutorial]

In this tutorial, we will get started with using the Entity Framework and create a simple console application to perform CRUD operations. The intent is to get started with EF Core and understand how to use it. Before we dive into coding, let us see the two development approaches that EF Core supports: Code-first Database-first These two paradigms have been supported for a very long time and therefore we will just look at them at a very high level. EF Core mainly targets the code-first approach and has limited support for the database-first approach, as there is no support for the visual designer or wizard for the database model out of the box. However, there are third-party tools and extensions that support this. The list of third-party tools and extensions can be seen at https://docs.microsoft.com/en-us/ef/core/extensions/. This tutorial has been extracted from the book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. In the code-first approach, we first write the code; that is, we first create the domain model classes and then, using these classes, EF Core APIs create the database and tables, using migration based on the convention and configuration provided. We will look at conventions and configurations a little later in this section. The following diagram illustrates the code-first approach: In the database-first approach, as the name suggests, we have an existing database or we create a database first and then use EF Core APIs to create the domain and context classes. As mentioned, currently EF Core has limited support for it due to a lack of tooling. So, our preference will be for the code-first approach throughout our examples. The reader can discover the third-party tools mentioned previously to learn more about the EF Core database-first approach as well. 
The following image illustrates the database-first approach:

Building an Entity Framework Core console app

Now that we understand the approaches and know that we will be using the code-first approach, let's dive into coding our getting started with EF Core console app. Before we do so, we need to have SQL Express installed on our development machine. If SQL Express is not installed, download the SQL Express 2017 edition from https://www.microsoft.com/en-IN/sql-server/sql-server-downloads and run the setup wizard. We will do the Basic installation of SQL Express 2017 for our learning purposes, as shown in the following screenshot:

Our objective is to learn how to use EF Core, so we will not do anything fancy in our console app. We will just do simple Create, Read, Update, Delete (CRUD) operations on a simple class called Person, as defined here:

```csharp
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Gender { get; set; }
    public DateTime DateOfBirth { get; set; }

    public int Age
    {
        get
        {
            var age = DateTime.Now.Year - this.DateOfBirth.Year;
            if (DateTime.Now.DayOfYear < this.DateOfBirth.DayOfYear)
            {
                age = age - 1;
            }

            return age;
        }
    }
}
```

As we can see in the preceding code, the class has simple properties. To perform the CRUD operations on this class, let's create a console app by performing the following steps:

1. Create a new .NET Core console project named GettingStartedWithEFCore, as shown in the following screenshot:
2. Create a new folder named Models in the project node and add the Person class to this newly created folder. This will be our model entity class, which we will use for CRUD operations.
3. Next, we need to install the EF Core package. Before we do that, it's important to know that EF Core provides support for a variety of databases. A few of the important ones are:
   - SQL Server
   - SQLite
   - InMemory (for testing)

The complete and comprehensive list can be seen at https://docs.microsoft.com/en-us/ef/core/providers/.
We will be working with SQL Server on Windows for our learning purposes, so let's install the SQL Server package for Entity Framework Core. To do so, install the Microsoft.EntityFrameworkCore.SqlServer package from the NuGet Package Manager in Visual Studio 2017: right-click on the project, select Manage NuGet Packages, and then search for Microsoft.EntityFrameworkCore.SqlServer. Select the matching result and click Install:

Next, we will create a class called Context, as shown here:

```csharp
public class Context : DbContext
{
    public DbSet<Person> Persons { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //// Get the connection string from configuration
        optionsBuilder.UseSqlServer(@"Server=.\SQLEXPRESS;Database=PersonDatabase;Trusted_Connection=True;");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>().Property(nameof(Person.Name)).IsRequired();
    }
}
```

The class looks quite simple, but it has the following subtle and important things to make note of:

- The Context class derives from DbContext, which resides in the Microsoft.EntityFrameworkCore namespace. DbContext is an integral part of EF Core and if you have worked with EF, you will already be aware of it. An instance of DbContext represents a session with the database and can be used to query and save instances of your entities. DbContext is a combination of the Unit of Work and Repository patterns. Typically, you create a class that derives from DbContext and contains Microsoft.EntityFrameworkCore.DbSet properties for each entity in the model. If the properties have a public setter, they are automatically initialized when the instance of the derived context is created.
- It contains a property named Persons (the plural of the model class Person) of type DbSet<Person>. This will map to the Persons table in the underlying database.
- The class overrides the OnConfiguring method of DbContext and specifies the connection string to be used with the SQL Server database. The connection string should be read from the configuration file, appSettings.json, but for the sake of brevity and simplicity, it's hardcoded in the preceding code. The OnConfiguring method allows us to select and configure the data source to be used with a context using DbContextOptionsBuilder.
- Let's look at the connection string. Server= specifies the server. It can be .\SQLEXPRESS, .\SQLSERVER, .\LOCALDB, or any other instance name based on the installation you have done. Database= specifies the name of the database that will be created. Trusted_Connection=True specifies that we are using integrated security, or Windows authentication. An enthusiastic reader should read the official Microsoft Entity Framework documentation on configuring the context at https://docs.microsoft.com/en-us/ef/core/miscellaneous/configuring-dbcontext.
- The OnModelCreating method allows us to configure the model using the ModelBuilder Fluent API. This is the most powerful method of configuration and allows configuration to be specified without modifying the entity classes. The Fluent API configuration has the highest precedence and will override conventions and data annotations. The preceding code has the same effect as the following data annotation on the Name property in the Person class:

```csharp
[Required]
public string Name { get; set; }
```

The preceding point highlights the flexibility and configuration that EF Core brings to the table. EF Core uses a combination of conventions, attributes, and Fluent API statements to build a database model at runtime. All we have to do is perform actions on the model classes using a combination of these, and they will automatically be translated into appropriate changes in the database.
Before we conclude this point, let's have a quick look at each of the different ways to configure a database model:

EF Core conventions: The conventions in EF Core are comprehensive. They are the default rules by which EF Core builds a database model based on classes. A few of the simpler yet important default conventions are listed here:

- EF Core creates database tables for all DbSet<TEntity> properties in a Context class, with the same name as that of the property. In the preceding example, the table name would be Persons, based on this convention.
- EF Core creates tables for entities that are not included as DbSet properties but are reachable through reference properties in the other DbSet entities. If the Person class had a complex/navigation property, EF Core would have created a table for it as well.
- EF Core creates columns for all the scalar read-write properties of a class, with the same name as the property, by default. It uses the reference and collection properties for building relationships among the corresponding tables in the database. In the preceding example, the scalar properties of Person correspond to columns in the Persons table.
- EF Core assumes a property named ID, or one suffixed with ID, to be the primary key. If the property is an integer type or Guid type, then EF Core also assumes it to be IDENTITY and automatically assigns a value when inserting the data. This is precisely what we will make use of in our example while inserting or creating a new Person.
- EF Core maps the data type of a database column based on the data type of the property defined in the C# class.
A few of the mappings from C# data types to SQL Server column data types are listed in the following table:

| C# data type | SQL Server data type |
|--------------|----------------------|
| int          | int                  |
| string       | nvarchar(Max)        |
| decimal      | decimal(18,2)        |
| float        | real                 |
| byte[]       | varbinary(Max)       |
| datetime     | datetime             |
| bool         | bit                  |
| byte         | tinyint              |
| short        | smallint             |
| long         | bigint               |
| double       | float                |

There are many other conventions, and we can define custom conventions as well. For more details, please read the official Microsoft documentation at https://docs.microsoft.com/en-us/ef/core/modeling/.

Attributes: Conventions are often not enough to map the class to database objects. In such scenarios, we can use attributes called data annotation attributes to get the desired results. The [Required] attribute that we have just seen is an example of a data annotation attribute.

Fluent API: This is the most powerful way of configuring the model and can be used in addition to, or in place of, attributes. The code written in the OnModelCreating method is an example of a Fluent API statement.

If we check now, there is no PersonDatabase database, so we need to create the database from the model by adding a migration. EF Core includes different migration commands to create or update the database based on the model. To do so in Visual Studio 2017, go to Tools | NuGet Package Manager | Package Manager Console, as shown in the following screenshot:

This will open the Package Manager Console window. Select the Default project as GettingStartedWithEFCore and type the following command:

add-migration CreatePersonDatabase

If you are not using Visual Studio 2017 and you are dependent on .NET Core CLI tooling, you can use the following command:

dotnet ef migrations add CreatePersonDatabase

We have not installed the Microsoft.EntityFrameworkCore.Design package, so it will give an error: Your startup project 'GettingStartedWithEFCore' doesn't reference Microsoft.EntityFrameworkCore.Design. This package is required for the Entity Framework Core Tools to work.
Ensure your startup project is correct, install the package, and try again. So let's first go to the NuGet Package Manager and install this package. After successful installation of this package, if we run the preceding command again, we should be able to run the migrations successfully. It will also tell us the command to undo the migration, by displaying the message To undo this action, use Remove-Migration. We should see the new files added in the Solution Explorer in the Migrations folder, as shown in the following screenshot:

Although we have migrations applied, we have still not created a database. To create the database, we need to run the following commands.

In Visual Studio 2017:

update-database -verbose

In .NET Core CLI:

dotnet ef database update

If all goes well, we should have the database created with the Persons table (from the property of type DbSet<Person>) in the database. Let's validate the table and database by using SQL Server Management Studio (SSMS). If SSMS is not installed on your machine, you can also use Visual Studio 2017 to view the database and table. Let's check the created database. In Visual Studio 2017, click on the View menu and select Server Explorer, as shown in the following screenshot:

In Server Explorer, right-click on Data Connections and then select Add Connection. The Add Connection dialog will show up. Enter .\SQLEXPRESS in the Server name field (since we installed SQL Express 2017) and select PersonDatabase as the database, as shown in the following screenshot:

On clicking OK, we will see the database named PersonDatabase, and if we expand the tables, we can see the Persons table as well as the _EFMigrationsHistory table. Notice that the properties in the Person class that had setters are the only properties that get transformed into table columns in the Persons table.
Notice that the Age property is read-only in the class we created, and therefore we do not see an Age column in the database table, as shown in the following screenshot:

This is the first migration to create a database. Whenever we add or update the model classes or configurations, we need to sync the database with the model using the add-migration and update-database commands. With this, we have our model class ready and the corresponding database created. The following image summarizes how the properties have been mapped from the C# class to the database table columns:

Now, we will use the Context class to perform CRUD operations. Let's go back to our Main.cs and write the following code. The code is well commented, so please go through the comments to understand the flow:

```csharp
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Getting started with EF Core");
        Console.WriteLine("We will do CRUD operations on Person class.");

        //// Let's create an instance of the Person class.
        Person person = new Person()
        {
            Name = "Rishabh Verma",
            Gender = true, //// For demo true = Male, false = Female. Prefer enum in real cases.
            DateOfBirth = new DateTime(2000, 10, 23)
        };

        using (var context = new Context())
        {
            //// Context has a strongly typed property named Persons which refers to the Persons table.
            //// It has methods Add, Find, Update, Remove to perform CRUD, among many others.
            //// Use AddRange to add multiple persons in one go.
            //// The complete set of APIs can be seen by using F12 on the Persons property below in the Visual Studio IDE.
            var personData = context.Persons.Add(person);

            //// Though we have done Add, nothing has actually happened in the database. All changes are in the context only.
            //// We need to call SaveChanges to persist these changes in the database.
            context.SaveChanges();

            //// Notice above that Id is the Primary Key (PK) and hence has not been specified in the person object passed to the context.
            //// So, to know the created Id, we can use the below Id.
            int createdId = personData.Entity.Id;

            //// If all goes well, the person data should be persisted in the database.
            //// Use proper exception handling to discover unhandled exceptions, if any. Not shown here for simplicity and brevity.
            //// The createdId variable now holds the id of the created person.

            //// READ BEGINS
            Person readData = context.Persons.Where(j => j.Id == createdId).FirstOrDefault();
            //// We have the data of the person where Id == createdId, i.e. details of Rishabh Verma.

            //// UPDATE BEGINS
            //// Let's update the person data altogether, just for demonstrating the update functionality.
            person.Name = "Neha Shrivastava";
            person.Gender = false;
            person.DateOfBirth = new DateTime(2000, 6, 15);
            person.Id = createdId; //// For update cases, we need this to be specified.

            //// Update the person in the context.
            context.Persons.Update(person);
            //// Save the updates.
            context.SaveChanges();

            //// DELETE the person object.
            context.Remove(readData);
            context.SaveChanges();
        }

        Console.WriteLine("All done. Please press Enter key to exit...");
        Console.ReadLine();
    }
}
```

With this, we have completed our sample app to get started with EF Core. I hope this simple example will set you up to start using EF Core with confidence and encourage you to start exploring it further. The detailed features of EF Core can be learned from the official Microsoft documentation available at https://docs.microsoft.com/en-us/ef/core/.

If you're interested in learning more, head over to this book, .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava.

How to build a chatbot with Microsoft Bot framework
Working with Entity Client and Entity SQL
Get to know ASP.NET Core Web API [Tutorial]

Aaron Lazar
13 Aug 2018
11 min read

Polymorphism and type-pattern matching in Python [Tutorial]

Some functional programming languages offer clever approaches to the problem of working with statically typed function definitions. The problem is that many functions we'd like to write are entirely generic with respect to data type. For example, most of our statistical functions are identical for int or float numbers, as long as the division returns a value that is a subclass of numbers.Real (for example, Decimal, Fraction, or float). In many functional languages, sophisticated type or type-pattern matching rules are used by the compiler to make a single generic definition work for multiple data types. Python doesn't have this problem and doesn't need the pattern matching. In this article, we'll understand how to achieve Polymorphism and type-pattern matching in Python. This Python tutorial is an extract taken from the 2nd edition of the bestseller, Functional Python Programming, authored by Steven Lott. Instead of the (possibly) complex features of statically typed functional languages, Python changes the approach dramatically. Python uses dynamic selection of the final implementation of an operator based on the data types being used. In Python, we always write generic definitions. The code isn't bound to any specific data type. The Python runtime will locate the appropriate operations based on the types of the actual objects in use. The 3.3.7 Coercion rules section of the language reference manual and the numbers module in the library provide details on how this mapping from operation to special method name works. This means that the compiler doesn't certify that our functions are expecting and producing the proper data types. We generally rely on unit testing and the mypy tool for this kind of type checking. In rare cases, we might need to have different behavior based on the types of data elements. 
We have two ways to tackle this:

- We can use the isinstance() function to distinguish the different cases
- We can create our own subclass of numbers.Number or NamedTuple and implement proper polymorphic special method names

In some cases, we'll actually need to do both so that we can include appropriate data type conversions for each operation. Additionally, we'll also need to use the cast() function to make the types explicit to the mypy tool.

The ranking example in the previous section is tightly bound to the idea of applying rank-ordering to simple pairs. While this is the way the Spearman correlation is defined, a multivariate dataset needs rank-order correlation among all of its variables. The first thing we'll need to do is generalize our idea of rank-order information. The following is a NamedTuple value that handles a tuple of ranks and a raw data object:

```python
from typing import NamedTuple, Tuple, Any

class Rank_Data(NamedTuple):
    rank_seq: Tuple[float]
    raw: Any
```

A typical use of this kind of class definition is shown in this example:

```python
>>> data = {'key1': 1, 'key2': 2}
>>> r = Rank_Data((2, 7), data)
>>> r.rank_seq[0]
2
>>> r.raw
{'key1': 1, 'key2': 2}
```

The row of raw data in this example is a dictionary. There are two rankings for this particular item in the overall list. An application can get the sequence of rankings as well as the original raw data item.

We'll add some syntactic sugar to our ranking function. In many previous examples, we've required either an iterable or a concrete collection. The for statement is graceful about working with either one. However, we don't always use the for statement, and for some functions, we've had to explicitly use iter() to make an iterable out of a collection.
We can handle this situation with a simple isinstance() check, as shown in the following code snippet:

```python
def some_function(
        seq_or_iter: Union[Sequence, Iterator],
        key: Callable) -> Iterator:
    if isinstance(seq_or_iter, Sequence):
        yield from some_function(iter(seq_or_iter), key)
        return
    # Do the real work of the function using the Iterator
```

This example includes a type check to handle the small difference between a Sequence object and an Iterator. Specifically, the function uses iter() to create an Iterator from a Sequence, and calls itself recursively with the derived value.

For rank-ordering, Union[Sequence, Iterator] will be supported. Because the source data must be sorted for ranking, it's easier to use list() to transform a given iterator into a concrete sequence. The essential isinstance() check will be used, but instead of creating an iterator from a sequence (as shown previously), the following examples will create a sequence object from an iterator.

In the context of our rank-ordering function, we can make the function somewhat more generic. The following two expressions define the inputs:

```python
Source = Union[Rank_Data, Any]
Union[Sequence[Source], Iterator[Source]]
```

There are four combinations defined by these two types:

- Sequence[Rank_Data]
- Sequence[Any]
- Iterator[Rank_Data]
- Iterator[Any]

Handling the four combinations of data types

Here's the rank_data() function with three cases for handling the four combinations of data types:

```python
from typing import (
    Callable, Sequence, Iterator, Union, Iterable,
    Tuple, TypeVar, cast)

K_ = TypeVar("K_")  # Some comparable key type used for ranking.
Source = Union[Rank_Data, Any]

def rank_data(
        seq_or_iter: Union[Sequence[Source], Iterator[Source]],
        key: Callable[[Rank_Data], K_] = lambda obj: cast(K_, obj)
    ) -> Iterable[Rank_Data]:
    if isinstance(seq_or_iter, Iterator):
        # Iterator? Materialize a sequence object.
        yield from rank_data(list(seq_or_iter), key)
        return

    data: Sequence[Rank_Data]
    if isinstance(seq_or_iter[0], Rank_Data):
        # A collection of Rank_Data is what we prefer.
        data = seq_or_iter
    else:
        # Convert to Rank_Data and process.
        empty_ranks: Tuple[float] = cast(Tuple[float], ())
        data = list(
            Rank_Data(empty_ranks, raw_data)
            for raw_data in cast(Sequence[Source], seq_or_iter)
        )

    for r, rd in rerank(data, key):
        new_ranks = cast(
            Tuple[float],
            rd.rank_seq + cast(Tuple[float], (r,)))
        yield Rank_Data(new_ranks, rd.raw)
```

We've decomposed the ranking into three cases to cover the four different types of data. The following are the cases defined by the union of unions:

- Given an Iterator (an object without a usable __getitem__() method), we'll materialize a list object to work with. This will work for Rank_Data as well as any other raw data type. This case covers objects which are Iterator[Rank_Data] as well as Iterator[Any].
- Given a Sequence[Any], we'll wrap the unknown objects into Rank_Data tuples with an empty collection of rankings to create a Sequence[Rank_Data].
- Finally, given a Sequence[Rank_Data], add yet another ranking to the tuple of ranks inside each Rank_Data container.

The first case calls rank_data() recursively. The other two cases both rely on a rerank() function that builds a new Rank_Data tuple with additional ranking values. This contains several rankings for a complex record of raw data values. Note that a relatively complex cast() expression is required to disambiguate the use of generic tuples for the rankings. The mypy tool offers a reveal_type() function that can be incorporated to debug the inferred types.

The rerank() function follows a slightly different design to the example of the rank() function shown previously.
It yields two-tuples with the rank and the original data object:

```python
def rerank(
        rank_data_iter: Iterable[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    sorted_iter = iter(
        sorted(
            rank_data_iter,
            key=lambda obj: key(obj.raw)
        )
    )
    # Apply ranker to the head and the rest of the sorted items.
    head = next(sorted_iter)
    yield from ranker(sorted_iter, 0, [head], key)
```

The idea behind rerank() is to sort a collection of Rank_Data objects. The first item, head, is used to provide a seed value to the ranker() function. The ranker() function can examine the remaining items in the iterable to see if they match this initial value; this allows computing a proper rank for a batch of matching items.

The ranker() function accepts a sorted iterable of data, a base rank number, and an initial collection of items of the minimum rank. The result is an iterable sequence of two-tuples with a rank number and an associated Rank_Data object:

```python
def ranker(
        sorted_iter: Iterator[Rank_Data],
        base: float,
        same_rank_seq: List[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        value = next(sorted_iter)
    except StopIteration:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        return
    if key(value.raw) == key(same_rank_seq[0].raw):
        yield from ranker(
            sorted_iter, base, same_rank_seq+[value], key)
    else:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        yield from ranker(
            sorted_iter, base+dups, [value], key)
```

This starts by attempting to extract the next item from the sorted_iter collection of sorted Rank_Data items. If this fails with a StopIteration exception, there is no next item; the source was exhausted. The final output is the final batch of equal-valued items in the same_rank_seq sequence. If the sequence has a next item, the key() function extracts the key value.
If this new value matches the keys in the same_rank_seq collection, it is accumulated into the current batch of same-valued keys. The final result is based on the rest of the items in sorted_iter, the current value for the rank, a larger batch of same_rank items that now includes the head value, and the original key() function.

If the next item's key doesn't match the current batch of equal-valued items, the final result has two parts. The first part is the batch of equal-valued items accumulated in same_rank_seq. This is followed by the reranking of the remainder of the sorted items. The base value for these is incremented by the number of equal-valued items, a fresh batch of equal-rank items is initialized with the distinct key, and the original key() extraction function is provided.

The output from ranker() depends on the yield_sequence() function, which looks as follows:

```python
def yield_sequence(
        rank: float,
        same_rank_iter: Iterator[Rank_Data]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        head = next(same_rank_iter)
    except StopIteration:
        # Exhausted; a bare next() here would raise RuntimeError
        # inside a generator under PEP 479.
        return
    yield rank, head
    yield from yield_sequence(rank, same_rank_iter)
```

We've written this in a way that emphasizes the recursive definition. For any practical work, this should be optimized into a single for statement. When doing Tail-Call Optimization to transform a recursion into a loop, define unit test cases first. Be sure the recursion passes the unit test cases before optimizing.

The following are some examples of using this function to rank (and rerank) data. We'll start with a simple collection of scalar values:

```python
>>> scalars = [0.8, 1.2, 1.2, 2.3, 18]
>>> list(rank_data(scalars))
[Rank_Data(rank_seq=(1.0,), raw=0.8),
 Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(4.0,), raw=2.3),
 Rank_Data(rank_seq=(5.0,), raw=18)]
```

Each value becomes the raw attribute of a Rank_Data object. When we work with a slightly more complex object, we can also have multiple rankings.
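The midpoint formula (base+1+base+dups)/2 that ranker() uses for ties can be checked in isolation. This small iterative helper is a simplification for scalar values only (not part of the book's code), but it produces the same average ranks as rank_data() does for the scalars example:

```python
from itertools import groupby

def simple_ranks(values):
    """Assign average ranks, giving tied values the midpoint rank."""
    ranks = {}
    base = 0
    for val, group in groupby(sorted(values)):
        dups = len(list(group))
        # Midpoint of ranks base+1 .. base+dups: (base+1+base+dups)/2.
        ranks[val] = (base + 1 + base + dups) / 2
        base += dups
    return [ranks[v] for v in values]
```

For [0.8, 1.2, 1.2, 2.3, 18] this yields [1.0, 2.5, 2.5, 4.0, 5.0], matching the rank_seq values shown for the scalar example.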
The following is a sequence of two-tuples:

```python
>>> pairs = ((2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18))
>>> rank_x = list(rank_data(pairs, key=lambda x: x[0]))
>>> rank_x
[Rank_Data(rank_seq=(1.0,), raw=(2, 0.8)),
 Rank_Data(rank_seq=(2.0,), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0,), raw=(5, 1.2)),
 Rank_Data(rank_seq=(4.0,), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0,), raw=(11, 18))]
>>> rank_xy = list(rank_data(rank_x, key=lambda x: x[1]))
>>> rank_xy
[Rank_Data(rank_seq=(1.0, 1.0), raw=(2, 0.8)),
 Rank_Data(rank_seq=(2.0, 2.5), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0, 2.5), raw=(5, 1.2)),
 Rank_Data(rank_seq=(4.0, 4.0), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0, 5.0), raw=(11, 18))]
```

Here, we defined a collection of pairs. Then, we ranked the two-tuples, assigning the sequence of Rank_Data objects to the rank_x variable. We then ranked this collection of Rank_Data objects, creating a second rank value and assigning the result to the rank_xy variable.

The resulting sequence can be used for a slightly modified rank_corr() function to compute the rank correlations of any of the available values in the rank_seq attribute of the Rank_Data objects. We'll leave this modification as an exercise for you.

If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott's bestseller, Functional Python Programming.

Why functional programming in Python matters: Interview with best selling author, Steven Lott
Top 7 Python programming books you need to read
Members Inheritance and Polymorphism
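As a starting point for the exercise mentioned above, here is a sketch (not from the book) that computes Spearman's rho directly from paired ranks like those in rank_xy, using the classic formula 1 - 6*sum(d^2)/(n*(n^2-1)):

```python
def spearman_from_ranks(rank_pairs):
    """Spearman rho from (rank_x, rank_y) pairs.

    With tied ranks this classic formula is an approximation;
    the exact value is the Pearson correlation of the ranks.
    """
    n = len(rank_pairs)
    d_sq = sum((rx - ry) ** 2 for rx, ry in rank_pairs)
    return 1 - 6 * d_sq / (n * (n * n - 1))
```

Applied to the rank pairs from rank_xy above, (1.0, 1.0), (2.0, 2.5), (3.0, 2.5), (4.0, 4.0), (5.0, 5.0), this gives a rho of 0.975.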