How-To Tutorials

article-image-shared-pointers-in-rust-challenges-solutions

22 Aug 2018

8 min read

Working with Shared pointers in Rust: Challenges and Solutions [Tutorial]

22 Aug 2018

One of Rust's most criticized problem is that it's difficult to develop an application with shared pointers. It's true that due to Rust's memory safety guarantees, it might be difficult to develop those kind of algorithms, but as we will see now, the standard library gives us types we can use to safely allow that behavior. In this article, we'll understand how to overcome the issue of shared pointers in Rust to increase efficiency. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Overcoming issue with cell module The standard Rust library has one interesting module, the std::cell module, that allows us to use objects with interior mutability. This means that we can have an immutable object and still mutate it by getting a mutable borrow to the underlying data. This, of course, would not comply with the mutability rules we saw before, but the cells make sure this works by checking the borrows at runtime or by doing copies of the underlying data. Cells Let's start with the basic Cell structure. A Cell will contain a mutable value, but it can be mutated without having a mutable Cell. It has mainly three interesting methods: set(), swap(), and replace(). The first allows us to set the contained value, replacing it with a new value. The previous structure will be dropped (the destructor will run). That last bit is the only difference with the replace() method. In the replace() method, instead of dropping the previous value, it will be returned. The swap() method, on the other hand, will take another Cell and swap the values between the two. All this without the Cell needing to be mutable. Let's see it with an example: use std::cell::Cell; #[derive(Copy, Clone)] struct House { bedrooms: u8, } impl Default for House { fn default() -> Self { House { bedrooms: 1 } } } fn main() { let my_house = House { bedrooms: 2 }; let my_dream_house = House { bedrooms: 5 }; let my_cell = Cell::new(my_house); println!("My house has {} bedrooms.", my_cell.get().bedrooms); my_cell.set(my_dream_house); println!("My new house has {} bedrooms.", my_cell.get().bedrooms); let my_new_old_house = my_cell.replace(my_house); println!( "My house has {} bedrooms, it was better with {}", my_cell.get().bedrooms, my_new_old_house.bedrooms ); let my_new_cell = Cell::new(my_dream_house); my_cell.swap(&my_new_cell); println!( "Yay! my current house has {} bedrooms! (my new house {})", my_cell.get().bedrooms, my_new_cell.get().bedrooms ); let my_final_house = my_cell.take(); println!( "My final house has {} bedrooms, the shared one {}", my_final_house.bedrooms, my_cell.get().bedrooms ); } As you can see in the example, to use a Cell, the contained type must be Copy. If the contained type is not Copy, you will need to use a RefCell, which we will see next. Continuing with this Cell example, as you can see through the code, the output will be the following: So we first create two houses, we select one of them as the current one, and we keep mutating the current and the new ones. As you might have seen, I also used the take() method, only available for types implementing the Default trait. This method will return the current value, replacing it with the default value. As you can see, you don't really mutate the value inside, but you replace it with another value. You can either retrieve the old value or lose it. Also, when using the get() method, you get a copy of the current value, and not a reference to it. That's why you can only use elements implementing Copy with a Cell. This also means that a Cell does not need to dynamically check borrows at runtime. RefCell RefCell is similar to Cell, except that it accepts non-Copy data. This also means that when modifying the underlying object, it cannot simply copy it when returning it, it will need to return references. The same way, when you want to mutate the object inside, it will return a mutable reference. This only works because it will dynamically check at runtime whether a borrow exists before returning a mutable borrow, or the other way around, and if it does, the thread will panic. Instead of using the get() method as in Cell, RefCell has two methods to get the underlying data: borrow() and borrow_mut(). The first will get a read-only borrow, and you can have as many immutable borrows in a scope. The second one will return a read-write borrow, and you will only be able to have one in scope to follow the mutability rules. If you try to do a borrow_mut() after a borrow() in the same scope, or a borrow() after a borrow_mut(), the thread will panic. There are two non-panicking alternatives to these borrows: try_borrow() and try_borrow_mut(). These two will try to borrow the data (the first read-only and the second read/write), and if there are incompatible borrows present, they will return a Result::Err, so that you can handle the error without panicking. Both Cell and RefCell have a get_mut() method, that will get a mutable reference to the element inside, but it requires the Cell / RefCell to be mutable, so it doesn't make much sense if you need the Cell / RefCell to be immutable. Nevertheless, if in a part of the code you can actually have a mutable Cell / RefCell, you should use this method to change the contents, since it will check all rules statically at compile time, without runtime overhead. Interestingly enough, RefCell does not return a plain reference to the underlying data when we call borrow() or borrow_mut(). You would expect them to return &T and &mut T (where T is the wrapped element). Instead, they will return a Ref and a RefMut, respectively. This is to safely wrap the reference inside, so that the lifetimes get correctly calculated by the compiler without requiring references to live for the whole lifetime of the RefCell. They implement Deref into references, though, so thanks to Rust's Deref coercion, you can use them as references. Overcoming issue with rc module The std::rc module contains reference-counted pointers that can be used in single-threaded applications. They have very little overhead, thanks to counters not being atomic counters, but this means that using them in multithreaded applications could cause data races. Thus, Rust will stop you from sending them between threads at compile time. There are two structures in this module: Rc and Weak. An Rc is an owning pointer to the heap. This means that it's the same as a Box, except that it allows for reference-counted pointers. When the Rc goes out of scope, it will decrease by 1 the number of references, and if that count is 0, it will drop the contained object. Since an Rc is a shared reference, it cannot be mutated, but a common pattern is to use a Cell or a RefCell inside the Rc to allow for interior mutability. Rc can be downgraded to a Weak pointer, that will have a borrowed reference to the heap. When an Rc drops the value inside, it will not check whether there are Weak pointers to it. This means that a Weak pointer will not always have a valid reference, and therefore, for safety reasons, the only way to check the value of the Weak pointer is to upgrade it to an Rc, which could fail. The upgrade() method will return None if the reference has been dropped. Let's check all this by creating an example binary tree structure: use std::cell::RefCell; use std::rc::{Rc, Weak}; struct Tree<T> { root: Node<T>, } struct Node<T> { parent: Option<Weak<Node<T>>>, left: Option<Rc<RefCell<Node<T>>>>, right: Option<Rc<RefCell<Node<T>>>>, value: T, } In this case, the tree will have a root node, and each of the nodes can have up to two children. We call them left and right, because they are usually represented as trees with one child on each side. Each node has a pointer to one of the children, and it owns the children nodes. This means that when a node loses all references, it will be dropped, and with it, its children. Each child has a pointer to its parent. The main issue with this is that, if the child has an Rc pointer to its parent, it will never drop. This is a circular dependency, and to avoid it, the pointer to the parent will be a Weak pointer. So, you've finally understood how Rust manages shared pointers for complex structures, where the Rust borrow checker can make your coding experience much more difficult. If you found this article useful and would like to learn more such tips, head over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza. Perform Advanced Programming with Rust Rust 1.28 is here with global allocators, nonZero types and more Say hello to Sequoia: a new Rust based OpenPGP library to secure your apps

0
0
18050

How-To Tutorials

article-image-generative-adversarial-networks-using-keras

Amey Varangaonkar

21 Aug 2018

12 min read

Generative Adversarial Networks: Generate images using Keras GAN [Tutorial]

Amey Varangaonkar

21 Aug 2018

12 min read

You might have worked with the popular MNIST dataset before - but in this article, we will be generating new MNIST-like images with a Keras GAN. It can take a very long time to train a GAN; however, this problem is small enough to run on most laptops in a few hours, which makes it a great example. The following excerpt is taken from the book Deep Learning Quick Reference, authored by Mike Bernico. The network architecture that we will be using here has been found by, and optimized by, many folks, including the authors of the DCGAN paper and people like Erik Linder-Norén, who's excellent collection of GAN implementations called Keras GAN served as the basis of the code we used here. Loading the MNIST dataset The MNIST dataset consists of 60,000 hand-drawn numbers, 0 to 9. Keras provides us with a built-in loader that splits it into 50,000 training images and 10,000 test images. We will use the following code to load the dataset: from keras.datasets import mnist def load_data(): (X_train, _), (_, _) = mnist.load_data() X_train = (X_train.astype(np.float32) - 127.5) / 127.5 X_train = np.expand_dims(X_train, axis=3) return X_train As you probably noticed, We're not returning any of the labels or the testing dataset. We're only going to use the training dataset. The labels aren't needed because the only labels we will be using are 0 for fake and 1 for real. These are real images, so they will all be assigned a label of 1 at the discriminator. Building the generator The generator uses a few new layers that we will talk about in this section. First, take a chance to skim through the following code: def build_generator(noise_shape=(100,)): input = Input(noise_shape) x = Dense(128 * 7 * 7, activation="relu")(input) x = Reshape((7, 7, 128))(x) x = BatchNormalization(momentum=0.8)(x) x = UpSampling2D()(x) x = Conv2D(128, kernel_size=3, padding="same")(x) x = Activation("relu")(x) x = BatchNormalization(momentum=0.8)(x) x = UpSampling2D()(x) x = Conv2D(64, kernel_size=3, padding="same")(x) x = Activation("relu")(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(1, kernel_size=3, padding="same")(x) out = Activation("tanh")(x) model = Model(input, out) print("-- Generator -- ") model.summary() return model We have not previously used the UpSampling2D layer. This layer will take increases in the rows and columns of the input tensor, leaving the channels unchanged. It does this by repeating the values in the input tensor. By default, it will double the input. If we give an UpSampling2D layer a 7 x 7 x 128 input, it will give us a 14 x 14 x 128 output. Typically when we build a CNN, we start with an image that is very tall and wide and uses convolutional layers to get a tensor that's very deep but less tall and wide. Here we will do the opposite. We'll use a dense layer and a reshape to start with a 7 x 7 x 128 tensor and then, after doubling it twice, we'll be left with a 28 x 28 tensor. Since we need a grayscale image, we can use a convolutional layer with a single unit to get a 28 x 28 x 1 output. This sort of generator arithmetic is a little off-putting and can seem awkward at first but after a few painful hours, you will get the hang of it! Building the discriminator The discriminator is really, for the most part, the same as any other CNN. Of course, there are a few new things that we should talk about. We will use the following code to build the discriminator: def build_discriminator(img_shape): input = Input(img_shape) x =Conv2D(32, kernel_size=3, strides=2, padding="same")(input) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = Conv2D(64, kernel_size=3, strides=2, padding="same")(x) x = ZeroPadding2D(padding=((0, 1), (0, 1)))(x) x = (LeakyReLU(alpha=0.2))(x) x = Dropout(0.25)(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(128, kernel_size=3, strides=2, padding="same")(x) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = BatchNormalization(momentum=0.8)(x) x = Conv2D(256, kernel_size=3, strides=1, padding="same")(x) x = LeakyReLU(alpha=0.2)(x) x = Dropout(0.25)(x) x = Flatten()(x) out = Dense(1, activation='sigmoid')(x) model = Model(input, out) print("-- Discriminator -- ") model.summary() return model First, you might notice the oddly shaped zeroPadding2D() layer. After the second convolution, our tensor has gone from 28 x 28 x 3 to 7 x 7 x 64. This layer just gets us back into an even number, adding zeros on one side of both the rows and columns so that our tensor is now 8 x 8 x 64. More unusual is the use of both batch normalization and dropout. Typically, these two layers are not used together; however, in the case of GANs, they do seem to benefit the network. Building the stacked model Now that we've assembled both the generator and the discriminator, we need to assemble a third model that is the stack of both models together that we can use for training the generator given the discriminator loss. To do that we can just create a new model, this time using the previous models as layers in the new model, as shown in the following code: discriminator = build_discriminator(img_shape=(28, 28, 1)) generator = build_generator() z = Input(shape=(100,)) img = generator(z) discriminator.trainable = False real = discriminator(img) combined = Model(z, real) Notice that we're setting the discriminator's training attribute to False before building the model. This means that for this model we will not be updating the weights of the discriminator during backpropagation. We will freeze these weights and only move the generator weights with the stack. The discriminator will be trained separately. Now that all the models are built, they need to be compiled, as shown in the following code: gen_optimizer = Adam(lr=0.0002, beta_1=0.5) disc_optimizer = Adam(lr=0.0002, beta_1=0.5) discriminator.compile(loss='binary_crossentropy', optimizer=disc_optimizer, metrics=['accuracy']) generator.compile(loss='binary_crossentropy', optimizer=gen_optimizer) combined.compile(loss='binary_crossentropy', optimizer=gen_optimizer) If you'll notice, we're creating two custom Adam optimizers. This is because many times we will want to change the learning rate for only the discriminator or generator, slowing one or the other down so that we end up with a stable GAN where neither is overpowering the other. You'll also notice that we're using beta_1 = 0.5. This is a recommendation from the original DCGAN paper that we've carried forward and also had success with. A learning rate of 0.0002 is a good place to start as well, and was found in the original DCGAN paper. The training loop We have previously had the luxury of calling .fit() on our model and letting Keras handle the painful process of breaking the data apart into mini batches and training for us. Unfortunately, because we need to perform the separate updates for the discriminator and the stacked model together for a single batch we're going to have to do things the old-fashioned way, with a few loops. This is how things used to be done all the time, so while it's perhaps a little more work, it does admittedly leave me feeling nostalgic. The following code illustrates the training technique: num_examples = X_train.shape[0] num_batches = int(num_examples / float(batch_size)) half_batch = int(batch_size / 2) for epoch in range(epochs + 1): for batch in range(num_batches): # noise images for the batch noise = np.random.normal(0, 1, (half_batch, 100)) fake_images = generator.predict(noise) fake_labels = np.zeros((half_batch, 1)) # real images for batch idx = np.random.randint(0, X_train.shape[0], half_batch) real_images = X_train[idx] real_labels = np.ones((half_batch, 1)) # Train the discriminator (real classified as ones and generated as zeros) d_loss_real = discriminator.train_on_batch(real_images, real_labels) d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels) d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) noise = np.random.normal(0, 1, (batch_size, 100)) # Train the generator g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1))) # Plot the progress print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch,batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss)) if batch % 50 == 0: save_imgs(generator, epoch, batch) There is a lot going on here, to be sure. As before, let's break it down block by block. First, let's see the code to generate noise vectors: noise = np.random.normal(0, 1, (half_batch, 100)) fake_images = generator.predict(noise) fake_labels = np.zeros((half_batch, 1)) This code is generating a matrix of noise vectors called z) and sending it to the generator. It's getting a set of generated images back, which we're calling fake images. We will use these to train the discriminator, so the labels we want to use are 0s, indicating that these are in fact generated images. Note that the shape here is half_batch x 28 x 28 x 1. The half_batch is exactly what you think it is. We're creating half a batch of generated images because the other half of the batch will be real data, which we will assemble next. To get our real images, we will generate a random set of indices across X_train and use that slice of X_train as our real images, as shown in the following code: idx = np.random.randint(0, X_train.shape[0], half_batch) real_images = X_train[idx] real_labels = np.ones((half_batch, 1)) Yes, we are sampling with replacement in this case. It does work out but it's probably not the best way to implement minibatch training. It is, however, probably the easiest and most common. Since we are using these images to train the discriminator, and because they are real images, we will assign them 1s as labels, rather than 0s. Now that we have our discriminator training set assembled, we will update the discriminator. Also, note that we aren't using the soft labels. That's because we want to keep things as easy as they can be to understand. Luckily the network doesn't require them in this case. We will use the following code to train the discriminator: # Train the discriminator (real classified as ones and generated as zeros) d_loss_real = discriminator.train_on_batch(real_images, real_labels) d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels) d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) Notice that here we're using the discriminator's train_on_batch() method. The train_on_batch() method does exactly one round of forward and backward propagation. Every time we call it, it updates the model once from the model's previous state. Also, notice that we're making the update for the real images and fake images separately. This is advice that is given on the GAN hack Git we had previously referenced in the Generator architecture section. Especially in the early stages of training, when real images and fake images are from radically different distributions, batch normalization will cause problems with training if we were to put both sets of data in the same update. Now that the discriminator has been updated, it's time to update the generator. This is done indirectly by updating the combined stack, as shown in the following code: noise = np.random.normal(0, 1, (batch_size, 100)) g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1))) To update the combined model, we create a new noise matrix, and this time it will be as large as the entire batch. We will use that as an input to the stack, which will cause the generator to generate an image and the discriminator to evaluate that image. Finally, we will use the label of 1 because we want to backpropagate the error between a real image and the generated image. Lastly, the training loop reports the discriminator and generator loss at the epoch/batch and then, every 50 batches, of every epoch we will use save_imgs to generate example images and save them to disk, as shown in the following code: print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch,batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss)) if batch % 50 == 0: save_imgs(generator, epoch, batch) The save_imgs function uses the generator to create images as we go, so we can see the fruits of our labor. We will use the following code to define save_imgs: def save_imgs(generator, epoch, batch): r, c = 5, 5 noise = np.random.normal(0, 1, (r * c, 100)) gen_imgs = generator.predict(noise) gen_imgs = 0.5 * gen_imgs + 0.5 fig, axs = plt.subplots(r, c) cnt = 0 for i in range(r): for j in range(c): axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray') axs[i, j].axis('off') cnt += 1 fig.savefig("images/mnist_%d_%d.png" % (epoch, batch)) plt.close() It uses only the generator by creating a noise matrix and retrieving an image matrix in return. Then, using matplotlib.pyplot, it saves those images to disk in a 5 x 5 grid. Performing model evaluation Good is somewhat subjective when you're building a deep neural network to create images. Let's take a look at a few examples of the training process, so you can see for yourself how the GAN begins to learn to generate MNIST. Here's the network at the very first batch of the very first epoch. Clearly, the generator doesn't really know anything about generating MNIST at this point; it's just noise, as shown in the following image: But just 50 batches in, something is happening, as you can see from the following image: And after 200 batches of epoch 0 we can almost see numbers, as you can see from the following image: And here's our generator after one full epoch. These generated numbers look pretty good, and we can see how the discriminator might be fooled by them. At this point, we could probably continue to improve a little bit, but it looks like our GAN has worked as the computer is generating some pretty convincing MNIST digits, as shown in the following image: Thus, we see the power of GANs in action when it comes to image generation using the Keras library. If you found the above article to be useful, make sure you check out our book Deep Learning Quick Reference, for more such interesting coverage of popular deep learning concepts and their practical implementation. Keras 2.2.0 releases! 2 ways to customize your deep learning models with Keras How to build Deep convolutional GAN using TensorFlow and Keras

0
3
58913

How-To Tutorials

article-image-implementing-dependency-injection-in-spring-tutorial

Natasha Mathur

21 Aug 2018

9 min read

Implementing Dependency Injection in Spring [Tutorial]

Natasha Mathur

21 Aug 2018

9 min read

0
0
23913

How-To Tutorials

article-image-build-reinforcement-learning-agent-in-keras-tutorial

Amey Varangaonkar

20 Aug 2018

6 min read

Build your first Reinforcement learning agent in Keras [Tutorial]

Amey Varangaonkar

20 Aug 2018

6 min read

Today there are a variety of tools available at your disposal to develop and train your own Reinforcement learning agent. In this tutorial, we are going to learn about a Keras-RL agent called CartPole. We will go through this example because it won't consume your GPU, and your cloud budget to run. Also, this logic can be easily extended to other Atari problems. This article is an excerpt taken from the book Deep Learning Quick Reference, written by Mike Bernico. Let's talk quickly about the CartPole environment first: CartPole: The CartPole environment consists of a pole, balanced on a cart. The agent has to learn how to balance the pole vertically, while the cart underneath it moves. The agent is given the position of the cart, the velocity of the cart, the angle of the pole, and the rotational rate of the pole as inputs. The agent can apply a force on either side of the cart. If the pole falls more than 15 degrees from vertical, it's game over for our agent. The CartPole agent will use a fairly modest neural network that you should be able to train fairly quickly even without a GPU. We will start by looking at the model architecture. Then we will define the network's memory, exploration policy, and finally, train the agent. CartPole neural network architecture Three hidden layers with 16 neurons each are more than enough to solve this simple problem. We will use the following code to define the model: def build_model(state_size, num_actions): input = Input(shape=(1,state_size)) x = Flatten()(input) x = Dense(16, activation='relu')(x) x = Dense(16, activation='relu')(x) x = Dense(16, activation='relu')(x) output = Dense(num_actions, activation='linear')(x) model = Model(inputs=input, outputs=output) print(model.summary()) return model The input will be a 1 x state space vector and there will be an output neuron for each possible action that will predict the Q value of that action for each step. By taking the argmax of the outputs, we can choose the action with the highest Q value, but we don't have to do that ourselves as Keras-RL will do it for us. Keras-RL Memory Keras-RL provides us with a class called rl.memory.SequentialMemory that provides a fast and efficient data structure that we can store the agent's experiences in: memory = SequentialMemory(limit=50000, window_length=1) We need to specify a maximum size for this memory object, which is a hyperparameter. As new experiences are added to this memory and it becomes full, old experiences are forgotten. Keras-RL Policy Keras-RL provides an -greedy Q Policy called rl.policy.EpsGreedyQPolicy that we can use to balance exploration and exploitation. We can use rl.policy.LinearAnnealedPolicy to decay our as the agent steps forward in the world, as shown in the following code: policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.05, nb_steps=10000) Here we're saying that we want to start with a value of 1 for and go no smaller than 0.1, while testing if our random number is less than 0.05. We set the number of steps between 1 and .1 to 10,000 and Keras-RL handles the decay math for us. Agent With a model, memory, and policy defined, we're now ready to create a deep Q network Agent and send that agent those objects. Keras-RL provides an agent class called rl.agents.dqn.DQNAgent that we can use for this, as shown in the following code: dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, nb_steps_warmup=10, target_model_update=1e-2, policy=policy) dqn.compile(Adam(lr=1e-3), metrics=['mae']) Two of these parameters are probably unfamiliar at this point, target_model_update and nb_steps_warmup: nb_steps_warmup: Determines how long we wait before we start doing experience replay, which if you recall, is when we actually start training the network. This lets us build up enough experience to build a proper minibatch. If you choose a value for this parameter that's smaller than your batch size, Keras RL will sample with a replacement. target_model_update: The Q function is recursive and when the agent updates it's network for Q(s,a) that update also impacts the prediction it will make for Q(s', a). This can make for a very unstable network. The way most deep Q network implementations address this limitation is by using a target network, which is a copy of the deep Q network that isn't trained, but rather replaced with a fresh copy every so often. The target_model_update parameter controls how often this happens. Keras-RL Training Keras-RL provides several Keras-like callbacks that allow for convenient model checkpointing and logging. We will use both of those callbacks below. If you would like to see more of the callbacks Keras-RL provides, they can be found here: https://github.com/matthiasplappert/keras-rl/blob/master/rl/callbacks.py. You can also find a Callback class that you can use to create your own Keras-RL callbacks. We will use the following code to train our model: def build_callbacks(env_name): checkpoint_weights_filename = 'dqn_' + env_name + '_weights_{step}.h5f' log_filename = 'dqn_{}_log.json'.format(env_name) callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=5000)] callbacks += [FileLogger(log_filename, interval=100)] return callbacks callbacks = build_callbacks(ENV_NAME) dqn.fit(env, nb_steps=50000, visualize=False, verbose=2, callbacks=callbacks) Once the agent's callbacks are built, we can fit the DQNAgent by using a .fit() method. Take note of the visualize parameter in this example. If visualize were set to True, we would be able to watch the agent interact with the environment as we went. However, this significantly slows down the training. Results After the first 250 episodes, we will see that the total rewards for the episode approach 200 and the episode steps also approach 200. This means that the agent has learned to balance the pole on the cart until the environment ends at a maximum of 200 steps. It's of course fun to watch our success, so we can use the DQNAgent .test() method to evaluate for some number of episodes. The following code is used to define this method: dqn.test(env, nb_episodes=5, visualize=True) Here we've set visualize=True so we can watch our agent balance the pole, as shown in the following image: There we go, that's one balanced pole! Alright, I know, I'll admit that balancing a pole on a cart isn't all that cool, but it's a good enough demonstration of the process! Hopefully, you have now understood the dynamics behind the process, and as we discussed earlier, the solution to this problem can be applied to other similar game-based problems. If you found this article to be useful, make sure you check out the book Deep Learning Quick Reference to understand the other different types of reinforcement models you can build using Keras. Top 5 tools for reinforcement learning DeepCube: A new deep reinforcement learning approach solves the Rubik’s cube with no human help OpenAI builds reinforcement learning based system giving robots human like dexterity

0
0
55776

How-To Tutorials

article-image-app-metrics-analyze-http-traffic-errors-network-performance-net-core-app

Aaron Lazar

20 Aug 2018

12 min read

Use App Metrics to analyze HTTP traffic, errors & network performance of a .NET Core app [Tutorial]

Aaron Lazar

20 Aug 2018

12 min read

0
0
38171

How-To Tutorials

article-image-best-practices-for-c-code-optimization-tutorial

Aaron Lazar

17 Aug 2018

9 min read

Best practices for C# code optimization [Tutorial]

Aaron Lazar

17 Aug 2018

9 min read

There are many factors that negatively impact the performance of a .NET Core application. Sometimes these are minor things that were not considered earlier at the time of writing the code, and are not addressed by the accepted best practices. As a result, to solve these problems, programmers often resort to ad hoc solutions. However, when bad practices are combined together, they produce performance issues. It is always better to know the best practices that help developers write cleaner code and make the application performant. In this article, we will learn the following topics: Boxing and unboxing overhead String concatenation Exceptions handling for versus foreach Delegates This tutorial is an extract from the book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Boxing and unboxing overhead The boxing and unboxing methods are not always good to use and they negatively impact the performance of mission-critical applications. Boxing is a method of converting a value type to an object type, and is done implicitly, whereas unboxing is a method of converting an object type back to a value type and requires explicit casting. Let's go through an example where we have two methods executing a loop of 10 million records, and in each iteration, they are incrementing the counter by 1. The AvoidBoxingUnboxing method is using a primitive integer to initialize and increment it on each iteration, whereas the BoxingUnboxing method is boxing by assigning the numeric value to the object type first and then unboxing it on each iteration to convert it back to the integer type, as shown in the following code: private static void AvoidBoxingUnboxing() { Stopwatch watch = new Stopwatch(); watch.Start(); //Boxing int counter = 0; for (int i = 0; i < 1000000; i++) { //Unboxing counter = i + 1; } watch.Stop(); Console.WriteLine($"Time taken {watch.ElapsedMilliseconds}"); } private static void BoxingUnboxing() { Stopwatch watch = new Stopwatch(); watch.Start(); //Boxing object counter = 0; for (int i = 0; i < 1000000; i++) { //Unboxing counter = (int)i + 1; } watch.Stop(); Console.WriteLine($"Time taken {watch.ElapsedMilliseconds}"); } When we run both methods, we will clearly see the differences in performance. The BoxingUnboxing is executed seven times slower than the AvoidBoxingUnboxing method, as shown in the following screenshot: For mission-critical applications, it's always better to avoid boxing and unboxing. However, in .NET Core, we have many other types that internally use objects and perform boxing and unboxing. Most of the types under System.Collections and System.Collections.Specialized use objects and object arrays for internal storage, and when we store primitive types in these collections, they perform boxing and convert each primitive value to an object type, adding extra overhead and negatively impacting the performance of the application. Other types of System.Data, namely DateSet, DataTable, and DataRow, also use object arrays under the hood. Types under the System.Collections.Generic namespace or typed arrays are the best approaches to use when performance is the primary concern. For example, HashSet<T>, LinkedList<T>, and List<T> are all types of generic collections. For example, here is a program that stores the integer value in ArrayList: private static void AddValuesInArrayList() { Stopwatch watch = new Stopwatch(); watch.Start(); ArrayList arr = new ArrayList(); for (int i = 0; i < 1000000; i++) { arr.Add(i); } watch.Stop(); Console.WriteLine($"Total time taken is {watch.ElapsedMilliseconds}"); } Let's write another program that uses a generic list of the integer type: private static void AddValuesInGenericList() { Stopwatch watch = new Stopwatch(); watch.Start(); List<int> lst = new List<int>(); for (int i = 0; i < 1000000; i++) { lst.Add(i); } watch.Stop(); Console.WriteLine($"Total time taken is {watch.ElapsedMilliseconds}"); } When running both programs, the differences are pretty noticeable. The code with the generic list List<int> is over 10 times faster than the code with ArrayList. The result is as follows: String concatenation In .NET, strings are immutable objects. Two strings refer to the same memory on the heap until the string value is changed. If any of the string is changed, a new string is created on the heap and is allocated a new memory space. Immutable objects are generally thread safe and eliminate the race conditions between multiple threads. Any change in the string value creates and allocates a new object in memory and avoids producing conflicting scenarios with multiple threads. For example, let's initialize the string and assign the Hello World value to the a string variable: String a = "Hello World"; Now, let's assign the a string variable to another variable, b: String b = a; Both a and b point to the same value on the heap, as shown in the following diagram: Now, suppose we change the value of b to Hope this helps: b= "Hope this helps"; This will create another object on the heap, where a points to the same and b refers to the new memory space that contains the new text: With each change in the string, the object allocates a new memory space. In some cases, it may be an overkill scenario, where the frequency of string modification is higher and each modification is allocated a separate memory space, creates work for the garbage collector in collecting the unused objects and freeing up space. In such a scenario, it is highly recommended that you use the StringBuilder class. Exception handling Improper handling of exceptions also decreases the performance of an application. The following list contains some of the best practices in dealing with exceptions in .NET Core: Always use a specific exception type or a type that can catch the exception for the code you have written in the method. Using the Exception type for all cases is not a good practice. It is always a good practice to use try, catch, and finally block where the code can throw exceptions. The final block is usually used to clean up the resources, and returns a proper response that the calling code is expecting. In deeply nested code, don't use try catch block and handle it to the calling method or main method. Catching exceptions on multiple stacks slows down performance and is not recommended. Always use exceptions for fatal conditions that terminate the program. Using exceptions for noncritical conditions, such as converting the value to an integer or reading the value from an empty array, is not recommended and should be handled through custom logic. For example, converting a string value to the integer type can be done by using the Int32.Parse method rather than by using the Convert.ToInt32 method and then failing at a point when the string is not represented as a digit. While throwing an exception, add a meaningful message so that the user knows where that exception has actually occurred rather than going through the stack trace. For example, the following code shows a way of throwing an exception and adding a custom message based on the method and class being called: static string GetCountryDetails(Dictionary<string, string> countryDictionary, string key) { try { return countryDictionary[key]; } catch (KeyNotFoundException ex) { KeyNotFoundException argEx = new KeyNotFoundException(" Error occured while executing GetCountryDetails method. Cause: Key not found", ex); throw argEx; } } Throw exceptions rather than returning the custom messages or error codes and handle it in the main calling method. When logging exceptions, always check the inner exception and read the exception message or stack trace. It is helpful, and gives the actual point in the code where the error is thrown. For vs foreach For and foreach are two of the alternative ways of iterating over a list of items. Each of them operates in a different way. The for loop actually loads all the items of the list in memory first and then uses an indexer to iterate over each element, whereas foreach uses an enumerator and iterates until it reaches the end of the list. The following table shows the types of collections that are good to use for for and foreach: Type For/Foreach Typed array Good for both Array list Better with for Generic collections Better with for Delegates Delegates are a type in .NET which hold the reference to the method. The type is equivalent to the function pointer in C or C++. When defining a delegate, we can specify both the parameters that the method can take and its return type. This way, the reference methods will have the same signature. Here is a simple delegate that takes a string and returns an integer: delegate int Log(string n); Now, suppose we have a LogToConsole method that has the same signature as the one shown in the following code. This method takes the string and writes it to the console window: static int LogToConsole(string a) { Console.WriteLine(a); return 1; } We can initialize and use this delegate like this: Log logDelegate = LogToConsole; logDelegate ("This is a simple delegate call"); Suppose we have another method called LogToDatabase that writes the information in the database: static int LogToDatabase(string a) { Console.WriteLine(a); //Log to database return 1; } Here is the initialization of the new logDelegate instance that references the LogToDatabase method: Log logDelegateDatabase = LogToDatabase; logDelegateDatabase ("This is a simple delegate call"); The preceding delegate is the representation of unicast delegates, as each instance refers to a single method. On the other hand, we can also create multicast delegates by assigning LogToDatabase to the same LogDelegate instance, as follows: Log logDelegate = LogToConsole; logDelegate += LogToDatabase; logDelegate("This is a simple delegate call"); The preceding code seems pretty straightforward and optimized, but under the hood, it has a huge performance overhead. In .NET, delegates are implemented by a MutlicastDelegate class that is optimized to run unicast delegates. It stores the reference of the method to the target property and calls the method directly. For multicast delegates, it uses the invocation list, which is a generic list, and holds the references to each method that is added. With multicast delegates, each target property holds the reference to the generic list that contains the method and executes in sequence. However, this adds an overhead for multicast delegates and takes more time to execute. If you liked this article and would like to learn more such techniques, grab this book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Behavior Scripting in C# and Javascript for game developers Exciting New Features in C# 8.0 Exploring Language Improvements in C# 7.2 and 7.3

0
1
26997

How-To Tutorials

article-image-rust-for-web-development-tutorial

Aaron Lazar

17 Aug 2018

13 min read

Use Rust for web development [Tutorial]

Aaron Lazar

17 Aug 2018

13 min read

You might think that Rust is only meant to be used for complex system development, or that it should be used where security is the number one concern. Thinking of using it for web development might sound to you like huge overkill. We already have proven web-oriented languages that have worked until now, such as PHP or JavaScript, right? This is far from true. Many projects use the web as their platform and for them, it's sometimes more important to be able to receive a lot of traffic without investing in expensive servers rather than using legacy technologies, especially in new products. This is where Rust comes in handy. Thanks to its speed and some really well thought out web-oriented frameworks, Rust performs even better than the legacy web programming languages. In this tutorial, we'll see how Rust can be used for Web Development. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Rust is even trying to replace some of the JavaScript on the client side of applications, since Rust can compile to WebAssembly, making it extremely powerful for heavy client-side web workloads. Creating extremely efficient web templates We have seen that Rust is a really efficient language and metaprogramming allows for the creation of even more efficient code. Rust has great templating language support, such as Handlebars and Tera. Rust's Handlebars implementation is much faster than the JavaScript implementation, while Tera is a template engine created for Rust based on Jinja2. In both cases, you define a template file and then you use Rust to parse it. Even though this will be reasonable for most web development, in some cases, it might be slower than pure Rust alternatives. This is where the Maud crate comes in. We will see how it works and how it achieves orders of magnitude faster performance than its counterparts. To use Maud, you will need nightly Rust, since it uses procedural macros. As we saw in previous chapters, if you are using rustup you can simply run rustup override set nightly. Then, you will need to add Maud to your Cargo.toml file in the [dependencies] section: [dependencies] maud = "0.17.2 Maud brings an html!{} procedural macro that enables you to write HTML in Rust. You will, therefore, need to import the necessary crate and macro in your main.rs or lib.rs file, as you will see in the following code. Remember to also add the procedural macro feature at the beginning of the crate: #![feature(proc_macro)] extern crate maud; use maud::html; You will now be able to use the html!{} macro in your main() function. This macro will return a Markup object, which you can then convert to a String or return to Rocket or Iron for your website implementation (you will need to use the relevant Maud features in that case). Let's see what a short template implementation looks like: fn main() { use maud::PreEscaped; let user_name = "FooBar"; let markup = html! { (PreEscaped("<!DOCTYPE html>")) html { head { title { "Test website" } meta charset="UTF-8"; } body { header { nav { ul { li { "Home" } li { "Contact Us" } } } } main { h1 { "Welcome to our test template!" } p { "Hello, " (user_name) "!" } } footer { p { "Copyright © 2017 - someone" } } } } }; println!("{}", markup.into_string()); } It seems like a complex template, but it contains just the basic information a new website should have. We first add a doctype, making sure it will not escape the content (that is what the PreEscaped is for) and then we start the HTML document with two parts: the head and the body. In the head, we add the required title and the charset meta element to tell the browser that we will be using UTF-8. Then, the body contains the three usual sections, even though this can, of course, be modified. One header, one main section, and one footer. I added some example information in each of the sections and showed you how to add a dynamic variable in the main section inside a paragraph. The interesting syntax here is that you can create elements with attributes, such as the meta element, even without content, by finishing it early with a semicolon. You can use any HTML tag and add variables. The generated code will be escaped, except if you ask for non-escaped data, and it will be minified so that it occupies the least space when being transmitted. Inside the parentheses, you can call any function or variable that returns a type that implements the Display trait and you can even add any Rust code if you add braces around it, with the last statement returning a Display element. This works on attributes too. This gets processed at compile time, so that at runtime it will only need to perform the minimum possible amount of work, making it extremely efficient. And not only that; the template will be typesafe thanks to Rust's compile-time guarantees, so you won't forget to close a tag or an attribute. There is a complete guide to the templating engine that can be found at https://maud.lambda.xyz/. Connecting with a database If we want to use SQL/relational databases in Rust, there is no other crate to think about than Diesel. If you need access to NoSQL databases such as Redis or MongoDB, you will also find proper crates, but since the most used databases are relational databases, we will check Diesel here. Diesel makes working with MySQL/MariaDB, PostgreSQL, and SQLite very easy by providing a great ORM and typesafe query builder. It prevents all potential SQL injections at compile time, but is still extremely fast. In fact, it's usually faster than using prepared statements, due to the way it manages connections to databases. Without entering into technical details, we will check how this stable framework works. The development of Diesel has been impressive and it's already working in stable Rust. It even has a stable 1.x version, so let's check how we can map a simple table. Diesel comes with a command-line interface program, which makes it much easier to use. To install it, run cargo install diesel_cli. Note that, by default, this will try to install it for PostgreSQL, MariaDB/MySQL, and SQLite. For this short tutorial, you need to have SQLite 3 development files installed, but if you want to avoid installing all MariaDB/MySQL or PostgreSQL files, you should run the following command: cargo install --no-default-features --features sqlite diesel_cli Then, since we will be using SQLite for our short test, add a file named .env to the current directory, with the following content: DATABASE_URL=test.sqlite We can now run diesel setup and diesel migration generate initial_schema. This will create the test.sqlite SQLite database and a migrations folder, with the first empty initial schema migration. Let's add this to the initial schema up.sql file: CREATE TABLE 'users' ( 'username' TEXT NOT NULL PRIMARY KEY, 'password' TEXT NOT NULL, 'email' TEXT UNIQUE ); In its counterpart down.sql file, we will need to drop the created table: DROP TABLE `users`; Then, we can execute diesel migration run and check that everything went smoothly. We can execute diesel migration redo to check that the rollback and recreation worked properly. We can now start using the ORM. We will need to add diesel, diesel_infer_schema, and dotenv to our Cargo.toml. The dotenv crate will read the .env file to generate the environment variables. If you want to avoid using all the MariaDB/MySQL or PostgreSQL features, you will need to configure diesel for it: [dependencies] dotenv = "0.10.1" [dependencies.diesel] version = "1.1.1" default-features = false features = ["sqlite"] [dependencies.diesel_infer_schema] version = "1.1.0" default-features = false features = ["sqlite"] Let's now create a structure that we will be able to use to retrieve data from the database. We will also need some boilerplate code to make everything work: #[macro_use] extern crate diesel; #[macro_use] extern crate diesel_infer_schema; extern crate dotenv; use diesel::prelude::*; use diesel::sqlite::SqliteConnection; use dotenv::dotenv; use std::env; #[derive(Debug, Queryable)] struct User { username: String, password: String, email: Option<String>, } fn establish_connection() -> SqliteConnection { dotenv().ok(); let database_url = env::var("DATABASE_URL") .expect("DATABASE_URL must be set"); SqliteConnection::establish(&database_url) .expect(&format!("error connecting to {}", database_url)) } mod schema { infer_schema!("dotenv:DATABASE_URL"); } Here, the establish_connection() function will call dotenv() so that the variables in the .env file get to the environment, and then it uses that DATABASE_URL variable to establish the connection with the SQLite database and returns the handle. The schema module will contain the schema of the database. The infer_schema!() macro will get the DATABASE_URL variable and connect to the database at compile time to generate the schema. Make sure you run all the migrations before compiling. We can now develop a small main() function with the basics to list all of the users from the database: fn main() { use schema::users::dsl::*; let connection = establish_connection(); let all_users = users .load::<User>(&connection) .expect("error loading users"); println!("{:?}", all_users); } This will just load all of the users from the database into a list. Notice the use statement at the beginning of the function. This retrieves the required information from the schema for the users table so that we can then call users.load(). As you can see in the guides at diesel.rs, you can also generate Insertable objects, which might not have some of the fields with default values, and you can perform complex queries by filtering the results in the same way you would write a SELECT statement. Creating a complete web server There are multiple web frameworks for Rust. Some of them work in stable Rust, such as Iron and Nickel Frameworks, and some don't, such as Rocket. We will talk about the latter since, even if it forces you to use the latest nightly branch, it's so much more powerful than the rest that it really makes no sense to use any of the others if you have the option to use Rust nightly. Using Diesel with Rocket, apart from the funny wordplay joke, works seamlessly. You will probably be using the two of them together, but in this section, we will learn how to create a small Rocket server without any further complexity. There are some boilerplate code implementations that add a database, cache, OAuth, templating, response compression, JavaScript minification, and SASS minification to the website, such as my Rust web template in GitHub if you need to start developing a real-life Rust web application. Rocket trades that nightly instability, which will break your code more often than not, for simplicity and performance. Developing a Rocket application is really easy and the performance of the results is astonishing. It's even faster than using some other, seemingly simpler frameworks, and of course, it's much faster than most of the frameworks in other languages. So, how does it feel to develop a Rocket application? We start by adding the latest rocket and rocket_codegen crates to our Cargo.toml file and adding a nightly override to our current directory by running rustup override set nightly. The rocket crate contains all the code to run the server, while the rocket_codegen crate is actually a compiler plugin that modifies the language to adapt it for web development. We can now write the default Hello, world! Rocket example: #![feature(plugin)] #![plugin(rocket_codegen)] extern crate rocket; #[get("/")] fn index() -> &'static str { "Hello, world!" } fn main() { rocket::ignite().mount("/", routes![index]).launch(); } In this example, we can see how we ask Rust to let us use plugins to then import the rocket_codegen plugin. This will enable us to use attributes such as #[get] or #[post] with request information that will generate boilerplate code when compiled, leaving our code fairly simple for our development. Also, note that this code has been checked with Rocket 0.3 and it might fail in a future version, since the library is not stable yet. In this case, you can see that the index() function will respond to any GET request with a base URL. This can be modified to accept only certain URLs or to get the path of something from the URL. You can also have overlapping routes with different priorities so that if one is not taken for a request guard, the next will be tried. And, talking about request guards, you can create objects that can be generated when processing a request that will only let the request process a given function if they are properly built. This means that you can, for example, create a User object that will get generated by checking the cookies in the request and comparing them in a Redis database, only allowing the execution of the function for logged-in users. This easily prevents many logic flaws. The main() function ignites the Rocket and mounts the index route at /. This means that you can have multiple routes with the same path mounted at different route paths and they do not need to know about the whole path in the URL. In the end, it will launch the Rocket server and if you run it with cargo run, it will show the following: If you go to the URL, you will see the Hello, World! message. Rocket is highly configurable. It has a rocket_contrib crate which offers templates and further features, and you can create responders to add GZip compression to responses. You can also create your own error responders when an error occurs. You can also configure the behavior of Rocket by using the Rocket.toml file and environment variables. As you can see in this last output, it is running in development mode, which adds some debugging information. You can configure different behaviors for staging and production modes and make them perform faster. Also, make sure that you compile the code in --release mode in production. If you want to develop a web application in Rocket, make sure you check https://rocket.rs/ for further information. Future releases also look promising. Rocket will implement native CSRF and XSS prevention, which, in theory, should prevent all XSS and CSRF attacks at compile time. It will also make further customizations to the engine possible. If you found this article useful and would like to learn more such tips, head over to pick up the book, Rust High Performance, authored by Iban Eguia Moraza. Mozilla is building a bridge between Rust and JavaScript Perform Advanced Programming with Rust Say hello to Sequoia: a new Rust based OpenPGP library to secure your apps

0
0
20788

How-To Tutorials

article-image-task-parallel-library-multi-threading-net-core

Aaron Lazar

16 Aug 2018

11 min read

Task parallel library for easy multi-threading in .NET Core [Tutorial]

Aaron Lazar

16 Aug 2018

11 min read

Compared to the classic threading model in .NET, Task Parallel Library minimizes the complexity of using threads and provides an abstraction through a set of APIs that help developers focus more on the application program instead of focusing on how the threads will be provisioned. In this article, we'll learn how TPL benefits of using traditional threading techniques for concurrency and high performance. There are several benefits of using TPL over threads: It autoscales the concurrency to a multicore level It autoscales LINQ queries to a multicore level It handles the partitioning of the work and uses ThreadPool where required It is easy to use and reduces the complexity of working with threads directly This tutorial is an extract from the book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Creating a task using TPL TPL APIs are available in the System.Threading and System.Threading.Tasks namespaces. They work around the task, which is a program or a block of code that runs asynchronously. An asynchronous task can be run by calling either the Task.Run or TaskFactory.StartNew methods. When we create a task, we provide a named delegate, anonymous method, or a lambda expression that the task executes. Here is a code snippet that uses a lambda expression to execute the ExecuteLongRunningTasksmethod using Task.Run: class Program { static void Main(string[] args) { Task t = Task.Run(()=>ExecuteLongRunningTask(5000)); t.Wait(); } public static void ExecuteLongRunningTask(int millis) { Thread.Sleep(millis); Console.WriteLine("Hello World"); } } In the preceding code snippet, we have executed the ExecuteLongRunningTask method asynchronously using the Task.Run method. The Task.Run method returns the Task object that can be used to further wait for the asynchronous piece of code to be executed completely before the program ends. To wait for the task, we have used the Wait method. Alternatively, we can also use the Task.Factory.StartNew method, which is more advanced and provides more options. While calling the Task.Factory.StartNew method, we can specify CancellationToken, TaskCreationOptions, and TaskScheduler to set the state, specify other options, and schedule tasks. TPL uses multiple cores of the CPU out of the box. When the task is executed using the TPL API, it automatically splits the task into one or more threads and utilizes multiple processors, if they are available. The decision as to how many threads will be created is calculated at runtime by CLR. Whereas a thread only has an affinity to a single processor, running any task on multiple processors needs a proper manual implementation. Task-based asynchronous pattern (TAP) When developing any software, it is always good to implement the best practices while designing its architecture. The task-based asynchronous pattern is one of the recommended patterns that can be used when working with TPL. There are, however, a few things to bear in mind while implementing TAP. Naming convention The method executing asynchronously should have the naming suffix Async. For example, if the method name starts with ExecuteLongRunningOperation, it should have the suffix Async, with the resulting name of ExecuteLongRunningOperationAsync. Return type The method signature should return either a System.Threading.Tasks.Task or System.Threading.Tasks.Task<TResult>. The task's return type is equivalent to the method that returns void, whereas TResult is the data type. Parameters The out and ref parameters are not allowed as parameters in the method signature. If multiple values need to be returned, tuples or a custom data structure can be used. The method should always return Task or Task<TResult>, as discussed previously. Here are a few signatures for both synchronous and asynchronous methods: Synchronous methodAsynchronous methodVoid Execute();Task ExecuteAsync();List<string> GetCountries();Task<List<string>> GetCountriesAsync();Tuple<int, string> GetState(int stateID);Task<Tuple<int, string>> GetStateAsync(int stateID);Person GetPerson(int personID);Task<Person> GetPersonAsync(int personID); Exceptions The asynchronous method should always throw exceptions that are assigned to the returning task. However, the usage errors, such as passing null parameters to the asynchronous method, should be properly handled. Let's suppose we want to generate several documents dynamically based on a predefined templates list, where each template populates the placeholders with dynamic values and writes it on the filesystem. We assume that this operation will take a sufficient amount of time to generate a document for each template. Here is a code snippet showing how the exceptions can be handled: static void Main(string[] args) { List<Template> templates = GetTemplates(); IEnumerable<Task> asyncDocs = from template in templates select GenerateDocumentAsync(template); try { Task.WaitAll(asyncDocs.ToArray()); }catch(Exception ex) { Console.WriteLine(ex); } Console.Read(); } private static async Task<int> GenerateDocumentAsync(Template template) { //To automate long running operation Thread.Sleep(3000); //Throwing exception intentionally throw new Exception(); } In the preceding code, we have a GenerateDocumentAsync method that performs a long running operation, such as reading the template from the database, populating placeholders, and writing a document to the filesystem. To automate this process, we used Thread.Sleep to sleep the thread for three seconds and then throw an exception that will be propagated to the calling method. The Main method loops the templates list and calls the GenerateDocumentAsync method for each template. Each GenerateDocumentAsync method returns a task. When calling an asynchronous method, the exception is actually hidden until the Wait, WaitAll, WhenAll, and other methods are called. In the preceding example, the exception will be thrown once the Task.WaitAll method is called, and will log the exception on the console. Task status The task object provides a TaskStatus that is used to know whether the task is executing the method running, has completed the method, has encountered a fault, or whether some other occurrence has taken place. The task initialized using Task.Run initially has the status of Created, but when the Start method is called, its status is changed to Running. When applying the TAP pattern, all the methods return the Task object, and whether they are using the Task.Run inside, the method body should be activated. That means that the status should be anything other than Created. The TAP pattern ensures the consumer that the task is activated and the starting task is not required. Task cancellation Cancellation is an optional thing for TAP-based asynchronous methods. If the method accepts the CancellationToken as the parameter, it can be used by the caller party to cancel a task. However, for a TAP, the cancellation should be properly handled. Here is a basic example showing how cancellation can be implemented: static void Main(string[] args) { CancellationTokenSource tokenSource = new CancellationTokenSource(); CancellationToken token = tokenSource.Token; Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token)); } static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken) { if (cancellationToken.IsCancellationRequested) { Console.WriteLine("Cancellation is requested..."); cancellationToken.ThrowIfCancellationRequested } //Do some file save operation File.WriteAllBytes(path, fileBytes); return Task.FromResult<int>(0); } In the preceding code, we have a SaveFileAsync method that takes the byte array and the CancellationToken as parameters. In the Main method, we initialize the CancellationTokenSource that can be used to cancel the asynchronous operation later in the program. To test the cancellation scenario, we will just call the Cancel method of the tokenSource after the Task.Factory.StartNew method and the operation will be canceled. Moreover, when the task is canceled, its status is set to Cancelled and the IsCompleted property is set to true. Task progress reporting With TPL, we can use the IProgress<T> interface to get real-time progress notifications from the asynchronous operations. This can be used in scenarios where we need to update the user interface or the console app of asynchronous operations. When defining the TAP-based asynchronous methods, defining IProgress<T> in a parameter is optional. We can have overloaded methods that can help consumers to use in the case of specific needs. However, they should only be used if the asynchronous method supports them. Here is the modified version of SaveFileAsync that updates the user about the real progress: static void Main(string[] args) { var progressHandler = new Progress<string>(value => { Console.WriteLine(value); }); var progress = progressHandler as IProgress<string>; CancellationTokenSource tokenSource = new CancellationTokenSource(); CancellationToken token = tokenSource.Token; Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token, progress)); Console.Read(); } static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken, IProgress<string> progress) { if (cancellationToken.IsCancellationRequested) { progress.Report("Cancellation is called"); Console.WriteLine("Cancellation is requested..."); } progress.Report("Saving File"); File.WriteAllBytes(path, fileBytes); progress.Report("File Saved"); return Task.FromResult<int>(0); } Implementing TAP using compilers Any method that is attributed with the async keyword (for C#) or Async for (Visual Basic) is called an asynchronous method. The async keyword can be applied to a method, anonymous method, or a Lambda expression, and the language compiler can execute that task asynchronously. Here is a simple implementation of the TAP method using the compiler approach: static void Main(string[] args) { var t = ExecuteLongRunningOperationAsync(100000); Console.WriteLine("Called ExecuteLongRunningOperationAsync method, now waiting for it to complete"); t.Wait(); Console.Read(); } public static async Task<int> ExecuteLongRunningOperationAsync(int millis) { Task t = Task.Factory.StartNew(() => RunLoopAsync(millis)); await t; Console.WriteLine("Executed RunLoopAsync method"); return 0; } public static void RunLoopAsync(int millis) { Console.WriteLine("Inside RunLoopAsync method"); for(int i=0;i< millis; i++) { Debug.WriteLine($"Counter = {i}"); } Console.WriteLine("Exiting RunLoopAsync method"); } In the preceding code, we have the ExecuteLongRunningOperationAsync method, which is implemented as per the compiler approach. It calls the RunLoopAsync that executes a loop for a certain number of milliseconds that is passed in the parameter. The async keyword on the ExecuteLongRunningOperationAsync method actually tells the compiler that this method has to be executed asynchronously, and, once the await statement is reached, the method returns to the Main method that writes the line on a console and waits for the task to be completed. Once the RunLoopAsync is executed, the control comes back to await and starts executing the next statements in the ExecuteLongRunningOperationAsync method. Implementing TAP with greater control over Task As we know, that the TPL is centered on the Task and Task<TResult> objects. We can execute an asynchronous task by calling the Task.Run method and execute a delegate method or a block of code asynchronously and use Wait or other methods on that task. However, this approach is not always adequate, and there are scenarios where we may have different approaches to executing asynchronous operations, and we may use an Event-based Asynchronous Pattern (EAP) or an Asynchronous Programming Model (APM). To implement TAP principles here, and to get the same control over asynchronous operations executing with different models, we can use the TaskCompletionSource<TResult> object. The TaskCompletionSource<TResult> object is used to create a task that executes an asynchronous operation. When the asynchronous operation completes, we can use the TaskCompletionSource<TResult> object to set the result, exception, or state of the task. Here is a basic example that executes the ExecuteTask method that returns Task, where the ExecuteTask method uses the TaskCompletionSource<TResult> object to wrap the response as a Task and executes the ExecuteLongRunningTask through the Task.StartNew method: static void Main(string[] args) { var t = ExecuteTask(); t.Wait(); Console.Read(); } public static Task<int> ExecuteTask() { var tcs = new TaskCompletionSource<int>(); Task<int> t1 = tcs.Task; Task.Factory.StartNew(() => { try { ExecuteLongRunningTask(10000); tcs.SetResult(1); }catch(Exception ex) { tcs.SetException(ex); } }); return tcs.Task; } public static void ExecuteLongRunningTask(int millis) { Thread.Sleep(millis); Console.WriteLine("Executed"); } So now, we've been able to use TPL and TAP over traditional threads, thus improving performance. If you liked this article and would like to learn more such techniques, pick up this book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Get to know ASP.NET Core Web API [Tutorial] .NET Core completes move to the new compiler – RyuJIT Applying Single Responsibility principle from SOLID in .NET Core

0
0
33211

How-To Tutorials

article-image-f-for-net-core-application-development-tutorial

Aaron Lazar

16 Aug 2018

17 min read

Getting started with F# for .Net Core application development [Tutorial]

Aaron Lazar

16 Aug 2018

17 min read

F# is Microsoft's purely functional programming language, that can be used along with the .NET Core framework. In this article, we will get introduced to F# to leverage .NET Core for our application development. This article is extracted from the book, .NET Core 2.0 By Example, written by Rishabh Verma and Neha Shrivastava. Basics of classes Classes are types of object which can contain functions, properties, and events. An F# class must have a parameter and a function attached like a member. Both properties and functions can use the member keyword. The following is the class definition syntax: type [access-modifier] type-name [type-params] [access-modifier] (parameter-list) [ as identifier ] = [ class ] [ inherit base-type-name(base-constructor-args) ] [ let-bindings ] [ do-bindings ] member-list [ end ] // Mutually recursive class definitions: type [access-modifier] type-name1 ... and [access-modifier] type-name2 ... Let’s discuss the preceding syntax for class declaration: type: In the F# language, class definition starts with a type keyword. access-modifier: The F# language supports three access modifiers—public, private, and internal. By default, it considers the public modifier if no other access modifier is provided. The Protected keyword is not used in the F# language, and the reason is that it will become object oriented rather than functional programming. For example, F# usually calls a member using a lambda expression and if we make a member type protected and call an object of a different instance, it will not work. type-name: It is any of the previously mentioned valid identifiers; the default access modifier is public. type-params: It defines optional generic type parameters. parameter-list: It defines constructor parameters; the default access modifier for the primary constructor is public. identifier: It is used with the optional as keyword, the as keyword gives a name to an instance variable which can be used in the type definition to refer to the instance of the type. Inherit: This keyword allows us to specify the base class for a class. let-bindings: This is used to declare fields or function values in the context of a class. do-bindings: This is useful for the execution of code to create an object member-list: The member-list comprises extra constructors, instance and static method declarations, abstract bindings, interface declarations, and event and property declarations. Here is an example of a class: type StudentName(firstName,lastName) = member this.FirstName = firstName member this.LastName = lastName In the previous example, we have not defined the parameter type. By default, the program considers it as a string value but we can explicitly define a data type, as follows: type StudentName(firstName:string,lastName:string) = member this.FirstName = firstName member this.LastName = lastName Constructor of a class In F#, the constructor works in a different way to any other .NET language. The constructor creates an instance of a class. A parameter list defines the arguments of the primary constructor and class. The constructor contains let and do bindings, which we will discuss next. We can add multiple constructors, apart from the primary constructor, using the new keyword and it must invoke the primary constructor, which is defined with the class declaration. The syntax of defining a new constructor is as shown: new (argument-list) = constructor-body Here is an example to explain the concept. In the following code, the StudentDetail class has two constructors: a primary constructor that takes two arguments and another constructor that takes no arguments: type StudentDetail(x: int, y: int) = do printfn "%d %d" x y new() = StudentDetail(0, 0) A let and do binding A let and do binding creates the primary constructor of a class and runs when an instance of a class is created. A function is compiled into a member if it has a let binding. If the let binding is a value which is not used in any function or member, then it is compiled into a local variable of a constructor; otherwise, it is compiled into a field of the class. The do expression executes the initialized code. As any extra constructors always call the primary constructor, let and do bindings always execute, irrespective of which constructor is called. Fields that are created by let bindings can be accessed through the methods and properties of the class, though they cannot be accessed from static methods, even if the static methods take an instance variable as a parameter: type Student(name) as self = let data = name do self.PrintMessage() member this.PrintMessage() = printf " Student name is %s" data Generic type parameters F# also supports a generic parameter type. We can specify multiple generic type parameters separated by a comma. The syntax of a generic parameter declaration is as follows: type MyGenericClassExample<'a> (x: 'a) = do printfn "%A" x The type of the parameter infers where it is used. In the following code, we call the MyGenericClassExample method and pass a sequence of tuples, so here the parameter type became a sequence of tuples: let g1 = MyGenericClassExample( seq { for i in 1 .. 10 -> (i, i*i) } ) Properties Values related to an object are represented by properties. In object-oriented programming, properties represent data associated with an instance of an object. The following snippet shows two types of property syntax: // Property that has both get and set defined. [ attributes ] [ static ] member [accessibility-modifier] [self- identifier.]PropertyName with [accessibility-modifier] get() = get-function-body and [accessibility-modifier] set parameter = set-function-body // Alternative syntax for a property that has get and set. [ attributes-for-get ] [ static ] member [accessibility-modifier-for-get] [self-identifier.]PropertyName = get-function-body [ attributes-for-set ] [ static ] member [accessibility-modifier-for-set] [self- identifier.]PropertyName with set parameter = set-function-body There are two kinds of property declaration: Explicitly specify the value: We should use the explicit way to implement the property if it has non-trivial implementation. We should use a member keyword for the explicit property declaration. Automatically generate the value: We should use this when the property is just a simple wrapper for a value. There are many ways of implementing an explicit property syntax based on need: Read-only: Only the get() method Write-only: Only the set() method Read/write: Both get() and set() methods An example is shown as follows: // A read-only property. member this.MyReadOnlyProperty = myInternalValue // A write-only property. member this.MyWriteOnlyProperty with set (value) = myInternalValue <- value // A read-write property. member this.MyReadWriteProperty with get () = myInternalValue and set (value) = myInternalValue <- value Backing stores are private values that contain data for properties. The keyword, member val instructs the compiler to create backing stores automatically and then gives an expression to initialize the property. The F# language supports immutable types, but if we want to make a property mutable, we should use get and set. As shown in the following example, the MyClassExample class has two properties: propExample1 is read-only and is initialized to the argument provided to the primary constructor, and propExample2 is a settable property initialized with a string value ".Net Core 2.0": type MyClassExample(propExample1 : int) = member val propExample1 = property1 member val propExample2 = ".Net Core 2.0" with get, set Automatically implemented properties don't work efficiently with some libraries, for example, Entity Framework. In these cases, we should use explicit properties. Static and instance properties There can be further categorization of properties as static or instance properties. Static, as the name suggests, can be invoked without any instance. The self-identifier is neglected by the static property while it is necessary for the instance property. The following is an example of the static property: static member MyStaticProperty with get() = myStaticValue and set(value) = myStaticValue <- value Abstract properties Abstract properties have no implementation and are fully abstract. They can be virtual. It should not be private and if one accessor is abstract all others must be abstract. The following is an example of the abstract property and how to use it: // Abstract property in abstract class. // The property is an int type that has a get and // set method [<AbstractClass>] type AbstractBase() = abstract Property1 : int with get, set // Implementation of the abstract property type Derived1() = inherit AbstractBase() let mutable value = 10 override this.Property1 with get() = value and set(v : int) = value <- v // A type with a "virtual" property. type Base1() = let mutable value = 10 abstract Property1 : int with get, set default this.Property1 with get() = value and set(v : int) = value <- v // A derived type that overrides the virtual property type Derived2() = inherit Base1() let mutable value2 = 11 override this.Property1 with get() = value2 and set(v) = value2 <- v Inheritance and casts In F#, the inherit keyword is used while declaring a class. The following is the syntax: type MyDerived(...) = inherit MyBase(...) In a derived class, we can access all methods and members of the base class, but it should not be a private member. To refer to base class instances in the F# language, the base keyword is used. Virtual methods and overrides In F#, the abstract keyword is used to declare a virtual member. So, here we can write a complete definition of the member as we use abstract for virtual. F# is not similar to other .NET languages. Let's have a look at the following example: type MyClassExampleBase() = let mutable x = 0 abstract member virtualMethodExample : int -> int default u. virtualMethodExample (a : int) = x <- x + a; x type MyClassExampleDerived() = inherit MyClassExampleBase () override u. virtualMethodExample (a: int) = a + 1 In the previous example, we declared a virtual method, virtualMethodExample, in a base class, MyClassExampleBase, and overrode it in a derived class, MyClassExampleDerived. Constructors and inheritance An inherited class constructor must be called in a derived class. If a base class constructor contains some arguments, then it takes parameters of the derived class as input. In the following example, we will see how derived class arguments are passed in the base class constructor with inheritance: type MyClassBase2(x: int) = let mutable z = x * x do for i in 1..z do printf "%d " i type MyClassDerived2(y: int) = inherit MyClassBase2(y * 2) do for i in 1..y do printf "%d " i If a class has multiple constructors, such as new(str) or new(), and this class is inherited in a derived class, we can use a base class constructor to assign values. For example, DerivedClass, which inherits BaseClass, has new(str1,str2), and in place of the first string, we pass inherit BaseClass(str1). Similarly for blank, we wrote inherit BaseClass(). Let's explore the following example for more detail: type BaseClass = val string1 : string new (str) = { string1 = str } new () = { string1 = "" } type DerivedClass = inherit BaseClass val string2 : string new (str1, str2) = { inherit BaseClass(str1); string2 = str2 } new (str2) = { inherit BaseClass(); string2 = str2 } let obj1 = DerivedClass("A", "B") let obj2 = DerivedClass("A") Functions and lambda expressions A lambda expression is one kind of anonymous function, which means it doesn't have a name attached to it. But if we want to create a function which can be called, we can use the fun keyword with a lambda expression. We can pass the input parameter in the lambda function, which is created using the fun keyword. This function is quite similar to a normal F# function. Let's see a normal F# function and a lambda function: // Normal F# function let addNumbers a b = a+b // Evaluating values let sumResult = addNumbers 5 6 // Lambda function and evaluating values let sumResult = (fun (a:int) (b:int) -> a+b) 5 6 // Both the function will return value sumResult = 11 Handling data – tuples, lists, record types, and data manipulation F# supports many data types, for example: Primitive types: bool, int, float, string values. Aggregate type: class, struct, union, record, and enum Array: int[], int[ , ], and float[ , , ] Tuple: type1 * type2 * like (a,1,2,true) type is—char * int * int * bool Generic: list<’x>, dictionary < ’key, ’value> In an F# function, we can pass one tuple instead of multiple parameters of different types. Declaration of a tuple is very simple and we can assign values of a tuple to different variables, for example: let tuple1 = 1,2,3 // assigning values to variables , v1=1, v2= 2, v3=3 let v1,v2,v3 = tuple1 // if we want to assign only two values out of three, use “_” to skip the value. Assigned values: v1=1, //v3=3 let v1,_,v3 = tuple In the preceding examples, we saw that tuple supports pattern matching. These are option types and an option type in F# supports the idea that the value may or not be present at runtime. List List is a generic type implementation. An F# list is similar to a linked list implementation in any other functional language. It has a special opening and closing bracket construct, a short form of the standard empty list ([ ]) syntax: let empty = [] // This is an empty list of untyped type or we can say //generic type. Here type is: 'a list let intList = [10;20;30;40] // this is an integer type list The cons operator is used to prepend an item to a list using a double colon cons(prepend,::). To append another list to one list, we use the append operator—@: // prepend item x into a list let addItem xs x = x :: xs let newIntList = addItem intList 50 // add item 50 in above list //“intlist”, final result would be- [50;10;20;30;40] // using @ to append two list printfn "%A" (["hi"; "team"] @ ["how";"are";"you"]) // result – ["hi"; "team"; "how";"are";"you"] Lists are decomposable using pattern matching into a head and a tail part, where the head is the first item in the list and the tail part is the remaining list, for example: printfn "%A" newIntList.Head printfn "%A" newIntList.Tail printfn "%A" newIntList.Tail.Tail.Head let rec listLength (l: 'a list) = if l.IsEmpty then 0 else 1 + (listLength l.Tail) printfn "%d" (listLength newIntList) Record type The class, struct, union, record, and enum types come under aggregate types. The record type is one of them, it can have n number of members of any individual type. Record type members are by default immutable but we can make them mutable. In general, a record type uses the members as an immutable data type. There is no way to execute logic during instantiation as a record type don't have constructors. A record type also supports match expression, depending on the values inside those records, and they can also again decompose those values for individual handling, for example: type Box = {width: float ; height:int } let giftbox = {width = 6.2 ; height = 3 } In the previous example, we declared a Box with float a value width and an integer height. When we declare giftbox, the compiler automatically detects its type as Box by matching the value types. We can also specify type like this: let giftbox = {Box.width = 6.2 ; Box.height = 3 } or let giftbox : Box = {width = 6.2 ; height = 3 } This kind of type declaration is used when we have the same type of fields or field type declared in more than one type. This declaration is called a record expression. Object-oriented programming in F# F# also supports implementation inheritance, the creation of object, and interface instances. In F#, constructed types are fully compatible .NET classes which support one or more constructors. We can implement a do block with code logic, which can run at the time of class instance creation. The constructed type supports inheritance for class hierarchy creation. We use the inherit keyword to inherit a class. If the member doesn't have implementation, we can use the abstract keyword for declaration. We need to use the abstractClass attribute on the class to inform the compiler that it is abstract. If the abstractClass attribute is not used and type has all abstract members, the F# compiler automatically creates an interface type. Interface is automatically inferred by the compiler as shown in the following screenshot: The override keyword is used to override the base class implementation; to use the base class implementation of the same member, we use the base keyword. In F#, interfaces can be inherited from another interface. In a class, if we use the construct interface, we have to implement all the members in the interface in that class, as well. In general, it is not possible to use interface members from outside the class instance, unless we upcast the instance type to the required interface type. To create an instance of a class or interface, the object expression syntax is used. We need to override virtual members if we are creating a class instance and need member implementation for interface instantiation: type IExampleInterface = abstract member IntValue: int with get abstract member HelloString: unit -> string type PrintValues() = interface IExampleInterface with member x.IntValue = 15 member x.HelloString() = sprintf "Hello friends %d" (x :> IExampleInterface).IntValue let example = let varValue = PrintValues() :> IExampleInterface { new IExampleInterface with member x.IntValue = varValue.IntValue member x.HelloString() = sprintf "<b>%s</b>" (varValue.HelloString()) } printfn "%A" (example.HelloString()) Exception handling The exception keyword is used to create a custom exception in F#; these exceptions adhere to Microsoft best practices, such as constructors supplied, serialization support, and so on. The keyword raise is used to throw an exception. Apart from this, F# has some helper functions, such as failwith, which throws a failure exception at F# runtime, and invalidop, invalidarg, which throw the .NET Framework standard type invalid operation and invalid argument exception, respectively. try/with is used to catch an exception; if an exception occurred on an expression or while evaluating a value, then the try/with expression could be used on the right side of the value evaluation and to assign the value back to some other value. try/with also supports pattern matching to check an individual exception type and extract an item from it. try/finally expression handling depends on the actual code block. Let's take an example of declaring and using a custom exception: exception MyCustomExceptionExample of int * string raise (MyCustomExceptionExample(10, "Error!")) In the previous example, we created a custom exception called MyCustomExceptionExample, using the exception keyword, passing value fields which we want to pass. Then we used the raise keyword to raise exception passing values, which we want to display while running the application or throwing the exception. However, as shown here, while running this code, we don't get our custom message in the error value and the standard exception message is displayed: We can see in the previous screenshot that the exception message doesn't contain the message that we passed. In order to display our custom error message, we need to override the standard message property on the exception type. We will use pattern matching assignment to get two values and up-cast the actual type, due to the internal representation of the exception object. If we run this program again, we will get the custom message in the exception: exception MyCustomExceptionExample of int * string with override x.Message = let (MyCustomExceptionExample(i, s)) = upcast x sprintf "Int: %d Str: %s" i s raise (MyCustomExceptionExample(20, "MyCustomErrorMessage!")) Now, we will get the following error message: In the previous screenshot, we can see our custom message with integer and string values included in the output. We can also use the helper function, failwith, to raise a failure exception, as it includes our message as an error message, as follows: failwith "An error has occurred" The preceding error message can be seen in the following screenshot: Here is a detailed exception screenshot: An example of the invalidarg helper function follows. In this factorial function, we are checking that the value of x is greater than zero. For cases where x is less than 0, we call invalidarg, pass x as the parameter name that is invalid, and then some error message saying the value should be greater than 0. The invalidarg helper function throws an invalid argument exception from the standard system namespace in .NET: let rec factorial x = if x < 0 then invalidArg "x" "Value should be greater than zero" match x with | 0 -> 1 | _ -> x * (factorial (x - 1)) By now, you should be pretty familiar with the F# programming language, to use in your application development, alongside C#. If you found this tutorial helpful and you're interested in learning more, head over to this book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. .NET Core completes move to the new compiler – RyuJIT Applying Single Responsibility principle from SOLID in .NET Core Unit Testing in .NET Core with Visual Studio 2017 for better code quality

0
0
17764

How-To Tutorials

article-image-multithreading-in-rust-using-crates-tutorial

Aaron Lazar

15 Aug 2018

17 min read

Multithreading in Rust using Crates [Tutorial]

Aaron Lazar

15 Aug 2018

17 min read

The crates.io ecosystem in Rust can make use of approaches to improve our development speed as well as the performance of our code. In this tutorial, we'll learn how to use the crates ecosystem to manipulate threads in Rust. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Using non-blocking data structures One of the issues we saw earlier was that if we wanted to share something more complex than an integer or a Boolean between threads and if we wanted to mutate it, we needed to use a Mutex. This is not entirely true, since one crate, Crossbeam, allows us to use great data structures that do not require locking a Mutex. They are therefore much faster and more efficient. Often, when we want to share information between threads, it's usually a list of tasks that we want to work on cooperatively. Other times, we want to create information in multiple threads and add it to a list of information. It's therefore not so usual for multiple threads to be working with exactly the same variables since as we have seen, that requires synchronization and it will be slow. This is where Crossbeam shows all its potential. Crossbeam gives us some multithreaded queues and stacks, where we can insert data and consume data from different threads. We can, in fact, have some threads doing an initial processing of the data and others performing a second phase of the processing. Let's see how we can use these features. First, add crossbeam to the dependencies of the crate in the Cargo.toml file. Then, we start with a simple example: extern crate crossbeam; use std::thread; use std::sync::Arc; use crossbeam::sync::MsQueue; fn main() { let queue = Arc::new(MsQueue::new()); let handles: Vec<_> = (1..6) .map(|_| { let t_queue = queue.clone(); thread::spawn(move || { for _ in 0..1_000_000 { t_queue.push(10); } }) }) .collect(); for handle in handles { handle.join().unwrap(); } let final_queue = Arc::try_unwrap(queue).unwrap(); let mut sum = 0; while let Some(i) = final_queue.try_pop() { sum += i; } println!("Final sum: {}", sum); } Let's first understand what this example does. It will iterate 1,000,000 times in 5 different threads, and each time it will push a 10 to a queue. Queues are FIFO lists, first input, first output. This means that the first number entered will be the first one to pop() and the last one will be the last to do so. In this case, all of them are a 10, so it doesn't matter. Once the threads finish populating the queue, we iterate over it and we add all the numbers. A simple computation should make you able to guess that if everything goes perfectly, the final number should be 50,000,000. If you run it, that will be the result, and that's not all. If you run it by executing cargo run --release, it will run blazingly fast. On my computer, it took about one second to complete. If you want, try to implement this code with the standard library Mutex and vector, and you will see that the performance difference is amazing. As you can see, we still needed to use an Arc to control the multiple references to the queue. This is needed because the queue itself cannot be duplicated and shared, it has no reference count. Crossbeam not only gives us FIFO queues. We also have LIFO stacks. LIFO comes from last input, first output, and it means that the last element you inserted in the stack will be the first one to pop(). Let's see the difference with a couple of threads: extern crate crossbeam; use std::thread; use std::sync::Arc; use std::time::Duration; use crossbeam::sync::{MsQueue, TreiberStack}; fn main() { let queue = Arc::new(MsQueue::new()); let stack = Arc::new(TreiberStack::new()); let in_queue = queue.clone(); let in_stack = stack.clone(); let in_handle = thread::spawn(move || { for i in 0..5 { in_queue.push(i); in_stack.push(i); println!("Pushed :D"); thread::sleep(Duration::from_millis(50)); } }); let mut final_queue = Vec::new(); let mut final_stack = Vec::new(); let mut last_q_failed = 0; let mut last_s_failed = 0; loop { // Get the queue match queue.try_pop() { Some(i) => { final_queue.push(i); last_q_failed = 0; println!("Something in the queue! :)"); } None => { println!("Nothing in the queue :("); last_q_failed += 1; } } // Get the stack match stack.try_pop() { Some(i) => { final_stack.push(i); last_s_failed = 0; println!("Something in the stack! :)"); } None => { println!("Nothing in the stack :("); last_s_failed += 1; } } // Check if we finished if last_q_failed > 1 && last_s_failed > 1 { break; } else if last_q_failed > 0 || last_s_failed > 0 { thread::sleep(Duration::from_millis(100)); } } in_handle.join().unwrap(); println!("Queue: {:?}", final_queue); println!("Stack: {:?}", final_stack); } As you can see in the code, we have two shared variables: a queue and a stack. The secondary thread will push new values to each of them, in the same order, from 0 to 4. Then, the main thread will try to get them back. It will loop indefinitely and use the try_pop() method. The pop() method can be used, but it will block the thread if the queue or the stack is empty. This will happen in any case once all values get popped since no new values are being added, so the try_pop() method will help not to block the main thread and end gracefully. The way it checks whether all the values were popped is by counting how many times it failed to pop a new value. Every time it fails, it will wait for 100 milliseconds, while the push thread only waits for 50 milliseconds between pushes. This means that if it tries to pop new values two times and there are no new values, the pusher thread has already finished. It will add values as they are popped to two vectors and then print the result. In the meantime, it will print messages about pushing and popping new values. You will understand this better by seeing the output: Note that the output can be different in your case, since threads don't need to be executed in any particular order. In this example output, as you can see, it first tries to get something from the queue and the stack but there is nothing there, so it sleeps. The second thread then starts pushing things, two numbers actually. After this, the queue and the stack will be [0, 1]. Then, it pops the first item from each of them. From the queue, it will pop the 0 and from the stack it will pop the 1 (the last one), leaving the queue as [1] and the stack as [0]. It will go back to sleep and the secondary thread will insert a 2 in each variable, leaving the queue as [1, 2] and the stack as [0, 2]. Then, the main thread will pop two elements from each of them. From the queue, it will pop the 1 and the 2, while from the stack it will pop the 2 and then the 0, leaving both empty. The main thread then goes to sleep, and for the next two tries, the secondary thread will push one element and the main thread will pop it, twice. It might seem a little bit complex, but the idea is that these queues and stacks can be used efficiently between threads without requiring a Mutex, and they accept any Send type. This means that they are great for complex computations, and even for multi-staged complex computations. The Crossbeam crate also has some helpers to deal with epochs and even some variants of the mentioned types. For multithreading, Crossbeam also adds a great utility: scoped threads. Scoped threads In all our examples, we have used standard library threads. As we have discussed, these threads have their own stack, so if we want to use variables that we created in the main thread we will need to send them to the thread. This means that we will need to use things such as Arc to share non-mutable data. Not only that, having their own stack means that they will also consume more memory and eventually make the system slower if they use too much. Crossbeam gives us some special threads that allow sharing stacks between them. They are called scoped threads. Using them is pretty simple and the crate documentation explains them perfectly; you will just need to create a Scope by calling crossbeam::scope(). You will need to pass a closure that receives the Scope. You can then call spawn() in that scope the same way you would do it in std::thread, but with one difference, you can share immutable variables among threads if they were created inside the scope or moved to it. This means that for the queues or stacks we just talked about, or for atomic data, you can simply call their methods without requiring an Arc! This will improve the performance even further. Let's see how it works with a simple example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..1_000_u64).into_iter().collect(); let mut results = Vec::new(); crossbeam::scope(|scope| { for num in &all_nums { results.push(scope.spawn(move || num * num + num * 5 + 250)); } }); let final_result: u64 = results.into_iter().map(|res| res.join()).sum(); println!("Final result: {}", final_result); } Let's see what this code does. It will first just create a vector with all the numbers from 0 to 1000. Then, for each of them, in a crossbeam scope, it will run one scoped thread per number and perform a supposedly complex computation. This is just an example, since it will just return a result of a simple second-order function. Interestingly enough, though, the scope.spawn() method allows returning a result of any type, which is great in our case. The code will add each result to a vector. This won't directly add the resulting number, since it will be executed in parallel. It will add a result guard, which we will be able to check outside the scope. Then, after all the threads run and return the results, the scope will end. We can now check all the results, which are guaranteed to be ready for us. For each of them, we just need to call join() and we will get the result. Then, we sum it up to check that they are actual results from the computation. This join() method can also be called inside the scope and get the results, but it will mean that if you do it inside the for loop, for example, you will block the loop until the result is generated, which is not efficient. The best thing is to at least run all the computations first and then start checking the results. If you want to perform more computations after them, you might find it useful to run the new computation in another loop or iterator inside the crossbeam scope. But, how does crossbeam allow you to use the variables outside the scope freely? Won't there be data races? Here is where the magic happens. The scope will join all the inner threads before exiting, which means that no further code will be executed in the main thread until all the scoped threads finish. This means that we can use the variables of the main thread, also called parent stack, due to the main thread being the parent of the scope in this case without any issue. We can actually check what is happening by using the println!() macro. If we remember from previous examples, printing to the console after spawning some threads would usually run even before the spawned threads, due to the time it takes to set them up. In this case, since we have crossbeam preventing it, we won't see it. Let's check the example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..10).into_iter().collect(); crossbeam::scope(|scope| { for num in all_nums { scope.spawn(move || { println!("Next number is {}", num); }); } }); println!("Main thread continues :)"); } If you run this code, you will see something similar to the following output: As you can see, scoped threads will run without any particular order. In this case, it will first run the 1, then the 0, then the 2, and so on. Your output will probably be different. The interesting thing, though, is that the main thread won't continue executing until all the threads have finished. Therefore, reading and modifying variables in the main thread is perfectly safe. There are two main performance advantages with this approach; Arc will require a call to malloc() to allocate memory in the heap, which will take time if it's a big structure and the memory is a bit full. Interestingly enough, that data is already in our stack, so if possible, we should try to avoid duplicating it in the heap. Moreover, the Arc will have a reference counter, as we saw. And it will even be an atomic reference counter, which means that every time we clone the reference, we will need to atomically increment the count. This takes time, even more than incrementing simple integers. Most of the time, we might be waiting for some expensive computations to run, and it would be great if they just gave all the results when finished. We can still add some more chained computations, using scoped threads, that will only be executed after the first ones finish, so we should use scoped threads more often than normal threads, if possible. Using thread pool So far, we have seen multiple ways of creating new threads and sharing information between them. Nevertheless, the ideal number of threads we should spawn to do all the work should be around the number of virtual processors in the system. This means we should not spawn one thread for each chunk of work. Nevertheless, controlling what work each thread does can be complex, since you have to make sure that all threads have work to do at any given point in time. Here is where thread pooling comes in handy. The Threadpool crate will enable you to iterate over all your work and for each of your small chunks, you can call something similar to a thread::spawn(). The interesting thing is that each task will be assigned to an idle thread, and no new thread will be created for each task. The number of threads is configurable and you can get the number of CPUs with other crates. Not only that, if one of the threads panics, it will automatically add a new one to the pool. To see an example, first, let's add threadpool and num_cpus as dependencies in our Cargo.toml file. Then, let's see an example code: extern crate num_cpus; extern crate threadpool; use std::sync::atomic::{AtomicUsize, Ordering}; use std::sync::Arc; use threadpool::ThreadPool; fn main() { let pool = ThreadPool::with_name("my worker".to_owned(), num_cpus::get()); println!("Pool threads: {}", pool.max_count()); let result = Arc::new(AtomicUsize::new(0)); for i in 0..1_0000_000 { let t_result = result.clone(); pool.execute(move || { t_result.fetch_add(i, Ordering::Relaxed); }); } pool.join(); let final_res = Arc::try_unwrap(result).unwrap().into_inner(); println!("Final result: {}", final_res); } This code will create a thread pool of threads with the number of logical CPUs in your computer. Then, it will add a number from 0 to 1,000,000 to an atomic usize, just to test parallel processing. Each addition will be performed by one thread. Doing this with one thread per operation (1,000,000 threads) would be really inefficient. In this case, though, it will use the appropriate number of threads, and the execution will be really fast. There is another crate that gives thread pools an even more interesting parallel processing feature: Rayon. Using parallel iterators If you can see the big picture in these code examples, you'll have realized that most of the parallel work has a long loop, giving work to different threads. It happened with simple threads and it happens even more with scoped threads and thread pools. It's usually the case in real life, too. You might have a bunch of data to process, and you can probably separate that processing into chunks, iterate over them, and hand them over to various threads to do the work for you. The main issue with that approach is that if you need to use multiple stages to process a given piece of data, you might end up with lots of boilerplate code that can make it difficult to maintain. Not only that, you might find yourself not using parallel processing sometimes due to the hassle of having to write all that code. Luckily, Rayon has multiple data parallelism primitives around iterators that you can use to parallelize any iterative computation. You can almost forget about the Iterator trait and use Rayon's ParallelIterator alternative, which is as easy to use as the standard library trait! Rayon uses a parallel iteration technique called work stealing. For each iteration of the parallel iterator, the new value or values get added to a queue of pending work. Then, when a thread finishes its work, it checks whether there is any pending work to do and if there is, it starts processing it. This, in most languages, is a clear source of data races, but thanks to Rust, this is no longer an issue, and your algorithms can run extremely fast and in parallel. Let's look at how to use it for an example similar to those we have seen in this chapter. First, add rayon to your Cargo.toml file and then let's start with the code: extern crate rayon; use rayon::prelude::*; fn main() { let result = (0..1_000_000_u64) .into_par_iter() .map(|e| e * 2) .sum::<u64>(); println!("Result: {}", result); } As you can see, this works just as you would write it in a sequential iterator, yet, it's running in parallel. Of course, running this example sequentially will be faster than running it in parallel thanks to compiler optimizations, but when you need to process data from files, for example, or perform very complex mathematical computations, parallelizing the input can give great performance gains. Rayon implements these parallel iteration traits to all standard library iterators and ranges. Not only that, it can also work with standard library collections, such as HashMap and Vec. In most cases, if you are using the iter() or into_iter() methods from the standard library in your code, you can simply use par_iter() or into_par_iter() in those calls and your code should now be parallel and work perfectly. But, beware, sometimes parallelizing something doesn't automatically improve its performance. Take into account that if you need to update some shared information between the threads, they will need to synchronize somehow, and you will lose performance. Therefore, multithreading is only great if workloads are completely independent and you can execute one without any dependency on the rest. If you found this article useful and would like to learn more such tips, head over to pick up this book, Rust High Performance, authored by Iban Eguia Moraza. Rust 1.28 is here with global allocators, nonZero types and more Java Multithreading: How to synchronize threads to implement critical sections and avoid race conditions Multithreading with Qt

0
0
33775

How-To Tutorials

article-image-understanding-functional-reactive-programming-in-scala

Fatema Patrawala

15 Aug 2018

6 min read

Understanding functional reactive programming in Scala [Tutorial]

Fatema Patrawala

15 Aug 2018

6 min read

Like OOP (Object-Oriented Programming), Functional Programming is a kind of programming paradigm. It is a programming style in which we write programs in terms of pure functions and immutable data. It treats its programs as function evaluation. As we use pure functions and immutable data to write our applications, we will get lots of benefits for free. For instance, with immutable data, we do not need to worry about shared-mutable states, side effects, and thread-safety. It follows a Declarative programming style, which means programming is done in terms of expressions, not statements. For instance, in OOP or imperative programming paradigms, we use statements to write programs where FP uses everything as expressions. In this scala functional programming tutorial we will understand the principles and benefits of FP and why Functional reactive programming is a best fit for Reactive programming in Scala. This Scala tutorial is an extract taken from the book Scala Reactive Programming written by Rambabu Posa. Principles of functional programming FP has the following principles: Pure functions Immutable data No side effects Referential transparency (RT) Functions are first-class citizens Functions that include anonymous functions, higher order functions, combinators, partial functions, partially-applied functions, function currying, closures Tail recursion Functions composability A pure function is a function that always returns the same results for the same inputs irrespective of how many times and where you run this function. We will get lots of benefits with immutable data. For instance, no shared data, no side effects, thread safety for free, and so on. Like an object is a first-class citizen in OOP, in FP, a function is a first-class citizen. This means that we can use a function as any of these: An object A value A data A data type An operation In simple words, in FP, we treat both functions and data as the same. We can compose functions that are in sequential order so that we can solve even complex problems easily. Higher-Order Functions (HOF) are functions that take one or more functions as their parameters or return a function as their result or do both. For instance, map(), flatMap(), filter(), and so on are some of the important and frequently used higher-order functions. Consider the following example: map(x => x*x) Here, the map() function is an example of Higher-Order Function because it takes an anonymous function as its parameter. This anonymous function x => x *x is of type Int => Int, which takes an Int as input and returns Int as its result. An anonymous function is a function without any name. Benefits of functional programming FP provides us with many benefits: Thread-safe code Easy-to-write concurrency and parallel code We can write simple, readable, and elegant code Type safety Composability Supports Declarative programming As we use pure functions and immutability in FP, we will get thread-safety for free. One of the greatest benefits of FP is function composability. We can compose multiple functions one by one and execute them either sequentially or parentally. It gives us a great approach to solve complex problems easily. Functional Reactive programming The combination of FP and RP is known as function Reactive programming or, for short, FRP. It is a multiparadigm and combines the benefits and best features of two of the most popular programming paradigms, which are, FP and RP. FRP is a new programming paradigm or a new style of programming that uses the RP paradigm to support asynchronous non-blocking data streaming with backpressure and also uses the FP paradigm to utilize its features (such as pure functions, immutability, no side effects, RT, and more) and its HOF or combinators (such as map, flatMap, filter, reduce, fold, and zip). In simple words, FRP is a new programming paradigm to support RP using FP features and its building blocks. FRP = FP + RP, as shown here: Today, we have many FRP solutions, frameworks, tools, or technologies. Here's a list of a few FRP technologies: Scala, Play Framework, and Akka Toolkit RxJS Reactive-banana Reactive Sodium Haskell This book is dedicated toward discussing Lightbend's FRP technology stack—Lagom Framework, Scala, Play Framework, and Akka Toolkit (Akka Streams). FRP technologies are mainly useful in developing interactive programs, such as rich GUI (graphical user interfaces), animations, multiplayer games, computer music, or robot controllers. Types of Reactive Programming Even though most of the projects or companies use FP Paradigm to develop their Reactive systems or solutions, there are a couple of ways to use RP. They are known as types of RP: FRP (Functional Reactive Programming) OORP (Object-Oriented Reactive Programming) However, FP is the best programming paradigm to conflate with RP. We will get all the benefits of FP for free. Why FP is the best fit for RP When we conflate RP with FP, we will get the following benefits: Composability—we can compose multiple data streams using functional operations so that we can solve even complex problems easily Thread safety Readability Simple, concise, clear, and easy-to-understand code Easy-to-write asynchronous, concurrent, and parallel code Supports very flexible and easy-to-use operations Supports Declarative programming Easy to write, more Scalable, highly available, and robust code In FP, we concentrate on what to do to fulfill a job, whereas in other programming paradigms, such as OOP or imperative programming (IP), we concentrate on how to do. Declarative programming gives us the following benefits: No side effects Enforces to use immutability Easy to write concise and understandable code The main property of RP is real-time data streaming, and the main property of FP is composability. If we combine these two paradigms, we will get more benefits and can develop better solutions easily. In RP, everything is a stream, while everything is a function in FP. We can use these functions to perform operations on data streams. We learnt the principles and benefits of Scala functional programming. To build fault-tolerant, robust, and distributed applications in Scala, grab the book Scala Reactive Programming today. Introduction to the Functional Programming Manipulating functions in functional programming Why functional programming in Python matters: Interview with best selling author, Steven Lott

0
0
19708

How-To Tutorials

article-image-mongodb-sharding-clusters-choosing-right-shard-key

Fatema Patrawala

14 Aug 2018

9 min read

MongoDB Sharding: Sharding clusters and choosing the right shard key [Tutorial]

Fatema Patrawala

14 Aug 2018

9 min read

Sharding was one of the features that MongoDB offered from an early stage, since version 1.6 was released in August 2010. Sharding is the ability to horizontally scale out our database by partitioning our datasets across different servers—the shards. Foursquare and Bitly are two of the most famous early customers for MongoDB that were also using sharding from its inception all the way to the general availability release. In this article we will learn how to design a sharding cluster and how to make the single most important decision around it of choosing the unique shard key. This article is a MongoDB shard tutorial taken from the book Mastering MongoDB 3.x by Alex Giamas. Sharding setup in MongoDB Sharding is performed at the collection level. We can have collections that we don't want or need to shard for several reasons. We can leave these collections unsharded. These collections will be stored in the primary shard. The primary shard is different for each database in MongoDB. The primary shard is automatically selected by MongoDB when we create a new database in a sharded environment. MongoDB will pick the shard that has the least data stored at the moment of creation. If we want to change the primary shard at any other point, we can issue the following command: > db.runCommand( { movePrimary : "mongo_books", to : "UK_based" } ) We thus move the database named mongo_books to the shard named UK_based. Choosing the shard key Choosing our shard key is the most important decision we need to make. The reason is that once we shard our data and deploy our cluster, it becomes very difficult to change the shard key. First, we will go through the process of changing the shard key. Changing the shard key There is no command or simple procedure to change the shard key in MongoDB. The only way to change the shard key involves backing up and restoring all of our data, something that may range from being extremely difficult to impossible in high-load production environments. The steps if we want to change our shard key are as follows: Export all data from MongoDB. Drop the original sharded collection. Configure sharding with the new key. Presplit the new shard key range. Restore our data back into MongoDB. From these steps, step 4 is the one that needs some more explanation. MongoDB uses chunks to split data in a sharded collection. If we bootstrap a MongoDB sharded cluster from scratch, chunks will be calculated automatically by MongoDB. MongoDB will then distribute the chunks across different shards to ensure that there are an equal number of chunks in each shard. The only case in which we cannot really do this is when we want to load data into a newly sharded collection. The reasons are threefold: MongoDB creates splits only after an insert operation. Chunk migration will copy all of the data in that chunk from one shard to another. The floor(n/2) chunk migrations can happen at any given time, where n is the number of shards we have. Even with three shards, this is only a floor(1.5)=1 chunk migration at a time. These three limitations combined mean that letting MongoDB to figure it out on its own will definitely take much longer and may result in an eventual failure. This is why we want to presplit our data and give MongoDB some guidance on where our chunks should go. Considering our example of the mongo_books database and the books collection, this would be: > db.runCommand( { split : "mongo_books.books", middle : { id : 50 } } ) The middle command parameter will split our key space in documents that have id<=50 and documents that have id>50. There is no need for a document to exist in our collection with id=50 as this will only serve as the guidance value for our partitions. In this example, we chose 50 assuming that our keys follow a uniform distribution (that is, the same count of keys for each value) in the range of values from 0 to 100. We should aim to create at least 20-30 chunks to grant MongoDB flexibility in potential migrations. We can also use bounds and find instead of middle if we want to manually define the partition key, but both parameters need data to exist in our collection before applying them. Choosing the correct shard key After the previous section, it's now self-evident that we need to take into great consideration the choice of our shard key as it is something that we have to stick with. A great shard key has three characteristics: High cardinality Low frequency Non-monotonically changing in value We will go over the definitions of these three properties first to understand what they mean. High cardinality means that the shard key must have as many distinct values as possible. A Boolean can take only values of true/false, and so it is a bad shard key choice. A 64-bit long value field that can take any value from −(2^63) to 2^63 − 1 and is a good example in terms of cardinality. Low frequency directly relates to the argument about high cardinality. A low-frequency shard key will have a distribution of values as close to a perfectly random / uniform distribution. Using the example of our 64-bit long value, it is of little use to us if we have a field that can take values ranging from −(2^63) to 2^63 − 1 only to end up observing the values of 0 and 1 all the time. In fact, it is as bad as using a Boolean field, which can also take only two values after all. If we have a shard key with high frequency values, we will end up with chunks that are indivisible. These chunks cannot be further divided and will grow in size, negatively affecting the performance of the shard that contains them. Non-monotonically changing values mean that our shard key should not be, for example, an integer that always increases with every new insert. If we choose a monotonically increasing value as our shard key, this will result in all writes ending up in the last of all of our shards, limiting our write performance. If we want to use a monotonically changing value as the shard key, we should consider using hash-based sharding. In the next section, we will describe different sharding strategies and their advantages and disadvantages. Range-based sharding The default and the most widely used sharding strategy is range-based sharding. This strategy will split our collection's data into chunks, grouping documents with nearby values in the same shard. For our example database and collection, mongo_books and books respectively, we have: > sh.shardCollection("mongo_books.books", { id: 1 } ) This creates a range-based shard key on id with ascending direction. The direction of our shard key will determine which documents will end up in the first shard and which ones in the subsequent ones. This is a good strategy if we plan to have range-based queries as these will be directed to the shard that holds the result set instead of having to query all shards. Hash-based sharding If we don't have a shard key (or can't create one) that achieves the three goals mentioned previously, we can use the alternative strategy of using hash-based sharding. In this case, we are trading data distribution with query isolation. Hash-based sharding will take the values of our shard key and hash them in a way that guarantees close to uniform distribution. This way we can be sure that our data will evenly distribute across shards. The downside is that only exact match queries will get routed to the exact shard that holds the value. Any range query will have to go out and fetch data from all shards. For our example database and collection (mongo_books and books respectively), we have: > sh.shardCollection("mongo_books.books", { id: "hashed" } ) Similar to the preceding example, we are now using the id field as our hashed shard key. Suppose we use fields with float values for hash-based sharding. Then we will end up with collisions if the precision of our floats is more that 2^53. These fields should be avoided where possible. Coming up with our own key Range-based sharding does not need to be confined to a single key. In fact, in most cases, we would like to combine multiple keys to achieve high cardinality and low frequency. A common pattern is to combine a low-cardinality first part (but still having as distinct values more than two times the number of shards that we have) with a high-cardinality key as its second field. This achieves both read and write distribution from the first part of the sharding key and then cardinality and read locality from the second part. On the other hand, if we don't have range queries, we can get away by using hash-based sharding on a primary key as this will exactly target the shard and document that we are going after. To make things more complicated, these considerations may change depending on our workload. A workload that consists almost exclusively (say 99.5%) of reads won't care about write distribution. We can use the built-in _id field as our shard key and this will only add 0.5% load in the last shard. Our reads will still be distributed across shards. Unfortunately, in most cases, this is not simple. Location-based data Due to government regulations and the desire to have our data as close to our users as possible, there is often a constraint and need to limit data in a specific data center. By placing different shards at different data centers, we can satisfy this requirement. To summarize we learned about MongoDB sharding and got to know techniques to choose the correct shard key. Get the expert guide Mastering MongoDB 3.x today to build fault-tolerant MongoDB application. MongoDB 4.0 now generally available with support for multi-platform, mobile, ACID transactions and more MongoDB going relational with 4.0 release Indexing, Replicating, and Sharding in MongoDB [Tutorial]

0
0
21654

How-To Tutorials

article-image-cloud-native-architectures-microservices-containers-serverless-part-2

Guest Contributor

14 Aug 2018

8 min read

Modern Cloud Native architectures: Microservices, Containers, and Serverless - Part 2

Guest Contributor

14 Aug 2018

8 min read

0
0
15950

article-image-application-data-entity-framework-net-core

Aaron Lazar

14 Aug 2018

14 min read

Access application data with Entity Framework in .NET Core [Tutorial]

Aaron Lazar

14 Aug 2018

14 min read

In this tutorial, we will get started with using the Entity Framework and create a simple console application to perform CRUD operations. The intent is to get started with EF Core and understand how to use it. Before we dive into coding, let us see the two development approaches that EF Core supports: Code-first Database-first These two paradigms have been supported for a very long time and therefore we will just look at them at a very high level. EF Core mainly targets the code-first approach and has limited support for the database-first approach, as there is no support for the visual designer or wizard for the database model out of the box. However, there are third-party tools and extensions that support this. The list of third-party tools and extensions can be seen at https://docs.microsoft.com/en-us/ef/core/extensions/. This tutorial has been extracted from the book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. In the code-first approach, we first write the code; that is, we first create the domain model classes and then, using these classes, EF Core APIs create the database and tables, using migration based on the convention and configuration provided. We will look at conventions and configurations a little later in this section. The following diagram illustrates the code-first approach: In the database-first approach, as the name suggests, we have an existing database or we create a database first and then use EF Core APIs to create the domain and context classes. As mentioned, currently EF Core has limited support for it due to a lack of tooling. So, our preference will be for the code-first approach throughout our examples. The reader can discover the third-party tools mentioned previously to learn more about the EF Core database-first approach as well. The following image illustrates the database-first approach: Building Entity Framework Core Console App Now that we understand the approaches and know that we will be using the code-first approach, let's dive into coding our getting started with EF Core console app. Before we do so, we need to have SQL Express installed in our development machine. If SQL Express is not installed, download the SQL Express 2017 edition from https://www.microsoft.com/en-IN/sql-server/sql-server-downloads and run the setup wizard. We will do the Basic installation of SQL Express 2017 for our learning purposes, as shown in the following screenshot: Our objective is to learn how to use EF Core and so we will not do anything fancy in our console app. We will just do simple Create Read Update Delete (CRUD) operations of a simple class called Person, as defined here: public class Person { public int Id { get; set; } public string Name { get; set; } public bool Gender { get; set; } public DateTime DateOfBirth { get; set; } public int Age { get { var age = DateTime.Now.Year - this.DateOfBirth.Year; if (DateTime.Now.DayOfYear < this.DateOfBirth.DayOfYear) { age = age - 1; } return age; } } } As we can see in the preceding code, the class has simple properties. To perform the CRUD operations on this class, let's create a console app by performing the following steps: Create a new .NET Core console project named GettingStartedWithEFCore, as shown in the following screenshot: Create a new folder named Models in the project node and add the Person class to this newly created folder. This will be our model entity class, which we will use for CRUD operations. Next, we need to install the EF Core package. Before we do that, it's important to know that EF Core provides support for a variety of databases. A few of the important ones are: SQL Server SQLite InMemory (for testing) The complete and comprehensive list can be seen at https://docs.microsoft.com/en-us/ef/core/providers/. We will be working with SQL Server on Windows for our learning purposes, so let's install the SQL Server package for Entity Framework Core. To do so, let's install the Microsoft.EntityFrameworkCore.SqlServer package from the NuGet Package Manager in Visual Studio 2017. Right-click on the project. Select Manage Nuget Packages and then search for Microsoft.EntityFrameworkCore.SqlServer. Select the matching result and click Install: Next, we will create a class called Context, as shown here: public class Context : DbContext { public DbSet<Person> Persons { get; set; } protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) { //// Get the connection string from configuration optionsBuilder.UseSqlServer(@"Server=.\SQLEXPRESS ;Database=PersonDatabase;Trusted_Connection=True;"); } protected override void OnModelCreating(ModelBuilder modelBuilder) { modelBuilder.Entity<Person> ().Property(nameof(Person.Name)).IsRequired(); } } The class looks quite simple, but it has the following subtle and important things to make note of: The Context class derives from DbContext, which resides in the Microsoft.EntityFrameworkCore namespace. DbContext is an integral part of EF Core and if you have worked with EF, you will already be aware of it. An instance of DbContext represents a session with the database and can be used to query and save instances of your entities. DbContext is a combination of the Unit Of Work and Repository Patterns. Typically, you create a class that derives from DbContext and contains Microsoft.EntityFrameworkCore.DbSet properties for each entity in the model. If properties have a public setter, they are automatically initialized when the instance of the derived context is created. It contains a property named Persons (plural of the model class Person) of type DbSet<Person>. This will map to the Persons table in the underlying database. The class overrides the OnConfiguring method of DbContext and specifies the connection string to be used with the SQL Server database. The connection string should be read from the configuration file, appSettings.json, but for the sake of brevity and simplicity, it's hardcoded in the preceding code. The OnConfiguring method allows us to select and configure the data source to be used with a context using DbContextOptionsBuilder. Let's look at the connection string. Server= specifies the server. It can be .\SQLEXPRESS, .\SQLSERVER, .\LOCALDB, or any other instance name based on the installation you have done. Database= specifies the database name that will be created. Trusted_Connection=True specifies that we are using integrated security or Windows authentication. An enthusiastic reader should read the official Microsoft Entity framework documentation on configuring the context at https://docs.microsoft.com/en-us/ef/core/miscellaneous/configuring-dbcontext. The OnModelCreating method allows us to configure the model using the ModelBuilder Fluent API. This is the most powerful method of configuration and allows configuration to be specified without modifying the entity classes. The Fluent API configuration has the highest precedence and will override conventions and data annotations. The preceding code has same effect as the following data annotation has on the Name property in the Person class: [Required] public string Name { get; set; } The preceding point highlights the flexibility and configuration that EF Core brings to the table. EF Core uses a combination of conventions, attributes, and Fluent API statements to build a database model at runtime. All we have to do is to perform actions on the model classes using a combination of these and they will automatically be translated to appropriate changes in the database. Before we conclude this point, let's have a quick look at each of the different ways to configure a database model: EF Core conventions: The conventions in EF Core are comprehensive. They are the default rules by which EF Core builds a database model based on classes. A few of the simpler yet important default conventions are listed here: EF Core creates database tables for all DbSet<TEntity> properties in a Context class with the same name as that of the property. In the preceding example, the table name would be Persons based on this convention. EF Core creates tables for entities that are not included as DbSet properties but are reachable through reference properties in the other DbSet entities. If the Person class had a complex/navigation property, EF Core would have created a table for it as well. EF Core creates columns for all the scalar read-write properties of a class with the same name as the property by default. It uses the reference and collection properties for building relationships among corresponding tables in the database. In the preceding example, the scalar properties of Person correspond to a column in the Persons table. EF Core assumes a property named ID or one that is suffixed with ID as a primary key. If the property is an integer type or Guid type, then EF Core also assumes it to be IDENTITY and automatically assigns a value when inserting the data. This is precisely what we will make use of in our example while inserting or creating a new Person. EF Core maps the data type of a database column based on the data type of the property defined in the C# class. A few of the mappings between the C# data type to the SQL Server column data type are listed in the following table: C# data type SQL server data type int int string nvarchar(Max) decimal decimal(18,2) float real byte[] varbinary(Max) datetime datetime bool bit byte tinyint short smallint long bigint double float There are many other conventions, and we can define custom conventions as well. For more details, please read the official Microsoft documentation at https://docs.microsoft.com/en-us/ef/core/modeling/. Attributes: Conventions are often not enough to map the class to database objects. In such scenarios, we can use attributes called data annotation attributes to get the desired results. The [Required] attribute that we have just seen is an example of a data annotation attribute. Fluent API: This is the most powerful way of configuring the model and can be used in addition to or in place of attributes. The code written in the OnModelConfiguring method is an example of a Fluent API statement. If we check now, there is no PersonDatabase database. So, we need to create the database from the model by adding a migration. EF Core includes different migration commands to create or update the database based on the model. To do so in Visual Studio 2017, go to Tools | Nuget Package Manager | Package Manager Console, as shown in the following screenshot: This will open the Package Manager Console window. Select the Default Project as GettingStartedWithEFCore and type the following command: add-migration CreatePersonDatabase If you are not using Visual Studio 2017 and you are dependent on .NET Core CLI tooling, you can use the following command: dotnet ef migrations add CreatePersonDatabase We have not installed the Microsoft.EntityFrameworkCore.Design package, so it will give an error: Your startup project 'GettingStartedWithEFCore' doesn't reference Microsoft.EntityFrameworkCore.Design. This package is required for the Entity Framework Core Tools to work. Ensure your startup project is correct, install the package, and try again. So let's first go to the NuGet Package Manager and install this package. After successful installation of this package, if we run the preceding command again, we should be able to run the migrations successfully. It will also tell us the command to undo the migration by displaying the message To undo this action, use Remove-Migration. We should see the new files added in the Solution Explorer in the Migrations folder, as shown in the following screenshot: 8. Although we have migrations applied, we have still not created a database. To create the database, we need to run the following commands. In Visual Studio 2017: update-database –verbose In .NET Core CLI: dotnet ef database update If all goes well, we should have the database created with the Persons table (property of type DbSet<Person>) in the database. Let's validate the table and database by using SQL Server Management Studio (SSMS). If SSMS is not installed in your machine, you can also use Visual Studio 2017 to view the database and table. Let's check the created database. In Visual Studio 2017, click on the View menu and select Server Explorer, as shown in the following screenshot: In Server Explorer, right-click on Data Connections and then select Add Connection. The Add Connection dialog will show up. Enter .\SQLEXPRESS in the Server name (since we installed SQL EXPRESS 2017) and select PersonDatabase as the database, as shown in the following screenshot: On clicking OK, we will see the database named PersonDatabase and if we expand the tables, we can see the Persons table as well as the _EFMigrationsHistory table. Notice that the properties in the Person class that had setters are the only properties that get transformed into table columns in the Persons table. Notice that the Age property is read-only in the class we created and therefore we do not see an age column in the database table, as shown in the following screenshot: This is the first migration to create a database. Whenever we add or update the model classes or configurations, we need to sync the database with the model using the add-migration and update-database commands. With this, we have our model class ready and the corresponding database created. The following image summarizes how the properties have been mapped from the C# class to the database table columns: Now, we will use the Context class to perform CRUD operations. Let's go back to our Main.cs and write the following code. The code is well commented, so please go through the comments to understand the flow: class Program { static void Main(string[] args) { Console.WriteLine("Getting started with EF Core"); Console.WriteLine("We will do CRUD operations on Person class."); //// Lets create an instance of Person class. Person person = new Person() { Name = "Rishabh Verma", Gender = true, //// For demo true= Male, false = Female. Prefer enum in real cases. DateOfBirth = new DateTime(2000, 10, 23) }; using (var context = new Context()) { //// Context has strongly typed property named Persons which referes to Persons table. //// It has methods Add, Find, Update, Remove to perform CRUD among many others. //// Use AddRange to add multiple persons in once. //// Complete set of APIs can be seen by using F12 on the Persons property below in Visual Studio IDE. var personData = context.Persons.Add(person); //// Though we have done Add, nothing has actually happened in database. All changes are in context only. //// We need to call save changes, to persist these changes in the database. context.SaveChanges(); //// Notice above that Id is Primary Key (PK) and hence has not been specified in the person object passed to context. //// So, to know the created Id, we can use the below Id int createdId = personData.Entity.Id; //// If all goes well, person data should be persisted in the database. //// Use proper exception handling to discover unhandled exception if any. Not showing here for simplicity and brevity. createdId variable would now hold the id of created person. //// READ BEGINS Person readData = context.Persons.Where(j => j.Id == createdId).FirstOrDefault(); //// We have the data of person where Id == createdId, i.e. details of Rishabh Verma. //// Lets update the person data all together just for demonstarting update functionality. //// UPDATE BEGINS person.Name = "Neha Shrivastava"; person.Gender = false; person.DateOfBirth = new DateTime(2000, 6, 15); person.Id = createdId; //// For update cases, we need this to be specified. //// Update the person in context. context.Persons.Update(person); //// Save the updates. context.SaveChanges(); //// DELETE the person object. context.Remove(readData); context.SaveChanges(); } Console.WriteLine("All done. Please press Enter key to exit..."); Console.ReadLine(); } } With this, we have completed our sample app to get started with EF Core. I hope this simple example will set you up to start using EF Core with confidence and encourage you to start exploring it further. The detailed features of EF Core can be learned from the official Microsoft documentation available at https://docs.microsoft.com/en-us/ef/core/. If you're interested in learning more, head over to this book, .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. How to build a chatbot with Microsoft Bot framework Working with Entity Client and Entity SQL Get to know ASP.NET Core Web API [Tutorial]

0
0
24168

How-To Tutorials

article-image-polymorphism-type-pattern-matching-python

Aaron Lazar

13 Aug 2018

11 min read

Polymorphism and type-pattern matching in Python [Tutorial]

Aaron Lazar

13 Aug 2018

11 min read

0
0
15376

How-To Tutorials

Working with Shared pointers in Rust: Challenges and Solutions [Tutorial]

Generative Adversarial Networks: Generate images using Keras GAN [Tutorial]

Implementing Dependency Injection in Spring [Tutorial]

Build your first Reinforcement learning agent in Keras [Tutorial]

Use App Metrics to analyze HTTP traffic, errors & network performance of a .NET Core app [Tutorial]

Best practices for C# code optimization [Tutorial]

Use Rust for web development [Tutorial]

Task parallel library for easy multi-threading in .NET Core [Tutorial]

Getting started with F# for .Net Core application development [Tutorial]

Multithreading in Rust using Crates [Tutorial]

Trending Topics

Understanding functional reactive programming in Scala [Tutorial]

MongoDB Sharding: Sharding clusters and choosing the right shard key [Tutorial]

Modern Cloud Native architectures: Microservices, Containers, and Serverless - Part 2

Access application data with Entity Framework in .NET Core [Tutorial]

Polymorphism and type-pattern matching in Python [Tutorial]

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access