Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7013 Articles
article-image-deep-learning-methodologies-extend-image-based-application-videos
Sunith Shetty
05 Apr 2018
4 min read
Save for later

Datasets and deep learning methodologies to extend image-based applications to videos

Sunith Shetty
05 Apr 2018
4 min read
In today’s tutorial, we will extend image based application to videos, which will include pose estimation, captioning, and generating videos. Extending image-based application to videos Images can be used for pose estimation, style transfer, image generation, segmentation, captioning, and so on. Similarly, these applications find a place in videos too. Using the temporal information may improve the predictions from images and vice versa. In this section, we will see how to extend these applications to videos. Regressing the human pose Human pose estimation is an important application of video data and can improve other tasks such as action recognition. First, let's see a description of the datasets available for pose estimation: Poses in the wild dataset: Contains 30 videos annotated with the human pose. The dataset is annotated with human upper body joints. Frames Labeled In Cinema (FLIC): A human pose dataset obtained from 30 Movies. Pfister et al. proposed a method to predict the human pose in videos. The following is the pipeline for regressing the human pose: The frames from the video are taken and passed through a convolutional network. The layers are fused, and the pose heatmaps are obtained. The pose heatmaps are combined with optical flow to get the warped heatmaps. The warped heatmaps across a timeframe are pooled to produce the pooled heatmap, getting the final pose. Tracking facial landmarks Face analysis in videos requires face detection, landmark detection, pose estimation, verification, and so on. Computing landmarks are especially crucial for capturing facial animation, human-computer interaction, and human activity recognition. Instead of computing over frames, it can be computed over video. Gu et al. proposed a method to use a joint estimation of detection and tracking of facial landmarks in videos using RNN. The results outperform frame wise predictions and other previous models. The landmarks are computed by CNN, and the temporal aspect is encoded in an RNN. Synthetic data was used for training. Segmenting videos Videos can be segmented in a better way when temporal information is used. Gadde et al. proposed a method to combine temporal information by warping. The following image demonstrates the solution, which segments two frames and combines the warping: The warping net is shown in the following image: Reproduced from Gadde et al The optical flow is computed between two frames, which are combined with warping. The warping module takes the optical flow, transforms it, and combines it with the warped representations. Captioning videos Captions can be generated for videos, describing the context. Let's see a list of the datasets available for captioning videos: Microsoft Research - Video To Text (MSR-VTT) has 200,000 video clip and sentence pairs. MPII Movie Description Corpus (MPII-MD) has 68,000 sentences with 94 movies. Montreal Video Annotation Dataset (M-VAD) has 49,000 clips. YouTube2Text has 1,970 videos with 80,000 descriptions. Yao et al. proposed a method for captioning videos. A 3D convolutional network trained for action recognition is used to extract the local temporal features. An attention mechanism is then used on the features to generate text using an RNN. The process is shown here: Reproduced from Yao et al Donahue et al. proposed another method for video captioning or description, which uses LSTM with convolution features. This is similar to the preceding approach, except that we use 2D convolution features over here, as shown in the following image: Reproduced from Donahue et al We have several ways to combine text with images, such as activity recognition, image description, and video description techniques. The following image illustrates these techniques: Reproduced from Donahue et al Venugopalan et al. proposed a method for video captioning using an encoder-decoder approach. The following is a visualization of the technique proposed by him: Reproduced from Venugopalan et al The CNN can be computed on the frames or the optical flow of the images for this method.   Generating videos Videos can be generated using generative models, in an unsupervised manner. The future frames can be predicted using the current frame. Ranzato et al. proposed a method for generating videos, inspired by language models. An RNN model is utilized to take a patch of the image and predict the next patch. To summarize, we learned about video-based solutions in various scenarios such as action recognition, gesture recognition, security applications, and intrusion detection. You read an excerpt from a book written by Rajalingappaa Shanmugamani titled, Deep Learning for Computer Vision. This book will help you learn to model and train advanced neural networks for implementation of Computer Vision tasks.    
Read more
  • 0
  • 0
  • 16730

article-image-10-reasons-data-scientists-love-jupyter-notebooks
Aarthi Kumaraswamy
04 Apr 2018
5 min read
Save for later

10 reasons why data scientists love Jupyter notebooks

Aarthi Kumaraswamy
04 Apr 2018
5 min read
In the last twenty years, Python has been increasingly used for scientific computing and data analysis as well. Today, the main advantage of Python and one of the main reasons why it is so popular is that it brings scientific computing features to a general-purpose language that is used in many research areas and industries. This makes the transition from research to production much easier. IPython is a Python library that was originally meant to improve the default interactive console provided by Python and to make it scientist-friendly. In 2011, ten years after the first release of IPython, the IPython Notebook was introduced. This web-based interface to IPython combines code, text, mathematical expressions, inline plots, interactive figures, widgets, graphical interfaces, and other rich media within a standalone sharable web document. This platform provides an ideal gateway to interactive scientific computing and data analysis. IPython has become essential to researchers, engineers, data scientists, teachers and their students. Within a few years, IPython gained an incredible popularity among the scientific and engineering communities. The Notebook started to support more and more programming languages beyond Python. In 2014, the IPython developers announced the Jupyter project, an initiative created to improve the implementation of the Notebook and make it language-agnostic by design. The name of the project reflects the importance of three of the main scientific computing languages supported by the Notebook: Julia, Python, and R. Today, Jupyter is an ecosystem by itself that comprehends several alternative Notebook interfaces (JupyterLab, nteract, Hydrogen, and others), interactive visualization libraries, authoring tools compatible with notebooks. Jupyter has its own conference named JupyterCon. The project received funding from several companies as well as the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation. Apart from the rich legacy that Jupyter notebooks come from and the richer ecosystem that it provides developers, here are ten more reasons for you to start using it for your next data science project if aren’t already using it now. All in one place: The Jupyter Notebook is a web-based interactive environment that combines code, rich text, images, videos, animations, mathematical equations, plots, maps, interactive figures and widgets, and graphical user interfaces, into a single document. Easy to share: Notebooks are saved as structured text files (JSON format), which makes them easily shareable. Easy to convert: Jupyter comes with a special tool, nbconvert, which converts notebooks to other formats such as HTML and PDF. Another online tool, nbviewer, allows us to render a publicly-available notebook directly in the browser. Language independent: The architecture of Jupyter is language independent. The decoupling between the client and kernel makes it possible to write kernels in any language. Easy to create kernel wrappers: Jupyter brings a lightweight interface for kernel languages that can be wrapped in Python. Wrapper kernels can implement optional methods, notably for code completion and code inspection. Easy to customize: Jupyter interface can be used to create an entirely customized experience in the Jupyter Notebook (or another client application such as the console). Extensions with custom magic commands: Create IPython extensions with custom magic commands to make interactive computing even easier. Many third-party extensions and magic commands exist, for example, the %%cython magic that allows one to write Cython code directly in a notebook. Stress-free Reproducible experiments: Jupyter notebooks can help you conduct efficient and reproducible interactive computing experiments with ease. It lets you keep a detailed record of your work. Also, the ease of use of the Jupyter Notebook means that you don't have to worry about reproducibility; just do all of your interactive work in notebooks, put them under version control, and commit regularly. Don't forget to refactor your code into independent reusable components. Effective teaching-cum-learning tool: The Jupyter Notebook is not only a tool for scientific research and data analysis but also a great tool for teaching. An example is IPython Blocks - a library that allows you or your students to create grids of colorful blocks. Interactive code and data exploration: The ipywidgets package provides many common user interface controls for exploring code and data interactively. You enjoyed excerpts from Cyrille Rossant’s latest book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the Ipython Cookbook Github page. If you loved what you saw, support Cyrille’s work by buying a copy of the book today! Related Jupyter articles: Latest Jupyter news updates: Is JupyterLab all set to phase out Jupyter Notebooks? What’s new in Jupyter Notebook 5.3.0 3 ways JupyterLab will revolutionize Interactive Computing Jupyter notebooks tutorials: Getting started with the Jupyter notebook (part 1) Jupyter and Python Scripting Jupyter as a Data Laboratory: Part 1    
Read more
  • 0
  • 0
  • 52676

article-image-generative-models-action-create-van-gogh-neural-artistic-style-transfer
Sunith Shetty
03 Apr 2018
14 min read
Save for later

Generative Models in action: How to create a Van Gogh with Neural Artistic Style Transfer

Sunith Shetty
03 Apr 2018
14 min read
In today’s tutorial, we will learn the principles behind neural artistic style transfer and show a working example to transfer the style of Van Gogh art onto an image. Neural artistic style transfer An image can be considered as a combination of style and content. The artistic style transfer technique transforms an image to look like a painting with a specific painting style. We will see how to code this idea up. The loss function will compare the generated image with the content of the photo and style of the painting. Hence, the optimization is carried out for the image pixel, rather than for the weights of the network. Two values are calculated by comparing the content of the photo with the generated image followed by the style of the painting and the generated image. Content loss Since pixels are not a good choice, we will use the CNN features of various layers, as they are a better representation of the content. The initial layers have high-frequency such as edges, corners, and textures but the later layers represent objects, and hence are better for content. The latter layer can compare the object to object better than the pixel. But for this, we need to first import the required libraries, using the following code: import  numpy as  np from PIL  import  Image from  scipy.optimize  import fmin_l_bfgs_b from  scipy.misc  import imsave from  vgg16_avg  import VGG16_Avg from  keras import  metrics from  keras.models  import Model from  keras import  backend as K  Now, let's load the required image, using the following command: content_image = Image.open(work_dir + 'bird_orig.png') We will use the following image for this instance: As we are using the VGG architecture for extracting the features, the mean of all the ImageNet images has to be subtracted from all the images, as shown in the following code: imagenet_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32) def subtract_imagenet_mean(image):  return (image - imagenet_mean)[:, :, :, ::-1] Note that the channels are different. The preprocess function takes the generated image and subtracts the mean and then reverses the channel. The deprocess function reverses that effect because of the preprocessing step, as shown in the following code: def add_imagenet_mean(image, s):  return np.clip(image.reshape(s)[:, :, :, ::-1] + imagenet_mean, 0,    255) First, we will see how to create an image with the content from another image. This is a process of creating an image from random noise. The content used here is the sum of the activation in some layer. We will minimize the loss of the content between the random noise and image, which is termed as the content loss. This loss is similar to pixel-wise loss but applied on layer activations, hence will capture the content leaving out the noise. Any CNN architecture can be used to do forward inference of content image and random noise. The activations are taken and the mean squared error is calculated, comparing the activations of these two outputs. The pixel of the random image is updated while the CNN weights are frozen. We will freeze the VGG network for this case. Now, the VGG model can be loaded. Generative images are very sensitive to subsampling techniques such as max pooling. Getting back the pixel values from max pooling is not possible. Hence, average pooling is a smoother method than max pooling. The function to convert VGG model with average pooling is used for loading the model, as shown here: vgg_model = VGG16_Avg(include_top=False) Note that the weights are the same for this model as the original, even though the pooling type has been changed. The ResNet and Inception models are not suited for this because of their inability to provide various abstractions. We will take the activations from the last convolutional layer of the VGG model namely block_conv1, while the model was frozen. This is the third last layer from the VGG, with a wide receptive field. The code for the same is given here for your reference: content_layer = vgg_model.get_layer('block5_conv1').output Now, a new model is created with a truncated VGG, till the layer that was giving good features. Hence, the image can be loaded now and can be used to carry out the forward inference, to get the actually activated layers. A TensorFlow variable is created to capture the activation, using the following code: content_model = Model(vgg_model.input, content_layer) content_image_array = subtract_imagenet_mean(np.expand_dims(np.array(content_image), 0)) content_image_shape = content_image_array.shape target = K.variable(content_model.predict(content_image_array)) Let's define an evaluator class to compute the loss and gradients of the image. The following class returns the loss and gradient values at any point of the iteration: class ConvexOptimiser(object): def __init__(self, cost_function, tensor_shape): self.cost_function = cost_function self.tensor_shape = tensor_shape self.gradient_values = None def loss(self, point): loss_value, self.gradient_values = self.cost_function([point.reshape(self.tensor_shape)]) return loss_value.astype(np.float64) def gradients(self, point): return self.gradient_values.flatten().astype(np.float64) Loss function can be defined as the mean squared error between the values of activations at specific convolutional layers. The loss will be computed between the layers of generated image and the original content photo, as shown here: mse_loss = metrics.mean_squared_error(content_layer, target) The gradients of the loss can be computed by considering the input of the model, as shown: grads = K.gradients(mse_loss, vgg_model.input) The input to the function is the input of the model and the output will be the array of loss and gradient values as shown: cost_function = K.function([vgg_model.input], [mse_loss]+grads) This function is deterministic to optimize, and hence SGD is not required: optimiser = ConvexOptimiser(cost_function, content_image_shape) This function can be optimized using a simple optimizer, as it is convex and hence is deterministic. We can also save the image at every step of the iteration. We will define it in such a way that the gradients are accessible, as we are using the scikit-learn's optimizer, for the final optimization. Note that this loss function is convex and so, a simple optimizer is good enough for the computation. The optimizer can be defined using the following code: def optimise(optimiser, iterations, point, tensor_shape, file_name): for i in range(iterations): point, min_val, info = fmin_l_bfgs_b(optimiser.loss, point.flatten(), fprime=optimiser.gradients, maxfun=20) point = np.clip(point, -127, 127) print('Loss:', min_val) imsave(work_dir + 'gen_'+file_name+'_{i}.png', add_imagenet_mean(point.copy(), tensor_shape)[0]) return point The optimizer takes loss function, point, and gradients, and returns the updates. A random image needs to be generated so that the content loss will be minimized, using the following code: def generate_rand_img(shape):  return np.random.uniform(-2.5, 2.5, shape)/1 generated_image = generate_rand_img(content_image_shape) Here is the random image that is created: The optimization can be run for 10 iterations to see the results, as shown: iterations = 10 generated_image = optimise(optimiser, iterations, generated_image, content_image_shape, 'content') If everything goes well, the loss should print as shown here, over the iterations: Current loss value: 73.2010421753 Current loss value: 22.7840042114 Current loss value: 12.6585302353 Current loss value: 8.53817081451 Current loss value: 6.64649534225 Current loss value: 5.56395864487 Current loss value: 4.83072710037 Current loss value: 4.32800722122 Current loss value: 3.94804215431 Current loss value: 3.66387653351 Here is the image that is generated and now, it almost looks like a bird. The optimization can be run for further iterations to have this done: An optimizer took the image and updated the pixels so that the content is the same. Though the results are worse, it can reproduce the image to a certain extent with the content. All the images through iterations give a good intuition on how the image is generated. There is no batching involved in this process. In the next section, we will see how to create an image in the style of a painting. Style loss using the Gram matrix After creating an image that has the content of the original image, we will see how to create an image with just the style. Style can be thought of as a mix of colour and texture of an image. For that purpose, we will define style loss. First, we will load the image and convert it to an array, as shown in the following code: style_image = Image.open(work_dir + 'starry_night.png') style_image = style_image.resize(np.divide(style_image.size, 3.5).astype('int32')) Here is the style image we have loaded: Now, we will preprocess this image by changing the channels, using the following code: style_image_array = subtract_imagenet_mean(np.expand_dims(style_image, 0)[:, :, :, :3]) style_image_shape = style_image_array.shape For this purpose, we will consider several layers, like we have done in the following code: model = VGG16_Avg(include_top=False, input_shape=shp[1:]) outputs = {l.name: l.output for l in model.layers} Now, we will take multiple layers as an array output of the first four blocks, using the following code: layers = [outputs['block{}_conv1'.format(o)] for o in range(1,3)] A new model is now created, that can output all those layers and assign the target variables, using the following code: layers_model = Model(model.input, layers) targs = [K.variable(o) for o in layers_model.predict(style_arr)] Style loss is calculated using the Gram matrix. The Gram matrix is the product of a matrix and its transpose. The activation values are simply transposed and multiplied. This matrix is then used for computing the error between the style and random images. The Gram matrix loses the location information but will preserve the texture information. We will define the Gram matrix using the following code: def grammian_matrix(matrix):  flattened_matrix = K.batch_flatten(K.permute_dimensions(matrix, (2, 0, 1)))  matrix_transpose_dot = K.dot(flattened_matrix, K.transpose(flattened_matrix))  element_count = matrix.get_shape().num_elements()  return matrix_transpose_dot / element_count As you might be aware now, it is a measure of the correlation between the pair of columns. The height and width dimensions are flattened out. This doesn't include any local pieces of information, as the coordinate information is disregarded. Style loss computes the mean squared error between the Gram matrix of the input image and the target, as shown in the following code def style_mse_loss(x, y):  return metrics.mse(grammian_matrix(x), grammian_matrix(y)) Now, let's compute the loss by summing up all the activations from the various layers, using the following code: style_loss = sum(style_mse_loss(l1[0], l2[0]) for l1, l2 in zip(style_features, style_targets)) grads = K.gradients(style_loss, vgg_model.input) style_fn = K.function([vgg_model.input], [style_loss]+grads) optimiser = ConvexOptimiser(style_fn, style_image_shape) We then solve it as the same way we did before, by creating a random image. But this time, we will also apply a Gaussian filter, as shown in the following code: generated_image = generate_rand_img(style_image_shape) The random image generated will look like this: The optimization can be run for 10 iterations to see the results, as shown below: generated_image = optimise(optimiser, iterations, generated_image, style_image_shape) If everything goes well, the solver should print the loss values similar to the following: Current loss value: 5462.45556641 Current loss value: 189.738555908 Current loss value: 82.4192581177 Current loss value: 55.6530838013 Current loss value: 37.215713501 Current loss value: 24.4533748627 Current loss value: 15.5914745331 Current loss value: 10.9425945282 Current loss value: 7.66888141632 Current loss value: 5.84042310715 Here is the image that is generated: Here, from a random noise, we have created an image with a particular painting style without any location information. In the next section, we will see how to combine both—the content and style loss. Style transfer Now we know how to reconstruct an image, as well as how to construct an image that captures the style of an original image. The obvious idea may be to just combine these two approaches by weighting and adding the two loss functions, as shown in the following code: w,h = style.size src = img_arr[:,:h,:w] Like before, we're going to grab a sequence of layer outputs to compute the style loss. However, we still only need one layer output to compute the content loss. How do we know which layer to grab? As we discussed earlier, the lower the layer, the more exact the content reconstruction will be. In merging content reconstruction with style, we might expect that a looser reconstruction of the content will allow more room for the style to affect (re: inspiration). Furthermore, a later layer ensures that the image looks like the same subject, even if it doesn't have the same details. The following code is used for this process: style_layers = [outputs['block{}_conv2'.format(o)] for o in range(1,6)] content_name = 'block4_conv2' content_layer = outputs[content_name] Now, a separate model for style is created with required output layers, using the following code: style_model = Model(model.input, style_layers) style_targs = [K.variable(o) for o in style_model.predict(style_arr)] We will also create another model for the content with the content layer, using the following code: content_model = Model(model.input, content_layer) content_targ = K.variable(content_model.predict(src)) Now, the merging of the two approaches is as simple as merging their respective loss functions. Note that as opposed to our previous functions, this function is producing three separate types of outputs: One for the original image One for the image whose style we're emulating One for the random image whose pixels we are training One way for us to tune how the reconstructions mix is by changing the factor on the content loss, which we have here as 1/10. If we increase that denominator, the style will have a larger effect on the image, and if it's too large, the original content of the image will be obscured by an unstructured style. Likewise, if it is too small then the image will not have enough style. We will use the following code for this process: style_wgts = [0.05,0.2,0.2,0.25,0.3] The loss function takes both style and content layers, as shown here: loss = sum(style_loss(l1[0], l2[0])*w    for l1,l2,w in zip(style_layers, style_targs, style_wgts)) loss += metrics.mse(content_layer, content_targ)/10 grads = K.gradients(loss, model.input) transfer_fn = K.function([model.input], [loss]+grads) evaluator = Evaluator(transfer_fn, shp) We will run the solver for 10 iterations as before, using the following code: iterations=10 x = rand_img(shp) x = solve_image(evaluator, iterations, x) The loss values should be printed as shown here: Current loss value: 2557.953125 Current loss value: 732.533630371 Current loss value: 488.321166992 Current loss value: 385.827178955 Current loss value: 330.915924072 Current loss value: 293.238189697 Current loss value: 262.066864014 Current loss value: 239.34185791 Current loss value: 218.086700439 Current loss value: 203.045211792 These results are remarkable. Each one of them does a fantastic job of recreating the original image in the style of the artist. The generated image will look like the following: We will now conclude the style transfer section. This operation is really slow but can work with any images. In the next section, we will see how to use a similar idea to create a superresolution network. There are several ways to make this better, such as: Adding a Gaussian filter to a random image Adding different weights to the layers Different layers and weights can be used to content Initialization of image rather than random image Color can be preserved Masks can be used for specifying what is required Any sketch can be converted to painting Drawing a sketch and creating the image Any image can be converted to artistic style by training a CNN to output such an image. To summarize, we learned to implement to transfer style from one image to another while preserving the content as is. You read an excerpt from a book written by Rajalingappaa Shanmugamani titled Deep Learning for Computer Vision. In this book, you will learn how to model and train advanced neural networks to implement a variety of Computer Vision tasks.
Read more
  • 0
  • 0
  • 21816

article-image-how-to-work-with-the-intellij-idea-selenium-plugin
Amey Varangaonkar
03 Apr 2018
3 min read
Save for later

How to work with the Selenium IntelliJ IDEA plugin

Amey Varangaonkar
03 Apr 2018
3 min read
Most of the framework components you design and build will be customized to your application under test. However, there are many third-party tools and plugins available, which you can use to provide better results processing, reporting, performance, and services to engineers using the framework. In this article, we cover one of the most popular plugins used with Selenium - the Selenium IntelliJ IDEA plugin. IntelliJ IDEA Selenium plugin When we covered building page object classes earlier, we discussed how to define the locators on a page for each WebElement or MobileElement using the @findBy annotations. That required the user to use one of the Inspectors or plugins to view the DOM structure and hand-code a robust locator that is cross-platform safe. Now, when using CSS and XPath locators, the hierarchy of the element can get complex, and there is a greater chance of building invalid locators. So, Perfect Test has come up with a Selenium plugin for the IntelliJ IDEA that will find and create locators on the fly. Before discussing some of the features of the plugin, let's review where this is located. Sample project files There are instructions on the www.perfect-test.com site for installing the plugin and once that is done, users can create a new project using a sample template, which will auto- generate a series of template files. These files are generic "getting started" files, but you should still follow the structure and design of the framework as outlined in this book. Here is a quick screenshot of the autogenerated file structure of the sample project: Once the plugin is enabled by simply clicking on the Selenium icon in the toolbar, users can use the Code Generate menu features to create code samples, Java methods, getter/setter methods, WebElements, copyrights for files, locators, and so on. Generating element locators The plugin has a nice feature for creating WebElement definitions, adding locators of choice, and validating them in the class. It provides a set of tooltips to tell the user what is incorrect in the syntax of the locator, which is helpful when creating CSS and XPath strings. Here is a screenshot of the locator strategy feature: Once the WebElement structure is built into the page object class, you can capture and verify the locator, and it will indicate an error with a red underline. When moving over the invalid syntax, it provides a tooltip and a lightbulb icon to the left of it, where users can use features for Check Element Existence on page and Fix Locator Popup. These are very useful for quickly finding syntax errors and defining locators. Here is a screenshot of the Check Element Existence on page feature: Here is a screenshot of the Fix Locator Popup feature: The Selenium IntelliJ plugin deals mostly with creating locators and the differences between CSS and XPath syntax. The tool also provides drop-down lists of examples where users can pick and choose how to build the queries. It's a great way to get started using Selenium to build real page object classes, and it provides a tool to validate complex CSS and XPath structures in locators! Apart from the Selenium IntelliJ plugin, there are other third-party APIs such as HTML Publisher Plugin, BrowserMob Proxy Plugin, ExtentReports Reporter API and also Sauce Labs Test Cloud services.  This article is an excerpt taken from the book Selenium Framework Design in Data-Driven Testing by Carl Cocchiaro. It presents a step-by-step approach to design and build a data-driven test framework using Selenium WebDriver, Java, and TestNG.  
Read more
  • 0
  • 0
  • 18659

article-image-what-domain-driven-design
Packt Editorial Staff
03 Apr 2018
18 min read
Save for later

What is domain driven design?

Packt Editorial Staff
03 Apr 2018
18 min read
Domain driven design exists because all software exists for a purpose. It does something. For example, you can't provide a software solution for a financial system such as online stock trading if you don't understand the stock exchanges and their functioning. Having domain knowledge is essential to solving problems with software. Domain driven design is simply designing software with the specific domain - whether that's finance, medicine, law, eCommerce - in mind. This has been taken from Mastering Microservices with Java 9 - Second Edition. Central to Domain Driven Design is the concept of a model. A model is an abstraction, or a blueprint, of the domain. Domain driven design is a collaborative activity Designing this model is not rocket science, but it does take a lot of effort, refining, and input from domain experts. It is the collective job of software designers, domain experts, and developers. They organize information, divide it into smaller parts, group them logically, and create modules. Each module can be taken up individually, and can be divided using a similar approach. This process can be followed until we reach the unit level, or when we cannot divide it any further. A complex project may have more of such iterations; similarly, a simple project could have just a single iteration of it. Once a model is defined and well documented, it can move onto the next stage - code design. So, here we have a software design—a domain model and code design, and code implementation of the domain model. The domain model provides a high level of the architecture of a solution (software/application), and the code implementation gives the domain model a life, as a working model. Domain Driven Design makes design and development work together. It provides the ability to develop software continuously, while keeping the design up to date based on feedback received from the development. It solves one of the limitations offered by Agile and Waterfall methodologies, making software maintainable, including design and code, as well as keeping application minimum viable. It gives developers the right platform to understand the domain, and provides the opportunity to share early feedback of the domain model implementation. It removes the bottleneck that appears in later stages when stockholders wait for deliverables. The fundamental components of Domain Driven Design To understand domain driven design, you can break it down into 3 fundamental concepts: Ubiquitous language and unified model language (UML) Multilayer architecture Artifacts (components) Ubiquitous language Ubiquitous language is a common language to communicate within a project. It's because designing a model is a collaborative effort of software designers, domain experts, and developers that it requires a common language to communicate with. It removes misunderstandings, misinterpretations. Communication gaps so often lead to bad software - ubiquitous language minimizes these gaps. It does, however, need to be used everywhere on a project. Unified Modeling Language (UML) is widely used and very popular when creating models. It also has a few limitations; for example, when you have thousands of classes drawn from a paper, it's difficult to represent class relationships and simultaneously understand their abstraction while taking a meaning from it. Also, UML diagrams do not represent the concepts of a model and what objects are supposed to do. Therefore, UML should always be used with other documents, code, or any other reference for effective communication. Multilayered architecture Multilayered architecture is a common solution for Domain Driven Design. It contains four layers: Presentation layer or (UI) Application layer - responsible for application logic. It maintains and coordinates the overall flow of the product/service. It does not contain business logic or UI. It may hold the state of application objects, like tasks in progress. Domain layer - contains the domain information and business logic. It holds the state of the business object. Infrastructure layer -  provides support to all the other layers and is responsible for communication between them. To understand the interaction of the different layers, take the example of table booking at a restaurant. The end user places a request for a table booking using UI. The UI passes the request to the application layer. The application layer fetches the domain objects, such as the restaurant, the table, a date, and so on, from the domain layer. The domain layer fetches these existing persisted objects from the infrastructure, and invokes relevant methods to make the booking and persist them back to the infrastructure layer. Once domain objects are persisted, the application layer shows the booking confirmation to the end user. Artifacts used in Domain Driven Design There are seven different artifacts used in Domain Driven Design to express, create, and retrieve domain models: Entities Value objects Services Aggregates Repository Factory Module Entities are certain types of objects that are identifiable and remain the same throughout the states of the products/services. These objects are not identified by their attributes, but by their identity and thread of continuity. These type of objects are known as entities. It sounds pretty simple, but it carries complexity. You need to understand how we can define the entities. Let's take an example of a table booking system, where we have a restaurant class with attributes such as restaurant name, address, phone number, establishment data, and so on. We can take two instances of the restaurant class that are not identifiable using the restaurant name, as there could be other restaurants with the same name. Similarly, if we go by any other single attribute, we will not find any attributes that can singularly identify a unique restaurant. If two restaurants have all the same attribute values, they are therefore the same and are interchangeable with each other. Still, they are not the same entities, as both have different references (memory addresses). Conversely, let's take a class of U.S. citizens. Every U.S. citizen has his or her own social security number. This number is not only unique, but remains unchanged throughout the life of the citizen and assures continuity. This citizen object would exist in the memory, would be serialized, and would be removed from the memory and stored in the database. It even exists after the person is deceased. It will be kept in the system for as long as the system exists. A citizen's social security number remains the same irrespective of its representation. Therefore, creating entities in a product means creating an identity. So, now give an identity to any restaurant in the previous example, then either use a combination of attributes such as restaurant name, establishment date, and street, or add an identifier such as restaurant_id to identify it. The basic rule is that two identifiers cannot be the same. Therefore, when we introduce an identifier for an entity, we need to be sure of it. There are different ways to create a unique identity for objects, described as follows: Using the primary key in a table. Using an automated generated ID by a domain module. A domain program generates the identifier and assigns it to objects that are being persisted among different layers. A few real-life objects carry user-defined identifiers themselves. For example, each country has its own country codes for dialing ISD calls. Composite key. This is a combination of attributes that can also be used for creating an identifier, as explained for the preceding restaurant object. Value objects Value objects (VOs) simplify the design. In contrast to entities, value objects have only attributes and no conceptual identity. A best practice is to keep value objects as immutable objects. If possible, you should even keep entity objects immutable too. You might want to keep all objects as entities, but you're likely to run into problems if you do this; there has to be one instance for each object. Let's say you are creating customers as entity objects. Each customer object would represent the restaurant guest; this cannot be used for booking orders for other guests. This may create millions of customer entity objects in the memory if millions of customers are using the system. Not only are there millions of uniquely identifiable objects that exist in the system, but each object is being tracked. Tracking as well as creating an identity is complex. A highly credible system is required to create and track these objects, which is not only very complex, but also resource heavy. It may result in system performance degradation. Therefore, it is important to use value objects instead of using entities. The reasons are explained in the next few paragraphs. Applications don't always need to have to be trackable and have an identifiable customer object. There are cases when you just need to have some or all attributes of the domain element. These are the cases when value objects can be used by the application. It makes things simple and improves the performance. Value objects can easily be created and destroyed, owing to the absence of identity. This simplifies the design—it makes value objects available for garbage collection if no other object has referenced them. Value objects should be designed and coded as immutable. Once they are created, they should never be modified during their life-cycle. If you need a different value of the VO, or any of its objects, then simply create a new value object, but don't modify the original value object. Here, immutability carries all the significance from object-oriented programming (OOP). A value object can be shared and used without impacting on its integrity if, and only if, it is immutable. Services While creating the domain model, you may come across situations where behavior may not be related to any object. These behaviors can be accommodated in service objects. Service objects are part of the domain layer and do not have any internal state. The sole purpose of service objects is to provide behavior to the domain that does not belong to a single entity or value object. Ubiquitous language helps you to identify different objects, identities, or value objects with different attributes and behaviors during the process of domain driven design and domain modelling. During the course of creating the domain model, you may find different behaviors or methods that do not belong to any specific object. Such behaviors are important, and so cannot be neglected. Neither can you add them to entities or value objects. It would spoil the object to add behavior that does not belong to it. Keep in mind, that behavior may impact on various objects. The use of object-oriented programming makes it possible to attach to some objects; this is known as a service. Services are common in technical frameworks. These are also used in domain layers in domain driven design. A service object does not have any internal state; its only purpose is to provide a behavior to the domain. Service objects provide behaviors that cannot be related to specific entities or value objects. Service objects may provide one or more related behaviors to one or more entities or value objects. It is a practice to define the services explicitly in the domain model. While creating the services, you need to tick all of the following points: Service objects' behavior performs on entities and value objects, but it does not belong to entities or value objects Service objects' behavior state is not maintained, and hence, they are stateless Services are part of the domain model Services may also exist in other layers. It is very important to keep domain-layer services isolated. It removes the complexities and keeps the design decoupled. Let's take an example where a restaurant owner wants to see the report of his monthly table bookings. In this case, he will log in as an admin and click the Display Report button after providing the required input fields, such as duration. Application layers pass the request to the domain layer that owns the report and templates objects, with some parameters such as report ID, and so on. Reports get created using the template, and data is fetched from either the database or other sources. Then the application layer passes through all the parameters, including the report ID to the business layer. Here, a template needs to be fetched from the database or another source to generate the report based on the ID. This operation does not belong to either the report object or the template object. Therefore, a service object is used that performs this operation to retrieve the required template from the database. Aggregates Aggregate domain pattern is related to the object's life cycle. It defines ownership and boundaries which is crucial in Domain Driven Design When you reserve a table at your favorite restaurant online using an application, you don't need to worry about the internal system and process that takes place to book your reservation, including searching for available restaurants, then for available tables on the given date, time, and so on and so forth. Therefore, you can say that a reservation application is an aggregate of several other objects, and works as a root for all the other objects for a table reservation system. This root should be an entity that binds collections of objects together. It is also called the aggregate root. This root object does not pass any reference of inside objects to external worlds, and protects the changes performed within internal objects. We need to understand why aggregators are required. A domain model can contain large numbers of domain objects. The bigger the application functionalities and size and the more complex its design, the greater number of objects present. A relationship exists between these objects. Some may have a many-to-many relationship, a few may have a one-to-many relationship, and others may have a one-to-one relationship. These relationships are enforced by the model implementation in the code, or in the database that ensures that these relationships among the objects are kept intact. Relationships are not just unidirectional; they can also be bidirectional. They can also increase in complexity. The designer's job is to simplify these relationships in the model. Some relationships may exist in a real domain, but may not be required in the domain model. Designers need to ensure that such relationships do not exist in the domain model. Similarly, multiplicity can be reduced by these constraints. One constraint may do the job where many objects satisfy the relationship. It is also possible that a bidirectional relationship could be converted into a unidirectional relationship. No matter how much simplification you input, you may still end up with relationships in the model. These relationships need to be maintained in the code. When one object is removed, the code should remove all the references to this object from other places. For example, a record removal from one table needs to be addressed wherever it has references in the form of foreign keys and such, to keep the data consistent and maintain its integrity. Also, invariants (rules) need to be forced and maintained whenever data changes. Relationships, constraints, and invariants bring a complexity that requires an efficient handling in code. We find the solution by using the aggregate represented by the single entity known as the root, which is associated with the group of objects that maintains consistency with regards to data changes. This root is the only object that is accessible from outside, so this root element works as a boundary gate that separates the internal objects from the external world. Roots can refer to one or more inside objects, and these inside objects can have references to other inside objects that may or may not have relationships with the root. However, outside objects can also refer to the root, and not to any inside objects. An aggregate ensures data integrity and enforces the invariant. Outside objects cannot make any change to inside objects; they can only change the root. However, they can use the root to make a change inside the object by calling exposed operations. The root should pass the value of inside objects to outside objects if required. If an aggregate object is stored in the database, then the query should only return the aggregate object. Traversal associations should be used to return the object when it is internally linked to the aggregate root. These internal objects may also have references to other aggregates. An aggregate root entity holds its global identity, and holds local identities inside their entities. A simple example of an aggregate in the table booking system is the customer. Customers can be exposed to external objects, and their root object contains their internal object address and contact information. When requested, the value object of internal objects, such as address, can be passed to external objects: Repository In a domain model, at a given point in time, many domain objects may exist. Each object may have its own life-cycle, from the creation of objects to their removal or persistence. Whenever any domain operation needs a domain object, it should retrieve the reference of the requested object efficiently. It would be very difficult if you didn't maintain all of the available domain objects in a central object. A central object carries the references of all the objects, and is responsible for returning the requested object reference. This central object is known as the repository. The repository is a point that interacts with infrastructures such as the database or file system. A repository object is the part of the domain model that interacts with storage such as the database, external sources, and so on, to retrieve the persisted objects. When a request is received by the repository for an object's reference, it returns the existing object's reference. If the requested object does not exist in the repository, then it retrieves the object from storage. For example, if you need a customer, you would query the repository object to provide the customer with ID 31. The repository would provide the requested customer object if it is already available in the repository, and if not, it would query the persisted stores such as the database, fetch it, and provide its reference. The main advantage of using the repository is having a consistent way to retrieve objects where the requestor does not need to interact directly with the storage such as the database. A repository may query objects from various storage types, such as one or more databases, filesystems, or factory repositories, and so on. In such cases, a repository may have strategies that also point to different sources for different object types As you can see in the repository object flow diagram on the right, the repository interacts with the infrastructure layer, and this interface is part of the domain layer. The requestor may belong to a domain layer, or an application layer. The repository helps the system to manage the life cycle of domain objects. Factory A factory is required when a simple constructor is not enough to create the object. It helps to create complex objects, or an aggregate that involves the creation of other related objects. A factory is also a part of the life cycle of domain objects, as it is responsible for creating them. Factories and repositories are in some way related to each other, as both refer to domain objects. The factory refers to newly created objects, whereas the repository returns the already existing objects either from the memory, or from external storage. Let's see how control flows, by using a user creation process application. Let's say that a user signs up with a username user1. This user creation first interacts with the factory, which creates the name user1 and then caches it in the domain using the repository, which also stores it in the storage for persistence. When the same user logs in again, the call moves to the repository for a reference. This uses the storage to load the reference and pass it to the requestor. The requestor may then use this user1 object to book the table in a specified restaurant, and at a specified time. These values are passed as parameters, and a table booking record is created in storage using the repository:       The factory may use one of the object-oriented programming patterns, such as the factory or abstract factory pattern, for object creation. Modules Modules are the best way to separate related business objects. These are best suited to large projects where the size of domain objects is bigger. For the end user, it makes sense to divide the domain model into modules and set the relationship between these modules. Once you understand the modules and their relationship, you start to see the bigger picture of the domain model, thus it's easier to drill down further and understand the model. Modules also help you to write code that is highly cohesive, or maintains low coupling. Ubiquitous language can be used to name these modules. For the table booking system, we could have different modules, such as user-management, restaurants and tables, analytics and reports, and reviews, and so on. This introduction to domain driven design should give you a strong foundation for using it when you build software. It's principles are useful - in particular, making sure you collaborate and use the same language as different stakeholders is one of domain driven design's most valuable contributions to the way we approach software development.
Read more
  • 0
  • 0
  • 25354

article-image-top-5-free-business-intelligence-tools
Amey Varangaonkar
02 Apr 2018
7 min read
Save for later

Top 5 free Business Intelligence tools

Amey Varangaonkar
02 Apr 2018
7 min read
There is no shortage of business intelligence tools available to modern businesses today. But they're not always easy on the pocket. Great functionality, stylish UI and ease of use always comes with a price tag. If you can afford it, great - if not, it's time to start thinking about open source and free business intelligence tools.  Free business intelligence tools can power your business Take a look at 5 of the best free or open source business intelligence tools. They're all as effective and powerful as anything you'd pay a premium for. You simply need to know what you're doing with them. BIRT BIRT (Business Intelligence and Reporting Tools) is an open-source project that offers industry-standard reporting and BI capabilities. It's available as both a desktop and web application. As a top-level project within the umbrella of the Eclipse Foundation, it's got a good pedigree that means you can be confident in its potency. BIRT is especially useful for businesses which have a working environment built around Java and Java EE, as its reporting and charting engines can integrate seamlessly with Java. From creating a range of reports to different types of charts and graphs, BIRT can also be used for advanced analytical tasks. You can learn about the impressive reporting capabilities that BIRT offers on its official features page. Pros: The BIRT platform is one of the most popularly used open source business intelligence tools across the world, with more than 12 million downloads and 2.5 million users across more than 150 countries. With a large community of users, getting started with this tool, or getting solutions to problems that you might come across should be easy. Cons: Some programming experience, preferably in Java, is required to make the best use of this tool. The complex functions and features may not be easy to grasp for absolute beginners. Jaspersoft Community Jaspersoft, formerly known as Panscopic, is one of the leading open source suites of tools for a variety of reporting and business intelligence tasks. It was acquired by TIBCO in 2014 in a deal worth approximately $185 million, and has grown in popularity ever since. Jaspersoft began with the promise of “saving the world from the oppression of complex, heavyweight business intelligence”, and the Community edition offers the following set of tools for easier reporting and analytics: JasperReports Server: This tool is used for designing standalone or embeddable reports which can be used across third party applications JasperReports Library: You can design pixel-perfect reports from different kinds of datasets Jaspersoft ETL: This is a popular warehousing tool powered by Talend for extracting useful insights from a variety of data sources Jaspersoft Studio: Eclipse-based report designer for JasperReports and JasperReports Server Visualize.js: A JavaScript-based framework to embed Jaspersoft applications Pros: Jaspersoft, like BIRT, has a large community of developers looking to actively solve any problem they might come across. More often than not, your queries are bound to be answered satisfactorily. Cons: Absolute beginners might struggle with the variety of offerings and their applications. The suite of Jaspersoft tools is more suited for someone with an intermediate programming experience. KNIME KNIME is a free, open-source data analytics and business intelligence company that offers a robust platform for reporting and data integration. Used commonly by data scientists and analysts, KNIME offers features for data mining, machine learning and data visualization in order to build effective end-to-end data pipelines. There are 2 major product offerings from KNIME: KNIME Analytics Platform KNIME Cloud Analytics Platform Considered to be one of the most established players in the Analytics and business intelligence market, KNIME has customers in over 60 countries worldwide. You can often find KNIME featured as a ‘Leader’ in the Gartner Magic Quadrant. It finds applications in a variety of enterprise use-cases, including pharma, CRM, finance, and more. Pros: If you want to leverage the power of predictive analytics and machine learning, KNIME offers you just the perfect environment to build industry-standard, accurate models. You can create a wide variety of visualizations including complex plots and charts, and perform complex ETL tasks with relative ease. Cons: KNIME is not suited for beginners. It's built instead for established professionals such as data scientists and analysts who want to conduct analyses quickly and efficiently. Tableau Public Tableau Public’s promise is simple - “Visualize and share your data in minutes - for free”. Tableau is one of the most popular business intelligence tools out there, rivalling the likes of Qlik, Spotfire, Power BI among others. Along with its enterprise edition which offers premium analytics, reporting and dashboarding features, Tableau also offers a freely available Public version for effective visual analytics. Last year, Tableau released an announcement that the interactive stories and reports published on the Tableau Public platform had received more than 1 billion views worldwide. Leading news organizations around the world, including BBC and CNBC, use Tableau Public for data visualization. Pros: Tableau Public is a very popular tool with a very large community of users. If you find yourself struggling to understand or execute any feature on this platform, there are ample number of solutions available on the community forums and also on forums such as Stack Overflow. The quality of visualizations is industry-standard, and you can publish them anywhere on the web without any hassle. Cons:It’s quite difficult to think of any drawback of using Tableau Public, to be honest. Having limited features as compared to the enterprise edition of Tableau is obviously a shortcoming, though. [box type="info" align="" class="" width=""]Editor’s tip: If you want to get started with Tableau Public and create interesting data stories using it, Creating Data Stories with Tableau Public is one book you do not want to miss out on![/box] Microsoft Power BI Microsoft Power BI is a paid, enterprise-ready offering by Microsoft to empower businesses to find intuitive data insights across a variety of data formats. Microsoft also offers a stripped-down version of Power BI with limited Business Intelligence capabilities called as Power BI Desktop. In this free version, users are offered up to 1 GB of data to work on, and the ability to create different kinds of visualizations on CSV data as well as Excel spreadsheets. The reports and visualizations built using Power BI Desktop can be viewed on mobile devices as well as on browsers, and can be updated on the go. Pros: Free, very easy to use. Power BI Desktop allows you to create intuitive visualizations and reports. For beginners looking to learn the basics of Business Intelligence and data visualization, this is a great tool to use. You can also work with any kind of data and connect it to the Power BI Desktop effortlessly. Cons: You don’t get the full suite of features on Power BI Desktop which make Power BI such an elegant and wonderful Business Intelligence tool. Also, new reports and dashboards cannot be created via the mobile platform. [box type="info" align="" class="" width=""]Editor’s Tip: If you want to get started with Microsoft Power BI, or want handy tips on using Power BI effectively, our Microsoft Power BI Cookbook will prove to be of great use! [/box] There are a few other free and open source tools which are quite effective and find a honorary mention in this article. We were absolutely spoilt for choices, and choosing the top 5 tools list among all these options was a lot of hard work! Some other tools which deserve a honorary mention are - Dataiku Free Edition, Pentaho Community Edition, QlikView Personal Edition, Rapidminer, among others. You may want to check them out as well. What do you think about this list? Are there any other free/open source business intelligence tools which should’ve made it into list?
Read more
  • 0
  • 0
  • 44955
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-how-to-handle-exceptions-and-synchronization-methods-with-selenium-webdriver-api
Amey Varangaonkar
02 Apr 2018
11 min read
Save for later

How to handle exceptions and synchronization methods with Selenium WebDriver API

Amey Varangaonkar
02 Apr 2018
11 min read
One of the areas often misunderstood, but is important in framework design is exception handling. Users must program into their tests and methods on how to handle exceptions that might occur in tests, including those that are thrown by applications themselves, and those that occur using the Selenium WebDriver API. In this article, we will see how to do that effectively. Let us look at different kinds of exceptions that users must account for: Implicit exceptions: Implicit exceptions are internal exceptions raised by the API method when a certain condition is not met, such as an illegal index of an array, null pointer, file not found, or something unexpected occurring at runtime. Explicit exceptions: Explicit exceptions are thrown by the user to transfer control out of the current method, and to another event handler when certain conditions are not met, such as an object is not found on the page, a test verification fails, or something expected as a known state is not met. In other words, the user is predicting that something will occur, and explicitly throws an exception if it does not. WebDriver exceptions: The Selenium WebDriver API has its own set of exceptions that can implicitly occur when elements are not found, elements are not visible, elements are not enabled or clickable, and so on. They are thrown by the WebDriver API method, but users can catch those exceptions and explicitly handle them in a predictable way. Try...catch blocks: In Java, exception handling can be completely controlled using a try...catch block of statements to transfer control to another method, so that the exit out of the current routine doesn't transfer control to the call handler up the chain, but rather, is handled in a predictable way before the exception is thrown. Let us examine the different ways of handling exceptions during automated testing. Implicit exception handling A simple example of Selenium WebDriver implicit exception handling can be described as follows: Define an element on a page Create a method to retrieve the text from the element on the page In the signature of the method, add throws Exception Do not handle a specific exception like ElementNotFoundException: // create a method to retrieve the text from an element on a page @FindBy(id="submit") protected M submit; public String getText(WebElement element) throws Exception { return element.getText(); } // use the method LoginPO.getText(submit); Now, when using an assertion method, TestNG will implicitly throw an exception if the condition is not met: Define an element on a page Create a method to verify the text of the element on a page Cast the expected and actual text to the TestNG's assertEquals method TestNG will throw an AssertionError TestNG engages the difference viewer to compare the result if it fails: // create a method to verify the text from an element on a page @FindBy(id="submit") protected M submit; public void verifyText(WebElement element, String expText) throws AssertionError { assertEquals(element.getText(), expText, "Verify Submit Button Text"); } // use the method LoginPO.verifyText(submit, "Sign Inx"); // throws AssertionError java.lang.AssertionError: Verify Text Label expected [ Sign Inx] but found [ Sign In] Expected : Sign Inx Actual : Sign In <Click to see difference> TestNG difference viewer When using the TestNG's assertEquals methods, a difference viewer will be engaged if the comparison fails. There will be a link in the stacktrace in the console to open it. Since it is an overloaded method, it can take a number of data types, such as String, Integer, Boolean, Arrays, Objects, and so on. The following screenshot displays the TestNG difference viewer: Explicit exception handling In cases where the user can predict when an error might occur in the application, they can check for that error and explicitly raise an exception if it is found. Take the login function of a browser or mobile application as an example. If the user credentials are incorrect, the app will throw an exception saying something like "username invalid, try again" or "password incorrect, please re-enter". The exception can be explicitly handled in a way that the actual error message can be thrown in the exception. Here is an example of the login method we wrote earlier with exception handling added to it: @FindBy(id="myApp_exception") protected M error; /** * login - method to login to app with error handling * * @param username * @param password * @throws Exception */ public void login(String username, String password) throws Exception { if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } } Try...catch exception handling Now, sometimes the user will want to trap an exception instead of throwing it, and perform some other action such as retry, reload page, cleanup dialogs, and so on. In cases like that, the user can use try...catch in Java to trap the exception. The action would be included in the try clause, and the user can decide what to do in the catch condition. Here is a simple example that uses the ExpectedConditions method to look for an element on a page, and only return true or false if it is found. No exception will be raised:  /** * elementExists - wrapper around the WebDriverWait method to * return true or false * * @param element * @param timer * @throws Exception */ public static boolean elementExists(WebElement element, int timer) { try { WebDriver driver = CreateDriver.getInstance().getCurrentDriver(); WebDriverWait exists = new WebDriverWait(driver, timer); exists.until(ExpectedConditions.refreshed( ExpectedConditions.visibilityOf(element))); return true; } catch (StaleElementReferenceException | TimeoutException | NoSuchElementException e) { return false; } } In cases where the element is not found on the page, the Selenium WebDriver will return a specific exception such as ElementNotFoundException. If the element is not visible on the page, it will return ElementNotVisibleException, and so on. Users can catch those specific exceptions in a try...catch...finally block, and do something specific for each type (reload page, re-cache element, and so on): try { .... } catch(ElementNotFoundException e) { // do something } catch(ElementNotVisibleException f) { // do something else } finally { // cleanup } Synchronizing methods Earlier, the login method was introduced, and in that method, we will now call one of the synchronization methods waitFor(title, timer) that we created in the utility classes. This method will wait for the login page to appear with the title element as defined. So, in essence, after the URL is loaded, the login method is called, and it synchronizes against a predefined page title. If the waitFor method doesn't find it, it will throw an exception, and the login will not be attempted. It's important to predict and synchronize the page object methods so that they do not get out of "sync" with the application and continue executing when a state has not been reached during the test. This becomes a tedious process during the development of the page object methods, but pays big dividends in the long run when making those methods "robust". Also, users do not have to synchronize before accessing each element. Usually, you would synchronize against the last control rendered on a page when navigating between them. In the same login method, it's not enough to just check and wait for the login page title to appear before logging in; users must also wait for the next page to render, that being the home page of the application. So, finally, in the login method we just built, another waitFor will be added: public void login(String username, String password) throws Exception { BrowserUtils.waitFor(getPageTitle(), getElementWait()); if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } // wait for the home page to appear BrowserUtils.waitFor(new MyAppHomePO<WebElement>().getPageTitle(), getElementWait()); } Table classes When building the page object classes, there will frequently be components on a page that are common to multiple pages, but not all pages, and rather than including the similar locators and methods in each class, users can build a common class for just that portion of the page. HTML tables are a typical example of a common component that can be classed. So, what users can do is create a generic class for the common table rows and columns, extend the subclasses that have a table with this new class, and pass in the dynamic ID or locator to the constructor when extending the subclass with that table class. Let's take a look at how this is done: Create a new page object class for the table component in the application, but do not derive it from the base class in the framework In the constructor of the new class, add a parameter of the type WebElement, requiring users to pass in the static element defined in each subclass for that specific table Create generic methods to get the row count, column count, row data, and cell data for the table In each subclass that inherits these methods, implement them for each page, varying the starting row number and/or column header rows if <th> is used rather than <tr> When the methods are called on each table, it will identify them using the WebElement passed into the constructor: /** * WebTable Page Object Class * * @author Name */ public class WebTablePO { private WebElement table; /** constructor * * @param table * @throws Exception */ public WebTablePO(WebElement table) throws Exception { setTable(table); } /** * setTable - method to set the table on the page * * @param table * @throws Exception */ public void setTable(WebElement table) throws Exception { this.table = table; } /** * getTable - method to get the table on the page * * @return WebElement * @throws Exception */ public WebElement getTable() throws Exception { return this.table; } .... Now, the structure of the class is simple so far, so let's add in some common "generic" methods that can be inherited and extended by each subclass that extends the class: // Note: JavaDoc will be eliminated in these examples for simplicity sake public int getRowCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); return tableRows.size(); } public int getColumnCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(1); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public int getColumnCount(int index) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(index); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public String getRowData(int rowIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); return currentRow.getText(); } public String getCellData(int rowIndex, int colIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); List<WebElement> tableCols = currentRow.findElements(By.tagName("td")); WebElement cell = tableCols.get(colIndex - 1); return cell.getText(); } Finally, let's extend a subclass with the new WebTablePO class, and implement some of the methods: /** * Homepage Page Object Class * * @author Name */ public class MyHomepagePO<M extends WebElement> extends WebTablePO<M> { public MyHomepagePO(M table) throws Exception { super(table); } @FindBy(id = "my_table") protected M myTable; // table methods public int getTableRowCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowCount(); } public int getTableColumnCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(); } public int getTableColumnCount(int index) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(index); } public String getTableCellData(int row, int column) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getCellData(row, column); } public String getTableRowData(int row) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowData(row).replace("\n", " "); } public void verifyTableRowData(String expRowText) { String actRowText = ""; int totalNumRows = getTableRowCount(); // parse each row until row data found for ( int i = 0; i < totalNumRows; i++ ) { if ( this.getTableRowData(i).contains(expRowText) ) { actRowText = this.getTableRowData(i); break; } } // verify the row data try { assertEquals(actRowText, expRowText, "Verify Row Data"); } catch (AssertionError e) { String error = "Row data '" + expRowText + "' Not found!"; throw new Exception(error); } } } We saw, how fairly effective it is to handle object class methods, especially when it comes to handling synchronization and exceptions. You read an excerpt from the book Selenium Framework Design in Data-Driven Testing by Carl Cocchiaro. The book will show you how to design your own automation testing framework without any hassle.
Read more
  • 0
  • 0
  • 22693

article-image-3-ways-to-deploy-a-qt-and-opencv-application
Gebin George
02 Apr 2018
16 min read
Save for later

3 ways to deploy a QT and OpenCV application

Gebin George
02 Apr 2018
16 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Computer Vision with OpenCV 3 and Qt5 written by Amin Ahmadi Tazehkandi.  This book covers how to build, test, and deploy Qt and OpenCV apps, either dynamically or statically.[/box] Today, we will learn three different methods to deploy a QT + OpenCV application. It is extremely important to provide the end users with an application package that contains everything it needs to be able to run on the target platform. And demand very little or no effort at all from the users in terms of taking care of the required dependencies. Achieving this kind of works-out-of-the-box condition for an application relies mostly on the type of the linking (dynamic or static) that is used to create an application, and also the specifications of the target operating system. Deploying using static linking Deploying an application statically means that your application will run on its own and it eliminates having to take care of almost all of the needed dependencies, since they are already inside the executable itself. It is enough to simply make sure you select the Release mode while building your application, as seen in the following screenshot: When your application is built in the Release mode, you can simply pick up the produced executable file and ship it to your users. If you try to deploy your application to Windows users, you might face an error similar to the following when your application is executed: The reason for this error is that on Windows, even when building your Qt application statically, you still need to make sure that Visual C++ Redistributables exist on the target system. This is required for C++ applications that are built by using Microsoft Visual C++, and the version of the required redistributables correspond to the Microsoft Visual Studio installed on your computer. In our case, the official title of the installer for these libraries is Visual C++ Redistributables for Visual Studio 2015, and it can be downloaded from the following link: https:/ / www. microsoft. com/en- us/ download/ details. aspx? id= 48145. It is a common practice to include the redistributables installer inside the installer for our application and perform a silent installation of them if they are not already installed. This process happens with most of the applications you use on your Windows PCs, most of the time, without you even noticing it. We already quite briefly talked about the advantages (fewer files to deploy) and disadvantages (bigger executable size) of static linking. But when it is meant in the context of deployment, there are some more complexities that need to be considered. So, here is another (more complete) list of disadvantages, when using static linking to deploy your applications: The building takes more time and the executable size gets bigger and bigger. You can't mix static and shared (dynamic) Qt libraries, which means you can't use the power of plugins and extending your application without building everything from scratch. Static linking, in a sense, means hiding the libraries used to build an application. Unfortunately, this option is not offered with all libraries, and failing to comply with it can lead to licensing issues with your application. This complexity arises partly because of the fact that Qt Framework uses some third-party libraries that do not offer the same set of licensing options as Qt itself. Talking about licensing issues is not a discussion suitable for this book, so we'll suffice with mentioning that you must be careful when you plan to create commercial applications using static linking of Qt libraries. For a detailed list of licenses used by third-party libraries within Qt, you can always refer to the Licenses Used in Qt web page from the following link: http://doc.qt.io/qt-5/ licenses-used-in-qt.html Static linking, even with all of its disadvantages that we just mentioned, is still an option, and a good one in some cases, provided that you can comply with the licensing options of the Qt Framework. For instance, in Linux operating systems where creating an installer for our application requires some extra work and care, static linking can help extremely reduce the effort needed to deploy applications (merely a copy and paste). So, the final decision of whether to use static linking or not is mostly on you and how you plan to deploy your application. Making this important decision will be much easier by the end of this chapter, when you have an overview of the possible linking and deployment methods. Deploying using dynamic linking When you deploy an application built with Qt and OpenCV using shared libraries (or dynamic linking), you need to make sure that the executable of your application is able to reach the runtime libraries of Qt and OpenCV, in order to load and use them. This reachability or visibility of runtime libraries can have different meanings depending on the operating system. For instance, on Windows, you need to copy the runtime libraries to the same folder where your application executable resides, or put them in a folder that is appended to the PATH environment value. Qt Framework offers command-line tools to simplify the deployment of Qt applications on Windows and macOS. As mentioned before, the first thing you need to do is to make sure your application is built in the Release mode, and not Debug mode. Then, if you are on Windows, first copy the executable (let us assume it is called app.exe) from the build folder into a separate folder (which we will refer to as deploy_path) and execute the following commands using a command-line instance: cd deploy_path QT_PATHbinwindeployqt app.exe The windeployqt tool is a deployment helper tool that simplifies the process of copying the required Qt runtime libraries into the same folder as the application executable. Itsimply takes an executable as a parameter and after determining the modules used to create it, copies all required runtime libraries and any additional required dependencies, such as Qt plugins, translations, and so on. This takes care of all the required Qt runtime libraries, but we still need to take care of OpenCV runtime libraries. If you followed all of the steps in Chapter 1, Introduction to OpenCV and Qt, for building OpenCV libraries dynamically, then you only need to manually copy the opencv_world330.dll and opencv_ffmpeg330.dll files from OpenCV installation folder (inside the x86vc14bin folder) into the same folder where your application executable resides. We didn't really go into the benefits of turning on the BUILD_opencv_world option when we built OpenCV in the early chapters of the book; however, it should be clear now that this simplifies the deployment and usage of the OpenCV libraries, by requiring only a single entry for LIBS in the *.pro file and manually copying only a single file (not counting the ffmpeg library) when deploying OpenCV applications. It should be also noted that this method has the disadvantage of copying all OpenCV codes (in a single library) along your application even when you do not need or use all of its modules in a project. Also note that on Windows, as mentioned in the Deploying using static linking section, you still need to similarly provide the end users of your application with Microsoft Visual C++ Redistributables. On a macOS operating system, it is also possible to easily deploy applications written using Qt Framework. For this reason, you can use the macdeployqt command-line tool provided by Qt. Similar to windeployqt, which accepts a Windows executable and fills the same folder with the required libraries, macdeployqt accepts a macOS application bundle and makes it deployable by copying all of the required Qt runtimes as private frameworks inside the bundle itself. Here is an example: cd deploy_path QT_PATH/bin/macdeployqt my_app_bundle Optionally, you can also provide an additional -dmg parameter, which leads to the creation of a macOS *.dmg (disk image) file. As for the deployment of OpenCV libraries when dynamic linking is used, you can create an installer using Qt Installer Framework (which we will learn about in the next section), a third-party provider, or a script that makes sure the required runtime libraries are copied to their required folders. This is because of the fact that simply copying your runtime libraries (whether it is OpenCV or anything else) to the same folder as the application executable does not help with making them visible to an application on macOS. The same also applies to the Linux operating system, where unfortunately even a tool for deploying Qt runtime libraries does not exist (at least for the moment), so we also need to take care of Qt libraries in addition to OpenCV libraries, either by using a trusted third-party provider (which you can search for online) or by using the cross-platform installer provided by Qt itself, combined with some scripting to make sure everything is in place when our application is executed. Deploy using Qt Installer Framework Qt Installer Framework allows you to create cross-platform installers of your Qt applications for Windows, macOS, and Linux operating systems. It allows for creating standard installer wizards where the user is taken through consecutive dialogs that provide all the necessary information, and finally display the progress for when the application is being installed and so on, similar to most of installations you have probably faced, and especially the installation of Qt Framework itself. Qt Installer Framework is based on Qt Framework itself but is provided as a different package and does not require Qt SDK (Qt Framework, Qt Creator, and so on) to be present on a computer. It is also possible to use Qt Installer Framework in order to create installer packages for any application, not just Qt applications. In this section, we are going to learn how to create a basic installer using Qt Installer Framework, which takes care of installing your application on a target computer and copying all the necessary dependencies. The result will be a single executable installer file that you can put on a web server to be downloaded or provide it in a USB stick or CD, or any other media type. This example project will help you get started with working your way around the many great capabilities of Qt Installer Framework by yourself. You can use the following link to download and install the Qt Installer Framework. Make sure to simply download the latest version when you use this link, or any other source for downloading it. At the moment, the latest version is 3.0.2: https://download.qt.io/official_releases/qt-installer-framework After you have downloaded and installed Qt Installer Framework, you can start creating the required files that Qt Installer Framework needs in order to create an installer. You can do this by simply browsing to the Qt Installer Framework, and from the examples folder copying the tutorial folder, which is also a template in case you want to quickly rename and re-edit all of the files and create your installer quickly. We will go the other way and create them manually; first because we want to understand the structure of the required files and folders for the Qt Installer Framework, and second, because it is still quite easy and simple. Here are the required steps for creating an installer: Assuming that you have already finished developing your Qt and OpenCV application, you can start by creating a new folder that will contain the installer files. Let's assume this folder is called deploy. Create an XML file inside the deploy folder and name it config.xml. This XML file must contain the following: <?xml version="1.0" encoding="UTF-8"?> <Installer> <Name>Your application</Name> <Version>1.0.0</Version> <Title>Your application Installer</Title> <Publisher>Your vendor</Publisher> <StartMenuDir>Super App</StartMenuDir> <TargetDir>@HomeDir@/InstallationDirectory</TargetDir> </Installer> Make sure to replace the required XML fields in the preceding code with information relevant to your application and then save and close this file: Now, create a folder named packages inside the deploy folder. This folder will contain the individual packages that you want the user to be able to install, or make them mandatory or optional so that the user can review and decide what will be installed. In the case of simpler Windows applications that are written using Qt and OpenCV, usually it is enough to have just a single package that includes the required files to run your application, and even do silent installation of Microsoft Visual C++ Redistributables. But for more complex cases, and especially when you want to have more control over individual installable elements of your application, you can also go for two or more packages, or even sub-packages. This is done by using domain-like folder names for each package. Each package folder can have a name like com.vendor.product, where vendor and product are replaced by the developer name or company and the application. A subpackage (or sub-component) of a package can be identified by adding. subproduct to the name of the parent package. For instance, you can have the following folders inside the packages folder: com.vendor.product com.vendor.product.subproduct1 com.vendor.product.subproduct2 com.vendor.product.subproduct1.subsubproduct1 … This can go on for as many products (packages) and sub-products (sub-packages) as we like. For our example case, let's create a single folder that contains our executable, since it describes it all and you can create additional packages by simply adding them to the packages folder. Let's name it something like com.amin.qtcvapp. Now, follow these required steps: Now, create two folders inside the new package folder that we created, the com.amin.qtcvapp folder. Rename them to data and meta. These two folders must exist inside all packages. Copy your application files inside the data folder. This folder will be extracted into the target folder exactly as it is (we will talk about setting the target folder of a package in the later steps). In case you are planning to create more than one package, then make sure to separate their data correctly and in a way that it makes sense. Of course, you won't be faced with any errors if you fail to do so, but the users of your application will probably be confused, for instance by skipping a package that should be installed at all times and ending up with an installed application that does not work. Now, switch to the meta folder and create the following two files inside that folder, and fill them with the codes provided for each one of them. The package.xml file should contain the following. There's no need to mention that you must fill the fields inside the XML with values relevant to your package: <?xml version="1.0" encoding="UTF-8"?> <Package> <DisplayName>The component</DisplayName> <Description>Install this component.</Description> <Version>1.0.0</Version> <ReleaseDate>1984-09-16</ReleaseDate> <Default>script</Default> <Script>installscript.qs</Script> </Package> The script in the previous XML file, which is probably the most important part of the creation of an installer, refers to a Qt Installer Script (*.qs file), which is named installerscript.qs and can be used to further customize the package, its target folder, and so on. So, let us create a file with the same name (installscript.qs) inside the meta folder, and use the following code inside it: function Component() { // initializations go here } Component.prototype.isDefault = function() { // select (true) or unselect (false) the component by default return true; } Component.prototype.createOperations = function() { try { // call the base create operations function component.createOperations(); } catch (e) { console.log(e); } } This is the most basic component script, which customizes our package (well, it only performs the default actions) and it can optionally be extended to change the target folder, create shortcuts in the Start menu or desktop (on Windows), and so on. It is a good idea to keep an eye on the Qt Installer Framework documentation and learn about its scripting to be able to create more powerful installers that can put all of the required dependencies of your app in place, and automatically. You can also browse through all of the examples inside the examples folder of the Qt Installer Framework and learn how to deal with different deployment cases. For instance, you can try to create individual packages for Qt and OpenCV dependencies and allow the users to deselect them, in case they already have the Qt runtime libraries on their computer. The last step is to use the binarycreator tool to create our single and standalone installer. Simply run the following command by using a Command Prompt (or Terminal) instance: binarycreator -p packages -c config.xml myinstaller The binarycreator is located inside the Qt Installer Framework bin folder. It requires two parameters that we have already prepared. -p must be followed by our packages folder and -c must be followed by the configuration file (or config.xml) file. After executing this command, you will get myinstaller (on Windows, you can append *.exe to it), which you can execute to install your application. This single file should contain all of the required files needed to run your application, and the rest is taken care of. You only need to provide a download link to this file, or provide it on a CD to your users. The following are the dialogs you will face in this default and most basic installer, which contains most of the usual dialogs you would expect when installing an application: If you go to the installation folder, you will notice that it contains a few more files than you put inside the data folder of your package. Those files are required by the installer to handle modifications and uninstall your application. For instance, the users of your application can easily uninstall your application by executing the maintenance tool executable, which would produce another simple and user-friendly dialog to handle the uninstall process: We saw how to deploy a QT + OpenCV applications using static linking, dynamic linking, and QT installer. If you found our post useful, do check out this book Computer Vision with OpenCV 3 and Qt5  to accentuate your OpenCV applications by developing them with Qt.  
Read more
  • 0
  • 0
  • 37237

article-image-3-best-practices-to-develop-effective-test-automation-with-selenium
Amey Varangaonkar
30 Mar 2018
5 min read
Save for later

3 best practices to develop effective test automation with Selenium

Amey Varangaonkar
30 Mar 2018
5 min read
In this article, we will look at some of the industry best practices and standards to use in order to develop and maintain effective test automation strategies with Selenium. 1. Naming Convention When developing the framework, it is important to establish some naming convention standards for each type of file created. In general, this is completely subjective. But it is important to establish them upfront so users can use the same file naming conventions for the same file types to avoid confusion later on, when there are many users building them. Here are a few suggestions: Utility classes: Utility classes don't use any prefix or suffix in their names, but do follow Java standards such as having the first letter of each word capitalized, and ending with .java extensions. (Acronyms used can be all caps). Examples include CreateDriver.java, Global_VARS.java, BrowserUtils.java, DataProvider_JSON.java, and so on. Page object classes: It is useful to be able to differentiate the page object classes from the utility classes. A good way to name them is FeaturePO.java, where PO stands for page object and is capitalized, along with the first letter of each word. End the name with a .java extension. Test classes: It is useful to be able to differentiate the test classes from the PO and utility classes. A good way to name them is FeatureTest.java, where Test stands for test class, and the first letter of each word is capitalized. End the name with a .java extension. Data files: Data files are obviously named with an extension for the type of file, such as .json, .csv, .xls, and so on. But, in the case of this framework, the files can be named the same as the corresponding test class, but without the word Test. For example, LoginCredsTest.java would have the data file LoginCreds.json. Setup classes: Usually, there is a common setup class for setup and teardown for all test classes, that can be named AUTSetup.java. So, as an example, GmailSetup.java would be the setup class for all test classes derived from it, and contains only TestNG annotated methods. Test methods: Most test methods in each test class are named using sequential numbering, followed by a feature and action. For example: tc001_gmailLoginCreds, tc002_gmailLoginPassword, and so on. Setup/teardown methods: The setup and teardown methods can be named according to the setup or teardown action they perform. The following naming conventions can be used in conjunction with the TestNG annotations: @BeforeSuite: The suiteSetup method @AfterSuite: The suiteTeardown method @BeforeClass: The classSetup method @AfterClass: The classTeardown method @BeforeMethod: The methodSetup method @AfterMethod: The methodTeardown method 2. Comments Although obvious and somewhat subjective, it is good practice to comment on code when it is not obvious why something is done, there is a complex routine, or there is a "kluge" added to work around a problem. In Java, there are two types of comments used, as well as a set of standards for JavaDoc. We will look at a couple of examples here: [box type="info" align="" class="" width=""]There is an Oracle article on using comments in Java located at http://www. oracle.com/ technetwork/java/codeconventions-141999.html#385[/box] Block comment: /* single line block comment */ code goes here… /* * multi-line block * comment */ code goes here... End-of-line comment: code goes here // end of line comment JavaDoc comments: /** * Description of the method * * @param arg1 to the method * @param arg2 to the method * return value returned from the method */ [box type="info" align="" class="" width=""]The Oracle documentation on using the JavaDoc tool is located at http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html. [/box] 3. Folder names and structures As the framework starts to evolve, there needs to be some organization around the folder structure in the IDE, along with a naming convention. The IntelliJ IDE uses modules to organize the repo, and under those modules, users can create the folder structures. It is common to also separate the page object and utility classes from the test classes. So, as an example, under the top-level folder src, create main/java/com/yourCo/page objects and test/java/com/yourCo/tests folders. From there, under each structure, users can create feature-based folders. Also, to retain a completely independent set of libraries for the Selenium driver and utility classes, create a separate module called something like Selenium3 with the same folder structures. This will allow users to use the same driver class and utilities for any additional modules that are added to the repo/framework. It is common to automate testing for more than one application, and this will allow the inclusion of the module in those additional modules. Here are a few suggestions regarding folder naming conventions: Name all the folders using lowercase names, so there won't be a mix-and-match of different standards. Name the page object class folders after the features they pertain to; for instance, login for the LoginPO.java, email for the GmailPO.java, and so on. Name the test class folders after the same features as the PO classes, but under the test folder. Then there can be a one-to-one correlation between the PO and test class folders. Store the common base classes under a common folder under main. Store the common setup classes under a common folder under test. Store all the utility classes for the AUT under a utils folder under main. Store all the suite files for the tests under a suites folder under test. Here is an example of a folder structure for the Selenium3 module. Of course, there are no test folders under this one: Here is an example of a folder structure for an AUT module showing the PO and test class Folders: You read an excerpt from the book Selenium Framework Design in Data-Driven Testing  written by Carl Cocchiaro. This book presents effective techniques for building data-driven test frameworks using Selenium WebDriver.
Read more
  • 0
  • 0
  • 41407

article-image-analyzing-moby-dick-frequency-distribution-nltk
Richard Gall
30 Mar 2018
2 min read
Save for later

Analyzing Moby Dick through frequency distribution with NLTK

Richard Gall
30 Mar 2018
2 min read
What is frequency distribution and why does it matter? In the context of natural language processing, frequency distribution is simply a tally of the number of times each unique word is used in a text. Recording the individual word counts of a text can better help us understand not only what topics are being discussed and what information is important but also how that information is being discussed as well. It's a useful method for better understanding language and different types of texts. This video tutorial has been taken from from Natural Language Processing with Python. Word frequency distribution is central to performing content analysis with NLP. Its applications are wide ranging. From understanding and characterizing an author’s writing style to analyzing the vocabulary of rappers, the technique is playing a large part in wider cultural conversations. It’s also used in psychological research in a number of ways to analyze how patients use language to form frameworks for thinking about themselves and the world. Trivial or serious, word frequency distribution is becoming more and more important in the world of research. Of course, manually creating such a word frequency distribution models would be time consuming and inconvenient for data scientists. Fortunately for us, NLTK, Python’s toolkit for natural language processing, makes life much easier. How to use NLTK for frequency distribution Take a look at how to use NLTK to create a frequency distribution for Herman Melville’s Moby Dick in the video tutorial above. In it, you'll find a step by step guide to performing an important data analysis task. Once you've done that, you can try it for yourself, or have a go at performing a similar analysis on another data set. Read Analyzing Textual information using the NLTK library. Learn more about natural language processing - read How to create a conversational assistant or chatbot using Python.
Read more
  • 0
  • 0
  • 9766
article-image-django-and-django-rest-frameworks-build-restful-app
Sugandha Lahoti
29 Mar 2018
12 min read
Save for later

Getting started with Django and Django REST frameworks to build a RESTful app

Sugandha Lahoti
29 Mar 2018
12 min read
In this article, we will learn how to install Django and Django REST framework in an isolated environment. We will also look at the Django folders, files, and configurations, and how to create an app with Django. We will also introduce various command-line and GUI tools that are use to interact with the RESTful Web Services. Installing Django and Django REST frameworks in an isolated environment First, run the following command to install the Django web framework: pip install django==1.11.5 The last lines of the output will indicate that the django package has been successfully installed. The process will also install the pytz package that provides world time zone definitions. Take into account that you may also see a notice to upgrade pip. The next lines show a sample of the four last lines of the output generated by a successful pip installation: Collecting django Collecting pytz (from django) Installing collected packages: pytz, django Successfully installed django-1.11.5 pytz-2017.2 Now that we have installed the Django web framework, we can install Django REST framework. Django REST framework works on top of Django and provides us with a powerful and flexible toolkit to build RESTful Web Services. We just need to run the following command to install this package: pip install djangorestframework==3.6.4 The last lines for the output will indicate that the djangorestframework package has been successfully installed, as shown here: Collecting djangorestframework Installing collected packages: djangorestframework Successfully installed djangorestframework-3.6.4 After following the previous steps, we will have Django REST framework 3.6.4 and Django 1.11.5 installed in our virtual environment. Creating an app with Django Now, we will create our first app with Django and we will analyze the directory structure that Django creates. First, go to the root folder for the virtual environment: 01. In Linux or macOS, enter the following command: cd ~/HillarDjangoREST/01 If you prefer Command Prompt, run the following command in the Windows command line: cd /d %USERPROFILE%\HillarDjangoREST\01 If you prefer Windows PowerShell, run the following command in Windows PowerShell: cd /d $env:USERPROFILE\HillarDjangoREST\01 In Linux or macOS, run the following command to create a new Django project named restful01. The command won't produce any output: python bin/django-admin.py startproject restful01 In Windows, in either Command Prompt or PowerShell, run the following command to create a new Django project named restful01. The command won't produce any output: python Scripts\django-admin.py startproject restful01 The previous command creates a restful01 folder with other subfolders and Python files. Now, go to the recently created restful01 folder. Just execute the following command on any platform: cd restful01 Then, run the following command to create a new Django app named toys within the restful01 Django project. The command won't produce any output: python manage.py startapp toys The previous command creates a new restful01/toys subfolder, with the following files: views.py tests.py models.py apps.py admin.py   init  .py In addition, the restful01/toys folder will have a migrations subfolder with an init  .py Python script. The following diagram shows the folders and files in the directory tree, starting at the restful01 folder with two subfolders - toys and restful01: Understanding Django folders, files, and configurations After we create our first Django project and then a Django app, there are many new folders and files. First, use your favorite editor or IDE to check the Python code in the apps.py file within the restful01/toys folder (restful01\toys in Windows). The following lines show the code for this file: from django.apps import AppConfig class ToysConfig(AppConfig): name = 'toys' The code declares the ToysConfig class as a subclass of the django.apps.AppConfig class that represents a Django application and its configuration. The ToysConfig class just defines the name class attribute and sets its value to 'toys'. Now, we have to add toys.apps.ToysConfig as one of the installed apps in the restful01/settings.py file that configures settings for the restful01 Django project. I built the previous string by concatenating many values as follows: app name + .apps. + class name, which is, toys + .apps. + ToysConfig. In addition, we have to add the rest_framework app to make it possible for us to use Django REST framework. The restful01/settings.py file is a Python module with module-level variables that define the configuration of Django for the restful01 project. We will make some changes to this Django settings file. Open the restful01/settings.py file and locate the highlighted lines that specify the strings list that declares the installed apps. The following code shows the first lines for the settings.py file. Note that the file has more code: """ Django settings for restful01 project. Generated by 'django-admin startproject' using Django 1.11.5. For more information on this file, see https://docs.djangoproject.com/en/1.11/topics/settings/ For the full list of settings and their values, see https://docs.djangoproject.com/en/1.11/ref/settings/ """ import os # Build paths inside the project like this: os.path.join(BASE_DIR, ...) BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath( file ))) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = '+uyg(tmn%eo+fpg+fcwmm&x(2x0gml8)=cs@$nijab%)y$a*xe' # SECURITY WARNING: don't run with debug turned on in production! DEBUG = True ALLOWED_HOSTS = [] # Application definition INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', ]         Add the following two strings to the INSTALLED_APPS strings list and save the changes to the restful01/settings.py file: 'rest_framework' 'toys.apps.ToysConfig' The following lines show the new code that declares the INSTALLED_APPS string list with the added lines highlighted and with comments to understand what each added string means. The code file for the sample is included in the hillar_django_restful_01 folder: INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', # Django REST framework 'rest_framework', # Toys application 'toys.apps.ToysConfig', ] This way, we have added Django REST framework and the toys application to our initial Django project named restful01. Installing tools Now, we will leave Django for a while and we will install many tools that we will use to interact with the RESTful Web Services that we will develop throughout this book. We will use the following different kinds of tools to compose and send HTTP requests and visualize the responses throughout our book: Command-line tools GUI tools Python code Web browser JavaScript code You can use any other application that allows you to compose and send HTTP requests. There are many apps that run on tablets and smartphones that allow you to accomplish this task. However, we will focus our attention on the most useful tools when building RESTful Web Services with Django. Installing Curl We will start installing command-line tools. One of the key advantages of command-line tools is that you can easily run again the HTTP requests again after we have built them for the first time, and we don't need to use the mouse or tap the screen to run requests. We can also easily build a script with batch requests and run them. As happens with any command-line tool, it can take more time to perform the first requests compared with GUI tools, but it becomes easier once we have performed many requests and we can easily reuse the commands we have written in the past to compose new requests. Curl, also known as cURL, is a very popular open source command-line tool and library that allows us to easily transfer data. We can use the curl command-line tool to easily compose and send HTTP requests and check their responses. In Linux or macOS, you can open a Terminal and start using curl from the command line. In Windows, you have two options. You can work with curl in Command Prompt or you can decide to install curl as part of the Cygwin package installation option and execute it from the Cygwin terminal. You can read more about the Cygwin terminal and its installation procedure at: http://cygwin.com/install.html. Windows Powershell includes a curl alias that calls the Invoke-WebRequest command, and therefore, if you want to work with Windows Powershell with curl, it is necessary to remove the curl alias. If you want to use the curl command within Command Prompt, you just need to download and unzip the latest version of the curl download page: https://curl.haxx.se/download.html. Make sure you download the version that includes SSL and SSH. The following screenshot shows the available downloads for Windows. The Win64 - Generic section includes the versions that we can run in Command Prompt or Windows Powershell. After you unzip the .7zip or .zip file you have downloaded, you can include the folder in which curl.exe is included in your path. For example, if you unzip the Win64 x86_64.7zip file, you will find curl.exe in the bin folder. The following screenshot shows the results of executing curl --version on Command Prompt in Windows 10. The --version option makes curl display its version and all the libraries, protocols, and features it supports: Installing HTTPie Now, we will install HTTPie, a command-line HTTP client written in Python that makes it easy to send HTTP requests and uses a syntax that is easier than curl. By default, HTTPie displays colorized output and uses multiple lines to display the response details. In some cases, HTTPie makes it easier to understand the responses than the curl utility. However, one of the great disadvantages of HTTPie as a command-line utility is that it takes more time to load than curl, and therefore, if you want to code scripts with too many commands, you have to evaluate whether it makes sense to use HTTPie. We just need to make sure we run the following command in the virtual environment we have just created and activated. This way, we will install HTTPie only for our virtual environment. Run the following command in the terminal, Command Prompt, or Windows PowerShell to install the httpie package: pip install --upgrade httpie The last lines of the output will indicate that the httpie package has been successfully installed: Collecting httpie Collecting colorama>=0.2.4 (from httpie) Collecting requests>=2.11.0 (from httpie) Collecting Pygments>=2.1.3 (from httpie) Collecting idna<2.7,>=2.5 (from requests>=2.11.0->httpie) Collecting urllib3<1.23,>=1.21.1 (from requests>=2.11.0->httpie) Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.11.0->httpie) Collecting certifi>=2017.4.17 (from requests>=2.11.0->httpie) Installing collected packages: colorama, idna, urllib3, chardet, certifi, requests, Pygments, httpie Successfully installed Pygments-2.2.0 certifi-2017.7.27.1 chardet-3.0.4 colorama-0.3.9 httpie-0.9.9 idna-2.6 requests-2.18.4 urllib3-1.22 Now, we will be able to use the http command to easily compose and send HTTP requests to our future RESTful Web Services build with Django. The following screenshot shows the results of executing http on Command Prompt in Windows 10. HTTPie displays the valid options and indicates that a URL is required: Installing the Postman REST client So far, we have installed two terminal-based or command-line tools to compose and send HTTP requests to our Django development server: cURL and HTTPie. Now, we will start installing Graphical User Interface (GUI) tools. Postman is a very popular API testing suite GUI tool that allows us to easily compose and send HTTP requests, among other features. Postman is available as a standalone app in Linux, macOS, and Windows. You can download the versions of the Postman app from the following URL: https://www.getpostman.com. The following screenshot shows the HTTP GET request builder in Postman: Installing Stoplight Stoplight is a very useful GUI tool that focuses on helping architects and developers to model complex APIs. If we need to consume our RESTful Web Service in many different programming languages, we will find Stoplight extremely helpful. Stoplight provides an HTTP request maker that allows us to compose and send requests and generate the necessary code to make them in different programming languages, such as JavaScript, Swift, C#, PHP, Node, and Go, among others. Stoplight provides a web version and is also available as a standalone app in Linux, macOS, and Windows. You can download the versions of Stoplight from the following URL: http://stoplight.io/. The following screenshot shows the HTTP GET request builder in Stoplight with the code generation at the bottom: Installing iCurlHTTP We can also use apps that can compose and send HTTP requests from mobile devices to work with our RESTful Web Services. For example, we can work with the iCurlHTTP app on iOS devices such as iPad and iPhone: https://itunes.apple.com/us/app/icurlhttp/id611943891. On Android devices, we can work with the HTTP Request app: https://play.google.com/store/apps/details?id=air.http.request&hl=en. The following screenshot shows the UI for the iCurlHTTP app running on an iPad Pro: At the time of writing, the mobile apps that allow you to compose and send HTTP requests do not provide all the features you can find in Postman or command-line utilities. We learnt to set up a virtual environment with Django and Django REST framework and created an app with Django. We looked at Django folders, files, and configurations and installed command-line and GUI tools to interact with the RESTful Web Services. This article is an excerpt from the book, Django RESTful Web Services, written by Gaston C. Hillar. This book serves as an easy guide to build Python RESTful APIs and web services with Django. The code bundle for the article is hosted on GitHub.
Read more
  • 0
  • 0
  • 17835

article-image-how-to-build-and-deploy-microservices-using-payara-micro
Gebin George
28 Mar 2018
9 min read
Save for later

How to build and deploy Microservices using Payara Micro

Gebin George
28 Mar 2018
9 min read
Payara Micro offers a new way to run Java EE or microservice applications. It is based on the Web profile of Glassfish and bundles few additional APIs. The distribution is designed keeping modern containerized environment in mind. Payara Micro is available to download as a standalone executable JAR, as well as a Docker image. It's an open source MicroProfile compatible runtime. Today, we will learn to use payara micro to build and deploy microservices. Here’s a list of APIs that are supported in Payara Micro: Servlets, JSTL, EL, and JSPs WebSockets JSF JAX-RS Chapter 4 [ 91 ] EJB lite JTA JPA Bean Validation CDI Interceptors JBatch Concurrency JCache We will be exploring how to build our services using Payara Micro in the next section. Building services with Payara Micro Let's start building parts of our Issue Management System (IMS), which is going to be a one-stop-destination for collaboration among teams. As the name implies, this system will be used for managing issues that are raised as tickets and get assigned to users for resolution. To begin the project, we will identify our microservice candidates based on the business model of IMS. Here, let's define three functional services, which will be hosted in their own independent Git repositories: ims-micro-users ims-micro-tasks ims-micro-notify You might wonder, why these three and why separate repositories? We could create much more fine-grained services and perhaps it wouldn't be wrong to do so. The answer lies in understanding the following points: Isolating what varies: We need to be able to independently develop and deploy each unit. Changes to one business capability or domain shouldn't require changes in other services more often than desired. Organisation or Team structure: If you define teams by business capability, then they can work independent of others and release features with greater agility. The tasks team should be able to evolve independent of the teams that are handling users or notifications. The functional boundaries should allow independent version and release cycle management. Transactional boundaries for consistency: Distributed transactions are not easy, thus creating services for related features that are too fine grained, and lead to more complexity than desired. You would need to become familiar with concepts like eventual consistency, but these are not easy to achieve in practice. Source repository per service: Setting up a single repository that hosts all the services is ideal when it's the same team that works on these services and the project is relatively small. But we are building our fictional IMS, which is a large complex system with many moving parts. Separate teams would get tightly coupled by sharing a repository. Moreover, versioning and tagging of releases will be yet another problem to solve. The projects are created as standard Java EE projects, which are Skinny WARs, that will be deployed using the Payara Micro server. Payara Micro allows us to delay the decision of using a Fat JAR or Skinny WAR. This gives us flexibility in picking the deployment choice at a later stage. As Maven is a widely adopted build tool among developers, we will use the same to create our example projects, using the following steps: mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-users - DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-tasks - DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-notify - DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false Once the structure is generated, update the properties and dependencies section of pom.xml with the following contents, for all three projects: <properties> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <maven.compiler.source>1.8</maven.compiler.source> <maven.compiler.target>1.8</maven.compiler.target> <failOnMissingWebXml>false</failOnMissingWebXml> </properties> <dependencies> <dependency> <groupId>javax</groupId> <artifactId>javaee-api</artifactId> <version>8.0</version> <scope>provided</scope> </dependency> Chapter 4 [ 93 ] <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.12</version> <scope>test</scope> </dependency> </dependencies> Next, create a beans.xml file under WEB-INF folder for all three projects: <?xml version="1.0" encoding="UTF-8"?> <beans xmlns="http://xmlns.jcp.org/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/beans_2_0.xsd" bean-discovery-mode="all"> </beans> You can delete the index.jsp and web.xml files, as we won't be needing them. The following is the project structure of ims-micro-users. The same structure will be used for ims-micro-tasks and ims-micro-notify: The package name for users, tasks, and notify service will be as shown as the following: org.jee8ng.ims.users (inside ims-micro-users) org.jee8ng.ims.tasks (inside ims-micro-tasks) org.jee8ng.ims.notify (inside ims-micro-notify) Each of the above will in turn have sub-packages called boundary, control, and entity. The structure follows the Boundary-Control-Entity (BCE)/Entity-Control-Boundary (ECB) pattern. The JaxrsActivator shown as follows is required to enable the JAX-RS API and thus needs to be placed in each of the projects: import javax.ws.rs.ApplicationPath; import javax.ws.rs.core.Application; @ApplicationPath("resources") public class JaxrsActivator extends Application {} All three projects will have REST endpoints that we can invoke over HTTP. When doing RESTful API design, a popular convention is to use plural names for resources, especially if the resource could represent a collection. For example: /users /tasks The resource class names in the projects use the plural form, as it's consistent with the resource URL naming used. This avoids confusions such as a resource URL being called a users resource, while the class is named UserResource. Given that this is an opinionated approach, feel free to use singular class names if desired. Here's the relevant code for ims-micro-users, ims-micro-tasks, and ims-micronotify projects respectively. Under ims-micro-users, define the UsersResource endpoint: package org.jee8ng.ims.users.boundary; import javax.ws.rs.*; import javax.ws.rs.core.*; @Path("users") public class UsersResource { @GET Chapter 4 [ 95 ] @Produces(MediaType.APPLICATION_JSON) public Response get() { return Response.ok("user works").build(); } } Under ims-micro-tasks, define the TasksResource endpoint: package org.jee8ng.ims.tasks.boundary; import javax.ws.rs.*; import javax.ws.rs.core.*; @Path("tasks") public class TasksResource { @GET @Produces(MediaType.APPLICATION_JSON) public Response get() { return Response.ok("task works").build(); } } Under ims-micro-notify, define the NotificationsResource endpoint: package org.jee8ng.ims.notify.boundary; import javax.ws.rs.*; import javax.ws.rs.core.*; @Path("notifications") public class NotificationsResource { @GET @Produces(MediaType.APPLICATION_JSON) public Response get() { return Response.ok("notification works").build(); } } Once you build all three projects using mvn clean install, you will get your Skinny WAR files generated in the target directory, which can be deployed on the Payara Micro server. Running services with Payara Micro Download the Payara Micro server if you haven't already, from this link: https://www.payara.fish/downloads. The micro server will have the name payara-micro-xxx.jar, where xxx will be the version number, which might be different when you download the file. Here's how you can start Payara Micro with our services deployed locally. When doing so, we need to ensure that the instances start on different ports, to avoid any port conflicts: >java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-microusers. war --port 8081 >java -jar payara-micro-xxx.jar --deploy ims-micro-tasks/target/ims-microtasks. war --port 8082 >java -jar payara-micro-xxx.jar --deploy ims-micro-notify/target/ims-micronotify. war --port 8083 This will start three instances of Payara Micro running on the specified ports. This makes our applications available under these URLs: http://localhost:8081/ims-micro-users/resources/users/ http://localhost:8082/ims-micro-tasks/resources/tasks/ http://localhost:8083/ims-micro-notify/resources/notifications/ Payar Micro can be started on a non-default port by using the --port parameter, as we did earlier. This is useful when running multiple instances on the same machine. Another option is to use the --autoBindHttp parameter, which will attempt to connect on 8080 as the default port, and if that port is unavailable, it will try to bind on the next port up, repeating until it finds an available port. Examples of starting Payara Micro: Uber JAR option: Now, there's one more feature that Payara Micro provides. We can generate an Uber JAR as well, which would be the Fat JAR approach that we learnt in the Fat JAR section. To package our ims-micro-users project as an Uber JAR, we can run the following command: java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-microusers. war --outputUberJar users.jar This will generate the users.jar file in the directory where you run this command. The size of this JAR will naturally be larger than our WAR file, since it will also bundle the Payara Micro runtime in it. Here's how you can start the application using the generated JAR: java -jar users.jar The server parameters that we used earlier can be passed to this runnable JAR file too. Apart from the two choices we saw for running our microservice projects, there's a third option as well. Payara Micro provides an API based approach, which can be used to programmatically start the embedded server. We will expand upon these three services as we progress further into the realm of cloud based Java EE. We saw how to leverage the power of Payara Micro to run Java EE or microservice applications. You read an excerpt from the book, Java EE 8 and Angular written by Prashant Padmanabhan. This book helps you build high-performing enterprise applications using Java EE powered by Angular at the frontend.  
Read more
  • 0
  • 0
  • 33742

article-image-how-to-build-microservices-using-rest-framework
Gebin George
28 Mar 2018
7 min read
Save for later

How to build Microservices using REST framework

Gebin George
28 Mar 2018
7 min read
Today, we will learn to build microservices using REST framework. Our microservices are Java EE 8 web projects, built using maven and published as separate Payara Micro instances, running within docker containers. The separation allows them to scale individually, as well as have independent operational activities. Given the BCE pattern used, we have the business component split into boundary, control, and entity, where the boundary comprises of the web resource (REST endpoint) and business service (EJB). The web resource will publish the CRUD operations and the EJB will in turn provide the transactional support for each of it along with making external calls to other resources. Here's a logical view for the boundary consisting of the web resource and business service: The microservices will have the following REST endpoints published for the projects shown, along with the boundary classes XXXResource and XXXService: Power Your APIs with JAXRS and CDI, for Server-Sent Events. In IMS, we publish task/issue updates to the browser using an SSE endpoint. The code observes for the events using the CDI event notification model and triggers the broadcast. The ims-users and ims-issues endpoints are similar in API format and behavior. While one deals with creating, reading, updating, and deleting a User, the other does the same for an Issue. Let's look at this in action. After you have the containers running, we can start firing requests to the /users web resource. The following curl command maps the URI /users to the @GET resource method named getAll() and returns a collection (JSON array) of users. The Java code will simply return a Set<User>, which gets converted to JsonArray due to the JSON binding support of JSON-B. The method invoked is as follows: @GET public Response getAll() {... } curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users ... HTTP/1.1 200 OK ... [{ "id":1,"name":"Marcus","email":"marcus_jee8@testem.com" "credential":{"password":"1234","username":"marcus"} }, { "id":2,"name":"Bob","email":"bob@testem.com" "credential":{"password":"1234","username":"bob"} }] Next, for selecting one of the users, such as Marcus, we will issue the following curl command, which uses the /users/xxx path. This will map the URI to the @GET method which has the additional @Path("{id}") annotation as well. The value of the id is captured using the @PathParam("id") annotation placed before the field. The response is a User entity wrapped in the Response object returned. The method invoked is as follows: @GET @Path("{id}") public Response get(@PathParam("id") Long id) { ... } curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users/1 ... HTTP/1.1 200 OK ... { "id":1,"name":"Marcus","email":"marcus_jee8@testem.com" "credential":{"password":"1234","username":"marcus"} } In both the preceding methods, we saw the response returned as 200 OK. This is achieved by using a Response builder. Here's the snippet for the method: return Response.ok( ..entity here..).build(); Next, for submitting data to the resource method, we use the @POST annotation. You might have noticed earlier that the signature of the method also made use of a UriInfo object. This is injected at runtime for us via the @Context annotation. A curl command can be used to submit the JSON data of a user entity. The method invoked is as follows: @POST public Response add(User newUser, @Context UriInfo uriInfo) We make use of the -d flag to send the JSON body in the request. The POST request is implied: curl -v -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users -d '{"name": "james", "email":"james@testem.io", "credential": {"username":"james","password":"test123"}}' ... HTTP/1.1 201 Created ... Location: http://localhost:8081/ims-users/resources/users/3 The 201 status code is sent by the API to signal that an entity has been created, and it also returns the location for the newly created entity. Here's the relevant snippet to do this: //uriInfo is injected via @Context parameter to this method URI location = uriInfo.getAbsolutePathBuilder() .path(newUserId) // This is the new entity ID .build(); // To send 201 status with new Location return Response.created(location).build(); Similarly, we can also send an update request using the PUT method. The method invoked is as follows: @PUT @Path("{id}") public Response update(@PathParam("id") Long id, User existingUser) curl -v -X PUT -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users/3 -d '{"name": "jameson", "email":"james@testem.io"}' ... HTTP/1.1 200 Ok The last method we need to map is the DELETE method, which is similar to the GET operation, with the only difference being the HTTP method used. The method invoked is as follows: @DELETE @Path("{id}") public Response delete(@PathParam("id") Long id) curl -v -X DELETE http://localhost:8081/ims-users/resources/users/3 ... HTTP/1.1 200 Ok You can try out the Issues endpoint in a similar manner. For the GET requests of /users or /issues, the code simply fetches and returns a set of entity objects. But when requesting an item within this collection, the resource method has to look up the entity by the passed in id value, captured by @PathParam("id"), and if found, return the entity, or else a 404 Not Found is returned. Here's a snippet showing just that: final Optional<Issue> issueFound = service.get(id); //id obtained if (issueFound.isPresent()) { return Response.ok(issueFound.get()).build(); } return Response.status(Response.Status.NOT_FOUND).build(); The issue instance can be fetched from a database of issues, which the service object interacts with. The persistence layer can return a JPA entity object which gets converted to JSON for the calling code. We will look at persistence using JPA in a later section. For the update request which is sent as an HTTP PUT, the code captures the identifier ID using @PathParam("id"), similar to the previous GET operation, and then uses that to update the entity. The entity itself is submitted as a JSON input and gets converted to the entity instance along with the passed in message body of the payload. Here's the code snippet for that: @PUT @Path("{id}") public Response update(@PathParam("id") Long id, Issue updated) { updated.setId(id); boolean done = service.update(updated); return done ? Response.ok(updated).build() : Response.status(Response.Status.NOT_FOUND).build(); } The code is simple to read and does one thing—it updates the identified entity and returns the response containing the updated entity or a 404 for a non-existing entity. The service references that we have looked at so far are @Stateless beans which are injected into the resource class as fields: // Project: ims-comments @Stateless public class CommentsService {... } // Project: ims-issues @Stateless public class IssuesService {... } // Project: ims-users @Stateless public class UsersService {... } These will in turn have the EntityManager injected via @PersistenceContext. Combined with the resource and service, our components have made the boundary ready for clients to use. Similar to the WebSockets section in Chapter 6, Power Your APIs with JAXRS and CDI, in IMS, we use a @ServerEndpoint which maintains the list of active sessions and then uses that to broadcast a message to all users who are connected. A ChatThread keeps track of the messages being exchanged through the @ServerEndpoint class. For the message to besent, we use the stream of sessions and filter it by open sessions, then send the message for each of the sessions: chatSessions.getSessions().stream().filter(Session::isOpen) .forEach(s -> { try { s.getBasicRemote().sendObject(chatMessage); }catch(Exception e) {...} }); To summarize, we practically saw how to leverage REST framework to build microservices. This article is an excerpt from the book, Java EE 8 and Angular written by Prashant Padmanabhan. The book covers building modern user friendly web apps with Java EE  
Read more
  • 0
  • 0
  • 51959
article-image-getting-started-with-django-restful-web-services
Sugandha Lahoti
27 Mar 2018
19 min read
Save for later

Getting started with Django RESTful Web Services

Sugandha Lahoti
27 Mar 2018
19 min read
In this article, we will kick start with learning RESTful Web Service and subsequently learn how to create models and perform migration, serialization and deserialization in Django. Here’s a quick glance of what to expect in this article: Defining the requirements for RESTful Web Service Analyzing and understanding Django tables and the database Controlling, serialization and deserialization in Django Defining the requirements for RESTful Web Service Imagine a team of developers working on a mobile app for iOS and Android and requires a RESTful Web Service to perform CRUD operations with toys. We definitely don't want to use a mock web service and we don't want to spend time choosing and configuring an ORM (short for Object-Relational Mapping). We want to quickly build a RESTful Web Service and have it ready as soon as possible to start interacting with it in the mobile app. We really want the toys to persist in a database but we don't need it to be production-ready. Therefore, we can use the simplest possible relational database, as long as we don't have to spend time performing complex installations or configurations. Django REST framework, also known as DRF, will allow us to easily accomplish this task and start making HTTP requests to the first version of our RESTful Web Service. In this case, we will work with a very simple SQLite database, the default database for a new Django REST framework project. First, we must specify the requirements for our main resource: a toy. We need the following attributes or fields for a toy entity: An integer identifier A name An optional description A toy category description, such as action figures, dolls, or playsets A release date A bool value indicating whether the toy has been on the online store's homepage at least once In addition, we want to have a timestamp with the date and time of the toy's addition to the database table, which will be generated to persist toys. In a RESTful Web Service, each resource has its own unique URL. In our web service, each toy will have its own unique URL. The following table shows the HTTP verbs, the scope, and the semantics of the methods that our first version of the web service must support. Each method is composed of an HTTP verb and a scope. All the methods have a well-defined meaning for toys and collections: HTTP verb Scope Semantics GET Toy Retrieve a single toy GET Collection of toys Retrieve all the stored toys in the collection, sorted by their name in ascending order POST Collection of toys Create a new toy in the collection PUT Toy Update an existing toy DELETE Toy Delete an existing toy In the previous table, the GET HTTP verb appears twice but with two different scopes: toys and collection of toys. The first row shows a GET HTTP verb applied to a toy, that is, to a single resource. The second row shows a GET HTTP verb applied to a collection of toys, that is, to a collection of resources. We want our web service to be able to differentiate collections from a single resource of the collection in the URLs. When we refer to a collection, we will use a slash (/) as the last character for the URL, as in http://localhost:8000/toys. When we refer to a single resource of the collection we won't use a slash (/) as the last character for the URL, as in http://localhost:8000/toys/5. Let's consider that http://localhost:8000/toys/ is the URL for the collection of toys. If we add a number to the previous URL, we identify a specific toy with an ID or primary key equal to the specified numeric value. For example, http://localhost:8000/toys/42 identifies the toy with an ID equal to 42. We have to compose and send an HTTP request with the POST HTTP verb and http://localhost:8000/toys/ request URL to create a new toy and add it to the toys collection. In this example, our RESTful Web Service will work with JSON (short for JavaScript Object Notation), and therefore we have to provide the JSON key-value pairs with the field names and the values to create the new toy. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and persist it in the database. The server will insert a new row with the new toy in the appropriate table and it will return a 201 Created status code and a JSON body with the recently added toy serialized to JSON, including the assigned ID that was automatically generated by the database and assigned to the toy object: POST http://localhost:8000/toys/ We have to compose and send an HTTP request with the GET HTTP verb and http://localhost:8000/toys/{id} request URL to retrieve the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/25, the server will retrieve the toy whose ID matches 25. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will serialize the toy object into JSON, return a 200 OK status code, and return a JSON body with the serialized toy object. If no toy matches the specified ID, the server will return only a 404 Not Found status: GET http://localhost:8000/toys/{id} We have to compose and send an HTTP request with the PUT HTTP verb and request URL http://localhost:8000/toys/{id} to retrieve the toy whose ID matches the value in {id} and replace it with a toy created with the provided data. In addition, we have to provide the JSON key-value pairs with the field names and the values to create the new toy that will replace the existing one. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and replace the one that matches the specified ID with the new one in the database. The ID for the toy will be the same after the update operation. The server will update the existing row in the appropriate table and it will return a 200 OK status code and a JSON body with the recently updated toy serialized to JSON. If we don't provide all the necessary data for the new toy, the server will return a 400 Bad Request status code. If the server doesn't find a toy with the specified ID, the server will only return a 404 Not Found status: PUT http://localhost:8000/toys/{id} We have to compose and send an HTTP request with the DELETE HTTP verb and request URL http://localhost:8000/toys/{id} to remove the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/34, the server will delete the toy whose ID matches 34. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will request the ORM delete the toy row associated with this toy object and the server will return a 204 No Content status code. If no toy matches the specified ID, the server will return only a 404 Not Found status: DELETE http://localhost:8000/toys/{id} Creating first Django model Now, we will create a simple Toy model in Django, which we will use to represent and persist toys. Open the toys/models.py file. The following lines show the initial code for this file with just one import statement and a comment that indicates we should create the models: from django.db import models #Create your models here. The following lines show the new code that creates a Toy class, specifically, a Toy model in the toys/models.py file. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/models.py file: from django.db import models class Toy(models.Model): created = models.DateTimeField(auto_now_add=True) name = models.CharField(max_length=150, blank=False, default='') description = models.CharField(max_length=250, blank=True, default='') toy_category = models.CharField(max_length=200, blank=False, default='') release_date = models.DateTimeField() was_included_in_home = models.BooleanField(default=False) class Meta: ordering = ('name',) The Toy class is a subclass of the django.db.models.Model class and defines the following attributes: created, name, description, toy_category, release_date, and was_included_in_home. Each of these attributes represents a database column or field. We specified the field types, maximum lengths, and defaults for many attributes. The class declares a Meta inner class that declares an ordering attribute and sets its value to a tuple of string whose first value is the 'name' string. This way, the inner class indicates to Django that, by default, we want the results ordered by the name attribute in ascending order. Running initial migration Now, it is necessary to create the initial migration for the new Toy model we recently coded. We will also synchronize the SQLite database for the first time. By default, Django uses the popular self-contained and embedded SQLite database, and therefore we don't need to make changes in the initial ORM configuration. In this example, we will be working with this default configuration. Of course, we will upgrade to another database after we have a sample web service built with Django. We will only use SQLite for this example. We just need to run the following Python script in the virtual environment that we activated in the previous chapter. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command: python manage.py makemigrations toys The following lines show the output generated after running the previous command: Migrations for 'toys': toys/migrations/0001_initial.py: - Create model Toy The output indicates that the restful01/toys/migrations/0001_initial.py file includes the code to create the Toy model. The following lines show the code for this file that was automatically generated by Django. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/migrations/0001_initial.py file: # -*- coding: utf-8 -*- # Generated by Django 1.11.5 on 2017-10-08 05:19 from   future    import unicode_literals from django.db import migrations, models class Migration(migrations.Migration): initial = True dependencies = [ ] operations = [ migrations.CreateModel( name='Toy', fields=[ ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')), ('created', models.DateTimeField(auto_now_add=True)), ('name', models.CharField(default='', max_length=150)), ('description', models.CharField(blank=True, default='', max_length=250)), ('toy_category', models.CharField(default='', max_length=200)), ('release_date', models.DateTimeField()), ('was_included_in_home', models.BooleanField(default=False)), ], options={ 'ordering': ('name',), }, ), ] Understanding migrations The automatically generated code defines a subclass of the django.db.migrations.Migration class named Migration, which defines an operation that creates the Toy model's table and includes it in the operations attribute. The call to the migrations.CreateModel method specifies the model's name, the fields, and the options to instruct the ORM to create a table that will allow the underlying database to persist the model. The fields argument is a list of tuples that includes information about the field name, the field type, and additional attributes based on the data we provided in our model, that is, in the Toy class. Now, run the following Python script to apply all the generated migrations. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command: python manage.py migrate The following lines show the output generated after running the previous command: Operations to perform: Apply all migrations: admin, auth, contenttypes, sessions, toys Running migrations: Applying contenttypes.0001_initial... OK Applying auth.0001_initial... OK Applying admin.0001_initial... OK Applying admin.0002_logentry_remove_auto_add... OK Applying contenttypes.0002_remove_content_type_name... OK Applying auth.0002_alter_permission_name_max_length... OK Applying auth.0003_alter_user_email_max_length... OK Applying auth.0004_alter_user_username_opts... OK Applying auth.0005_alter_user_last_login_null... OK Applying auth.0006_require_contenttypes_0002... OK Applying auth.0007_alter_validators_add_error_messages... OK Applying auth.0008_alter_user_username_max_length... OK Applying sessions.0001_initial... OK Applying toys.0001_initial... OK After we run the previous command, we will notice that the root folder for our restful01 project now has a db.sqlite3 file that contains the SQLite database. We can use the SQLite command line or any other application that allows us to easily check the contents of the SQLite database to check the tables that Django generated. The first migration will generate many tables required by Django and its installed apps before running the code that creates the table for the Toys model. These tables provide support for user authentication, permissions, groups, logs, and migration management. We will work with the models related to these tables after we add more features and security to our web services. After the migration process creates all these Django tables in the underlying database, the first migration runs the Python code that creates the table required to persist our model. Thus, the last line of the running migrations section displays Applying toys.0001_initial. Analyzing the database In most modern Linux distributions and macOS, SQLite is already installed, and therefore you can run the sqlite3 command-line utility. In Windows, if you want to work with the sqlite3.exe command-line utility, you have to download the bundle of command-line tools for managing SQLite database files from the downloads section of the SQLite webpage at http://www.sqlite.org/download.html. For example, the ZIP file that includes the command-line tools for version 3.20.1 is sqlite- tools-win32-x8 6-3200100.zip. The name for the file changes with the SQLite version. You just need to make sure that you download the bundle of command-line tools and not the ZIP file that provides the SQLite DLLs. After you unzip the file, you can include the folder that includes the command-line tools in the PATH environment variable, or you can access the sqlite3.exe command-line utility by specifying the full path to it. Run the following command to list the generated tables. The first argument, db.sqlite3, specifies the file that contains that SQLite database and the second argument indicates the command that we want the sqlite3 command-line utility to run against the specified database: sqlite3 db.sqlite3 ".tables" The following lines show the output for the previous command with the list of tables that Django generated in the SQLite database: auth_group                django_admin_log auth_group_permissions                          django_content_type auth_permission                         django_migrations auth_user                 django_session auth_user_groups                      toys_toy auth_user_user_permissions The following command will allow you to check the contents of the toys_toy table after we compose and send HTTP requests to the RESTful Web Service and the web service makes CRUD operations to the toys_toy table: sqlite3 db.sqlite3 "SELECT * FROM toys_toy ORDER BY name;" Instead of working with the SQLite command-line utility, you can use a GUI tool to check the contents of the SQLite database. DB Browser for SQLite is a useful, free, multiplatform GUI tool that allows us to easily check the database contents of an SQLite database in Linux, macOS, and Windows. You can read more information about this tool and download its different versions from http://sqlitebrowser.org. Once you have installed the tool, you just need to open the db.sqlite3 file and you can check the database structure and browse the data for the different tables. After we start working with the first version of our web service, you need to check the contents of the toys_toy table with this tool The SQLite database engine and the database file name are specified in the restful01/settings.py Python file. The following lines show the declaration of the DATABASES dictionary, which contains the settings for all the databases that Django uses. The nested dictionary maps the database named default with the django.db.backends.sqlite3 database engine and the db.sqlite3 database file located in the BASE_DIR folder (restful01): DATABASES = { 'default': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': os.path.join(BASE_DIR, 'db.sqlite3'), } } After we execute the migrations, the SQLite database will have the following tables. Django uses prefixes to identify the modules and applications that each table belongs to. The tables that start with the auth_ prefix belong to the Django authentication module. The table that starts with the toys_ prefix belongs to our toys application. If we add more models to our toys application, Django will create new tables with the toys_ prefix: auth_group: Stores authentication groups auth_group_permissions: Stores permissions for authentication groups auth_permission: Stores permissions for authentication auth_user: Stores authentication users auth_user_groups: Stores authentication user groups auth_user_groups_permissions: Stores permissions for authentication user groups django_admin_log: Stores the Django administrator log django_content_type: Stores Django content types django_migrations: Stores the scripts generated by Django migrations and the date and time at which they were applied django_session: Stores Django sessions toys_toy: Persists the Toys model sqlite_sequence: Stores sequences for SQLite primary keys with autoincrement fields Understanding the table generated by Django The toys_toy table persists in the database the Toy class we recently created, specifically, the Toy model. Django's integrated ORM generated the toys_toy table based on our Toy model. Run the following command to retrieve the SQL used to create the toys_toy table: sqlite3 db.sqlite3 ".schema toys_toy" The following lines show the output for the previous command together with the SQL that the migrations process executed, to create the toys_toy table that persists the Toy model. The next lines are formatted to make it easier to understand the SQL code. Notice that the output from the command is formatted in a different way: CREATE TABLE IF NOT EXISTS "toys_toy" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "created" datetime NOT NULL, "name" varchar(150) NOT NULL, "description" varchar(250) NOT NULL, "toy_category" varchar(200) NOT NULL, "release_date" datetime NOT NULL, "was_included_in_home" bool NOT NULL ); The toys_toy table has the following columns (also known as fields) with their SQLite types, all of them not nullable: id: The integer primary key, an autoincrement row created: DateTime name: varchar(150) description: varchar(250) toy_category: varchar(200) release_date: DateTime was_included_in_home: bool Controlling, serialization and deserialization Our RESTful Web Service has to be able to serialize and deserialize the Toy instances into JSON representations. In Django REST framework, we just need to create a serializer class for the Toy instances to manage serialization to JSON and deserialization from JSON. Now, we will dive deep into the serialization and deserialization process in Django REST framework. It is very important to understand how it works because it is one of the most important components for all the RESTful Web Services we will build. Django REST framework uses a two-phase process for serialization. The serializers are mediators between the model instances and Python primitives. Parser and renderers handle as mediators between Python primitives and HTTP requests and responses. We will configure our mediator between the Toy model instances and Python primitives by creating a subclass of the rest_framework.serializers.Serializer class to declare the fields and the necessary methods to manage serialization and deserialization. We will repeat some of the information about the fields that we have included in the Toy model so that we understand all the things that we can configure in a subclass of the Serializer class. However, we will work with shortcuts, which will reduce boilerplate code later in the following examples. We will write less code in the following examples by using the ModelSerializer class. Now, go to the restful01/toys folder and create a new Python code file named serializers.py. The following lines show the code that declares the new ToySerializer class. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/serializers.py file: from rest_framework import serializers from toys.models import Toy class ToySerializer(serializers.Serializer): pk = serializers.IntegerField(read_only=True) name = serializers.CharField(max_length=150) description = serializers.CharField(max_length=250) release_date = serializers.DateTimeField() toy_category = serializers.CharField(max_length=200) was_included_in_home = serializers.BooleanField(required=False) def create(self, validated_data): return Toy.objects.create(**validated_data) def update(self, instance, validated_data): instance.name = validated_data.get('name', instance.name) instance.description = validated_data.get('description', instance.description) instance.release_date = validated_data.get('release_date', instance.release_date) instance.toy_category = validated_data.get('toy_category', instance.toy_category) instance.was_included_in_home = validated_data.get('was_included_in_home', instance.was_included_in_home) instance.save() return instance The ToySerializer class declares the attributes that represent the fields that we want to be serialized. Notice that we have omitted the created attribute that was present in the Toy model. When there is a call to the save method that ToySerializer inherits from the serializers.Serializer superclass, the overridden create and update methods define how to create a new instance or update an existing instance. In fact, these methods must be implemented in our class because they only raise a NotImplementedError exception in their base declaration in the serializers.Serializer superclass. The create method receives the validated data in the validated_data argument. The code creates and returns a new Toy instance based on the received validated data. The update method receives an existing Toy instance that is being updated and the new validated data in the instance and validated_data arguments. The code updates the values for the attributes of the instance with the updated attribute values retrieved from the validated data. Finally, the code calls the save method for the updated Toy instance and returns the updated and saved instance. We designed a RESTful Web Service to interact with a simple SQLite database and perform CRUD operations with toys. We defined the requirements for our web service and understood the tasks performed by each HTTP method and different scope. You read an excerpt from the book, Django RESTful Web Services, written by Gaston C. Hillar. This book helps developers build complex RESTful APIs from scratch with Django and the Django REST Framework. The code bundle for the article is hosted on GitHub.  
Read more
  • 0
  • 0
  • 19255

article-image-how-to-perform-full-text-search-fts-in-postgresql
Sugandha Lahoti
27 Mar 2018
8 min read
Save for later

How to perform full-text search (FTS) in PostgreSQL

Sugandha Lahoti
27 Mar 2018
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Mastering  PostgreSQL 10, written by Hans-Jürgen Schönig. This book provides expert techniques on PostgreSQL 10 development and administration.[/box] If you are looking up names or for simple strings, you are usually querying the entire content of a field. In Full-Text-Search (FTS), this is different. The purpose of the full-text search is to look for words or groups of words, which can be found in a text. Therefore, FTS is more of a contains operation as you are basically never looking for an exact string. In this article, we will show how to perform a full-text search operation in PostgreSQL. In PostgreSQL, FTS can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (= "preprocessed tokens of words"), and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed. Here is an example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars'); to_tsvector --------------------------------------------------------------- 'car':2,6,14 'even':10 'mani':13 'mind':11 'want':4 'would':8 (1 row) The example shows a simple sentence. The to_tsvector function will take the string, apply English rules, and perform a stemming process. Based on the configuration (english), PostgreSQL will parse the string, throw away stop words, and stem individual words. For example, car and cars will be transformed to the car. Note that this is not about finding the word stem. In the case of many, PostgreSQL will simply transform the string to mani by applying standard rules working nicely with the English language. Note that the output of the to_tsvector function is highly language dependent. If you tell PostgreSQL to treat the string as dutch, the result will be totally different: test=# SELECT to_tsvector('dutch', 'A car, I want a car. I would not even mind having many cars'); to_tsvector ----------------------------------------------------------------- 'a':1,5 'car':2,6,14 'even':10 'having':12 'i':3,7 'many':13 'mind':11 'not':9 'would':8 (1 row) To figure out which configurations are supported, consider running the following query: SELECT cfgname FROM pg_ts_config; Comparing strings After taking a brief look at the stemming process, it is time to figure out how a stemmed text can be compared to a user query. The following code snippet checks for the word wanted: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted'); ?column? ---------- t (1 row) Note that wanted does not actually show up in the original text. Still, PostgreSQL will return true. The reason is that want and wanted are both transformed to the same lexeme, so the result is true. Practically, this makes a lot of sense. Imagine you are looking for a car on Google. If you find pages selling cars, this is totally fine. Finding common lexemes is, therefore, an intelligent idea. Sometimes, people are not only looking for a single word, but want to find a set of words. With to_tsquery, this is possible, as shown in the next example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted & bmw'); ?column? ---------- f (1 row) In this case, false is returned because bmw cannot be found in our input string. In the to_tsquery function, & means and and | means or. It is therefore easily possible to build complex search strings. Defining GIN indexes If you want to apply text search to a column or a group of columns, there are basically two choices: Create a functional index using GIN Add a column containing ready-to-use tsvectors and a trigger to keep them in sync In this section, both options will be outlined. To show how things work, I have created some sample data: test=# CREATE TABLE t_fts AS SELECT comment FROM pg_available_extensions; SELECT 43 Indexing the column directly with a functional index is definitely a slower but more space efficient way to get things done: test=# CREATE INDEX idx_fts_func ON t_fts USING gin(to_tsvector('english', comment)); CREATE INDEX Deploying an index on the function is easy, but it can lead to some overhead. Adding a materialized column needs more space, but will lead to a better runtime behavior: test=# ALTER TABLE t_fts ADD COLUMN ts tsvector; ALTER TABLE The only trouble is, how do you keep this column in sync? The answer is by using a trigger: test=# CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON t_fts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(somename, 'pg_catalog.english', 'comment'); Fortunately, PostgreSQL already provides a C function that can be used by a trigger to sync the tsvector column. Just pass a name, the desired language, as well as a couple of columns to the function, and you are already done. The trigger function will take care of all that is needed. Note that a trigger will always operate within the same transaction as the statement making the modification. Therefore, there is no risk of being inconsistent. Debugging your search Sometimes, it is not quite clear why a query matches a given search string. To debug your query, PostgreSQL offers the ts_debug function. From a user's point of view, it can be used just like to_tsvector. It reveals a lot about the inner workings of the FTS infrastructure: test=# x Expanded display is on. test=# SELECT * FROM ts_debug('english', 'go to www.postgresql-support.de'); -[ RECORD 1 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | go dictionaries | {english_stem} dictionary           | english_stem lexemes    | {go} -[ RECORD 2 ]+---------------------------- alias  | blank Description | Space symbols token   |         dictionaries | {}         dictionary     |         lexemes       | -[ RECORD 3 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | to dictionaries | {english_stem} dictionary   | english_stem lexemes    | {} -[ RECORD 4 ]+---------------------------- alias  | blank description | Space symbols token | dictionaries | {} dictionary   | lexemes          | -[ RECORD 5 ]+---------------------------- alias  | host description | Host token      | www.postgresql-support.de dictionaries | {simple} dictionary | simple lexemes    | {www.postgresql-support.de} ts_debug will list every token found and display information about the token. You will see which token the parser found, the dictionary used, as well as the type of object. In my example, blanks, words, and hosts have been found. You might also see numbers, email addresses, and a lot more. Depending on the type of string, PostgreSQL will handle things differently. For example, it makes absolutely no sense to stem hostnames and e-mail addresses. Gathering word statistics Full-text search can handle a lot of data. To give end users more insights into their texts, PostgreSQL offers the pg_stat function, which returns a list of words: SELECT * FROM ts_stat('SELECT to_tsvector(''english'', comment) FROM pg_available_extensions') ORDER BY 2 DESC LIMIT 3; word   | ndoc | nentry ----------+------+-------- function | 10 |   10 data      |      10 |  10 type        |   7  |     7 (3 rows) The word column contains the stemmed word, ndoc tells us about the number of documents a certain word occurs.nentry indicates how often a word was found all together. Taking advantage of exclusion operators So far, indexes have been used to speed things up and to ensure uniqueness. However, a couple of years ago, somebody came up with the idea of using indexes for even more. As you have seen in this chapter, GiST supports operations such as intersects, overlaps, contains, and a lot more. So, why not use those operations to manage data integrity? Here is an example: test=# CREATE EXTENSION btree_gist; test=# CREATE TABLE t_reservation ( room int, from_to tsrange, EXCLUDE USING GiST (room with =, from_to with &&) ); CREATE TABLE The EXCLUDE  USING  GiST clause defines additional constraints. If you are selling rooms, you might want to allow different rooms to be booked at the same time. However, you don't want to sell the same room twice during the same period. What the EXCLUDE clause says in my example is this, if a room is booked twice at the same time, an error should pop up (the data in from_to with must not overlap (&&) if it is related to the same room). The following two rows will not violate constraints: test=# INSERT INTO t_reservation VALUES (10, '["2017-01-01", "2017-03-03"]'); INSERT 0 1 test=# INSERT INTO t_reservation VALUES (13, '["2017-01-01", "2017-03-03"]'); INSERT 0 1 However, the next INSERT will cause a violation because the data overlaps: test=# INSERT INTO t_reservation VALUES (13, '["2017-02-02", "2017-08-14"]'); ERROR:  conflicting key value violates exclusion constraint "t_reservation_room_from_to_excl" DETAIL:   Key (room, from_to)=(13, ["2017-02-02 00:00:00","2017-08-14 00:00:00"]) conflicts with existing key (room, from_to)=(13, ["2017-01-01 00:00:00","2017-03-03 00:00:00"]). The use of exclusion operators is very useful and can provide you with highly advanced means to handle integrity. To summarize, we learnt how to perform full-text search operation in PostgreSQL. If you liked our article, check out the book Mastering  PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.  
Read more
  • 0
  • 0
  • 21944
Modal Close icon
Modal Close icon