Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-matrix-and-pixel-manipulation-along-handling-files
Packt
11 Aug 2015
14 min read
Save for later

Matrix and Pixel Manipulation along with Handling Files

Packt
11 Aug 2015
14 min read
In this article, by Daniel Lélis Baggio, author of the book OpenCV 3.0 Computer Vision with Java, you will learn to perform basic operations required in computer vision, such as dealing with matrices, pixels, and opening files for prototype applications. In this article, the following topics will be covered: Basic matrix manipulation Pixel manipulation How to load and display images from files (For more resources related to this topic, see here.) Basic matrix manipulation From a computer vision background, we can see an image as a matrix of numerical values, which represents its pixels. For a gray-level image, we usually assign values ranging from 0 (black) to 255 (white) and the numbers in between show a mixture of both. These are generally 8-bit images. So, each element of the matrix refers to each pixel on the gray-level image, the number of columns refers to the image width, as well as the number of rows refers to the image's height. In order to represent a color image, we usually adopt each pixel as a combination of three basic colors: red, green, and blue. So, each pixel in the matrix is represented by a triplet of colors. It is important to observe that with 8 bits, we get 2 to the power of eight (28), which is 256. So, we can represent the range from 0 to 255, which includes, respectively the values used for black and white levels in 8-bit grayscale images. Besides this, we can also represent these levels as floating points and use 0.0 for black and 1.0 for white. OpenCV has a variety of ways to represent images, so you are able to customize the intensity level through the number of bits considering whether one wants signed, unsigned, or floating point data types, as well as the number of channels. OpenCV's convention is seen through the following expression: CV_<bit_depth>{U|S|F}C(<number_of_channels>) Here, U stands for unsigned, S for signed, and F stands for floating point. For instance, if an 8-bit unsigned single-channel image is required, the data type representation would be CV_8UC1, while a colored image represented by 32-bit floating point numbers would have the data type defined as CV_32FC3. If the number of channels is omitted, it evaluates to 1. We can see the ranges according to each bit depth and data type in the following list: CV_8U: These are the 8-bit unsigned integers that range from 0 to 255 CV_8S: These are the 8-bit signed integers that range from -128 to 127 CV_16U: These are the 16-bit unsigned integers that range from 0 to 65,535 CV_16S: These are the 16-bit signed integers that range from -32,768 to 32,767 CV_32S: These are the 32-bit signed integers that range from -2,147,483,648 to 2,147,483,647 CV_32F: These are the 32-bit floating-point numbers that range from -FLT_MAX to FLT_MAX and include INF and NAN values CV_64F: These are the 64-bit floating-point numbers that range from -DBL_MAX to DBL_MAX and include INF and NAN values You will generally start the project from loading an image, but it is important to know how to deal with these values. Make sure you import org.opencv.core.CvType and org.opencv.core.Mat. Several constructors are available for matrices as well, for instance: Mat image2 = new Mat(480,640,CvType.CV_8UC3); Mat image3 = new Mat(new Size(640,480), CvType.CV_8UC3); Both of the preceding constructors will construct a matrix suitable to fit an image with 640 pixels of width and 480 pixels of height. Note that width is to columns as height is to rows. Also pay attention to the constructor with the Size parameter, which expects the width and height order. In case you want to check some of the matrix properties, the methods rows(), cols(), and elemSize() are available: System.out.println(image2 + "rows " + image2.rows() + " cols " + image2.cols() + " elementsize " + image2.elemSize()); The output of the preceding line is: Mat [ 480*640*CV_8UC3, isCont=true, isSubmat=false, nativeObj=0xceeec70, dataAddr=0xeb50090 ]rows 480 cols 640 elementsize 3 The isCont property tells us whether this matrix uses extra padding when representing the image, so that it can be hardware-accelerated in some platforms; however, we won't cover it in detail right now. The isSubmat property refers to fact whether this matrix was created from another matrix and also whether it refers to the data from another matrix. The nativeObj object refers to the native object address, which is a Java Native Interface (JNI) detail, while dataAddr points to an internal data address. The element size is measured in the number of bytes. Another matrix constructor is the one that passes a scalar to be filled as one of its elements. The syntax for this looks like the following: Mat image = new Mat(new Size(3,3), CvType.CV_8UC3, new Scalar(new double[]{128,3,4})); This constructor will initialize each element of the matrix with the triple {128, 3, 4}. A very useful way to print a matrix's contents is using the auxiliary method dump() from Mat. Its output will look similar to the following: [128, 3, 4, 128, 3, 4, 128, 3, 4; 128, 3, 4, 128, 3, 4, 128, 3, 4; 128, 3, 4, 128, 3, 4, 128, 3, 4] It is important to note that while creating the matrix with a specified size and type, it will also immediately allocate memory for its contents. Pixel manipulation Pixel manipulation is often required for one to access pixels in an image. There are several ways to do this and each one has its advantages and disadvantages. A straightforward method to do this is the put(row, col, value) method. For instance, in order to fill our preceding matrix with values {1, 2, 3}, we will use the following code: for(int i=0;i<image.rows();i++){ for(int j=0;j<image.cols();j++){    image.put(i, j, new byte[]{1,2,3}); } } Note that in the array of bytes {1, 2, 3}, for our matrix, 1 stands for the blue channel, 2 for the green, and 3 for the red channel, as OpenCV stores its matrix internally in the BGR (blue, green, and red) format. It is okay to access pixels this way for small matrices. The only problem is the overhead of JNI calls for big images. Remember that even a small 640 x 480 pixel image has 307,200 pixels and if we think about a colored image, it has 921,600 values in a matrix. Imagine that it might take around 50 ms to make an overloaded call for each of the 307,200 pixels. On the other hand, if we manipulate the whole matrix on the Java side and then copy it to the native side in a single call, it will take around 13 ms. If you want to manipulate the pixels on the Java side, perform the following steps: Allocate memory with the same size as the matrix in a byte array. Put the image contents into that array (optional). Manipulate the byte array contents. Make a single put call, copying the whole byte array to the matrix. A simple example that will iterate all image pixels and set the blue channel to zero, which means that we will set to zero every element whose modulo is 3 equals zero, that is {0, 3, 6, 9, …}, as shown in the following piece of code: public void filter(Mat image){ int totalBytes = (int)(image.total() * image.elemSize()); byte buffer[] = new byte[totalBytes]; image.get(0, 0,buffer); for(int i=0;i<totalBytes;i++){    if(i%3==0) buffer[i]=0; } image.put(0, 0, buffer); } First, we find out the number of bytes in the image by multiplying the total number of pixels (image.total) with the element size in bytes (image.elemenSize). Then, we build a byte array with that size. We use the get(row, col, byte[]) method to copy the matrix contents in our recently created byte array. Then, we iterate all bytes and check the condition that refers to the blue channel (i%3==0). Remember that OpenCV stores colors internally as {Blue, Green, Red}. We finally make another JNI call to image.put, which copies the whole byte array to OpenCV's native storage. An example of this filter can be seen in the following screenshot, which was uploaded by Mromanchenko, licensed under CC BY-SA 3.0: Be aware that Java does not have any unsigned byte data type, so be careful when working with it. The safe procedure is to cast it to an integer and use the And operator (&) with 0xff. A simple example of this would be int unsignedValue = myUnsignedByte & 0xff;. Now, unsignedValue can be checked in the range of 0 to 255. Loading and displaying images from files Most computer vision applications need to retrieve images from some where. In case you need to get them from files, OpenCV comes with several image file loaders. Unfortunately, some loaders depend on codecs that sometimes aren't shipped with the operating system, which might cause them not to load. From the documentation, we see that the following files are supported with some caveats: Windows bitmaps: *.bmp, *.dib JPEG files: *.jpeg, *.jpg, *.jpe JPEG 2000 files: *.jp2 Portable Network Graphics: *.png Portable image format: *.pbm, *.pgm, *.ppm Sun rasters: *.sr, *.ras TIFF files: *.tiff, *.tif Note that Windows bitmaps, the portable image format, and sun raster formats are supported by all platforms, but the other formats depend on a few details. In Microsoft Windows and Mac OS X, OpenCV can always read the jpeg, png, and tiff formats. In Linux, OpenCV will look for codecs supplied with the OS, as stated by the documentation, so remember to install the relevant packages (do not forget the development files, for example, "libjpeg-dev" in Debian* and Ubuntu*) to get the codec support or turn on the OPENCV_BUILD_3RDPARTY_LIBS flag in CMake, as pointed out in Imread's official documentation. The imread method is supplied to get access to images through files. Use Imgcodecs.imread (name of the file) and check whether dataAddr() from the read image is different from zero to make sure the image has been loaded correctly, that is, the filename has been typed correctly and its format is supported. A simple method to open a file could look like the one shown in the following code. Make sure you import org.opencv.imgcodecs.Imgcodecs and org.opencv.core.Mat: public Mat openFile(String fileName) throws Exception{ Mat newImage = Imgcodecs.imread(fileName);    if(newImage.dataAddr()==0){      throw new Exception ("Couldn't open file "+fileName);    } return newImage; } Displaying an image with Swing OpenCV developers are used to a simple cross-platform GUI by OpenCV, which was called as HighGUI, and a handy method called imshow. It constructs a window easily and displays an image within it, which is nice to create quick prototypes. As Java comes with a popular GUI API called Swing, we had better use it. Besides, no imshow method was available for Java until its 2.4.7.0 version was released. On the other hand, it is pretty simple to create such functionality. Let's break down the work in to two classes: App and ImageViewer. The App class will be responsible for loading the file, while ImageViewer will display it. The application's work is simple and will only need to use Imgcodecs's imread method, which is shown as follows: package org.javaopencvbook;   import java.io.File; … import org.opencv.imgcodecs.Imgcodecs;   public class App { static{ System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }   public static void main(String[] args) throws Exception { String filePath = "src/main/resources/images/cathedral.jpg"; Mat newImage = Imgcodecs.imread(filePath); if(newImage.dataAddr()==0){    System.out.println("Couldn't open file " + filePath); } else{    ImageViewer imageViewer = new ImageViewer();    imageViewer.show(newImage, "Loaded image"); } } } Note that the App class will only read an example image file in the Mat object and it will call the ImageViewer method to display it. Now, let's see how the ImageViewer class's show method works: package org.javaopencvbook.util;   import java.awt.BorderLayout; import java.awt.Dimension; import java.awt.Image; import java.awt.image.BufferedImage;   import javax.swing.ImageIcon; import javax.swing.JFrame; import javax.swing.JLabel; import javax.swing.JScrollPane; import javax.swing.UIManager; import javax.swing.UnsupportedLookAndFeelException; import javax.swing.WindowConstants;   import org.opencv.core.Mat; import org.opencv.imgproc.Imgproc;   public class ImageViewer { private JLabel imageView; public void show(Mat image){    show(image, ""); }   public void show(Mat image,String windowName){    setSystemLookAndFeel();       JFrame frame = createJFrame(windowName);               Image loadedImage = toBufferedImage(image);        imageView.setIcon(new ImageIcon(loadedImage));               frame.pack();        frame.setLocationRelativeTo(null);        frame.setVisible(true);    }   private JFrame createJFrame(String windowName) {    JFrame frame = new JFrame(windowName);    imageView = new JLabel();    final JScrollPane imageScrollPane = new JScrollPane(imageView);        imageScrollPane.setPreferredSize(new Dimension(640, 480));        frame.add(imageScrollPane, BorderLayout.CENTER);        frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);    return frame; }   private void setSystemLookAndFeel() {    try {      UIManager.setLookAndFeel (UIManager.getSystemLookAndFeelClassName());    } catch (ClassNotFoundException e) {      e.printStackTrace();    } catch (InstantiationException e) {      e.printStackTrace();    } catch (IllegalAccessException e) {      e.printStackTrace();    } catch (UnsupportedLookAndFeelException e) {      e.printStackTrace();    } }   public Image toBufferedImage(Mat matrix){    int type = BufferedImage.TYPE_BYTE_GRAY;    if ( matrix.channels() > 1 ) {      type = BufferedImage.TYPE_3BYTE_BGR;    }    int bufferSize = matrix.channels()*matrix.cols()*matrix.rows();    byte [] buffer = new byte[bufferSize];    matrix.get(0,0,buffer); // get all the pixels    BufferedImage image = new BufferedImage(matrix.cols(),matrix.rows(), type);    final byte[] targetPixels = ((DataBufferByte) image.getRaster().getDataBuffer()).getData();    System.arraycopy(buffer, 0, targetPixels, 0, buffer.length);    return image; }   } Pay attention to the show and toBufferedImage methods. Show will try to set Swing's look and feel to the default native look, which is cosmetic. Then, it will create JFrame with JScrollPane and JLabel inside it. It will then call toBufferedImage, which will convert an OpenCV Mat object to a BufferedImage AWT. This conversion is made through the creation of a byte array that will store matrix contents. The appropriate size is allocated through the multiplication of the number of channels by the number of columns and rows. The matrix.get method puts all the elements into the byte array. Finally, the image's raster data buffer is accessed through the getDataBuffer() and getData() methods. It is then filled with a fast system call to the System.arraycopy method. The resulting image is then assigned to JLabel and then it is easily displayed. Note that this method expects a matrix that is either stored as one channel's unsigned 8-bit or three channel's unsigned 8-bit. In case your image is stored as a floating point, you should convert it using the following code before calling this method, supposing that the image you need to convert is a Mat object called originalImage: Mat byteImage = new Mat(); originalImage.convertTo(byteImage, CvType.CV_8UC3); This way, you can call toBufferedImage from your converted byteImage property. The image viewer can be easily installed in any Java OpenCV project and it will help you to show your images for debugging purposes. The output of this program can be seen in the next screenshot: Summary In this article, we learned dealing with matrices, pixels, and opening files for GUI prototype applications. Resources for Article: Further resources on this subject: Wrapping OpenCV [article] Making subtle color shifts with curves [article] Linking OpenCV to an iOS project [article]
Read more
  • 0
  • 0
  • 16269

article-image-using-form-builder
Packt
11 Aug 2015
18 min read
Save for later

Using the Form Builder

Packt
11 Aug 2015
18 min read
In this article by Christopher John Pecoraro, author of the book, Mastering Laravel, you learn the fundamentals of using the form builder. (For more resources related to this topic, see here.) Building web pages with Laravel Laravel's approach to building web content is flexible. As much or as little of Laravel can be used to create HTML. Laravel uses the filename.blade.php convention to state that the file should be parsed by the blade parser, which actually converts the file into plain PHP. The name blade was inspired by the .NET's razor templating engine, so this may be familiar to someone who has used it. Laravel 5 provides a working demonstration of a form in the /resources/views/ directory. This view is shown when the /home route is requested and the user is not currently logged in. This form is obviously not created using the Laravel form methods. The route is defined in the routes file as follows: Route::get('home', 'HomeController@index'); The master template This is the following app (or master) template: <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>Laravel</title> <link href="/css/app.css" rel="stylesheet"> <!-- Fonts --> <link href='//fonts.googleapis.com/css?family=Roboto:400,300' rel='stylesheet' type='text/css'> <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> <!-- WARNING: Respond.js doesn't work if you view the page via file:// --> <!--[if lt IE 9]> <script src="https://oss.maxcdn.com/html5shiv/3.7.2/ html5shiv.min.js"></script> <script src="https://oss.maxcdn.com/respond/1.4.2/ respond.min.js"></script> <![endif]--> </head> <body> <nav class="navbarnavbar-default"> <div class="container-fluid"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" datatarget="# bs-example-navbar-collapse-1"> <span class="sr-only">Toggle Navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="#">Laravel</a> </div> <div class="collapse navbar-collapse" id=" bs-example-navbar-collapse-1"> <ul class="navnavbar-nav"> <li><a href="/">Home</a></li> </ul> <ul class="navnavbar-navnavbar-right"> @if (Auth::guest()) <li><a href="{{ route('auth.login') }}">Login</a></li> <li><a href="/auth/register"> Register</a></li> @else <li class="dropdown"> <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">{{ Auth::user()->name }} <span class="caret"></span></a> <ul class="dropdown-menu" role="menu"> <li><a href="/auth/ logout">Logout</a></li> </ul> </li> @endif </ul> </div> </div> </nav> @yield('content') <!-- Scripts --> <script src="//cdnjs.cloudflare.com/ajax/libs/jquery/ 2.1.3/jquery.min.js"></script> <script src="//cdnjs.cloudflare.com/ajax/libs/twitterbootstrap/ 3.3.1/js/bootstrap.min.js"></script> </body> </html> The Laravel 5 master template is a standard HTML5 template with the following features: If the browser is older than Internet Explorer 9: Uses the HTML5 Shim from the CDN Uses the Respond.js JavaScript code from the CDN to retrofit media queries and CSS3 features Using @if (Auth::guest()), if the user is not authenticated, the login form is displayed; otherwise, the logout option is displayed Twitter bootstrap 3.x is included in the CDN The jQuery2.x is included in the CDN Any template that extends this template can override the content section An example page The source code for the login page is as follows: @extends('app') @section('content') <div class="container-fluid"> <div class="row"> <div class="col-md-8 col-md-offset-2"> <div class="panel panel-default"> <div class="panel-heading">Login</div> <div class="panel-body"> @if (count($errors) > 0) <div class="alert alert-danger"> <strong>Whoops!</strong> There were some problems with your input.<br><br> <ul> @foreach ($errors->all() as $error) <li>{{ $error }}</li> @endforeach </ul> </div> @endif <form class="form-horizontal" role="form" method="POST" action="/auth/login"> <input type="hidden" name="_token" value="{{ csrf_token() }}"> <div class="form-group"> <label class="col-md-4 controllabel"> E-Mail Address</label> <div class="col-md-6"> <input type="email" class="formcontrol" name="email" value="{{ old('email') }}"> </div> </div> <div class="form-group"> <label class="col-md-4 controllabel"> Password</label> <div class="col-md-6"> <input type="password" class="form-control" name="password"> </div> </div> <div class="form-group"> <div class="col-md-6 col-md-offset-4"> <div class="checkbox"> <label> <input type="checkbox" name="remember"> Remember Me </label> </div> </div> </div> <div class="form-group"> <div class="col-md-6 col-md-offset-4"> <button type="submit" lass="btn btn-primary" style="margin-right: 15px;"> Login </button> <a href="/password/email">Forgot Your Password?</a> </div> </div> </form> </div> </div> </div> </div> </div> @endsection From static HTML to static methods This login page begins with the following: @extends('app') It obviously uses the object-oriented paradigm to state that the app.blade.php template will be rendered. The following line overrides the content: @section('content') For this exercise, the form builder will be used instead of the static HTML. The form tag We will convert a static form tag to a FormBuilder method. The HTML is as follows: <form class="form-horizontal" role="form" method="POST"   action="/auth/login"> The method facade that we will use is as follows: Form::open(); In the FormBuilder.php class, the $reserved attribute is defined as follows: protected $reserved = ['method', 'url', 'route',   'action', 'files']; The attributes that we need to pass to an array to the open() method are class, role, method, and action. Since method and action are reserved words, it is necessary to pass the parameters in the following manner: Laravel form facade method array element HTML Form tag attribute method method url action role role class class Thus, the method call looks like this: {!! Form::open(['class'=>'form-horizontal', 'role =>'form', 'method'=>'POST', 'url'=>'/auth/login']) !!} The {!! !!} tags are used to start and end parsing of the form builder methods. The form method, POST, is placed first in the list of attributes in the HTML form tag. The action attribute actually needs to be a url. If the action parameter is used, then it refers to the controller action. In this case, the url parameter produces the action attribute of the form tag. Other attributes will be passed to the array and added to the list of attributes. The resultant HTML will be produced as follows: <form method="POST" action="http://laravel.example/auth/login" accept-charset="UTF-8" class="form-horizontal" role="form"> <input name="_token" type="hidden" value="wUY2hFSEWCzKHFfhywHvFbq9TXymUDiRUFreJD4h"> The CRSF token is automatically added, as the form method is POST. The text input field To convert the input fields, a facade is used. The input field's HTML is as follows: <input type="email" class="form-control" name="email" value="{{ old('email') }}"> Converting the preceding input field using a façade looks like this: {!! Form::input('email','email',old('email'), ['class'=>'form-control' ]) !!} Similarly, the text field becomes: {!! Form::input('password','password',null, ['class'=>'form-control']) !!} The input fields have the same signature. Of course, this can be refactored as follows: <?php $inputAttributes = ['class'=>'form-control'] ?> {!! Form::input('email','email',old('email'), $inputAttributes ) !!} ... {!! Form::input('password','password',null,$inputAttributes ) !!} The label tag The label tags are as follows: <label class="col-md-4 control-label">E-Mail Address</label> <label class="col-md-4 control-label">Password</label> To convert the label tags (E-Mail Address and Password), we will first create an array to hold the attributes, and then pass this array to the labels, as follows: $labelAttributes = ['class'=>'col-md-4 control-label']; Here is the form label code: {!! Form::label('email', 'E-Mail Address', $labelAttributes) !!} {!! Form::label('password', 'Password', $labelAttributes) !!} Checkbox To convert the checkbox to a facade, we will convert this: <input type="checkbox" name="remember"> Remember Me The preceding code is converted to the following code: {!! Form::checkbox('remember','') !!} Remember Me Remember that the PHP parameters should be sent in single quotation marks if there are no variables or other special characters, such as line breaks, inside the string to parse, while the HTML produced will have double quotes. The submit button Lastly, the submit button will be converted as follows: <button type="submit" class="btn btn-primary" style="margin-right: 15px;"> Login </button> The preceding code after conversion is as follows:   {!! Form::submit('Login', ['class'=>'btn btn-primary', 'style'=>'margin-right: 15px;']) !!} Note that the array parameter provides an easy way to provide any desired attributes, even those that are not among the list of standard HTML form elements. The anchor tag with links To convert the links, a helper method is used. Consider the following line of code: <a href="/password/email">Forgot Your Password?</a> The preceding line of code after conversion becomes: {!! link_to('/password/email', $title = 'Forgot Your Password?', $attributes = array(), $secure = null) !!} The link_to_route() method may be used to link to a route. For similar helper functions, visit http://laravelcollective.com/docs/5.0/html. Closing the form To end the form, we'll convert the traditional HTML form tag </form> to a Laravel {!! Form::close() !!} form method. The resultant form By putting everything together, the page now looks like this: @extends('app') @section('content') <div class="container-fluid"> <div class="row"> <div class="col-md-8 col-md-offset-2"> <div class="panel panel-default"> <div class="panel-heading">Login</div> <div class="panel-body"> @if (count($errors) > 0) <div class="alert alert-danger"> <strong>Whoops!</strong> There were some problems with your input.<br><br> <ul> @foreach ($errors->all() as $error) <li>{{ $error }}</li> @endforeach </ul> </div> @endif <?php $inputAttributes = ['class'=>'form-control']; $labelAttributes = ['class'=>'col-md-4 control-label']; ?> {!! Form::open(['class'=>'form-horizontal','role'=> 'form','method'=>'POST','url'=>'/auth/login']) !!} <div class="form-group"> {!! Form::label('email', 'E-Mail Address',$labelAttributes) !!} <div class="col-md-6"> {!! Form::input('email','email',old('email'), $inputAttributes) !!} </div> </div> <div class="form-group"> {!! Form::label('password', 'Password',$labelAttributes) !!} <div class="col-md-6"> {!! Form::input('password', 'password',null,$inputAttributes) !!} </div> </div> <div class="form-group"> <div class="col-md-6 col-md-offset-4"> <div class="checkbox"> <label> {!! Form::checkbox('remember','') !!} Remember Me </label> </div> </div> </div> <div class="form-group"> <div class="col-md-6 col-md-offset-4"> {!! Form::submit('Login',['class'=> 'btn btn-primary', 'style'=> 'margin-right: 15px;']) !!} {!! link_to('/password/email', $title = 'Forgot Your Password?', $attributes = array(), $secure = null); !!} </div> </div> {!! Form::close() !!} </div> </div> </div> </div> </div> @endsection Our example If we want to create a form to reserve a room in our accommodation, we can easily call a route from our controller: /** * Show the form for creating a new resource. * * @return Response */ public function create() { return view('auth/reserve'); } Now we need to create a new view that is located at resources/views/auth/reserve.blade.php. In this view, we can create a form to reserve a room in an accommodation where the user can select the start date, which comprises of the start day of the month and year, and the end date, which also comprises of the start day of the month and year. The form would begin as before, with a POST to reserve-room. Then, the form label would be placed next to the select input fields. Finally, the day, the month, and the year select form elements would be created as follows: {!! Form::open(['class'=>'form-horizontal', 'role'=>'form', 'method'=>'POST', 'url'=>'reserve-room']) !!} {!! Form::label(null, 'Start Date',$labelAttributes) !!} {!! Form::selectMonth('month',date('m')) !!} {!! Form::selectRange('date',1,31,date('d')) !!} {!! Form::selectRange('year',date('Y'),date('Y')+3) !!} {!! Form::label(null, 'End Date',$labelAttributes) !!} {!! Form::selectMonth('month',date('m')) !!} {!! Form::selectRange('date',1,31,date('d')) !!} {!! Form::selectRange('year',date('Y'), date('Y')+3,date('Y')) !!} {!! Form::submit('Reserve', ['class'=>'btn btn-primary', 'style'=>'margin-right: 15px;']) !!} {!! Form::close() !!} Month select Firstly, in the selectMonth method, the first parameter is the name of the input attribute, while the second attribute is the default value. Here, the PHP date method is used to extract the numeric portion of the current month—March in this case: {!! Form::selectMonth('month',date('m')) !!} The output, shown here formatted, is as follows: <select name="month"> <option value="1">January</option> <option value="2">February</option> <option value="3" selected="selected">March</option> <option value="4">April</option> <option value="5">May</option> <option value="6">June</option> <option value="7">July</option> <option value="8">August</option> <option value="9">September</option> <option value="10">October</option> <option value="11">November</option> <option value="12">December</option> </select> Date select A similar technique is applied for the selection of the date, but using the selectRange method, the range of the days in the month are passed to the method. Similarly, the PHP date function is used to send the current date to the method as the fourth parameter: {!! Form::selectRange('date',1,31,date('d')) !!} Here is the formatted output: <select name="date"> <option value="1">1</option> <option value="2">2</option> <option value="3">3</option> <option value="4">4</option> ... <option value="28">28</option> <option value="29">29</option> <option value="30" selected="selected">30</option> <option value="31">31</option> </select> The date that should be selected is 30, since today is March 30, 2015. For the months that do not have 31 days, usually a JavaScript method would be used to modify the number of days based on the month and/or the year. Year select The same technique that is used for the date range is applied for the selection of the year; once again, using the selectRange method. The range of the years is passed to the method. The PHP date function is used to send the current year to the method as the fourth parameter: {!! Form::selectRange('year',date('Y'),date('Y')+3,date('Y')) !!} Here is the formatted output: <select name="year"> <option value="2015" selected="selected">2015</option> <option value="2016">2016</option> <option value="2017">2017</option> <option value="2018">2018</option> </select> Here, the current year that is selected is 2015. Form macros We have the same code that generates our month, date, and year selection form block two times: once for the start date and once for the end date. To refactor the code, we can apply the DRY (don't repeat yourself) principle and create a form macro. This will allow us to avoid calling the form element creation method twice, as follows: <?php Form::macro('monthDayYear',function($suffix='') { echo Form::selectMonth(($suffix!=='')?'month- '.$suffix:'month',date('m')); echo Form::selectRange(($suffix!=='')?'date- '.$suffix:'date',1,31,date('d')); echo Form::selectRange(($suffix!=='')?'year- '.$suffix:'year',date('Y'),date('Y')+3,date('Y')); }); ?> Here, the month, date, and year generation code is placed into a macro, which is inside the PHP tags, and it is necessary to add echo to print out the result. The monthDayYear name is given to this macro method. Calling our macro two times: once after each label; each time adding a different suffix via the $suffix variable. Now, our form code looks like this: <?php Form::macro('monthDayYear',function($suffix='') { echo Form::selectMonth(($suffix!=='')?'month- '.$suffix:'month',date('m')); echo Form::selectRange(($suffix!=='')?'date- '.$suffix:'date',1,31,date('d')); echo Form::selectRange(($suffix!=='')?'year- '.$suffix:'year',date('Y'),date('Y')+3,date('Y')); }); ?> {!! Form::open(['class'=>'form-horizontal', 'role'=>'form', 'method'=>'POST', 'url'=>'/reserve-room']) !!} {!! Form::label(null, 'Start Date',$labelAttributes) !!} {!! Form::monthDayYear('-start') !!} {!! Form::label(null, 'End Date',$labelAttributes) !!} {!! Form::monthDayYear('-end') !!} {!! Form::submit('Reserve',['class'=>'btn btn-primary', 'style'=>'margin-right: 15px;']) !!} {!! Form::close() !!} Conclusion The choice to include the HTML form generation package in Laravel 5 can ease the burden of having to create numerous HTML forms. This approach allows developers to use methods, create reusable macros, and use a familiar Laravel approach to build the frontend. Once the basic methods are learned, it is very easy to simply copy and paste the previously created form elements, and then change their element's name and/or the array that is sent to them. Depending on the size of the project, this approach may or may not be the right choice. For a very small application, the difference in the amount of code that needs to be written is not very evident, although, as is the case with the selectMonth and selectRange methods, the amount of code necessary is drastic. This technique, combined with the use of macros, makes it easy to reduce the occurrence of copy duplication. Also, one of the major problems with the frontend design is that the contents of the class of the various elements may need to change throughout the entire application. This would mean performing a large find and replace operation, where changes are required to be made to HTML, such as changing class attributes. By creating an array of attributes, including class, for similar elements, changes made to the entire form can be performed simply by modifying the array that those elements use. In a larger project, however, where parts of forms may be repeated throughout the application, the wise use of macros can easily reduce the amount of code necessary to be written. Not only this, but macros can isolate the code inside from changes that would require more than one block of code to be changed throughout multiple files. In the example, where the month, date, and year is to be selected, it is possible that this could be used up to 20 times in a large application. Any changes made to the desired block of HTML can be simply done to the macro and the result would be reflected in all of the elements that use it. Ultimately, the choice of whether or not to use this package will reside with the developer and the designer. Since a designer who wants to use an alternative frontend design tool may not prefer, nor be competent, to work with the methods in the package, he or she may want to not use it. Summary The construction of the master template was explained and then the form components, such as the various form input types, were shown through examples. Finally, the construction of a form for the room reservation, was explained, as well as a "do not repeat yourself" form macro creation technique. Resources for Article: Further resources on this subject: Eloquent… without Laravel! [Article] Laravel 4 - Creating a Simple CRUD Application in Hours [Article] Exploring and Interacting with Materials using Blueprints [Article]
Read more
  • 0
  • 0
  • 3814

article-image-eav-model
Packt
10 Aug 2015
11 min read
Save for later

EAV model

Packt
10 Aug 2015
11 min read
In this article by Allan MacGregor, author of the book Magento PHP Developer's Guide - Second Edition, we cover details about EAV models, its usefulness in retrieving data, and the advantages it provides to the merchants and developers. EAV stands for entity, attribute, and value and is probably the most difficult concept for new Magento developers to grasp. While the EAV concept is not unique to Magento, it is rarely implemented on modern systems. Additionally, a Magento implementation is not a simple one. (For more resources related to this topic, see here.) What is EAV? In order to understand what EAV is and what its role within Magento is, we need to break down parts of the EAV model: Entity: This represents the data items (objects) inside Magento products, customers, categories, and orders. Each entity is stored in the database with a unique ID. Attribute: These are our object properties. Instead of having one column per attribute on the product table, attributes are stored on separate sets of tables. Value: As the name implies, it is simply the value link to a particular attribute. This data model is the secret behind Magento's flexibility and power, allowing entities to add and remove new properties without having to make any changes to the code, templates, or the database schema. This model can be seen as a vertical way of growing our database (new attributes and more rows), while the traditional model involves a horizontal growth pattern (new attributes and more columns), which would result in a schema redesign every time new attributes are added. The EAV model not only allows for the fast evolution of our database, but is also more effective because it only works with non-empty attributes, avoiding the need to reserve additional space in the database for null values. If you are interested in exploring and learning more about the Magento database structure, I highly recommend visiting www.magereverse.com. Adding a new product attribute is as simple going to the Magento backend and specifying the new attribute type, be it color, size, brand, or anything else. The opposite is true as well and we can get rid of unused attributes on our products or customer models. For more information on managing attributes, visit http://www.magentocommerce.com/knowledge-base/entry/how-do-attributes-work-in-magento. The Magento community edition currently has eight different types of EAV objects: Customer Customer Address Products Product Categories Orders Invoices Credit Memos Shipments The Magento Enterprise Edition has one additional type called RMA item, which is part of the Return Merchandise Authorization (RMA) system. All this flexibility and power is not free; there is a price to pay. Implementing the EAV model results in having our entity data distributed on a large number of tables. For example, just the Product Model is distributed to around 40 different tables. The following diagram only shows a few of the tables involved in saving the information of Magento products: Other major downsides of EAV are the loss of performance while retrieving large collections of EAV objects and an increase in the database query complexity. As the data is more fragmented (stored in more tables), selecting a single record involves several joins. One way Magento works around this downside of EAV is by making use of indexes and flat tables. For example, Magento can save all the product information into the flat_catalog table for easier and faster access. Let's continue using Magento products as our example and manually build the query to retrieve a single product. If you have phpmyadmin or MySQL Workbench installed on your development environment, you can experiment with the following queries. Each can be downloaded on the PHPMyAdmin website at http://www.phpmyadmin.net/ and the MySQL Workbench website at http://www.mysql.com/products/workbench/. The first table that we need to use is the catalog_product_entity table. We canconsider this our main product EAV table since it contains the main entity records for our products: Let's query the table by running the following SQL query: SELECT FROM `catalog_product_entity`; The table contains the following fields: entity_id: This is our product unique identifier that is used internally by Magento. entity_type_id: Magento has several different types of EAV models. Products, customers, and orders are just some of them. Identifying each of these by type allows Magento to retrieve the attributes and values from the appropriate tables. attribute_set_id: Product attributes can be grouped locally into attribute sets. Attribute sets allow even further flexibility on the product structure as products are not forced to use all available attributes. type_id: There are several different types of products in Magento: simple, configurable, bundled, downloadable, and grouped products; each with unique settings and functionality. sku: This stands for Stock Keeping Unit and is a number or code used to identify each unique product or item for sale in a store. This is a user-defined value. has_options: This is used to identify if a product has custom options. required_options: This is used to identify if any of the custom options that are required. created_at: This is the row creation date. updated_at: This is the last time the row was modified. Now we have a basic understanding of the product entity table. Each record represents a single product in our Magento store, but we don't have much information about that product beyond the SKU and the product type. So, where are the attributes stored? And how does Magento know the difference between a product attribute and a customer attribute? For this, we need to take a look into the eav_attribute table by running the following SQL query: SELECT FROM `eav_attribute`; As a result, we will not only see the product attributes, but also the attributes corresponding to the customer model, order model, and so on. Fortunately, we already have a key to filter the attributes from this table. Let's run the following query: SELECT FROM `eav_attribute` WHERE entity_type_id = 4; This query tells the database to only retrieve the attributes where the entity_type_id column is equal to the product entity_type_id(4). Before moving, let's analyze the most important fields inside the eav_attribute table: attribute_id: This is the unique identifier for each attribute and primary key of the table. entity_type_id: This relates each attribute to a specific eav model type. attribute_code: This is the name or key of our attribute and is used to generate the getters and setters for our magic methods. backend_model: These manage loading and storing data into the database. backend_type: This specifies the type of value stored in the backend (database). backend_table: This is used to specify if the attribute should be stored on a special table instead of the default EAV table. frontend_model: These handle the rendering of the attribute element into a web browser. frontend_input: Similar to the frontend model, the frontend input specifies the type of input field the web browser should render. frontend_label: This is the label/name of the attribute as it should be rendered by the browser. source_model: These are used to populate an attribute with possible values. Magento comes with several predefined source models for countries, yes or no values, regions, and so on. Retrieving the data At this point, we have successfully retrieved a product entity and the specific attributes that apply to that entity. Now it's time to start retrieving the actual values. In order to simplify the example (and the query) a little, we will only try to retrieve the name attribute of our products. How do we know which table our attribute values are stored on? Well, thankfully, Magento follows a naming convention to name the tables. If we inspect our database structure, we will notice that there are several tables using the catalog_product_entity prefix: catalog_product_entity catalog_product_entity_datetime catalog_product_entity_decimal catalog_product_entity_int catalog_product_entity_text catalog_product_entity_varchar catalog_product_entity_gallery catalog_product_entity_media_gallery catalog_product_entity_tier_price Wait! How do we know which is the right table to query for our name attribute values? If you were paying attention, I already gave you the answer. Remember that the eav_attribute table had a column called backend_type? Magento EAV stores each attribute on a different table based on the backend type of that attribute. If we want to confirm the backend type of our name attribute, we can do so by running the following code: SELECT FROM `eav_attribute` WHERE `entity_type_id` =4 AND `attribute_code` = 'name'; As a result, we should see that the backend type is varchar and that the values for this attribute are stored in the catalog_product_entity_varchar table. Let's inspect this table: The catalog_product_entity_varchar table is formed by only 6 columns: value_id: This is the attribute value unique identifier and primary key entity_type_id: This is the entity type ID to which this value belongs attribute_id: This is the foreign key that relates the value to our eav_entity table store_id: This is the foreign key matching an attribute value with a storeview entity_id: This is the foreign key relating to the corresponding entity table, in this case, catalog_product_entity value: This is the actual value that we want to retrieve Depending on the attribute configuration, we can have it as a global value, meaning, it applies across all store views or a value per storeview. Now that we finally have all the tables that we need to retrieve the product information, we can build our query: SELECT p.entity_id AS product_id, var.value AS product_name, p.sku AS product_sku FROM catalog_product_entity p, eav_attribute eav, catalog_product_entity_varchar var WHERE p.entity_type_id = eav.entity_type_id AND var.entity_id = p.entity_id    AND eav.attribute_code = 'name'    AND eav.attribute_id = var.attribute_id From our query, we should see a result set with three columns, product_id, product_name, and product_sku. So let's step back for a second in order to get product names with SKUs with raw SQL. We had to write a five-line SQL query, and we only retrieved two values from our products, from one single EAV value table if we want to retrieve a numeric field such as price or a text-value-like product. If we didn't have an ORM in place, maintaining Magento would be almost impossible. Fortunately, we do have an ORM in place, and most likely, you will never need to deal with raw SQL to work with Magento. That said, let's see how we can retrieve the same product information by using the Magento ORM: Our first step is going to be to instantiate a product collection: $collection = Mage::getModel('catalog/product')->getCollection(); Then we will specifically tell Magento to select the name attribute: $collection->addAttributeToSelect('name'); Then, we will ask it to sort the collection by name: $collection->setOrder('name', 'asc'); Finally, we will tell Magento to load the collection: $collection->load(); The end result is a collection of all products in the store sorted by name. We can inspect the actual SQL query by running the following code: echo $collection->getSelect()->__toString(); In just three lines of code, we are telling Magento to grab all the products in the store, to specifically select the name, and finally order the products by name. The last line $collection->getSelect()->__toString(); allows to see the actual query that Magento is executing in our behalf. The actual query being generated by Magento is as follows: SELECT `e`.. IF( at_name.value_id >0, at_name.value, at_name_default.value ) AS `name` FROM `catalog_product_entity` AS `e` LEFT JOIN `catalog_product_entity_varchar` AS `at_name_default` ON (`at_name_default`.`entity_id` = `e`.`entity_id`) AND (`at_name_default`.`attribute_id` = '65') AND `at_name_default`.`store_id` =0 LEFT JOIN `catalog_product_entity_varchar` AS `at_name` ON ( `at_name`.`entity_id` = `e`.`entity_id` ) AND (`at_name`.`attribute_id` = '65') AND (`at_name`.`store_id` =1) ORDER BY `name` ASC As we can see, the ORM and the EAV models are wonderful tools that not only put a lot of power and flexibility in the hands of the developers, but they also do it in a way that is comprehensive and easy to use. Summary In this article, we learned about EAV models and how they are structured to provide Magento with data flexibility and extensibility that both merchants and developers can take advantage of. Resources for Article: Further resources on this subject: Creating a Shipping Module [article] Preparing and Configuring Your Magento Website [article] Optimizing Magento Performance — Using HHVM [article]
Read more
  • 0
  • 0
  • 18756

article-image-setting-synchronous-replication
Packt
10 Aug 2015
17 min read
Save for later

Setting Up Synchronous Replication

Packt
10 Aug 2015
17 min read
In this article by the author, Hans-Jürgen Schönig, of the book, PostgreSQL Replication, Second Edition, we learn how to set up synchronous replication. In asynchronous replication, data is submitted and received by the slave (or slaves) after the transaction has been committed on the master. During the time between the master's commit and the point when the slave actually has fully received the data, it can still be lost. Here, you will learn about the following topics: Making sure that no single transaction can be lost Configuring PostgreSQL for synchronous replication Understanding and using application_name The performance impact of synchronous replication Optimizing replication for speed Synchronous replication can be the cornerstone of your replication setup, providing a system that ensures zero data loss. (For more resources related to this topic, see here.) Synchronous replication setup Synchronous replication has been made to protect your data at all costs. The core idea of synchronous replication is that a transaction must be on at least two servers before the master returns success to the client. Making sure that data is on at least two nodes is a key requirement to ensure no data loss in the event of a crash. Setting up synchronous replication works just like setting up asynchronous replication. Just a handful of parameters discussed here have to be changed to enjoy the blessings of synchronous replication. However, if you are about to create a setup based on synchronous replication, we recommend getting started with an asynchronous setup and gradually extending your configuration and turning it into synchronous replication. This will allow you to debug things more easily and avoid problems down the road. Understanding the downside to synchronous replication The most important thing you have to know about synchronous replication is that it is simply expensive. Synchronous replication and its downsides are two of the core reasons for which we have decided to include all this background information in this book. It is essential to understand the physical limitations of synchronous replication, otherwise you could end up in deep trouble. When setting up synchronous replication, try to keep the following things in mind: Minimize the latency Make sure you have redundant connections Synchronous replication is more expensive than asynchronous replication Always cross-check twice whether there is a real need for synchronous replication In many cases, it is perfectly fine to lose a couple of rows in the event of a crash. Synchronous replication can safely be skipped in this case. However, if there is zero tolerance, synchronous replication is a tool that should be used. Understanding the application_name parameter In order to understand a synchronous setup, a config variable called application_name is essential, and it plays an important role in a synchronous setup. In a typical application, people use the application_name parameter for debugging purposes, as it allows users to assign a name to their database connection. It can help track bugs, identify what an application is doing, and so on: test=# SHOW application_name; application_name ------------------ psql (1 row)   test=# SET application_name TO 'whatever'; SET test=# SHOW application_name; application_name ------------------ whatever (1 row) As you can see, it is possible to set the application_name parameter freely. The setting is valid for the session we are in, and will be gone as soon as we disconnect. The question now is: What does application_name have to do with synchronous replication? Well, the story goes like this: if this application_name value happens to be part of synchronous_standby_names, the slave will be a synchronous one. In addition to that, to be a synchronous standby, it has to be: connected streaming data in real-time (that is, not fetching old WAL records) Once a standby becomes synced, it remains in that position until disconnection. In the case of cascaded replication (which means that a slave is again connected to a slave), the cascaded slave is not treated synchronously anymore. Only the first server is considered to be synchronous. With all of this information in mind, we can move forward and configure our first synchronous replication. Making synchronous replication work To show you how synchronous replication works, this article will include a full, working example outlining all the relevant configuration parameters. A couple of changes have to be made to the master. The following settings will be needed in postgresql.conf on the master: wal_level = hot_standby max_wal_senders = 5   # or any number synchronous_standby_names = 'book_sample' hot_standby = on # on the slave to make it readable Then we have to adapt pg_hba.conf. After that, the server can be restarted and the master is ready for action. We recommend that you set wal_keep_segments as well to keep more transaction logs. We also recommend setting wal_keep_segments to keep more transaction logs on the master database. This makes the entire setup way more robust. It is also possible to utilize replication slots. In the next step, we can perform a base backup just as we have done before. We have to call pg_basebackup on the slave. Ideally, we already include the transaction log when doing the base backup. The --xlog-method=stream parameter allows us to fire things up quickly and without any greater risks. The --xlog-method=stream and wal_keep_segments parameters are a good combo, and in our opinion, should be used in most cases to ensure that a setup works flawlessly and safely. We have already recommended setting hot_standby on the master. The config file will be replicated anyway, so you save yourself one trip to postgresql.conf to change this setting. Of course, this is not fine art but an easy and pragmatic approach. Once the base backup has been performed, we can move ahead and write a simple recovery.conf file suitable for synchronous replication, as follows: iMac:slavehs$ cat recovery.conf primary_conninfo = 'host=localhost                    application_name=book_sample                    port=5432'   standby_mode = on The config file looks just like before. The only difference is that we have added application_name to the scenery. Note that the application_name parameter must be identical to the synchronous_standby_names setting on the master. Once we have finished writing recovery.conf, we can fire up the slave. In our example, the slave is on the same server as the master. In this case, you have to ensure that those two instances will use different TCP ports, otherwise the instance that starts second will not be able to fire up. The port can easily be changed in postgresql.conf. After these steps, the database instance can be started. The slave will check out its connection information and connect to the master. Once it has replayed all the relevant transaction logs, it will be in synchronous state. The master and the slave will hold exactly the same data from then on. Checking the replication Now that we have started the database instance, we can connect to the system and see whether things are working properly. To check for replication, we can connect to the master and take a look at pg_stat_replication. For this check, we can connect to any database inside our (master) instance, as follows: postgres=# x Expanded display is on. postgres=# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid            | 62871 usesysid         | 10 usename         | hs application_name | book_sample client_addr     | ::1 client_hostname | client_port     | 59235 backend_start   | 2013-03-29 14:53:52.352741+01 state           | streaming sent_location   | 0/30001E8 write_location   | 0/30001E8 flush_location   | 0/30001E8 replay_location | 0/30001E8 sync_priority   | 1 sync_state       | sync This system view will show exactly one line per slave attached to your master system. The x command will make the output more readable for you. If you don't use x to transpose the output, the lines will be so long that it will be pretty hard for you to comprehend the content of this table. In expanded display mode, each column will be in one line instead. You can see that the application_name parameter has been taken from the connect string passed to the master by the slave (which is book_sample in our example). As the application_name parameter matches the master's synchronous_standby_names setting, we have convinced the system to replicate synchronously. No transaction can be lost anymore because every transaction will end up on two servers instantly. The sync_state setting will tell you precisely how data is moving from the master to the slave. You can also use a list of application names, or simply a * sign in synchronous_standby_names to indicate that the first slave has to be synchronous. Understanding performance issues At various points in this book, we have already pointed out that synchronous replication is an expensive thing to do. Remember that we have to wait for a remote server and not just the local system. The network between those two nodes is definitely not something that is going to speed things up. Writing to more than one node is always more expensive than writing to only one node. Therefore, we definitely have to keep an eye on speed, otherwise we might face some pretty nasty surprises. Consider what you have learned about the CAP theory earlier in this book. Synchronous replication is exactly where it should be, with the serious impact that the physical limitations will have on performance. The main question you really have to ask yourself is: do I really want to replicate all transactions synchronously? In many cases, you don't. To prove our point, let's imagine a typical scenario: a bank wants to store accounting-related data as well as some logging data. We definitely don't want to lose a couple of million dollars just because a database node goes down. This kind of data might be worth the effort of replicating synchronously. The logging data is quite different, however. It might be far too expensive to cope with the overhead of synchronous replication. So, we want to replicate this data in an asynchronous way to ensure maximum throughput. How can we configure a system to handle important as well as not-so-important transactions nicely? The answer lies in a variable you have already seen earlier in the book—the synchronous_commit variable. Setting synchronous_commit to on In the default PostgreSQL configuration, synchronous_commit has been set to on. In this case, commits will wait until a reply from the current synchronous standby indicates that it has received the commit record of the transaction and has flushed it to the disk. In other words, both servers must report that the data has been written safely. Unless both servers crash at the same time, your data will survive potential problems (crashing of both servers should be pretty unlikely). Setting synchronous_commit to remote_write Flushing to both disks can be highly expensive. In many cases, it is enough to know that the remote server has accepted the XLOG and passed it on to the operating system without flushing things to the disk on the slave. As we can be pretty certain that we don't lose two servers at the very same time, this is a reasonable compromise between performance and consistency with respect to data protection. Setting synchronous_commit to off The idea is to delay WAL writing to reduce disk flushes. This can be used if performance is more important than durability. In the case of replication, it means that we are not replicating in a fully synchronous way. Keep in mind that this can have a serious impact on your application. Imagine a transaction committing on the master and you wanting to query that data instantly on one of the slaves. There would still be a tiny window during which you can actually get outdated data. Setting synchronous_commit to local The local value will flush locally but not wait for the replica to respond. In other words, it will turn your transaction into an asynchronous one. Setting synchronous_commit to local can also cause a small time delay window, during which the slave can actually return slightly outdated data. This phenomenon has to be kept in mind when you decide to offload reads to the slave. In short, if you want to replicate synchronously, you have to ensure that synchronous_commit is set to either on or remote_write. Changing durability settings on the fly Changing the way data is replicated on the fly is easy and highly important to many applications, as it allows the user to control durability on the fly. Not all data has been created equal, and therefore, more important data should be written in a safer way than data that is not as important (such as log files). We have already set up a full synchronous replication infrastructure by adjusting synchronous_standby_names (master) along with the application_name (slave) parameter. The good thing about PostgreSQL is that you can change your durability requirements on the fly: test=# BEGIN; BEGIN test=# CREATE TABLE t_test (id int4); CREATE TABLE test=# SET synchronous_commit TO local; SET test=# x Expanded display is on. test=# SELECT * FROM pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid             | 62871 usesysid         | 10 usename         | hs application_name | book_sample client_addr     | ::1 client_hostname | client_port     | 59235 backend_start   | 2013-03-29 14:53:52.352741+01 state           | streaming sent_location   | 0/3026258 write_location   | 0/3026258 flush_location   | 0/3026258 replay_location | 0/3026258 sync_priority   | 1 sync_state       | sync   test=# COMMIT; COMMIT In this example, we changed the durability requirements on the fly. This will make sure that this very specific transaction will not wait for the slave to flush to the disk. Note, as you can see, sync_state has not changed. Don't be fooled by what you see here; you can completely rely on the behavior outlined in this section. PostgreSQL is perfectly able to handle each transaction separately. This is a unique feature of this wonderful open source database; it puts you in control and lets you decide which kind of durability requirements you want. Understanding the practical implications and performance We have already talked about practical implications as well as performance implications. But what good is a theoretical example? Let's do a simple benchmark and see how replication behaves. We are performing this kind of testing to show you that various levels of durability are not just a minor topic; they are the key to performance. Let's assume a simple test: in the following scenario, we have connected two equally powerful machines (3 GHz, 8 GB RAM) over a 1 Gbit network. The two machines are next to each other. To demonstrate the impact of synchronous replication, we have left shared_buffers and all other memory parameters as default, and only changed fsync to off to make sure that the effect of disk wait is reduced to practically zero. The test is simple: we use a one-column table with only one integer field and 10,000 single transactions consisting of just one INSERT statement: INSERT INTO t_test VALUES (1); We can try this with full, synchronous replication (synchronous_commit = on): real 0m6.043s user 0m0.131s sys 0m0.169s As you can see, the test has taken around 6 seconds to complete. This test can be repeated with synchronous_commit = local now (which effectively means asynchronous replication): real 0m0.909s user 0m0.101s sys 0m0.142s In this simple test, you can see that the speed has gone up by us much as six times. Of course, this is a brute-force example, which does not fully reflect reality (this was not the goal anyway). What is important to understand, however, is that synchronous versus asynchronous replication is not a matter of a couple of percentage points or so. This should stress our point even more: replicate synchronously only if it is really needed, and if you really have to use synchronous replication, make sure that you limit the number of synchronous transactions to an absolute minimum. Also, please make sure that your network is up to the job. Replicating data synchronously over network connections with high latency will kill your system performance like nothing else. Keep in mind that throwing expensive hardware at the problem will not solve the problem. Doubling the clock speed of your servers will do practically nothing for you because the real limitation will always come from network latency. The performance penalty with just one connection is definitely a lot larger than that with many connections. Remember that things can be done in parallel, and network latency does not make us more I/O or CPU bound, so we can reduce the impact of slow transactions by firing up more concurrent work. When synchronous replication is used, how can you still make sure that performance does not suffer too much? Basically, there are a couple of important suggestions that have proven to be helpful: Use longer transactions: Remember that the system must ensure on commit that the data is available on two servers. We don't care what happens in the middle of a transaction, because anybody outside our transaction cannot see the data anyway. A longer transaction will dramatically reduce network communication. Run stuff concurrently: If you have more than one transaction going on at the same time, it will be beneficial to performance. The reason for this is that the remote server will return the position inside the XLOG that is considered to be processed safely (flushed or accepted). This method ensures that many transactions can be confirmed at the same time. Redundancy and stopping replication When talking about synchronous replication, there is one phenomenon that must not be left out. Imagine we have a two-node cluster replicating synchronously. What happens if the slave dies? The answer is that the master cannot distinguish between a slow and a dead slave easily, so it will start waiting for the slave to come back. At first glance, this looks like nonsense, but if you think about it more deeply, you will figure out that synchronous replication is actually the only correct thing to do. If somebody decides to go for synchronous replication, the data in the system must be worth something, so it must not be at risk. It is better to refuse data and cry out to the end user than to risk data and silently ignore the requirements of high durability. If you decide to use synchronous replication, you must consider using at least three nodes in your cluster. Otherwise, it will be very risky, and you cannot afford to lose a single node without facing significant downtime or risking data loss. Summary Here, we outlined the basic concept of synchronous replication, and showed how data can be replicated synchronously. We also showed how durability requirements can be changed on the fly by modifying PostgreSQL runtime parameters. PostgreSQL gives users the choice of how a transaction should be replicated, and which level of durability is necessary for a certain transaction. Resources for Article: Further resources on this subject: Introducing PostgreSQL 9 [article] PostgreSQL – New Features [article] Installing PostgreSQL [article]
Read more
  • 0
  • 0
  • 4683

article-image-understanding-hadoop-backup-and-recovery-needs
Packt
10 Aug 2015
25 min read
Save for later

Understanding Hadoop Backup and Recovery Needs

Packt
10 Aug 2015
25 min read
In this article by Gaurav Barot, Chintan Mehta, and Amij Patel, authors of the book Hadoop Backup and Recovery Solutions, we will discuss backup and recovery needs. In the present age of information explosion, data is the backbone of business organizations of all sizes. We need a complete data backup and recovery system and a strategy to ensure that critical data is available and accessible when the organizations need it. Data must be protected against loss, damage, theft, and unauthorized changes. If disaster strikes, data recovery must be swift and smooth so that business does not get impacted. Every organization has its own data backup and recovery needs, and priorities based on the applications and systems they are using. Today's IT organizations face the challenge of implementing reliable backup and recovery solutions in the most efficient, cost-effective manner. To meet this challenge, we need to carefully define our business requirements and recovery objectives before deciding on the right backup and recovery strategies or technologies to deploy. (For more resources related to this topic, see here.) Before jumping onto the implementation approach, we first need to know about the backup and recovery strategies and how to efficiently plan them. Understanding the backup and recovery philosophies Backup and recovery is becoming more challenging and complicated, especially with the explosion of data growth and increasing need for data security today. Imagine big players such as Facebook, Yahoo! (the first to implement Hadoop), eBay, and more; how challenging it will be for them to handle unprecedented volumes and velocities of unstructured data, something which traditional relational databases can't handle and deliver. To emphasize the importance of backup, let's take a look at a study conducted in 2009. This was the time when Hadoop was evolving and a handful of bugs still existed in Hadoop. Yahoo! had about 20,000 nodes running Apache Hadoop in 10 different clusters. HDFS lost only 650 blocks, out of 329 million total blocks. Now hold on a second. These blocks were lost due to the bugs found in the Hadoop package. So, imagine what the scenario would be now. I am sure you will bet on losing hardly a block. Being a backup manager, your utmost target is to think, make, strategize, and execute a foolproof backup strategy capable of retrieving data after any disaster. Solely speaking, the plan of the strategy is to protect the files in HDFS against disastrous situations and revamp the files back to their normal state, just like James Bond resurrects after so many blows and probably death-like situations. Coming back to the backup manager's role, the following are the activities of this role: Testing out various case scenarios to forestall any threats, if any, in the future Building a stable recovery point and setup for backup and recovery situations Preplanning and daily organization of the backup schedule Constantly supervising the backup and recovery process and avoiding threats, if any Repairing and constructing solutions for backup processes The ability to reheal, that is, recover from data threats, if they arise (the resurrection power) Data protection is one of the activities and it includes the tasks of maintaining data replicas for long-term storage Resettling data from one destination to another Basically, backup and recovery strategies should cover all the areas mentioned here. For any system data, application, or configuration, transaction logs are mission critical, though it depends on the datasets, configurations, and applications that are used to design the backup and recovery strategies. Hadoop is all about big data processing. After gathering some exabytes for data processing, the following are the obvious questions that we may come up with: What's the best way to back up data? Do we really need to take a backup of these large chunks of data? Where will we find more storage space if the current storage space runs out? Will we have to maintain distributed systems? What if our backup storage unit gets corrupted? The answer to the preceding questions depends on the situation you may be facing; let's see a few situations. One of the situations is where you may be dealing with a plethora of data. Hadoop is used for fact-finding semantics and data is in abundance. Here, the span of data is short; it is short lived and important sources of the data are already backed up. Such is the scenario wherein the policy of not backing up data at all is feasible, as there are already three copies (replicas) in our data nodes (HDFS). Moreover, since Hadoop is still vulnerable to human error, a backup of configuration files and NameNode metadata (dfs.name.dir) should be created. You may find yourself facing a situation where the data center on which Hadoop runs crashes and the data is not available as of now; this results in a failure to connect with mission-critical data. A possible solution here is to back up Hadoop, like any other cluster (the Hadoop command is Hadoop). Replication of data using DistCp To replicate data, the distcp command writes data to two different clusters. Let's look at the distcp command with a few examples or options. DistCp is a handy tool used for large inter/intra cluster copying. It basically expands a list of files to input in order to map tasks, each of which will copy files that are specified in the source list. Let's understand how to use distcp with some of the basic examples. The most common use case of distcp is intercluster copying. Let's see an example: bash$ hadoop distcp2 hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth This command will expand the namespace under /parth/ghiya on the ka-16 NameNode into the temporary file, get its content, divide them among a set of map tasks, and start copying the process on each task tracker from ka-16 to ka-001. The command used for copying can be generalized as follows: hadoop distcp2 hftp://namenode-location:50070/basePath hdfs://namenode-location Here, hftp://namenode-location:50070/basePath is the source and hdfs://namenode-location is the destination. In the preceding command, namenode-location refers to the hostname and 50070 is the NameNode's HTTP server post. Updating and overwriting using DistCp The -update option is used when we want to copy files from the source that don't exist on the target or have some different contents, which we do not want to erase. The -overwrite option overwrites the target files even if they exist at the source. The files can be invoked by simply adding -update and -overwrite. In the example, we used distcp2, which is an advanced version of DistCp. The process will go smoothly even if we use the distcp command. Now, let's look at two versions of DistCp, the legacy DistCp or just DistCp and the new DistCp or the DistCp2: During the intercluster copy process, files that were skipped during the copy process have all their file attributes (permissions, owner group information, and so on) unchanged when we copy using legacy DistCp or just DistCp. This, however, is not the case in new DistCp. These values are now updated even if a file is skipped. Empty root directories among the source inputs were not created in the target folder in legacy DistCp, which is not the case anymore in the new DistCp. There is a common misconception that Hadoop protects data loss; therefore, we don't need to back up the data in the Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. While Hadoop protects from hardware failure on the data nodes—meaning that if one entire node goes down, you will not lose any data—there are other ways in which data loss may occur. Data loss may occur due to various reasons, such as Hadoop being highly susceptible to human errors, corrupted data writes, accidental deletions, rack failures, and many such instances. Any of these reasons are likely to cause data loss. Consider an example where a corrupt application can destroy all data replications. During the process, it will attempt to compute each replication and on not finding a possible match, it will delete the replica. User deletions are another example of how data can be lost, as Hadoop's trash mechanism is not enabled by default. Also, one of the most complicated and expensive-to-implement aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to this, and determining which approach is right requires a balance between cost, complexity, and recovery time. A real-life scenario can be Facebook. The data that Facebook holds increases exponentially from 15 TB to 30 PB, that is, 3,000 times the Library of Congress. With increasing data, the problem faced was physical movement of the machines to the new data center, which required man power. Plus, it also impacted services for a period of time. Data availability in a short period of time is a requirement for any service; that's when Facebook started exploring Hadoop. To conquer the problem while dealing with such large repositories of data is yet another headache. The reason why Hadoop was invented was to keep the data bound to neighborhoods on commodity servers and reasonable local storage, and to provide maximum availability to data within the neighborhood. So, a data plan is incomplete without data backup and recovery planning. A big data execution using Hadoop states a situation wherein the focus on the potential to recover from a crisis is mandatory. The backup philosophy We need to determine whether Hadoop, the processes and applications that run on top of it (Pig, Hive, HDFS, and more), and specifically the data stored in HDFS are mission critical. If the data center where Hadoop is running disappeared, will the business stop? Some of the key points that have to be taken into consideration have been explained in the sections that follow; by combining these points, we will arrive at the core of the backup philosophy. Changes since the last backup Considering the backup philosophy that we need to construct, the first thing we are going to look at are changes. We have a sound application running and then we add some changes. In case our system crashes and we need to go back to our last safe state, our backup strategy should have a clause of the changes that have been made. These changes can be either database changes or configuration changes. Our clause should include the following points in order to construct a sound backup strategy: Changes we made since our last backup The count of files changed Ensure that our changes are tracked The possibility of bugs in user applications since the last change implemented, which may cause hindrance and it may be necessary to go back to the last safe state After applying new changes to the last backup, if the application doesn't work as expected, then high priority should be given to the activity of taking the application back to its last safe state or backup. This ensures that the user is not interrupted while using the application or product. The rate of new data arrival The next thing we are going to look at is how many changes we are dealing with. Is our application being updated so much that we are not able to decide what the last stable version was? Data is produced at a surpassing rate. Consider Facebook, which alone produces 250 TB of data a day. Data production occurs at an exponential rate. Soon, terms such as zettabytes will come upon a common place. Our clause should include the following points in order to construct a sound backup: The rate at which new data is arriving The need for backing up each and every change The time factor involved in backup between two changes Policies to have a reserve backup storage The size of the cluster The size of a cluster is yet another important factor, wherein we will have to select cluster size such that it will allow us to optimize the environment for our purpose with exceptional results. Recalling the Yahoo! example, Yahoo! has 10 clusters all over the world, covering 20,000 nodes. Also, Yahoo! has the maximum number of nodes in its large clusters. Our clause should include the following points in order to construct a sound backup: Selecting the right resource, which will allow us to optimize our environment. The selection of the right resources will vary as per need. Say, for instance, users with I/O-intensive workloads will go for more spindles per core. A Hadoop cluster contains four types of roles, that is, NameNode, JobTracker, TaskTracker, and DataNode. Handling the complexities of optimizing a distributed data center. Priority of the datasets The next thing we are going to look at are the new datasets, which are arriving. With the increase in the rate of new data arrivals, we always face a dilemma of what to backup. Are we tracking all the changes in the backup? Now, if are we backing up all the changes, will our performance be compromised? Our clause should include the following points in order to construct a sound backup: Making the right backup of the dataset Taking backups at a rate that will not compromise performance Selecting the datasets or parts of datasets The next thing we are going to look at is what exactly is backed up. When we deal with large chunks of data, there's always a thought in our mind: Did we miss anything while selecting the datasets or parts of datasets that have not been backed up yet? Our clause should include the following points in order to construct a sound backup: Backup of necessary configuration files Backup of files and application changes The timeliness of data backups With such a huge amount of data collected daily (Facebook), the time interval between backups is yet another important factor. Do we back up our data daily? In two days? In three days? Should we backup small chunks of data daily, or should we back up larger chunks at a later period? Our clause should include the following points in order to construct a sound backup: Dealing with any impacts if the time interval between two backups is large Monitoring a timely backup strategy and going through it The frequency of data backups depends on various aspects. Firstly, it depends on the application and usage. If it is I/O intensive, we may need more backups, as each dataset is not worth losing. If it is not so I/O intensive, we may keep the frequency low. We can determine the timeliness of data backups from the following points: The amount of data that we need to backup The rate at which new updates are coming Determining the window of possible data loss and making it as low as possible Critical datasets that need to be backed up Configuration and permission files that need to be backed up Reducing the window of possible data loss The next thing we are going to look at is how to minimize the window of possible data loss. If our backup frequency is great then what are the chances of data loss? What's our chance of recovering the latest files? Our clause should include the following points in order to construct a sound backup: The potential to recover latest files in the case of a disaster Having a low data-loss probability Backup consistency The next thing we are going to look at is backup consistency. The probability of invalid backups should be less or even better zero. This is because if invalid backups are not tracked, then copies of invalid backups will be made further, which will again disrupt our backup process. Our clause should include the following points in order to construct a sound backup: Avoid copying data when it's being changed Possibly, construct a shell script, which takes timely backups Ensure that the shell script is bug-free Avoiding invalid backups We are going to continue the discussion on invalid backups. As you saw, HDFS makes three copies of our backup for the recovery process. What if the original backup was flawed with errors or bugs? The three copies will be corrupted copies; now, when we recover these flawed copies, the result indeed will be a catastrophe. Our clause should include the following points in order to construct a sound backup: Avoid having a long backup frequency Have the right backup process, and probably having an automated shell script Track unnecessary backups If our backup clause covers all the preceding mentioned points, we surely are on the way to making a good backup strategy. A good backup policy basically covers all these points; so, if a disaster occurs, it always aims to go to the last stable state. That's all about backups. Moving on, let's say a disaster occurs and we need to go to the last stable state. Let's have a look at the recovery philosophy and all the points that make a sound recovery strategy. The recovery philosophy After a deadly storm, we always try to recover from the after-effects of the storm. Similarly, after a disaster, we try to recover from the effects of the disaster. In just one moment, storage capacity which was a boon turns into a curse and just another expensive, useless thing. Starting off with the best question, what will be the best recovery philosophy? Well, it's obvious that the best philosophy will be one wherein we may never have to perform recovery at all. Also, there may be scenarios where we may need to do a manual recovery. Let's look at the possible levels of recovery before moving on to recovery in Hadoop: Recovery to the flawless state Recovery to the last supervised state Recovery to a possible past state Recovery to a sound state Recovery to a stable state So, obviously we want our recovery state to be flawless. But if it's not achieved, we are willing to compromise a little and allow the recovery to go to a possible past state we are aware of. Now, if that's not possible, again we are ready to compromise a little and allow it to go to the last possible sound state. That's how we deal with recovery: first aim for the best, and if not, then compromise a little. Just like the saying goes, "The bigger the storm, more is the work we have to do to recover," here also we can say "The bigger the disaster, more intense is the recovery plan we have to take." So, the recovery philosophy that we construct should cover the following points: An automation system setup that detects a crash and restores the system to the last working state, where the application runs as per expected behavior. The ability to track modified files and copy them. Track the sequences on files, just like an auditor trails his audits. Merge the files that are copied separately. Multiple version copies to maintain a version control. Should be able to treat the updates without impacting the application's security and protection. Delete the original copy only after carefully inspecting the changed copy. Treat new updates but first make sure they are fully functional and will not hinder anything else. If they hinder, then there should be a clause to go to the last safe state. Coming back to recovery in Hadoop, the first question we may think of is what happens when the NameNode goes down? When the NameNode goes down, so does the metadata file (the file that stores data about file owners and file permissions, where the file is stored on data nodes and more), and there will be no one present to route our read/write file request to the data node. Our goal will be to recover the metadata file. HDFS provides an efficient way to handle name node failures. There are basically two places where we can find metadata. First, fsimage and second, the edit logs. Our clause should include the following points: Maintain three copies of the name node. When we try to recover, we get four options, namely, continue, stop, quit, and always. Choose wisely. Give preference to save the safe part of the backups. If there is an ABORT! error, save the safe state. Hadoop provides four recovery modes based on the four options it provides (continue, stop, quit, and always): Continue: This allows you to continue over the bad parts. This option will let you cross over a few stray blocks and continue over to try to produce a full recovery mode. This can be the Prompt when found error mode. Stop: This allows you to stop the recovery process and make an image file of the copy. Now, the part that we stopped won't be recovered, because we are not allowing it to. In this case, we can say that we are having the safe-recovery mode. Quit: This exits the recovery process without making a backup at all. In this, we can say that we are having the no-recovery mode. Always: This is one step further than continue. Always selects continue by default and thus avoids stray blogs found further. This can be the prompt only once mode. We will look at these in further discussions. Now, you may think that the backup and recovery philosophy is cool, but wasn't Hadoop designed to handle these failures? Well, of course, it was invented for this purpose but there's always the possibility of a mashup at some level. Are we overconfident and not ready to take precaution, which can protect us, and are we just entrusting our data blindly with Hadoop? No, certainly we aren't. We are going to take every possible preventive step from our side. In the next topic, we look at the very same topic as to why we need preventive measures to back up Hadoop. Knowing the necessity of backing up Hadoop Change is the fundamental law of nature. There may come a time when Hadoop may be upgraded on the present cluster, as we see many system upgrades everywhere. As no upgrade is bug free, there is a probability that existing applications may not work the way they used to. There may be scenarios where we don't want to lose any data, let alone start HDFS from scratch. This is a scenario where backup is useful, so a user can go back to a point in time. Looking at the HDFS replication process, the NameNode handles the client request to write a file on a DataNode. The DataNode then replicates the block and writes the block to another DataNode. This DataNode repeats the same process. Thus, we have three copies of the same block. Now, how these DataNodes are selected for placing copies of blocks is another issue, which we are going to cover later in Rack awareness. You will see how to place these copies efficiently so as to handle situations such as hardware failure. But the bottom line is when our DataNode is down there's no need to panic; we still have a copy on a different DataNode. Now, this approach gives us various advantages such as: Security: This ensures that blocks are stored on two different DataNodes High write capacity: This writes only on a single DataNode; the replication factor is handled by the DataNode Read options: This denotes better options from where to read; the NameNode maintains records of all the locations of the copies and the distance from the NameNode Block circulation: The client writes only a single block; others are handled through the replication pipeline During the write operation on a DataNode, it receives data from the client as well as passes data to the next DataNode simultaneously; thus, our performance factor is not compromised. Data never passes through the NameNode. The NameNode takes the client's request to write data on a DataNode and processes the request by deciding on the division of files into blocks and the replication factor. The following figure shows the replication pipeline, wherein a block of the file is written and three different copies are made at different DataNode locations: After hearing such a foolproof plan and seeing so many advantages, we again arrive at the same question: is there a need for backup in Hadoop? Of course there is. There often exists a common mistaken belief that Hadoop shelters you against data loss, which gives you the freedom to not take backups in your Hadoop cluster. Hadoop, by convention, has a facility to replicate your data three times by default. Although reassuring, the statement is not safe and does not guarantee foolproof protection against data loss. Hadoop gives you the power to protect your data over hardware failures; the scenario wherein one disk, cluster, node, or region may go down, data will still be preserved for you. However, there are many scenarios where data loss may occur. Consider an example where a classic human-prone error can be the storage locations that the user provides during operations in Hive. If the user provides a location wherein data already exists and they perform a query on the same table, the entire existing data will be deleted, be it of size 1 GB or 1 TB. In the following figure, the client gives a read operation but we have a faulty program. Going through the process, the NameNode is going to see its metadata file for the location of the DataNode containing the block. But when it reads from the DataNode, it's not going to match the requirements, so the NameNode will classify that block as an under replicated block and move on to the next copy of the block. Oops, again we will have the same situation. This way, all the safe copies of the block will be transferred to under replicated blocks, thereby HDFS fails and we need some other backup strategy: When copies do not match the way NameNode explains, it discards the copy and replaces it with a fresh copy that it has. HDFS replicas are not your one-stop solution for protection against data loss. The needs for recovery Now, we need to decide up to what level we want to recover. Like you saw earlier, we have four modes available, which recover either to a safe copy, the last possible state, or no copy at all. Based on your needs decided in the disaster recovery plan we defined earlier, you need to take appropriate steps based on that. We need to look at the following factors: The performance impact (is it compromised?) How large is the data footprint that my recovery method leaves? What is the application downtime? Is there just one backup or are there incremental backups? Is it easy to implement? What is the average recovery time that the method provides? Based on the preceding aspects, we will decide which modes of recovery we need to implement. The following methods are available in Hadoop: Snapshots: Snapshots simply capture a moment in time and allow you to go back to the possible recovery state. Replication: This involves copying data from one cluster and moving it to another cluster, out of the vicinity of the first cluster, so that if one cluster is faulty, it doesn't have an impact on the other. Manual recovery: Probably, the most brutal one is moving data manually from one cluster to another. Clearly, its downsides are large footprints and large application downtime. API: There's always a custom development using the public API available. We will move on to the recovery areas in Hadoop. Understanding recovery areas Recovering data after some sort of disaster needs a well-defined business disaster recovery plan. So, the first step is to decide our business requirements, which will define the need for data availability, precision in data, and requirements for the uptime and downtime of the application. Any disaster recovery policy should basically cover areas as per requirements in the disaster recovery principal. Recovery areas define those portions without which an application won't be able to come back to its normal state. If you are armed and fed with proper information, you will be able to decide the priority of which areas need to be recovered. Recovery areas cover the following core components: Datasets NameNodes Applications Database sets in HBase Let's go back to the Facebook example. Facebook uses a customized version of MySQL for its home page and other interests. But when it comes to Facebook Messenger, Facebook uses the NoSQL database provided by Hadoop. Now, looking from that point of view, Facebook will have both those things in recovery areas and will need different steps to recover each of these areas. Summary In this article, we went through the backup and recovery philosophy and what all points a good backup philosophy should have. We went through what a recovery philosophy constitutes. We saw the modes available for recovery in Hadoop. Then, we looked at why backup is important even though HDFS provides the replication process. Lastly, we looked at the recovery needs and areas. Quite a journey, wasn't it? Well, hold on tight. These are just your first steps into Hadoop User Group (HUG). Resources for Article: Further resources on this subject: Cassandra Architecture [article] Oracle GoldenGate 12c — An Overview [article] Backup and Restore Improvements [article]
Read more
  • 0
  • 0
  • 7059

article-image-integrating-muzzley
Packt
10 Aug 2015
12 min read
Save for later

Integrating with Muzzley

Packt
10 Aug 2015
12 min read
In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics: Wiring the circuit The Muzzley IoT ecosystem Creating a Muzzley app Lighting up the entrance door (For more resources related to this topic, see here.) One identified issue regarding IoT is that there will be lots of connected devices and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and, in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, which is known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices or even creating their own protocol such as the Intel Common Connectivity Framework (CCF). A different approach consists in predicting the device's interactions, where collected data is used to predict and trigger actions on the specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and on collecting usage data, it can predict users' actions and even make different devices work together. In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere as long as there is Internet access. Wiring the circuit In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it by an LED or even by the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is to take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wires junctions: Wire junctions and the solid state relay This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires. Use some wire and wire junctions to do it, as displayed in the following image: Wiring the circuit The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways, using the original button and the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do or don't feel comfortable doing it, ask for help from someone more experienced. Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, where you should wire it to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all the inter-communicator circuits work the same way and although we try to provide a general way to do it, there're always the risk of damaging your device or being electrocuted. Do it at your own risk. Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not using the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED. The Muzzley IoT ecosystem Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself: It offers device control, a rules system, and a machine learning system that predicts and suggests actions, based on the device usage. The mobile app is available for Android, iOS, and Windows phone. It can pack all your Internet-connected devices in to a single common app, allowing them to be controlled together, and to work with other devices that are available in real-world stores or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first generation platforms with the ability to predict a users' actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each others' existence get connected through the user behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of the linked cause-effect events. When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, check the thermostat, and adjust the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified and shortcuts are offered for the effect-associated action. For instance, the user could receive in the Muzzley app the following suggestions: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?" When it comes to security and privacy, Muzzley takes it seriously and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system where we will be integrating our door unlocker. Creating a Muzzley app The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your App Galileo Lock and if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type: Muzzley allows you to integrate through the product manufacturer cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the fields Interface UUID and Email Access List; we will come back for them later. Save your changes by pressing the Save changes button. Lighting up the entrance door Now that we can unlock our door from anywhere using the mobile phone with an Internet connection, a nice thing to have is the entrance lights turn on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules to perform an action when the door is unlocked using the mobile app. To do this, you'll need to own one of the Muzzley-enabled smart bulbs such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list: If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details. Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible choices of label-values for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object: {"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}. Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the [{"source":"selection.value","target":"data.value"}].path to map the data. Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu to you. Here, you'll be able to select a channel component property as a trigger to some other channel component property: Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option: After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page. Summary Everyday objects that surround us are being transformed into information ecosystems and the way we interact with them is slowly changing. Although IoT is growing up fast, it is nowadays in an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth without security regulations and deep security studies are leading to major concerns regarding the two biggest IoT challenges—security and privacy. Devices in our home that are remotely controllable or even personal data information getting into the wrong hands could be the recipe for a disaster. In this article you have learned the basic steps in wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, by using Intel Galileo board as a bridge to communicate with Muzzley cloud. Resources for Article: Further resources on this subject: Getting Started with Intel Galileo [article] Getting the current weather forecast [article] Controlling DC motors using a shield [article]
Read more
  • 0
  • 0
  • 8991
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-share-and-share-alike
Packt
10 Aug 2015
13 min read
Save for later

Share and Share Alike

Packt
10 Aug 2015
13 min read
In this article by Kevin Harvey, author of the book Test-Driven Development with Django, we'll expose the data in our application via a REST API. As we do, we'll learn: The importance of documentation in the API development process How to write functional tests for API endpoints API patterns and best practices (For more resources related to this topic, see here.) It's an API world, we're just coding in it It's very common nowadays to include a public REST API in your web project. Exposing your services or data to the world is generally done for one of two reasons: You've got interesting data, and other developers might want to integrate that information into a project they're working on You're building a secondary system that you expect your users to interact with, and that system needs to interact with your data (that is, a mobile or desktop app, or an AJAX-driven front end) We've got both reasons in our application. We're housing novel, interesting data in our database that someone might want to access programmatically. Also, it would make sense to build a desktop application that could interact with a user's own digital music collection so they could actually hear the solos we're storing in our system. Deceptive simplicity The good news is that there are some great options for third-party plugins for Django that allow you to build a REST API into an existing application. The bad news is that the simplicity of adding one of these packages can let you go off half-cocked, throwing an API on top of your project without a real plan for it. If you're lucky, you'll just wind up with a bird's nest of an API: inconsistent URLs, wildly varying payloads, and difficult authentication. In the worst-case scenario, your bolt-on API exposes data you didn't intend to make public and wind up with a self-inflicted security issue. Never forget that an API is sort of invisible. Unlike traditional web pages, where bugs are very public and easy to describe, API bugs are only visible to other developers. Take special care to make sure your API behaves exactly as intended by writing thorough documentation and tests to make sure you've implemented it correctly. Writing documentation first "Documentation is king." - Kenneth Reitz If you've spent any time at all working with Python or Django, you know what good documentation looks like. The Django folks in particular seem to understand this well: the key to getting developers to use your code is great documentation. In documenting an API, be explicit. Most of your API methods' docs should take the form of "if you send this, you will get back this", with real-world examples of input and output. A great side effect of prewriting documentation is that it makes the intention of your API crystal clear. You're allowing yourself to conjure up the API from thin air without getting bogged down in any of the details, so you can get a bird's-eye view of what you're trying to accomplish. Your documentation will keep you oriented throughout the development process. Documentation-Driven testing Once you've got your documentation done, testing is simply a matter of writing test cases that match up with what you've promised. The actions of the test methods exercise HTTP methods, and your assertions check the responses. Test-Driven Development really shines when it comes to API development. There are great tools for sending JSON over the wire, but properly formatting JSON can be a pain, and reading it can be worse. Enshrining test JSON in test methods and asserting they match the real responses will save you a ton of headache. More developers, more problems Good documentation and test coverage are exponentially more important when two groups are developing in tandem—one on the client application and one on the API. Changes to an API are hard for teams like this to deal with, and should come with a lot of warning (and apologies). If you have to make a change to an endpoint, it should break a lot of tests, and you should methodically go and fix them all. What's more, no one feels the pain of regression bugs like the developer of an API-consuming client. You really, really, really need to know that all the endpoints you've put out there are still going to work when you add features or refactor. Building an API with Django REST framework Now that you're properly terrified of developing an API, let's get started. What sort of capabilities should we add? Here are a couple possibilities: Exposing the Album, Track, and Solo information we have Creating new Solos or updating existing ones Initial documentation In the Python world it's very common for documentation to live in docstrings, as it keeps the description of how to use an object close to the implementation. We'll eventually do the same with our docs, but it's kind of hard to write a docstring for a method that doesn't exist yet. Let's open up a new Markdown file API.md, right in the root of the project, just to get us started. If you've never used Markdown before, you can read an introduction to GitHub's version of Markdown at https://help.github.com/articles/markdown-basics/. Here's a sample of what should go in API.md. Have a look at https://github.com/kevinharvey/jmad/blob/master/API.md for the full, rendered version. ...# Get a Track with Solos* URL: /api/tracks/<pk>/* HTTP Method: GET## Example Response{"name": "All Blues","slug": "all-blues","album": {"name": "Kind of Blue","url": "http://jmad.us/api/albums/2/"},"solos": [{"artist": "Cannonball Adderley","instrument": "saxophone","start_time": "4:05","end_time": "6:04","slug": "cannonball-adderley","url": "http://jmad.us/api/solos/281/"},...]}# Add a Solo to a Track* URL: /api/solos/* HTTP Method: POST## Example Request{"track": "/api/tracks/83/","artist": "Don Cherry","instrument": "cornet","start_time": "2:13","end_time": "3:54"}## Example Response{"url": "http://jmad.us/api/solos/64/","artist": "Don Cherry","slug": "don-cherry","instrument": "cornet","start_time": "2:13","end_time": "3:54","track": "http://jmad.us/api/tracks/83/"} There's not a lot of prose, and there needn't be. All we're trying to do is layout the ins and outs of our API. It's important at this point to step back and have a look at the endpoints in their totality. Is there enough of a pattern that you can sort of guess what the next one is going to look like? Does it look like a fairly straightforward API to interact with? Does anything about it feel clunky? Would you want to work with this API by yourself? Take time to think through any weirdness now before anything gets out in the wild. $ git commit -am 'Initial API Documentation'$ git tag -a ch7-1-init-api-docs Introducing Django REST framework Now that we've got some idea what we're building, let's actually get it going. We'll be using Django REST Framework (http://www.django-rest-framework.org/). Start by installing it in your environment: $ pip install djangorestframework Add rest_framework to your INSTALLED_APPS in jmad/settings.py: INSTALLED_APPS = (...'rest_framework') Now we're ready to start testing. Writing tests for API endpoints While there's no such thing as browser-based testing for an external API, it is important to write tests that cover its end-to-end processing. We need to be able to send in requests like the ones we've documented and confirm that we receive the responses our documentation promises. Django REST Framework (DRF from here on out) provides tools to help write tests for the application functionality it provides. We'll use rest_framework.tests.APITestCase to write functional tests. Let's kick off with the list of albums. Convert albums/tests.py to a package, and add a test_api.py file. Then add the following: from rest_framework.test import APITestCasefrom albums.models import Albumclass AlbumAPITestCase(APITestCase):def setUp(self):self.kind_of_blue = Album.objects.create(name='Kind of Blue')self.a_love_supreme = Album.objects.create(name='A Love Supreme')def test_list_albums(self):"""Test that we can get a list of albums"""response = self.client.get('/api/albums/')self.assertEqual(response.status_code, 200)self.assertEqual(response.data[0]['name'],'A Love Supreme')self.assertEqual(response.data[1]['url'],'http://testserver/api/albums/1/') Since much of this is very similar to other tests that we've seen before, let's talk about the important differences: We import and subclass APITestCase, which makes self.client an instance of rest_framework.test.APIClient. Both of these subclass their respective django.test counterparts add a few niceties that help in testing APIs (none of which are showcased yet). We test response.data, which we expect to be a list of Albums. response.data will be a Python dict or list that corresponds to the JSON payload of the response. During the course of the test, APIClient (a subclass of Client) will use http://testserver as the protocol and hostname for the server, and our API should return a host-specific URI. Run this test, and we get the following: $ python manage.py test albums.tests.test_apiCreating test database for alias 'default'...F=====================================================================FAIL: test_list_albums (albums.tests.test_api.AlbumAPITestCase)Test that we can get a list of albums---------------------------------------------------------------------Traceback (most recent call last):File "/Users/kevin/dev/jmad-project/jmad/albums/tests/test_api.py",line 17, in test_list_albumsself.assertEqual(response.status_code, 200)AssertionError: 404 != 200---------------------------------------------------------------------Ran 1 test in 0.019sFAILED (failures=1) We're failing because we're getting a 404 Not Found instead of a 200 OK status code. Proper HTTP communication is important in any web application, but it really comes in to play when you're using AJAX. Most frontend libraries will properly classify responses as successful or erroneous based on the status code: making sure the code are on point will save your frontend developers friends a lot of headache. We're getting a 404 because we don't have a URL defined yet. Before we set up the route, let's add a quick unit test for routing. Update the test case with one new import and method: from django.core.urlresolvers import resolve...def test_album_list_route(self):"""Test that we've got routing set up for Albums"""route = resolve('/api/albums/')self.assertEqual(route.func.__name__, 'AlbumViewSet') Here, we're just confirming that the URL routes to the correct view. Run it: $ python manage.py testalbums.tests.test_api.AlbumAPITestCase.test_album_list_route...django.core.urlresolvers.Resolver404: {'path': 'api/albums/','tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)^admin/>], [<RegexURLPattern solo_detail_view^recordings/(?P<album>[w-]+)/(?P<track>[w-]+)/(?P<artist>[w-]+)/$>], [<RegexURLPattern None ^$>]]}---------------------------------------------------------------------Ran 1 test in 0.003sFAILED (errors=1) We get a Resolver404 error, which is expected since Django shouldn't return anything at that path. Now we're ready to set up our URLs. API routing with DRF's SimpleRouter Take a look at the documentation for routers at http://www.django-rest-framework.org/api-guide/routers/. They're a very clean way of setting up URLs for DRF-powered views. Update jmad/urls.py like so: ...from rest_framework import routersfrom albums.views import AlbumViewSetrouter = routers.SimpleRouter()router.register(r'albums', AlbumViewSet)urlpatterns = [# Adminurl(r'^admin/', include(admin.site.urls)),# APIurl(r'^api/', include(router.urls)),# Appsurl(r'^recordings/(?P<album>[w-]+)/(?P<track>[w-]+)/(?P<artist>[w-]+)/$','solos.views.solo_detail',name='solo_detail_view'),url(r'^$', 'solos.views.index'),] Here's what we changed: We created an instance of SimpleRouter and used the register method to set up a route. The register method has two required arguments: a prefix to build the route methods from, and something called a viewset. Here we've supplied a non-existent class AlbumViewSet, which we'll come back to later. We've added a few comments to break up our urls.py, which was starting to look a little like a rat's nest. The actual API URLs are registered under the '^api/' path using Django's include function. Run the URL test again, and we'll get ImportError for AlbumViewSet. Let's add a stub to albums/views.py: class AlbumViewSet():pass Run the test now, and we'll start to see some specific DRF error messages to help us build out our view: $ python manage.py testalbums.tests.test_api.AlbumAPITestCase.test_album_list_routeCreating test database for alias 'default'...F...File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/sitepackages/rest_framework/routers.py", line 60, in registerbase_name = self.get_default_base_name(viewset)File "/Users/kevin/.virtualenvs/jmad/lib/python3.4/sitepackages/rest_framework/routers.py", line 135, inget_default_base_nameassert queryset is not None, ''base_name' argument not specified,and could ' AssertionError: 'base_name' argument not specified, and could notautomatically determine the name from the viewset, as it does nothave a '.queryset' attribute. After a fairly lengthy output, the test runner tells us that it was unable to get base_name for the URL, as we did not specify the base_name in the register method, and it couldn't guess the name because the viewset (AlbumViewSet) did not have a queryset attribute. In the router documentation, we came across the optional base_name argument for register (as well as the exact wording of this error). You can use that argument to control the name your URL gets. However, let's keep letting DRF do its default behavior. We haven't read the documentation for viewsets yet, but we know that a regular Django class-based view expects a queryset parameter. Let's stick one on AlbumViewSet and see what happens: from .models import Albumclass AlbumViewSet():queryset = Album.objects.all() Run the test again, and we get: django.core.urlresolvers.Resolver404: {'path': 'api/albums/','tried': [[<RegexURLResolver <RegexURLPattern list> (admin:admin)^admin/>], [<RegexURLPattern solo_detail_view^recordings/(?P<album>[w-]+)/(?P<track>[w-]+)/(?P<artist>[w-]+)/$>], [<RegexURLPattern None ^$>]]}---------------------------------------------------------------------Ran 1 test in 0.011sFAILED (errors=1) Huh? Another 404 is a step backwards. What did we do wrong? Maybe it's time to figure out what a viewset really is. Summary In this article, we covered basic API design and testing patterns, including the importance of documentation when developing an API. In doing so, we took a deep dive into Django REST Framework and the utilities and testing tools available in it. Resources for Article: Further resources on this subject: Test-driven API Development with Django REST Framework [Article] Adding a developer with Django forms [Article] Code Style in Django [Article]
Read more
  • 0
  • 0
  • 2211

article-image-oracle-goldengate-12c-overview
Packt
10 Aug 2015
21 min read
Save for later

Oracle GoldenGate 12c — An Overview

Packt
10 Aug 2015
21 min read
In this article by John P Jeffries, author of the book Oracle GoldenGate 12c Implementer's Guide, he provides an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. John tells you how to address some of the issues that influence the decision-making process when you design a GoldenGate solution. He focuses on the additional configuration options available in Oracle GoldenGate 12c (For more resources related to this topic, see here.) 12c new features Oracle has provided some exciting new features in their 12c version of GoldenGate, some of which we have already touched upon. Following the official desupport of Oracle Streams in Oracle Database 12c, Oracle has essentially migrated some of the key features to its strategic product. You will find that GoldenGate now has a tighter integration with the Oracle database, enabling enhanced functionality. Let's explore some of the new features available in Oracle GoldenGate 12c. Integrated capture Integrated capture has been available since Oracle GoldenGate 11gR2 with Oracle Database 11g (11.2.0.3). Originally decoupled from the database, GoldenGate's new architecture provides the option to integrate its Extract process(es) with the Oracle database. This enables GoldenGate to access the database's data dictionary and undo tablespace, providing replication support for advanced features and data types. Oracle GoldenGate 12c still supports the original Extract configuration, known as Classic Capture. Integrated Replicat Integrated Replicat is a new feature in Oracle GoldenGate 12c for the delivery of data to Oracle Database 11g (11.2.0.4) or 12c. The performance enhancement provides better scalability and load balancing that leverages the database parallel apply servers for automatic, dependency-aware parallel Replicat processes. With Integrated Replicat, there is no need for users to manually split the delivery process into multiple threads and manage multiple parameter files. GoldenGate now uses a lightweight streaming API to prepare, coordinate, and apply the data to the downstream database. Oracle GoldenGate 12c still supports the original Replicat configuration, known as Classic Delivery. Downstream capture Downstream capture was one of my favorite Oracle Stream features. It allows for a combined in-memory capture and apply process that achieves very low latency even in heavy data load situations. Like Streams, GoldenGate builds on this feature by employing a real-time downstream capture process. This method uses Oracle Data Guard's log transportation mechanism, which writes changed data to standby redo logs. It provides a best-of-both-worlds approach, enabling a real-time mine configuration that falls back to archive log mining when the apply process cannot keep up. In addition, the real-time mine process is re-enabled automatically when the data throughput is less. Installation One of the major changes in Oracle GoldenGate 12c is the installation method. Like other Oracle products, Oracle GoldenGate 12c is now installed using the Java-based Oracle Universal Installer (OUI) in either the interactive or silent mode. OUI reads the Oracle Inventory on your system to discover existing installations (Oracle Homes), allowing you to install, deinstall, or clone software products. Upgrading to 12c Whether you wish to upgrade your current GoldenGate installation from Oracle GoldenGate 11g Release 2 or from an earlier version, the steps are the same. Simply stop all the GoldenGate running processes on your database server, backup the GoldenGate home, and then use OUI to perform the fresh installation. It is important to note, however, while restarting replication, ensure the capture process begins from the point at which it was gracefully stopped to guarantee against lost synchronization data. Multitenant database replication As the version suggests, Oracle GoldenGate 12c now supports data replication for Oracle Database 12c. Those familiar with the 12c database features will be aware of the multitenant container database (CDB) that provides database consolidation. Each CDB consists of a root container and one or more pluggable databases (PDB). The PDB can contain multiple schemas and objects, just like a conventional database that GoldenGate replicates data to and from. The GoldenGate Extract process pulls data from multiple PDBs or containers in the source, combining the changed data into a single trail file. Replicat, however, splits the data into multiple process groups in order to apply the changes to a target PDB. Coordinated Delivery The Coordinated Delivery option applies to the GoldenGate Replicat process when configured in the classic mode. It provides a performance gain by automatically splitting the delivered data from a remote trail file into multiple threads that are then applied to the target database in parallel. GoldenGate manages the coordination across selected events that require ordering, including DDL, primary key updates, event marker interface (EMI), and SQLEXEC. Coordinated Delivery can be used with both Oracle (from version 11.2.0.4) and non-Oracle databases. Event-based processing In GoldenGate 12c, event-based processing has been enhanced to allow specific events to be captured and acted upon automatically through an EMI. SQLEXEC provides the API to EMI, enabling programmatic execution of tasks following an event. Now it is possible, for example, to detect the start of a batch job or large transaction, trap the SQL statement(s), and ignore the subsequent multiple change records until the end of the source system transaction. The original DML can then be replayed on the target database as one transaction. This is a major step forward in the performance tuning for data replication. Enhanced security Recent versions of GoldenGate have included security features such as the encryption of passwords and data. Oracle GoldenGate 12c now supports a credential store, better known as an Oracle wallet, that securely stores an alias associated with a username and password. The alias is then referenced in the GoldenGate parameter files rather than the actual username and password. Conflict Detection and Resolution In earlier versions of GoldenGate, Conflict Detection and Resolution (CDR) has been somewhat lightweight and was not readily available out of the box. Although available in Oracle Streams, the GoldenGate administrator would have to programmatically resolve any data conflict in the replication process using GoldenGate built-in tools. In the 12c version, the feature has emerged as an easily configurable option through Extract and Replicat parameters. Dynamic Rollback Selective data back out of applied transactions is now possible using the Dynamic Rollback feature. The feature operates at table and record-level and supports point-in-time recovery. This potentially eliminates the need for a full database restore, following data corruption, erroneous deletions, or perhaps the removal of test data, thus avoiding hours of system downtime. Streams to GoldenGate migration Oracle Streams users can now migrate their data replication solution to Oracle GoldenGate 12c using a purpose-built utility. This is a welcomed feature given that Streams is no longer supported in Oracle Database 12c. The Streams2ogg tool auto generates Oracle GoldenGate configuration files that greatly simplify the effort required in the migration process. Performance In today's demand for real-time access to real-time data, high performance is the key. For example, businesses will no longer wait for information to arrive on their DSS to make decisions and users will expect the latest information to be available in the public cloud. Data has value and must be delivered in real time to meet the demand. So, how long does it take to replicate a transaction from the source database to its target? This is known as end-to-end latency, which typically has a threshold that must not be breeched in order to satisfy a predefined Service Level Agreement (SLA). GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. They are as follows: Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database Replicat to target: The time taken for the last record to be processed by the Replicat process compared to the record creation time in the trail file A well-designed system may still encounter spikes in the latency, but it should never be continuous or growing. Peaks are typically caused by load on the source database system, where the latency increases with the number of transactions per second. Lag should be measured as an average over a specified period. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well, you may need to revisit the design. Availability Another important NFR is availability. Normally quoted as a percentage, the system must be available for the specified length of time. For example, NFR of 99.9 percent availability equates to a downtime of 8.76 hours in a year, which sounds quite a lot, especially if it were to occur all at once. Oracle's maximum availability architecture (MAA) offers enhanced availability through products such as Real Application Clusters (RAC) and Active Data Guard (ADG). However, as we previously described, the network plays a major role in data replication. The NFR relates to the whole system, so you need to be sure your design covers redundancy for all components. Event-based processing It is important in any data replication environment to capture and manage events, such as trail records containing specific data or operations or maybe the occurrence of a certain error. These are known as Event Markers. GoldenGate provides a mechanism to perform an action on a given event or condition. These are known as Event Actions and are triggered by Event Records. If you are familiar with Oracle Streams, Event Actions are like rules. The Event Marker System GoldenGate's Event Marker System, also known as event marker interface (EMI), allows custom DML-driven processing on an event. This comprises of an Event Record to trigger a given action. An Event Record can be either a trail record that satisfies a condition evaluated by a WHERE or FILTER clause or a record written to an event table that enables an action to occur. Typical actions are writing status information, reporting errors, ignoring certain records in a trail, invoking a shell script, or performing an administrative task. The following Replicat code describes the process of capturing an event and performing an action by logging DELETE operations made against the CREDITCARD_ACCOUNTS table using the EVENTACTIONS parameter: MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS_DIM;TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (LOG INFO); By default, all logged information is written to the process group report file, the GoldenGate error log, and the system messages file. On Linux, this is the /var/log/messages file. Note that the TABLE parameter is also used in the Replicat's parameter file. This is a means of triggering an Event Action to be executed by the Replicat when it encounters an Event Marker. The following code shows the use of the IGNORE option that prevents certain records from being extracted or replicated, which is particularly useful to filter out system type data. When used with the TRANSACTION option, the whole transaction and not just the Event Record is ignored: TABLE SRC.CREDITCARD_ACCOUNTS, &FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &EVENTACTIONS (IGNORE TRANSACTION); The preceding code extends the previous code by stopping the Event Record itself from being replicated. Using Event Actions to improve batch performance All replication technologies typically suffer from one flaw that is the way in which the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle will write each change to its redo logs as Logical Change Records (LCRs). GoldenGate will subsequently mine the logs, write the LCRs to a remote trail, convert each one back to DML, and apply them to the target database, one row at a time. The single source transaction becomes one million transactions, which causes a huge performance overhead. To overcome this issue, we can use Event Actions to: Detect the DML statement (INSERT INTO TABLE SELECT ..) Ignore the data resulting from the SELECT part of the statement Replicate just the DML statement as an Event Record Execute just the DML statement on the target database The solution requires a statement table on both source and target databases to trigger the event. Also, both databases must be perfectly synchronized to avoid data integrity issues. User tokens User tokens are GoldenGate environment variables that are captured and stored in the trail record for replication. They can be accessed via the @GETENV function. We can use token data in column maps, stored procedures called by SQLEXEC, and, of course, in macros. Using user tokens to populate a heartbeat table A vast array of user tokens exist in GoldenGate. Let's start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with the GoldenGate environment data. The following Extract configuration code shows the token declarations for the heartbeat table: TABLE GGADMIN.GG_HB_OUT, &TOKENS (EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML"), &), FILTER (@STREQ (EXTGROUP, @GETENV("GGENVIRONMENT","GROUPNAME"))); For the data pump, the example Extract configuration is shown here: TABLE GGADMIN.GG_HB_OUT, &TOKENS (PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &PMPLAG = @GETENV ("LAG","SEC")); Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag: MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &INSERTMISSINGUPDATES, &COLMAP (USEDEFAULTS, &ID = 0, &SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &EXTGROUP = @TOKEN ("EXTGROUP"), &EXTTIME = @TOKEN ("EXTTIME"), &PMPGROUP = @TOKEN ("PMPGROUP"), &PMPTIME = @TOKEN ("PMPTIME"), &REPGROUP = @TOKEN ("REPGROUP"), &REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &EXTLAG = @TOKEN ("EXTLAG"), &PMPLAG = @TOKEN ("PMPLAG"), &REPLAG = @GETENV ("LAG","SEC"), &EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL")); As in the heartbeat table example, the defined user tokens can be called in a MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explained. However, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful to measure the load on the source system when you evaluate latency peaks. For applications, user tokens are useful to audit data and trap exceptions within the replicated data stream. Common user tokens are shown in the following code that replicates the token data to five columns of an audit table: MAP SRC.AUDIT_LOG, TARGET TGT.AUDIT_LOG, &COLMAP (USEDEFAULTS, &OSUSER = @TOKEN ("TKN_OSUSER"), &DBNAME = @TOKEN ("TKN_DBNAME"), &HOSTNAME = @TOKEN ("TKN_HOSTNAME"), &TIMESTAMP = @TOKEN ("TKN_COMMITTIME"), &BEFOREAFTERINDICATOR = @TOKEN ("TKN_ BEFOREAFTERINDICATOR"); The BEFOREAFTERINDICATOR environment variable is particularly useful to provide a status flag in order to check whether the data was from a Before or After image of an UPDATE or DELETE operation. By default, GoldenGate provides After images. To enable a Before image extraction, the GETUPDATEBEFORES Extract parameter must be used on the source database. Using logic in the data replication GoldenGate has a number of functions that enable the administrator to program logic in the Extract and Replicat process configuration. These provide generic functions found in the IF and CASE programming languages. In addition, the @COLTEST function enables conditional calculations by testing for one or more column conditions. This is typically used with the @IF function, as shown in the following code: MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS_FACT,&COLMAP (USEDEFAULTS, &AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT)); Here, the @COLTEST function tests the AMOUNT column in the source data to check whether it is MISSING or INVALID. The @IF function returns 0 if @COLTEST returns TRUE and returns the value of AMOUNT if FALSE. The target AMOUNT column is therefore set to 0 when the equivalent source is found to be missing or invalid; otherwise, a direct mapping occurs. The @CASE function tests a list of values for a match and then returns a specified value. If no match is found, @CASE will return a default value. There is no limit to the number of cases to test; however, if the list is very large, a database lookup may be more appropriate. The following code shows the simplicity of the @CASE statement. Here, the country name is returned from the country code: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &COUNTRY = @CASE(COUNTRY_CODE, "UK", "United Kingdom", "USA","United States of America")); Other GoldenGate functions: @EVAL and @VALONEOF exist that perform tests. Similar to @CASE, @VALONEOF compares a column or string to a list of values. The difference being it evaluates more than one value against a single column or string. When the following code is used with @IF, it returns "EUROPE" when TRUE and "UNKNOWN" when FALSE: MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM,&COLMAP (USEDEFAULTS, &REGION = @IF(@VALONEOF(COUNTRY_CODE, "UK","E", "D"),"EUROPE","UNKNOWN")); The @EVAL function evaluates a list of conditions and returns a specified value. Optionally, if none are satisfied, it returns a default value. There is no limit to the number of evaluations you can list. However, it is best to list the most common evaluations at the beginning to enhance performance. The following code includes the BEFORE option that compares the before value of the replicated source column to the current value of the target column. Depending on the evaluation, @EVAL will return "PAID MORE", "PAID LESS", or "PAID SAME": MAP SRC.CREDITCARD_ PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, &COLMAP (USEDEFAULTS, &STATUS = @EVAL(AMOUNT < BEFORE.AMOUNT, "PAID LESS", AMOUNT > BEFORE.AMOUNT, "PAID MORE", AMOUNT = BEFORE.AMOUNT, "PAID SAME")); The BEFORE option can be used with other GoldenGate functions, including the WHERE and FILTER clauses. However, for the Before image to be written to the trail and to be available, the GETUPDATEBEFORES parameter must be enabled in the source database's Extract parameter file or the target database's Replicat parameter file, but not both. The GETUPDATEBEFORES parameter can be set globally for all tables defined in the Extract or individually per table using GETUPDATEBEFORES and IGNOREUPDATEBEFORES, as seen in the following code: EXTRACT EOLTP01USERIDALIAS srcdb DOMAIN adminSOURCECATALOG PDB1EXTTRAIL ./dirdat/aaGETAPPLOPSIGNOREREPLICATESGETUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS;IGNOREUPDATEBEFORESTABLE SRC.CHECK_PAYMENTS_STATUS;TABLE SRC.CREDITCARD_ACCOUNTS;TABLE SRC.CREDITCARD_PAYMENTS; Tracing processes to find wait events If you have worked with Oracle software, particularly in the performance tuning space, you will be familiar with tracing. Tracing enables additional information to be gathered from a given process or function to diagnose performance problems or even bugs. One example is the SQL trace that can be enabled at a database session or the system level to provide key information, such as; wait events, parse, fetch, and execute times. Oracle GoldenGate 12c offers a similar tracing mechanism through its trace and trace2 options of the SEND GGSCI command. This is like the session-level SQL trace. Also, in a similar fashion to performing a database system trace, tracing can be enabled in the GoldenGate process parameter files that make it permanent until the Extract or Replicat is stopped. trace provides processing information, whereas trace2 identifies the processes with wait events. The following commands show tracing being dynamically enabled for 2 minutes on a running Replicat process: GGSCI (db12server02) 1> send ROLAP01 trace2 ./dirrpt/ROLAP01.trc Wait for 2 minutes, then turn tracing off: GGSCI (db12server02) 2> send ROLAP01 trace2 offGGSCI (db12server02) 3> exit To view the contents of the Replicat trace file, we can execute the following command. In the case of a coordinated Replicat, the trace file will contain information from all of its threads: $ view dirrpt/ROLAP01.trcstatistics between 2015-08-08 Wed HKT 11:55:27 and 2015-08-08 Wed HKT11:57:28RPT_PROD_Ol.LIMIT_TP_RESP : n=2 : op=Insert; total=3; avg=1.5000;max=3msecRPT_PROD_01.SUP_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.EVENTS : n=1 : op=Insert; total=2; avg=2.0000; max=2msecRPT_PROD_01.DOC_SHIP_DTLS : n=17880 : op=FieldComp; total=22003;avg=1.2306; max=42msecRPT_PROD_01.BUY_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000;max=2msecRPT_PROD_01.LIMIT_TP_LOG : n=2 : op-Insert; total=2; avg=1.0000;max=2msecRPT_PROD_01.POOL_SMRY : n=1 : op=FieldComp; total=2; avg=2.0000;max=2msec..===============================================summary==============Delete : n=2; total=2; avg=1.00;Insert : n=78; total=356; avg=4.56;FieldComp : n=85728; total=123018; avg=1.43;total_op_num=85808 : total_op_time=123376 ms : total_avg_time=1.44ms/optotal commit number=1 The trace file provides the following information: The table name The operation type (FieldComp is for a compressed field) The number of operations The average wait The maximum wait Summary Armed with the preceding information, we can quickly see what operations against which tables are taking the longest time. Exception handling Oracle GoldenGate 12c now supports Conflict Detection and Resolution (CDR). However, out-of-the-box, GoldenGate takes a catch all approach to exception handling. For example, by default, should any operational failure occur, a Replicat process will ABEND and roll back the transaction to the last known checkpoint. This may not be ideal in a production environment. The HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters can be used to control whether or not a Replicat process tries to resolve the duplicate record error and the missing record error. The way to determine what error occurred and on which Replicat is to create an exceptions handler. Exception handling differs from CDR by trapping and reporting Oracle errors suffered by the data replication (DML and DDL). On the other hand, CDR detects and resolves inconsistencies in the replicated data, such as mismatches with before and after images. Exceptions can always be trapped by the Oracle error they produce. GoldenGate provides an exception handler parameter called REPERROR that allows the Replicat to continue processing data after a predefined error. For example, we can include the following configuration in our Replicat parameter file to ignore ORA-00001 "unique constraint (%s.%s) violated": REPERROR (DEFAULT, EXCEPTION)REPERROR (DEFAULT2, ABEND)REPERROR (-1, EXCEPTION) Cloud computing Cloud computing has grown enormously in the recent years. Oracle has named its latest version of products: 12c, the c standing for Cloud of course. The architecture of Oracle 12c Database allows a multitenant container database to support multiple pluggable databases—a key feature of cloud computing—rather than implement the inefficient schema consolidation, typical of the previous Oracle database version architecture, which is known to cause contention on shared resources during high load. The Oracle 12c architecture supports a database consolidation approach through its efficient memory management and dedicated background processes. Online computer companies such as Amazon have leveraged the cloud concept by offering Relational Database Services (RDS), which is becoming very popular for its speed of readiness, support, and low cost. The cloud environments are often huge, containing hundreds of servers, petabytes of storage, terabytes of memory, and countless CPU cores. The cloud has to support multiple applications in a multi-tiered, shared environment, often through virtualization technologies, where storage and CPUs are typically the driving factors for cost-effective options. Customers choose their hardware footprint that best suits their budget and system requirements, commonly known as Platform as a Service (PaaS). Cloud computing is an extension to grid computing that offers both public and private clouds. GoldenGate and Big Data It is increasingly evident that organizations need to quickly access, analyze, and report on their data across their Enterprise in order to be agile in a competitive market. Data is becoming more of an asset to companies; it adds value to a business, but may be stored in any number of current and legacy systems, making it difficult to realize its full potential. Known as big data, it has until recently been nearly impossible to perform real-time business analysis on the combined data from multiple sources. Nowadays, the ability to access all transactional data with low latency is essential. With the introduction of products such as Apache Hadoop, integration of structured data from an RDBMS, including semi-structured and unstructured data, offers a common playing field to support business intelligence. When coupled with ODI, GoldenGate for big data provides real-time delivery to a suite of Apache products, such as Flume, HDFS, Hive, and Hbase, to support big data analytics. Summary In this article, we have learned an introduction to Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. Resources for Article: Further resources on this subject: What is Oracle Public Cloud? [Article] Oracle GoldenGate- Advanced Administration Tasks - I [Article] Oracle B2B Overview [Article]
Read more
  • 0
  • 0
  • 6636

article-image-introduction-wep
Packt
10 Aug 2015
4 min read
Save for later

An Introduction to WEP

Packt
10 Aug 2015
4 min read
In this article by Marco Alamanni, author of the book, Kali Linux Wireless Penetration Testing Essentials, has explained that the WEP protocol was introduced with the original 802.11 standard as a means to provide authentication and encryption to wireless LAN implementations. It is based on the RC4 (Rivest Cipher 4) stream cypher with a preshared secret key (PSK) of 40 or 104 bits, depending on the implementation. A 24 bit pseudo-random Initialization Vector (IV) is concatenated with the preshared key to generate the per-packet keystream used by RC4 for the actual encryption and decryption processes. Thus, the resulting keystream could be 64 or 128 bits long. (For more resources related to this topic, see here.) In the encryption phase, the keystream is XORed with the plaintext data to obtain the encrypted data, while in the decryption phase the encrypted data is XORed with the keystream to obtain the plaintext data. The encryption process is shown in the following diagram: Attacks against WEP First of all, we must say that WEP is an insecure protocol and has been deprecated by the Wi-Fi Alliance. It suffers from various vulnerabilities related to the generation of the keystreams, to the use of IVs and to the length of the keys. The IV is used to add randomness to the keystream, trying to avoid the reuse of the same keystream to encrypt different packets. This purpose has not been accomplished in the design of WEP, because the IV is only 24 bits long (with 2^24 = 16,777,216 possible values) and it is transmitted in clear-text within each frame. Thus, after a certain period of time (depending on the network traffic) the same IV, and consequently the same keystream, will be reused, allowing the attacker to collect the relative cypher texts and perform statistical attacks to recover the plain texts and the key. The first well-known attack against WEP was the Fluhrer, Mantin and Shamir (FMS) attack, back in 2001. The FMS attack relies on the way WEP generates the keystreams and on the fact that it also uses weak IVs to generate weak keystreams, making possible for an attacker to collect a sufficient number of packets encrypted with these keystreams, analyze them, and recover the key. The number of IVs to be collected to complete the FMS attack is about 250,000 for 40-bit keys and 1,500,000 for 104-bit keys. The FMS attack has been enhanced by Korek, improving its performances. Andreas Klein found more correlations between the RC4 keystream and the key than the ones discovered by Fluhrer, Mantin, and Shamir, that can used to crack the WEP key. In 2007, Pyshkin, Tews, and Weinmann (PTW) extended Andreas Klein's research and improved the FMS attack, significantly reducing the number of IVs needed to successfully recover the WEP key. Indeed, the PTW attack does not rely on weak IVs like the FMS attack does and is very fast and effective. It is able to recover a 104-bit WEP key with a success probability of 50 percent using less than 40,000 frames and with a probability of 95 percent with 85,000 frames. The PTW attack is the default method used by Aircrack-ng to crack WEP keys. Both the FMS and PTW attacks need to collect quite a large number of frames to succeed and can be conducted passively, sniffing the wireless traffic on the same channel of the target AP and capturing frames. The problem is that, in normal conditions, we will have to spend quite a long time to passively collect all the necessary packets for the attacks, especially with the FMS attack. To accelerate the process, the idea is to re-inject frames in the network to generate traffic in response so that we could collect the necessary IVs more quickly. A type of frame that is suitable for this purpose is the ARP request, because the AP broadcasts it and each time with a new IV. As we are not associated with the AP, if we send frames to it directly, they are discarded and a de-authentication frame is sent. Instead, we can capture ARP requests from associated clients and retransmit them to the AP. This technique is called the ARP Request Replay attack and is also adopted by Aircrack-ng for the implementation of the PTW attack. Summary In this article, we covered the WEP protocol, the attacks that have been developed to crack the keys. Resources for Article: Further resources on this subject: Kali Linux – Wireless Attacks [article] What is Kali Linux [article] Penetration Testing [article]
Read more
  • 0
  • 0
  • 9674

article-image-controls-and-widgets
Packt
10 Aug 2015
25 min read
Save for later

Controls and Widgets

Packt
10 Aug 2015
25 min read
In this article by Chip Lambert and Shreerang Patwardhan, author of the book, Mastering jQuery Mobile, we will take our Civic Center application to the next level and in the process of doing so, we will explore different widgets. We will explore the touch events provided by the jQuery Mobile framework further and then take a look at how this framework interacts with third-party plugins. We will be covering the following different widgets and topics in this article: Collapsible widget Listview widget Range slider widget Radio button widget Touch events Third-party plugins HammerJs FastClick Accessibility (For more resources related to this topic, see here.) Widgets We already made use of widgets as part of the Civic Center application. "Which? Where? When did that happen? What did I miss?" Don't panic as you have missed nothing at all. All the components that we use as part of the jQuery Mobile framework are widgets. The page, buttons, and toolbars are all widgets. So what do we understand about widgets from their usage so far? One thing is pretty evident, widgets are feature-rich and they have a lot of things that are customizable and that can be tweaked as per the requirements of the design. These customizable things are pretty much the methods and events that these small plugins offer to the developers. So all in all: Widgets are feature rich, stateful plugins that have a complete lifecycle, along with methods and events. We will now explore a few widgets as discussed before and we will start off with the collapsible widget. A collapsible widget, more popularly known as the accordion control, is used to display and style a cluster of related content together to be easily accessible to the user. Let's see this collapsible widget in action. Pull up the index.html file. We will be adding the collapsible widget to the facilities page. You can jump directly to the content div of the facilities page. We will replace the simple-looking, unordered list and add the collapsible widget in its place. Add the following code in place of the <ul>...<li></li>...</ul> portion: <div data-role="collapsibleset"> <div data-role="collapsible"> <h3>Banquet Halls</h3> <p>List of banquet halls will go here</p> </div> <div data-role="collapsible"> <h3>Sports Arena</h3> <p>List of sports arenas will go here</p> </div> <div data-role="collapsible">    <h3>Conference Rooms</h3> <p>List of conference rooms will come here</p> </div> <div data-role="collapsible"> <h3>Ballrooms</h3> <p>List of ballrooms will come here</p> </div> </div> That was pretty simple. As you must have noticed, we are creating a group of collapsibles defined by div with data-role="collapsibleset". Inside this div, we have multiple div elements each with data-role of "collapsible". These data roles instruct the framework to style div as a collapsible. Let's break individual collapsibles further. Each collapsible div has to have a heading tag (h1-h6), which acts as the title for that collapsible. This heading can be followed by any HTML structure that is required as per your application's design. In our application, we added a paragraph tag with some dummy text for now. We will soon be replacing this text with another widget—listview. Before we proceed to look at how we will be doing this, let's see what the facilities page is looking like right now: Now let's take a look at another widget that we will include in our project—the listview widget. The listview widget is a very important widget from the mobile website stand point. The listview widget is highly customizable and can play an important role in the navigation system of your web application as well. In our application, we will include listview within the collapsible div elements that we have just created. Each collapsible will hold the relevant list items which can be linked to a detailed page for each item. Without further discussion, let's take a look at the following code. We have replaced the contents of the first collapsible list item within the paragraph tag with the code to include the listview widget. We will break up the code and discuss the minute details later: <div data-role="collapsible"> <h3>Banquet Halls</h3> <p> <span>We have 3 huge banquet halls named after 3 most celebrated Chef's from across the world.</span> <ul data-role="listview" data-inset="true"> <li> <a href="#">Gordon Ramsay</a> </li> <li> <a href="#">Anthony Bourdain</a> </li> <li> <a href="#">Sanjeev Kapoor</a> </li> </ul> </p> </div> That was pretty simple, right? We replaced the dummy text from the paragraph tag with a span that has some details concerning what that collapsible list is about, and then we have an unordered list with data-role="listview" and some property called data-inset="true". We have seen several data-roles before, and this one is no different. This data-role attribute informs the framework to style the unordered list, such as a tappable button, while a data-inset property informs the framework to apply the inset appearance to the list items. Without this property, the list items would stretch from edge to edge on the mobile device. Try setting the data-inset property to false or removing the property altogether. You will see the results for yourself. Another thing worth noticing in the preceding code is that we have included an anchor tag within the li tags. This anchor tag informs the framework to add a right arrow icon on the extreme right of that list item. Again, this icon is customizable, along with its position and other styling attributes. Right now, our facilities page should appear as seen in the following image: We will now add similar listview widgets within the remaining three collapsible items. The content for the next collapsible item titled Sports Arena should be as follows. Once added, this collapsible item, when expanded, should look as seen in the screenshot that follows the code: <div data-role="collapsible">    <h3>Sports Arena</h3>    <p>        <span>We have 3 huge sport arenas named after 3 most celebrated sport personalities from across the world.       </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Sachin Tendulkar</a>            </li>            <li>                <a href="#">Roger Federer</a>            </li>            <li>                <a href="#">Usain Bolt</a>            </li>        </ul>    </p> </div> The code for the listview widgets that should be included in the next collapsible item titled Conference Rooms. Once added, this collapsible, item when expanded, should look as seen in the image that follows the code: <div data-role="collapsible">    <h3>Conference Rooms</h3>    <p>        <span>            We have 3 huge conference rooms named after 3 largest technology companies.        </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Google</a>            </li>            <li>                <a href="#">Twitter</a>            </li>            <li>                <a href="#">Facebook</a>            </li>        </ul>    </p> </div> The final collapsible list item – Ballrooms – should hold the following code, to include its share of the listview items: <div data-role="collapsible">    <h3>Ballrooms</h3>    <p>        <span>            We have 3 huge ball rooms named after 3 different dance styles from across the world.        </span>        <ul data-role="listview" data-inset="true">            <li>                <a href="#">Ballet</a>            </li>            <li>                <a href="#">Kathak</a>            </li>            <li>                <a href="#">Paso Doble</a>            </li>        </ul>    </p> </div> After adding these listview items, our facilities page should look as seen in the following image: The facilities page now looks much better than it did earlier, and we now understand a couple more very important widgets available in jQuery Mobile—the collapsible widget and the listview Widget. We will now explore two form widgets – slider widget and the radio buttons widget. For this, we will be enhancing our catering page. Let's build a simple tool that will help the visitors of this site estimate the food expense based on the number of guests and the type of cuisine that they choose. Let's get started then. First, we will add the required HTML, to include the slider widget and the radio buttons widget. Scroll down to the content div of the catering page, where we have the paragraph tag containing some text about the Civic Center's catering services. Add the following code after the paragraph tag: <form>    <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label>    <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50">    <fieldset data-role="controlgroup" id="cuisine-choices">        <legend style="font-weight: bold; padding: 15px 0px;">Choose your cuisine</legend>        <input type="radio" name="cuisine-choice" id="cuisine-choice-cont" value="15" checked="checked" />        <label for="cuisine-choice-cont">Continental</label>        <input type="radio" name="cuisine-choice" id="cuisine-choice-mex" value="12" />        <label for="cuisine-choice-mex">Mexican</label>        <input type="radio" name="cuisine-choice" id="cuisine-choice-ind" value="14" />        <label for="cuisine-choice-ind">Indian</label>    </fieldset>    <p>        The approximate cost will be: <span style="font-weight: bold;" id="totalCost"></span>    </p> </form> That is not much code, but we are adding and initializing two new form widgets here. Let's take a look at the code in detail: <label style="font-weight: bold; padding: 15px 0px;" for="slider">Number of guests</label> <input type="range" name="slider" id="slider" data-highlight="true" min="50" max="1000" value="50"> We are initializing our first form widget here—the slider widget. The slider widget is an input element of the type range, which accepts a minimum value and maximum value and a default value. We will be using this slider to accept the number of guests. Since the Civic Center can cater to a maximum of 1,000 people, we will set the maximum limit to 1,000 and we expect that we have at least 50 guests, so we set a minimum value of 50. Since the minimum number of guests that we cater for is 50, we set the input's default value to 50. We also set the data-highlight attribute value to true, which informs the framework that the selected area on the slider should be highlighted. Next comes the group of radio buttons. The most important attribute to be considered here is the data-role="controlgroup" set on the fieldset element. Adding this data-role combines the radio buttons into one single group, which helps inform the user that one of the radio buttons is to be selected. This gives a visual indication to the user that one radio button out of the whole lot needs to be selected. The values assigned to each of the radio inputs here indicate the cost per person for that particular cuisine. This value will help us calculate the final dollar value for the number of selected guests and the type of cuisine. Whenever you are using the form widgets, make sure you have the form elements in the hierarchy as required by the jQuery Mobile framework. When the elements are in the required hierarchy, the framework can apply the required styles. At the end of the previous code snippet, we have a paragraph tag where we will populate the approximate cost of catering for the selected number of guests and the type of cuisine selected. The catering page should now look as seen in the following image. Right now, we only have the HTML widgets in place. When you drag the slider or select different radio buttons, you will only see the UI interactions of these widgets and the UI treatments that the framework applies to these widgets. However, the total cost will not be populated yet. We will need to write some JavaScript logic to determine this value, and we will take a look at this in a minute. Before moving to the JavaScript part, make sure you have all the code that is needed: Now let's take a look at the magic part of the code (read JavaScript) that is going to make our widgets usable for the visitors of this Civic Center web application. Add the following JavaScript code in the script tag at the very end of our index.html file: $(document).on('pagecontainershow', function(){    var guests = 50;    var cost = 35;    var totalCost;    $("#slider").on("slidestop", function(event, ui){        guests = $('#slider').val();        totalCost = costCal();        $("#totalCost").text("$" + totalCost);    });    $("input:radio[name=cuisine-choice]").on("click", function() {        cost = $(this).val();        var totalCost = costCal();        $("#totalCost").text("$" + totalCost);    });    function costCal(){        return guests * cost;    } }); That is a pretty small chunk of code and pretty simple too. We will be looking at a few very important events that are part of the framework and that come in very handy when developing web applications with jQuery Mobile. One of the most important things that you must have already noticed is that we are not making use of the customary $(document).on('ready', function(){ in Jquery, but something that looks as the following code: $(document).on('pagecontainershow', function(){ The million dollar question here is "why doesn't DOM already work in jQuery Mobile?" As part of jQuery, the first thing that we often learn to do is execute our jQuery code as soon as the DOM is ready, and this is identified using the $(document).ready function. In jQuery Mobile, pages are requested and injected into the same DOM as the user navigates from one page to another and so the DOM ready event is as useful as it executes only for the first page. Now we need an event that should execute when every page loads, and $(document).pagecontainershow is the one. The pagecontainershow element is triggered on the toPage after the transition animation has completed. The pagecontainershow element is triggered on the pagecontainer element and not on the actual page. In the function, we initialize the guests and the cost variables to 50 and 35 respectively, as the minimum number of guests we can have is 50 and the "Continental" cuisine is selected by default, which has a value of 35. We will be calculating the estimated cost when the user changes the number of guests or selects a different radio button. This brings us to the next part of our code. We need to get the value of the number of guests as soon as the user stops sliding the slider. jQuery Mobile provides us with the slidestop event for this very purpose. As soon as the user stops sliding, we get the value of the slider and then call the costCal function, which returns a value that is the number of guests multiplied by the cost of the selected cuisine per person. We then display this value in the paragraph at the bottom for the user to get an estimated cost. We will discuss some more about the touch events that are available as part of the jQuery Mobile framework in the next section. When the user selects a different radio button, we retrieve the value of the selected radio button, call the costCal function again, and update the value displayed in the paragraph at the bottom of our page. If you have the code correct and your functions are all working fine, you should see something similar to the following image: Input with touch We will take a look at a couple of touch events, which are tap and taphold. The tap event is triggered after a quick touch; whereas the taphold event is triggered after a sustained, long press touch. The jQuery Mobile tap event is the gesture equivalent of the standard click event that is triggered on the release of the touch gesture. The following snippet of code should help you incorporate the tap event when you need to use it in your application: $(".selector").on("tap", function(){    console.log("tap event is triggered"); }); The jQuery Mobile taphold event triggers after a sustained, complete touch event, which is more commonly known as the long press event. The taphold event fires when the user taps and holds for a minimum of 750 milliseconds. You can also change the default value, but we will come to that in a minute. First, let's see how the taphold event is used: $(".selector").on("taphold", function(){    console.log("taphold event is triggered"); }); Now to change the default value for the long press event, we need to set the value for the following piece of code: $.event.special.tap.tapholdThreshold Working with plugins A number of times, we will come across scenarios where the capabilities of the framework are just not sufficient for all the requirements of your project. In such scenarios, we have to make use of third-party plugins in our project. We will be looking at two very interesting plugins in the course of this article, but before that, you need to understand what jQuery plugins exactly are. A jQuery plugin is simply a new method that has been used to extend jQuery's prototype object. When we include the jQuery plugin as part of our code, this new method becomes available for use within your application. When selecting jQuery plugins for your jQuery Mobile web application, make sure that the plugin is optimized for mobile devices and incorporates touch events as well, based on your requirements. The first plugin that we are going to look at today is called FastClick and is developed by FT Labs. This is an open source plugin and so can be used as part of your application. FastClick is a simple, easy-to-use library designed to eliminate the 300 ms delay between a physical tap and the firing on the click event on mobile browsers. Wait! What are we talking about? What is this 300 ms delay between tap and click? What exactly are we discussing? Sure. We understand the confusion. Let's explain this 300 ms delay issue. The click events have a 300 ms delay on touch devices, which makes web applications feel laggy on a mobile device and doesn't give users a native-like feel. If you go to a site that isn't mobile-optimized, it starts zoomed out. You have to then either pinch and zoom or double tap some content so that it becomes readable. The double-tap is a performance killer, because with every tap we have to wait to see whether it might be a double tap—and this wait is 300 ms. Here is how it plays out: touchstart touchend Wait 300ms in case of another tap click This pause of 300 ms applies to click events in JavaScript, but also other click-based interactions such as links and form controls. Most mobile web browsers out there have this 300 ms delay on the click events, but now a few modern browsers such as Chrome and FireFox for Android and iOS are removing this 300 ms delay. However, if you are supporting the older Android and iOS versions, with older mobile browsers, you might want to consider including the FastClick plugin in your application, which helps resolve this problem. Let's take a look at how we can use this plugin in any web application. First, you need to download the plugin files, or clone their GitHub repository here: https://github.com/ftlabs/fastclick. Once you have done that, include a reference to the plugin's JavaScript file in your application: <script type="application/javascript" src="path/fastclick.js"></script> Make sure that the script is loaded prior to instantiating FastClick on any element of the page. FastClick recommends you to instantiate the plugin on the body element itself. We can do this using the following piece of code: $(function){    FastClick.attach(document.body); } That is it! Your application is now free of the 300 ms click delay issue and will work as smooth as a native application. We have just provided you with an introduction to the FastClick plugin. There are several more features that this plugin provides. Make sure you visit their website—https://github.com/ftlabs/fastclick—for more details on what the plugin has to offer. Another important plugin that we will look at is HammerJs. HammerJs, again is an open source library that helps recognize gestures made by touch, mouse, and pointerEvents. Now, you would say that the jQuery Mobile framework already takes care of this, so why do we need a third-party plugin again? True, jQuery Mobile supports a variety of touch events such as tap, tap and hold, and swipe, as well as the regular mouse events, but what if in our application we want to make use of some touch gestures such as pan, pinch, rotate, and so on, which are not supported by jQuery Mobile by default? This is where HammerJs comes into the picture and plays nicely along with jQuery Mobile. Including HammerJS in your web application code is extremely simple and straightforward, like the FastClick plugin. You need to download the plugin files and then add a reference to the plugin JavaScript file: <script type="application/javascript" src="path/hammer.js"></script> Once you have included the plugin, you need to create a new instance on the Hammer object and then start using the plugin for all the touch gestures you need to support: var hammerPan = new Hammer(element_name, options); hammerPan.on('pan', function(){    console.log("Inside Pan event"); }); By default, Hammer adds a set of events—tap, double tap, swipe, pan, press, pinch, and rotate. The pinch and rotate recognizers are disabled by default, but can be turned on as and when required. HammerJS offers a lot of features that you might want to explore. Make sure you visit their website—http://hammerjs.github.io/ to understand the different features the library has to offer and how you can integrate this plugin within your existing or new jQuery Mobile projects. Accessibility Most of us today cannot imagine our lives without the Internet and our smartphones. Some will even argue that the Internet is the single largest revolutionary invention of all time that has touched numerous lives across the globe. Now, at the click of a mouse or the touch of your fingertip, the world is now at your disposal, provided you can use the mouse, see the screen, and hear the audio—impairments might make it difficult for people to access the Internet. This makes us wonder about how people with disabilities would use the Internet, their frustration in doing so, and the efforts that must be taken to make websites accessible to all. Though estimates vary on this, most studies have revealed that about 15% of the world's population have some kind of disability. Not all of these people would have an issue with accessing the web, but let's assume 5% of these people would face a problem in accessing the web. This 5% is also a considerable amount of users, which cannot be ignored by businesses on the web, and efforts must be taken in the right direction to make the web accessible to these users with disabilities. jQuery Mobile framework comes with built-in support for accessibility. jQuery Mobile is built with accessibility and universal access in mind. Any application that is built using jQuery Mobile is accessible via the screen reader as well. When you make use of the different jQuery Mobile widgets in your application, unknowingly you are also adding support for web accessibility into your application. jQuery Mobile framework adds all the necessary aria attributes to the elements in the DOM. Let's take a look at how the DOM looks for our facilities page: Look at the highlighted Events button in the top right corner and its corresponding HTML (also highlighted) in the developer tools. You will notice that there are a few attributes added to the anchor tag that start with aria-. We did not add any of these aria- attributes when we wrote the code for the Events button. jQuery Mobile library takes care of these things for you. The accessibility implementation is an ongoing process and the awesome developers at jQuery Mobile are working towards improving the support every new release. We spoke about aria- attributes, but what do they really represent? WAI - ARIA stands for Web Accessibility Initiative – Accessible Rich Internet Applications. This was a technical specification published by the World Wide Web Consortium (W3C) and basically specifies how to increase the accessibility of web pages. ARIA specifies the roles, properties, and states of a web page that make it accessible to all users. Accessibility is extremely vast, hence covering every detail of it is not possible. However, there is excellent material available on the Internet on this topic and we encourage you to read and understand this. Try to implement accessibility into your current or next project even if it is not based on jQuery Mobile. Web accessibility is an extremely important thing that should be considered, especially when you are building web applications that will be consumed by a huge consumer base—on e-commerce websites for example. Summary In this article, we made use of some of the available widgets from the jQuery Mobile framework and we built some interactivity into our existing Civic Center application. The widgets that we used included the range slider, the collapsible widget, the listview widget, and the radio button widget. We evaluated and looked at how to use two different third-party plugins—FastClick and HammerJs. We concluded the article by taking a look at the concept of Web Accessibility. Resources for Article: Further resources on this subject: Creating Mobile Dashboards [article] Speeding up Gradle builds for Android [article] Saying Hello to Unity and Android [article]
Read more
  • 0
  • 0
  • 3200
article-image-using-arcpy-dataaccess-module-withfeature-classesand-tables
Packt
10 Aug 2015
28 min read
Save for later

Using the ArcPy DataAccess Module withFeature Classesand Tables

Packt
10 Aug 2015
28 min read
In this article by Eric Pimpler, author of the book Programming ArcGIS with Python Cookbook - Second Edition, we will cover the following topics: Retrieving features from a feature class with SearchCursor Filtering records with a where clause Improving cursor performance with geometry tokens Inserting rows with InsertCursor Updating rows with UpdateCursor (For more resources related to this topic, see here.) We'll start this article with a basic question. What are cursors? Cursors are in-memory objects containing one or more rows of data from a table or feature class. Each row contains attributes from each field in a data source along with the geometry for each feature. Cursors allow you to search, add, insert, update, and delete data from tables and feature classes. The arcpy data access module or arcpy.da was introduced in ArcGIS 10.1 and contains methods that allow you to iterate through each row in a cursor. Various types of cursors can be created depending on the needs of developers. For example, search cursors can be created to read values from rows. Update cursors can be created to update values in rows or delete rows, and insert cursors can be created to insert new rows. There are a number of cursor improvements that have been introduced with the arcpy data access module. Prior to the development of ArcGIS 10.1, cursor performance was notoriously slow. Now, cursors are significantly faster. Esri has estimated that SearchCursors are up to 30 times faster, while InsertCursors are up to 12 times faster. In addition to these general performance improvements, the data access module also provides a number of new options that allow programmers to speed up processing. Rather than returning all the fields in a cursor, you can now specify that a subset of fields be returned. This increases the performance as less data needs to be returned. The same applies to geometry. Traditionally, when accessing the geometry of a feature, the entire geometric definition would be returned. You can now use geometry tokens to return a portion of the geometry rather than the full geometry of the feature. You can also use lists and tuples rather than using rows. There are also other new features, such as edit sessions and the ability to work with versions, domains, and subtypes. There are three cursor functions in arcpy.da. Each returns a cursor object with the same name as the function. SearchCursor() creates a read-only SearchCursor object containing rows from a table or feature class. InsertCursor() creates an InsertCursor object that can be used to insert new records into a table or feature class. UpdateCursor() returns a cursor object that can be used to edit or delete records from a table or feature class. Each of these cursor objects has methods to access rows in the cursor. You can see the relationship between the cursor functions, the objects they create, and how they are used, as follows: Function Object created Usage SearchCursor() SearchCursor This is a read-only view of data from a table or feature class InsertCursor() InsertCursor This adds rows to a table or feature class UpdateCursor() UpdateCursor This edits or deletes rows in a table or feature class The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. Once you've obtained a cursor instance, it is common to iterate the records, particularly with SearchCursor or UpdateCursor. There are some peculiarities that you need to understand when navigating the records in a cursor. Cursor navigation is forward-moving only. When a cursor is created, the pointer of the cursor sits just above the first row in the cursor. The first call to next() will move the pointer to the first row. Rather than calling the next() method, you can also use a for loop to process each of the records without the need to call the next() method. After performing whatever processing you need to do with this row, a subsequent call to next() will move the pointer to row 2. This process continues as long as you need to access additional rows. However, after a row has been visited, you can't go back a single record at a time. For instance, if the current row is row 3, you can't programmatically back up to row 2. You can only go forward. To revisit rows 1 and 2, you would need to either call the reset() method or recreate the cursor and move back through the object. As I mentioned earlier, cursors are often navigated through the use of for loops as well. In fact, this is a more common way to iterate a cursor and a more efficient way to code your scripts. Cursor navigation is illustrated in the following diagram: The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. To insert rows, call the insertRow() method on this object. You can also retrieve a read-only tuple containing the field names in use by the cursor through the fields property. A lock is placed on the table or feature class being accessed through the cursor. Therefore, it is important to always design your script in a way that releases the cursor when you are done. The UpdateCursor() function can be used to create an UpdateCursor object that can update and delete rows in a table or feature class. As is the case with InsertCursor, this function places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python's with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case. Prior to ArcGIS 10.1, cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateCursor() method to update records in tables or feature classes and the deleteRow() method to delete a row. The subject of data locks requires a little more explanation. The insert and update cursors must obtain a lock on the data source they reference. This means that no other application can concurrently access this data source. Locks are a way of preventing multiple users from changing data at the same time and thus, corrupting the data. When the InsertCursor() and UpdateCursor() methods are called in your code, Python attempts to acquire a lock on the data. This lock must be specifically released after the cursor has finished processing so that the running applications of other users, such as ArcMap or ArcCatalog, can access the data sources. If this isn't done, no other application will be able to access the data. Prior to ArcGIS 10.1 and the with statement, cursors had to be specifically unlocked through Python's del statement. Similarly, ArcMap and ArcCatalog also acquire data locks when updating or deleting data. If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors. In this article, we're going to cover the use of cursors to access and edit tables and feature classes. However, many of the cursor concepts that existed before ArcGIS 10.1 still apply. Retrieving features from a feature class with SearchCursor There are many occasions when you need to retrieve rows from a table or feature class for read-only purposes. For example, you might want to generate a list of all land parcels in a city with a value greater than $100,000. In this case, you don't have any need to edit the data. Your needs are met simply by generating a list of rows that meet some sort of criteria. A SearchCursor object contains a read-only copy of rows from a table or feature class. These objects can also be filtered through the use of a where clause so that only a subset of the dataset is returned. Getting ready The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. In this article, you will learn how to create a basic SearchCursor object on a feature class through the use of the SearchCursor() function. The SearchCursor object contains a fields property along with the next() and reset() methods. The fields property is a read-only structure in the form of a Python tuple, containing the fields requested from the feature class or table. You are going to hear the term tuple a lot in conjunction with cursors. If you haven't covered this topic before, tuples are a Python structure to store a sequence of data similar to Python lists. However, there are some important differences between Python tuples and lists. Tuples are defined as a sequence of values inside parentheses, while lists are defined as a sequence of values inside brackets. Unlike lists, tuples can't grow and shrink, which can be a very good thing in some cases when you want data values to occupy a specific position each time. This is the case with cursor objects that use tuples to store data from fields in tables and feature classes. How to do it… Follow these steps to learn how to retrieve rows from a table or feature class inside a SearchCursor object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8SearchCursor.py. Import the arcpy.da module: import arcpy.da Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" Use a Python with statement to create a cursor: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor: Loop through each row in SearchCursor and print the name of the school. Make sure you indent the for loop inside the with block:for row in sorted(cursor): print("School name: " + row[1]) The entire script should appear as follows: import arcpy.da arcpy.env.workspace = "c:/ArcpyBook/Ch8" with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor:    for row in sorted(cursor):        print("School name: " + row[1]) Save the script. You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step1.py solution file. Run the script. You should see the following output: School name: ALLAN School name: ALLISON School name: ANDREWS School name: BARANOFF School name: BARRINGTON School name: BARTON CREEK School name: BARTON HILLS School name: BATY School name: BECKER School name: BEE CAVE How it works… The with statement used with the SearchCursor() function will create, open, and close the cursor. So, you no longer have to be concerned with explicitly releasing the lock on the cursor as you did prior to ArcGIS 10.1. The first parameter passed into the SearchCursor() function is a feature class, represented by the Schools.shp file. The second parameter is a Python tuple containing a list of fields that we want returned in the cursor. For performance reasons, it is a best practice to limit the fields returned in the cursor to only those that you need to complete the task. Here, we've specified that only the Facility and Name fields should be returned. The SearchCursor object is stored in a variable called cursor. Inside the with block, we use a Python for loop to loop through each school returned. We also use the Python sorted() function to sort the contents of the cursor. To access the values from a field on the row, simply use the index number of the field you want to return. In this case, we want to return the contents of the Name column, which will be the 1 index number, since it is the second item in the tuple of field names that are returned. Filtering records with a where clause By default, SearchCursor will contain all rows in a table or feature class. However, in many cases, you will want to restrict the number of rows returned by some sort of criteria. Applying a filter through the use of a where clause limits the records returned. Getting ready By default, all rows from a table or feature class will be returned when you create a SearchCursor object. However, in many cases, you will want to restrict the records returned. You can do this by creating a query and passing it as a where clause parameter when calling the SearchCursor() function. In this article, you'll build on the script you created in the previous article by adding a where clause that restricts the records returned. How to do it… Follow these steps to apply a filter to a SearchCursor object that restricts the rows returned from a table or feature class: Open IDLE and load the SearchCursor.py script that you created in the previous recipe. Update the SearchCursor() function by adding a where clause that queries the facility field for records that have the HIGH SCHOOL text: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name"), '"FACILITY" = 'HIGH SCHOOL'') as cursor: You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step2.py solution file. Save and run the script. The output will now be much smaller and restricted to high schools only: High school name: AKINS High school name: ALTERNATIVE LEARNING CENTER High school name: ANDERSON High school name: AUSTIN High school name: BOWIE High school name: CROCKETT High school name: DEL VALLE High school name: ELGIN High school name: GARZA High school name: HENDRICKSON High school name: JOHN B CONNALLY High school name: JOHNSTON High school name: LAGO VISTA How it works… The where clause parameter accepts any valid SQL query, and is used in this case to restrict the number of records that are returned. Improving cursor performance with geometry tokens Geometry tokens were introduced in ArcGIS 10.1 as a performance improvement for cursors. Rather than returning the entire geometry of a feature inside the cursor, only a portion of the geometry is returned. Returning the entire geometry of a feature can result in decreased cursor performance due to the amount of data that has to be returned. It's significantly faster to return only the specific portion of the geometry that is needed. Getting ready A token is provided as one of the fields in the field list passed into the constructor for a cursor and is in the SHAPE@<Part of Feature to be Returned> format. The only exception to this format is the OID@ token, which returns the object ID of the feature. The following code example retrieves only the X and Y coordinates of a feature: with arcpy.da.SearchCursor(fc, ("SHAPE@XY","Facility","Name")) as cursor: The following table lists the available geometry tokens. Not all cursors support the full list of tokens. Check the ArcGIS help files for information about the tokens supported by each cursor type. The SHAPE@ token returns the entire geometry of the feature. Use this carefully though, because it is an expensive operation to return the entire geometry of a feature and can dramatically affect performance. If you don't need the entire geometry, then do not include this token! In this article, you will use a geometry token to increase the performance of a cursor. You'll retrieve the X and Y coordinates of each land parcel from the parcels feature class along with some attribute information about the parcel. How to do it… Follow these steps to add a geometry token to a cursor, which should improve the performance of this object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8GeometryToken.py. Import the arcpy.da module and the time module: import arcpy.da import time Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" We're going to measure how long it takes to execute the code using a geometry token. Add the start time for the script: start = time.clock() Use a Python with statement to create a cursor that includes the centroid of each feature as well as the ownership information stored in the PY_FULL_OW field: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW","SHAPE@XY")) as cursor: Loop through each row in SearchCursor and print the name of the parcel and location. Make sure you indent the for loop inside the with block: for row in cursor: print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) Measure the elapsed time: elapsed = (time.clock() - start) Print the execution time: print("Execution time: " + str(elapsed)) The entire script should appear as follows: import arcpy.da import time arcpy.env.workspace = "c:/ArcpyBook/Ch9" start = time.clock() with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@XY")) as cursor:    for row in cursor:        print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) elapsed = (time.clock() - start) print("Execution time: " + str(elapsed)) You can check your work by examining the C:ArcpyBookcodeCh8GeometryToken.py solution file. Save the script. Run the script. You should see something similar to the following output. Note the execution time; your time will vary: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110480.5197341456, 10070911.174956793) Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110670.413783513, 10070800.960865) Parcel owner: CITY OF AUSTIN has a location of: (3143925.0013213265, 10029388.97419636) Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: (3134432.983822767, 10072192.047894118) Execution time: 9.08046185109 Now, we're going to measure the execution time if the entire geometry is returned instead of just the portion of the geometry that we need: Save a new copy of the script as C:ArcpyBookCh8GeometryTokenEntireGeometry.py. Change the SearchCursor() function to return the entire geometry using SHAPE@ instead of SHAPE@XY: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@")) as cursor: You can check your work by examining the C:ArcpyBookcodeCh8GeometryTokenEntireGeometry.py solution file. Save and run the script. You should see the following output. Your time will vary from mine, but notice that the execution time is slower. In this case, it's only a little over a second slower, but we're only returning 2600 features. If the feature class were significantly larger, as many are, this would be amplified: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x2400A700> Parcel owner: CITY OF AUSTIN has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: <geoprocessing describe geometry object object at 0x2400A700> Execution time: 10.1211390896 How it works… A geometry token can be supplied as one of the field names supplied in the constructor for the cursor. These tokens are used to increase the performance of a cursor by returning only a portion of the geometry instead of the entire geometry. This can dramatically increase the performance of a cursor, particularly when you are working with large polyline or polygon datasets. If you only need specific properties of the geometry in your cursor, you should use these tokens. Inserting rows with InsertCursor You can insert a row into a table or feature class using an InsertCursor object. If you want to insert attribute values along with the new row, you'll need to supply the values in the order found in the attribute table. Getting ready The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. The insertRow() method on the InsertCursor object adds the row. A row in the form of a list or tuple is passed into the insertRow() method. The values in the list must correspond to the field values defined when the InsertCursor object was created. Similar to instances that include other types of cursors, you can also limit the field names returned using the second parameter of the method. This function supports geometry tokens as well. The following code example illustrates how you can use InsertCursor to insert new rows into a feature class. Here, we insert two new wildfire points into the California feature class. The row values to be inserted are defined in a list variable. Then, an InsertCursor object is created, passing in the feature class and fields. Finally, the new rows are inserted into the feature class by using the insertRow() method: rowValues = [(Bastrop','N',3000,(-105.345,32.234)), ('Ft Davis','N', 456, (-109.456,33.468))] fc = "c:/data/wildfires.gdb/California" fields = ["FIRE_NAME", "FIRE_CONTAINED", "ACRES", "SHAPE@XY"] with arcpy.da.InsertCursor(fc, fields) as cursor: for row in rowValues:    cursor.insertRow(row) In this article, you will use InsertCursor to add wildfires retrieved from a .txt file into a point feature class. When inserting rows into a feature class, you will need to know how to add the geometric representation of a feature into the feature class. This can be accomplished by using InsertCursor along with two miscellaneous objects: Array and Point. In this exercise, we will add point features in the form of wildfire incidents to an empty point feature class. In addition to this, you will use Python file manipulation techniques to read the coordinate data from a text file. How to do it… We will be importing the North American wildland fire incident data from a single day in October, 2007. This data is contained in a comma-delimited text file containing one line for each fire incident on this particular day. Each fire incident has a latitude, longitude coordinate pair separated by commas along with a confidence value. This data was derived by automated methods that use remote sensing data to derive the presence or absence of a wildfire. Confidence values can range from 0 to 100. Higher numbers represent a greater confidence that this is indeed a wildfire: Open the file at C:ArcpyBookCh8Wildfire DataNorthAmericaWildfire_2007275.txt and examine the contents.You will notice that this is a simple comma-delimited text file containing the longitude and latitude values for each fire along with a confidence value. We will use Python to read the contents of this file line by line and insert new point features into the FireIncidents feature class located in the C:ArcpyBookCh8 WildfireDataWildlandFires.mdb personal geodatabase. Close the file. Open ArcCatalog. Navigate to C:ArcpyBookCh8WildfireData.You should see a personal geodatabase called WildlandFires. Open this geodatabase and you will see a point feature class called FireIncidents. Right now, this is an empty feature class. We will add features by reading the text file you examined earlier and inserting points. Right-click on FireIncidents and select Properties. Click on the Fields tab.The latitude/longitude values found in the file we examined earlier will be imported into the SHAPE field and the confidence values will be written to the CONFIDENCEVALUE field. Open IDLE and create a new script. Save the script to C:ArcpyBookCh8InsertWildfires.py. Import the arcpy modules: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Open the text file and read all the lines into a list: f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() Start a try block: try: Create an InsertCursor object using a with block. Make sure you indent inside the try statement. The cursor will be created in the FireIncidents feature class: with arcpy.da.InsertCursor("FireIncidents",("SHAPE@XY","CONFIDENCEVALUE")) as cur: Create a counter variable that will be used to print the progress of the script: cntr = 1 Loop through the text file line by line using a for loop. Since the text file is comma-delimited, we'll use the Python split() function to separate each value into a list variable called vals. We'll then pull out the individual latitude, longitude, and confidence value items and assign them to variables. Finally, we'll place these values into a list variable called rowValue, which is then passed into the insertRow() function for the InsertCursor object, and we then print a message: for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + " written to feature class")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) Add a finally block to close the text file: finally: f.close() The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() try: with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY","CONFIDENCEVALUE")) as cur:    cntr = 1    for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + "       written to feature class")      cntr = cntr + 1 except Exception as e: print(e.message) finally: f.close() You can check your work by examining the C:ArcpyBookcodeCh8InsertWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number: 406 written to feature class Record number: 407 written to feature class Record number: 408 written to feature class Record number: 409 written to feature class Record number: 410 written to feature class Record number: 411 written to feature class Open ArcMap and add the FireIncidents feature class to the table of contents. The points should be visible, as shown in the following screenshot: You may want to add a basemap to provide some reference for the data. In ArcMap, click on the Add Basemap button and select a basemap from the gallery. How it works… Some additional explanation may be needed here. The lstFires variable contains a list of all the wildfires that were contained in the comma-delimited text file. The for loop will loop through each of these records one by one, inserting each individual record into the fire variable. We also include an if statement that is used to skip the first record in the file, which serves as the header. As I explained earlier, we then pull out the individual latitude, longitude, and confidence value items from the vals variable, which is just a Python list object and assign them to variables called latitude, longitude, and confid. We then place these values into a new list variable called rowValue in the order that we defined when we created InsertCursor. Thus, the latitude and longitude pair should be placed first followed by the confidence value. Finally, we call the insertRow() function on the InsertCursor object assigned to the cur variable, passing in the new rowValue variable. We close by printing a message that indicates the progress of the script and also create the except and finally blocks to handle errors and close the text file. Placing the file.close() method in the finally block ensures that it will execute and close the file even if there is an error in the previous try statement. Updating rows with UpdateCursor If you need to edit or delete rows from a table or feature class, you can use UpdateCursor. As is the case with InsertCursor, the contents of UpdateCursor can be limited through the use of a where clause. Getting ready The UpdateCursor() function can be used to either update or delete rows in a table or feature class. The returned cursor places a lock on the data, which will automatically be released if used inside a Python with statement. An UpdateCursor object is returned from a call to this method. The UpdateCursor object places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case. Previous versions of cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateCursor() method to update records in tables or feature classes and the deleteRow() method can be used to delete a row. In this article, you're going to write a script that updates each feature in the FireIncidents feature class by assigning a value of poor, fair, good, or excellent to a new field that is more descriptive of the confidence values using an UpdateCursor. Prior to updating the records, your script will add the new field to the FireIncidents feature class. How to do it… Follow these steps to create an UpdateCursor object that will be used to edit rows in a feature class: Open IDLE and create a new script. Save the script to C:ArcpyBookCh8UpdateWildfires.py. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Start a try block: try: Add a new field called CONFID_RATING to the FireIncidents feature class. Make sure to indent inside the try statement: arcpy.AddField_management("FireIncidents","CONFID_RATING", "TEXT","10") print("CONFID_RATING field added to FireIncidents") Create a new instance of UpdateCursor inside a with block: with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE","CONFID_RATING")) as cursor: Create a counter variable that will be used to print the progress of the script. Make sure you indent this line of code and all the lines of code that follow inside the with block: cntr = 1 Loop through each of the rows in the FireIncidents fire class. Update the CONFID_RATING field according to the following guidelines:     Confidence value 0 to 40 = POOR     Confidence value 41 to 60 = FAIR     Confidence value 61 to 85 = GOOD     Confidence value 86 to 100 = EXCELLENT This can be translated in the following block of code:    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" try: #create a new field to hold the values arcpy.AddField_management("FireIncidents", "CONFID_RATING","TEXT","10") print("CONFID_RATING field added to FireIncidents") with arcpy.da.UpdateCursor("FireIncidents",("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:    cntr = 1    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 except Exception as e: print(e.message) You can check your work by examining the C:ArcpyBookcodeCh8UpdateWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number 406 updated Record number 407 updated Record number 408 updated Record number 409 updated Record number 410 updated Open ArcMap and add the FireIncidents feature class. Open the attribute table and you should see that a new CONFID_RATING field has been added and populated by UpdateCursor: When you insert, update, or delete data in cursors, the changes are permanent and can't be undone if you're working outside an edit session. However, with the new edit session functionality provided by ArcGIS 10.1, you can now make these changes inside an edit session to avoid these problems. We'll cover edit sessions soon. How it works… In this case, we've used UpdateCursor to update each of the features in a feature class. We first used the Add Field tool to add a new field called CONFID_RATING, which will hold new values that we assign based on values found in another field. The groups are poor, fair, good, and excellent and are based on numeric values found in the CONFIDENCEVALUE field. We then created a new instance of UpdateCursor based on the FireIncidents feature class, and returned the two fields mentioned previously. The script then loops through each of the features and assigns a value of poor, fair, good, or excellent to the CONFID_RATING field (row[1]), based on the numeric value found in CONFIDENCEVALUE. A Python if/elif/else structure is used to control the flow of the script based on the numeric value. The value for CONFID_RATING is then committed to the feature class by passing the row variable into the updateRow() method. Summary In this article we studied, how to retrieve features from a feature class with SerchCursor, filtering records with a where clause, improving cursr performance with geoetry tokens, inserting rows with InsertCursor and updating rows with UpdateCursor. Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] Introduction to Mobile Web ArcGIS Development [article] Python functions – Avoid repeating code [article]
Read more
  • 0
  • 0
  • 7055

article-image-splunk-interface
Packt
10 Aug 2015
17 min read
Save for later

The Splunk Interface

Packt
10 Aug 2015
17 min read
In this article by Vincent Bumgarner & James D. Miller, author of the book, Implementing Splunk - Second Edition, we will walk through the most common elements in the Splunk interface, and will touch upon concepts that will be covered in greater detail. You may want to dive right into the search section, but an overview of the user interface elements might save you some frustration later. We will cover the following topics: Logging in and app selection A detailed explanation of the search interface widgets A quick overview of the admin interface (For more resources related to this topic, see here.) Logging into Splunk The Splunk GUI interface (Splunk is also accessible through its command-line interface [CLI] and REST API) is web-based, which means that no client needs to be installed. Newer browsers with fast JavaScript engines, such as Chrome, Firefox, and Safari, work better with the interface. As of Splunk Version 6.2.0, no browser extensions are required. Splunk Versions 4.2 and earlier require Flash to render graphs. Flash can still be used by older browsers, or for older apps that reference Flash explicitly. The default port for a Splunk installation is 8000. The address will look like: http://mysplunkserver:8000 or http://mysplunkserver.mycompany.com:8000. The Splunk interface If you have installed Splunk on your local machine, the address can be some variant of http://localhost:8000, http://127.0.0.1:8000, http://machinename:8000, or http://machinename.local:8000. Once you determine the address, the first page you will see is the login screen. The default username is admin with the password changeme. The first time you log in, you will be prompted to change the password for the admin user. It is a good idea to change this password to prevent unwanted changes to your deployment. By default, accounts are configured and stored within Splunk. Authentication can be configured to use another system, for instance Lightweight Directory Access Protocol (LDAP). By default, Splunk authenticates locally. If LDAP is set up, the order is as follows: LDAP / Local. The home app After logging in, the default app is the Launcher app (some may refer to this as Home). This app is a launching pad for apps and tutorials. In earlier versions of Splunk, the Welcome tab provided two important shortcuts, Add data and the Launch search app. In version 6.2.0, the Home app is divided into distinct areas, or panes, that provide easy access to Explore Splunk Enterprise (Add Data, Splunk Apps, Splunk Docs, and Splunk Answers) as well as Apps (the App management page) Search & Reporting (the link to the Search app), and an area where you can set your default dashboard (choose a home dashboard).                 The Explore Splunk Enterprise pane shows links to: Add data: This links Add Data to the Splunk page. This interface is a great start for getting local data flowing into Splunk (making it available to Splunk users). The Preview data interface takes an enormous amount of complexity out of configuring dates and line breaking. Splunk Apps: This allows you to find and install more apps from the Splunk Apps Marketplace (http://apps.splunk.com). This marketplace is a useful resource where Splunk users and employees post Splunk apps, mostly free but some premium ones as well. Splunk Answers: This is one of your links to the wide amount of Splunk documentation available, specifically http://answers.splunk.com, where you can engage with the Splunk community on Splunkbase (https://splunkbase.splunk.com/) and learn how to get the most out of your Splunk deployment. The Apps section shows the apps that have GUI elements on your instance of Splunk. App is an overloaded term in Splunk. An app doesn't necessarily have a GUI at all; it is simply a collection of configurations wrapped into a directory structure that means something to Splunk. Search & Reporting is the link to the Splunk Search & Reporting app. Beneath the Search & Reporting link, Splunk provides an outline which, when you hover over it, displays a Find More Apps balloon tip. Clicking on the link opens the same Browse more apps page as the Splunk Apps link mentioned earlier. Choose a home dashboard provides an intuitive way to select an existing (simple XML) dashboard and set it as part of your Splunk Welcome or Home page. This sets you at a familiar starting point each time you enter Splunk. The following image displays the Choose Default Dashboard dialog: Once you select an existing dashboard from the dropdown list, it will be part of your welcome screen every time you log into Splunk – until you change it. There are no dashboards installed by default after installing Splunk, except the Search & Reporting app. Once you have created additional dashboards, they can be selected as the default. The top bar The bar across the top of the window contains information about where you are, as well as quick links to preferences, other apps, and administration. The current app is specified in the upper-left corner. The following image shows the upper-left Splunk bar when using the Search & Reporting app: Clicking on the text takes you to the default page for that app. In most apps, the text next to the logo is simply changed, but the whole block can be customized with logos and alternate text by modifying the app's CSS. The upper-right corner of the window, as seen in the previous image, contains action links that are almost always available: The name of the user who is currently logged in appears first. In this case, the user is Administrator. Clicking on the username allows you to select Edit Account (which will take you to the Your account page) or to Logout (of Splunk). Logout ends the session and forces the user to login again. The following screenshot shows what the Your account page looks like: This form presents the global preferences that a user is allowed to change. Other settings that affect users are configured through permissions on objects and settings on roles. (Note: preferences can also be configured using the CLI or by modifying specific Splunk configuration files). Full name and Email address are stored for the administrator's convenience. Time zone can be changed for the logged-in user. This is a new feature in Splunk 4.3. Setting the time zone only affects the time zone used to display the data. It is very important that the date is parsed properly when events are indexed. Default app controls the starting page after login. Most users will want to change this to search. Restart backgrounded jobs controls whether unfinished queries should run again if Splunk is restarted. Set password allows you to change your password. This is only relevant if Splunk is configured to use internal authentication. For instance, if the system is configured to use Windows Active Directory via LDAP (a very common configuration), users must change their password in Windows. Messages allows you to view any system-level error messages you may have pending. When there is a new message for you to review, a notification displays as a count next to the Messages menu. You can click the X to remove a message. The Settings link presents the user with the configuration pages for all Splunk Knowledge objects, Distributed Environment settings, System and Licensing, Data, and Users and Authentication settings. If you do not see some of these options, you do not have the permissions to view or edit them. The Activity menu lists shortcuts to Splunk Jobs, Triggered Alerts, and System Activity views. You can click Jobs (to open the search jobs manager window, where you can view and manage currently running searches), click Triggered Alerts (to view scheduled alerts that are triggered) or click System Activity (to see dashboards about user activity and the status of the system). Help lists links to video Tutorials, Splunk Answers, the Splunk Contact Support portal, and online Documentation. Find can be used to search for objects within your Splunk Enterprise instance. For example, if you type in error, it returns the saved objects that contain the term error. These saved objects include Reports, Dashboards, Alerts, and so on. You can also search for error in the Search & Reporting app by clicking Open error in search. The search & reporting app The Search & Reporting app (or just the search app) is where most actions in Splunk start. This app is a dashboard where you will begin your searching. The summary view Within the Search & Reporting app, the user is presented with the Summary view, which contains information about the data which that user searches for by default. This is an important distinction—in a mature Splunk installation, not all users will always search all data by default. But at first, if this is your first trip into Search & Reporting, you'll see the following: From the screen depicted in the previous screenshot, you can access the Splunk documentation related to What to Search and How to Search. Once you have at least some data indexed, Splunk will provide some statistics on the available data under What to Search (remember that this reflects only the indexes that this particular user searches by default; there are other events that are indexed by Splunk, including events that Splunk indexes about itself.) This is seen in the following image: In previous versions of Splunk, panels such as the All indexed data panel provided statistics for a user's indexed data. Other panels gave a breakdown of data using three important pieces of metadata—Source, Sourcetype, and Hosts. In the current version—6.2.0—you access this information by clicking on the button labeled Data Summary, which presents the following to the user: This dialog splits the information into three tabs—Hosts, Sources and Sourcetypes. A host is a captured hostname for an event. In the majority of cases, the host field is set to the name of the machine where the data originated. There are cases where this is not known, so the host can also be configured arbitrarily. A source in Splunk is a unique path or name. In a large installation, there may be thousands of machines submitting data, but all data on the same path across these machines counts as one source. When the data source is not a file, the value of the source can be arbitrary, for instance, the name of a script or network port. A source type is an arbitrary categorization of events. There may be many sources across many hosts, in the same source type. For instance, given the sources /var/log/access.2012-03-01.log and /var/log/access.2012-03-02.log on the hosts fred and wilma, you could reference all these logs with source type access or any other name that you like. Let's move on now and discuss each of the Splunk widgets (just below the app name). The first widget is the navigation bar. As a general rule, within Splunk, items with downward triangles are menus. Items without a downward triangle are links. Next we find the Search bar. This is where the magic starts. We'll go into great detail shortly. Search Okay, we've finally made it to search. This is where the real power of Splunk lies. For our first search, we will search for the word (not case specific); error. Click in the search bar, type the word error, and then either press Enter or click on the magnifying glass to the right of the bar. Upon initiating the search, we are taken to the search results page. Note that the search we just executed was across All time (by default); to change the search time, you can utilize the Splunk time picker. Actions Let's inspect the elements on this page. Below the Search bar, we have the event count, action icons, and menus. Starting from the left, we have the following: The number of events matched by the base search. Technically, this may not be the number of results pulled from disk, depending on your search. Also, if your query uses commands, this number may not match what is shown in the event listing. Job: This opens the Search job inspector window, which provides very detailed information about the query that was run. Pause: This causes the current search to stop locating events but keeps the job open. This is useful if you want to inspect the current results to determine whether you want to continue a long running search. Stop: This stops the execution of the current search but keeps the results generated so far. This is useful when you have found enough and want to inspect or share the results found so far. Share: This shares the search job. This option extends the job's lifetime to seven days and sets the read permissions to everyone. Export: This exports the results. Select this option to output to CSV, raw events, XML, or JavaScript Object Notation (JSON) and specify the number of results to export. Print: This formats the page for printing and instructs the browser to print. Smart Mode: This controls the search experience. You can set it to speed up searches by cutting down on the event data it returns and, additionally, by reducing the number of fields that Splunk will extract by default from the data (Fast mode). You can, otherwise, set it to return as much event information as possible (Verbose mode). In Smart mode (the default setting) it toggles search behavior based on the type of search you're running. Timeline Now we'll skip to the timeline below the action icons. Along with providing a quick overview of the event distribution over a period of time, the timeline is also a very useful tool for selecting sections of time. Placing the pointer over the timeline displays a pop-up for the number of events in that slice of time. Clicking on the timeline selects the events for a particular slice of time. Clicking and dragging selects a range of time. Once you have selected a period of time, clicking on Zoom to selection changes the time frame and reruns the search for that specific slice of time. Repeating this process is an effective way to drill down to specific events. Deselect shows all events for the time range selected in the time picker. Zoom out changes the window of time to a larger period around the events in the current time frame The field picker To the left of the search results, we find the field picker. This is a great tool for discovering patterns and filtering search results. Fields The field list contains two lists: Selected Fields, which have their values displayed under the search event in the search results Interesting Fields, which are other fields that Splunk has picked out for you Above the field list are two links: Hide Fields and All Fields. Hide Fields: Hides the field list area from view. All Fields: Takes you to the Selected Fields window. Search results We are almost through with all the widgets on the page. We still have a number of items to cover in the search results section though, just to be thorough. As you can see in the previous screenshot, at the top of this section, we have the number of events displayed. When viewing all results in their raw form, this number will match the number above the timeline. This value can be changed either by making a selection on the timeline or by using other search commands. Next, we have the action icons (described earlier) that affect these particular results. Under the action icons, we have four results tabs: Events list, which will show the raw events. This is the default view when running a simple search, as we have done so far. Patterns streamlines the event pattern detection. It displays a list of the most common patterns among the set of events returned by your search. Each of these patterns represents the number of events that share a similar structure. Statistics populates when you run a search with transforming commands such as stats, top, chart, and so on. The previous keyword search for error does not display any results in this tab because it does not have any transforming commands. Visualization transforms searches and also populates the Visualization tab. The results area of the Visualization tab includes a chart and the statistics table used to generate the chart. Not all searches are eligible for visualization. Under the tabs described just now, is the timeline. Options Beneath the timeline, (starting at the left) is a row of option links that include: Show Fields: shows the Selected Fields screen List: allows you to select an output option (Raw, List, or Table) for displaying the search results Format: provides the ability to set Result display options, such as Show row numbers, Wrap results, the Max lines (to display) and Drilldown as on or off. NN Per Page: is where you can indicate the number of results to show per page (10, 20, or 50). To the right are options that you can use to choose a page of results, and to change the number of events per page. In prior versions of Splunk, these options were available from the Results display options popup dialog. The events viewer Finally, we make it to the actual events. Let's examine a single event. Starting at the left, we have: Event Details: Clicking here (indicated by the right facing arrow) opens the selected event, providing specific information about the event by type, field, and value, and allows you the ability to perform specific actions on a particular event field. In addition, Splunk version 6.2.0 offers a button labeled Event Actions to access workflow actions, a few of which are always available. Build Eventtype: Event types are a way to name events that match a certain query. Extract Fields: This launches an interface for creating custom field extractions. Show Source: This pops up a window with a simulated view of the original source. The event number: Raw search results are always returned in the order most recent first. Next to appear are any workflow actions that have been configured. Workflow actions let you create new searches or links to other sites, using data from an event. Next comes the parsed date from this event, displayed in the time zone selected by the user. This is an important and often confusing distinction. In most installations, everything is in one time zone—the servers, the user, and the events. When one of these three things is not in the same time zone as the others, things can get confusing. Next, we see the raw event itself. This is what Splunk saw as an event. With no help, Splunk can do a good job finding the date and breaking lines appropriately, but as we will see later, with a little help, event parsing can be more reliable and more efficient. Below the event are the fields that were selected in the field picker. Clicking on the value adds the field value to the search. Summary As you have seen, the Splunk GUI provides a rich interface for working with search results. We have really only scratched the surface and will cover more elements. Resources for Article: Further resources on this subject: The Splunk Web Framework [Article] Loading data, creating an app, and adding dashboards and reports in Splunk [Article] Working with Apps in Splunk [Article]
Read more
  • 0
  • 0
  • 4002

article-image-data-types-and-fields
Packt
10 Aug 2015
30 min read
Save for later

Data Types and Fields

Packt
10 Aug 2015
30 min read
In this article by David Studebaker and Christopher Studebaker, authors of the book Programming Microsoft Dynamics™ NAV 2015, explain the design of an application should begin at the simplest level, with the design of the data elements. The type of data our development tool supports has a significant effect on our design. Because NAV is designed for financially-oriented business applications, NAV data types are financially and business oriented. In this article, we will cover many of the data types we use within NAV. For each data type, we will cover some of the more frequently modified field properties and how particular properties, such as Field Class, are used to support application functionality. Field Class is a fundamental property which defines whether the contents of the field are data to be processed or control information to be interpreted. (For more resources related to this topic, see here.) Data types We are going to segregate the data types into several groups. We will first look at Fundamental data types and then at Complex data types. Fundamental data types Fundamental data types are the basic components from which the complex data types are formed. They are grouped into Numeric, String, and Date/Time data types. Numeric data Just like other systems, Microsoft Dynamics NAV 2015 supports several numeric data types. The specifications for each NAV data type are defined for NAV, independent of the supporting SQL Server database rules. However, some data types are stored and handled somewhat differently from a SQL Server point of view than the way they appear to us as NAV developers and users. For more details on the SQL Server-specific representations of various data elements, refer to the Developer and IT Pro Help. Our discussion will focus on NAV representation and handling for each data type. The various numeric data types are as follows: Integer: This is an integer number ranging from -2,147,483,646 to +2,147,483,647 Decimal: This is a decimal number in the range of +/- 999,999,999,999,999.99. Although it is possible to construct larger numbers, errors such as overflow, truncation, or loss of precision might occur. In addition, there is no facility to display or edit larger numbers. Option: This is a special instance of an integer, stored as an integer number ranging from 0 to +2,147,483,647. An option is normally represented in the body of our C/AL code as an option string. We can compare an option to an integer in C/AL rather than using the option string. However, this is not a good practice because it eliminates the self-documenting aspect of an option field. An option string is a set of choices listed in a comma-separated string, one of which is chosen and stored as the current option. Since the maximum length of this string is 250 characters, the practical maximum number of choices for a single option is less than 125. The currently selected choice within the set of options is stored in the option field as the ordinal position of that option within the set. For example, selection of an entry from the option string of red, yellow, and blue would result in the storing of 0 (red), 1 (yellow), and 2 (blue). If red were selected, 0 would be stored in the variable and if blue were selected, 2 would be stored. Quite often, an option string starts with a blank to allow an effective choice of "none chosen". An example of this (blank, Hourly, Daily,…) is as follows: Boolean: A Boolean variable is stored as 1 or 0. In a C/AL code, it is programmatically referred to as True or False, but sometimes, it is referred in properties as Yes or No. Boolean variables may be displayed as Yes or No (language dependent), P or blank, or True or False. BigInteger: 8-byte Integer as opposed to the 4 bytes of Integer. BigIntegers are for very big numbers (from -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807). Char: A numeric code between 0 and 65535 (hexadecimal FFFF) representing a single 16-bit Unicode character. Char variables can operate either as text or numbers. Numeric operations can be done on Char variables. Char variables can also be defined with individual text character values. Char variables cannot be defined as permanent variables in a table; they can only be defined as working storage variables within C/AL objects. Byte: This is a single 8-bit ASCII character with a value ranging from 0 to 255. Byte variables can operate either as text or numbers. Numeric operations can be done on Byte variables. Byte variables can also be defined with individual text character values. Byte variables cannot be defined as permanent variables in a table, but only as working storage variables within C/AL objects. Action: This is a variable returned from a PAGE RUNMODAL function or RUNMODAL (Page) function that specifies what action a user performs on a page. The possible values are OK, Cancel, LookupOK, LookupCancel, Yes, No, RunObject, and RunSystem. ExecutionMode: This specifies the mode in which a session runs. The possible values are Debug or Standard. String data The following are the data types included in String data: Text: This contains any string of alphanumeric characters. In a table, a Text field can be from 1 to 250 characters long. In working storage within an object, a Text variable can be any length if no length is defined. If a maximum length is defined, it must not exceed 1024. NAV 2015 does not require a length to be specified, but if we define a maximum length, it will be enforced. When calculating the length of a record for design purposes (relative to the maximum record length of 8,000 bytes), the full defined field length should be counted. Code: Although the Help says that the length constraints for Code variables are the same as those for text variables, the C/AL Editor enforces length limits of 1 to 250 characters. All of the letters are automatically converted to uppercase when data is entered into a Code variable; any leading or trailing spaces are removed. Date/Time data The following are the data types included in Date/Time data: Date: This contains an integer number, which is interpreted as a date ranging from January 1, 1754 to December 31, 9999. A 0D (numeral zero, letter D) represents an undefined date (stored as a SQL Server DateTime field) that is interpreted as January 1, 1753. According to the Developer and IT Pro Help that, NAV 2015 supports a Date of 1/1/0000 (presumably as a special case for backward compatibility, but it is not supported by SQL Server). A date constant can be written as the letter D preceded by either six digits in the format MMDDYY or eight digits as MMDDYYYY (where M = month, D = day, and Y = year). For example, 011915D or 01192015D both represent January 19, 2015. Later, in DateFormula, we will find D interpreted as day, but here the trailing D is interpreted as the date (data type) constant. When the year is expressed as YY rather than YYYY, the century portion (in this case, 20) is 20 if the two digit year is from 00 to 29, or 19 if the year is from 30 through 99. NAV also defines a special date called the Closing date, which represents the point in time between one day and the next. The purpose of a closing date is to provide a point at the end of a day, after all of the real date- and time-sensitive activity is recorded—the point when accounting closing entries can be recorded. Closing entries are recorded, in effect, at the stroke of midnight between two dates—this is the date of closing accounting books, and it is designed so that one can include or exclude, at the user's option, closing entries in various reports. When sorted by date, the closing date entries will get sorted after all normal entries for a day. For example, the normal date entry for December 31, 2015 would display as 12/31/15 (depending on the date format masking), and the closing date entry would display as C12/31/15. All of the C12/31/15 ledger entries would appear after all normal 12/31/15 ledger entries. The following screenshot shows two 2014 closing date entries mixed with normal entries from December 2015 and January through April 2015. (This data is from Cronus demo. The 2014 Closing entries have an "Opening Entry" description, which shows that these were the first entries for the demo data in the respective accounts. This is not a normal set of production data.) Time: This contains an integer number, which is interpreted on a 24-hour clock, in milliseconds plus 1, from 00:00:00 to 23:59:59:999. A 0T (numeral zero, letter T) represents an undefined time and is stored as 1/1/1753 00:00:00.000. DateTime: This represents a combined Date and Time, stored in Coordinated Universal Time (UTC), and it always displays local time (that is, the local time on our system). DateTime fields do not support NAV "Closing" dates. DateTime is helpful for an application that must support multiple time zones simultaneously. DateTime values can range from January 1, 1754 00:00:00.000 to December 31, 9999 23:59:59.999, but dates earlier than January 1, 1754 cannot be entered (don't test with dates late in 9999 as an intended advance to the year 10000 won't work). Assigning a date of 0DT will yield an undefined or blank DateTime. Duration: This represents the positive or negative difference between two DateTime values, in milliseconds, stored as a BigInteger. Durations are automatically output in the text format as DDD days HH hours MM minutes SS seconds. Complex data types Each complex data type consists of multiple data elements. For ease of reference, we will categorize them into several groups of similar types. Data structure The following data types are in the data structure group: File: This refers to any standard Windows file outside the NAV database. There is a reasonably complete set of functions to allow to create, delete, open, close, read, write, and copy (among other things) data files. For example, we could create our own NAV routines in C/AL to import or export data from or to a file that had been created by some other application. With the three-tier architecture of NAV 2015, business logic runs on the server and not the client. We need to keep this in mind any time we refer to local external files because they will be on the server by default. Use of Universal Naming Convention (UNC) paths can make this easier to manage. Record: This refers to a single data row within a NAV table that consists of individual fields. Quite often, multiple variable instances of a Record (table) are defined in working storage to support a validation process, allowing access to different records within the table at one time in the same function. Objects Page, Report, Codeunit, Query, and XMLPort, each represents an object data type. Object data types are used when there is a need to refer to an object or a function in another object. Examples: Invoking a Report or an XMLPort from a Page or a Report Calling a function for data validation or processing is coded as a function in a Table or a Codeunit Automation The following are Automation data types. (these are not supported by the NAV Web client.) OCX and Automation data types are supported in NAV 2015 for backward compatibility only: OCX: This allows the definition of a variable that represents and allows access to an ActiveX or OCX custom control. Such a control is typically an external application object that we can invoke from our NAV object. Automation: This allows us to define a variable that we can access similar to an OCX. The application must act as an Automation Server and be registered with the NAV client or server that calls it. For example, we can interface from NAV into the various Microsoft Office products (Word, Excel, and so on) by defining them in Automation variables. DotNet: This allows us to define a variable for .NET Framework interface types within an assembly. It supports accessing .NET Framework type members, including methods, properties, and constructors from C/AL. These can be members of the global assembly cache or custom assemblies. Input/Output The following are the Input/Output data types: Dialog: This supports the definition of a simple user interface window without the use of a Page object. Typically, Dialog windows are used to communicate processing progress or allow a brief user response to a go/no-go question, though this latter use could result in bad performance due to locking. There are other user communication tools as well, but they do not use a Dialog type data item. InStream and Outstream: These allow us to read from and write to external files, BLOBS, and objects of the Automation and OCX data types. DateFormula DateFormula provides for the definition and storage of a simple, but clever, set of constructs to support the calculation of runtime-sensitive dates. A DateFormula is stored in a nonlanguage dependent format, thus supporting multilanguage functionality. A DateFormula is a combination of: Numeric multipliers (for example, 1, 2, 3, 4, and so on) Alpha time units (all must be in uppercase) D for a day W for a week WD for day of the week, that is, from day 1 to day 7 (either in the future or in the past but not today). Monday is day 1 and Sunday is day 7. M for calendar month Y for year CM for current month, CY for current year, CW for current week Math symbols interpretation + (plus) as in CM + 10D means the Current Month end plus 10 Days (in other words, the tenth of the next month) – (minus) as in (-WD3) means the date of the previous Wednesday (which is the 3rd day of the past week). Positional notation (D15 means the 15th day of the month and 15D means 15 days) Payment Terms for Invoices support full use of DateFormula. All DateFormula results are expressed as a date based on a reference date. The default reference date is the system date and not the Work Date. Here are some sample DateFormulas and their interpretations (displayed dates are based on the US calendar) with a reference date of July 10, 2015, a Friday: CM is the last day of Current Month, 07/31/15 CM + 10D is the tenth of the next month, 08/10/15 WD6 is the next sixth day of the week, 07/11/15 WD5 is the next fifth day of the week, 07/17/15 CM – M + D is the end of the current month minus one month plus one day, 07/01/15 CM – 5M is the end of the current month minus five months, 02/28/15 Let us take the opportunity to use the DateFormula data type to learn a few NAV development basics. We will do so by experimenting with some hands-on evaluations of several DateFormula values. We will create a table to calculate dates using DateFormula and Reference Dates. To do this, navigate to Tools | Object Designer | Tables. Then, click on the New button and define the fields shown in the following screenshot. Save it as Table 50009, named Date Formula Test. After we are done with this test, we will save this table for some later testing. Now, we will add some simple C/AL code to our table so that when we enter or change either the Reference Date or the DateFormula data, we can calculate a new result date. First, access the new table via the Design button. Then, go to the global variables definition form through the View menu option, the C/AL Globals sub-option, and finally, choose the Functions tab. Type in our new function name as CalculateNewDate on the first blank line, as shown in the following screenshot, and then exit (by means of the Esc key) from this form back to the list of data fields: From the Table Designer form that displays the list of data fields, either press F9 or click on the C/AL Code icon: This will take us to the following screen, where we can see all of the field triggers plus the trigger for the new function that we just defined. The table triggers will not be visible, unless we scroll up to show them. Note that our new function was defined as a LOCAL function. This means that it cannot be accessed from another object unless we change it to a GLOBAL function. Since our goal now is to focus on experimenting with the DateFormula, we will not go into detail and explain the logic of what we are creating. The logic that we're going to code is as follows: When an entry is made (new or changed) in either the "Reference Date" field or in the "Date Formula to Test field", invoke the CalculateNewDate function to calculate a new “Result Date” value based on the entered data. First, you need to create the logic within our new function, CalculateNewDate(), to evaluate and store a Date Result based on the DateFormula and Reference Date that you enter into the table. Just copy the C/AL code exactly as shown in the following screenshot, exit, compile, and save the table: If you get an error message of any type when you close and save the table, you probably have not copied the C/AL code exactly as it is shown in the screenshot. (also shown below for ease of copying.) CalculateNewDate;"Date Result" := CALCDATE("Date Formula to Test","Reference Date for Calculation"); This code will cause the CalculateNewDate()function to be called via the OnValidate trigger when an entry is made in either the Reference Date for Calculation or the Date Formula to Test fields. The function will place the result in the Date Result field. The use of an integer value in the redundantly named Primary Key field allows us to enter any number of records into the table (by manually numbering them 1, 2, 3, and so forth). Let's experiment with several different date and date formula combinations. We will access the table via the Run button. This will cause NAV to generate a default format page and run it in the Role Tailored Client. Enter a Primary Key value of 1 (one). In Reference Date for Calculation, enter either an upper or lower case T for Today and the system date. The same date will appear in the Date Result field because at this point, no Date Formula has been entered. Now, enter 1D (number 1 followed by uppercase or lowercase D (C/SIDE will make it uppercase) in the Date Formula to Test field. We will see that the Date Result field contents are changed to be one day beyond the date in the Reference Date for Calculation field. Now, for another test entry, start with a 2 in the Primary Key field. Again, enter the letter T (for Today) in the Reference Date for Calculation field, and enter the letter W (for Week) in the Date Formula to Test field. We will get an error message telling us that our formulas should include a number. Make the system happy and enter 1W. We will now see a date in the Date Result field that is one week beyond our system date. Set the system's Work Date to a date in the middle of a month. Start another line with the number 3 as the Primary Key, followed by a W (for Work Date) in the Reference Date for Calculation field. Enter cm (or CM or cM or Cm, it doesn't matter) in the Date Formula to Test field. Our result date will be the last day of our Work Date month. Now, enter another line using the Work Date, but enter a formula of –cm (the same as before but with a minus sign). This time, our result date will be the first day of our Work Date month. Note that the DateFormula logic handles month end dates correctly, including a leap year. Try starting with a date in the middle of February 2016 to confirm this. The following screen shows the Date Formula Test window: Now, enter another line with a new Primary Key. Skip over the Reference Date for Calculation field and just enter 1D in the Date Formula to Test field. So, what happens when you do this? We get an error message stating that "You cannot base a date calculation on an undefined date." In other words, NAV cannot make the requested calculation without a Reference Date. Before we put this function into production, we want our code to check for a Reference Date before calculating. We could default an empty date to the System Date or the Work Date and avoid this particular error. The preceding and following screenshots show different sample calculations. Build on these and then experiment. We can create a variety of different algebraic date formulae and get some very interesting results. One NAV user has due dates on Invoices for the tenth of the next month. Invoices are dated at various times during the month than they are actually printed. By using the DateFormula of CM + 10D, the due date is always automatically calculated to be the tenth of the next month. Don't forget to test with WD (weekday), Q (quarter), and Y (year) as well as D (day), W (week), and M (month). For our code to be language independent, we should enter the date formulae with < > delimiters around them (for example, <1D+1W>). NAV will translate the formula into the correct language codes using the installed language layer. Although our focus for the work we just completed was the Date Formula data type, we've accomplished a lot more than simply learning about that one data type: We created a new table just for the purpose of experimenting with a C/AL feature that we might use. This is a technique that comes in handy when we are learning a new feature or trying to decide how it works or how we might use it. We put some critical OnValidate logic in the table. When data is entered in one area, the entry is validated and, if valid, the defined processing is done instantly. We created a common routine as a new LOCAL function. This function is then called from all the places to which it applies. We did our entire test with a table object and a default tabular page that is automatically generated when we Run a table. We didn't have to create a supporting structure to do our testing. Of course, when we design a change to a complicated existing structure, we will have a more complicated testing scenario. One of our goals will always be to simplify our testing scenarios, both to minimize the setup effort and to keep our test narrowly focused on the specific issue. Finally, and most specifically, we saw how NAV tools make a variety of relative date calculations easy. These are very useful in business applications, many aspects of which are date centered. References and other data types The following data types are used for advanced functionality in NAV, sometimes supporting an interface with an external object: RecordID: This contains the object number and primary key of a table. RecordRef: This identifies a row in a table, a record. RecordRef can be used to obtain information about the table, the record, the fields in the record, and the currently active filters on the table. FieldRef: This identifies a field in a table; thus, it allows access to the contents of that field. KeyRef: This identifies a key in a table and the fields in that key. Since the specific record, field, and key references are assigned at runtime, RecordRef, FieldRef, and KeyRef are used to support logic which can run on tables that are not specified at design time. This means that one routine built on these data types can be created to perform a common function for a variety of different tables and table formats. Variant: This defines variables that are typically used to interface with Automation and OCX objects. Variant variables can contain data of various C/AL data types to pass them to an Automation or OCX object as well as external Automation data types that cannot be mapped to C/AL data types. TableFilter: For variables which can only be used for setting security filters from the Permissions table. Transaction Type: This has optional values of UpdateNoLocks, Update, Snapshot, Browse, and Report that define SQL Server behavior for a NAV Report or XMLport transaction from the beginning of the transaction. BLOB: This can contain either specially formatted text, a graphic in the form of a bitmap, or other developer-defined binary data up to 2 GB in size. The term Binary Large Object (BLOB). BLOBs can only be included in tables and not used to define working storage Variables. Refer to Developer and IT Pro Help for additional information. BigText: This can contain large chunks of text up to 2 GB in size. BigText variables can only be defined in the working storage within an object, but they cannot be included in tables. BigText variables cannot be directly displayed or seen in the debugger. There is a group of special functions that can be used to handle BigText data. Refer to Developer and IT Pro Help for additional information. To handle text strings in a single data element that are greater than 250 characters in length, use a combination of BLOB and BigText variables. GUID: This is used to assign a unique identifying number to any database object. Globally Unique Identifier (GUID), a 16-byte binary data type that is used for unique global identification of records, objects, and so on. GUID is generated by an algorithm developed by Microsoft. TestPage: This is used to store a test page, which is a logical representation of a page that does not display a user interface. Test pages are used when you do NAV application testing using the automated testing facility that is part of NAV. Data type usage About forty percent of the data types can be used to define the data either stored in tables or in working storage data definitions (that is, in a Global or Local data definition within an object). Two data types, BLOB and TableFilter, can only be used to define table-stored data, but not working storage data. About sixty percent of the data types can only be used for working storage data definitions. The following list shows which data types can be used for table (persisted) data fields and which ones can be used for working storage (variable) data: FieldClass property options Almost all data fields have a FieldClass property. FieldClass has as much effect on the content and usage of a data field as the data type; in some instances, it has more effect. Now we'll discuss the FieldClass property options now. FieldClass – Normal When the FieldClass is Normal, the field will contain the type of application data that's typically stored in a table—the contents we would expect based on the data type and various properties. FieldClass – FlowField FlowFields must be dynamically calculated. FlowFields are virtual fields stored as metadata; they do not contain data in the conventional sense. A FlowField contains the definition of how to calculate (at runtime) the data that the field represents and a place to store the result of that calculation. Generally, the Editable property for a FlowField is set to No.. Depending on the CalcFormula method, this could be a value, a reference lookup, or a Boolean. When the CalcFormula method is Sum, the FieldClass connects a data field to a previously defined SumIndexField in the table defined in the CalcFormula. The FlowField processing speed will be significantly affected by the key configuration of the table being processed. While we must be careful not to define extra keys, having the right keys defined will have a major effect on system performance and thus, on user satisfaction. A FlowField value is always 0, blank, or false, unless it has been calculated. If a FlowField is displayed directly on a page, it is calculated automatically when the page is rendered. FlowFields are also automatically calculated when they are the subject of predefined filters as part of the properties of a data item in an object. In all other cases, a FlowField must be forced to calculate using the C/AL RecordName.CALCFIELDS(FlowField1, [FlowField2],...) function or by the use of the SETAUTOCALCFIELDS function. This is also true if the underlying data is changed after the initial display of a page (that is, the FlowField must be recalculated to take a data change into account). Because a FlowField does not contain actual data, it cannot be used as a field in a key. In other words, we cannot include a FlowField as part of a key. In addition, we cannot define a FlowField that is based on another FlowField, except in special circumstances. When a field has its FieldClass set to FlowField, another directly associated property becomes available—CalcFormula. (Conversely, the AltSearchField, AutoIncrement, and TestTableRelation properties disappear from view when FieldClass is set to FlowField). The CalcFormula method is the place where we can define the formula for calculating the FlowField. On the CalcFormula property line, there is an ellipsis button. Clicking on that button will bring up the following screen: Click on the drop-down button to show the seven FlowField methods: The seven FlowFields are described in the following table: FlowField Method Field data type   Calculated value as it applies to the specified set of data within a specific column (field) in a table   Sum Decimal The sum total Average Decimal The average value (the sum divided by the row count) Exist Boolean Yes or No / True or False - does an entry exist? Count Integer The number of entries that exist Min Any The smallest value of any entry Max Any The largest value of any entry Lookup Any The value of the specified entry The Reverse Sign control allows us to change the displayed sign of the result for FlowField types Sum and Average only; the underlying data is not changed. If a Reverse Sign is used with the FlowField type Exists, it changes the effective function to does not Exist. Table and Field allow us to define the Table and the Field within that table to which our Calculation Formula will apply. When we make the entries in our Calculation Formula screen, no validation checking is done by the compiler to check whether we have chosen an eligible table and field combination. This checking doesn't occur until runtime. Therefore, when we create a new FlowField, we should test it as soon as we have defined it. The last, but by no means the least significant component of the FlowField calculation formula is the Table Filter. When we click on the ellipsis in the table filter field, the window shown in the following screenshot will appear: When we click on the Field column, we will be invited to select a field from the table that was entered into the Table field earlier. The Type field choice will determine the type of filter. The Value field will have the filter rules defined on this line, which must be consistent with the Type choices described in the following table: Filter type Value Filtering action OnlyMax- Limit Values- Filter Const   A constant which will be defined in the Value field This uses the constant to filter for equally valued entries     Filter   A filter that will be spelled out as a literal in the Value field This applies the filter expression from the Value field     Field   A field from the table within which the FlowField exists This uses the contents of the specified field to filter equally valued entries False False     If the specified field is a FlowFilter and the OnlyMaxLimit parameter is True, then the FlowFilter range will be applied on the basis of only having a MaxLimit, that is, having no bottom limit. This is useful for the date filters for the Balance Sheet data. (Refer to Balance at Date field in the G/L Account table for an example) True False     This causes the contents of the specified field to be interpreted as a filter (See Balance at Date field in the G/L Account table for an example) True or False True FieldClass – FlowFilter FlowFilters control the calculation of FlowFields in the table (when the FlowFilters are included in the CalcFormula). FlowFilters do not contain permanent data, but instead, they contain filters on a per-user basis, with the information stored in that user's instance of the code that is being executed. A FlowFilter field allows a filter to be entered at a parent record level by the user (for example, G/L Account) and applied (through the use of FlowField formulas, for example) to constrain what child data (for example, G/L Entry records) is selected. A FlowFilter allows us to provide flexible data selection functions to the users. The user does not need to have a full understanding of the data structure to apply filtering in intuitive ways to both the primary data table and the subordinate data. Based on our C/AL code design, FlowFilters can be used to apply filtering on multiple tables that are subordinate to a parent table. Of course, it is our responsibility as developers to make good use of this tool. As with many C/AL capabilities, a good way to learn more is by studying standard code designed by the Microsoft developers of NAV and then experimenting. A number of good examples on the use of FlowFilters can be found in the Customer (Table 18) and Item (Table 27) tables. In the Customer table, some of the FlowFields using FlowFilters are Balance, Balance (LCY), Net Change, Net Change (LCY), Sales (LCY), and Profit (LCY) where LCY stands for local currency. The Sales (LCY) FlowField FlowFilter usage is shown in the following screenshot: Similarly constructed FlowFields using FlowFilters in the Item table include Inventory, Net Invoiced Qty., Net Change, Purchases (Qty.) as well as other fields. Throughout the standard code, there are FlowFilters in most of the master table definitions; there are the Date Filters and Global Dimension Filters (global dimensions are user-defined codes that facilitate the segregation of accounting data by groupings such as divisions, departments, projects, customer type, and so on). Other FlowFilters that are widely used in the standard code related to Inventory activity such as Location Filter, Lot No. Filter, Serial No. Filter, and Bin Filter. The following pair of images shows two fields from the Customer table, both with a Data Type of Date. On the left side of the screenshot is the Last Date Modified field (FieldClass of Normal) and on the right side of the screenshot is the Date Filter field (FieldClass of FlowFilter). It's easy to see that the properties of the two fields are very similar, except for the properties that differ because one is a Normal field and the other is a FlowFilter field. Summary In this article, we focused on the basic building blocks of the NAV data structure: fields and their attributes. We reviewed the types of data fields, properties, and trigger elements for each type of field. We walked through a number of examples to illustrate most of these elements though we had postponed the exploration of triggers until later, when we had more knowledge of C/AL. We covered Data Type and FieldClass, properties which determine what kind of data can be stored in a field. Resources for Article: Further resources on this subject: Customization in Microsoft Dynamics CRM [article] What is BI and What are BI Tools for Microsoft Dynamics GP? [article] Learning MS Dynamics AX 2012 Programming [article]
Read more
  • 0
  • 0
  • 8059
article-image-exploring-jenkins
Packt
10 Aug 2015
7 min read
Save for later

Exploring Jenkins

Packt
10 Aug 2015
7 min read
In this article by Mitesh Soni, the author of the book Jenkins Essentials, introduces us to Jenkins. (For more resources related to this topic, see here.) Jenkins is an open source application written in Java. It is one of the most popular continuous integration (CI) tools used to build and test different kinds of projects. In this article, we will have a quick overview of Jenkins, essential features, and its impact on DevOps culture. Before we can start using Jenkins, we need to install it. In this article, we have provided a step-by-step guide to install Jenkins. Installing Jenkins is a very easy task and is different from the OS flavors. This article will also cover the DevOps pipeline. To be precise, we will discuss the following topics in this article: Introduction to Jenkins and its features Installation of Jenkins on Windows and the CentOS operating system How to change configuration settings in Jenkins What is the deployment pipeline On your mark, get set, go! Introduction to Jenkins and its features Let's first understand what continuous integration is. CI is one of the most popular application development practices in recent times. Developers integrate bug fix, new feature development, or innovative functionality in code repository. The CI tool verifies the integration process with an automated build and automated test execution to detect issues with the current source of an application, and provide quick feedback. Jenkins is a simple, extensible, and user-friendly open source tool that provides CI services for application development. Jenkins supports SCM tools such as StarTeam, Subversion, CVS, Git, AccuRev and so on. Jenkins can build Freestyle, Apache Ant, and Apache Maven-based projects. The concept of plugins makes Jenkins more attractive, easy to learn, and easy to use. There are various categories of plugins available such as Source code management, Slave launchers and controllers, Build triggers, Build tools, Build notifies, Build reports, other post-build actions, External site/tool integrations, UI plugins, Authentication and user management, Android development, iOS development, .NET development, Ruby development, Library plugins, and so on. Jenkins defines interfaces or abstract classes that model a facet of a build system. Interfaces or abstract classes define an agreement on what needs to be implemented; Jenkins uses plugins to extend those implementations. To learn more about all plugins, visit https://wiki.jenkins-ci.org/x/GIAL. To learn how to create a new plugin, visit https://wiki.jenkins-ci.org/x/TYAL. To download different versions of plugins, visit https://updates.jenkins-ci.org/download/plugins/. Features Jenkins is one of the most popular CI servers in the market. The reasons for its popularity are as follows: Easy installation on different operating systems. Easy upgrades—Jenkins has very speedy release cycles. Simple and easy-to-use user interface. Easily extensible with the use of third-party plugins—over 400 plugins. Easy to configure the setup environment in the user interface. It is also possible to customize the user interface based on likings. The master slave architecture supports distributed builds to reduce loads on the CI server. Jenkins is available with test harness built around JUnit; test results are available in graphical and tabular forms. Build scheduling based on the cron expression (to know more about cron, visit http://en.wikipedia.org/wiki/Cron). Shell and Windows command execution in prebuild steps. Notification support related to the build status. Installation of Jenkins on Windows and CentOS Go to https://jenkins-ci.org/. Find the Download Jenkins section on the home page of Jenkins's website. Download the war file or native packages based on your operating system. A Java installation is needed to run Jenkins. Install Java based on your operating system and set the JAVA_HOME environment variable accordingly. Installing Jenkins on Windows Select the native package available for Windows. It will download jenkins-1.xxx.zip. In our case, it will download jenkins-1.606.zip. Extract it and you will get setup.exe and jenkins-1.606.msi files. Click on setup.exe and perform the following steps in sequence. On the welcome screen, click Next: Select the destination folder and click on Next. Click on Install to begin installation. Please wait while the Setup Wizard installs Jenkins. Once the Jenkins installation is completed, click on the Finish button. Verify the Jenkins installation on the Windows machine by opening URL http://<ip_address>:8080 on the system where you have installed Jenkins. Installation of Jenkins on CentOS To install Jenkins on CentOS, download the Jenkins repository definition to your local system at /etc/yum.repos.d/ and import the key. Use the wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo command to download repo. Now, run yum install Jenkins; it will resolve dependencies and prompt for installation. Reply with y and it will download the required package to install Jenkins on CentOS. Verify the Jenkins status by issuing the service jenkins status command. Initially, it will be stopped. Start Jenkins by executing service jenkins start in the terminal. Verify the Jenkins installation on the CentOS machine by opening the URL http://<ip_address>:8080 on the system where you have installed Jenkins. How to change configuration settings in Jenkins Click on the Manage Jenkins link on the dashboard to configure system, security, to manage plugins, slave nodes, credentials, and so on. Click on the Configure System link to configure Java, Ant, Maven, and other third-party products' related information. Jenkins uses Groovy as its scripting language. To execute the arbitrary script for administration/trouble-shooting/diagnostics on the Jenkins dashboard, go to the Manage Jenkins link on the dashboard, click on Script Console, and run println(Jenkins.instance.pluginManager.plugins). To verify the system log, go to the Manage Jenkins link on the dashboard and click on the System Log link or visit http://localhost:8080/log/all. To get more information on third-party libraries—version and license information in Jenkins, go to the Manage Jenkins link on the dashboard and click on the About Jenkins link. What is the deployment pipeline? The application development life cycle is a traditionally lengthy and a manual process. In addition, it requires effective collaboration between development and operations teams. The deployment pipeline is a demonstration of automation involved in the application development life cycle containing the automated build execution and test execution, notification to the stakeholder, and deployment in different runtime environments. Effectively, the deployment pipeline is a combination of CI and continuous delivery, and hence is a part of DevOps practices. The following diagram depicts the deployment pipeline process: Members of the development team check code into a source code repository. CI products such as Jenkins are configured to poll changes from the code repository. Changes in the repository are downloaded to the local workspace and Jenkins triggers an automated build process, which is assisted by Ant or Maven. Automated test execution or unit testing, static code analysis, reporting, and notification of successful or failed build process are also part of the CI process. Once the build is successful, it can be deployed to different runtime environments such as testing, preproduction, production, and so on. Deploying a war file in terms of the JEE application is normally the final stage in the deployment pipeline. One of the biggest benefits of the deployment pipeline is the faster feedback cycle. Identification of issues in the application at early stages and no dependencies on manual efforts make this entire end-to-end process more effective. To read more, visit http://martinfowler.com/bliki/DeploymentPipeline.html and http://www.informit.com/articles/article.aspx?p=1621865&seqNum=2. Summary Congratulations! We reached the end of this article and hence we have Jenkins installed on our physical or virtual machine. Till now, we covered the basics of CI and the introduction to Jenkins and its features. We completed the installation of Jenkins on Windows and CentOS platforms. In addition to this, we discussed the deployment pipeline and its importance in CI. Resources for Article: Further resources on this subject: Jenkins Continuous Integration [article] Running Cucumber [article] Introduction to TeamCity [article]
Read more
  • 0
  • 0
  • 2321

article-image-hands-prezi-mechanics
Packt
10 Aug 2015
8 min read
Save for later

Hands-on with Prezi Mechanics

Packt
10 Aug 2015
8 min read
In this In this article by J.J. Sylvia IV, author of the book Mastering Prezi for Business Presentations - Second Edition, we will see how to edit a figure and to style symbols. Also we will see the Grouping feature and brief introduction of the Prezi text editor. (For more resources related to this topic, see here.) Editing lines When editing lines or arrows, you can change them from being straight to curved by dragging the center point in any direction: This is extremely useful when creating the line drawings we saw earlier. It's also useful to get arrows pointing at various objects on your canvas: Styled symbols If you're on a tight deadline, or trying to create drawings with shapes simply isn't for you, then the styles available in Prezi may be of more interest to you. These are common symbols that Prezi has created in a few different styles that can be easily inserted into any of your presentations. You can select these from the same Symbols & shapes… option from the Insert menu where we found the symbols. You'll see several different styles to choose from on the right-hand side of your screen. Each of these categories has similar symbols, but styled differently. There is a wide variety of symbols available ranging from people to social media logos. You can pick a style that best matches your theme or the atmosphere you've created for your presentation. Instead of creating your own person from shapes, you can select from a variety of people symbols available: Although these symbols can be very handy, you should be aware that you can't edit them as part of your presentation. If you decide to use one, note that it will work as it is—there are no new hairstyles for these symbols. Highlighter The highlighter tool is extremely useful for pointing out key pieces of information such as an interesting fact. To use it, navigate to the Insert menu and select the Highlighter option. Then, just drag the cursor across the text you'd like to highlight. Once you've done this, the highlighter marks become objects in their own right, so you can click on them to change their size or position just as you would do for a shape. To change the color of your highlighter, you will need to go into the Theme Wizard and edit the RGB values. We'll cover how to do this later when we discuss branding. Grouping Grouping is a great feature that allows you to move or edit several different elements of your presentation at once. This can be especially useful if you're trying to reorganize the layout of your Prezi after it's been created, or to add animations to several elements at once. Let's go back to the drawing we created earlier to see how this might work: The first way to group items is to hold down the Ctrl key (Command on Mac OS) and to left-click on each element you want to group individually. In this case, I need to click on each individual line that makes up the flat top hair in the preceding image. This might be necessary if I only want to group the hair, for example: Another method for grouping is to hold down the Shift key while dragging your mouse to select multiple items at once. In the preceding screenshot, I've selected my entire person at once. Now, I can easily rotate, resize, or move the entire person at once, without having to move each individual line or shape. If you select a group of objects, move them, and then realize that a piece is missing because it didn't get selected, just press the Ctrl+Z (Command+Z on Mac OS) keys on your keyboard to undo the move. Then, broaden your selection and try again. Alternatively, you can hold down the Shift key and simply click on the piece you missed to add it to the group. If we want to keep these elements grouped together instead of having to reselect them each time we decide to make a change, we can click on the Group button that appears with this change. Now these items will stay grouped unless we click on the new Ungroup button, now located in the same place as the Group button previously was: You can also use frames to group material together. If you already created frames as part of your layout, this might make the grouping process even easier. Prezi text editor Over the years, the Prezi text editor has evolved to be quite robust, and it's now possible to easily do all of your text editing directly within Prezi. Spell checker When you spell something incorrectly, Prezi will underline the word it doesn't recognize with a red line. This is just as you would see it in Microsoft Word or any other text editor. To correct the word, simply right-click on it (or Command + Click on Mac OS) and select the word you meant to type from the suggestions, as shown in the following screenshot: The text drag-apart feature So a colleague of yours has just e-mailed you the text that they want to appear in the Prezi you're designing for them? That's great news as it'll help you understand the flow of the presentation. What's frustrating, though, is that you'll have to copy and paste every single line or paragraph across to put it in the right place on your canvas. At least, that used to be the case before Prezi introduced the drag-apart feature in the text editor. This means you can now easily drag a selection of text anywhere on your canvas without having to rely on the copy and paste options. Let's see how we can easily change the text we spellchecked previously, as shown in the following screenshot: In order to drag your text apart, simply highlight the area you require, hold the mouse button down, and then drag the text anywhere on your canvas. Once you have separated your text, you can then edit the separate parts as you would edit any other individual object on your canvas. In this example, we can change the size of the company name and leave the other text as it is, which we couldn't do within a single textbox: Building Prezis for colleagues If you've kindly offered to build a Prezi for one of your colleagues, ask them to supply the text for it in Word format. You'll be able to run a spellcheck on it from there before you copy and paste it into Prezi. Any bad spellings you miss will also get highlighted on your Prezi canvas but it's good to use both options as a safety net. Font colors Other than dragging text apart to make it stand out more on its own, you might want to highlight certain words so that they jump out at your audience even more. The great news is that you can now highlight individual lines of text or single words and change their color. To do so, just highlight a word by clicking and dragging your mouse across it. Then, click on the color picker at the top of the textbox to see the color menu, as shown in the following screenshot: Select any of the colors available in the palette to change the color of that piece of text. Nothing else in the textbox will be affected apart from the text you have selected. This gives you much greater freedom to use colored text in your Prezi design, and doesn't leave you restricted as in older versions of the software. Choose the right color To make good use of this feature, we recommend that you use a color that completely contrasts to the rest of your design. For example, if your design and corporate colors are blue, we suggest you use red or purple to highlight key words. Also, once you pick a color, stick to it throughout the presentation so that your audience knows when they see a key piece of information. Bullet points and indents Bullets and indents make it much easier to put together your business presentations and helps to give the audience some short, simple information as text in the same format they're used to seeing in other presentations. This can be done by simply selecting the main body of text and clicking on the bullet point icon at the top of the textbox. This is a really simple feature, but a useful one nonetheless. We'd obviously like to point out that too much text on any presentation is a bad thing. Keep it short and to the point. Also, remember that too many bullets can kill a presentation. Summary In this article, we discussed the basic mechanics of Prezi. Learning to combine these tools in creative ways will help you move from a Prezi novice to master. Shapes can be used creatively to create content and drawings, and can be grouped together for easy movement and editing. Prezi also features basic text editing which are explained in this article. Resources for Article: Further resources on this subject: Turning your PowerPoint into a Prezi [Article] The Fastest Way to Go from an Idea to a Prezi [Article] Using Prezi - The Online Presentation Software Tool [Article]
Read more
  • 0
  • 0
  • 923
Modal Close icon
Modal Close icon