Packt
11 Aug 2015
18 min read

Working with Gradle

In this article by Mainak Mitra, author of the book Mastering Gradle, we cover plugins such as War and Scala, which help in building web applications and Scala applications. We also discuss topics such as property management, multi-project builds, and logging. In the multi-project build section, we discuss how Gradle supports multi-project builds through the root project's build file, which gives you the flexibility to treat each module as a separate project or all the modules together as a single project.

The War plugin

The War plugin is used to build web projects. Like any other plugin, it can be added to the build file with the following line:

    apply plugin: 'war'

The War plugin extends the Java plugin and helps to create war archives; applying it automatically applies the Java plugin to the build file. During the build, the plugin creates a war file instead of a jar file: it disables the jar task of the Java plugin and adds a default war archive task. By default, the war file contains the compiled classes from src/main/java, the content of src/main/webapp, and all the runtime dependencies. The content can also be customized using the war closure.

In our example, we have created a simple servlet that displays the current date and time, a web.xml file, and a build.gradle file. The project structure is displayed in the following screenshot:

Figure 6.1

The SimpleWebApp/build.gradle file has the following content:

    apply plugin: 'war'

    repositories {
        mavenCentral()
    }

    dependencies {
        providedCompile "javax.servlet:servlet-api:2.5"
        compile("commons-io:commons-io:2.4")
        compile 'javax.inject:javax.inject:1'
    }

The War plugin adds the providedCompile and providedRuntime dependency configurations on top of the Java plugin.
The providedCompile and providedRuntime configurations have the same scope as compile and runtime respectively. The only difference is that the libraries defined in these configurations are not included in the war archive. In our example, we have defined servlet-api as a providedCompile dependency, so this library is not included in the WEB-INF/lib/ folder of the war file. It is provided by the servlet container, such as Tomcat, so it becomes available when we deploy the application in the container. You can confirm this by listing the contents of the war file:

    SimpleWebApp$ jar -tvf build/libs/SimpleWebApp.war
         0 Mon Mar 16 17:56:04 IST 2015 META-INF/
        25 Mon Mar 16 17:56:04 IST 2015 META-INF/MANIFEST.MF
         0 Mon Mar 16 17:56:04 IST 2015 WEB-INF/
         0 Mon Mar 16 17:56:04 IST 2015 WEB-INF/classes/
         0 Mon Mar 16 17:56:04 IST 2015 WEB-INF/classes/ch6/
      1148 Mon Mar 16 17:56:04 IST 2015 WEB-INF/classes/ch6/DateTimeServlet.class
         0 Mon Mar 16 17:56:04 IST 2015 WEB-INF/lib/
    185140 Mon Mar 16 12:32:50 IST 2015 WEB-INF/lib/commons-io-2.4.jar
      2497 Mon Mar 16 13:49:32 IST 2015 WEB-INF/lib/javax.inject-1.jar
       578 Mon Mar 16 16:45:16 IST 2015 WEB-INF/web.xml

Sometimes you also need to customize the project's structure. For example, the webapp folder could be under the root project folder rather than in the src folder. The webapp folder can also contain new folders, such as conf and resource, to store properties files, JavaScript files, images, and other assets. You might also want to rename the webapp folder to WebContent. The proposed directory structure might look like this:

Figure 6.2

You might also want a war file with a custom name and version, and you might not want empty folders such as images or js copied into the war file. To implement these changes, add the following properties to the build.gradle file.
The webAppDirName property points the webapp location to the WebContent folder. The war closure sets properties such as baseName and version, and sets the includeEmptyDirs option to false. By default, includeEmptyDirs is true, so any empty folder in the webapp directory is copied into the war file. With it set to false, empty folders such as images and js are left out of the war file. The following are the contents of CustomWebApp/build.gradle:

    apply plugin: 'war'

    repositories {
        mavenCentral()
    }

    dependencies {
        providedCompile "javax.servlet:servlet-api:2.5"
        compile("commons-io:commons-io:2.4")
        compile 'javax.inject:javax.inject:1'
    }

    webAppDirName = "WebContent"

    war {
        baseName = "simpleapp"
        version = "1.0"
        extension = "war"
        includeEmptyDirs = false
    }

After a successful build, the war file is created as simpleapp-1.0.war. Run the jar -tvf build/libs/simpleapp-1.0.war command to verify the content of the war file. You will find that the conf folder is added to the war file, whereas the images and js folders are not.

You might also find the Jetty plugin useful for web application deployment. It lets you deploy the web application in an embedded container, and it automatically applies the War plugin to the project. The Jetty plugin defines three tasks: jettyRun, jettyRunWar, and jettyStop. The jettyRun task runs the web application in an embedded Jetty web container. The jettyRunWar task builds the war file and then runs it in the embedded container. The jettyStop task stops the container instance. For more information, refer to the Gradle documentation at https://docs.gradle.org/current/userguide/war_plugin.html.

The Scala plugin

The Scala plugin helps you to build Scala applications.
Like any other plugin, the Scala plugin can be applied to the build file by adding the following line:

    apply plugin: 'scala'

The Scala plugin also extends the Java plugin and adds a few more tasks for working with Scala files, such as compileScala, compileTestScala, and scaladoc. Each task is named after its Java equivalent, with java replaced by scala. A Scala project's directory structure is also similar to a Java project's: production code is typically written under the src/main/scala directory and test code is kept under the src/test/scala directory. Figure 6.3 shows the directory structure of a Scala project. You can also see from the directory structure that a Scala project can contain a mix of Java and Scala source files.

The HelloScala.scala file has the following content and prints Hello, Scala... on the console. This is very basic code, and we will not discuss the Scala programming language in detail here. Refer to the Scala language documentation available at http://www.scala-lang.org/.

    package ch6

    object HelloScala {
      def main(args: Array[String]): Unit = {
        println("Hello, Scala...")
      }
    }

To compile Scala source code, the Scala library must be added to the dependency configuration:

    dependencies {
        compile('org.scala-lang:scala-library:2.11.6')
    }

Figure 6.3

As mentioned, the Scala plugin extends the Java plugin and adds a few new tasks. For example, the compileScala task depends on the compileJava task, and the compileTestScala task depends on the compileTestJava task. You can see this by executing the classes and testClasses tasks and looking at the output:
    $ gradle classes
    :compileJava
    :compileScala
    :processResources UP-TO-DATE
    :classes

    BUILD SUCCESSFUL

    $ gradle testClasses
    :compileJava UP-TO-DATE
    :compileScala UP-TO-DATE
    :processResources UP-TO-DATE
    :classes UP-TO-DATE
    :compileTestJava UP-TO-DATE
    :compileTestScala UP-TO-DATE
    :processTestResources UP-TO-DATE
    :testClasses UP-TO-DATE

    BUILD SUCCESSFUL

Scala projects are also packaged as jar files. The jar task or the assemble task creates a jar file in the build/libs directory:

    $ jar -tvf build/libs/ScalaApplication-1.0.jar
       0 Thu Mar 26 23:49:04 IST 2015 META-INF/
      94 Thu Mar 26 23:49:04 IST 2015 META-INF/MANIFEST.MF
       0 Thu Mar 26 23:49:04 IST 2015 ch6/
    1194 Thu Mar 26 23:48:58 IST 2015 ch6/Customer.class
     609 Thu Mar 26 23:49:04 IST 2015 ch6/HelloScala$.class
     594 Thu Mar 26 23:49:04 IST 2015 ch6/HelloScala.class
    1375 Thu Mar 26 23:48:58 IST 2015 ch6/Order.class

The Scala plugin does not add any extra conventions to the Java plugin. Therefore, the conventions defined in the Java plugin, such as the lib directory and the report directory, can be reused with the Scala plugin. The Scala plugin only adds a few source set properties, such as allScala, scala.srcDirs, and scala. The following task example displays different properties available to the Scala plugin.
The following is a code snippet from ScalaApplication/build.gradle:

    apply plugin: 'java'
    apply plugin: 'scala'
    apply plugin: 'eclipse'

    version = '1.0'

    jar {
        manifest {
            attributes 'Implementation-Title': 'ScalaApplication',
                       'Implementation-Version': version
        }
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        compile('org.scala-lang:scala-library:2.11.6')
        runtime('org.scala-lang:scala-compiler:2.11.6')
        compile('org.scala-lang:jline:2.9.0-1')
    }

    task displayScalaPluginConvention << {
        println "Lib Directory: $libsDir"
        println "Lib Directory Name: $libsDirName"
        println "Reports Directory: $reportsDir"
        println "Test Result Directory: $testResultsDir"

        println "Source Code in two sourcesets: $sourceSets"
        println "Production Code: ${sourceSets.main.java.srcDirs}, ${sourceSets.main.scala.srcDirs}"
        println "Test Code: ${sourceSets.test.java.srcDirs}, ${sourceSets.test.scala.srcDirs}"
        println "Production code output: ${sourceSets.main.output.classesDir} & ${sourceSets.main.output.resourcesDir}"
        println "Test code output: ${sourceSets.test.output.classesDir} & ${sourceSets.test.output.resourcesDir}"
    }

The output of the displayScalaPluginConvention task is as follows:

    $ gradle displayScalaPluginConvention
    …
    :displayScalaPluginConvention
    Lib Directory: <path>/build/libs
    Lib Directory Name: libs
    Reports Directory: <path>/build/reports
    Test Result Directory: <path>/build/test-results
    Source Code in two sourcesets: [source set 'main', source set 'test']
    Production Code: [<path>/src/main/java], [<path>/src/main/scala]
    Test Code: [<path>/src/test/java], [<path>/src/test/scala]
    Production code output: <path>/build/classes/main & <path>/build/resources/main
    Test code output: <path>/build/classes/test & <path>/build/resources/test

    BUILD SUCCESSFUL

Finally, we conclude this section by discussing how to execute a Scala application from Gradle. We can create a simple task in the build file as follows:
    task runMain(type: JavaExec) {
        main = 'ch6.HelloScala'
        classpath = configurations.runtime + sourceSets.main.output + sourceSets.test.output
    }

The HelloScala source file has a main method that prints Hello, Scala... on the console. The runMain task executes this main method and displays the output on the console:

    $ gradle runMain
    ....
    :runMain
    Hello, Scala...

    BUILD SUCCESSFUL

Logging

Until now, we have used println everywhere in the build script to display messages to the user. If you come from a Java background, you know that println is not the right way to give information to the user: you need logging. Logging lets you classify messages into categories and show them at different levels, so the amount of output fits the situation. For example, when users want complete, detailed tracking of the build, they can use the debug level. When they want only limited, essential information while executing a task, they can use the quiet level.

Gradle provides the following log levels:

    Log level   Description
    ERROR       Error messages
    QUIET       Limited, important information
    WARNING     Warning messages
    LIFECYCLE   Progress information (the default level)
    INFO        Informational messages
    DEBUG       Debug messages (all logs)

By default, the Gradle log level is LIFECYCLE.
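These levels form an ordered scale, and the active level works as a threshold: a message is printed only if its level is at or above the threshold. The following Java sketch models that rule. The Level enum and the LogFilter class are hypothetical illustrations, not Gradle's API. As far as we know, the constant order follows Gradle's org.gradle.api.logging.LogLevel enum, which spells WARNING as WARN.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of threshold-based log filtering; not Gradle's API.
// Constants are declared from most verbose to least verbose.
enum Level { DEBUG, INFO, LIFECYCLE, WARN, QUIET, ERROR }

class LogFilter {
    // A message is shown when its level is at or above the threshold.
    static boolean isShown(Level message, Level threshold) {
        return message.compareTo(threshold) >= 0;
    }

    // Levels that remain visible for a given threshold, in declaration order.
    static List<Level> visibleLevels(Level threshold) {
        List<Level> visible = new ArrayList<>();
        for (Level level : Level.values()) {
            if (isShown(level, threshold)) {
                visible.add(level);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // No flag (default), then the thresholds selected by -q, -i and -d.
        System.out.println("default: " + visibleLevels(Level.LIFECYCLE));
        System.out.println("-q:      " + visibleLevels(Level.QUIET));
        System.out.println("-i:      " + visibleLevels(Level.INFO));
        System.out.println("-d:      " + visibleLevels(Level.DEBUG));
    }
}
```

With the default threshold, main prints [LIFECYCLE, WARN, QUIET, ERROR]. That matches the four logger lines in the showLogging output that follows. The -q threshold keeps only QUIET and ERROR.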
The following is the code snippet from LogExample/build.gradle:

    task showLogging << {
        println "This is println example"
        logger.error "This is error message"
        logger.quiet "This is quiet message"
        logger.warn "This is WARNING message"
        logger.lifecycle "This is LIFECYCLE message"
        logger.info "This is INFO message"
        logger.debug "This is DEBUG message"
    }

Now, execute the following command:

    $ gradle showLogging
    :showLogging
    This is println example
    This is error message
    This is quiet message
    This is WARNING message
    This is LIFECYCLE message

    BUILD SUCCESSFUL

Here, Gradle has printed all the logger statements up to and including the lifecycle level, which is Gradle's default log level. You can also control the log level from the command line:

-q: Shows logs up to the quiet level, which includes error and quiet messages.

-i: Shows logs up to the info level, which includes error, quiet, warning, lifecycle, and info messages.

-s: Prints the stack trace for all exceptions.

-d: Prints all logs and debug information. This is the most expressive log level and prints even the minor details.

Now, execute gradle showLogging -q:

    This is println example
    This is error message
    This is quiet message

Apart from the log levels, Gradle can print a stack trace when an exception occurs. A stack trace is different from debug output: on a failure, it shows the chain of nested calls, in order, up to the point where the exception was thrown. To verify this, add an assert statement to the preceding task and execute the following:

    task showLogging << {
        println "This is println example"
        ..
        assert 1==2
    }

    $ gradle showLogging -s
    ……
    * Exception is:
    org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':showLogging'.
        at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:69)
        at ….
    org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:53)
        at org.gradle.api.internal.tasks.execution.ExecuteAtMostOnceTaskExecuter.execute(ExecuteAtMostOnceTaskExecuter.java:43)
        at org.gradle.api.internal.AbstractTask.executeWithoutThrowingTaskFailure(AbstractTask.java:305)
    ...

For stack traces, Gradle provides two options:

-s or --stacktrace: Prints a truncated stack trace.

-S or --full-stacktrace: Prints the full stack trace.

File management

A key feature of any build tool is how easily it handles I/O: reading files, writing files, and working with directories. Developers with an Ant or Maven background know how painful and complex file and directory operations were in older build tools. Because of the limitations of XML in Ant and Maven, you sometimes had to write custom tasks and plugins just to perform these operations. Since Gradle uses Groovy, working with files and directories is much easier.

Reading files

Gradle provides simple ways to read a file. You just need the File API (application programming interface), which provides everything needed to work with files. The following is the code snippet from FileExample/build.gradle:

    task showFile << {
        File file1 = file("readme.txt")
        println file1 // will print the path of the file
        file1.eachLine {
            println it // will print the contents line by line
        }
    }

To reference the file, we have used file(<file name>). This is the standard Gradle way to reference files, because Gradle resolves a relative name against the project directory ($PROJECT_PATH/<filename>) rather than against the current working directory. Here, the first println statement prints the resolved path of readme.txt. To read the file, Groovy adds the eachLine method to the File API, which reads the lines of the file one by one.
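Outside Gradle, the same two steps can be done in plain Java: resolve a name against a base directory, then visit the file line by line. The following Java 11 sketch shows one way. ReadFileSketch and its methods are hypothetical names for illustration, and the projectDir argument stands in for the Gradle project directory. This is not how Gradle implements file() or how Groovy implements eachLine().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

class ReadFileSketch {
    // Resolve a relative name against a base directory, similar in spirit
    // to Gradle's file(), which resolves against the project directory.
    static Path resolve(Path projectDir, String name) {
        return projectDir.resolve(name).toAbsolutePath().normalize();
    }

    // Invoke the action once per line, like Groovy's File.eachLine { ... }.
    static void eachLine(Path file, Consumer<String> action) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                action.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A throwaway directory stands in for the project directory.
        Path projectDir = Files.createTempDirectory("project");
        Path readme = resolve(projectDir, "readme.txt");
        Files.write(readme, List.of("first line", "second line"), StandardCharsets.UTF_8);

        System.out.println(readme);            // absolute path ending in readme.txt
        eachLine(readme, System.out::println); // prints the two lines in order
    }
}
```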
To work with directories, you can use the same File API:

    def dir1 = new File("src")
    println "Checking directory " + dir1.isFile()      // will return false for a directory
    println "Checking directory " + dir1.isDirectory() // will return true for a directory

Writing files

To write to a file, you can either add content to the end of the file with the append method, or overwrite the file with the setText or write methods:

    task fileWrite << {
        File file1 = file("readme.txt")

        // will append data at the end
        file1.append("\nAdding new line. \n")

        // will overwrite contents
        file1.setText("Overwriting existing contents")

        // will overwrite contents
        file1.write("Using write method")
    }

Creating files/directories

You can create a new file just by writing some text to it:

    task createFile << {
        File file1 = new File("newFile.txt")
        file1.write("Using write method")
    }

When you write data to a file, Groovy automatically creates the file if it does not exist. You can also write content to a file with the left-shift operator (<<), which appends data at the end of the file:

    file1 << "New content"

If you want to create an empty file, use the createNewFile() method:

    task createNewFile << {
        File file1 = new File("createNewFileMethod.txt")
        file1.createNewFile()
    }

A new directory can be created using the mkdir() method. You can also create nested directories in a single call using mkdirs():

    task createDir << {
        def dir1 = new File("folder1")
        dir1.mkdir()

        // createTempDir() is a static Groovy method: it does not create "folder2",
        // but a uniquely named directory under the system temporary directory
        def dir2 = new File("folder2")
        dir2.createTempDir()

        def dir3 = new File("folder3/subfolder31")
        dir3.mkdirs() // creates folder3 and subfolder31 in one call
    }

In the preceding example, we create three directories: folder1 using mkdir(), a temporary directory using createTempDir(), and folder3/subfolder31 (including its parent) using mkdirs(). Note that createTempDir() does not create folder2 in the project. It creates a uniquely named directory under the system temporary directory (java.io.tmpdir), and that directory is not removed automatically when the build finishes.
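For comparison, the following Java 11 sketch shows the same distinctions using only the standard library: appending versus overwriting, and mkdir() versus mkdirs() for nested directories. WriteFileSketch and its methods are hypothetical names, and the code is an illustration rather than the implementation behind Groovy's append(), setText(), or write().

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteFileSketch {
    // Add text at the end, creating the file if needed
    // (comparable to Groovy's append() or the << operator).
    static void append(Path file, String text) {
        try {
            Files.writeString(file, text, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replace the whole content (comparable to Groovy's setText() or write()).
    // With no options, writeString uses CREATE, TRUNCATE_EXISTING and WRITE.
    static void overwrite(Path file, String text) {
        try {
            Files.writeString(file, text, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path file) {
        try {
            return Files.readString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("writes");
        Path readme = base.resolve("readme.txt");

        append(readme, "first\n");
        append(readme, "second\n");
        System.out.print(read(readme));   // "first" and "second" on two lines
        overwrite(readme, "replaced");
        System.out.println(read(readme)); // replaced

        // mkdir() creates a single level, so it fails while the parent is missing;
        // mkdirs() also creates every missing parent directory.
        File nested = new File(base.toFile(), "folder3/subfolder31");
        System.out.println("mkdir:  " + nested.mkdir());  // false
        System.out.println("mkdirs: " + nested.mkdirs()); // true
    }
}
```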
File operations

The following task shows some frequently used file methods that will help you in build automation:

    task fileOperations << {
        File file1 = new File("readme.txt")
        println "File size is " + file1.size()
        println "Checking existence " + file1.exists()
        println "Reading contents " + file1.getText()
        println "Checking directory " + file1.isDirectory()
        println "File length " + file1.length()
        println "Hidden file " + file1.isHidden()

        // File paths
        println "File path is " + file1.path
        println "File absolute path is " + file1.absolutePath
        println "File canonical path is " + file1.canonicalPath

        // Rename file
        file1.renameTo("writeme.txt")

        // File permissions
        file1.setReadOnly()
        println "Checking read permission " + file1.canRead() + " write permission " + file1.canWrite()
        file1.setWritable(true)
        println "Checking read permission " + file1.canRead() + " write permission " + file1.canWrite()
    }

Most of the preceding methods are self-explanatory. Try to execute the preceding task and observe the output. Note that after renameTo(), file1 still refers to readme.txt, which no longer exists, so the permission checks that follow operate on a missing file. For the same reason, if you execute the fileOperations task a second time, you will get the exception readme.txt (No such file or directory), because the file has been renamed to writeme.txt.

Filter files

Certain file methods accept a regular expression as an argument. You can use a regular expression to select only the files you need instead of fetching all of them.
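One detail worth checking in any name filter is the dot. In a regular expression, an unescaped . matches any character, so .*.groovy also accepts a name such as notgroovy. The following Java sketch escapes the dot and walks a directory tree to collect matching files. That is roughly what combining eachFileMatch() with eachFileRecurse() achieves in the Groovy example that follows. FilterFilesSketch and its methods are hypothetical names, and the sketch uses java.util.regex directly rather than Groovy's pattern operator.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FilterFilesSketch {
    // The dot is escaped so it matches a literal '.'; without the backslash,
    // ".*.groovy" would also accept a name such as "notgroovy".
    static final Pattern GROOVY_FILE = Pattern.compile(".*\\.groovy");

    // matches() requires the whole file name to fit the pattern.
    static boolean isGroovyFile(String name) {
        return GROOVY_FILE.matcher(name).matches();
    }

    // Collect matching files in dir and in all of its subdirectories.
    static List<File> findGroovyFiles(File dir) {
        List<File> found = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return found; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                found.addAll(findGroovyFiles(child));
            } else if (isGroovyFile(child.getName())) {
                found.add(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isGroovyFile("groovySample.groovy"));       // true
        System.out.println(isGroovyFile("notgroovy"));                 // false
        System.out.println(Pattern.matches(".*.groovy", "notgroovy")); // true: the dot is a wildcard
        findGroovyFiles(new File("dir1")).forEach(System.out::println);
    }
}
```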
The following example uses the eachFileMatch() method to list only the Groovy files in a directory:

    task filterFiles << {
        def dir1 = new File("dir1")
        dir1.eachFileMatch(~/.*\.groovy/) {
            println it
        }
        dir1.eachFileRecurse { dir ->
            if (dir.isDirectory()) {
                dir.eachFileMatch(~/.*\.groovy/) {
                    println it
                }
            }
        }
    }

The output is as follows:

    $ gradle filterFiles
    :filterFiles
    dir1\groovySample.groovy
    dir1\subdir1\groovySample1.groovy
    dir1\subdir2\groovySample2.groovy
    dir1\subdir2\subDir3\groovySample3.groovy

    BUILD SUCCESSFUL

Delete files and directories

The File API provides delete() to delete a file and, through Groovy, deleteDir() to delete a directory along with its contents:

    task deleteFile << {
        def dir2 = new File("dir2")
        def file1 = new File("abc.txt")
        file1.createNewFile()
        dir2.mkdir()
        println "File path is " + file1.absolutePath
        println "Dir path is " + dir2.absolutePath
        file1.delete()
        dir2.deleteDir()
        println "Checking file(abc.txt) existence: " + file1.exists() + " and Directory(dir2) existence: " + dir2.exists()
    }

The output is as follows:

    $ gradle deleteFile
    :deleteFile
    File path is Chapter6/FileExample/abc.txt
    Dir path is Chapter6/FileExample/dir2
    Checking file(abc.txt) existence: false and Directory(dir2) existence: false

    BUILD SUCCESSFUL

The preceding task creates a directory dir2 and a file abc.txt, prints their absolute paths, and then deletes them. You can verify that they were deleted by calling the exists() method.

FileTree

Until now, we have dealt with single-file operations. Gradle also provides plenty of user-friendly APIs for file collections, and one of them is FileTree. A FileTree represents a hierarchy of files or directories and extends the FileCollection interface. Several objects in Gradle implement the FileTree interface, such as source directory sets (for example, sourceSets.main.java). You can initialize a FileTree with the fileTree() method.
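FileTree include and exclude patterns are Ant-style globs. The sketch below is a rough plain-Java analogue for Java 11. FileTreeSketch is a hypothetical class, and this is not how Gradle implements FileTree. It selects files with a java.nio PathMatcher glob, then cleans up with a recursive delete, which is the job Groovy's deleteDir() does in the deletion example above. One difference matters here: in java.nio globs, **/*.groovy needs at least one / and therefore skips top-level files, whereas the Ant-style **/ can match zero directories. In Java, **.groovy covers both cases.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileTreeSketch {
    // Regular files under root whose root-relative path matches the glob.
    // exclude = false keeps matches (like include); true drops them (like exclude).
    static List<Path> select(Path root, String glob, boolean exclude) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> matcher.matches(root.relativize(p)) != exclude)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Delete root and everything below it, deepest entries first; a plain
    // File.delete() would fail on a directory that is not empty.
    static void deleteRecursively(Path root) {
        List<Path> all;
        try (Stream<Path> paths = Files.walk(root)) {
            all = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Path p : all) {
            try {
                Files.delete(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("dir1");
        Files.createDirectories(root.resolve("sub"));
        Files.writeString(root.resolve("top.groovy"), "");
        Files.writeString(root.resolve("sub/nested.groovy"), "");
        Files.writeString(root.resolve("sub/notes.txt"), "");

        System.out.println(select(root, "**.groovy", false));   // both .groovy files
        System.out.println(select(root, "**/*.groovy", false)); // only sub/nested.groovy
        System.out.println(select(root, "**.groovy", true));    // only sub/notes.txt

        deleteRecursively(root);
        System.out.println("exists after delete: " + Files.exists(root)); // false
    }
}
```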
The following are different ways to initialize a FileTree with the fileTree() method:

    task fileTreeSample << {
        FileTree fTree = fileTree('dir1')
        fTree.each {
            println it.name
        }

        FileTree fTree1 = fileTree('dir1') {
            include '**/*.groovy'
        }
        println ""
        fTree1.each {
            println it.name
        }

        println ""
        FileTree fTree2 = fileTree(dir: 'dir1', excludes: ['**/*.groovy'])
        fTree2.each {
            println it.absolutePath
        }
    }

Execute the gradle fileTreeSample command and observe the output. The first iteration prints all the files in dir1. The second iteration includes only Groovy files (with the .groovy extension). The third iteration excludes Groovy files and prints the other files with their absolute paths.

You can also use a FileTree to read the contents of archive files: zipTree() handles ZIP and JAR files, and tarTree() handles TAR files:

    FileTree jarFile = zipTree('SampleProject-1.0.jar')
    jarFile.each {
        println it.name
    }

The preceding code snippet lists all the files contained in a jar file.

Summary

In this article, we have explored different Gradle topics such as I/O operations, logging, multi-project builds, and testing with Gradle. We also learned how easy it is to generate assets for web applications and Scala projects with Gradle. In the Testing with Gradle section, we learned the basics of executing tests with JUnit and TestNG.

In the next article, we will look at the code quality aspects of a Java project and analyze a few Gradle plugins, such as Checkstyle and Sonar. Apart from these plugins, we will discuss Continuous Integration. The two topics will be presented together by exploring two continuous integration servers, Jenkins and TeamCity.

Packt
11 Aug 2015
17 min read

Ext JS 5 – an Introduction

In this article by Carlos A. Méndez, the author of the book Learning Ext JS - Fourth Edition, we will look at some of the important features of Ext JS. When learning a new technology such as Ext JS, some developers have a hard time getting started, so this article covers certain important points included in the recent version of Ext JS. We will reference online documentation, blogs, and forums to find answers and to figure out how the library and all the components work together. Even though the official learning center has tutorials, it helps to have a guide that takes you from the basics to a more advanced level.

Ext JS is a state-of-the-art framework for creating Rich Internet Applications (RIAs). The framework allows us to create cross-browser applications with a powerful set of components and widgets. The idea behind the framework is to create user-friendly applications in rapid development cycles, facilitate teamwork (MVC or MVVM), and provide long-term maintainability. Ext JS is no longer just a library of widgets; the new version is a framework full of exciting new features. These include the new class system, the loader, and the new application package, which defines a standard way to code our applications, among many others.

The company behind the Ext JS library is Sencha Inc. Sencha builds products based on web standards; its best-known other products are Sencha Touch and Sencha Architect.

In this article, we will cover some of the basic concepts of version 5 of the framework and take a look at some of the new features in Ext JS 5.

Considering Ext JS for your next project

Ext JS is a great library for creating RIAs that require a lot of interactivity with the user.
If you need complex components to manage your information, Ext JS is a strong option. It contains many widgets, such as grids, forms, trees, and panels, along with a great data package and class system. Ext JS is best suited for enterprise or intranet applications and is a great tool for developing an entire CRM or ERP software solution. One of the more appealing examples is the Desktop sample (http://dev.sencha.com/ext/5.1.0/examples/desktop/index.html), which really looks and feels like a native application running in the browser. In some cases, this is an advantage: users already know how to interact with the components, so we can improve the user experience.

Ext JS 5 came out with a great tool for creating themes and templates in a very simple way. The theming framework is built on top of Compass and Sass, so we can modify a few variables and properties and have a custom template for our Ext JS applications in minutes. If we want something more complex or unique, we can modify the original template to suit our needs. This might be more time-consuming, depending on our experience with Compass and Sass.

Compass and Sass are extensions for CSS. We can use expressions, conditions, variables, mixins, and much more to generate well-formatted CSS. You can learn more about Compass on its website at http://compass-style.org/.

The new class system allows us to define classes very easily. We can develop our application using the object-oriented programming paradigm and take advantage of single inheritance, plus mixins for multiple-inheritance-style reuse. This is a great advantage because we can implement any of the common patterns, such as MVC, MVVM, or Observable. This gives us a good code structure, which makes the application easier to maintain.

Another thing to keep in mind is the growing community around the library; many people around the world are working with Ext JS right now.
You can even join local user groups that meet frequently to share knowledge and experiences. I recommend that you look for a group in your city or create one.

The new loader system is a great way to load our modules or classes on demand, loading only the modules and applications the user needs, just in time. This lets our application bootstrap faster, because it loads only the minimal code it needs to work.

One more thing to keep in mind is the ability to prepare our code for deployment. We can compress and obfuscate our code for a production environment using Sencha Cmd. This command-line tool automatically analyzes all the dependencies of our code and creates packages.

Documentation is very important, and Ext JS has great documentation. It is very descriptive and includes many examples, videos, and samples that run right on the documentation pages, and we can also read comments from the community.

What's new in Ext JS 5

Ext JS 5 introduces a great number of new features. We'll briefly cover a few of the significant additions in version 5:

Tablet support and new themes: Ext JS 5 adds the ability to create apps compatible with touch-screen devices (touch-screen laptops, PCs, and tablets). It introduces the Crisp theme, which is based on the Neptune theme, and two new themes for tablet support, Neptune Touch and Crisp Touch.

New application architecture, MVVM: Sencha adds MVVM (Model-View-ViewModel) as an alternative to MVC. This new architecture supports data binding, including two-way data binding, which removes much of the extra code some of us had to write in past versions.
This new architecture introduces data binding, view controllers, and view models.

Routing: Routing provides deep linking of application functionality. By translating the URL, it lets us perform certain actions or call certain methods in our application. This gives us control over the application state, so a direct link can take the user to a specific part of the application. A single URL can also trigger multiple actions.

Responsive configurations: We can now set the new responsiveConfig property on some components. Its value is a configuration object whose keys are conditions (rules); when a rule is met, the configuration set associated with it is applied. For example:

    responsiveConfig: {
        'width > 800': {
            region: 'west'
        },
        'width <= 800': {
            region: 'north'
        }
    }

Data package improvements: Version 5 brought several changes to data handling and data manipulation that make developers' work easier. The new features include common data (the Ext JS data classes, Ext.data, are now part of the core package), many-to-many associations, chained stores, and custom field types.

Event system: The event logic has changed; there is now a single listener attached at the very top of the DOM hierarchy. When a DOM element fires an event, the event bubbles to the top of the hierarchy before it is handled. Ext JS intercepts it there and checks the relevant listeners you added to the component or store. This reduces the number of interactions with the DOM and also makes it possible to support gestures.

Sencha Charts: Charts now work on both Ext JS and Sencha Touch and perform better on tablet devices. The legacy Ext JS 4 charts were moved into a separate package to minimize the upgrade effort.
In version 5, charts have new features such as candlestick and OHLC series; pan, zoom, and crosshair interactions; floating axes; multiple axes; SVG and HTML Canvas support; better performance; greater customization; and chart themes.

Tab panels: Tab panels have more configuration options, such as icon alignment and text rotation. Thanks to new flexible Sass mixins, we can easily control presentation options.

Grids: The grid has been present since version 2.x and is one of the most popular components; we might call it one of the cornerstones of the framework. In version 5, it got some great new features: components in cells, buffered updates, cell updaters, grid filters, and rendering optimizations. Grid filters were previously a popular "UX" (user extension); they have been rewritten, integrated into the framework, and can now be saved in the component state.

Widgets: A widget is a lightweight component, a middle ground between Ext.Component and a cell renderer.

Breadcrumb bars: This new component displays the data of a tree store (the data store used by the tree component) in toolbar form. It can save space on small screens or tablets.

Form package improvements: Ext JS 5 introduces some new controls and significant changes to others:

Tagfield: A new control for selecting multiple values.

Segmented buttons: A group of buttons presented as a single control, similar to the multiple-selection buttons found on mobile interfaces.

Goodbye to TriggerField: In version 5, TriggerField is deprecated. Triggers are now created by configuring them on a text field. (In version 4, TriggerField is a text field with one or more configured buttons on its right side.)

Field and form layouts: Layouts were refactored using HTML and CSS, which improves performance.
New Sass mixins (http://sass-lang.com/): Several components that could not be custom-themed before can now be styled in multiple ways within a single theme or application. These components are Ext.menu.Menu, Ext.form.Labelable, Ext.form.FieldSet, Ext.form.CheckboxGroup, Ext.form.field.Text, Ext.form.field.Spinner, Ext.form.field.Display, and Ext.form.field.Checkbox.
The Sencha Core package: The core package contains code shared between Ext JS and Sencha Touch; in the future, this core will be part of the next major release of Sencha Touch. The core includes the class system, data, events, Element, utilities, and feature/environment detection.
Preparing for deployment
So far, we have seen a few features that help us architect JavaScript code, but we also need to prepare our application for a production environment. While an application is in the development environment, we need to make sure that Ext JS classes (and our own classes) are loaded dynamically when the application requires them. In this environment, it's really helpful to load each class from a separate file, because this lets us debug the code easily and find and fix bugs.
Before the application is compiled, we must know its three basic parts:
app.json: This file contains specific details about our application. Sencha CMD processes this file first.
build.xml: This file contains a minimal initial Ant script and imports a task file located at .sencha/app/build-impl.xml.
.sencha: This folder contains many files related to, and used by, the build process.
The app.json file
As we said before, the app.json file contains information about the settings of the application. Open the file and take a look. We can make changes to this file, such as the theme that our application is going to use.
For example, we can use the following line of code:
"theme": "my-custom-theme-touch",
Alternatively, we can use a normal theme:
"theme": "my-custom-theme",
We can also add the following to use charts:
"requires": [ "sencha-charts" ],
This specifies that we are going to use the chart or draw classes in our application (the chart package for Ext JS 5).
At the end of the file, there is an ID for the application:
"id": "7833ee81-4d14-47e6-8293-0cb8120281ab"
After this ID, we can add other properties. For example, suppose the application will be built for Central and South America. Then we need to include the Spanish or Portuguese locale (es or pt), so we can add the following:
,"locales":["es"]
We can also add multiple languages:
,"locales":["es","pt","en"]
This causes the compilation process to include the corresponding locale files located at ext/packages/ext-locale/build. This article can't cover every property in the file, so it's recommended that you take a deep look into the Sencha CMD documentation at http://docs-origin.sencha.com/cmd/5.x/microloader.html to learn more about the app.json file.
The Sencha command
To create our production build, we need to use Sencha Command (Sencha CMD). If you are running Sencha CMD on Windows 7 or Windows 8, it's recommended that you run the tool with administrator privileges. From the application folder, type the following in your console tool:
[path of my app]>sencha app build
In my case (Windows 7, 64-bit), I typed:
K:\x_extjsdev\app_test\myapp>sencha app build
After the command runs, you will see something like this in your console tool:
So, let's check out the build folder inside our application folder.
We may have the following list of files:
Notice that the build process has created these:
resources: This folder contains a copy of our resources folder, plus one or more CSS files whose names start with myApp-all.
app.js: This file contains all of the necessary JavaScript (Ext JS core classes, components, and our custom application classes).
app.json: This is a small, compressed manifest file.
index.html: This file is similar to the index file in development mode, except for the following line:
<script id="microloader" type="text/javascript" src="bootstrap.js"></script>
This line was replaced by some compressed JavaScript code, which acts in a similar way to the microloader.
Notice that the serverside folder, where we keep some JSON files (in other cases, these could be PHP, ASP, and so on), does not exist in the production folder. The reason is that this folder is not part of what Sencha CMD and the build files take into account. Many developers will say, "Hey, let's copy the folder and move on." However, the good news is that we can include that folder with an Apache Ant task.
Customizing the build.xml file
We can add custom code (Apache Ant style) to perform new tasks that make our application build even better. Let's open the build.xml file. You will see something like this:
<?xml version="1.0" encoding="utf-8"?>
<project name="myApp" default=".help">
  <!-- comments... -->
  <import file="${basedir}/.sencha/app/build-impl.xml"/>
  <!-- comments... -->
</project>
So, let's place the following code before </project>:
  <target name="-after-build" depends="init">
    <copy todir="${build.out.base.path}/serverside" overwrite="false">
      <fileset dir="${app.dir}/serverside" includes="**/*"/>
    </copy>
  </target>
</project>
This new code in the build.xml file establishes that, after the whole build process finishes (and provided the init target it depends on has run without errors), the ${app.dir}/serverside folder is copied to the ${build.out.base.path}/serverside output path.
So now, let's type the command for building the application again:
sencha app build -c
In this case, we added -c to first clean the build/production folder and then create a new set of files. After the process completes, take a look at the folder contents, and you will see this:
Notice that the serverside folder has now been copied to the production build folder, thanks to the custom code we placed in the build.xml file.
Compressing the code
After building our application, let's open the app.js file. We may see something like what is shown here:
By default, the build process uses the YUI Compressor to compact the JavaScript code (http://yui.github.io/yuicompressor/).
Inside the .sencha folder there are many files. The base properties are defined in defaults.properties, and there are additional files depending on the type of build we are creating. The defaults.properties file should not be changed at all; instead, other files can override the values defined in it. For the production build, for example, we have the following files:
production.defaults.properties: This file contains properties/variables that are used for the production build.
production.properties: This file has only comments. The idea is that developers place the variables they want here in order to customize the production build.
By default, the production.defaults.properties file contains something like the following:
# Comments ......
# more comments......
build.options.logger=no
build.options.debug=false
# enable the full class system optimizer
app.output.js.optimize=true
build.optimize=${build.optimize.enable}
enable.cache.manifest=true
enable.resource.compression=true
build.embedded.microloader.compressor=-closure
Now, as an example of compression, let's place some variables inside the production.properties file. The code we place here overrides the properties set in defaults.properties and production.defaults.properties.
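The override rule can be modeled in a few lines. In Apache Ant, on which these build scripts are based, properties are immutable: the first definition of a name wins and later ones are ignored, so a more specific file overrides a defaults file by being read first. The following JavaScript sketch is illustrative only: parseProperties and resolveProperties are hypothetical helpers, they understand just key=value lines and comments (not the full .properties syntax), and the most-specific-first loading order is our assumption about how the build achieves the behavior described here, not a description of Sencha CMD's internals.

```javascript
// Hypothetical helpers for illustration only. They support just "key=value"
// lines and "#"/"!" comment lines, not the full .properties syntax.
function parseProperties(text) {
  const props = new Map();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith("!")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip lines that are not key=value pairs
    props.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return props;
}

// Ant-style resolution: the FIRST definition of a property wins, so the
// files must be passed from most specific to least specific (assumed order).
function resolveProperties(...filesMostSpecificFirst) {
  const resolved = new Map();
  for (const text of filesMostSpecificFirst) {
    for (const [key, value] of parseProperties(text)) {
      if (!resolved.has(key)) resolved.set(key, value);
    }
  }
  return resolved;
}

// production.properties: the compression lines come from this article;
// build.options.debug=true is a hypothetical extra override.
const productionProperties = `
build.compression.yui=0
build.compression.closure=1
build.compression=-closure
build.options.debug=true
`;

// A subset of the production.defaults.properties values shown above.
const productionDefaults = `
# Comments ......
build.options.logger=no
build.options.debug=false
enable.resource.compression=true
`;

const effective = resolveProperties(productionProperties, productionDefaults);
console.log(effective.get("build.options.debug"));  // true (overridden)
console.log(effective.get("build.options.logger")); // no (default kept)
console.log(effective.get("build.compression"));    // -closure
```

Values you never set in production.properties fall through to the defaults file, which is why that file can stay untouched.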
So, let's write the following code after the comments:
build.embedded.microloader.compressor=-closure
build.compression.yui=0
build.compression.closure=1
build.compression=-closure
With this code, we set up the build process to use Closure as the JavaScript compressor, both for the application code and for the microloader. Now save the file and run the Sencha CMD tool once again:
sencha app build
Wait for the process to end and take a look at app.js. You will notice that the code is quite different, because this time the Closure Compiler performed the compression. Run the app and you will notice no change in the behavior and use of the application.
We used the production.properties file in this example, but notice that the .sencha folder contains similar files for other environments:
Testing: testing.defaults.properties and testing.properties
Development: development.defaults.properties and development.properties
Production: production.defaults.properties and production.properties
It's not recommended that you change the *.defaults.properties files. That is the purpose of the *.properties files: you set your own variables there, and they override the settings in the defaults files.
Packaging and deploying
Finally, after we have built our application, our production build/package is ready to be deployed. We will have the following structure in our folder:
Now we have all the files required to make our application work on a public server. We don't need to upload anything from the Ext JS folder, because we have everything we need in app.js (all of the Ext JS code and our own code). Also, the resources folder contains the images and CSS (the theme used in the app), and of course we have our serverside folder. So now, we need to upload all of the content to the server:
And we are ready to test the production build on a public server.
Summary
In this article, you learned the reasons to consider using Ext JS 5 for developing projects. We briefly mentioned some of the significant new features in version 5 that are instrumental in developing applications. Later, we talked about compiling and preparing an application for a production environment. Using Sencha CMD and configuring JSON or XML files to build a project can sometimes feel overwhelming, but don't panic! Check out the Sencha and Apache Ant documentation. Remember that there's no reason to be afraid of testing and playing with the configurations; it's all part of learning how to use Sencha Ext JS.
Resources for Article: Further resources on this subject: The Login Page using Ext JS [Article] So, what is Ext JS? [Article] AngularJS Performance [Article]

Packt
11 Aug 2015
14 min read

Neo4j – Modeling Bookings and Users

In this article by Mahesh Lal, author of the book Neo4j Graph Data Modeling, we will explore how graphs can be used to solve problems that are predominantly solved using an RDBMS, for example, bookings. We will discuss the following topics in this article: modeling bookings in an RDBMS, modeling bookings in a graph, adding bookings to graphs, and using Cypher to find bookings and journeys.
Building a data model for booking flights
We have a graph that allows people to search for flights. At this point, a logical extension of the problem statement is to allow users to book flights online after they decide the route on which they wish to travel. So far, we were only concerned with flights and cities; to make bookings, we need to tweak the model to include users, bookings, dates, and the capacity of the flight. Most teams choose to use an RDBMS for sensitive data such as user information and bookings, so let's understand how we can translate a model from an RDBMS to a graph.
A flight booking generally has many moving parts. While it would be great to model all of the parts of a flight booking, a smaller subset is more feasible for demonstrating how to model data that is normally stored in an RDBMS. A flight booking contains information about the user who booked it, along with the date of booking. It's not uncommon to change flights multiple times to get from one city to another. We can call these journey legs, or journeys, and model them separately from the booking that contains them. It is also possible that the person booking the flight is booking for other people as well. Because of this, it is advisable to model passengers, with their basic details, separately from the user. We have intentionally skipped details such as payment and costs in order to keep the model simple.
A simple model of the bookings ecosystem
A booking generally contains information such as the date of booking, the user who booked it, and the date on which travel commences. A journey contains information about the flight code. Other information about the journey, such as the departure and arrival times and the source and destination cities, can be derived from the flight on which the journey is undertaken. Both bookings and journeys have their own IDs that identify them uniquely. Passenger information related to a booking must include at least the names of the passengers, but more commonly also includes details such as age, gender, and e-mail. A rough model of the Booking, Journey, Passenger, and User looks like this:
Figure 4.1: Bookings ecosystem
Modeling bookings in an RDBMS
To model the data shown in Figure 4.1 in an RDBMS, we have to create tables for bookings, journeys, passengers, and users. In the previous model, we intentionally added booking_id to Journeys and user_id to Bookings; in an RDBMS, these will be used as foreign keys. We also need an additional table, Bookings_Passengers_Relationships, to depict the many-to-many relationship between Bookings and Passengers. Capturing these relationships lets us use passenger details for two purposes: first, a user can have a master list of travelers they have travelled with; second, all the journeys taken by a person can be fetched when that passenger logs in to their account, or creates an account in the future. We name the foreign key references with the prefix fk_, following the popular convention.
Figure 4.2: Modeling bookings in an RDBMS
In an RDBMS, every record represents an entity (or a relationship, in the case of relationship tables). In our case, we represented a single booking record as a single block.
This applies to all other entities in the system, such as journeys, passengers, users, and flights. Each record has its own ID by which it can be uniquely identified. The properties starting with fk_ are foreign keys, whose values should be present in the tables to which the keys point. In our model, passengers may or may not be users of our application, so we don't add a foreign key constraint to the Passengers table. To infer whether a passenger is also a user, we have to use other means of inference, for example, the e-mail ID.
Given the relationships in the data, which are inferred using the foreign key relationships and other indirect means, we can draw the logical graph of bookings as shown in the following diagram:
Figure 4.3: Visualizing related entities in an RDBMS
Figure 4.3 shows the logical graph of how entities are connected in our domain. We can translate this into a Bookings subgraph. From the related entities of Figure 4.3, we can create a specification of the Bookings subgraph, which is as follows:
Figure 4.4: Specification of subgraph of bookings
Comparing Figure 4.3 and Figure 4.4, we observe that all the fk_ properties are removed from the nodes that represent the entities. Since we now have explicit relationships that can be used to traverse the graph, we don't need implicit relationships that rely on foreign keys being enforced. We put the date of booking on the booking itself rather than on the relationship between User and Booking. The date of booking could be captured either on the booking node or on the :MADE_BOOKING relationship; the advantage of capturing it on the booking node is that we can run queries on it efficiently, rather than relying on crude filtering methods to extract information from the subgraph.
An important addition to the Booking node is the properties year, month, and day. Since date is not a data type supported by Neo4j, range queries become difficult.
Timestamps solve this problem to some extent. For example, if we want to find all bookings made between June 01, 2015 and July 01, 2015, we can convert these dates into timestamps and search for all bookings whose timestamps lie between them. This, however, is a very expensive process, and would need a store scan of bookings. To alleviate these problems, we can capture the year, month, and day on the booking.
While adapting to the changing needs of the system, remodeling the data model is encouraged. It is also important that we build a data model with enough data captured for our needs, both current and future. It is a judgment-based decision, without any single correct answer. As long as the data can easily be derived from existing data in the node, we recommend not adding it until it is needed. In this case, converting a timestamp to its corresponding date components might require additional programming effort, so to avoid that, we can begin capturing the data right away. In other cases, for example, if we want to introduce a property Name on a node that has First name and Last name as properties, the derivation of Name from First name and Last name is straightforward, and we advise not capturing the data until the need arises.
Creating bookings and users in Neo4j
For bookings to exist, we first need to create users in our data model.
Creating users
To create users, we first create a constraint on the e-mail of the user, which we will use as a unique identifier, as shown in the following query:
neo4j-sh (?)$ CREATE CONSTRAINT ON (user:User) ASSERT user.email IS UNIQUE;
The output of the preceding query is as follows:
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
With the constraint added, let's create a few users in our system:
neo4j-sh (?)$ CREATE (:User{name:"Mahesh Lal", email:"mahesh.lal@gmail.com"}),
(:User{name:"John Doe", email:"john.doe@gmail.com"}),
(:User{name:"Vishal P", email:"vishal.p@gmail.com"}),
(:User{name:"Dave Coeburg", email:"dave.coeburg@gmail.com"}),
(:User{name:"Brian Heritage", email:"brian.heritage@hotmail.com"}),
(:User{name:"Amit Kumar", email:"amit.kumar@hotmail.com"}),
(:User{name:"Pramod Bansal", email:"pramod.bansal@hotmail.com"}),
(:User{name:"Deepali T", email:"deepali.t@gmail.com"}),
(:User{name:"Hari Seldon", email:"hari.seldon@gmail.com"}),
(:User{name:"Elijah", email:"elijah.b@gmail.com"});
The output of the preceding query is as follows:
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 10
Properties set: 20
Labels added: 10
Please add more users from users.cqy.
Creating bookings in Neo4j
As discussed earlier, a booking has multiple journey legs, and a booking is only complete when all its journey legs are booked. Bookings in our application aren't a single standalone entity; they involve multiple journeys and passengers. To create a booking, we need to ensure that journeys are created and that information about passengers is captured. This results in a multistep process.
To ensure that booking IDs remain unique and no two nodes have the same ID, we should add a constraint on the _id property of Booking:
neo4j-sh (?)$ CREATE CONSTRAINT ON (b:Booking) ASSERT b._id IS UNIQUE;
The output will be as follows:
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
We will create a similar constraint for Journey, as shown here:
neo4j-sh (?)$ CREATE CONSTRAINT ON (journey:Journey) ASSERT journey._id IS UNIQUE;
The output is as follows:
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
Add a constraint to make the e-mail of passengers unique, as shown here:
neo4j-sh (?)$ CREATE CONSTRAINT ON (p:Passenger) ASSERT p.email IS UNIQUE;
The output is as shown:
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
With the constraints created, we can now focus on how bookings can be created. We will be running this query in the Neo4j browser, as shown:
//Get all flights and users
MATCH (user:User{email:"john.doe@gmail.com"})
MATCH (f1:Flight{code:"VS9"}), (f2:Flight{code:"AA9"})
//Create a booking for a date
MERGE (user)-[m:MADE_BOOKING]->(booking:Booking {_id:"0f64711c-7e22-11e4-a1af-14109fda6b71", booking_date:1417790677.274862, year: 2014, month: 12, day: 5})
//Create or get passengers
MERGE (p1:Passenger{email:"vishal.p@gmail.com"})
ON CREATE SET p1.name = "Vishal Punyani", p1.age = 30
MERGE (p2:Passenger{email:"john.doe@gmail.com"})
ON CREATE SET p2.name = "John Doe", p2.age = 25
//Create journeys to be taken by flights
MERGE (j1:Journey{_id: "712785b8-1aff-11e5-abd4-6c40089a9424", date_of_journey:1422210600.0, year:2015, month: 1, day: 26})-[:BY_FLIGHT]->(f1)
MERGE (j2:Journey{_id:"843de08c-1aff-11e5-8643-6c40089a9424", date_of_journey:1422210600.0, year:2015, month: 1, day: 26})-[:BY_FLIGHT]->(f2)
WITH user, booking, j1, j2, f1, f2, p1, p2
//Merge journeys and booking, Create and Merge passengers with bookings, and return data
MERGE (booking)-[:HAS_PASSENGER]->(p1)
MERGE (booking)-[:HAS_PASSENGER]->(p2)
MERGE (booking)-[:HAS_JOURNEY]->(j1)
MERGE (booking)-[:HAS_JOURNEY]->(j2)
RETURN user, p1, p2, j1, j2, f1, f2, booking
The output is as shown in the following screenshot:
Figure 4.5: Booking that was just created
We have added comments to the query to explain its different parts.
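The year, month, and day values stored in this query can be derived from the timestamps when a booking is written. The following JavaScript sketch (the dateComponents helper is hypothetical, not part of Neo4j) shows the derivation, and also why it is worth doing once at write time: the calendar date depends on the time zone. The journey timestamp 1422210600 falls on 25 January 2015 in UTC but on 26 January at UTC+05:30, which matches the day: 26 stored in the query; the time zone used for the original data is not stated, so treat the offset as an assumption.

```javascript
// Hypothetical helper (not part of Neo4j): split a Unix timestamp in seconds
// into the year/month/day properties stored on Booking and Journey nodes.
// offsetMinutes is the time zone offset from UTC; the calendar date depends on it.
function dateComponents(unixSeconds, offsetMinutes = 0) {
  const shifted = new Date((unixSeconds + offsetMinutes * 60) * 1000);
  return {
    year: shifted.getUTCFullYear(),
    month: shifted.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: shifted.getUTCDate(),
  };
}

console.log(dateComponents(1417790677.274862)); // booking: { year: 2014, month: 12, day: 5 }
console.log(dateComponents(1422210600));        // journey in UTC: { year: 2015, month: 1, day: 25 }
console.log(dateComponents(1422210600, 330));   // journey at UTC+05:30: { year: 2015, month: 1, day: 26 }
```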
The query can be divided into the following parts: finding the flights and the user, creating the booking, creating the journeys, creating the passengers and linking them to the booking, and linking the journeys to the booking.
We used the same start date for both journeys, but in general, the start dates of journeys in the same booking will differ if:
The traveler is flying across time zones. For example, if a traveler is flying from New York to Istanbul, the journeys from New York to London and from London to Istanbul will be on different dates.
The traveler is booking multiple journeys and will spend some time at a destination in between.
Let's use bookings.cqy to add a few more bookings to the graph. We will use them to run further queries.
Queries to find journeys and bookings
With the booking data added, we can now explore some interesting queries.
Finding all journeys of a user
All journeys that a user has undertaken are the journeys on which they have been a passenger, so we can use the user's e-mail to search for them. To find these journeys, we first find the bookings on which the user is a passenger, and then use those bookings to find the journeys, flights, and cities, as shown:
neo4j-sh (?)$ MATCH (b:Booking)-[:HAS_PASSENGER]->(p:Passenger{email:"vishal.p@gmail.com"})
WITH b
MATCH (b)-[:HAS_JOURNEY]->(j:Journey)-[:BY_FLIGHT]->(f:Flight)
WITH b._id as booking_id, j.date_of_journey as date_of_journey, COLLECT(f) as flights
ORDER BY date_of_journey DESC
MATCH (source:City)-[:HAS_FLIGHT]->(f)-[:FLYING_TO]->(destination:City)
WHERE f in flights
RETURN booking_id, date_of_journey, source.name as from, f.code as by_flight, destination.name as to;
The output of this query is as follows:
Besides listing all the journeys of the user, this query can also be used to map all the locations the user has travelled to.
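To illustrate what this query does without any foreign keys, here is a hypothetical JavaScript sketch of the same traversal over a tiny in-memory adjacency structure built from the booking created earlier. It is not how Neo4j stores data; it only shows that each step follows a relationship from the current node. The passengers map stands in for following HAS_PASSENGER backwards from the passenger, and the city lookup is omitted because the cities served by VS9 and AA9 are not given in this article.

```javascript
// Hypothetical in-memory model of part of the booking subgraph. Each node
// stores the ids of its neighbours, so a query walks from node to node instead
// of joining tables on fk_ columns. The data mirrors the booking created above.
const graph = {
  // Reverse view of (:Booking)-[:HAS_PASSENGER]->(:Passenger)
  passengers: {
    "vishal.p@gmail.com": { bookings: ["0f64711c-7e22-11e4-a1af-14109fda6b71"] },
    "john.doe@gmail.com": { bookings: ["0f64711c-7e22-11e4-a1af-14109fda6b71"] },
  },
  // (:Booking)-[:HAS_JOURNEY]->(:Journey)
  bookings: {
    "0f64711c-7e22-11e4-a1af-14109fda6b71": {
      journeys: [
        "712785b8-1aff-11e5-abd4-6c40089a9424",
        "843de08c-1aff-11e5-8643-6c40089a9424",
      ],
    },
  },
  // (:Journey)-[:BY_FLIGHT]->(:Flight)
  journeys: {
    "712785b8-1aff-11e5-abd4-6c40089a9424": { date_of_journey: 1422210600, flight: "VS9" },
    "843de08c-1aff-11e5-8643-6c40089a9424": { date_of_journey: 1422210600, flight: "AA9" },
  },
};

// Mirrors the first part of the Cypher query: passenger -> bookings -> journeys -> flights.
function journeysOfPassenger(g, email) {
  const passenger = g.passengers[email];
  if (!passenger) return [];
  const rows = [];
  for (const bookingId of passenger.bookings) {
    for (const journeyId of g.bookings[bookingId].journeys) {
      const journey = g.journeys[journeyId];
      rows.push({
        booking_id: bookingId,
        date_of_journey: journey.date_of_journey,
        by_flight: journey.flight,
      });
    }
  }
  // ORDER BY date_of_journey DESC
  return rows.sort((a, b) => b.date_of_journey - a.date_of_journey);
}

console.log(journeysOfPassenger(graph, "vishal.p@gmail.com").map((r) => r.by_flight));
```

For Vishal, this yields the two flights of the booking created earlier; an unknown e-mail simply yields an empty list.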
Queries for finding the booking history of a user
The query for finding all bookings made by a user is straightforward, as shown here:
neo4j-sh (?)$ MATCH (user:User{email:"mahesh.lal@gmail.com"})-[:MADE_BOOKING]->(b:Booking)
RETURN b._id as booking_id;
The output of the preceding query is as follows:
+----------------------------------------+
| booking_id                             |
+----------------------------------------+
| "251679be-1b3f-11e5-820e-6c40089a9424" |
| "ff3dd694-7e7f-11e4-bb93-14109fda6b71" |
| "7c63cc35-7e7f-11e4-8ffe-14109fda6b71" |
| "f5f15252-1b62-11e5-8252-6c40089a9424" |
| "d45de0c2-1b62-11e5-98a2-6c40089a9424" |
| "fef04c30-7e2d-11e4-8842-14109fda6b71" |
| "f87a515e-7e2d-11e4-b170-14109fda6b71" |
| "75b3e78c-7e2b-11e4-a162-14109fda6b71" |
+----------------------------------------+
8 rows
Upcoming journeys of a user
Finding the upcoming journeys of a user is also straightforward. We can construct the query by simply comparing today's date, expressed as a Unix timestamp in seconds (1418055307 here, which corresponds to 8 December 2014), to the journey date, as shown:
neo4j-sh (?)$ MATCH (user:User{email:"mahesh.lal@gmail.com"})-[:MADE_BOOKING]->(:Booking)-[:HAS_JOURNEY]-(j:Journey)
WHERE j.date_of_journey >= 1418055307
WITH COLLECT(j) as journeys
MATCH (j:Journey)-[:BY_FLIGHT]->(f:Flight)
WHERE j in journeys
WITH j.date_of_journey as date_of_journey, COLLECT(f) as flights
MATCH (source:City)-[:HAS_FLIGHT]->(f)-[:FLYING_TO]->(destination:City)
WHERE f in flights
RETURN date_of_journey, source.name as from, f.code as by_flight, destination.name as to;
The output of the preceding query is as follows:
+-------------------------------------------------------------+
| date_of_journey | from          | by_flight | to            |
+-------------------------------------------------------------+
| 1.4226426E9     | "New York"    | "VS8"     | "London"      |
| 1.4212602E9     | "Los Angeles" | "UA1262"  | "New York"    |
| 1.4212602E9     | "Melbourne"   | "QF94"    | "Los Angeles" |
| 1.4304186E9     | "New York"    | "UA1507"  | "Los Angeles" |
| 1.4311962E9     | "Los Angeles" | "AA920"   | "New York"    |
+-------------------------------------------------------------+
5 rows
Summary
In this article, you learned how to model a domain that has traditionally been implemented using an RDBMS. We saw how tables can be changed to nodes and relationships, and we explored what happens to relationship tables. You also learned about transactions in Cypher and wrote Cypher to manipulate the database.
Resources for Article: Further resources on this subject: Selecting the Layout [article] Managing Alerts [article] Working with a Neo4j Embedded Database [article]

Packt
10 Aug 2015
4 min read

An Introduction to WEP

In this article, Marco Alamanni, author of the book Kali Linux Wireless Penetration Testing Essentials, explains that the WEP protocol was introduced with the original 802.11 standard as a means to provide authentication and encryption to wireless LAN implementations. It is based on the RC4 (Rivest Cipher 4) stream cipher with a preshared secret key (PSK) of 40 or 104 bits, depending on the implementation. A 24-bit pseudo-random Initialization Vector (IV) is concatenated with the preshared key to form the per-packet key from which RC4 generates the keystream used for the actual encryption and decryption processes. Thus, the resulting RC4 key is 64 or 128 bits long.
In the encryption phase, the keystream is XORed with the plaintext data to obtain the encrypted data, while in the decryption phase, the encrypted data is XORed with the keystream to obtain the plaintext data. The encryption process is shown in the following diagram:
Attacks against WEP
First of all, we must say that WEP is an insecure protocol and has been deprecated by the Wi-Fi Alliance. It suffers from various vulnerabilities related to the generation of the keystreams, to the use of IVs, and to the length of the keys.
The IV is used to add randomness to the keystream, trying to avoid reusing the same keystream to encrypt different packets. The design of WEP does not accomplish this purpose, because the IV is only 24 bits long (with 2^24 = 16,777,216 possible values) and it is transmitted in clear text within each frame. Thus, after a certain period of time (depending on the network traffic), the same IV, and consequently the same keystream, will be reused, allowing an attacker to collect the corresponding ciphertexts and perform statistical attacks to recover the plaintexts and the key.
The first well-known attack against WEP was the Fluhrer, Mantin, and Shamir (FMS) attack, back in 2001.
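The per-packet encryption described above can be sketched in JavaScript for study purposes. This is a simplified model, not a usable WEP implementation: real WEP also appends a CRC-32 integrity check value (ICV) to the plaintext before encryption, which is omitted here, and the key and IV values are made up. The second part shows why IV reuse is fatal: with the same IV and key the keystream is identical, so XORing two ciphertexts cancels it and leaks the XOR of the two plaintexts.

```javascript
// RC4: returns `length` keystream bytes for the given seed (array of bytes).
function rc4Keystream(seed, length) {
  // Key-scheduling algorithm (KSA): permute S using the seed bytes.
  const S = Array.from({ length: 256 }, (_, n) => n);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + S[i] + seed[i % seed.length]) & 0xff;
    [S[i], S[j]] = [S[j], S[i]];
  }
  // Pseudo-random generation algorithm (PRGA): emit one byte per step.
  const out = new Uint8Array(length);
  let i = 0;
  j = 0;
  for (let k = 0; k < length; k++) {
    i = (i + 1) & 0xff;
    j = (j + S[i]) & 0xff;
    [S[i], S[j]] = [S[j], S[i]];
    out[k] = S[(S[i] + S[j]) & 0xff];
  }
  return out;
}

// Byte-wise XOR of two equal-length byte arrays.
function xorBytes(a, b) {
  return a.map((byte, k) => byte ^ b[k]);
}

// Simplified WEP: seed = IV || key, then XOR the data with the RC4 keystream.
// The same function decrypts, because XOR with the same keystream undoes itself.
function wepEncrypt(iv, key, data) {
  const seed = [...iv, ...key]; // 24-bit IV + 40-bit key = 64-bit RC4 key
  return xorBytes(data, rc4Keystream(seed, data.length));
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();
const key = [0x01, 0x02, 0x03, 0x04, 0x05]; // made-up 40-bit key
const iv = [0xaa, 0xbb, 0xcc];              // made-up 24-bit IV, sent in clear text
const p1 = encoder.encode("ATTACK AT DAWN");
const p2 = encoder.encode("HELLO WORLD!!!"); // same length as p1
const c1 = wepEncrypt(iv, key, p1);
const c2 = wepEncrypt(iv, key, p2);

// Decryption is the same XOR with the same keystream.
console.log(decoder.decode(wepEncrypt(iv, key, c1))); // ATTACK AT DAWN

// IV reuse: c1 XOR c2 equals p1 XOR p2, without any knowledge of the key.
const leaked = xorBytes(c1, c2);
console.log(leaked.every((byte, k) => byte === (p1[k] ^ p2[k]))); // true
```

If an attacker knows or guesses one of the plaintexts, the leaked XOR immediately reveals the other, which is one reason repeated IVs are so damaging.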
The FMS attack relies on the way WEP generates the keystreams and on the fact that it also uses weak IVs to generate weak keystreams, making it possible for an attacker to collect a sufficient number of packets encrypted with these keystreams, analyze them, and recover the key. The number of IVs to be collected to complete the FMS attack is about 250,000 for 40-bit keys and 1,500,000 for 104-bit keys.
The FMS attack was enhanced by KoreK, improving its performance. Andreas Klein found more correlations between the RC4 keystream and the key than the ones discovered by Fluhrer, Mantin, and Shamir, and these correlations can be used to crack the WEP key.
In 2007, Pyshkin, Tews, and Weinmann (PTW) extended Andreas Klein's research and improved the FMS attack, significantly reducing the number of IVs needed to successfully recover the WEP key. Indeed, the PTW attack does not rely on weak IVs as the FMS attack does, and it is very fast and effective. It is able to recover a 104-bit WEP key with a success probability of 50 percent using fewer than 40,000 frames, and with a probability of 95 percent using 85,000 frames. The PTW attack is the default method used by Aircrack-ng to crack WEP keys.
Both the FMS and PTW attacks need to collect quite a large number of frames to succeed, and they can be conducted passively, by sniffing the wireless traffic on the channel of the target AP and capturing frames. The problem is that, in normal conditions, we will have to spend quite a long time passively collecting all the packets necessary for the attacks, especially with the FMS attack. To accelerate the process, the idea is to re-inject frames into the network so that they generate traffic in response, letting us collect the necessary IVs more quickly. A type of frame that is suitable for this purpose is the ARP request, because the AP rebroadcasts it, each time with a new IV. As we are not associated with the AP, if we send frames to it directly, they are discarded and a de-authentication frame is sent.
Instead, we can capture ARP requests from associated clients and retransmit them to the AP. This technique is called the ARP Request Replay attack, and Aircrack-ng also adopts it for the implementation of the PTW attack.
Summary
In this article, we covered the WEP protocol and the attacks that have been developed to crack its keys.
Resources for Article: Further resources on this subject: Kali Linux – Wireless Attacks [article] What is Kali Linux [article] Penetration Testing [article]

Packt
10 Aug 2015
12 min read

Integrating with Muzzley

In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics: wiring the circuit, the Muzzley IoT ecosystem, creating a Muzzley app, and lighting up the entrance door.
One identified issue regarding IoT is that there will be lots of connected devices, each speaking its own language and not sharing the same protocols as other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase a connected product, you'll need that product's exclusive app. In the near future, when more devices than people are predicted to be connected to the Internet, this is indeed a problem, known as the basket of remotes.
Many solutions to this problem have appeared. Some of them focus on creating common communication standards between devices, or even on creating their own protocol, such as the Intel Common Connectivity Framework (CCF). A different approach consists of predicting the devices' interactions, where collected data is used to predict and trigger actions on specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from users' interactions. It allows users to control all their devices from a common app, and by collecting usage data, it can predict users' actions and even make different devices work together.
In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app, from anywhere with Internet access.
Wiring the circuit

In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button, and this will require some homework. The integration requires you to open your inter-communicator and adjust its inner circuit, so be aware that there is always a risk of damaging it. If you don't want to use a real inter-communicator, you can replace it with an LED or even the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only explain how to do the wiring for an AC inter-communicator.

The first thing to do is to check the device manual to see whether it works with AC current and which voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using a solid state relay that accepts a voltage range from 24 V up to 380 V AC, so your inter-communicator should also work within this range. You'll also need some electrical wires and wire junctions:

Figure: Wire junctions and the solid state relay

This equipment will be used to adapt the door unlocking circuit so that it can be controlled from the Galileo board through a relay. The main idea is to use the relay to close the door opener circuit, which unlocks the door. This can be done by joining the inter-communicator switch wires with the relay wires, using some wire and wire junctions, as shown in the following image:

Figure: Wiring the circuit

The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). Pressing the button closes this circuit and unlocks the door. Because the relay closes the same circuit, the lock can be controlled both ways: with the original button and with the relay.

Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off.
If you can't switch it off, you can always turn off your home's electrical panel for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure what to do or don't feel comfortable doing it, ask someone more experienced for help.

Open your inter-communicator, locate the switch, and make the changes shown in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13: wire pin 13 to the relay's connector number 3, and connect Galileo's ground (GND) to the relay's connector number 4.

Beware that not all inter-communicator circuits work the same way, and although we try to provide a general approach, there is always a risk of damaging your device or being electrocuted. Do it at your own risk.

Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock button. If you prefer not to use the inter-communicator with the relay, you can replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, the same relay code will give you visual feedback from Galileo's onboard LED.

The Muzzley IoT ecosystem

Muzzley is an Internet of Things ecosystem composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley either through the device cloud or through the device itself. Muzzley offers device control, a rules system, and a machine learning system that predicts and suggests actions based on device usage.

The mobile app is available for Android, iOS, and Windows Phone.
It can pack all your Internet-connected devices into a single common app, allowing them to be controlled together and to work with other devices, whether bought in stores or homemade, like the one we will create in this article.

Muzzley is known for being one of the first-generation platforms able to predict users' actions by learning from their interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. Interaction with home devices is one area where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurring routines and making suggestions that could speed up and simplify the interaction with the mobile app and devices. Devices that don't know of each other's existence get connected through the user's behavior and may create synergies among themselves.

When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to build a behavioral network of linked cause-effect events. When the frequency of these associations becomes high enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time users arrive home they switch on the house lights, check the thermostat, and adjust the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified, and shortcuts are offered for the effect-associated action. For instance, the user could receive the following suggestions in the Muzzley app: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius.
Would you like to set your «Living room» air conditioner to 21 degrees Celsius?"

When it comes to security and privacy, Muzzley takes them seriously, and all the collected data is used exclusively to analyze behaviors that help make your life easier. This is the system with which we will be integrating our door unlocker.

Creating a Muzzley app

The first step is to own a Muzzley developer account. If you don't have one yet, you can get one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form.

To create an app, click on the top menu option Apps and then Create app. Name your app Galileo Lock and, if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons that let you select the integration type: Muzzley allows you to integrate either through the product manufacturer's cloud or directly with a device. In this example, we will be integrating directly with the device, so click on Device to Cloud integration.

Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can add our device in one of two ways: using UPnP discovery or by inserting a custom serial number. Pick the Serial number device discovery option and ignore the Interface UUID and Email Access List fields; we will come back to them later. Save your changes by pressing the Save changes button.

Lighting up the entrance door

Now that we can unlock our door from anywhere using a mobile phone with an Internet connection, a nice addition is to have the entrance lights turn on when you open the building door with your Muzzley app. To do this, you can use Muzzley workers to define rules that perform an action when the door is unlocked from the mobile app.
To do this, you'll need one of the Muzzley-enabled smart bulbs, such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the supported devices in the app's profile selection list. If you don't have one of these lighting devices but have another type of connected device, search the list to see whether it is supported; if it is, you can use it instead.

Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights.

To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details. Expand the lock status property by clicking on the arrow sign in the property #1 - Lock Status section, and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we need to define the possible label-value choices for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object:

{"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}

Now we need to create a trigger. In the profile spec, expand the trigger section and create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, and insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the following path to map the data: [{"source":"selection.value","target":"data.value"}].

Open your mobile app and click on the workers icon.
Clicking on Create Worker will display the worker creation menu. Here, you'll be able to select a channel component property as the trigger for another channel component property. Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. Here, select the smart bulb you own and select the Status On option.

After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door lock as the trigger for an action on a lightbulb. If you want, you can do the opposite and, for instance, open the door when a lightbulb lights up in a specific color. To do this, similar to how you configured your device trigger, you just have to set up the action options in your device profile page.

Summary

Everyday objects that surround us are being transformed into information ecosystems, and the way we interact with them is slowly changing. Although IoT is growing fast, it is still at an early stage, and many issues must be solved before it can scale successfully. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth, without security regulations or in-depth security studies, is leading to major concerns about the two biggest IoT challenges: security and privacy. Remotely controllable devices in our homes, or personal data falling into the wrong hands, could be a recipe for disaster.

In this article, you have learned the basic steps for wiring the circuit to your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, using the Intel Galileo board as a bridge to communicate with the Muzzley cloud.

Packt
10 Aug 2015
25 min read

Bayesian Network Fundamentals

In this article by Ankur Ankan and Abinash Panda, the authors of Mastering Probabilistic Graphical Models Using Python, we'll cover the basics of random variables, probability theory, and graph theory. We'll also look at Bayesian models and the independencies in them. A graphical model is essentially a way of representing a joint probability distribution over a set of random variables in a compact and intuitive form. There are two main types of graphical models, namely directed and undirected. We generally use a directed model, also known as a Bayesian network, when the relationships between the random variables are mostly causal. Graphical models also give us tools to operate on these models to find conditional and marginal probabilities of variables while keeping the computational complexity under control.

Probability theory

To understand the concepts of probability theory, let's start with a real-life situation. Suppose we want to go on an outing over the weekend. There are a lot of things to consider before going: the weather conditions, the traffic, and many other factors. If the weather is windy or cloudy, then it is probably not a good idea to go out. However, even if we have information about the weather, we cannot be completely sure whether to go or not; hence the words probably and maybe. Similarly, if it is windy in the morning (or at the time we took our observations), we cannot be completely certain that it will be windy throughout the day. The same holds for cloudy weather; it might turn out to be a very pleasant day. Further, we are not completely certain of our observations. There are always some limitations in our ability to observe; sometimes, these observations could even be noisy. In short, uncertainty, or randomness, is the innate nature of the world. Probability theory provides us with the necessary tools to study this uncertainty.
It helps us look into options that are unlikely yet possible.

Random variable

Probability deals with the study of events. Intuitively, we can say that some events are more likely than others, but to quantify the likelihood of a particular event, we need probability theory. It helps us predict the future by assessing how likely the outcomes are. Before going deeper into probability theory, let's first get acquainted with its basic terminology and definitions.

A random variable is a way of representing an attribute of the outcome. Formally, a random variable X is a function that maps a set of possible outcomes $\Omega$ to some set E, which is represented as follows:

$X : \Omega \rightarrow E$

As an example, let us consider the outing example again. To decide whether to go or not, we may consider the skycover (to check whether it is cloudy or not). Skycover is an attribute of the day. Mathematically, the random variable skycover (X) is interpreted as a function that maps the day ($\omega$) to its skycover value (E). So when we say the event X = 40.1, it represents the set of all the days $\{\omega\}$ such that $f_X(\omega) = 40.1$, where $f_X$ is the mapping function. Formally speaking, $\{X = 40.1\} = \{\omega \in \Omega : f_X(\omega) = 40.1\}$.

Random variables can be either discrete or continuous. A discrete random variable can take only a finite (or countably infinite) number of values. For example, the random variable representing the outcome of a coin toss can take only two values, heads or tails, and hence it is discrete. A continuous random variable, on the other hand, can take an uncountably infinite number of values. For example, a variable representing the speed of a car can take any real value within its range.

For any event whose outcome is represented by some random variable X, we can assign a value to each of the possible outcomes of X that represents how probable it is. This is known as the probability distribution of the random variable and is denoted by P(X). For example, consider a set of restaurants. Let X be a random variable representing the quality of food in a restaurant.
It can take a set of values, such as {good, bad, average}. P(X) represents the probability distribution of X. For example, if P(X = good) = 0.3, P(X = average) = 0.5, and P(X = bad) = 0.2, there is a 30 percent chance of a restaurant serving good food, a 50 percent chance of it serving average food, and a 20 percent chance of it serving bad food.

Independence and conditional independence

In most situations, we are more interested in looking at multiple attributes at the same time. For example, to choose a restaurant, we won't look only at the quality of food; we might also want to look at other attributes, such as the cost, location, size, and so on. We can have a probability distribution over a combination of these attributes as well. This type of distribution is known as a joint probability distribution. Going back to our restaurant example, let the random variable for the quality of food be represented by Q, and the cost of food by C. Q can take three categorical values, namely {good, average, bad}, and C can take the values {high, low}. So, the joint distribution P(Q, C) would have probability values for all the combinations of states of Q and C. P(Q = good, C = high) represents the probability of a pricey restaurant with good quality food, while P(Q = bad, C = low) represents the probability of a less expensive restaurant with bad quality food.

Let us consider another random variable representing an attribute of a restaurant: its location L. The cost of food in a restaurant is affected not only by the quality of food but also by the location (generally, a restaurant in a very good location is more costly than a restaurant in a not-very-good location).
Intuitively, the probability of a costly restaurant at a very good location in a city would be different (generally, higher) than simply the probability of a costly restaurant, and the probability of a cheap restaurant at a prime location would be different (generally, lower) than simply the probability of a cheap restaurant. Formally speaking, P(C = high | L = good) will be different from P(C = high), and P(C = low | L = good) will be different from P(C = low). This indicates that the random variables C and L are not independent of each other.

These attributes or random variables need not always depend on each other. For example, the quality of food doesn't depend on the location of the restaurant. So, P(Q = good | L = good) or P(Q = good | L = bad) would be the same as P(Q = good); that is, our estimate of the quality of food will not change even if we know the restaurant's location. Hence, these random variables are independent of each other.

In general, random variables X and Y can be considered independent of each other if:

$P(X \mid Y) = P(X)$

They may also be considered independent if:

$P(X, Y) = P(X) P(Y)$

We can easily derive this second condition. We know the following from the chain rule of probability:

$P(X, Y) = P(X) P(Y \mid X)$

If Y is independent of X, that is, if $X \perp Y$, then $P(Y \mid X) = P(Y)$. Then:

$P(X, Y) = P(X) P(Y)$

Extending this result to multiple variables, we can conclude that a set of random variables is independent if their joint probability distribution equals the product of the probabilities of the individual random variables.

Sometimes, variables that are not independent of each other become independent once we know the values of other variables. To make this clearer, let's add another random variable: the number of people visiting the restaurant, N. Let's assume that, from experience, we know that the number of people visiting depends only on the cost of food at the restaurant and its location (generally, fewer people visit costly restaurants).
Does the quality of food Q affect the number of people visiting the restaurant? To answer this question, let's look at the random variables affecting N: cost C and location L. As C is directly affected by Q, we can conclude that Q affects N. However, let's consider a situation where we already know the cost and the location, for example, that the restaurant is costly (C = high), and ask the same question: "Does the quality of food affect the number of people coming to the restaurant?" The answer is no. The number of people coming depends only on the price and location, so once we know them, the quality of food tells us nothing more: if the cost is high, fewer people will visit, irrespective of the quality of food. Hence, $(N \perp Q \mid C, L)$. This type of independence is called conditional independence.

Installing tools

Let's now see some coding examples using pgmpy to represent joint distributions and independencies. Here, we will mostly work with IPython and pgmpy (and a few other libraries) for the coding examples, so before moving ahead, let's get a basic introduction to these.

IPython

IPython is a command shell for interactive computing in multiple programming languages, originally developed for Python, which offers enhanced introspection, rich media, additional shell syntax, tab completion, and a rich history. IPython provides the following features: powerful interactive shells (terminal and Qt-based); a browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media; support for interactive data visualization and use of GUI toolkits; flexible, embeddable interpreters to load into one's own projects; and easy-to-use, high-performance tools for parallel computing.

You can install IPython using the following command:

pip3 install ipython

To start the IPython command shell, simply type ipython3 in the terminal. For more installation instructions, you can visit http://ipython.org/install.html.
pgmpy

pgmpy is a Python library for working with probabilistic graphical models. As it's currently not on PyPI, we will need to build it manually. You can get the source code from the Git repository using the following command:

git clone https://github.com/pgmpy/pgmpy

Now cd into the cloned directory, switch to the branch for the version used here, and build it with the following commands:

cd pgmpy
git checkout book/v0.1
sudo python3 setup.py install

For more installation instructions, you can visit http://pgmpy.org/install.html. With both IPython and pgmpy installed, you should now be able to run the examples.

Representing independencies using pgmpy

To represent independencies, pgmpy has two classes, namely IndependenceAssertion and Independencies. The IndependenceAssertion class is used to represent individual assertions of the form $(X \perp Y)$ or $(X \perp Y \mid Z)$. Let's see some code to represent assertions:

# Firstly we need to import IndependenceAssertion
In [1]: from pgmpy.independencies import IndependenceAssertion

# Each assertion is in the form of [X, Y, Z] meaning X is
# independent of Y given Z.
In [2]: assertion1 = IndependenceAssertion('X', 'Y')
In [3]: assertion1
Out[3]: (X _|_ Y)

Here, assertion1 represents that the variable X is independent of the variable Y. To represent conditional assertions, we just need to add a third argument to IndependenceAssertion:

In [4]: assertion2 = IndependenceAssertion('X', 'Y', 'Z')
In [5]: assertion2
Out[5]: (X _|_ Y | Z)

In the preceding example, assertion2 represents $(X \perp Y \mid Z)$. IndependenceAssertion also allows us to represent assertions between sets of variables, of the form $(\mathbf{X} \perp \mathbf{Y} \mid \mathbf{Z})$. To do this, we just need to pass lists of random variables as arguments.

Moving on to the Independencies class, an Independencies object is used to represent a set of assertions.
Often, in the case of Bayesian or Markov networks, we have more than one assertion corresponding to a given model, and to represent these independence assertions we generally use an Independencies object. Let's take a few examples:

In [8]: from pgmpy.independencies import Independencies

# There are multiple ways to create an Independencies object: we
# could either initialize an empty object or initialize it with some
# assertions.
In [9]: independencies = Independencies() # Empty object
In [10]: independencies.get_assertions()
Out[10]: []

In [11]: independencies.add_assertions(assertion1, assertion2)
In [12]: independencies.get_assertions()
Out[12]: [(X _|_ Y), (X _|_ Y | Z)]

We can also directly initialize Independencies in these two ways:

In [13]: independencies = Independencies(assertion1, assertion2)
In [14]: independencies = Independencies(['X', 'Y'],
                                         ['A', 'B', 'C'])
In [15]: independencies.get_assertions()
Out[15]: [(X _|_ Y), (A _|_ B | C)]

Representing joint probability distributions using pgmpy

We can also represent joint probability distributions using pgmpy's JointProbabilityDistribution class. Let's say we want to represent the joint distribution over the outcomes of tossing two fair coins. In this case, the probability of each of the four possible outcomes is 0.25, which is shown as follows:

In [16]: from pgmpy.factors import JointProbabilityDistribution as Joint
In [17]: distribution = Joint(['coin1', 'coin2'],
                              [2, 2],
                              [0.25, 0.25, 0.25, 0.25])

Here, the first argument is the list of random variable names. The second argument is a list of the number of states of each random variable. The third argument is a list of probability values, assuming that the first variable changes its states the slowest.
So, the preceding distribution represents the following:

In [18]: print(distribution)
+---------+---------+------------------+
| coin1   | coin2   |   P(coin1,coin2) |
+---------+---------+------------------+
| coin1_0 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_0 | coin2_1 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_0 |   0.2500         |
+---------+---------+------------------+
| coin1_1 | coin2_1 |   0.2500         |
+---------+---------+------------------+

We can also conduct independence queries over these distributions in pgmpy:

In [19]: distribution.check_independence('coin1', 'coin2')
Out[19]: True

Conditional probability distribution

Let's take an example to understand conditional probability better. Say we have a bag containing three apples and five oranges, and we randomly take fruits out of the bag one at a time without replacing them. Let the random variables $X_1$ and $X_2$ represent the outcomes of the first and second draws, respectively. As there are three apples and five oranges in the bag initially, $P(X_1 = apple) = 3/8$ and $P(X_1 = orange) = 5/8$. Now, let's say that in our first attempt we got an orange. We can no longer use the same probabilities for getting an apple or an orange in our second attempt: the probabilities in the second attempt depend on the outcome of the first, and therefore we use conditional probability to represent such cases. In the second attempt, we have the following probabilities, which depend on the outcome of our first try: $P(X_2 = apple \mid X_1 = orange) = 3/7$, $P(X_2 = orange \mid X_1 = orange) = 4/7$, $P(X_2 = apple \mid X_1 = apple) = 2/7$, and $P(X_2 = orange \mid X_1 = apple) = 5/7$.

The Conditional Probability Distribution (CPD) of two variables X and Y can be represented as $P(X \mid Y)$, the probability of X given Y, that is, the probability of X after the event Y has occurred and we know its outcome. Similarly, we can have $P(Y \mid X)$, the probability of Y after having an observation of X. The simplest representation of a CPD is the tabular CPD.
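The independence criterion $P(X, Y) = P(X) P(Y)$ and the bag-of-fruit conditionals above can be checked by hand. The following is a minimal pure-Python sketch, assuming a two-variable joint table stored as a dictionary keyed by state pairs and using exact fractions. The helper names (marginal, is_independent, conditional) are hypothetical; this is not pgmpy's API or its implementation of check_independence.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, axis):
    """Sum a two-variable joint table {(x, y): p} down to one variable.

    axis=0 keeps the first variable, axis=1 keeps the second.
    """
    result = {}
    for states, p in joint.items():
        key = states[axis]
        result[key] = result.get(key, 0) + p
    return result


def is_independent(joint):
    """True when P(X = x, Y = y) == P(X = x) * P(Y = y) for every pair."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return all(p == px[x] * py[y] for (x, y), p in joint.items())


def conditional(joint, given_x):
    """P(Y | X = given_x) = P(X = given_x, Y) / P(X = given_x)."""
    px = marginal(joint, 0)[given_x]
    return {y: p / px for (x, y), p in joint.items() if x == given_x}


# Two fair coins: each of the four outcomes has probability 1/4.
coins = {pair: Fraction(1, 4) for pair in product(["heads", "tails"], repeat=2)}

# Bag with 3 apples and 5 oranges; two draws without replacement.
counts = {"apple": 3, "orange": 5}
total = sum(counts.values())
draws = {}
for first, second in product(counts, repeat=2):
    # One fruit of the first kind has left the bag before the second draw.
    left_after_first = counts[second] - (1 if second == first else 0)
    draws[(first, second)] = Fraction(counts[first], total) * Fraction(
        left_after_first, total - 1
    )

print(is_independent(coins))  # True
print(is_independent(draws))  # False: the second draw depends on the first
print(conditional(draws, "orange"))  # {'apple': Fraction(3, 7), 'orange': Fraction(4, 7)}
```

Exact fractions are used so that the equality test in is_independent is not disturbed by floating-point rounding; with float probabilities, a tolerance-based comparison such as math.isclose would be the safer choice.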
In a tabular CPD, we construct a table containing all the possible combinations of states of the random variables and the probabilities corresponding to these states. Let's consider the earlier restaurant example and begin by representing the marginal distribution of the quality of food Q. As we mentioned earlier, it can take three values, {good, bad, average}. For example, P(Q) can be represented in tabular form as follows:

Quality    P(Q)
Good       0.3
Average    0.5
Bad        0.2

Similarly, let's say P(L) is the probability distribution of the location of the restaurant. Its CPD can be represented as follows:

Location   P(L)
Good       0.6
Bad        0.4

As the cost of the restaurant C depends on both the quality of food Q and its location L, we will be considering P(C | Q, L), which is the conditional distribution of C given Q and L:

Location       Good                    Bad
Quality        Good   Average   Bad    Good   Average   Bad
Cost = High    0.8    0.6       0.1    0.6    0.6       0.05
Cost = Low     0.2    0.4       0.9    0.4    0.4       0.95

Representing CPDs using pgmpy

Let's first see how to represent the tabular CPD using pgmpy for variables that have no conditional variables:

In [1]: from pgmpy.factors import TabularCPD

# For creating a TabularCPD object we need to pass three
# arguments: the variable name, its cardinality (that is, the number
# of states of the random variable), and the probability value
# corresponding to each state.
In [2]: quality = TabularCPD(variable='Quality',
                             variable_card=3,
                             values=[[0.3], [0.5], [0.2]])
In [3]: print(quality)
+----------------+-----+
| ['Quality', 0] | 0.3 |
+----------------+-----+
| ['Quality', 1] | 0.5 |
+----------------+-----+
| ['Quality', 2] | 0.2 |
+----------------+-----+

In [4]: quality.variables
Out[4]: OrderedDict([('Quality', [State(var='Quality', state=0),
                                 State(var='Quality', state=1),
                                 State(var='Quality', state=2)])])

In [5]: quality.cardinality
Out[5]: array([3])

In [6]: quality.values
Out[6]: array([0.3, 0.5, 0.2])

You can see here that the values of the CPD are stored as a 1D array instead of the 2D array you passed as an argument: pgmpy internally stores the values of a TabularCPD as a flattened NumPy array.

In [7]: location = TabularCPD(variable='Location',
                              variable_card=2,
                              values=[[0.6], [0.4]])
In [8]: print(location)
+-----------------+-----+
| ['Location', 0] | 0.6 |
+-----------------+-----+
| ['Location', 1] | 0.4 |
+-----------------+-----+

However, when we have conditional variables, we also need to specify them and their cardinalities. Let's define the TabularCPD for the cost variable:

In [9]: cost = TabularCPD(
            variable='Cost',
            variable_card=2,
            values=[[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                    [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]],
            # The first evidence variable changes slowest across the
            # columns, so this order matches the table of P(C | Q, L):
            # Location as the outer header, Quality as the inner one.
            evidence=['Location', 'Quality'],
            evidence_card=[2, 3])

Graph theory

The second major framework for the study of probabilistic graphical models is graph theory. Graphs are the skeleton of PGMs and are used to compactly encode the independence conditions of a probability distribution.
Nodes and edges

The foundation of graph theory was laid by Leonhard Euler when he solved the famous Seven Bridges of Konigsberg problem. The city of Konigsberg was set on both sides of the Pregel River and included two islands that were connected to each other and to the mainland by seven bridges. The problem was to find a walk that crosses every bridge exactly once. To visualize the problem, let's think of the graph in Fig 1.1:

Fig 1.1: The Seven Bridges of Konigsberg graph

Here, the nodes a, b, c, and d represent the land masses and are known as the vertices of the graph. The line segments ab, bc, cd, da, ab, and bc connecting the land masses are the bridges and are known as the edges of the graph. So, we can think of the problem of crossing all the bridges once in a single walk as tracing along all the edges of the graph without lifting our pencil.

Formally, a graph G = (V, E) is an ordered pair of finite sets. The elements of the set V are known as the nodes or vertices of the graph, and the elements of E are the edges or arcs of the graph. The number of nodes, or the cardinality of V, denoted by |V|, is known as the order of the graph. Similarly, the number of edges, denoted by |E|, is known as the size of the graph. Here, we can see that the Konigsberg city graph shown in Fig 1.1 is of order 4 and size 7.

In a graph, we say that two vertices $u, v \in V$ are adjacent if $\{u, v\} \in E$. In the City graph, all four vertices are adjacent to each other because there is an edge for every possible combination of two vertices in the graph. Also, for a vertex $v \in V$, we define the neighbors set of v as $\{u \mid \{u, v\} \in E\}$. In the City graph, we can see that b and d are neighbors of c. Similarly, a, b, and c are neighbors of d.

We define an edge to be a self loop if the start vertex and the end vertex of the edge are the same. More formally, any edge of the form (u, u), where $u \in V$, is a self loop.
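The definitions of order, size, adjacency, neighbors, and self loops translate directly into code. The following is a minimal sketch in plain Python, assuming a small hypothetical undirected graph (not the City graph of Fig 1.1). Edges are kept in a list rather than a set so that parallel edges, like the repeated bridges in the Konigsberg problem, are counted separately.

```python
# A hypothetical undirected multigraph: parallel edges and a self loop
# are allowed, so E is kept as a list of (u, v) pairs rather than a set.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "d")]


def order(vertices):
    """|V|: the number of vertices."""
    return len(vertices)


def size(edges):
    """|E|: the number of edges, counting parallel edges separately."""
    return len(edges)


def adjacent(u, v, edges):
    """u and v are adjacent if some edge joins them (direction ignored)."""
    return any({u, v} == {x, y} for x, y in edges)


def neighbors(v, edges):
    """{u | {u, v} in E}; by this definition, a self loop makes v its own neighbor."""
    result = set()
    for x, y in edges:
        if x == v:
            result.add(y)
        if y == v:
            result.add(x)
    return result


def self_loops(edges):
    """Edges of the form (u, u)."""
    return [(x, y) for x, y in edges if x == y]


print(order(vertices), size(edges))  # 4 6
print(sorted(neighbors("c", edges)))  # ['b', 'd']
print(adjacent("a", "c", edges))  # False
print(self_loops(edges))  # [('d', 'd')]
```

For real work, a graph library would normally be used instead, but the sketch keeps each function a one-to-one reading of its definition.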
Until now, we have been talking only about graphs whose edges don't have a direction associated with them, which means that the edge (u, v) is the same as the edge (v, u). These types of graphs are known as undirected graphs. Similarly, we can think of a graph whose edges have a direction associated with them. For these graphs, the edge set E is a set of ordered pairs of vertices. These types of graphs are known as directed graphs. In the case of a directed graph, we also define the indegree and outdegree of a vertex. For a vertex v ∈ V, we define its outdegree as the number of edges originating from v, that is, $|\{(v, u) \in E : u \in V\}|$. Similarly, the indegree is defined as the number of edges that end at v, that is, $|\{(u, v) \in E : u \in V\}|$.
Walk, paths, and trails
For a graph G = (V, E) and u, v ∈ V, we define a u - v walk as an alternating sequence of vertices and edges, starting with u and ending with v. In the City graph of Fig 1.1, for example, starting at a, crossing bridges from one land part to the next, and stopping at d gives an a - d walk. If there aren't multiple edges between the same vertices, then we can simply represent a walk by a sequence of vertices. For example, in the Butterfly graph shown in Fig 1.2, we can have a walk W : a, c, d, c, e:
Fig 1.2: Butterfly graph, an undirected graph
A walk with no repeated edges is known as a trail; in the City graph, for instance, any walk that never crosses the same bridge twice is a trail. A walk with no repeated vertices, except possibly the first and the last, is known as a path. A graph is known as cyclic if there are one or more paths that start and end at the same node; such paths are known as cycles. Similarly, if there are no cycles in a graph, it is known as an acyclic graph.
Bayesian models
In most real-life cases, when we represent or model some event, we deal with a large number of random variables.
Even if we consider all the random variables to be discrete, there would still be an exponentially large number of values in the joint probability distribution. Dealing with such a huge amount of data would be computationally expensive (and in some cases, even intractable), and would also require a huge amount of memory to store the probability of each combination of states of these random variables.
However, in most cases, many of these variables are marginally or conditionally independent of each other. By exploiting these independencies, we can reduce the number of values we need to store to represent the joint probability distribution. For instance, in the previous restaurant example, the joint probability distribution across the four random variables that we discussed (that is, quality of food Q, location of restaurant L, cost of food C, and the number of people visiting N) would require us to store 23 independent values. By the chain rule of probability, we know the following:
P(Q, L, C, N) = P(Q) P(L|Q) P(C|L, Q) P(N|C, Q, L)
Now, let us try to exploit the marginal and conditional independence between the variables to make the representation more compact. Let's start by considering the independence between the location of the restaurant and the quality of food there. As these two attributes are independent of each other, P(L|Q) is the same as P(L). Therefore, we need to store only one parameter to represent it. From the conditional independence that we have seen earlier, we know that $(N \perp Q \mid C, L)$. Thus, P(N|C, Q, L) is the same as P(N|C, L), which needs only four parameters. Therefore, we now need only 13 parameters to represent the whole distribution: 2 for P(Q), 1 for P(L), 6 for P(C|Q, L), and 4 for P(N|C, L). We can conclude that exploiting independencies helps in the compact representation of the joint probability distribution. This forms the basis for the Bayesian network.
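The parameter counts above can be checked with a few lines of Python. This is a minimal sketch, not part of pgmpy. It assumes Q has 3 states and L and C are binary (as in the TabularCPD examples), and that N is binary, which is what the counts of 23 and 4 in the text imply. A CPD needs (|X| - 1) free values for every configuration of its parents.

```python
from math import prod
from typing import Dict, List

# Cardinalities from the restaurant example (N assumed binary, as implied by the text).
card: Dict[str, int] = {"Q": 3, "L": 2, "C": 2, "N": 2}


def full_joint_params(card: Dict[str, int]) -> int:
    """Independent values in an unfactorized joint table: product of cardinalities minus 1."""
    return prod(card.values()) - 1


def cpd_params(var: str, parents: List[str], card: Dict[str, int]) -> int:
    """Independent values in P(var | parents): (|var| - 1) per parent configuration."""
    return (card[var] - 1) * prod(card[p] for p in parents)


# Parents after exploiting the independencies discussed above.
parents = {"Q": [], "L": [], "C": ["Q", "L"], "N": ["C", "L"]}

print(full_joint_params(card))                                   # 23
print({v: cpd_params(v, ps, card) for v, ps in parents.items()})  # {'Q': 2, 'L': 1, 'C': 6, 'N': 4}
print(sum(cpd_params(v, ps, card) for v, ps in parents.items()))  # 13
```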
Representation
A Bayesian network is represented by a Directed Acyclic Graph (DAG) and a set of Conditional Probability Distributions (CPDs), in which the nodes represent random variables, the edges represent dependencies, and each node has a CPD. In our previous restaurant example, the nodes would be quality of food (Q), location (L), cost of food (C), and number of people (N).
As the cost of food depends on the quality of food (Q) and the location of the restaurant (L), there will be an edge Q → C and an edge L → C. Similarly, as the number of people visiting the restaurant depends on the price of food and its location, there will be an edge L → N and an edge C → N. The resulting structure of our Bayesian network is shown in Fig 1.3:
Fig 1.3: Bayesian network for the restaurant example
Factorization of a distribution over a network
Each node in our Bayesian network for restaurants has a CPD associated with it. For example, the CPD for the cost of food in the restaurant is P(C|Q, L), as it depends only on the quality of food and the location. For the number of people, it would be P(N|C, L). So, we can generalize that the CPD associated with each node is P(node|Par(node)), where Par(node) denotes the parents of the node in the graph. Assuming some probability values, we finally get a network as shown in Fig 1.4:
Fig 1.4: Bayesian network of restaurant along with CPDs
Let us go back to the joint probability distribution of all these attributes of the restaurant. Considering the independencies among variables, we concluded as follows:
P(Q,C,L,N) = P(Q)P(L)P(C|Q, L)P(N|C, L)
So now, looking at the Bayesian network (BN) for the restaurant, we can say that for any Bayesian network, the joint probability distribution P(X1, X2, ..., Xn) over all its random variables {X1, X2, ..., Xn} can be represented as follows:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Par}(X_i))$$
This is known as the chain rule for Bayesian networks.
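As a plain-Python illustration of the chain rule for Bayesian networks, the sketch below evaluates the restaurant joint distribution as P(Q) P(L) P(C|Q, L) P(N|C, L). It does not use pgmpy. P(Q), P(L), and P(C|Q, L) are taken from the TabularCPD examples earlier in the article. The column order of the Cost CPD is assumed to follow pgmpy's convention that the last evidence variable changes fastest. The P(N|C, L) values are hypothetical, chosen only so the example runs, because the article does not list them.

```python
from itertools import product
from typing import Dict, List, Tuple

# P(Q) and P(L), from the TabularCPD examples.
p_q = [0.3, 0.5, 0.2]
p_l = [0.6, 0.4]

# P(C | Q, L) from the Cost CPD; columns assumed ordered (Q0,L0), (Q0,L1), ..., (Q2,L1).
cost_rows = [[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
             [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]]
p_c: Dict[Tuple[int, int], List[float]] = {
    (q, l): [cost_rows[c][q * 2 + l] for c in range(2)]
    for q, l in product(range(3), range(2))
}

# P(N | C, L): HYPOTHETICAL values for illustration only (not given in the article).
p_n = {(0, 0): [0.6, 0.4], (0, 1): [0.8, 0.2], (1, 0): [0.1, 0.9], (1, 1): [0.3, 0.7]}


def joint(q: int, l: int, c: int, n: int) -> float:
    """Chain rule for this network: each factor is P(node | Par(node))."""
    return p_q[q] * p_l[l] * p_c[(q, l)][c] * p_n[(c, l)][n]


states = list(product(range(3), range(2), range(2), range(2)))
total = sum(joint(*s) for s in states)
p_cost0 = sum(joint(q, l, c, n) for q, l, c, n in states if c == 0)
print(round(total, 10))    # 1.0
print(round(p_cost0, 10))  # 0.442
```

Because every factor is a normalized CPD, the product sums to 1 over all 24 joint states. The marginal P(C = 0) does not depend on the hypothetical P(N|C, L) values, since summing over N removes that factor.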
Also, we say that a distribution P factorizes over a graph G if P can be encoded as follows:
$$P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Par}_G(X_i))$$
Here, Par_G(X) denotes the parents of X in the graph G.
Summary
In this article, we saw how we can represent a complex joint probability distribution using a directed graph and a conditional probability distribution associated with each node, which is collectively known as a Bayesian network.
Packt
10 Aug 2015
11 min read

Cross-platform Building

In this article by Karan Sequeira, author of the book Cocos2d-x Game Development Blueprints, we'll leverage the cross-platform nature of Cocos2d-x to build one of our games on Android and Windows Phone 8!
Setting up the environment for Android
At this point in the timeline of technological evolution, Android needs no introduction. Google acquired the company that created this mobile operating system, and Android has since reached far and wide across the globe. It is now one of the top choices for application and game developers. With octa-core CPUs and ever more powerful GPUs, the sheer power offered by Android devices is a motivating factor!
When setting up the environment for Android, you have more choices than for any other mobile development platform. Your workstation could be running any of the three major operating systems (Windows, Mac OS, or Linux), and you would be able to build for Android just fine. Since Android is not fussy about its build environment, developers mostly choose their work environment based on which other platforms they will be developing for. For example, you might build for Android on a machine running Mac OS, since you can then build for both iOS and Android on the same machine. The same applies to a machine running Windows, where you would be able to build for both Android and Windows Phone. Note, however, that building for Windows Phone 8 requires at least Windows 8. We will discuss more on that later.
Let's begin by listing the software required to set up the environment for Android.
Java Development Kit 7+
Since Java is the programming language used within the Android SDK, you must ensure that you have an environment set up to compile and run Java files. So go ahead and download the Java Development Kit (JDK) version 7 or later.
You can download and install a Standard Edition (SE) version from the following link: http://www.oracle.com/technetwork/java/javase/downloads/index.html
If your Mac already has a JDK installed (some older versions of Mac OS X shipped with one), you can skip this step.
The Android SDK
Once you've downloaded the JDK, it's time to download the Android SDK from the following URL: http://developer.android.com/sdk/index.html
If you're installing the Android SDK on Windows, a custom installer is provided that will take care of downloading and setting up the required parts of the Android SDK for you. For other operating systems, you can download the respective archive file and extract it at the location of your choice.
Eclipse or the ADT bundle
Eclipse is the most commonly used IDE when it comes to Android application development. You can either download the standard Eclipse IDE for Java Developers and then install the ADT plugin into it, or download the ADT bundle, which is a specialized version of Eclipse with the ADT plugin preinstalled. At the time of writing this article, the Android developer site had already deprecated the ADT bundle in favor of Android Studio. As such, we will choose the former approach for setting up our environment in Eclipse. You can download and install the standard Eclipse IDE for Java Developers for your machine from the following URL: http://www.eclipse.org/downloads/
ADT plugin for Eclipse
Once you've downloaded Eclipse, you must install a custom plugin for it: Android Development Tools (ADT). Visit the following URL and follow the detailed instructions to install the ADT plugin into Eclipse: http://developer.android.com/sdk/installing/installing-adt.html
Once you've followed the instructions on that page, you will need to tell Eclipse where the Android SDK that you downloaded earlier is located.
To do so, open the Eclipse Preferences page, go to the Android section, and set the SDK location to the folder where you placed the Android SDK. With that done, we can now fire up the SDK Manager to install a few more necessary pieces of software. To launch the Android SDK Manager, select Android SDK Manager from the Window menu in Eclipse. The resulting window should look something like this:
By default, you will see a whole lot of packages selected, of which Android SDK Platform-tools and Android SDK Build-tools are necessary. From the rest, you must select at least one target Android platform. One additional package is required if your development machine runs Windows: Google USB Driver, located under the Extras list.
I suggest skipping the documentation and samples. If you already have an Android device, I would go one step further and suggest that you skip the system images as well. However, if you don't have an Android device, you will need at least one system image so that you can test on an emulator.
Once you've chosen the packages you need, proceed to install them, and you should get a window like this:
Now, select Accept License and click on the Install button to install the respective packages. Once these packages have been installed, you have to add their locations to the PATH variable on your machine. On Windows, modify your Path variable (go to Properties | Advanced System Settings | Environment Variables) to include the following:
;E:\Android\android-sdk\platform-tools
On Mac OS, you can add the following line to the .bash_profile file found under the home directory:
export PATH=$PATH:/Android/android-sdk/platform-tools/
On Linux, the same line can be added to the .bashrc file found under the home directory. At this point, you can use Eclipse for Android development.
Installing Cygwin for Windows
Developers working on Linux can skip this step, as most Linux distributions come with the make utility. Developers working on Mac OS can download Xcode from the Mac App Store, which will install the make utility on their Macs. On Windows, we need to install Cygwin specifically for the GNU make utility. So, go to the following URL and download the Cygwin installer: http://www.cygwin.com/install.html
Run the .exe file that you downloaded, and when you get a window like this, click on the Next button:
The next window asks how you would like to install the required packages. Select the Install from Internet option and click on Next:
The next window asks where you would like to install Cygwin. I'd recommend leaving it at the default value unless you have a reason to change it. Proceed by clicking on Next. In the next window, you will be asked to specify a path where the installer can download the files it requires. Fill in a suitable path of your choice and click on Next. In the next window, you will be asked to specify your Internet connection. Leave it at the Direct Connection option and click on Next. In the next window, you will be asked to select a mirror from which to download the installation files. Select the site that is geographically closest to you and click on Next.
In the window that follows, expand the Devel section and search for make: The GNU version of the 'make' utility. Click on the Skip option to select this package; the version of the make utility that will be installed is then displayed in place of Skip. Your window should look something like this:
You can now click the Next button to begin the download and installation of the required packages. The window should look something like this:
Once all the packages have been downloaded, click on Finish to close the installer.
Now that we have the make utility installed, we can download the Android NDK, which will actually build our entire C++ code base.
The Android NDK
To download the Android NDK for your development machine, navigate to the following URL: https://developer.android.com/tools/sdk/ndk/index.html
Unzip the downloaded archive and place it in the same location as the Android SDK. We must now add an environment variable named NDK_ROOT that points to the root of the Android NDK. On Windows, add a new user variable NDK_ROOT with the location of the Android NDK on your filesystem as its value. You can do this by going to Properties | Advanced System Settings | Environment Variables. Once you've done that, the Environment Variables window should look something like this:
I'm sure you noticed the value of the NDK_ROOT variable in the previous screenshot. The value is given in Unix style because it depends on the Cygwin environment: it will be read within a Cygwin bash shell while executing the build script for each Android project. Mac OS and Linux users can add the following line to their .bash_profile and .bashrc files, respectively:
export NDK_ROOT=/Android/android-ndk-r10
We have now completed setting up the environment to build our Cocos2d-x games on Android. To test this, open a Cygwin bash terminal (on Windows) or a standard terminal (on Mac OS or Linux) and navigate to the Cocos2d-x test bed located inside the samples folder of your Cocos2d-x source. Now, navigate to its proj.android folder and run the build_native.sh script. This is what my Cygwin bash terminal looks like on a Windows 7 machine:
If you've followed the instructions above correctly, the build_native.sh script will compile the C++ source files required by the TestCpp project and produce a single shared object (.so) file in the libs folder within the proj.android folder.
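Before running build_native.sh, it can save time to confirm that the variables configured above are visible to your shell. The following is a minimal Python sketch of such a check. It is not part of Cocos2d-x or the Android tools, and check_android_env is a hypothetical helper name. It only tests three things: that NDK_ROOT points to a directory, that adb (which lives in platform-tools) can be found on PATH, and that make can be found on PATH. On Windows, run it from a Cygwin Python so that the Unix-style NDK_ROOT value resolves.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def check_android_env(
    environ: Mapping[str, str],
    which: Callable[[str], Optional[str]],
    isdir: Callable[[str], bool] = os.path.isdir,
) -> List[str]:
    """Return a list of problems found; an empty list means the basic setup looks fine."""
    problems: List[str] = []

    ndk_root = environ.get("NDK_ROOT")
    if not ndk_root:
        problems.append("NDK_ROOT is not set")
    elif not isdir(ndk_root):
        problems.append(f"NDK_ROOT does not point to a directory: {ndk_root}")

    # adb lives in platform-tools, so finding it suggests that folder is on PATH.
    if which("adb") is None:
        problems.append("adb not found on PATH (is platform-tools on PATH?)")

    # The NDK build started by build_native.sh needs make (Cygwin on Windows).
    if which("make") is None:
        problems.append("make not found on PATH")

    return problems


if __name__ == "__main__":
    # Check the real environment of the current shell.
    for line in check_android_env(os.environ, shutil.which) or ["environment looks OK"]:
        print(line)
```

The lookups are passed in as parameters (shutil.which and os.path.isdir by default when run as a script), so the check can also be exercised with fake values without touching the real filesystem.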
Creating an Android Virtual Device
We're close to running the game, but we need to create an Android Virtual Device (AVD) before we proceed. Open the Android Virtual Device Manager from the Window menu and click on Create. In the next window, fill in the details according to your requirements and configuration and click OK. This is what my window looks like with everything filled in:
From the Android Virtual Device Manager window, select the newly created AVD and click on Start to boot it.
Building the tests on Android
With an Android device ready to run our project, let's begin by importing the project into Eclipse. Within Eclipse, select File | Import.... In the following window, select Existing Projects into Workspace under General and click on Next:
In the next window, browse to the proj.android folder under the cocos2d-x-2.2.5\samples\Cpp\TestCpp path and click on Finish:
Once imported, you can find the TestCpp project under Package Explorer. It should look something like this:
As you can see, there are a few errors with the project. If you look at the Problems view (Window | Show View | Problems) in the bottom half of Eclipse, you might see something like this:
All these errors occur because the TestCpp Android project depends on Cocos2d-x's Android project for Android-specific functionality, such as the actual OpenGL surface where everything is rendered, the music player, accelerometer functionality, and much more. So let's import the Android project for Cocos2d-x, located inside the following path in your Cocos2d-x source bundle:
cocos2d-x-2.2.5\cocos2dx\platform\android
You can import it the same way you imported TestCpp. Once the project has been imported, it will be titled libcocos2dx in Package Explorer. Now, select Clean... from the Project menu. When the clean operation has finished, you will notice that the TestCpp project's dependency on libcocos2dx is taken care of and the project builds error-free.
Running the tests on Android
Running the tests is as simple as right-clicking on the TestCpp project in Package Explorer and selecting Run As | Android Application. It might take a bit longer on an emulator than on an actual device, but ultimately you will have something like this:
Summary
In this article, you learned which software components you need to set up your workstation to build and run an Android native application. You also set up an Android Virtual Device and ran the Cocos2d-x test bed application on it.
Packt
10 Aug 2015
30 min read

Data Types and Fields

In this article, David Studebaker and Christopher Studebaker, authors of the book Programming Microsoft Dynamics™ NAV 2015, explain that the design of an application should begin at the simplest level, with the design of the data elements. The types of data our development tool supports have a significant effect on our design. Because NAV is designed for financially oriented business applications, NAV data types are financially and business oriented. In this article, we will cover many of the data types we use within NAV. For each data type, we will cover some of the more frequently modified field properties and how particular properties, such as Field Class, are used to support application functionality. Field Class is a fundamental property that defines whether the contents of the field are data to be processed or control information to be interpreted.
Data types
We are going to segregate the data types into several groups. We will first look at fundamental data types and then at complex data types.
Fundamental data types
Fundamental data types are the basic components from which the complex data types are formed. They are grouped into Numeric, String, and Date/Time data types.
Numeric data
Just like other systems, Microsoft Dynamics NAV 2015 supports several numeric data types. The specifications for each NAV data type are defined for NAV, independent of the supporting SQL Server database rules. However, some data types are stored and handled somewhat differently from a SQL Server point of view than the way they appear to us as NAV developers and users. For more details on the SQL Server-specific representations of various data elements, refer to the Developer and IT Pro Help. Our discussion will focus on NAV representation and handling for each data type.
The various numeric data types are as follows:
Integer: This is an integer number ranging from -2,147,483,647 to +2,147,483,647.
Decimal: This is a decimal number in the range of +/- 999,999,999,999,999.99. Although it is possible to construct larger numbers, errors such as overflow, truncation, or loss of precision might occur. In addition, there is no facility to display or edit larger numbers.
Option: This is a special instance of an integer, stored as an integer number ranging from 0 to +2,147,483,647. An option is normally represented in the body of our C/AL code as an option string. We can compare an option to an integer in C/AL rather than using the option string, but this is not good practice because it eliminates the self-documenting aspect of an option field. An option string is a set of choices listed in a comma-separated string, one of which is chosen and stored as the current option. Since the maximum length of this string is 250 characters, the practical maximum number of choices for a single option is less than 125. The currently selected choice is stored in the option field as the ordinal position of that choice within the set. For example, with the option string red, yellow, blue, the stored values are 0 (red), 1 (yellow), and 2 (blue): if red were selected, 0 would be stored in the variable, and if blue were selected, 2 would be stored. Quite often, an option string starts with a blank to allow an effective choice of "none chosen"; an example of this is an option string of (blank), Hourly, Daily,…
Boolean: A Boolean variable is stored as 1 or 0. In C/AL code, it is programmatically referred to as True or False, but in properties it is sometimes referred to as Yes or No. Boolean variables may be displayed as Yes or No (language dependent), a check mark or blank, or True or False.
BigInteger: 8-byte Integer as opposed to the 4 bytes of Integer.
BigIntegers are for very big numbers (from -9,223,372,036,854,775,807 to 9,223,372,036,854,775,807).
Char: A numeric code between 0 and 65535 (hexadecimal FFFF) representing a single 16-bit Unicode character. Char variables can operate either as text or numbers. Numeric operations can be done on Char variables. Char variables can also be defined with individual text character values. Char variables cannot be defined as permanent variables in a table; they can only be defined as working storage variables within C/AL objects.
Byte: This is a single 8-bit ASCII character with a value ranging from 0 to 255. Byte variables can operate either as text or numbers. Numeric operations can be done on Byte variables. Byte variables can also be defined with individual text character values. Byte variables cannot be defined as permanent variables in a table, but only as working storage variables within C/AL objects.
Action: This is a variable returned from a PAGE RUNMODAL function or RUNMODAL (Page) function that specifies what action a user performs on a page. The possible values are OK, Cancel, LookupOK, LookupCancel, Yes, No, RunObject, and RunSystem.
ExecutionMode: This specifies the mode in which a session runs. The possible values are Debug or Standard.
String data
The following are the data types included in String data:
Text: This contains any string of alphanumeric characters. In a table, a Text field can be from 1 to 250 characters long. In working storage within an object, a Text variable can be any length if no length is defined. If a maximum length is defined, it must not exceed 1024. NAV 2015 does not require a length to be specified, but if we define a maximum length, it will be enforced. When calculating the length of a record for design purposes (relative to the maximum record length of 8,000 bytes), the full defined field length should be counted.
Code: Although the Help says that the length constraints for Code variables are the same as those for Text variables, the C/AL Editor enforces length limits of 1 to 250 characters. All letters are automatically converted to uppercase when data is entered into a Code variable, and any leading or trailing spaces are removed.
Date/Time data
The following are the data types included in Date/Time data:
Date: This contains an integer number, which is interpreted as a date ranging from January 1, 1754 to December 31, 9999. A 0D (numeral zero, letter D) represents an undefined date (stored as a SQL Server DateTime field) that is interpreted as January 1, 1753. According to the Developer and IT Pro Help, NAV 2015 supports a Date of 1/1/0000 (presumably as a special case for backward compatibility, but it is not supported by SQL Server). A date constant can be written as the letter D preceded by either six digits in the format MMDDYY or eight digits as MMDDYYYY (where M = month, D = day, and Y = year). For example, 011915D or 01192015D both represent January 19, 2015. Later, in DateFormula, we will find D interpreted as day, but here the trailing D is interpreted as the date (data type) constant. When the year is expressed as YY rather than YYYY, the century portion is 20 if the two-digit year is from 00 to 29, or 19 if the year is from 30 through 99.
NAV also defines a special date called the Closing date, which represents the point in time between one day and the next. The purpose of a closing date is to provide a point at the end of a day, after all of the real date- and time-sensitive activity is recorded, when accounting closing entries can be recorded. Closing entries are recorded, in effect, at the stroke of midnight between two dates. This is the date of closing accounting books, and it is designed so that one can include or exclude, at the user's option, closing entries in various reports.
When sorted by date, the closing date entries are sorted after all normal entries for a day. For example, the normal date entry for December 31, 2015 would display as 12/31/15 (depending on the date format masking), and the closing date entry would display as C12/31/15. All of the C12/31/15 ledger entries would appear after all normal 12/31/15 ledger entries. The following screenshot shows two 2014 closing date entries mixed with normal entries from December 2014 and January through April 2015. (This data is from the Cronus demo company. The 2014 closing entries have an "Opening Entry" description, which shows that these were the first entries for the demo data in the respective accounts. This is not a normal set of production data.)
Time: This contains an integer number, which is interpreted on a 24-hour clock, in milliseconds plus 1, from 00:00:00 to 23:59:59.999. A 0T (numeral zero, letter T) represents an undefined time and is stored as 1/1/1753 00:00:00.000.
DateTime: This represents a combined Date and Time, stored in Coordinated Universal Time (UTC), and it always displays local time (that is, the local time on our system). DateTime fields do not support NAV "Closing" dates. DateTime is helpful for an application that must support multiple time zones simultaneously. DateTime values can range from January 1, 1754 00:00:00.000 to December 31, 9999 23:59:59.999, but dates earlier than January 1, 1754 cannot be entered (don't test with dates late in 9999, as an intended advance to the year 10000 won't work). Assigning a value of 0DT will yield an undefined or blank DateTime.
Duration: This represents the positive or negative difference between two DateTime values, in milliseconds, stored as a BigInteger. Durations are automatically output in the text format as DDD days HH hours MM minutes SS seconds.
Complex data types
Each complex data type consists of multiple data elements. For ease of reference, we will categorize them into several groups of similar types.
Data structure
The following data types are in the data structure group:
File: This refers to any standard Windows file outside the NAV database. There is a reasonably complete set of functions that allow us to create, delete, open, close, read, write, and copy (among other things) data files. For example, we could create our own NAV routines in C/AL to import or export data from or to a file that had been created by some other application. With the three-tier architecture of NAV 2015, business logic runs on the server and not the client. We need to keep this in mind any time we refer to local external files, because by default they will be on the server. Using Universal Naming Convention (UNC) paths can make this easier to manage.
Record: This refers to a single data row within a NAV table that consists of individual fields. Quite often, multiple variable instances of a Record (table) are defined in working storage to support a validation process, allowing access to different records within the table at one time in the same function.
Objects
Page, Report, Codeunit, Query, and XMLPort each represent an object data type. Object data types are used when there is a need to refer to an object or a function in another object. Examples include invoking a Report or an XMLPort from a Page or a Report, and calling a data validation or processing function that is coded in a Table or a Codeunit.
Automation
The following are Automation data types (these are not supported by the NAV Web client). OCX and Automation data types are supported in NAV 2015 for backward compatibility only:
OCX: This allows the definition of a variable that represents and allows access to an ActiveX or OCX custom control. Such a control is typically an external application object that we can invoke from our NAV object.
Automation: This allows us to define a variable that we can access similarly to an OCX.
The application must act as an Automation Server and be registered with the NAV client or server that calls it. For example, we can interface from NAV into the various Microsoft Office products (Word, Excel, and so on) by defining them in Automation variables.
DotNet: This allows us to define a variable for .NET Framework interface types within an assembly. It supports accessing .NET Framework type members, including methods, properties, and constructors, from C/AL. These can be members of the global assembly cache or of custom assemblies.
Input/Output
The following are the Input/Output data types:
Dialog: This supports the definition of a simple user interface window without the use of a Page object. Typically, Dialog windows are used to communicate processing progress or to allow a brief user response to a go/no-go question, though the latter use could result in bad performance due to locking. There are other user communication tools as well, but they do not use a Dialog type data item.
InStream and OutStream: These allow us to read from and write to external files, BLOBs, and objects of the Automation and OCX data types.
DateFormula
DateFormula provides for the definition and storage of a simple but clever set of constructs to support the calculation of runtime-sensitive dates. A DateFormula is stored in a language-independent format, thus supporting multilanguage functionality. A DateFormula is a combination of:
Numeric multipliers (for example, 1, 2, 3, 4, and so on)
Alpha time units (all must be in uppercase):
D for a day
W for a week
WD for day of the week, that is, from day 1 to day 7 (either in the future or in the past but not today). Monday is day 1 and Sunday is day 7.
M for calendar month
Y for year
CM for current month, CY for current year, CW for current week
Math symbols interpretation:
+ (plus), as in CM + 10D, means the Current Month end plus 10 Days (in other words, the tenth of the next month)
- (minus), as in -WD3, means the date of the previous Wednesday (Wednesday being day 3 of the week)
Positional notation (D15 means the 15th day of the month and 15D means 15 days)
Payment Terms for Invoices support full use of DateFormula. All DateFormula results are expressed as a date based on a reference date. The default reference date is the system date, not the Work Date.
Here are some sample DateFormulas and their interpretations (displayed dates are based on the US calendar) with a reference date of July 10, 2015, a Friday:
CM is the last day of the Current Month, 07/31/15
CM + 10D is the tenth of the next month, 08/10/15
WD6 is the next sixth day of the week (Saturday), 07/11/15
WD5 is the next fifth day of the week (Friday; today does not count), 07/17/15
CM - M + D is the end of the current month minus one month plus one day, 07/01/15
CM - 5M is the end of the current month minus five months, 02/28/15
Let us take the opportunity to use the DateFormula data type to learn a few NAV development basics. We will do so by experimenting with some hands-on evaluations of several DateFormula values. We will create a table to calculate dates using DateFormula and Reference Dates. To do this, navigate to Tools | Object Designer | Tables. Then, click on the New button and define the fields shown in the following screenshot. Save it as Table 50009, named Date Formula Test. After we are done with this test, we will keep this table for later testing.
Now, we will add some simple C/AL code to our table so that when we enter or change either the Reference Date or the DateFormula data, we can calculate a new result date. First, access the new table via the Design button.
Then, go to the global variables definition form through the View menu option and the C/AL Globals sub-option, and choose the Functions tab. Type our new function name, CalculateNewDate, on the first blank line, as shown in the following screenshot, and then exit this form (by pressing the Esc key) to return to the list of data fields.
From the Table Designer form that displays the list of data fields, either press F9 or click on the C/AL Code icon. This will take us to the following screen, where we can see all of the field triggers plus the trigger for the new function that we just defined. The table triggers will not be visible unless we scroll up to show them.
Note that our new function was defined as a LOCAL function. This means that it cannot be accessed from another object unless we change it to a GLOBAL function.
Since our goal now is to focus on experimenting with the DateFormula, we will not go into detail explaining the logic of what we are creating. The logic that we're going to code is as follows: when an entry is made (new or changed) in either the "Reference Date for Calculation" field or the "Date Formula to Test" field, invoke the CalculateNewDate function to calculate a new "Date Result" value based on the entered data.
First, we need to create the logic within our new function, CalculateNewDate(), to evaluate and store a Date Result based on the DateFormula and Reference Date that we enter into the table. Copy the C/AL code exactly as shown in the following screenshot, exit, compile, and save the table. If you get an error message of any type when you close and save the table, you probably have not copied the C/AL code exactly as it is shown in the screenshot (it is also shown below for ease of copying).
// OnValidate trigger of both the "Reference Date for Calculation" field and the "Date Formula to Test" field:
CalculateNewDate;

// Body of the LOCAL function CalculateNewDate():
"Date Result" := CALCDATE("Date Formula to Test","Reference Date for Calculation");

This code will cause the CalculateNewDate() function to be called via the OnValidate trigger when an entry is made in either the Reference Date for Calculation field or the Date Formula to Test field. The function will place the result in the Date Result field.
The use of an integer value in the redundantly named Primary Key field allows us to enter any number of records into the table (by manually numbering them 1, 2, 3, and so forth).
Let's experiment with several different date and date formula combinations. We will access the table via the Run button. This will cause NAV to generate a default format page and run it in the Role Tailored Client.
Enter a Primary Key value of 1 (one). In Reference Date for Calculation, enter either an uppercase or lowercase T for Today; the field will show the system date. The same date will appear in the Date Result field because, at this point, no Date Formula has been entered. Now, enter 1D (the number 1 followed by an uppercase or lowercase D; C/SIDE will make it uppercase) in the Date Formula to Test field. We will see that the Date Result field contents change to one day beyond the date in the Reference Date for Calculation field.
Now, for another test entry, start with a 2 in the Primary Key field. Again, enter the letter T (for Today) in the Reference Date for Calculation field, and enter the letter W (for Week) in the Date Formula to Test field. We will get an error message telling us that our formulas should include a number. Make the system happy and enter 1W. We will now see a date in the Date Result field that is one week beyond our system date.
Set the system's Work Date to a date in the middle of a month. Start another line with the number 3 as the Primary Key, followed by a W (for Work Date) in the Reference Date for Calculation field. Enter cm (or CM or cM or Cm, it doesn't matter) in the Date Formula to Test field.
Our result date will be the last day of our Work Date month. Now, enter another line using the Work Date, but enter a formula of -CM (the same as before but with a minus sign). This time, our result date will be the first day of our Work Date month. Note that the DateFormula logic handles month-end dates correctly, including in a leap year. Try starting with a date in the middle of February 2016 to confirm this. The following screen shows the Date Formula Test window:
Now, enter another line with a new Primary Key. Skip over the Reference Date for Calculation field and just enter 1D in the Date Formula to Test field. What happens when we do this? We get an error message stating that "You cannot base a date calculation on an undefined date." In other words, NAV cannot make the requested calculation without a Reference Date. Before we put this function into production, we want our code to check for a Reference Date before calculating. For example, we could default an empty date to the System Date or the Work Date and avoid this particular error.
The preceding and following screenshots show different sample calculations. Build on these and then experiment. We can create a variety of different algebraic date formulae and get some very interesting results. One NAV user has Invoice due dates on the tenth of the next month. Invoices are dated on various days during the month, whenever they are actually printed. By using the DateFormula CM + 10D, the due date is always automatically calculated to be the tenth of the next month.
Don't forget to test with WD (weekday), Q (quarter), and Y (year) as well as D (day), W (week), and M (month). For our code to be language independent, we should enter the date formulae with < > delimiters around them (for example, <1D+1W>). NAV will translate the formula into the correct language codes using the installed language layer.
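To make the DateFormula rules concrete outside of C/SIDE, here is a minimal Python model of how a formula can be evaluated against a reference date. It is an illustration under stated assumptions, not NAV's CALCDATE implementation: it supports only the D, W, M, Y, and WD units plus the CW/CM/CY current-period terms; it treats a negated current-period term (such as -CM) as the start of that period; it assumes a number may be omitted only after an explicit sign (as in CM - M + D); and it clamps month arithmetic to the last day of shorter months. Positional notation such as D15 and the Q unit are not modeled. The names calc_date and calculate_new_date are hypothetical; calculate_new_date mirrors our table's CalculateNewDate function and adds the Reference Date check by falling back to a Work Date.

```python
import calendar
import re
from datetime import date, timedelta
from typing import Optional

# One formula term: optional sign, then C<unit> (current period),
# WD<n> (weekday), or [n]<unit> (a count of days/weeks/months/years).
_TERM = re.compile(r"([+-]?)(?:C([WMY])|WD(\d)|(\d*)([DWMY]))")


def _add_months(d: date, months: int) -> date:
    """Shift by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def _current_period(d: date, unit: str, start: bool) -> date:
    """CW/CM/CY: end of the current period, or its start when negated."""
    if unit == "W":  # weeks run from Monday (day 1) to Sunday (day 7)
        monday = d - timedelta(days=d.weekday())
        return monday if start else monday + timedelta(days=6)
    if unit == "M":
        last_day = calendar.monthrange(d.year, d.month)[1]
        return d.replace(day=1 if start else last_day)
    return date(d.year, 1, 1) if start else date(d.year, 12, 31)


def calc_date(formula: str, reference: Optional[date]) -> date:
    """Evaluate a simplified DateFormula such as 'CM+10D' or '<1D+1W>'."""
    if reference is None:
        raise ValueError("You cannot base a date calculation on an undefined date.")
    # Drop the < > delimiters and spaces; lowercase input is accepted.
    text = formula.strip().strip("<>").replace(" ", "").upper()
    result, pos = reference, 0
    while pos < len(text):
        m = _TERM.match(text, pos)
        if m is None:
            raise ValueError(f"Cannot interpret {text[pos:]!r}")
        sign, c_unit, weekday, number, unit = m.groups()
        negative = sign == "-"
        if c_unit:
            result = _current_period(result, c_unit, start=negative)
        elif weekday:
            target = int(weekday) - 1  # Monday = 0 ... Sunday = 6
            if not 0 <= target <= 6:
                raise ValueError("WD must be followed by a day from 1 to 7")
            if negative:  # previous occurrence, never today
                result -= timedelta(days=(result.weekday() - target) % 7 or 7)
            else:  # next occurrence, never today
                result += timedelta(days=(target - result.weekday()) % 7 or 7)
        else:
            if not number and not sign:
                raise ValueError("The formula should include a number")
            n = int(number or "1") * (-1 if negative else 1)
            if unit == "D":
                result += timedelta(days=n)
            elif unit == "W":
                result += timedelta(weeks=n)
            elif unit == "M":
                result = _add_months(result, n)
            else:  # "Y"
                result = _add_months(result, 12 * n)
        pos = m.end()
    return result


def calculate_new_date(formula: str, reference: Optional[date], work_date: date) -> date:
    """Hypothetical analogue of CalculateNewDate that defaults an empty reference date."""
    return calc_date(formula, reference if reference is not None else work_date)
```

For example, calc_date("CM+10D", date(2015, 7, 10)) returns date(2015, 8, 10), and calc_date("CM-5M", date(2015, 7, 10)) returns date(2015, 2, 28), matching the July 10, 2015 samples listed earlier. Falling back to a Work Date in calculate_new_date is only one of the options the text mentions; defaulting to the system date would work the same way.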
Although our focus for the work we just completed was the DateFormula data type, we've accomplished a lot more than simply learning about that one data type:
We created a new table just for the purpose of experimenting with a C/AL feature that we might use. This is a technique that comes in handy when we are learning a new feature or trying to decide how it works or how we might use it.
We put some critical OnValidate logic in the table. When data is entered in one area, the entry is validated and, if valid, the defined processing is done instantly.
We created a common routine as a new LOCAL function. This function is then called from all the places to which it applies.
We did our entire test with a table object and a default tabular page that is automatically generated when we Run a table. We didn't have to create a supporting structure to do our testing. Of course, when we design a change to a complicated existing structure, we will have a more complicated testing scenario. One of our goals will always be to simplify our testing scenarios, both to minimize the setup effort and to keep our test narrowly focused on the specific issue.
Finally, and most specifically, we saw how NAV tools make a variety of relative date calculations easy. These are very useful in business applications, many aspects of which are date-centered.
References and other data types
The following data types are used for advanced functionality in NAV, sometimes supporting an interface with an external object:
RecordID: This contains the object number and primary key of a table.
RecordRef: This identifies a row in a table, that is, a record. RecordRef can be used to obtain information about the table, the record, the fields in the record, and the currently active filters on the table.
FieldRef: This identifies a field in a table; thus, it allows access to the contents of that field.
KeyRef: This identifies a key in a table and the fields in that key.
Since the specific record, field, and key references are assigned at runtime, RecordRef, FieldRef, and KeyRef are used to support logic that can run on tables not specified at design time. This means that one routine built on these data types can be created to perform a common function for a variety of different tables and table formats.
Variant: This defines variables that are typically used to interface with Automation and OCX objects. Variant variables can contain data of various C/AL data types, to pass them to an Automation or OCX object, as well as external Automation data types that cannot be mapped to C/AL data types.
TableFilter: This is used for variables that can only be used for setting security filters from the Permissions table.
Transaction Type: This has optional values of UpdateNoLocks, Update, Snapshot, Browse, and Report that define SQL Server behavior for a NAV Report or XMLport transaction from the beginning of the transaction.
BLOB: This can contain either specially formatted text, a graphic in the form of a bitmap, or other developer-defined binary data up to 2 GB in size. BLOB stands for Binary Large Object. BLOBs can only be included in tables; they cannot be used to define working storage variables. Refer to Developer and IT Pro Help for additional information.
BigText: This can contain large chunks of text up to 2 GB in size. BigText variables can only be defined in working storage within an object; they cannot be included in tables. BigText variables cannot be directly displayed or seen in the debugger. There is a group of special functions that can be used to handle BigText data. Refer to Developer and IT Pro Help for additional information. To handle text strings longer than 250 characters in a single data element, use a combination of BLOB and BigText variables.
GUID: This is used to assign a unique identifying number to any database object.
GUID stands for Globally Unique Identifier, a 16-byte binary data type that is used for the unique global identification of records, objects, and so on. A GUID is generated by an algorithm developed by Microsoft.
TestPage: This is used to store a test page, which is a logical representation of a page that does not display a user interface. Test pages are used when we do NAV application testing with the automated testing facility that is part of NAV.
Data type usage
About forty percent of the data types can be used to define data either stored in tables or in working storage data definitions (that is, in a Global or Local data definition within an object). Two data types, BLOB and TableFilter, can only be used to define table-stored data, not working storage data. About sixty percent of the data types can only be used for working storage data definitions. The following list shows which data types can be used for table (persisted) data fields and which ones can be used for working storage (variable) data:
FieldClass property options
Almost all data fields have a FieldClass property. FieldClass has as much effect on the content and usage of a data field as the data type; in some instances, it has more. Now we'll discuss the FieldClass property options.
FieldClass – Normal
When the FieldClass is Normal, the field will contain the type of application data that's typically stored in a table: the contents we would expect based on the data type and various properties.
FieldClass – FlowField
FlowFields must be dynamically calculated. FlowFields are virtual fields stored as metadata; they do not contain data in the conventional sense. A FlowField contains the definition of how to calculate (at runtime) the data that the field represents and a place to store the result of that calculation. Generally, the Editable property for a FlowField is set to No. Depending on the CalcFormula method, the result could be a value, a reference lookup, or a Boolean.
When the CalcFormula method is Sum, the FlowField connects a data field to a previously defined SumIndexField in the table defined in the CalcFormula. The FlowField processing speed will be significantly affected by the key configuration of the table being processed. While we must be careful not to define extra keys, having the right keys defined will have a major effect on system performance and thus on user satisfaction.
A FlowField value is always 0, blank, or false unless it has been calculated. If a FlowField is displayed directly on a page, it is calculated automatically when the page is rendered. FlowFields are also automatically calculated when they are the subject of predefined filters as part of the properties of a data item in an object. In all other cases, a FlowField must be forced to calculate using the C/AL RecordName.CALCFIELDS(FlowField1, [FlowField2],...) function or by the use of the SETAUTOCALCFIELDS function. This is also true if the underlying data is changed after the initial display of a page (that is, the FlowField must be recalculated to take the data change into account).
Because a FlowField does not contain actual data, it cannot be used as a field in a key. In addition, we cannot define a FlowField that is based on another FlowField, except in special circumstances.
When a field has its FieldClass set to FlowField, another directly associated property becomes available: CalcFormula. (Conversely, the AltSearchField, AutoIncrement, and TestTableRelation properties disappear from view when FieldClass is set to FlowField.) The CalcFormula property is where we define the formula for calculating the FlowField. On the CalcFormula property line, there is an ellipsis button.
Clicking on that button will bring up the following screen:
Click on the drop-down button to show the seven FlowField methods. The seven FlowField methods are described in the following table:
FlowField method | Field data type | Calculated value, as it applies to the specified set of data within a specific column (field) in a table
Sum | Decimal | The sum total
Average | Decimal | The average value (the sum divided by the row count)
Exist | Boolean | Yes or No / True or False: does an entry exist?
Count | Integer | The number of entries that exist
Min | Any | The smallest value of any entry
Max | Any | The largest value of any entry
Lookup | Any | The value of the specified entry
The Reverse Sign control allows us to change the displayed sign of the result for the FlowField types Sum and Average only; the underlying data is not changed. If Reverse Sign is used with the FlowField type Exist, it changes the effective function to Does Not Exist.
Table and Field allow us to define the table, and the field within that table, to which our calculation formula will apply. When we make the entries in our Calculation Formula screen, the compiler does no validation checking to confirm that we have chosen an eligible table and field combination. This checking doesn't occur until runtime. Therefore, when we create a new FlowField, we should test it as soon as we have defined it.
The last, but by no means least significant, component of the FlowField calculation formula is the Table Filter. When we click on the ellipsis in the table filter field, the window shown in the following screenshot will appear:
When we click on the Field column, we will be invited to select a field from the table that was entered into the Table field earlier. The Type field choice will determine the type of filter.
The Value field will hold the filter rules defined on this line, which must be consistent with the Type choices described in the following table:
Filter type | Value | Filtering action | OnlyMaxLimit | ValuesFilter
Const | A constant, defined in the Value field | Uses the constant to filter for equally valued entries | |
Filter | A filter, spelled out as a literal in the Value field | Applies the filter expression from the Value field | |
Field | A field from the table within which the FlowField exists | Uses the contents of the specified field to filter for equally valued entries | False | False
Field | A field from the table within which the FlowField exists | If the specified field is a FlowFilter and OnlyMaxLimit is True, the FlowFilter range is applied on the basis of only having a MaxLimit, that is, having no bottom limit. This is useful for the date filters for Balance Sheet data (refer to the Balance at Date field in the G/L Account table for an example) | True | False
Field | A field from the table within which the FlowField exists | Causes the contents of the specified field to be interpreted as a filter (see the Balance at Date field in the G/L Account table for an example) | True or False | True
FieldClass – FlowFilter
FlowFilters control the calculation of FlowFields in the table (when the FlowFilters are included in the CalcFormula). FlowFilters do not contain permanent data; instead, they contain filters on a per-user basis, with the information stored in that user's instance of the code that is being executed. A FlowFilter field allows a filter to be entered by the user at a parent record level (for example, G/L Account) and applied (through FlowField formulas, for example) to constrain which child data (for example, G/L Entry records) is selected.
A FlowFilter allows us to provide flexible data selection functions to users. The user does not need a full understanding of the data structure to apply filtering in intuitive ways to both the primary data table and the subordinate data.
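The FlowField methods and filter types above can be illustrated with a small Python model. This is a sketch under assumptions, not how NAV computes FlowFields (NAV evaluates them against the database, using keys and SumIndexFields): GLEntry, calc_flowfield, account_no, and date_filter are hypothetical names standing in for a G/L Account FlowField whose CalcFormula filters G/L Entry by a Field filter on the account number and a Field filter on the Date Filter FlowFilter. Setting only_max_limit=True drops the lower bound of the date range, in the spirit of the Balance at Date field, and reverse_sign follows the Reverse Sign rules described earlier. Lookup is not modeled, and returning 0 for Min, Max, and Average over no entries is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple, Union

# (low, high) bounds of a date FlowFilter; None means "no limit".
DateRange = Tuple[Optional[date], Optional[date]]


@dataclass
class GLEntry:  # hypothetical stand-in for a G/L Entry row
    account_no: str
    posting_date: date
    amount: float


def _in_range(d: date, rng: DateRange, only_max_limit: bool) -> bool:
    low, high = rng
    if only_max_limit:
        low = None  # OnlyMaxLimit: apply only the upper bound of the filter
    return (low is None or d >= low) and (high is None or d <= high)


def calc_flowfield(
    method: str,
    entries: List[GLEntry],
    account_no: str,
    date_filter: DateRange = (None, None),
    only_max_limit: bool = False,
    reverse_sign: bool = False,
) -> Union[float, int, bool]:
    """Evaluate Sum/Average/Count/Exist/Min/Max over the filtered entries."""
    rows = [
        e for e in entries
        if e.account_no == account_no  # Field filter: account = parent's No.
        and _in_range(e.posting_date, date_filter, only_max_limit)
    ]
    amounts = [e.amount for e in rows]
    if method == "Sum":
        result = sum(amounts)
    elif method == "Average":
        result = sum(amounts) / len(amounts) if amounts else 0
    elif method == "Count":
        result = len(rows)
    elif method == "Exist":
        result = bool(rows)
    elif method == "Min":
        result = min(amounts) if amounts else 0
    elif method == "Max":
        result = max(amounts) if amounts else 0
    else:
        raise ValueError(f"Unsupported method: {method}")
    if reverse_sign:
        if method in ("Sum", "Average"):
            result = -result  # only the reported sign changes, not the data
        elif method == "Exist":
            result = not result  # effectively "does not exist"
    return result
```

With entries for account 1000 on June 15 (100), July 1 (50), and July 20 (-30), a July date filter gives a Sum of 20 (a net change for the month), while the same filter with only_max_limit=True gives 120 (a balance at July 31). Keeping the filtering separate from the aggregation mirrors how one Table Filter definition can serve any of the calculation methods.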
Based on our C/AL code design, FlowFilters can be used to apply filtering on multiple tables that are subordinate to a parent table. Of course, it is our responsibility as developers to make good use of this tool. As with many C/AL capabilities, a good way to learn more is by studying the standard code designed by the Microsoft developers of NAV and then experimenting.
A number of good examples of FlowFilter usage can be found in the Customer (Table 18) and Item (Table 27) tables. In the Customer table, some of the FlowFields using FlowFilters are Balance, Balance (LCY), Net Change, Net Change (LCY), Sales (LCY), and Profit (LCY), where LCY stands for local currency. The FlowFilter usage of the Sales (LCY) FlowField is shown in the following screenshot:
Similarly constructed FlowFields using FlowFilters in the Item table include Inventory, Net Invoiced Qty., Net Change, and Purchases (Qty.), as well as other fields.
Throughout the standard code, most of the master table definitions include FlowFilters, such as the Date Filter and the Global Dimension Filters (global dimensions are user-defined codes that facilitate the segregation of accounting data by groupings such as divisions, departments, projects, customer type, and so on). Other FlowFilters widely used in the standard code relate to inventory activity, such as Location Filter, Lot No. Filter, Serial No. Filter, and Bin Filter.
The following pair of images shows two fields from the Customer table, both with a Data Type of Date. On the left side of the screenshot is the Last Date Modified field (FieldClass of Normal), and on the right side is the Date Filter field (FieldClass of FlowFilter). The properties of the two fields are very similar, except for those that differ because one is a Normal field and the other is a FlowFilter field.
Summary
In this article, we focused on the basic building blocks of the NAV data structure: fields and their attributes.
We reviewed the types of data fields, properties, and trigger elements for each type of field. We walked through a number of examples to illustrate most of these elements, though we postponed the exploration of triggers until later, when we have more knowledge of C/AL. We covered Data Type and FieldClass, the properties that determine what kind of data can be stored in a field.

Packt
10 Aug 2015
17 min read

The Splunk Interface

In this article by Vincent Bumgarner and James D. Miller, authors of the book Implementing Splunk - Second Edition, we will walk through the most common elements in the Splunk interface and touch upon concepts that are covered in greater detail later. You may want to dive right into the search section, but an overview of the user interface elements might save you some frustration later. We will cover the following topics:
Logging in and app selection
A detailed explanation of the search interface widgets
A quick overview of the admin interface
Logging into Splunk
The Splunk GUI (Splunk is also accessible through its command-line interface [CLI] and REST API) is web-based, which means that no client needs to be installed. Newer browsers with fast JavaScript engines, such as Chrome, Firefox, and Safari, work better with the interface. As of Splunk Version 6.2.0, no browser extensions are required. Splunk Versions 4.2 and earlier require Flash to render graphs. Flash can still be used by older browsers, or for older apps that reference Flash explicitly.
The default port for a Splunk installation is 8000. The address will look like http://mysplunkserver:8000 or http://mysplunkserver.mycompany.com:8000.
If you have installed Splunk on your local machine, the address can be some variant of http://localhost:8000, http://127.0.0.1:8000, http://machinename:8000, or http://machinename.local:8000.
Once you determine the address, the first page you will see is the login screen. The default username is admin with the password changeme. The first time you log in, you will be prompted to change the password for the admin user. It is a good idea to change this password to prevent unwanted changes to your deployment.
By default, accounts are configured and stored within Splunk. Authentication can be configured to use another system, for instance, Lightweight Directory Access Protocol (LDAP).
By default, Splunk authenticates locally. If LDAP is set up, the order is as follows: LDAP, then local.
The home app
After logging in, the default app is the Launcher app (some may refer to this as Home). This app is a launching pad for apps and tutorials.
In earlier versions of Splunk, the Welcome tab provided two important shortcuts: Add data and the Launch search app. In version 6.2.0, the Home app is divided into distinct areas, or panes, that provide easy access to Explore Splunk Enterprise (Add Data, Splunk Apps, Splunk Docs, and Splunk Answers) as well as Apps (the App management page), Search & Reporting (the link to the Search app), and an area where you can set your default dashboard (choose a home dashboard).
The Explore Splunk Enterprise pane shows links to:
Add data: This links to the Add Data page in Splunk. This interface is a great start for getting local data flowing into Splunk (making it available to Splunk users). The Preview data interface takes an enormous amount of complexity out of configuring dates and line breaking.
Splunk Apps: This allows you to find and install more apps from the Splunk Apps Marketplace (http://apps.splunk.com). This marketplace is a useful resource where Splunk users and employees post Splunk apps, mostly free but some premium ones as well.
Splunk Answers: This is one of your links to the wide range of Splunk documentation available, specifically http://answers.splunk.com, where you can engage with the Splunk community on Splunkbase (https://splunkbase.splunk.com/) and learn how to get the most out of your Splunk deployment.
The Apps section shows the apps that have GUI elements on your instance of Splunk. App is an overloaded term in Splunk. An app doesn't necessarily have a GUI at all; it is simply a collection of configurations wrapped into a directory structure that means something to Splunk.
Search & Reporting is the link to the Splunk Search & Reporting app.
Beneath the Search & Reporting link, Splunk provides an outline that, when you hover over it, displays a Find More Apps balloon tip. Clicking on the link opens the same Browse more apps page as the Splunk Apps link mentioned earlier.
Choose a home dashboard provides an intuitive way to select an existing (simple XML) dashboard and set it as part of your Splunk Welcome or Home page. This gives you a familiar starting point each time you enter Splunk. The following image displays the Choose Default Dashboard dialog:
Once you select an existing dashboard from the dropdown list, it will be part of your welcome screen every time you log into Splunk, until you change it. Apart from those in the Search & Reporting app, no dashboards are installed by default when you install Splunk. Once you have created additional dashboards, they can be selected as the default.
The top bar
The bar across the top of the window contains information about where you are, as well as quick links to preferences, other apps, and administration.
The current app is specified in the upper-left corner. The following image shows the upper-left Splunk bar when using the Search & Reporting app:
Clicking on the text takes you to the default page for that app. In most apps, the text next to the logo is simply changed, but the whole block can be customized with logos and alternate text by modifying the app's CSS.
The upper-right corner of the window, as seen in the previous image, contains action links that are almost always available:
The name of the user who is currently logged in appears first. In this case, the user is Administrator. Clicking on the username allows you to select Edit Account (which will take you to the Your account page) or Logout (of Splunk). Logout ends the session and forces the user to log in again.
The following screenshot shows what the Your account page looks like:
This form presents the global preferences that a user is allowed to change.
Other settings that affect users are configured through permissions on objects and settings on roles. (Note: preferences can also be configured using the CLI or by modifying specific Splunk configuration files.)
Full name and Email address are stored for the administrator's convenience.
Time zone can be changed for the logged-in user. This is a new feature in Splunk 4.3. Setting the time zone only affects the time zone used to display the data. It is very important that the date is parsed properly when events are indexed.
Default app controls the starting page after login. Most users will want to change this to search.
Restart backgrounded jobs controls whether unfinished queries should run again if Splunk is restarted.
Set password allows you to change your password. This is only relevant if Splunk is configured to use internal authentication. For instance, if the system is configured to use Windows Active Directory via LDAP (a very common configuration), users must change their password in Windows.
Messages allows you to view any system-level error messages you may have pending. When there is a new message for you to review, a notification displays as a count next to the Messages menu. You can click the X to remove a message.
The Settings link presents the user with the configuration pages for all Splunk Knowledge objects, Distributed Environment settings, System and Licensing, Data, and Users and Authentication settings. If you do not see some of these options, you do not have the permissions to view or edit them.
The Activity menu lists shortcuts to the Splunk Jobs, Triggered Alerts, and System Activity views. You can click Jobs (to open the search jobs manager window, where you can view and manage currently running searches), Triggered Alerts (to view scheduled alerts that have been triggered), or System Activity (to see dashboards about user activity and the status of the system).
Help lists links to video Tutorials, Splunk Answers, the Splunk Contact Support portal, and online Documentation.
Find can be used to search for objects within your Splunk Enterprise instance. For example, if you type in error, it returns the saved objects that contain the term error. These saved objects include Reports, Dashboards, Alerts, and so on. You can also search for error in the Search & Reporting app by clicking Open error in search.
The search & reporting app
The Search & Reporting app (or just the search app) is where most actions in Splunk start. This app is a dashboard where you will begin your searching.
The summary view
Within the Search & Reporting app, the user is presented with the Summary view, which contains information about the data that the user searches by default. This is an important distinction: in a mature Splunk installation, not all users will search all data by default. If this is your first trip into Search & Reporting, you'll see the following:
From the screen depicted in the previous screenshot, you can access the Splunk documentation related to What to Search and How to Search. Once you have at least some data indexed, Splunk will provide some statistics on the available data under What to Search (remember that this reflects only the indexes that this particular user searches by default; there are other events that are indexed by Splunk, including events that Splunk indexes about itself). This is seen in the following image:
In previous versions of Splunk, panels such as the All indexed data panel provided statistics for a user's indexed data. Other panels gave a breakdown of data using three important pieces of metadata: Source, Sourcetype, and Hosts. In the current version, 6.2.0, you access this information by clicking on the button labeled Data Summary, which presents the following to the user:
This dialog splits the information into three tabs: Hosts, Sources, and Sourcetypes.
A host is a captured hostname for an event. In the majority of cases, the host field is set to the name of the machine where the data originated. There are cases where this is not known, so the host can also be configured arbitrarily.
A source in Splunk is a unique path or name. In a large installation, there may be thousands of machines submitting data, but all data on the same path across these machines counts as one source. When the data source is not a file, the value of the source can be arbitrary, for instance, the name of a script or network port.
A source type is an arbitrary categorization of events. There may be many sources across many hosts in the same source type. For instance, given the sources /var/log/access.2012-03-01.log and /var/log/access.2012-03-02.log on the hosts fred and wilma, you could reference all these logs with source type access or any other name that you like.
Let's move on now and discuss each of the Splunk widgets (just below the app name). The first widget is the navigation bar. As a general rule, within Splunk, items with downward triangles are menus. Items without a downward triangle are links.
Next we find the Search bar. This is where the magic starts. We'll go into great detail shortly.
Search
Okay, we've finally made it to search. This is where the real power of Splunk lies.
For our first search, we will search for the word error (keyword searches are not case-sensitive). Click in the search bar, type the word error, and then either press Enter or click on the magnifying glass to the right of the bar.
Upon initiating the search, we are taken to the search results page. Note that the search we just executed was across All time (by default); to change the search time range, you can use the Splunk time picker.
Actions
Let's inspect the elements on this page. Below the Search bar, we have the event count, action icons, and menus. Starting from the left, we have the following:
The number of events matched by the base search.
Technically, this may not be the number of results pulled from disk, depending on your search. Also, if your query uses commands, this number may not match what is shown in the event listing.
Job: This opens the Search job inspector window, which provides very detailed information about the query that was run.
Pause: This causes the current search to stop locating events but keeps the job open. This is useful if you want to inspect the current results to determine whether you want to continue a long-running search.
Stop: This stops the execution of the current search but keeps the results generated so far. This is useful when you have found enough and want to inspect or share the results found so far.
Share: This shares the search job. This option extends the job's lifetime to seven days and sets the read permissions to everyone.
Export: This exports the results. Select this option to output to CSV, raw events, XML, or JavaScript Object Notation (JSON) and specify the number of results to export.
Print: This formats the page for printing and instructs the browser to print.
Smart Mode: This controls the search experience. You can set it to speed up searches by cutting down on the event data it returns and by reducing the number of fields that Splunk extracts from the data by default (Fast mode). Alternatively, you can set it to return as much event information as possible (Verbose mode). In Smart mode (the default setting), it toggles search behavior based on the type of search you're running.
Timeline
Now we'll skip to the timeline below the action icons. Along with providing a quick overview of the event distribution over a period of time, the timeline is also a very useful tool for selecting sections of time. Placing the pointer over the timeline displays a pop-up for the number of events in that slice of time. Clicking on the timeline selects the events for a particular slice of time. Clicking and dragging selects a range of time.
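Conceptually, the timeline counts matching events per time slice, and selecting a range narrows the search to that window. The Python sketch below illustrates the idea with hypothetical timestamps; it is not how Splunk builds its timeline, and bucket and zoom are names made up for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

def bucket(timestamps, start, width):
    """Count events per time slice of length `width`, keyed by each slice's start time."""
    counts = Counter()
    for ts in timestamps:
        index = (ts - start) // width  # timedelta // timedelta gives an int
        counts[start + index * width] += 1
    return counts

def zoom(timestamps, lo, hi):
    """Keep only the events inside [lo, hi), like rerunning the search over a selection."""
    return [ts for ts in timestamps if lo <= ts < hi]

start = datetime(2015, 8, 11)
events = [start + timedelta(minutes=m) for m in (1, 5, 61, 62, 63, 130)]

hourly = bucket(events, start, timedelta(hours=1))
for slice_start, count in sorted(hourly.items()):
    print(slice_start.strftime("%H:%M"), count)  # prints 00:00 2, then 01:00 3, then 02:00 1

# Drill down: keep only the busiest hour.
selected = zoom(events, start + timedelta(hours=1), start + timedelta(hours=2))
print(len(selected))  # 3
```

Calling bucket again on the selected events with a smaller width mirrors the repeated drill-down described next.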
Once you have selected a period of time, clicking on Zoom to selection changes the time frame and reruns the search for that specific slice of time. Repeating this process is an effective way to drill down to specific events. Deselect shows all events for the time range selected in the time picker. Zoom out changes the window of time to a larger period around the events in the current time frame.
The field picker
To the left of the search results, we find the field picker. This is a great tool for discovering patterns and filtering search results.
Fields
The field list contains two lists:
Selected Fields, which have their values displayed under the search event in the search results
Interesting Fields, which are other fields that Splunk has picked out for you
Above the field list are two links: Hide Fields and All Fields.
Hide Fields: Hides the field list area from view.
All Fields: Takes you to the Selected Fields window.
Search results
We are almost through with all the widgets on the page. We still have a number of items to cover in the search results section though, just to be thorough. As you can see in the previous screenshot, at the top of this section, we have the number of events displayed. When viewing all results in their raw form, this number will match the number above the timeline. This value can be changed either by making a selection on the timeline or by using other search commands.
Next, we have the action icons (described earlier) that affect these particular results.
Under the action icons, we have four results tabs:
Events list, which shows the raw events. This is the default view when running a simple search, as we have done so far.
Patterns, which streamlines event pattern detection. It displays a list of the most common patterns among the set of events returned by your search. Each of these patterns represents a group of events that share a similar structure.
Statistics, which populates when you run a search with transforming commands such as stats, top, chart, and so on. The previous keyword search for error does not display any results in this tab because it does not have any transforming commands.
Visualization, which is also populated by transforming searches. The results area of the Visualization tab includes a chart and the statistics table used to generate the chart. Not all searches are eligible for visualization.
Under the tabs just described is the timeline.
Options
Beneath the timeline (starting at the left) is a row of option links that include:
Show Fields: shows the Selected Fields screen
List: allows you to select an output option (Raw, List, or Table) for displaying the search results
Format: provides the ability to set result display options, such as Show row numbers, Wrap results, the Max lines (to display), and Drilldown (on or off)
NN Per Page: is where you can indicate the number of results to show per page (10, 20, or 50)
To the right are options that you can use to choose a page of results and to change the number of events per page. In prior versions of Splunk, these options were available from the Results display options popup dialog.
The events viewer
Finally, we make it to the actual events. Let's examine a single event. Starting at the left, we have:
Event Details: Clicking here (indicated by the right-facing arrow) opens the selected event, providing specific information about the event by type, field, and value, and allows you to perform specific actions on a particular event field. In addition, Splunk version 6.2.0 offers a button labeled Event Actions to access workflow actions, a few of which are always available:
Build Eventtype: Event types are a way to name events that match a certain query.
Extract Fields: This launches an interface for creating custom field extractions.
Show Source: This pops up a window with a simulated view of the original source.
The event number: Raw search results are always returned with the most recent first.
Next to appear are any workflow actions that have been configured. Workflow actions let you create new searches or links to other sites, using data from an event.
Next comes the parsed date from this event, displayed in the time zone selected by the user. This is an important and often confusing distinction. In most installations, everything is in one time zone: the servers, the user, and the events. When one of these three things is not in the same time zone as the others, things can get confusing.
Next, we see the raw event itself. This is what Splunk saw as an event. With no help, Splunk can do a good job finding the date and breaking lines appropriately, but as we will see later, with a little help, event parsing can be more reliable and more efficient.
Below the event are the fields that were selected in the field picker. Clicking on the value adds the field value to the search.
Summary
As you have seen, the Splunk GUI provides a rich interface for working with search results. We have really only scratched the surface and will cover more elements.
Resources for Article:
Further resources on this subject:
The Splunk Web Framework [Article]
Loading data, creating an app, and adding dashboards and reports in Splunk [Article]
Working with Apps in Splunk [Article]
Packt
10 Aug 2015
20 min read

Updating and building our masters

In this article by John Henry Krahenbuhl, the author of the book Axure Prototyping Blueprints, we determine that, with modification, we can use all of the masters from the previous community site. To support our new use cases, we need additional registration variables, a master to support user registration, and interactions for creating posts and commenting on them. Next, we will create global variables and add new masters, as well as enhance the design and interactions for each master.
Creating additional global variables
Based on project requirements, we identified that nine global variables will be required. To create global variables, on the main menu, click on Project and then click on Global Variables…. In the Global Variables dialog, perform the following steps:
1. Click the green + sign and type Email. Click on the Default Value field and type songwriter@test.com.
2. Repeat step 1 eight more times to create additional variables using the following table for the Variable Name and Default Value fields:
Variable Name | Default Value
Password | Grammy
UserEmail | (none)
UserPassword | (none)
LoggedIn | No
TopicIndex | 0
UserText | (none)
NewPostTopic | (none)
NewPostHeadline | (none)
3. Click on OK.
With our global variables created, we are now ready to create new masters, as well as update the design and interactions for existing masters. We will start by adding masters to the Masters pane.
Adding masters to the Masters pane
We will add a total of two masters to the Masters pane. To create our masters, perform the following steps:
In the Masters pane, click on the Add Master icon, type PostCommentary, and press Enter.
Again, in the Masters pane, click on the Add Master icon, type NewPost, and press Enter.
In the same Masters pane, right-click on the icon next to the Header master, mouse over Drop Behavior, and click on Lock to Master Location.
We are now ready to remodel the existing masters and complete the design and interactions for our new masters.
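Later steps read these global variables through double square brackets, for example Welcome, [[Email]]. The Python sketch below is only a mental model of that substitution, not how Axure evaluates expressions internally: GLOBALS and render are hypothetical names, and the values come from the variable table above.

```python
import re

# The nine global variables and their default values from the table above.
# Empty strings stand in for variables that have no default value.
GLOBALS = {
    "Email": "songwriter@test.com",
    "Password": "Grammy",
    "UserEmail": "",
    "UserPassword": "",
    "LoggedIn": "No",
    "TopicIndex": "0",
    "UserText": "",
    "NewPostTopic": "",
    "NewPostHeadline": "",
}

# Matches a [[Name]] placeholder and captures Name.
PLACEHOLDER = re.compile(r"\[\[(\w+)\]\]")

def render(template: str, variables: dict) -> str:
    """Replace each [[Name]] with the variable's current value.

    Simplified model: Axure expressions can do more than name a variable.
    Unknown names are left untouched in this sketch.
    """
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), template)

print(render("Welcome, [[Email]]", GLOBALS))  # Welcome, songwriter@test.com
```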
We will start with the Header master.
Enhancing our Header master
Once completed, the Header master will look as follows:
To update the Header master, we will add an ErrorMessage label, delete the Search widgets, and update the menu items. To update widgets on the Header master, perform the following steps:
In the Masters pane, double-click on the icon next to the Header master to open it in the design area.
In the Widgets pane, drag the Label widget and place it at coordinates (730,0). With the Label widget selected, type Your email or password is incorrect.
In the Widget Interactions and Notes pane, click in the Shape Name field and type ErrorMessage.
In the Widget Properties and Style pane, with the Style tab selected, scroll to Font and perform the following steps:
Change the font size to 8.
Click on the down arrow next to the Text Color icon. In the drop-down menu, in the # text field, enter FF0000.
In the toolbar, click on the checkbox next to Hidden.
Click on the EmailTextField at coordinates (730,10). If text is displayed on the text field, right-click and click Edit Text. All text on the widget will be highlighted; press Delete.
In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps:
Next to Hint Text, enter Email.
Click Hint Style.
In the Set Interaction Styles dialog box, click on the checkbox next to Font Color.
Click on the down arrow next to the Text Color icon. In the drop-down menu, in the # text field, enter 999999.
Click on OK.
Click on the PasswordTextField at coordinates (815,10). If text is displayed on the text field, right-click and click on Edit Text. All text on the widget will be highlighted; press Delete.
In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps:
Click on the drop-down menu next to Type and select Password.
Next to Hint Text, enter Password.
Click on Hint Style.
In the Set Interaction Styles dialog box, click on the checkbox next to Font Color.
Click on the down arrow next to the Text Color icon. In the drop-down menu, in the # text field, enter 999999.
Click on OK.
Click on the SearchTextField at coordinates (730,82) and press Delete.
Click on the SearchButton at coordinates (890,80) and press Delete.
Next, we will convert all the Log In widgets into a dynamic panel named LogInDP. The LogInDP will allow us to transition between states and show different content when a user logs in. To create the LogInDP, select the following widgets in our header:
Named Widget | Coordinates
ErrorMessage | (730,0)
EmailTextField | (730,10)
PasswordTextField | (815,10)
LogInButton | (894,10)
NewUserLink | (730,30)
ForgotLink | (815,30)
With the preceding six widgets selected, right-click and click Convert to Dynamic Panel. In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type LogInDP. All the Log In widgets are now on State1 of the LogInDP.
With the Log In widgets converted into the LogInDP, we will now add and design State2. In the Widget Manager pane, under the LogInDP, right-click on State1, and in the menu, click on Add State. Click on the State icon beside State2 twice to open it in the design area. Perform the following steps:
In the Widgets pane, drag the Label widget and place it at coordinates (0,13), and then do the following:
Type Welcome, email@test.com.
In the Widget Interactions and Notes pane, click in the Shape Name field and type WelcomeLabel.
In the Widget Properties and Style pane, with the Style tab selected, scroll to Font, change the font size to 9, and click on the Italic icon.
In the Widgets pane, drag the Button Shape widget and place it at coordinates (164,10).
Type Log Out.
In the toolbar, change w: to 56 and h: to 16.
In the Widget Interactions and Notes pane, click on the Shape Name field and type LogOutButton.
To complete the design of the Header master, we need to rename the menu items on the HzMenu. In the Masters pane, double-click on the Header master to open it in the design area. Click on the HzMenu at coordinates (250,80). Perform the following steps:
Click on the first menu item and type Random Musings. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type RandomMusingsMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Random Musings.
Click on the second menu item and type Accolades and News. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AccoladesMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on Accolades and News.
Click on the third menu item and type About. In the Widget Interactions and Notes pane, click on the Menu Item Name field and type AboutMenuItem. Click on Case 1 under the OnClick event and press the Delete key. Click on Create Link…. In the pop-up sitemap, click on About.
We will now create a registration lightbox that will be shown when the user clicks on the NewUserLink. To display a dynamic panel in a lightbox, we will use the Show action with the treat as lightbox option set. We will use the Registration dynamic panel's Pin to Browser property to have the dynamic panel shown in the center and middle of the window. Learn more at http://www.axure.com/learn/dynamic-panels/basic/lightbox-tutorial.
In the Masters pane, double-click on the icon next to the Header master to open it in the design area.
In the Widgets pane, drag the Dynamic Panel widget and place it at coordinates (310,200).
In the toolbar, change w: to 250 and h: to 250, and click on the Hidden checkbox.
In the Widget Interactions and Notes pane, click on the Dynamic Panel Name field and type RegistrationLightBoxDP.
In the Widget Manager pane with the Properties tab selected, click on Pin to Browser.
In the Pin to Browser dialog box, click on the checkbox next to Pin to browser window and click on OK.
In the Widget Manager pane, under the RegistrationLightBoxDP, click on the State icon beside State1 twice to open it in the design area.
In the Widgets pane, drag the Rectangle widget and place it at coordinates (0,0).
In the Widget Interactions and Notes pane, click on the Shape Name field and type BackgroundRectangle.
In the toolbar, change w: to 250 and h: to 250.
Again, in the Widgets pane, drag the Heading2 widget and place it at coordinates (25,20).
With the Heading2 widget selected, type Registration.
In the toolbar, change w: to 141 and h: to 28.
In the Widget Interactions and Notes pane, click on the Shape Name field and type RegistrationHeading.
Repeat steps 8-10 to complete the design of the RegistrationLightBoxDP using the following table (* if applicable):
Widget | Coordinates | Text* (shown on widget) | Width* (w:) | Height* (h:) | Name field (in the Widget Interactions and Notes pane)
Label | (25,67) | Enter Email | | | EnterEmailLabel
Text Field | (25,86) | | | | EnterEmailField
Label | (25,121) | Enter Password | | | EnterPasswordLabel
Text Field | (25,140) | | | | EnterPasswordField
Button Shape | (25,190) | Submit | 200 | 30 | SubmitButton
Click on the EnterEmailField text field at coordinates (25,86). In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps:
Next to Hint Text, enter Email.
Click on Hint Style.
In the Set Interaction Styles dialog box, click on the checkbox next to Font Color.
Click on the down arrow next to the Text Color icon. In the drop-down menu, in the # text field, enter 999999.
Click on OK.
Click on the EnterPasswordField text field at coordinates (25,140).
In the Widget Properties and Style pane, with the Properties tab selected, scroll to Text Field and perform the following steps:
Click on the drop-down menu next to Type and select Password.
Next to Hint Text, enter Password.
Click on Hint Style.
In the Set Interaction Styles dialog box, click on the checkbox next to Font Color.
Click on the down arrow next to the Text Color icon. In the drop-down menu, in the # text field, enter 999999.
Click on OK.
With the updates completed for the Header master, we are now ready to define the interactions.
Refining the interactions for our Header master
We will need to add interactions for Log In and Registration on our Header master. Interactions with our Header master will be triggered by the following named widgets and events:
Dynamic Panel | State | Widget | Event
LogInDP | State1 | LogInButton | OnClick
LogInDP | State1 | NewUserLink | OnClick
LogInDP | State1 | ForgotLink | OnClick
LogInDP | State2 | LogOutButton | OnClick
RegistrationLightBoxDP | State1 | SubmitButton | OnClick
We will now define the interactions for each widget, starting with the LogInButton.
Defining interactions for the LogInButton
When the LogInButton is clicked, the OnClick event will evaluate whether the text entered in the EmailTextField and PasswordTextField equals the Email and Password variable values. If both match, LogInDP will be set to State2, the text on the WelcomeLabel will be updated, and the LoggedIn variable will be set. If they do not match, we will show an error message. We will define these actions by creating two cases: ValidateUser and ShowErrorMessage.
Validating the user's email and password
To define the ValidateUser case for the OnClick interaction, open the LogInDP State1 in the design area. Click on the LogInButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ValidateUser.
In the Case Editor dialog, perform the following steps:
You will see a Condition Builder window similar to the one shown in the following screenshot after the first and second conditions are defined:
Create the first condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps:
In the first dropdown, select text on widget.
In the second dropdown, select EmailTextField.
In the third dropdown, select equals.
In the fourth dropdown, select value.
In the fifth dropdown, select [[Email]].
Click the green + sign.
Create the second condition. Click on the Add Condition button. In the Condition Builder dialog box, in the outlined condition box, perform the following steps:
In the first dropdown, select text on widget.
In the second dropdown, select PasswordTextField.
In the third dropdown, select equals.
In the fourth dropdown, select value.
In the fifth dropdown, select [[Password]].
Click on OK.
Once the following three actions are defined, you should see a Case Editor similar to the one shown in the following screenshot:
Create the first action. To set the panel state for the LogInDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LogInDP.
Next to Select the state, click on the dropdown and select State2.
Create the second action. To set text for the WelcomeLabel, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click the checkbox next to WelcomeLabel.
Under Set text to, click on the dropdown and select value.
In the text field, enter Welcome, [[Email]].
Create the third action. To set the value of the LoggedIn variable, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value.
In the text field, enter [[Email]].
Click on OK.
With the ValidateUser case completed, next we will create the ShowErrorMessage case.
Creating the ShowErrorMessage case
To create the ShowErrorMessage case, in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowErrorMessage. Create the action. To show the ErrorMessage label, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, under the LogInDP dynamic panel, click on the checkbox next to ErrorMessage.
Click on OK.
Next, we will enable the interaction for the NewUserLink.
Enabling interaction for the NewUserLink
When the NewUserLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, as shown in the following screenshot:
With the LogInDP State1 still open in the design area, click on the NewUserLink at coordinates (0,30). To enable the OnClick event, in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLightBox. Now, create the action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.
Click on OK.
Next, we will activate interactions for the ForgotLink.
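The two LogInButton cases defined above behave like an if/else: ValidateUser runs when both conditions hold, and otherwise Axure falls through to ShowErrorMessage. The Python sketch below models only that branching. HeaderState and on_login_click are hypothetical names; the defaults come from the global variables table and the WelcomeLabel placeholder text.

```python
from dataclasses import dataclass

@dataclass
class HeaderState:
    """Hypothetical model of the values the LogInButton cases read and write."""
    email: str = "songwriter@test.com"             # Email global variable
    password: str = "Grammy"                       # Password global variable
    logged_in: str = "No"                          # LoggedIn global variable
    login_panel_state: str = "State1"              # current state of LogInDP
    welcome_text: str = "Welcome, email@test.com"  # WelcomeLabel text
    error_visible: bool = False                    # ErrorMessage starts hidden

def on_login_click(state: HeaderState, email_text: str, password_text: str) -> HeaderState:
    """OnClick for LogInButton: ValidateUser if both conditions hold, else ShowErrorMessage."""
    if email_text == state.email and password_text == state.password:
        # ValidateUser case: its three actions, in order.
        state.login_panel_state = "State2"
        state.welcome_text = f"Welcome, {state.email}"
        state.logged_in = state.email
    else:
        # ShowErrorMessage case: reveal the hidden error label.
        state.error_visible = True
    return state

ok = on_login_click(HeaderState(), "songwriter@test.com", "Grammy")
print(ok.login_panel_state, ok.welcome_text)  # State2 Welcome, songwriter@test.com

bad = on_login_click(HeaderState(), "someone@test.com", "wrong")
print(bad.login_panel_state, bad.error_visible)  # State1 True
```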
Activating interactions for the ForgotLink
When the ForgotLink is clicked, the OnClick event will show the RegistrationLightBoxDP dynamic panel as a lightbox, the RegistrationHeading text will be updated to display Forgot Password?, and the EnterPasswordLabel and EnterPasswordField will be hidden.
To enable the OnClick event, click on the ForgotLink in LogInDP State1. In the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowForgotLB. In the Case Editor dialog, perform the following steps:
Create the first action; to show the RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Show.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Next, go to More options, click on the dropdown, and select treat as lightbox.
Create the second action; to set text for the RegistrationHeading, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to RegistrationHeading.
Under Set text to, click on the dropdown and select value.
In the text field, enter Forgot Password?.
Create the third action; to hide the EnterPasswordLabel and EnterPasswordField, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under RegistrationLightBoxDP, click on the checkboxes next to EnterPasswordLabel and EnterPasswordField.
Click on OK.
We have now completed the interactions for State1 of LogInDP. Next, we will facilitate interactions for the LogOutButton.
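Both links open the same RegistrationLightBoxDP; only the ForgotLink case reconfigures it. The Python sketch below models the panel as a small object so the difference is visible. RegistrationLightBox, on_new_user_click, and on_forgot_click are hypothetical names whose attributes mirror the actions configured in the ShowLightBox and ShowForgotLB cases.

```python
from dataclasses import dataclass

@dataclass
class RegistrationLightBox:
    """Hypothetical model of RegistrationLightBoxDP as designed above."""
    visible: bool = False                 # the panel starts hidden
    as_lightbox: bool = False             # the treat as lightbox option
    heading: str = "Registration"         # RegistrationHeading text
    password_fields_visible: bool = True  # EnterPasswordLabel and EnterPasswordField

def on_new_user_click(box: RegistrationLightBox) -> RegistrationLightBox:
    """ShowLightBox case: a single Show action with treat as lightbox."""
    box.visible = True
    box.as_lightbox = True
    return box

def on_forgot_click(box: RegistrationLightBox) -> RegistrationLightBox:
    """ShowForgotLB case: the same Show action, then Set Text and Hide."""
    on_new_user_click(box)
    box.heading = "Forgot Password?"
    box.password_fields_visible = False
    return box

forgot = on_forgot_click(RegistrationLightBox())
print(forgot.heading, forgot.password_fields_visible)  # Forgot Password? False
```

In this model, as in the case steps listed, ShowLightBox does not reset the heading or the password fields, so calling on_new_user_click after on_forgot_click leaves the Forgot Password? configuration in place.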
Facilitating interactions for the LogOutButton
When the LogOutButton is clicked, the OnClick event will perform the following actions:
Hide the ErrorMessage on LogInDP State1
Set text for PasswordTextField and EmailTextField
Set the panel state for LogInDP to State1
Set the variable value for LoggedIn
To enable the OnClick event, open the LogInDP State2 in the design area. Click on the LogOutButton at coordinates (164,10). In the Widget Interactions and Notes pane, with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type LogOut. In the Case Editor dialog, perform the following steps:
Create the first action; to hide the ErrorMessage, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, under LogInDP, click on the checkbox next to ErrorMessage.
Create the second action; to set text for the PasswordTextField and EmailTextField, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click the checkbox next to PasswordTextField. Under Set text to, click the dropdown and select value. In the text field, clear any text shown.
Under Configure actions, click the checkbox next to EmailTextField. Under Set text to, click on the dropdown and select value. In the text field, enter Email.
Create the third action; to set the panel state for the LogInDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LogInDP.
Next to Select the state, click on the dropdown and select State1.
Create the fourth action; to set the value of LoggedIn, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value.
In the text field, enter No.
Click on OK.
We have now completed interactions for State2 of LogInDP. Next, we will construct interactions for the RegistrationLightBoxDP.
Constructing interactions for the RegistrationLightBoxDP
When the SubmitButton is clicked, the OnClick event hides RegistrationLightBoxDP and sets the Email and Password variable values to the text entered in the EnterEmailField and EnterPasswordField. Also, if the text on the RegistrationHeading label is equal to Registration, LogInDP will be set to State2. We will define these actions by creating two cases: UpdateVariables and ShowLogInState.
Updating variables and hiding the RegistrationLightBoxDP
In the Widget Manager pane, double-click on the RegistrationLightBoxDP State1 to open it in the design area. To define the UpdateVariables case for the OnClick interaction, click on the SubmitButton at coordinates (25,190). In the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type UpdateVariables. In the Case Editor dialog, perform the following steps:
The following screenshot shows the Case Editor with the actions defined:
Create the first action; to set the values of the Email and Password variables, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to Email. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterEmailField.
Under Configure actions, click on the checkbox next to Password. Under Set variable to, click on the first dropdown and select text on widget. Click on the second dropdown and select EnterPasswordField.
Create the second action; to hide RegistrationLightBoxDP, perform the following steps:
Under Click to add actions, scroll to the Widgets dropdown, click on the Show/Hide dropdown, and click on Hide.
Under Configure actions, click on the checkbox next to RegistrationLightBoxDP.
Click on OK.
With the UpdateVariables case completed, next we will create the ShowLogInState case.
Creating the ShowLogInState case
To create the ShowLogInState case, in the Widget Interactions and Notes pane with the Interactions tab selected, click on Add Case…. A Case Editor dialog box will open. In the Case Name field, type ShowLogInState. In the Case Editor dialog, perform the following steps:
Click on the Add Condition button to create the first condition. In the Condition Builder dialog box, go to the outlined condition box and perform the following steps:
In the first dropdown, select text on widget.
In the second dropdown, select RegistrationHeading.
In the third dropdown, select equals.
In the fourth dropdown, select value.
In the fifth dropdown, select Registration.
Click on OK.
Create the first action; to set text for the WelcomeLabel, perform the following steps:
Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text.
Under Configure actions, click on the checkbox next to WelcomeLabel.
Under Set text to, click on the dropdown and select value.
In the text field, enter Welcome, [[Email]].
Create the second action; to set the panel state for the LogInDP dynamic panel, perform the following steps:
Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State.
Under Configure actions, click on the checkbox next to LogInDP.
Next to Select the state, click on the dropdown and select State2.
Create the third action; to set the value of the LoggedIn variable, perform the following steps:
Under Click to add actions, scroll to the Variables drop-down menu and click on Set Variable Value.
Under Configure actions, click on the checkbox next to LoggedIn.
Under Set variable to, click on the first dropdown and click on value.
In the text field, enter [[Email]].
Click on OK.
Under the OnClick event, right-click on the ShowLogInState case and click on Toggle IF/ELSE IF.
With our Header master updated, we are now ready to refresh data for our Forum repeater.
Summary
We learned how to leverage masters and pages from our community site to create a new blog site. We enhanced the Header master and refined the interactions for our Header master.
Resources for Article:
Further resources on this subject:
Home Page Structure [article]
Axure RP 6 Prototyping Essentials: Advanced Interactions [article]
Common design patterns and how to prototype them [article]
Packt
10 Aug 2015
21 min read

Oracle GoldenGate 12c — An Overview

In this article by John P Jeffries, author of the book Oracle GoldenGate 12c Implementer's Guide, we get an introduction to Oracle GoldenGate through the key components, processes, and considerations required to build and implement a GoldenGate solution. John explains how to address some of the issues that influence the decision-making process when you design a GoldenGate solution, focusing on the additional configuration options available in Oracle GoldenGate 12c.
12c new features
Oracle has provided some exciting new features in the 12c version of GoldenGate, some of which we have already touched upon. Following the official deprecation of Oracle Streams in Oracle Database 12c, Oracle has essentially migrated some of the key Streams features to its strategic replication product. You will find that GoldenGate now has a tighter integration with the Oracle database, enabling enhanced functionality. Let's explore some of the new features available in Oracle GoldenGate 12c.
Integrated capture
Integrated capture has been available since Oracle GoldenGate 11gR2 with Oracle Database 11g (11.2.0.3). GoldenGate was originally decoupled from the database; its new architecture provides the option to integrate its Extract process(es) with the Oracle database. This enables GoldenGate to access the database's data dictionary and undo tablespace, providing replication support for advanced features and data types. Oracle GoldenGate 12c still supports the original Extract configuration, known as Classic Capture.
Integrated Replicat
Integrated Replicat is a new feature in Oracle GoldenGate 12c for the delivery of data to Oracle Database 11g (11.2.0.4) or 12c. This performance enhancement provides better scalability and load balancing by leveraging the database parallel apply servers for automatic, dependency-aware parallel Replicat processes.
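To picture what dependency-aware means here: transactions that touch the same rows must be applied in commit order, while transactions on unrelated rows can run side by side. The Python sketch below groups transactions into parallel waves under that rule. It is a simplified illustration, not Oracle's apply algorithm; schedule_waves and the row keys are hypothetical.

```python
def schedule_waves(transactions):
    """Group (txn_id, row_keys) pairs, given in commit order, into parallel waves.

    Each transaction goes into the first wave after every earlier transaction
    that touched any of its rows, so conflicting changes keep their commit
    order while transactions in the same wave share no rows.
    """
    last_wave = {}  # row key -> wave of the latest transaction touching it
    waves = []
    for txn_id, keys in transactions:
        wave = 1 + max((last_wave.get(k, -1) for k in keys), default=-1)
        for k in keys:
            last_wave[k] = wave
        if wave == len(waves):  # wave can exceed the current count by at most one
            waves.append([])
        waves[wave].append(txn_id)
    return waves

txns = [
    ("T1", {"orders:1"}),
    ("T2", {"orders:2"}),
    ("T3", {"orders:1", "customers:7"}),  # depends on T1 through orders:1
    ("T4", {"customers:7"}),              # depends on T3 through customers:7
    ("T5", {"orders:3"}),                 # independent of everything before it
]
print(schedule_waves(txns))  # [['T1', 'T2', 'T5'], ['T3'], ['T4']]
```

A real parallel apply does not need wave barriers; it only has to wait on actual dependencies. The waves simply make the ordering easy to see.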
With Integrated Replicat, users no longer need to split the delivery process manually into multiple threads and manage multiple parameter files. GoldenGate now uses a lightweight streaming API to prepare, coordinate, and apply the data to the downstream database. Oracle GoldenGate 12c still supports the original Replicat configuration, known as Classic Delivery.

Downstream capture

Downstream capture was one of my favorite Oracle Streams features. It allows a combined in-memory capture and apply process that achieves very low latency, even under heavy data loads. GoldenGate builds on this Streams feature by employing a real-time downstream capture process. This method uses Oracle Data Guard's log transport mechanism, which writes changed data to standby redo logs. It provides a best-of-both-worlds approach: a real-time mine configuration that falls back to archive log mining when the apply process cannot keep up. Real-time mining is also re-enabled automatically when data throughput drops.

Installation

One of the major changes in Oracle GoldenGate 12c is the installation method. Like other Oracle products, Oracle GoldenGate 12c is now installed with the Java-based Oracle Universal Installer (OUI), in either interactive or silent mode. OUI reads the Oracle Inventory on your system to discover existing installations (Oracle Homes), so you can install, deinstall, or clone software products.

Upgrading to 12c

The steps are the same whether you upgrade your current GoldenGate installation from Oracle GoldenGate 11g Release 2 or from an earlier version. Stop all running GoldenGate processes on your database server, back up the GoldenGate home, and then use OUI to perform a fresh installation. When you restart replication, however, make sure the capture process resumes from the point at which it was gracefully stopped, so that no synchronization data is lost.
Multitenant database replication

As the version suggests, Oracle GoldenGate 12c now supports data replication for Oracle Database 12c. Readers familiar with the 12c database features will know the multitenant container database (CDB), which provides database consolidation. Each CDB consists of a root container and one or more pluggable databases (PDBs). A PDB can contain multiple schemas and objects, just like a conventional database that GoldenGate replicates data to and from. The GoldenGate Extract process pulls data from multiple PDBs or containers in the source and combines the changed data into a single trail file. Replicat, however, splits the data into multiple process groups in order to apply the changes to each target PDB.

Coordinated Delivery

The Coordinated Delivery option applies to the GoldenGate Replicat process when it is configured in classic mode. It improves performance by automatically splitting the data delivered from a remote trail file into multiple threads, which are then applied to the target database in parallel. GoldenGate coordinates selected events that require ordering, including DDL, primary key updates, event marker interface (EMI) events, and SQLEXEC. Coordinated Delivery can be used with Oracle databases (from version 11.2.0.4) and with non-Oracle databases.

Event-based processing

In GoldenGate 12c, event-based processing has been enhanced so that specific events can be captured and acted upon automatically through the EMI. SQLEXEC provides the API to the EMI, enabling programmatic execution of tasks following an event. For example, it is now possible to detect the start of a batch job or large transaction, trap the SQL statement(s), and ignore the subsequent change records until the end of the source system transaction. The original DML can then be replayed on the target database as one transaction. This is a major step forward in performance tuning for data replication.
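The batch-handling idea above can be shown with a small, purely conceptual Python model. It is not GoldenGate code. The TrailRecord type, the SRC.BATCH_STATEMENTS statement table, and the plan_apply() helper are hypothetical names used only for this sketch. The model assumes that a batch transaction is flagged by a record written to a statement table that carries the original SQL text. It shows the row-level changes of that transaction being skipped so that the statement is replayed once.

```python
from dataclasses import dataclass

# Hypothetical statement table: the batch job writes its SQL text here
# inside the same transaction as the bulk DML it describes.
STATEMENT_TABLE = "SRC.BATCH_STATEMENTS"


@dataclass(frozen=True)
class TrailRecord:
    txn_id: int    # source transaction the change belongs to
    table: str     # source table name
    payload: str   # row image, or SQL text for the statement table


def plan_apply(records: list[TrailRecord]) -> list[str]:
    """Decide what the target applies for a sequence of trail records.

    Transactions that contain a statement-table record are treated as batch
    jobs: their row-level changes are ignored and the captured statement is
    executed once. All other transactions are applied row by row.
    """
    batch_txns = {r.txn_id for r in records if r.table == STATEMENT_TABLE}
    actions: list[str] = []
    for r in records:
        if r.txn_id not in batch_txns:
            actions.append(f"APPLY {r.table}: {r.payload}")
        elif r.table == STATEMENT_TABLE:
            actions.append(f"EXEC {r.payload}")  # replay the original DML once
        # otherwise: a row change produced by the batch statement, ignored
    return actions


trail = [
    TrailRecord(1, "SRC.ACCOUNTS", "id=42 balance=100"),
    TrailRecord(2, STATEMENT_TABLE, "INSERT INTO SALES_HIST SELECT * FROM SALES"),
    TrailRecord(2, "SRC.SALES_HIST", "row 1"),
    TrailRecord(2, "SRC.SALES_HIST", "row 2"),
    TrailRecord(2, "SRC.SALES_HIST", "row 3"),
]
for action in plan_apply(trail):
    print(action)
```

However many rows the batch touched, the target receives one statement. This only gives correct results if both databases are perfectly synchronized, because the statement is re-executed against the target's own data.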
Enhanced security

Recent versions of GoldenGate have included security features such as password and data encryption. Oracle GoldenGate 12c now supports a credential store, better known as an Oracle wallet, which securely stores an alias associated with a username and password. The GoldenGate parameter files then reference the alias instead of the actual username and password.

Conflict Detection and Resolution

In earlier versions of GoldenGate, Conflict Detection and Resolution (CDR) was fairly limited and not readily available out of the box. CDR was available in Oracle Streams, but GoldenGate administrators had to resolve data conflicts in the replication process themselves, using GoldenGate's built-in tools. In the 12c version, CDR has become an easily configurable option through Extract and Replicat parameters.

Dynamic Rollback

The Dynamic Rollback feature now makes it possible to selectively back out applied transactions. The feature operates at the table and record level and supports point-in-time recovery. This can remove the need for a full database restore after data corruption, erroneous deletions, or the removal of test data, and so avoid hours of system downtime.

Streams to GoldenGate migration

Oracle Streams users can now migrate their data replication solution to Oracle GoldenGate 12c using a purpose-built utility. This is a welcome feature, given that Streams is no longer supported in Oracle Database 12c. The Streams2ogg tool automatically generates Oracle GoldenGate configuration files, which greatly simplifies the migration.

Performance

With today's demand for real-time access to real-time data, high performance is key. For example, businesses will no longer wait for information to arrive in their decision support systems (DSS) before making decisions, and users expect the latest information to be available in the public cloud.
Data has value and must be delivered in real time to meet demand. So how long does it take to replicate a transaction from the source database to its target? This is known as end-to-end latency. It typically has a threshold that must not be breached if a predefined Service Level Agreement (SLA) is to be met. GoldenGate refers to latency as lag, which can be measured at different points in the replication process:

Source to Extract: the time taken for a record to be processed by the Extract, compared to the commit timestamp on the database
Replicat to target: the time taken for the last record to be processed by the Replicat process, compared to the record creation time in the trail file

A well-designed system may still see latency spikes, but lag should never be continuous or growing. Peaks are typically caused by load on the source database system, where latency rises with the number of transactions per second. Lag should therefore be measured as an average over a specified period. Tuning GoldenGate when the design is poor is difficult. For the system to perform well, you may need to revisit the design.

Availability

Another important non-functional requirement (NFR) is availability. It is normally quoted as the percentage of time for which the system must be available. For example, an NFR of 99.9 percent availability allows 8.76 hours of downtime a year. That is quite a lot, especially if it were to occur all at once. Oracle's Maximum Availability Architecture (MAA) offers enhanced availability through products such as Real Application Clusters (RAC) and Active Data Guard (ADG). However, as we described earlier, the network plays a major role in data replication. The NFR relates to the whole system, so make sure your design provides redundancy for all components.
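A short Python sketch can check the availability arithmetic above, along with the advice to judge lag as an average rather than by individual spikes. It assumes a 365-day year (8,760 hours). The function names are illustrative only, and the lag samples are made-up values, not GoldenGate output.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


def average_lag(samples_sec: list[float]) -> float:
    """Mean lag over a measurement window, which smooths out short spikes."""
    if not samples_sec:
        raise ValueError("at least one lag sample is required")
    return sum(samples_sec) / len(samples_sec)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% availability -> {annual_downtime_hours(target):.2f} hours/year")

# Made-up lag readings (seconds) with a single spike during a load peak.
samples = [2.0, 3.0, 2.0, 15.0, 3.0]
print(f"average lag: {average_lag(samples):.1f} s")
```

For 99.9 percent, the sketch gives 8.76 hours, which matches the figure above. Each additional "nine" cuts the allowed downtime by a factor of ten.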
Event-based processing

In any data replication environment, it is important to capture and manage events. Examples are trail records that contain specific data or operations, or the occurrence of a certain error. These are known as Event Markers. GoldenGate provides a mechanism to perform an action on a given event or condition. These actions are known as Event Actions, and Event Records trigger them. If you are familiar with Oracle Streams, Event Actions are similar to rules.

The Event Marker System

GoldenGate's Event Marker System, also known as the event marker interface (EMI), allows custom DML-driven processing on an event. It relies on an Event Record to trigger a given action. An Event Record can be either of the following:

a trail record that satisfies a condition evaluated by a WHERE or FILTER clause
a record written to an event table that enables an action to occur

Typical actions are writing status information, reporting errors, ignoring certain records in a trail, invoking a shell script, or performing an administrative task.

The following Replicat code captures an event and acts on it. It uses the EVENTACTIONS parameter to log DELETE operations made against the CREDITCARD_ACCOUNTS table:

MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS_DIM;
TABLE SRC.CREDITCARD_ACCOUNTS, &
  FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &
  EVENTACTIONS (LOG INFO);

By default, all logged information is written to the process group report file, the GoldenGate error log, and the system messages file. On Linux, the system messages file is /var/log/messages. Note that the TABLE parameter is also used in the Replicat's parameter file. It tells the Replicat to execute an Event Action when it encounters an Event Marker.

The IGNORE option prevents certain records from being extracted or replicated, which is particularly useful for filtering out system-type data.
When the TRANSACTION option is added, the whole transaction is ignored, not just the Event Record:

TABLE SRC.CREDITCARD_ACCOUNTS, &
  FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &
  EVENTACTIONS (IGNORE TRANSACTION);

This code extends the previous example. Instead of only logging the Event Record, it stops the Event Record, and the rest of its transaction, from being replicated.

Using Event Actions to improve batch performance

All replication technologies typically suffer from one flaw: the way in which the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle writes each change to its redo logs as Logical Change Records (LCRs). GoldenGate then mines the logs, writes the LCRs to a remote trail, converts each one back to DML, and applies them to the target database one row at a time. The single source statement becomes one million individual row operations, which causes a huge performance overhead. To overcome this issue, we can use Event Actions to:

1. Detect the DML statement (INSERT INTO ... SELECT ...)
2. Ignore the data resulting from the SELECT part of the statement
3. Replicate just the DML statement as an Event Record
4. Execute just the DML statement on the target database

The solution requires a statement table on both the source and target databases to trigger the event. Both databases must also be perfectly synchronized to avoid data integrity issues.

User tokens

User tokens are GoldenGate environment variables that are captured and stored in the trail record for replication. The environment data is retrieved with the @GETENV function, and the stored tokens are read downstream with the @TOKEN function. We can use token data in column maps, in stored procedures called by SQLEXEC, and, of course, in macros.

Using user tokens to populate a heartbeat table

GoldenGate offers a vast array of user tokens.
Let's start with a common method of replicating system information: populating a heartbeat table that can be used to monitor performance. The TOKENS option of the Extract TABLE parameter defines a user token and associates it with GoldenGate environment data. The following Extract configuration shows the token declarations for the heartbeat table:

TABLE GGADMIN.GG_HB_OUT, &
  TOKENS (EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &
  EXTLAG = @GETENV ("LAG","SEC"), &
  EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML")), &
  FILTER (@STREQ (EXTGROUP, @GETENV("GGENVIRONMENT","GROUPNAME")));

For the data pump, the example Extract configuration is:

TABLE GGADMIN.GG_HB_OUT, &
  TOKENS (PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &
  PMPLAG = @GETENV ("LAG","SEC"));

Finally, the following Replicat configuration populates the heartbeat table on the target database with system details and replication lag. The Extract and data pump values come from their tokens. The Replicat's own values come from @GETENV, because no upstream process defines a Replicat token:

MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &
  KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &
  INSERTMISSINGUPDATES, &
  COLMAP (USEDEFAULTS, &
  ID = 0, &
  SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &
  EXTGROUP = @TOKEN ("EXTGROUP"), &
  EXTTIME = @TOKEN ("EXTTIME"), &
  PMPGROUP = @TOKEN ("PMPGROUP"), &
  PMPTIME = @TOKEN ("PMPTIME"), &
  REPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
  REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV("JULIANTIMESTAMP")), &
  EXTLAG = @TOKEN ("EXTLAG"), &
  PMPLAG = @TOKEN ("PMPLAG"), &
  REPLAG = @GETENV ("LAG","SEC"), &
  EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL"));

As the heartbeat example shows, a MAP statement reads the defined user tokens with the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explanatory.
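Once the heartbeat table is populated, its per-process timestamps can be turned into per-hop lag figures. The Python sketch below assumes the following:

SOURCE_COMMIT, EXTTIME, PMPTIME, and REPTIME are read back as text in the "YYYY-MM-DD HH:MI:SS.FFFFFF" layout used above.
All four timestamps use a 24-hour clock and share one time zone.

The row shown is invented for illustration, and stage_lags() is a hypothetical helper, not part of GoldenGate.

```python
from datetime import datetime

# Assumed text layout of the heartbeat timestamps (24-hour clock, microseconds).
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stage_lags(row: dict[str, str]) -> dict[str, float]:
    """Break the end-to-end lag of one heartbeat row into per-hop lags, in seconds."""
    ts = {name: datetime.strptime(row[name], TS_FORMAT)
          for name in ("SOURCE_COMMIT", "EXTTIME", "PMPTIME", "REPTIME")}
    return {
        "extract": (ts["EXTTIME"] - ts["SOURCE_COMMIT"]).total_seconds(),
        "pump": (ts["PMPTIME"] - ts["EXTTIME"]).total_seconds(),
        "replicat": (ts["REPTIME"] - ts["PMPTIME"]).total_seconds(),
        "end_to_end": (ts["REPTIME"] - ts["SOURCE_COMMIT"]).total_seconds(),
    }


# Invented example row; real values would be queried from the target heartbeat table.
row = {
    "SOURCE_COMMIT": "2015-08-08 11:55:27.000000",
    "EXTTIME": "2015-08-08 11:55:27.500000",
    "PMPTIME": "2015-08-08 11:55:28.250000",
    "REPTIME": "2015-08-08 11:55:29.000000",
}
print(stage_lags(row))
```

Comparing the hops shows where delay builds up. For example, a steadily growing replicat value next to a flat extract value points to the delivery side rather than the source.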
EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful for measuring the load on the source system when you investigate latency peaks.

In applications, user tokens are useful for auditing data and trapping exceptions within the replicated data stream. The following code shows common user tokens. It replicates the token data to five columns of an audit table:

MAP SRC.AUDIT_LOG, TARGET TGT.AUDIT_LOG, &
  COLMAP (USEDEFAULTS, &
  OSUSER = @TOKEN ("TKN_OSUSER"), &
  DBNAME = @TOKEN ("TKN_DBNAME"), &
  HOSTNAME = @TOKEN ("TKN_HOSTNAME"), &
  TIMESTAMP = @TOKEN ("TKN_COMMITTIME"), &
  BEFOREAFTERINDICATOR = @TOKEN ("TKN_BEFOREAFTERINDICATOR"));

The BEFOREAFTERINDICATOR environment variable is particularly useful as a status flag. It shows whether the data came from the Before image or the After image of an UPDATE or DELETE operation. By default, GoldenGate provides After images. To extract Before images, use the GETUPDATEBEFORES Extract parameter on the source database.

Using logic in the data replication

GoldenGate has a number of functions that let the administrator program logic into the Extract and Replicat process configuration. They provide the equivalent of the IF and CASE constructs found in programming languages. In addition, the @COLTEST function enables conditional calculations by testing for one or more column conditions. It is typically used with the @IF function:

MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS_FACT, &
  COLMAP (USEDEFAULTS, &
  AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT));

Here, the @COLTEST function checks whether the AMOUNT column in the source data is MISSING or INVALID. The @IF function returns 0 if @COLTEST returns TRUE and the value of AMOUNT if it returns FALSE. The target AMOUNT column is therefore set to 0 when the equivalent source column is missing or invalid. Otherwise, the column is mapped directly.
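The Python analogue below makes the @IF/@COLTEST logic concrete for the AMOUNT mapping. It models the behavior described above and is not GoldenGate's implementation. Its assumptions are:

A missing column is represented by an absent dictionary key.
"Invalid" means a value that cannot be parsed as a number.
NULL (None) is passed through, because NULL is a separate @COLTEST condition that this mapping does not test.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def coltest_missing_or_invalid(row: dict, col: str) -> bool:
    """Rough analogue of @COLTEST(col, MISSING, INVALID) for one column.

    Assumptions: MISSING means the column is absent from the record;
    INVALID means a value is present but cannot be parsed as a number.
    NULL (None) is neither, because NULL is a separate @COLTEST condition.
    """
    if col not in row:
        return True
    value = row[col]
    if value is None:
        return False
    try:
        Decimal(str(value))
    except InvalidOperation:
        return True
    return False


def map_amount(row: dict) -> Optional[Decimal]:
    """Analogue of AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT)."""
    if coltest_missing_or_invalid(row, "AMOUNT"):
        return Decimal(0)
    value = row["AMOUNT"]
    return None if value is None else Decimal(str(value))


print(map_amount({"AMOUNT": "12.50"}))  # valid: mapped directly
print(map_amount({}))                   # missing: set to 0
print(map_amount({"AMOUNT": "n/a"}))    # invalid: set to 0
```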
The @CASE function tests a value against a list of possible matches and returns the result specified for the match. If no match is found, @CASE returns a default value. There is no limit to the number of cases you can test, but if the list is very large, a database lookup may be more appropriate. The following code shows how simple the @CASE function is. It returns the country name for a country code:

MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM, &
  COLMAP (USEDEFAULTS, &
  COUNTRY = @CASE(COUNTRY_CODE, "UK", "United Kingdom", "USA", "United States of America"));

Two other GoldenGate functions also perform tests: @VALONEOF and @EVAL. Like @CASE, @VALONEOF compares a column or string to a list of values. Unlike @CASE, it does not return a mapped value. It returns TRUE if the column or string matches any value in the list, and FALSE otherwise. In the following code, @IF uses that result to return "EUROPE" when it is TRUE and "UNKNOWN" when it is FALSE:

MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM, &
  COLMAP (USEDEFAULTS, &
  REGION = @IF(@VALONEOF(COUNTRY_CODE, "UK", "E", "D"), "EUROPE", "UNKNOWN"));

The @EVAL function evaluates a list of conditions and returns the value specified for the condition that is satisfied. Optionally, it returns a default value if none of the conditions is satisfied. There is no limit to the number of evaluations you can list, but list the most common ones first to improve performance. The following code uses the BEFORE option, which refers to the Before image of the replicated source column, and compares it with the After image of the same column. Depending on the result, @EVAL returns "PAID MORE", "PAID LESS", or "PAID SAME":

MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, &
  COLMAP (USEDEFAULTS, &
  STATUS = @EVAL(AMOUNT < BEFORE.AMOUNT, "PAID LESS", AMOUNT > BEFORE.AMOUNT, "PAID MORE", AMOUNT = BEFORE.AMOUNT, "PAID SAME"));

The BEFORE option can also be used with other GoldenGate functions, including the WHERE and FILTER clauses.
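The three test functions correspond closely to ordinary control flow. The Python sketch below mirrors the examples above using the hypothetical helpers gg_case(), gg_valoneof(), and gg_eval(). It models the described semantics only, and the None default for gg_case() and gg_eval() is this sketch's own choice. Conditions for gg_eval() are passed as zero-argument functions, so evaluation stops at the first condition that holds. That early stop is why listing the most common case first saves work.

```python
from typing import Any, Callable, Sequence


def gg_case(value: Any, pairs: Sequence[tuple[Any, Any]], default: Any = None) -> Any:
    """Analogue of @CASE: return the result paired with the first matching value."""
    for match, result in pairs:
        if value == match:
            return result
    return default


def gg_valoneof(value: Any, *candidates: Any) -> bool:
    """Analogue of @VALONEOF: TRUE if the value equals any candidate."""
    return value in candidates


def gg_eval(conditions: Sequence[tuple[Callable[[], bool], Any]], default: Any = None) -> Any:
    """Analogue of @EVAL: return the result of the first condition that holds."""
    for condition, result in conditions:
        if condition():  # later conditions are not evaluated once one holds
            return result
    return default


country = gg_case("UK", [("UK", "United Kingdom"), ("USA", "United States of America")])
region = "EUROPE" if gg_valoneof("D", "UK", "E", "D") else "UNKNOWN"  # the @IF wrapper

before_amount, after_amount = 100, 80  # stand-ins for BEFORE.AMOUNT and AMOUNT
status = gg_eval([
    (lambda: after_amount < before_amount, "PAID LESS"),
    (lambda: after_amount > before_amount, "PAID MORE"),
    (lambda: after_amount == before_amount, "PAID SAME"),
])
print(country, region, status)
```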
For the Before image to be written to the trail and be available, the GETUPDATEBEFORES parameter must be enabled in either the source database's Extract parameter file or the target database's Replicat parameter file, but not both. GETUPDATEBEFORES can be set globally for all tables defined in the Extract. It can also be set per table by alternating GETUPDATEBEFORES and IGNOREUPDATEBEFORES:

EXTRACT EOLTP01
USERIDALIAS srcdb DOMAIN admin
SOURCECATALOG PDB1
EXTTRAIL ./dirdat/aa
GETAPPLOPS
IGNOREREPLICATES
GETUPDATEBEFORES
TABLE SRC.CHECK_PAYMENTS;
IGNOREUPDATEBEFORES
TABLE SRC.CHECK_PAYMENTS_STATUS;
TABLE SRC.CREDITCARD_ACCOUNTS;
TABLE SRC.CREDITCARD_PAYMENTS;

Tracing processes to find wait events

If you have worked with Oracle software, particularly in performance tuning, you will be familiar with tracing. Tracing gathers additional information from a given process or function to diagnose performance problems or even bugs. One example is SQL trace, which can be enabled at the database session or system level. It reports key information such as wait events and parse, fetch, and execute times.

Oracle GoldenGate 12c offers a similar tracing mechanism through the trace and trace2 options of the SEND GGSCI command, which work like a session-level SQL trace. Tracing can also be enabled in the GoldenGate process parameter files, much like a database system trace. It then stays active until the Extract or Replicat is stopped. trace provides processing information, whereas trace2 identifies the process's wait events. The following commands enable tracing dynamically for 2 minutes on a running Replicat process:

GGSCI (db12server02) 1> send ROLAP01 trace2 ./dirrpt/ROLAP01.trc

Wait for 2 minutes, then turn tracing off:

GGSCI (db12server02) 2> send ROLAP01 trace2 off
GGSCI (db12server02) 3> exit

To view the contents of the Replicat trace file, we can execute the following command.
For a coordinated Replicat, the trace file contains information from all of its threads:

$ view dirrpt/ROLAP01.trc
statistics between 2015-08-08 Wed HKT 11:55:27 and 2015-08-08 Wed HKT 11:57:28
RPT_PROD_01.LIMIT_TP_RESP : n=2 : op=Insert; total=3; avg=1.5000; max=3msec
RPT_PROD_01.SUP_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.EVENTS : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.DOC_SHIP_DTLS : n=17880 : op=FieldComp; total=22003; avg=1.2306; max=42msec
RPT_PROD_01.BUY_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.LIMIT_TP_LOG : n=2 : op=Insert; total=2; avg=1.0000; max=2msec
RPT_PROD_01.POOL_SMRY : n=1 : op=FieldComp; total=2; avg=2.0000; max=2msec
..
===============================================summary==============
Delete : n=2; total=2; avg=1.00;
Insert : n=78; total=356; avg=4.56;
FieldComp : n=85728; total=123018; avg=1.43;
total_op_num=85808 : total_op_time=123376 ms : total_avg_time=1.44ms/op
total commit number=1

The trace file provides the following information:

The table name
The operation type (FieldComp is for a compressed field)
The number of operations
The average wait
The maximum wait
A summary by operation type

With this information, we can quickly see which operations against which tables take the longest.

Exception handling

Oracle GoldenGate 12c now supports Conflict Detection and Resolution (CDR). Out of the box, however, GoldenGate takes a catch-all approach to exception handling. For example, if any operational failure occurs, a Replicat process will by default ABEND and roll back the transaction to the last known checkpoint. This may not be ideal in a production environment. The HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters control whether a Replicat process tries to resolve duplicate-record and missing-record errors. To find out which error occurred, and on which Replicat, you create an exceptions handler.
Exception handling differs from CDR: it traps and reports the Oracle errors that the data replication (DML and DDL) runs into. CDR, by contrast, detects and resolves inconsistencies in the replicated data, such as mismatches between before and after images.

Exceptions can always be trapped by the Oracle error they produce. GoldenGate provides an exception handler parameter called REPERROR that lets the Replicat continue processing data after a predefined error. For example, the following configuration in the Replicat parameter file handles ORA-00001 "unique constraint (%s.%s) violated" as an exception instead of abending:

REPERROR (DEFAULT, EXCEPTION)
REPERROR (DEFAULT2, ABEND)
REPERROR (-1, EXCEPTION)

Cloud computing

Cloud computing has grown enormously in recent years. Oracle named the latest version of its products 12c, the c standing for cloud, of course. The architecture of Oracle Database 12c allows a multitenant container database to support multiple pluggable databases, a key feature of cloud computing. This replaces the inefficient schema consolidation typical of previous Oracle database architectures, which is known to cause contention on shared resources under high load. The Oracle 12c architecture supports database consolidation through efficient memory management and dedicated background processes.

Online companies such as Amazon have built on the cloud concept by offering a Relational Database Service (RDS), which is becoming very popular for its fast provisioning, support, and low cost. Cloud environments are often huge, with hundreds of servers, petabytes of storage, terabytes of memory, and countless CPU cores. The cloud has to support multiple applications in a multi-tiered, shared environment, often through virtualization technologies. In such environments, storage and CPUs are typically the factors that drive cost-effective options.
Customers choose the hardware footprint that best suits their budget and system requirements, a model commonly known as Platform as a Service (PaaS). Cloud computing is an extension of grid computing and offers both public and private clouds.

GoldenGate and Big Data

Organizations increasingly need to access, analyze, and report on their data across the enterprise quickly in order to stay agile in a competitive market. Data is becoming more of an asset to companies and adds value to a business. However, it may be spread across any number of current and legacy systems, which makes it difficult to realize its full potential. This combined data is known as big data, and until recently it was nearly impossible to perform real-time business analysis on data drawn from multiple sources. Today, low-latency access to all transactional data is essential. With products such as Apache Hadoop, structured data from an RDBMS can be integrated with semi-structured and unstructured data, giving business intelligence a common playing field. When coupled with Oracle Data Integrator (ODI), Oracle GoldenGate for Big Data delivers data in real time to a suite of Apache products, such as Flume, HDFS, Hive, and HBase, to support big data analytics.

Summary

In this article, we introduced Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution.

Packt
10 Aug 2015
10 min read

Securing OpenStack Networking

In this article, Fabio Alessandro Locati, author of the book OpenStack Cloud Security, explains the importance of firewalls, IDSs, and IPSs. You will also learn about Generic Routing Encapsulation (GRE) and VXLAN.

The importance of firewall, IDS, and IPS

Network security can and should be achieved in multiple ways. Three components are critical to the security of a network:

Firewall
Intrusion detection system (IDS)
Intrusion prevention system (IPS)

Firewall

Firewalls are systems that control the traffic passing through them based on rules. This may sound like a router, but the two are very different. A router allows communication between different networks, while a firewall limits communication between networks and hosts. The confusion probably arises because routers very often include firewall functionality, and vice versa. Firewalls need to be connected in series with your infrastructure.

The first paper on firewall technology appeared in 1988 and described the packet filter firewall, often known as the first-generation firewall. This kind of firewall analyzes the packets passing through it. If a packet matches a rule, the firewall acts according to that rule. It analyzes each packet in isolation, without considering other aspects such as other packets. It works on the first three layers of the OSI model and uses only a few layer 4 features, specifically to check port numbers and protocols (UDP/TCP). First-generation firewalls are still in use because in many situations they do the job properly and are cheap and secure. Typical filtering rules for such firewalls include:

prohibiting (or allowing) IPs of certain classes (or specific IPs) from accessing certain IPs
allowing traffic to a specific IP only on specific ports
There are no known attacks against this kind of firewall as such, but specific models can have bugs that can be exploited.

In 1990, a new generation of firewall appeared. It was initially called a circuit-level gateway, but today it is far more commonly known as a stateful firewall or second-generation firewall. These firewalls track when connections are opened and closed, so the firewall knows the current state of a connection when a packet arrives. To do so, this kind of firewall uses the first four layers of the networking stack. It can therefore drop every packet that neither establishes a new connection nor belongs to an established one. These firewalls are very effective for TCP, because TCP is a stateful protocol. They offer very little advantage over first-generation firewalls for UDP or ICMP packets, which travel without a connection. In these cases, the firewall marks the connection as established once the first valid packet passes through, and closes it after the connection times out. Stateful firewalls can be faster than packet filter firewalls, because no further tests are run against a packet that belongs to an active connection. On the other hand, they are more susceptible to bugs in their code, because parsing more of each packet gives attackers more to exploit. Also, on many devices, an attacker can open connections (with SYN packets) until the firewall is saturated. In such cases, the firewall usually degrades into a simple router and lets all traffic pass through it.

In 1991, the stateful firewall was improved so that it could understand more about the protocol of the packet it was evaluating. Firewalls of this kind had major problems before 1994, such as working as a proxy that the user had to interact with.
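The difference between the first two firewall generations can be sketched in a few lines of Python. This is a toy model, not a real firewall. The single allow rule, the Packet fields, and the class names are invented. Only the client-to-server direction is tracked, and connection teardown and timeouts are omitted. The packet filter judges every packet in isolation. The stateful version keeps a connection table: it drops a stray TCP packet that opens no connection, and it lets packets of an established connection skip the rule check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    proto: str          # "tcp" or "udp"
    syn: bool = False   # TCP SYN flag: request to open a new connection


# Hypothetical rule set: allow only TCP traffic to 10.0.0.5 on port 443.
ALLOW = {("10.0.0.5", 443, "tcp")}


def packet_filter(pkt: Packet) -> bool:
    """First generation: each packet is judged alone, by address, port, and protocol."""
    return (pkt.dst, pkt.dport, pkt.proto) in ALLOW


class StatefulFirewall:
    """Second generation: remembers connections in a state table."""

    def __init__(self) -> None:
        self.connections: set[tuple[str, str, int, str]] = set()

    def accept(self, pkt: Packet) -> bool:
        key = (pkt.src, pkt.dst, pkt.dport, pkt.proto)
        if key in self.connections:
            return True                # established: no further rule checks
        if pkt.proto == "tcp" and not pkt.syn:
            return False               # neither opening nor part of a connection
        if packet_filter(pkt):
            self.connections.add(key)  # record the new connection (UDP: first valid packet)
            return True
        return False


fw = StatefulFirewall()
stray = Packet("203.0.113.9", "10.0.0.5", 443, "tcp")
opener = Packet("203.0.113.9", "10.0.0.5", 443, "tcp", syn=True)
print(packet_filter(stray), fw.accept(stray))  # the filter passes it; the stateful firewall does not
print(fw.accept(opener), fw.accept(stray))     # after the SYN, the same packet is accepted
```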
In 1994, the first application firewall as we know it was born, and it did its job completely transparently. To understand the protocol, this kind of firewall has to work across all seven layers of the OSI model. In terms of security, the same considerations apply as for the stateful firewall.

Intrusion detection system (IDS)

IDSs are systems that monitor network traffic for policy violations and malicious traffic. The goal of an IDS is not to block malicious activity but to log and report it. These systems work in passive mode, so you will not see any traffic coming from them. This is very important, because it makes them invisible to attackers, and you can gather information about an attack without the attacker knowing. IDSs need to be connected in parallel with your infrastructure.

Intrusion prevention system (IPS)

IPSs are sometimes called Intrusion Detection and Prevention Systems (IDPS), since they are IDSs that can also fight back against malicious activity. IPSs have more ways to act than IDSs. Besides reporting, as an IDS does, they can drop malicious packets, reset the connection, and block traffic from the offending IP address. IPSs need to be connected in series with your infrastructure.

Generic Routing Encapsulation (GRE)

GRE is a Cisco tunneling protocol that is difficult to place in the OSI model. It fits best between layers 2 and 3. Since it sits above layer 2 (where VLANs are), GRE can be used inside a VLAN. We will not go deep into the technicalities of this protocol. Instead, I'd like to focus on its advantages and disadvantages compared to VLANs. The first advantage of (extended) GRE over VLAN is scalability: VLANs are limited to 4,096 IDs, while GRE tunnels do not have this limitation.
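The 4,096 ceiling comes from the size of the VLAN ID field. The sketch below uses only Python's standard struct module. It packs an IEEE 802.1Q tag, which carries a 12-bit VLAN ID. For comparison, it also packs the 8-byte VXLAN header defined in RFC 7348, which carries a 24-bit network identifier (VNI) and travels over UDP port 4789. VXLAN is the protocol the article turns to shortly. The helper names are this sketch's own, and only the header bytes are built, not full frames.

```python
import struct

VLAN_ID_BITS = 12    # IEEE 802.1Q VLAN identifier (VID) field
VXLAN_VNI_BITS = 24  # VXLAN Network Identifier (VNI) field, RFC 7348


def dot1q_tag(vid: int, pcp: int = 0) -> bytes:
    """Build a 4-byte 802.1Q tag: TPID 0x8100, then TCI = PCP (3 bits), DEI (1 bit), VID (12 bits)."""
    if not 0 <= vid < 2 ** VLAN_ID_BITS:
        raise ValueError(f"VLAN ID must fit in {VLAN_ID_BITS} bits")
    if not 0 <= pcp < 8:
        raise ValueError("priority code point must fit in 3 bits")
    tci = (pcp << 13) | vid  # DEI bit left at 0
    return struct.pack("!HH", 0x8100, tci)


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (I bit set), 24 reserved bits, VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** VXLAN_VNI_BITS:
        raise ValueError(f"VNI must fit in {VXLAN_VNI_BITS} bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)


print(2 ** VLAN_ID_BITS, 2 ** VXLAN_VNI_BITS)
print(dot1q_tag(100).hex(), vxlan_header(5000).hex())
```

The 12-bit field yields 4,096 values, of which IDs 0 and 4095 are reserved, leaving 4,094 usable VLANs. The 24-bit VNI accounts for the roughly 16 million networks quoted for VXLAN.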
If you run a private cloud in a small corporation, 4,096 networks could be enough. They will definitely not be enough in a big corporation or in a public cloud. Also, unless you use VTP (VLAN Trunking Protocol) for your VLANs, you have to add each VLAN to every network device, while GRE does not need this.

The second advantage is security. You can deploy multiple GRE tunnels in a single VLAN. A machine can therefore be connected to a single VLAN and to multiple GRE networks without the risks of putting its port in trunking mode, which is needed to carry several VLANs on the same physical port. For these reasons, GRE was a very common choice in OpenStack clusters deployed up to OpenStack Havana. The current preferred networking choice (since Icehouse) is Virtual Extensible LAN (VXLAN).

VXLAN

VXLAN is a network virtualization technology whose specification was originally created by Arista Networks, Cisco, and VMware, and many other companies have since backed the project. Its goal is to offer a standardized overlay encapsulation protocol. It was created because standard VLANs were too limited for current cloud needs and GRE was a Cisco protocol. It works by encapsulating layer 2 Ethernet frames within layer 4 UDP packets on port 4789. VXLAN supports up to 16 million logical networks. Since the Icehouse release, VXLAN has been the suggested standard for networking.

Flat network versus VLAN versus GRE in OpenStack Quantum

In OpenStack Quantum, you can choose among several technologies for your networks: flat network, VLAN, GRE, and, most recently, VXLAN. Let's discuss them in detail:

Flat network: This is often used in private clouds because it is very easy to set up. The downside is that every virtual machine can see every other virtual machine in the cloud.
I strongly discourage people from using this network design because it is unsafe and, in the long run, it will cause problems, as we have seen earlier.
VLAN: It is sometimes used in bigger private clouds and occasionally even in small public clouds. The advantage is that you often already have a VLAN-based installation in your company. The major disadvantages are the need to trunk ports for each physical host and possible problems with VLAN propagation. I discourage this approach since, in my opinion, the advantages are very limited while the disadvantages are significant.
VXLAN: It should be used in any kind of cloud due to its technical advantages. It allows a huge number of networks, it is far more secure, and it often eases debugging.
GRE: Until the Havana release, it was the suggested protocol, but since the Icehouse release, the recommendation has been to move toward VXLAN, where the majority of development is focused.

Design a secure network for your OpenStack deployment

As for the physical infrastructure, we have to design it securely. We have seen that network security is critical and that there are many possible attacks in this realm. Is it possible to design a secure environment to run OpenStack? Yes, it is, if you remember a few rules:
Create different networks, at the very least for management and for external data (the external network usually already exists in your organization and is the one where all your clients are).
If you use VLANs in your infrastructure, never put ports in trunking mode; if you do not use VLANs, physically separated networks will be needed.
The following diagram is an example of how to implement it:
Here, the management, tenant, and external networks could be either VLANs or real (physically separate) networks. Remember that to avoid VLAN trunking, you need at least as many physical ports as VLANs, and each machine has to be connected to each network through its own port, because port trunking can be a huge security hole.
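The two rules above can be checked mechanically against an inventory of host ports. The following Python sketch assumes a hypothetical inventory format (a mapping from each physical port to the list of networks it carries) and a hypothetical set of required networks; it flags trunked ports (a port carrying more than one network) and hosts that lack a dedicated port for a required network. It is not an OpenStack API, just an illustration of the rules.

```python
from typing import Dict, List

# Networks each host is assumed to need; adjust to your own design.
REQUIRED_NETWORKS = {"management", "tenant", "external"}


def audit_host(host: str, ports: Dict[str, List[str]]) -> List[str]:
    """Return rule violations for one host.

    `ports` maps a physical port name (such as "eth0") to the networks
    (VLANs) it carries. A port carrying more than one network is trunked.
    """
    problems: List[str] = []
    for port, networks in sorted(ports.items()):
        if len(networks) > 1:
            problems.append(f"{host}:{port} is trunked ({', '.join(networks)})")
    # Only networks that have a port of their own count as properly connected.
    dedicated = {nets[0] for nets in ports.values() if len(nets) == 1}
    for net in sorted(REQUIRED_NETWORKS - dedicated):
        problems.append(f"{host} has no dedicated port for {net}")
    return problems


if __name__ == "__main__":
    good = {"eth0": ["management"], "eth1": ["tenant"], "eth2": ["external"]}
    bad = {"eth0": ["management"], "eth1": ["tenant", "external"]}
    print(audit_host("node1", good))  # []
    for problem in audit_host("node2", bad):
        print(problem)
```

A host passes only when it has at least as many physical ports as networks, which is exactly the port-count requirement stated in the text.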
A management network is needed for administrators to manage the machines and for the OpenStack services to talk to each other. This network is critical, since it may carry sensitive data; for this reason, it has to be disconnected from other networks or, if that is not possible, have very limited connectivity. The external network is used by virtual machines to access the Internet (and by the Internet to reach them). In this network, all machines need an IP address reachable from the Internet. The tenant network, sometimes also called the internal or guest network, is the network where virtual machines communicate with other virtual machines in the same cloud. In some deployments this network is merged with the external network, but this choice has some security drawbacks. The API network is used to expose the OpenStack APIs to users. This network requires IP addresses reachable from the Internet, and for this reason it is often merged into the external network. In some cases, provider networks are needed to connect tenant networks to existing networks outside the OpenStack cluster. These networks are created by the OpenStack administrator and map directly to an existing physical network in the data center.

Summary

In this article, we have seen how networking works, which attacks we can expect, and how we can counter them. We have also seen how to implement a secure deployment of OpenStack Networking.

Resources for Article:
Further resources on this subject:
Cloud distribution points [Article]
Photo Stream with iCloud [Article]
Integrating Accumulo into Various Cloud Platforms [Article]

Packt
10 Aug 2015
25 min read

Understanding Hadoop Backup and Recovery Needs

In this article by Gaurav Barot, Chintan Mehta, and Amij Patel, authors of the book Hadoop Backup and Recovery Solutions, we will discuss backup and recovery needs. In the present age of information explosion, data is the backbone of business organizations of all sizes. We need a complete data backup and recovery system, and a strategy to ensure that critical data is available and accessible when the organization needs it. Data must be protected against loss, damage, theft, and unauthorized changes. If disaster strikes, data recovery must be swift and smooth so that the business is not impacted. Every organization has its own data backup and recovery needs and priorities, based on the applications and systems it uses. Today's IT organizations face the challenge of implementing reliable backup and recovery solutions in the most efficient, cost-effective manner. To meet this challenge, we need to carefully define our business requirements and recovery objectives before deciding on the right backup and recovery strategies or technologies to deploy.

Before jumping into the implementation approach, we first need to understand backup and recovery strategies and how to plan them efficiently.

Understanding the backup and recovery philosophies

Backup and recovery is becoming more challenging and complicated, especially with today's explosive data growth and increasing need for data security. Think of big players such as Facebook, Yahoo! (the first to implement Hadoop), and eBay, and how challenging it is for them to handle unprecedented volumes and velocities of unstructured data, which traditional relational databases cannot handle. To emphasize the importance of backup, let's take a look at a study conducted in 2009. This was a time when Hadoop was still evolving and a handful of bugs still existed in it. Yahoo!
had about 20,000 nodes running Apache Hadoop in 10 different clusters. HDFS lost only 650 blocks out of 329 million total blocks. Now hold on a second: these blocks were lost due to bugs in the Hadoop package itself. So imagine what the scenario would be now; I am sure you would bet on losing hardly a block.

As a backup manager, your utmost target is to think through, strategize, and execute a foolproof backup strategy capable of retrieving data after any disaster. Simply put, the plan is to protect the files in HDFS against disasters and restore them to their normal state, just as James Bond comes back after countless blows and near-death situations. Coming back to the backup manager's role, the following are its activities:
Testing various scenarios to forestall future threats
Building a stable recovery point and setup for backup and recovery situations
Preplanning and daily organization of the backup schedule
Constantly supervising the backup and recovery process and averting threats
Repairing and building solutions for backup processes
The ability to reheal, that is, to recover from data threats if they arise (the resurrection power)
Data protection, which includes maintaining data replicas for long-term storage
Moving data from one location to another
Basically, backup and recovery strategies should cover all the areas mentioned here. For any system's data, applications, or configuration, transaction logs are mission critical, although the backup and recovery strategy ultimately depends on the datasets, configurations, and applications in use.

Hadoop is all about big data processing. After gathering some exabytes of data for processing, the following are the obvious questions we may come up with:
What's the best way to back up data?
Do we really need to take a backup of these large chunks of data?
Where will we find more storage space if the current storage space runs out?
Will we have to maintain distributed systems?
What if our backup storage unit gets corrupted?
The answer to these questions depends on the situation you are facing; let's look at a few situations. In one situation, you may be dealing with a plethora of data: Hadoop is used for fact finding and data is abundant. Here, the data is short lived, and its important sources are already backed up. In such a scenario, a policy of not backing up data at all is feasible, as there are already three copies (replicas) on our DataNodes (HDFS). Even so, since Hadoop is still vulnerable to human error, backups of the configuration files and the NameNode metadata (dfs.name.dir) should be created.

In another situation, the data center on which Hadoop runs crashes and the data becomes unavailable, which means we cannot reach mission-critical data. A possible solution here is to back up Hadoop like any other cluster, using Hadoop's own tooling (the hadoop command).

Replication of data using DistCp

To replicate data, the distcp command copies data from one cluster to another. Let's look at the distcp command and a few of its options through some examples. DistCp is a handy tool for large inter- and intra-cluster copying. It expands a list of files and directories into input for map tasks, each of which copies a partition of the files specified in the source list. The most common use case of distcp is intercluster copying. Let's see an example:
bash$ hadoop distcp2 hdfs://ka-16:8020/parth/ghiya hdfs://ka-001:8020/knowarth/parth
This command expands the namespace under /parth/ghiya on the ka-16 NameNode into a temporary file, partitions its contents among a set of map tasks, and starts the copy on each TaskTracker, from ka-16 to ka-001.
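To make the step of dividing the file list among map tasks concrete, here is a simplified Python model of splitting a copy listing across a fixed number of maps so that each map copies roughly the same number of bytes. DistCp's default uniformsize strategy also aims for balanced bytes per map, but the greedy largest-first assignment below is a swapped-in approximation, not DistCp's actual splitting code; the file names and sizes are made up.

```python
import heapq
from typing import Dict, List, Tuple


def split_listing(files: Dict[str, int], num_maps: int) -> List[List[str]]:
    """Assign files to `num_maps` map tasks, balancing total bytes per map.

    Greedy heuristic: take files largest-first and give each one to the
    map that currently has the fewest bytes assigned.
    """
    if num_maps < 1:
        raise ValueError("need at least one map task")
    # Min-heap of (bytes assigned so far, map index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_maps)]
    assignments: List[List[str]] = [[] for _ in range(num_maps)]
    for path, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        assignments[idx].append(path)
        heapq.heappush(heap, (load + size, idx))
    return assignments


if __name__ == "__main__":
    listing = {  # hypothetical sizes in bytes
        "/parth/ghiya/a.log": 900,
        "/parth/ghiya/b.log": 600,
        "/parth/ghiya/c.log": 500,
        "/parth/ghiya/d.log": 400,
    }
    for i, paths in enumerate(split_listing(listing, 2)):
        print(f"map {i}: {paths}")
```

With two maps, the four files end up as 1,300 and 1,100 bytes per map, so no single map task becomes the bottleneck of the copy.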
The command used for copying can be generalized as follows:
hadoop distcp2 hftp://namenode-location:50070/basePath hdfs://namenode-location
Here, hftp://namenode-location:50070/basePath is the source and hdfs://namenode-location is the destination. In the preceding command, namenode-location refers to the NameNode's hostname and 50070 is the NameNode's HTTP server port.

Updating and overwriting using DistCp

The -update option copies only those source files that do not exist on the target or whose contents differ from the target copy, leaving the files already at the target in place. The -overwrite option overwrites target files unconditionally, even if they already exist at the target. Either behavior is enabled by adding -update or -overwrite to the command. In the example, we used distcp2, which is an advanced version of DistCp; the process works just as well with the distcp command. Now, let's look at the differences between the two versions of DistCp, the legacy DistCp (or just DistCp) and the new DistCp (or DistCp2):
During an intercluster copy, files that are skipped keep all their file attributes (permissions, owner and group information, and so on) unchanged with legacy DistCp. This is not the case with the new DistCp: these values are updated even if a file is skipped.
Empty root directories among the source inputs were not created in the target folder by legacy DistCp; the new DistCp does create them.

There is a common misconception that Hadoop protects against data loss and that, therefore, we don't need to back up the data in a Hadoop cluster. Since Hadoop replicates data three times by default, this sounds like a safe statement; however, it is not 100 percent safe. While Hadoop protects against hardware failure on the DataNodes (if one entire node goes down, you will not lose any data), there are other ways in which data loss may occur.
Data loss may occur for various reasons, such as human error (to which Hadoop is highly susceptible), corrupted data writes, accidental deletions, rack failures, and many similar incidents. Consider an example where a corrupt application destroys all replicas of the data: it checks each replica and, on not finding a match, deletes it. User deletions are another way data can be lost, as Hadoop's trash mechanism is not enabled by default. Also, one of the most complicated and expensive aspects of protecting data in Hadoop is the disaster recovery plan. There are many different approaches to it, and choosing the right one requires balancing cost, complexity, and recovery time.

A real-life scenario is Facebook. The data Facebook holds grew exponentially from 15 TB to 30 PB, which is 3,000 times the size of the Library of Congress. As the data grew, the problem Facebook faced was physically moving machines to a new data center, which required manpower and also interrupted services for a period of time. Since any service requires data to be available within a short period, Facebook started exploring Hadoop. Dealing with such large repositories of data is yet another headache. Hadoop was designed to keep data within neighborhoods of commodity servers with reasonable local storage, and to provide maximum availability of data within each neighborhood. So, a data plan is incomplete without backup and recovery planning: any big data deployment on Hadoop must focus on the ability to recover from a crisis.
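The point about the trash mechanism is easiest to see with a toy model. The following Python sketch mimics, under simplifying assumptions, an HDFS-style trash: with a retention interval of zero (the default, which disables trash), a delete is permanent; with a positive interval, deleted paths are kept and can be restored until the interval expires. The class and method names are hypothetical and real HDFS trash uses checkpoint directories; in a real cluster the interval is the fs.trash.interval property (in minutes) in core-site.xml.

```python
from typing import Dict, Optional, Tuple


class ToyFileSystem:
    """Toy model of deletes with and without a trash retention interval."""

    def __init__(self, trash_interval_min: int = 0):
        # 0 mirrors the default: trash disabled, deletes are permanent.
        self.trash_interval_min = trash_interval_min
        self.files: Dict[str, bytes] = {}
        # path -> (data, minute at which it was deleted)
        self.trash: Dict[str, Tuple[bytes, int]] = {}

    def delete(self, path: str, now_min: int) -> None:
        data = self.files.pop(path)
        if self.trash_interval_min > 0:
            self.trash[path] = (data, now_min)  # recoverable for a while
        # With an interval of 0, the data is simply gone.

    def expunge(self, now_min: int) -> None:
        """Permanently drop trash entries older than the interval."""
        self.trash = {p: (d, t) for p, (d, t) in self.trash.items()
                      if now_min - t < self.trash_interval_min}

    def restore(self, path: str) -> Optional[bytes]:
        """Move a path back out of trash; None if it is unrecoverable."""
        entry = self.trash.pop(path, None)
        if entry is None:
            return None
        self.files[path] = entry[0]
        return entry[0]


if __name__ == "__main__":
    for interval in (0, 1440):  # trash disabled vs. one day of retention
        fs = ToyFileSystem(interval)
        fs.files["/data/report.csv"] = b"rows"
        fs.delete("/data/report.csv", now_min=0)
        fs.expunge(now_min=60)
        print(interval, fs.restore("/data/report.csv"))
```

An accidental delete is recoverable only in the second case, which is why leaving trash disabled turns every user mistake into permanent data loss, no matter how many replicas HDFS keeps.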
The backup philosophy

We need to determine whether Hadoop, the processes and applications that run on top of it (such as Pig and Hive), and specifically the data stored in HDFS are mission critical. If the data center where Hadoop runs disappeared, would the business stop? The key points to take into consideration are explained in the sections that follow; by combining them, we will arrive at the core of the backup philosophy.

Changes since the last backup

The first thing to look at when constructing a backup philosophy is changes. We have a sound application running, and then we make some changes. If the system crashes and we need to go back to the last safe state, our backup strategy should have a clause covering the changes that have been made. These can be either database changes or configuration changes. Our clause should include the following points in order to construct a sound backup strategy:
Changes made since the last backup
The number of files changed
Ensuring that changes are tracked
The possibility of bugs introduced into user applications since the last change, which may make it necessary to go back to the last safe state
If, after new changes are applied on top of the last backup, the application doesn't work as expected, high priority should be given to taking the application back to its last safe state or backup. This ensures that users are not interrupted while using the application or product.

The rate of new data arrival

The next thing to look at is how many changes we are dealing with. Is our application being updated so often that we cannot tell what the last stable version was? Data is being produced at an astonishing rate. Consider Facebook, which alone produces 250 TB of data a day. Data production is growing exponentially; soon, terms such as zettabyte will become commonplace.
Our clause should include the following points in order to construct a sound backup:
The rate at which new data is arriving
The need to back up each and every change
The time between backups of two changes
Policies for keeping reserve backup storage

The size of the cluster

The size of a cluster is yet another important factor: we have to choose a cluster size that lets us optimize the environment for our purpose. Recalling the Yahoo! example, Yahoo! had 10 clusters around the world, covering 20,000 nodes, with most of those nodes in its large clusters. Our clause should include the following points in order to construct a sound backup:
Selecting the right resources to optimize our environment. The right resources vary with need; for instance, users with I/O-intensive workloads will go for more spindles per core. A Hadoop cluster contains four types of roles: NameNode, JobTracker, TaskTracker, and DataNode.
Handling the complexities of optimizing a distributed data center.

Priority of the datasets

The next thing to look at is the new datasets that are arriving. As the rate of new data arrival increases, we always face the dilemma of what to back up. Are we tracking all the changes in the backup? And if we are backing up all the changes, will performance be compromised? Our clause should include the following points in order to construct a sound backup:
Making the right backup of each dataset
Taking backups at a rate that does not compromise performance

Selecting the datasets or parts of datasets

The next thing to look at is what exactly is backed up. When we deal with large chunks of data, there's always a nagging thought: did we miss anything when selecting the datasets, or parts of datasets, that have not been backed up yet?
Our clause should include the following points in order to construct a sound backup:
Backup of the necessary configuration files
Backup of files and application changes

The timeliness of data backups

With such a huge amount of data collected daily (as at Facebook), the interval between backups is yet another important factor. Do we back up our data daily? Every two days? Every three days? Should we back up small chunks of data daily, or larger chunks less often? Our clause should include the following points in order to construct a sound backup:
Dealing with the impact of a long interval between two backups
Monitoring the backup schedule and following it
The frequency of data backups depends on various aspects. Firstly, it depends on the application and its usage. If it is I/O intensive, we may need more frequent backups, as no dataset is worth losing. If it is not so I/O intensive, we can keep the frequency low. We can determine the timeliness of data backups from the following points:
The amount of data that we need to back up
The rate at which new updates are coming
The window of possible data loss, which should be made as small as possible
Critical datasets that need to be backed up
Configuration and permission files that need to be backed up

Reducing the window of possible data loss

The next thing to look at is how to minimize the window of possible data loss. If we back up frequently, what are the chances of data loss? What are our chances of recovering the latest files? Our clause should include the following points in order to construct a sound backup:
The ability to recover the latest files in case of a disaster
A low probability of data loss

Backup consistency

The next thing to look at is backup consistency. The probability of invalid backups should be low or, better yet, zero.
This is because if invalid backups are not tracked, further copies of them will be made, which will again disrupt our backup process. Our clause should include the following points in order to construct a sound backup:
Avoid copying data while it is being changed
Possibly, write a shell script that takes timely backups
Ensure that the shell script is bug-free

Avoiding invalid backups

Let's continue the discussion of invalid backups. As you saw, HDFS makes three copies of our data. What if the original backup was flawed with errors or bugs? Then all three copies will be corrupt, and when we recover from these flawed copies, the result will indeed be a catastrophe. Our clause should include the following points in order to construct a sound backup:
Avoid long intervals between backups
Have the right backup process, ideally an automated shell script
Track unnecessary backups
If our backup clause covers all the preceding points, we are surely on the way to a good backup strategy. A good backup policy covers all these points so that, if a disaster occurs, it can always return to the last stable state.

That's all about backups. Moving on, let's say a disaster occurs and we need to go back to the last stable state. Let's look at the recovery philosophy and the points that make a sound recovery strategy.

The recovery philosophy

After a deadly storm, we always try to recover from its after-effects. Similarly, after a disaster, we try to recover from its effects. In just one moment, storage capacity that was a boon turns into a curse: just another expensive, useless thing. Starting off with the most important question: what is the best recovery philosophy? Obviously, the best philosophy is one where we never have to perform recovery at all. However, there may be scenarios where we need to perform a manual recovery.
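Returning briefly to the backup-consistency clause above, which recommends not copying data while it is being changed and automating backups with a script: a minimal Python sketch of that idea takes a read-only HDFS snapshot and then copies the snapshot, rather than the live directory, to another cluster with DistCp. It assumes Hadoop 2.x or later (where HDFS snapshots exist) and that an administrator has already made the source directory snapshottable with hdfs dfsadmin -allowSnapshot; the paths, cluster address, and naming scheme are placeholders. By default it only prints the commands (a dry run).

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import List


def backup_commands(src_dir: str, dest_uri: str, snapshot: str) -> List[List[str]]:
    """Build the commands for a snapshot-then-copy backup.

    Copying from <dir>/.snapshot/<name> gives DistCp a frozen view, so
    files that change during the copy cannot produce an inconsistent backup.
    """
    frozen = f"{src_dir.rstrip('/')}/.snapshot/{snapshot}"
    return [
        ["hdfs", "dfs", "-createSnapshot", src_dir, snapshot],
        ["hadoop", "distcp", frozen, f"{dest_uri.rstrip('/')}/{snapshot}"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Placeholder source directory and backup cluster; adjust for your site.
    name = "backup-" + datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    cmds = backup_commands("/data/critical", "hdfs://backup-nn:8020/backups", name)
    run(cmds, dry_run="--execute" not in sys.argv)
```

Scheduling this script (for example, from cron) addresses both the timeliness and the consistency clauses: every run produces a dated, point-in-time copy on a separate cluster.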
Let's look at the possible levels of recovery before moving on to recovery in Hadoop:
Recovery to the flawless state
Recovery to the last supervised state
Recovery to a possible past state
Recovery to a sound state
Recovery to a stable state
Obviously, we want the recovered state to be flawless. If that is not achievable, we are willing to compromise a little and allow recovery to a known past state. If that is not possible either, we again compromise a little and allow recovery to the last possible sound state. That's how we deal with recovery: first aim for the best, and if that fails, compromise a little. Just as the saying goes, "The bigger the storm, the more work we have to do to recover," here we can say, "The bigger the disaster, the more intensive the recovery plan we have to follow." So, the recovery philosophy we construct should cover the following points:
An automated setup that detects a crash and restores the system to the last working state, where the application behaves as expected.
The ability to track modified files and copy them.
Tracking the sequence of changes to files, just as an auditor keeps an audit trail.
Merging files that are copied separately.
Keeping multiple version copies to maintain version control.
Handling updates without impacting the application's security and protection.
Deleting the original copy only after carefully inspecting the changed copy.
Accepting new updates only after making sure they are fully functional and will not hinder anything else; if they do, there should be a clause to go back to the last safe state.
Coming back to recovery in Hadoop, the first question we may think of is: what happens when the NameNode goes down?
When the NameNode goes down, the metadata (file owners and permissions, which DataNodes hold each file's blocks, and more) goes down with it, and there is nothing left to route our read/write requests to the DataNodes. Our goal will be to recover the metadata. HDFS provides an efficient way to handle NameNode failures. There are basically two places where we can find metadata: first, the fsimage file, and second, the edit logs. Our clause should include the following points:
Maintain three copies of the NameNode metadata.
When we try to recover, we get four options, namely continue, stop, quit, and always. Choose wisely.
Give preference to saving the safe part of the backups.
If there is an ABORT! error, save the safe state.
Hadoop provides four recovery modes based on these four options (continue, stop, quit, and always):
Continue: This lets you continue past the bad parts. It skips over a few stray blocks and keeps going, trying to produce a full recovery. This can be called the prompt-when-error-found mode.
Stop: This stops the recovery process and makes an image file of what has been recovered so far. The part after the point where we stopped won't be recovered, because we are not allowing it to be. We can call this the safe-recovery mode.
Quit: This exits the recovery process without saving anything at all. We can call this the no-recovery mode.
Always: This goes one step further than continue. It selects continue by default and thus skips any further stray blocks without prompting. This can be called the prompt-only-once mode.
We will look at these in later discussions. Now, you may think that the backup and recovery philosophy is all very well, but wasn't Hadoop designed to handle these failures? Of course it was, but there's always the possibility of a mishap at some level.
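The four choices described above can be illustrated with a toy model of replaying an edit log that contains unreadable entries. This Python sketch is a simulation of the described behaviors only, not the NameNode's implementation; the real prompt appears when the NameNode is started in recovery mode (for example, with hdfs namenode -recover), and the function name and data format here are hypothetical.

```python
from typing import Callable, List, Optional


def replay(edits: List[Optional[str]],
           ask: Callable[[int], str]) -> Optional[List[str]]:
    """Replay edits; None marks an unreadable entry in this toy model.

    At each bad entry the operator is asked for one of the four choices:
      'c' continue: skip this bad entry and keep replaying
      's' stop:     keep everything applied so far and stop here
      'q' quit:     abandon recovery; nothing is saved (returns None)
      'a' always:   skip this and every later bad entry without asking
    """
    applied: List[str] = []
    always = False
    for position, edit in enumerate(edits):
        if edit is not None:
            applied.append(edit)
            continue
        choice = "c" if always else ask(position)
        if choice == "a":
            always = True
        elif choice == "s":
            return applied
        elif choice == "q":
            return None
        elif choice != "c":
            raise ValueError(f"unknown choice {choice!r}")
        # 'c' and 'a' fall through here, skipping the bad entry.
    return applied


if __name__ == "__main__":
    log = ["mkdir /a", None, "create /a/f", None, "delete /tmp/x"]
    for answer in "csqa":
        print(answer, replay(log, lambda pos, a=answer: a))
```

Continue and always recover the same edits; the difference is that always asks only once, while stop keeps only the edits before the first bad entry and quit keeps nothing.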
Are we overconfident, unwilling to take precautions that could protect us, and just entrusting our data blindly to Hadoop? Certainly not. We are going to take every possible preventive step on our side. The next topic looks at exactly why we need preventive measures to back up Hadoop.

Knowing the necessity of backing up Hadoop

Change is the fundamental law of nature. At some point, Hadoop may be upgraded on the present cluster, just as we see system upgrades everywhere. As no upgrade is bug free, existing applications may stop working the way they used to. There may be scenarios where we don't want to lose any data, let alone start HDFS from scratch. This is where backup is useful: a user can go back to a point in time.

Looking at the HDFS replication process, the NameNode handles the client's request to write a file to a DataNode. The DataNode then replicates the block, writing it to another DataNode, which repeats the same process. Thus, we end up with three copies of the same block. How these DataNodes are selected for placing copies of blocks is another issue, which we are going to cover later under Rack awareness, where you will see how to place these copies efficiently to handle situations such as hardware failure. But the bottom line is that when a DataNode is down, there's no need to panic; we still have a copy on a different DataNode.
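The placement and pipeline just described can be sketched in Python. The placement rule below follows the default policy documented for HDFS (first replica on the writer's node when the writer is a DataNode, second on a node in a different rack, third on a different node in that same remote rack); the cluster layout, node names, and function names are hypothetical, and real HDFS also weighs load and free space, which this sketch ignores.

```python
import random
from typing import Dict, List, Tuple


def place_replicas(writer: str, racks: Dict[str, List[str]],
                   rng: random.Random) -> List[str]:
    """Choose three DataNodes for a new block, following the default rule:
    1st replica on the writer's own node (assumed to be a DataNode),
    2nd on a node in a different (remote) rack,
    3rd on a different node in that same remote rack.
    Assumes at least one other rack with two or more nodes.
    """
    rack_of = {node: rack for rack, nodes in racks.items() for node in nodes}
    local_rack = rack_of[writer]
    remote_racks = sorted(r for r, nodes in racks.items()
                          if r != local_rack and len(nodes) >= 2)
    remote = rng.choice(remote_racks)
    second, third = rng.sample(racks[remote], 2)
    return [writer, second, third]


def pipeline_hops(targets: List[str]) -> List[Tuple[str, str]]:
    """The client sends the block only to the first DataNode; each DataNode
    forwards it to the next, so the NameNode never carries block data."""
    senders = ["client"] + targets[:-1]
    return list(zip(senders, targets))


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
    targets = place_replicas("dn1", cluster, random.Random(42))
    print("replicas on:", targets)
    for sender, receiver in pipeline_hops(targets):
        print(f"{sender} -> {receiver}")
```

Because two of the three replicas share a remote rack, losing any single DataNode, or even the whole local rack, still leaves at least one copy of the block.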
This approach gives us various advantages, such as:
Security: Blocks are stored on at least two different DataNodes.
High write capacity: The client writes to only a single DataNode; replication is handled by the DataNodes.
Read options: There are more options for where to read from; the NameNode keeps a record of all the locations of the copies, so reads can be served from the copy closest to the reader.
Block circulation: The client writes only a single copy of a block; the others are produced through the replication pipeline.
During a write, a DataNode receives data from the client and simultaneously passes it on to the next DataNode, so performance is not compromised. Data never passes through the NameNode. The NameNode takes the client's request to write data and processes it by deciding how files are divided into blocks and the replication factor. The following figure shows the replication pipeline, wherein a block of the file is written and three copies are made at different DataNode locations:

After hearing such a foolproof plan and seeing so many advantages, we again arrive at the same question: is there a need for backup in Hadoop? Of course there is. A common mistaken belief is that Hadoop shelters you against data loss, which frees you from taking backups of your Hadoop cluster. By default, Hadoop replicates your data three times. Although reassuring, this does not guarantee foolproof protection against data loss. Hadoop protects your data against hardware failures: if one disk, one node, or even a whole rack goes down, your data will still be preserved. However, there are many scenarios where data loss may occur. Consider a classic human error involving the storage locations that a user provides during operations in Hive.
If the user provides a location where data already exists and then runs a query on that table, all of the existing data will be deleted, whether it is 1 GB or 1 TB. In the following figure, the client issues a read but we have a faulty program. In the course of the read, the NameNode looks up in its metadata the location of the DataNode containing the block. But when the block is read from that DataNode, it does not match the requirements, so the NameNode classifies the block as under-replicated and moves on to the next copy of the block. Oops, the same thing happens again. In this way, all the safe copies of the block end up classified as under-replicated, so HDFS fails and we need some other backup strategy:
When a copy does not match what the NameNode expects, the NameNode discards that copy and replaces it with a fresh one. HDFS replicas are not your one-stop solution for protection against data loss.

The needs for recovery

Now, we need to decide to what level we want to recover. As you saw earlier, there are four modes available, which recover either to a safe copy, to the last possible state, or to no copy at all. You need to take the appropriate steps based on the needs set out in the disaster recovery plan defined earlier. We need to look at the following factors:
The performance impact (is performance compromised?)
How large is the data footprint that the recovery method leaves?
What is the application downtime?
Is there just one backup, or are there incremental backups?
Is it easy to implement?
What is the average recovery time the method provides?
Based on these aspects, we will decide which recovery methods we need to implement. The following methods are available in Hadoop:
Snapshots: Snapshots capture a moment in time and allow you to go back to that recovery state.
Replication: This involves copying data from one cluster to another cluster located away from the first, so that if one cluster fails, the other is not affected.
Manual recovery: Probably the most brutal method is moving data manually from one cluster to another. Its downsides are clearly a large footprint and long application downtime.
API: There is always the option of custom development using the public API.
We will now move on to the recovery areas in Hadoop.

Understanding recovery areas

Recovering data after a disaster requires a well-defined business disaster recovery plan. So, the first step is to decide our business requirements, which define the need for data availability, data precision, and the application's uptime and downtime requirements. Any disaster recovery policy should cover the areas required by the disaster recovery principles. Recovery areas are those parts without which an application cannot return to its normal state. If you are armed with the proper information, you will be able to decide which areas to recover first. Recovery areas cover the following core components:
Datasets
NameNodes
Applications
Database sets in HBase
Let's go back to the Facebook example. Facebook uses a customized version of MySQL for its home page and other features, but for Facebook Messenger it uses HBase, the NoSQL database that runs on Hadoop. From that point of view, Facebook has both of these in its recovery areas and needs different steps to recover each of them.

Summary

In this article, we went through the backup and recovery philosophy and the points a good backup philosophy should cover. We went through what a recovery philosophy consists of. We saw the modes available for recovery in Hadoop. Then, we looked at why backup is important even though HDFS provides replication.
Lastly, we looked at the recovery needs and areas. Quite a journey, wasn't it? Well, hold on tight. These are just your first steps into Hadoop User Group (HUG).
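The earlier point that HDFS replicas are not a backup can be illustrated with a toy model. The following Java code is a hypothetical, in-memory simulation, not the HDFS API: the classes ToyReplicatedBlock and ReplicaVsSnapshotDemo and all their methods are invented for illustration. It assumes that every write reaches every replica, so a faulty write (here, wiping the data) lands on all copies, while a snapshot taken beforehand keeps the earlier state and can be used to restore it. On a real cluster, snapshots are managed with commands such as hdfs dfsadmin -allowSnapshot and hdfs dfs -createSnapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (not the HDFS API): one block stored as several replicas.
class ToyReplicatedBlock {
    private final List<StringBuilder> replicas = new ArrayList<>();
    private final List<String> snapshots = new ArrayList<>();

    ToyReplicatedBlock(String initialData, int replicationFactor) {
        for (int i = 0; i < replicationFactor; i++) {
            replicas.add(new StringBuilder(initialData));
        }
    }

    // Replication copies every write to every replica, whether the data is good or bad.
    void write(String data) {
        for (StringBuilder replica : replicas) {
            replica.setLength(0);
            replica.append(data);
        }
    }

    // A snapshot stores a copy of the current contents that later writes do not change.
    int snapshot() {
        snapshots.add(replicas.get(0).toString());
        return snapshots.size() - 1;
    }

    // Restoring rewrites every replica from the chosen point-in-time copy.
    void restore(int snapshotId) {
        write(snapshots.get(snapshotId));
    }

    String read() {
        return replicas.get(0).toString();
    }

    // True if every replica currently holds exactly the expected contents.
    boolean allReplicasEqual(String expected) {
        for (StringBuilder replica : replicas) {
            if (!replica.toString().equals(expected)) {
                return false;
            }
        }
        return true;
    }
}

class ReplicaVsSnapshotDemo {
    public static void main(String[] args) {
        ToyReplicatedBlock block = new ToyReplicatedBlock("good data", 3);
        int beforeBug = block.snapshot();  // point-in-time copy taken before the faulty job runs
        block.write("");                   // the faulty program wipes the data
        System.out.println(block.allReplicasEqual(""));  // true: all three replicas are now empty
        block.restore(beforeBug);
        System.out.println(block.read());  // good data
    }
}
```

Running ReplicaVsSnapshotDemo prints true (all three replicas hold the faulty write) followed by good data (the state recovered from the snapshot). The point of the sketch is that replication protects against losing a disk or a node, not against a bad write, which is why a point-in-time copy is still needed.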
Packt
10 Aug 2015
4 min read

Sending and Syncing Data

This article, by Steven F. Daniel, author of the book Android Wearable Programming, provides the background you need to build applications that communicate between an Android handheld device and an Android wearable. Android Wear comes with a number of APIs that make communication between the handheld and the wearable a breeze. We will learn the differences between MessageApi, which is sometimes referred to as a "fire and forget" type of message; DataApi, which supports syncing data between a handheld and a wearable; and NodeApi, which handles events related to the local and connected device nodes.

Creating a wearable send and receive application

In this section, we will look at how to create an Android wearable application that sends an image and a message and displays them on our wearable device. In the next sections, we will look at the steps required to send data to the Android wearable using the DataApi, NodeApi, and MessageApi. First, create a new project in Android Studio by following these simple steps:

Launch Android Studio, and then click on the File | New Project menu option.
Next, enter SendReceiveData for the Application name field.
Then, provide the name for the Company Domain field.
Now, choose Project location and select where you would like to save your application code.
Click on the Next button to proceed to the next step.
Next, we need to specify the form factors (phone/tablet and Android Wear) on which our application will run. On this screen, we need to choose the minimum SDK version for our phone/tablet and Android Wear.
Click on the Phone and Tablet option and choose API 19: Android 4.4 (KitKat) for Minimum SDK.
Click on the Wear option and choose API 21: Android 5.0 (Lollipop) for Minimum SDK.
Click on the Next button to proceed to the next step.
In the next step, we need to add a Blank Activity to our application project for the mobile section of our app. From the Add an activity to Mobile screen, choose the Add Blank Activity option from the list of activities shown and click on the Next button to proceed to the next step.
Next, we need to customize the properties for Blank Activity so that it can be used by our application. Here we need to specify the name of our activity, the layout information, the title, and the menu resource file. From the Customize the Activity screen, enter MobileActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
In the next step, we need to add an activity to our application project for the Android wearable section of our app. From the Add an activity to Wear screen, choose the Blank Wear Activity option from the list of activities shown and click on the Next button to proceed to the next step.
Next, we need to customize the properties for Blank Wear Activity so that our Android wearable can use it. Here we need to specify the name of our activity and the layout information. From the Customize the Activity screen, enter WearActivity for Activity Name and click on the Next button to proceed to the next step in the wizard.
Finally, click on the Finish button. The wizard will generate your project, and after a few moments, the Android Studio window will appear with your project displayed.

Summary

In this article, we learned about three new APIs, DataApi, NodeApi, and MessageApi, and how we can use them and their associated methods to transmit information between the handheld mobile and the wearable. If, for whatever reason, the connected wearable node gets disconnected from the paired handheld device, the DataApi class is smart enough to try sending again automatically once the connection is re-established.
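The difference between "fire and forget" messages and synced data items can be sketched with a toy model. The following Java code is a hypothetical simulation, not the Wearable API: ToyWearableLink, WearSyncDemo, their methods, and the paths /message and /image are invented for illustration. In Google Play services, the corresponding real calls are Wearable.MessageApi.sendMessage(...) and Wearable.DataApi.putDataItem(...); the sketch is a simplification that models only the behavior described in this article, where a message sent while the node is disconnected is lost, while a data item is kept and delivered once the connection is re-established.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a handheld-to-wearable link (not the Wearable API).
class ToyWearableLink {
    private boolean connected = true;
    private final Map<String, String> pendingData = new HashMap<>();
    final List<String> deliveredMessages = new ArrayList<>();
    final Map<String, String> wearableData = new HashMap<>();

    // MessageApi-style "fire and forget": delivered only if the node is connected right now.
    boolean sendMessage(String path, String payload) {
        if (!connected) {
            return false;  // dropped; nothing is queued for later
        }
        deliveredMessages.add(path + ":" + payload);
        return true;
    }

    // DataApi-style sync: the latest value per path is kept until it reaches the wearable.
    void putDataItem(String path, String value) {
        pendingData.put(path, value);
        if (connected) {
            syncPending();
        }
    }

    // Simulates the node dropping off or reconnecting; pending data is sent on reconnect.
    void setConnected(boolean isConnected) {
        connected = isConnected;
        if (connected) {
            syncPending();
        }
    }

    private void syncPending() {
        wearableData.putAll(pendingData);
        pendingData.clear();
    }
}

class WearSyncDemo {
    public static void main(String[] args) {
        ToyWearableLink link = new ToyWearableLink();
        link.setConnected(false);                                   // wearable goes out of range
        System.out.println(link.sendMessage("/message", "hello"));  // false: message is lost
        link.putDataItem("/image", "photo-v1");                     // queued while disconnected
        link.setConnected(true);                                    // connection re-established
        System.out.println(link.deliveredMessages.size());          // 0
        System.out.println(link.wearableData.get("/image"));        // photo-v1
    }
}
```

Running WearSyncDemo prints false, 0, and photo-v1: the message sent while disconnected is gone, whereas the data item reaches the wearable after reconnection. Keeping only the latest value per path reflects the idea of syncing state rather than replaying every event; treat it as a simplifying assumption of this sketch.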

Packt
10 Aug 2015
8 min read

Hands-on with Prezi Mechanics

In this article by J.J. Sylvia IV, author of the book Mastering Prezi for Business Presentations - Second Edition, we will see how to edit lines and use styled symbols. We will also look at the Grouping feature and get a brief introduction to the Prezi text editor.

Editing lines

When editing lines or arrows, you can change them from straight to curved by dragging the center point in any direction. This is extremely useful when creating the line drawings we saw earlier. It's also useful for getting arrows to point at various objects on your canvas.

Styled symbols

If you're on a tight deadline, or creating drawings with shapes simply isn't for you, then the styled symbols available in Prezi may be of more interest to you. These are common symbols that Prezi has created in a few different styles, and they can be easily inserted into any of your presentations. You can select them from the same Symbols & shapes… option in the Insert menu where we found the symbols. You'll see several different styles to choose from on the right-hand side of your screen. Each of these categories has similar symbols, but styled differently. There is a wide variety of symbols available, ranging from people to social media logos. You can pick a style that best matches your theme or the atmosphere you've created for your presentation. Instead of creating your own person from shapes, you can select from a variety of people symbols.

Although these symbols can be very handy, be aware that you can't edit them as part of your presentation. If you decide to use one, note that it will work as it is; there are no new hairstyles for these symbols.

Highlighter

The highlighter tool is extremely useful for pointing out key pieces of information, such as an interesting fact. To use it, navigate to the Insert menu and select the Highlighter option. Then, just drag the cursor across the text you'd like to highlight.
Once you've done this, the highlighter marks become objects in their own right, so you can click on them to change their size or position just as you would for a shape. To change the color of your highlighter, you will need to go into the Theme Wizard and edit the RGB values. We'll cover how to do this later when we discuss branding.

Grouping

Grouping is a great feature that allows you to move or edit several different elements of your presentation at once. This can be especially useful if you're trying to reorganize the layout of your Prezi after it's been created, or to add animations to several elements at once. Let's go back to the drawing we created earlier to see how this might work.

The first way to group items is to hold down the Ctrl key (Command on Mac OS) and left-click on each element you want to group. In this case, I need to click on each individual line that makes up the flat-top hair in the preceding image. This might be necessary if I only want to group the hair, for example.

Another method for grouping is to hold down the Shift key while dragging your mouse to select multiple items at once. In the preceding screenshot, I've selected my entire person at once. Now, I can easily rotate, resize, or move the entire person without having to move each individual line or shape.

If you select a group of objects, move them, and then realize that a piece is missing because it didn't get selected, just press Ctrl+Z (Command+Z on Mac OS) to undo the move. Then, broaden your selection and try again. Alternatively, you can hold down the Shift key and simply click on the piece you missed to add it to the group.

If we want to keep these elements grouped together instead of having to reselect them each time we make a change, we can click on the Group button that appears with this selection.
Now these items will stay grouped unless we click on the new Ungroup button, which is located where the Group button previously was.

You can also use frames to group material together. If you already created frames as part of your layout, this might make the grouping process even easier.

Prezi text editor

Over the years, the Prezi text editor has evolved to be quite robust, and it's now possible to do all of your text editing directly within Prezi.

Spell checker

When you misspell a word, Prezi will underline the word it doesn't recognize with a red line, just as you would see in Microsoft Word or any other text editor. To correct the word, simply right-click on it (or Command + Click on Mac OS) and select the word you meant to type from the suggestions, as shown in the following screenshot.

The text drag-apart feature

So a colleague has just e-mailed you the text that they want to appear in the Prezi you're designing for them? That's great news, as it'll help you understand the flow of the presentation. What's frustrating, though, is that you'll have to copy and paste every single line or paragraph to put it in the right place on your canvas. At least, that used to be the case before Prezi introduced the drag-apart feature in the text editor. You can now easily drag a selection of text anywhere on your canvas without having to rely on copy and paste. Let's see how we can easily change the text we spellchecked previously, as shown in the following screenshot.

To drag your text apart, simply highlight the area you require, hold the mouse button down, and then drag the text anywhere on your canvas. Once you have separated your text, you can edit the separate parts as you would edit any other individual object on your canvas.
In this example, we can change the size of the company name and leave the other text as it is, which we couldn't do within a single textbox.

Building Prezis for colleagues

If you've kindly offered to build a Prezi for one of your colleagues, ask them to supply the text for it in Word format. You'll be able to run a spellcheck on it there before you copy and paste it into Prezi. Any misspellings you miss will also be highlighted on your Prezi canvas, but it's good to use both options as a safety net.

Font colors

Other than dragging text apart to make it stand out more on its own, you might want to highlight certain words so that they jump out at your audience even more. The great news is that you can now highlight individual lines of text or single words and change their color. To do so, just highlight a word by clicking and dragging your mouse across it. Then, click on the color picker at the top of the textbox to see the color menu, as shown in the following screenshot.

Select any of the colors available in the palette to change the color of that piece of text. Nothing else in the textbox will be affected apart from the text you have selected. This gives you much greater freedom to use colored text in your Prezi design, and doesn't leave you as restricted as in older versions of the software.

Choose the right color

To make good use of this feature, we recommend that you use a color that contrasts strongly with the rest of your design. For example, if your design and corporate colors are blue, we suggest you use red or purple to highlight key words. Also, once you pick a color, stick to it throughout the presentation so that your audience knows when they are seeing a key piece of information.

Bullet points and indents

Bullets and indents make it much easier to put together your business presentations and help give the audience short, simple information in the same format they're used to seeing in other presentations.
You can add them by simply selecting the main body of text and clicking on the bullet point icon at the top of the textbox. This is a really simple feature, but a useful one nonetheless. We'd like to point out, though, that too much text in any presentation is a bad thing. Keep it short and to the point. Also, remember that too many bullets can kill a presentation.

Summary

In this article, we discussed the basic mechanics of Prezi. Learning to combine these tools in creative ways will help you move from Prezi novice to master. Shapes can be used creatively to create content and drawings, and can be grouped together for easy movement and editing. Prezi also features basic text editing, which we explored in this article.