
How-To Tutorials


Animations

Packt
16 Feb 2015
8 min read
In this article, by Bruce Graham, author of the book Power Up Your PowToon Studio Project, we'll look at bringing your PowToon alive with "movement". There are three ways to do this. First, you can animate PowToon Studio objects (making things move). Second, you can use animations (objects that have been built containing moving elements within the PowToon tool, or imported objects that do the same). Third, you can animate the way in which one slide moves to another, in what are known as transitions. (For more resources related to this topic, see here.)

The distinction I have made between "animating" and "animations" is perhaps a bit arbitrary; however, it is useful in this instance because PowToon has given us a huge advantage by providing:

Some interesting/unique ways to bring objects in and out of play (animating objects)
A huge number of moving objects (animated), which means we do not need to be animators (although you can import animated .swf images)
Some unique and fun ways to move between slides in the form of transitions

Once again, beware! All of these can be overdone, and you should use them with intent and caution. When you are creating a PowToon Studio project, try to think of it as a coherent whole, not just a collection of linked slides. This will help you use an appropriate amount of animation. This is where storyboarding comes into its own: it reduces the chance that you will create a project and then have to go back and rework it when you realize that the effects have become more important than the message, instead of being there to support the message. If I want a project to feel "very enthusiastic", I make the slides shorter, use the Pow animation more (we'll discuss this later), and use transitions that give a fast "feel" to the project.

Let's begin by looking at each one of these and what is available, and then we'll look at some of the creative ways in which you can use them.

Animating objects

Animations can be added to the start and end of an object's timeline (and also during the middle using the Split feature, which we'll look at later). Let's imagine that we introduced a ghost image onto our stage, as shown here:

You can set up animating on the timeline, or you can use the Enter and Exit icons at the left-hand side of the window. Both of these areas provide exactly the same functionality. In the preceding screenshot, you will see that the object timeline starts at 0 seconds and ends at 10 seconds. In this case, neither of the animations will actually be displayed, as the object is using up all of the available timeline and the animations take 0.5 seconds to display. If you want the animation to actually be displayed, grab hold of the vertical bars and reduce the length of time that the object is on the timeline. When you can see the "sloping lines", you will see the animation. You cannot change the length of time that an animation takes to display/execute; it is always 0.5 seconds.

Here, you can see that the enter animation ("Pop") starts at 0.5 seconds and the exit animation (again "Pop") starts at 9 seconds. The timing of the animation is represented by the sloping line. You might need to adjust your object timing/animation settings in order to get the precise effect that you want.

OK, so what animations are available for object Enter and Exit? Currently, there are seven animations, namely Fade, Pop, Up, Down, Right, Left, and No Effect, as shown in the following screenshot. You can use any mix of them on any object for Enter and Exit.
There is also the Hand option, which we will cover shortly. All of these are self-explanatory, except Pop. For the Enter variant, the image appears and grows from its center, becomes slightly larger than the original image you are using, and then bounces back to the original size. For the Exit version, the image momentarily gets larger and then disappears into its center point.

Once again, select animation effects to suit your audience and/or your message. Having several props all "popping" onto the stage at the same time, or in quick succession, might look fun, but it might also detract from your message. It is all about the message, so use the animations wisely.

Very often, you will see examples of PowToon Studio projects where animated objects "cross over" each other. For example, an object will come on from the left-hand side and stop. Then, another object will come in from the left-hand side, crossing over the first object. Visually, and from a design perspective, that can look quite ugly, so use an animation that prevents this (perhaps have the second image fly in from the bottom of the slide).

Any prop or image will have the Pop animation applied by default for both Enter and Exit. Any text that you insert will have the Fade animation applied as the default Enter and Exit animation. You might, of course, want to change these.

The Hand option

The Up, Down, Left, Right, and Pop animations all offer the option of adding a Hand. I think it's best used with the first four options; I don't think it makes visual "sense" to use it with the Pop animation. The Hand option can be added in the following two ways:

Firstly, click on the animated object and then click on the square animation icons at the start and end of the timeline bar. You can then toggle the Hand option on and off, as shown here:
Secondly, highlight the animated object and toggle the Hand option on and off at the bottom-left corner of the stage:

By default, the Hand option adds a male hand with a blue shirtsleeve. Currently, this is the only option, so although it adds some interest and variation to the animations, it should be used only occasionally. I tend to use this effect when I want to place a heading at the top or at the bottom of the page. The shortest and "cleanest" route is to use the Down animation, and sometimes adding the hand can be very effective. You will notice in this example that the distance is short and the blue shirt is not showing; the textbox is just being nudged in using the fingertips. This way, the hand could almost be male or female because it appears and then exits so quickly. I prefer to use the Hand effect this way, as there is currently only the one option available.

Here is the same image with the text in a position where the shirt sleeve shows. There is a lot of area where you can place things so that the shirt does not show, and for me, it's worth taking a little time to ensure that only the fingers show.

Scrubbing a slide to view the animation

When you have inserted an animation, you can play the slide from the start (using the Play button) to view it; however, there is another option to quickly see whether the animation looks good, usually called "scrubbing", as shown here:

Grab hold of the red slider within the sloping "animation zone", and as you move it from left to right, you will see the animation move. Here, you can see it is a text item being animated in from the left-hand side.
As you move the slider, you will see any animations that you have inserted on this slide. It is an excellent and quick technique to review single or multiple animations, just to check that they work in the way that you want, are synchronized correctly, and look right from the visual and design perspectives.

One concept that's fun to play around with is animating multiple objects at once, for example, to clear the screen when you have finished with one concept. Perhaps try making an object enter with the Pop effect in the center of the screen, while making other objects exit to Top, Bottom, Right, and Left as it appears.

Summary

In this article, you learned to add movement to your project by using objects that move by design, by adding movement to objects as they move on and off the stage, and by adding animations between slides (known as transitions).

Resources for Article:

Further resources on this subject:
Turning your PowerPoint presentation into a Prezi [article]
Getting Started with Impressive Presentations [article]
Using Prezi - The Online Presentation Software Tool [article]


Exploring the Nmap Scripting Engine API and Libraries

Packt
13 Feb 2015
24 min read
This article written by Paulino Calderón Pale, the author of Mastering the Nmap Scripting Engine, teaches us about the usage of the most important NSE libraries. This article explores the Nmap Scripting Engine API. (For more resources related to this topic, see here.) The NSE API and libraries allow developers to obtain host and port information, including versions of services, and perform a wide range of tasks when scanning networks with Nmap. As in any other programming language or framework, NSE libraries separate and refactor code that will likely be helpful for other NSE scripts. Tasks such as creating a network socket connection, storing valid credentials, or reading script arguments from the command line are commonly handled by these libraries. Nmap currently distributes 107 NSE libraries officially to communicate with the most popular protocols, perform common string handling operations, and even provide implementation classes such as the brute library, which provides a Driver class to quickly write your own password-auditing scripts. This article covers the following topics: Understanding the structure of an NSE script Exploring the Nmap API and libraries Sharing information between scripts with the NSE registry Writing your own NSE libraries Expanding the functionality of NSE libraries After finishing this article, you will understand what information can be accessed through the Nmap API and how to update this information to reflect script results. My goal is to get you familiar with some of the most popular NSE libraries and teach you how to expand their functionality if needed. Understanding the structure of an NSE script An NSE script requires at least the following fields: Description: This description is read by the --script-help Nmap option and is used in the documentation. Categories: This field defines the script category used when selecting scripts. For a list of available categories, see Appendix C, Script Categories. Action: This is the main function of the NSE script that gets executed on selection. Execution rule: This defines when the script is going to run. Other NSE script fields Other available fields describe topics such as licensing, dependencies, and categories. These fields are optional, but I highly encourage you to add them to improve the quality of your script's documentation. Author This field gives credits to the authors of the scripts who share their work with the community. It is acceptable to include e-mail addresses. License Developers are free to use whatever license they prefer but, if they would like to share their scripts and include them with official releases, they must use either Nmap's licenses or licenses of the Berkeley Software Distribution (BSD) style. The documentation describing Nmap's license can be found at http://nmap.org/book/man-legal.html#nmap-copyright. Dependencies This field describes the possible dependencies between NSE scripts. This is useful when scripts require to be run in a specific order so that they can use the output of a previous script in another script. The scripts listed in the dependencies field will not run automatically, and they still require to be selected to run. A sample NSE script A simple NSE script looks like the following: description = [[Detailed description goes here]]--- -- @output -- Some sample output   author = "Paulino Calderon <calderon@websec.mx>" license = "Same as Nmap--See http://nmap.org/book/man-legal.html" categories = {"discovery", "safe"}   -- Script is executed for any TCP port. 
portrule = function( host, port )   return port.protocol == "tcp" end   --- main function action = function( host, port )   … end   Exploring environment variables There are a few environment variables that you need to consider when writing scripts because they will be helpful: SCRIPT_PATH: This returns the absolute path of the running script SCRIPT_NAME: This returns the running script name SCRIPT_TYPE: This returns "prerule", "hostrule", "portrule", or "postrule" Use the SCRIPT_NAME environment variable instead of hardcoding the name of your script. This way, you won't need to update the script if you end up changing its name. For example, you could use it to read script arguments as follows: local arg1 = stdnse.get_script_args(SCRIPT_NAME..".arg1") The stdnse library will be explored later in this article. This library contains the get_script_args() function that can be used to read script arguments. Accessing the Nmap API This is the core API that allows scripts to obtain host and port information such as name resolution, state, version detection results, Mac address, and more (if available). It also provides the interface to Nsock, Nmap's socket library. NSE arguments The arguments passed to the main action function consist of two Lua tables corresponding to host and port information. The amount of information available depends on the options used during the scans. For example, the host.os table will show nothing if the OS detection mode (-O) was not set. Host table The host table is a regular Lua table with the following fields: host.os: This is the table containing OS matches (only available with OS detection) host.ip: This is the IP address of the target host.name: This is the reverse DNS name of the target (if available) host.targetname: This is the hostname specified in the command line host.directly_connected: This is a Boolean that indicates whether the target is on the same network segment host.mac_addr: This is the Mac address of the target host.mac_addr_next_hop: This is the Mac address of the first hop to the target host.mac_addr_src: This is the Mac address of our client host.interface_mtu: This is the MTU value of your network interface host.bin_ip: This is the target IP address as a 4-byte and 16-byte string for IPv4 and Ipv6, respectively host.bin_ip_src: This is our client's IP address as a 4-byte and 16-byte string for IPv4 and Ipv6, respectively host.times: This is the timing data of the target host.traceroute: This is only available with --traceroute Port table The port table is stored as a Lua table and it may contain the following fields: port.number: This is the number of the target port. port.protocol: This is the protocol of the target port. It could be tcp or udp. port.service: This is the service name detected via port matching or with service detection (-sV). port.version: This is the table containing the version information discovered by the service detection scan. The table contains fields such as name, name_confidence, product, version, extrainfo, hostname, ostype, devicetype, service_tunnel, service_ftp, and cpe code. port.state: This returns information about the state of the port. Exception handling in NSE scripts The exception handling mechanism in NSE was designed to help with networking I/O tasks. It works in a pretty straightforward manner. Developers must wrap the code they want to monitor for exceptions inside an nmap.new_try() call. The first value returned by the function indicates the completion status. 
If it returns false or nil, the second returned value must be an error string. The rest of the return values in a successful execution can be set and used in any way. The catch function defined by nmap.new_try() will execute when an exception is raised. Let's look at the mysql-vuln-cve2012-2122.nse script (http://nmap.org/nsedoc/scripts/mysql-vuln-cve2012-2122.html). In this script, a catch function performs some simple garbage collection if a socket is left opened: local catch = function() socket:close() end local try = nmap.new_try(catch) … try( socket:connect(host, port) ) response = try( mysql.receiveGreeting(socket) ) The official documentation can be found at http://nmap.org/nsedoc/lib/nmap.html. The NSE registry The NSE registry is a Lua table designed to store variables shared between all scripts during a scan. The registry is stored at the nmap.registry variable. For example, some of the brute-force scripts will store valid credentials so that other scripts can use them to perform authenticated actions. We insert values as in any other regular Lua table: table.insert( nmap.registry.credentials.http, { username = username, password = password } ) Remember to select unique registry names to avoid overriding values used by other scripts. Writing NSE libraries When writing your own NSE scripts, you will sometimes want to refactor the code and make it available for others. The process of creating NSE libraries is pretty simple, and there are only a few things to keep in mind. NSE libraries are mostly in Lua, but other programming languages such as C and C++ can also be used. Let's create a simple Lua library to illustrate how easy it is. First, remember that NSE libraries are stored in the /nselib/ directory in your Nmap data directory by default. Start by creating a file named myfirstlib.lua inside it. Inside our newly written file, place the following content: local stdnse = require "stdnse" function hello(msg, name) return stdnse.format("Hello '%s',n%s", msg, name) end The first line declares the dependency with the stdnse NSE library, which stores useful functions related to input handling: local stdnse = require "stdnse" The rest is a function declaration that takes two arguments and passes them through the stdnse library's format function: function hello(msg, name)   return stdnse.format("Hello '%s',n%s", msg, name) end Now we can call our new library from any script in the following way: local myfirstlib = require "myfirstlib" … myfirstlib.hello("foo", "game over!") … Remember that global name collision might occur if you do not choose meaningful names for your global variables. The official online documentation of the stdnse NSE library can be found at http://nmap.org/nsedoc/lib/stdnse.html Extending the functionality of an NSE library The available NSE libraries are powerful and comprehensive but, sometimes, we will find ourselves needing to modify them to achieve special tasks. For me, it was the need to simplify the password-auditing process that performs word list mangling with other tools, and then running the scripts in the brute category. To simplify this, let's expand the functionality of one of the available NSE libraries and a personal favorite: the brute NSE library. In this implementation, we will add a new execution mode called pass-mangling, which will perform common password permutations on-the-fly, saving us the trouble of running third-party tools. Let's start to write our new iterator function. This will be used in our new execution mode. 
In our new iterator, we define the following mangling rules: digits: Appends common digits found in passwords such as single- and double-digit numbers and common password combinations such as 123 strings: Performs common string operations such as reverse, repetition, capitalization, camelization, leetify, and so on special: Appends common special characters such as !, $, #, and so on all: This rule executes all the rules described before For example, the word secret will yield the following login attempts when running our new brute mode pass-mangling: secret2014 secret2015 secret2013 secret2012 secret2011 secret2010 secret2009 secret0 secret1 secret2 ... secret9 secret00 secret01 ... secret99 secret123 secret1234 secret12345 s3cr3t SECRET S3CR3T secret terces Secret S3cr3t secretsecret secretsecretsecret secret$ secret# secret! secret@ Our new iterator function, pw_mangling_iterator, will take care of generating the permutations corresponding to each rule. This is a basic set of rules that only takes care of common password permutations. You can work on more advanced password-mangling rules after reading this: pw_mangling_iterator = function( users, passwords, rule)   local function next_credential ()     for user, pass in Iterators.account_iterator(users, passwords, "pass") do       if rule == 'digits' or rule == 'all' then         -- Current year, next year, 5 years back...         local year = tonumber(os.date("%Y"))         coroutine.yield( user, pass..year )         coroutine.yield( user, pass..year+1 )         for i = year, year-5, -1 do           coroutine.yield( user, pass..i )         end           -- Digits from 0 to 9         for i = 0, 9 do           coroutine.yield( user, pass..i )         end         -- Digits from 00 to 99         for i = 0, 9 do           for x = 0, 9 do             coroutine.yield( user, pass..i..x )           end         end           -- Common digit combos         coroutine.yield( user, pass.."123" )         coroutine.yield( user, pass.."1234" )         coroutine.yield( user, pass.."12345" )       end       if rule == 'strings' or rule == 'all' then         -- Basic string stuff like uppercase,         -- reverse, camelization and repetition         local leetify = {["a"] = '4',                          ["e"] = '3',                          ["i"] = '1',                          ["o"] = '0'}         local leetified_pass = pass:gsub("%a", leetify)         coroutine.yield( user, leetified_pass )         coroutine.yield( user, pass:upper() )         coroutine.yield( user, leetified_pass:upper() )         coroutine.yield( user, pass:lower() )         coroutine.yield( user, pass:reverse() )         coroutine.yield( user, pass:sub(1,1):upper()..pass:sub(2) )         coroutine.yield( user,     leetified_pass:sub(1,1):upper()..leetified_pass:sub(2) )         coroutine.yield( user, pass:rep(2) )         coroutine.yield( user, pass:rep(3) )       end       if rule == 'special' or rule == 'all' then         -- Common special characters like $,#,!         coroutine.yield( user, pass..'$' )         coroutine.yield( user, pass..'#' )         coroutine.yield( user, pass..'!' )         coroutine.yield( user, pass..'.' 
)         coroutine.yield( user, pass..'@' )       end     end     while true do coroutine.yield(nil, nil) end   end   return coroutine.wrap( next_credential ) end We will add a new script argument to define the brute rule inside the start function of the brute engine: local mangling_rules = stdnse.get_script_args("brute.mangling- rule") or "all" In this case, we also need to add an elseif clause to execute our mode when the pass-mangling string is passed as the argument. The new code block looks like this: …     elseif( mode and mode == 'pass' ) then       self.iterator = self.iterator or Iterators.pw_user_iterator( usernames, passwords )     elseif( mode and mode == 'pass-mangling' ) then       self.iterator = self.iterator or Iterators.pw_mangling_iterator( usernames, passwords, mangling_rules )     elseif ( mode ) then       return false, ("Unsupported mode: %s"):format(mode) … With this simple addition of a new iterator function, we have inevitably improved over 50 scripts that use this NSE library. Now you can perform password mangling on-the-fly for all protocols and applications. At this point, it is very clear why code refactoring in NSE is a major advantage and why you should try to stick to the available implementations such as the Driver brute engine. NSE modules in C/C++ Some modules included with NSE are written in C++ or C. These languages provide enhanced performance but are only recommended when speed is critical or the C or C++ implementation of a library is required. Let's build an example of a simple NSE library in C to get you familiar with this process. In this case, our C module will contain a method that simply prints a message on the screen. Overall, the steps to get a C library to communicate to NSE are as follows: Place your source and header files for the library inside Nmap's root directory Add entries to the source, header, and object file for the new library in the Makefile.in file Link the new library from the nse_main.cc file First, we will create our library source and header files. The naming convention for C libraries is the library name appended to the nse_ string. For example, for our library test, we will name our files nse_test.cc and nse_test.h. 
Place the following content in a file named nse_test.cc: extern "C" {   #include "lauxlib.h"   #include "lua.h" }   #include "nse_test.h"   static int hello_world(lua_State *L) {   printf("Hello World From a C libraryn");   return 1; }   static const struct luaL_Reg testlib[] = {   {"hello",    hello_world},   {NULL, NULL} };   LUALIB_API int luaopen_test(lua_State *L) {   luaL_newlib(L, testlib);   return 1; } Then place this content in the nse_test.h library header file: #ifndef TESTLIB #define TESTLIB   #define TESTLIBNAME "test"   LUALIB_API int luaopen_test(lua_State *L);   #endif Make the following modifications to the nse_main.cc file: Include the library header at the beginning of the file: #include <nse_test.h> Look for the set_nmap_libraries(lua_State *L) function and update the libs variable to include the new library: static const luaL_Reg libs[] = {     {NSE_PCRELIBNAME, luaopen_pcrelib},     {NSE_NMAPLIBNAME, luaopen_nmap},     {NSE_BINLIBNAME, luaopen_binlib},     {BITLIBNAME, luaopen_bit},     {TESTLIBNAME, luaopen_test},     {LFSLIBNAME, luaopen_lfs},     {LPEGLIBNAME, luaopen_lpeg}, #ifdef HAVE_OPENSSL     {OPENSSLLIBNAME, luaopen_openssl}, #endif     {NULL, NULL}   }; Add the NSE_SRC, NSE_HDRS, and NSE_OBJS variables to Makefile.in: NSE_SRC=nse_main.cc nse_utility.cc nse_nsock.cc nse_dnet.cc nse_fs.cc nse_nmaplib.cc nse_debug.cc nse_pcrelib.cc nse_binlib.cc nse_bit.cc nse_test.cc nse_lpeg.cc NSE_HDRS=nse_main.h nse_utility.h nse_nsock.h nse_dnet.h nse_fs.h nse_nmaplib.h nse_debug.h nse_pcrelib.h nse_binlib.h nse_bit.h nse_test.h nse_lpeg.h NSE_OBJS=nse_main.o nse_utility.o nse_nsock.o nse_dnet.o nse_fs.o nse_nmaplib.o nse_debug.o nse_pcrelib.o nse_binlib.o nse_bit.o nse_test.o nse_lpeg.o Now we just need to recompile and create a sample NSE script to test our new library. Create a file named nse-test.nse inside your scripts folder with the following content: local test = require "test"   description = [[ Test script that calls a method from a C library ]]   author = "Paulino Calderon <calderon()websec.mx>" license = "Same as Nmap--See http://nmap.org/book/man-legal.html" categories = {"safe"}     portrule = function() return true end   action = function(host, port)         local c = test.hello() end Finally, we execute our script. In this case, we will see the Hello World From a C library message when the script is executed: $nmap -p80 --script nse-test scanme.nmap.org Starting Nmap 6.47SVN ( http://nmap.org ) at 2015-01-13 23:41 CST Hello World From a C library Nmap scan report for scanme.nmap.org (74.207.244.221) Host is up (0.12s latency). PORT   STATE SERVICE 80/tcp open  http Nmap done: 1 IP address (1 host up) scanned in 0.79 seconds To learn more about Lua's C API and how to run compiled C modules, check out the official documentation at http://www.lua.org/manual/5.2/manual.html#4 and http://nmap.org/book/nse-library.html Exploring other popular NSE libraries Let's briefly review some of the most common libraries that you will likely need during the development of your own scripts. There are 107 available libraries at the moment, but the following libraries must be remembered at all times when developing your own scripts in order to improve their quality. stdnse This library contains miscellaneous functions useful for NSE development. It has functions related to timing, parallelism, output formatting, and string handling. 
The functions that you will most likely need in a script are as follows:

stdnse.get_script_args: This gets script arguments passed via the --script-args option:

    local threads = stdnse.get_script_args(SCRIPT_NAME..".threads") or 3

stdnse.debug: This prints a debug message:

    stdnse.debug2("This is a debug message shown for debugging level 2 or higher")

stdnse.verbose: This prints a formatted verbosity message:

    stdnse.verbose1("not running for lack of privileges.")

stdnse.strjoin: This joins a string with a separator string:

    local output = stdnse.strjoin("\n", output_lines)

stdnse.strsplit: This splits a string by a delimiter:

    local headers = stdnse.strsplit("\r\n", headers)

The official online documentation of the stdnse NSE library can be found at http://nmap.org/nsedoc/lib/stdnse.html.

openssl

This is the interface to the OpenSSL bindings, commonly used for encryption, hashing, and multiprecision integers. Its availability depends on how Nmap was built, but we can always check whether it is available with the help of a pcall() protected call:

    if not pcall(require, "openssl") then
      action = function(host, port)
        stdnse.print_debug(2, "Skipping \"%s\" because OpenSSL is missing.", id)
      end
    end
    action = action or function(host, port)
      ...
    end

The official online documentation of the openssl NSE library can be found at http://nmap.org/nsedoc/lib/openssl.html.

target

This is a utility library designed to manage a scan queue of newly discovered targets. It enables NSE scripts running with prerule, hostrule, or portrule execution rules to add new targets to the current scan queue of Nmap on-the-fly. If you are writing an NSE script belonging to the discovery category, I encourage you to use this library in the script. To add targets, simply call the target.add function:

    local status, err = target.add("192.168.1.1","192.168.1.2",...)

The official online documentation of the target NSE library can be found at http://nmap.org/nsedoc/lib/target.html.

shortport

This library is designed to help build port rules. It attempts to collect in one place the most common port rules used by script developers. To use it, we simply load the library and assign the corresponding port rule:

    local shortport = require "shortport"
    …
    portrule = shortport.http

The most common functions that you are likely to need are as follows:

http: This is the port rule to match HTTP services:

    portrule = shortport.http

port_or_service: This is the port rule to match a port number or service name:

    portrule = shortport.port_or_service(177, "xdmcp", "udp")

portnumber: This is the port rule to match a port or a list of ports:

    portrule = shortport.portnumber(69, "udp")

The official online documentation of the shortport NSE library can be found at http://nmap.org/nsedoc/lib/shortport.html.

creds

This library manages credentials found by the scripts. It simply stores the credentials in the registry, but it provides a clean interface to work with the database. To add credentials to the database, you simply need to create a creds object and call the add function:

    local c = creds.Credentials:new( SCRIPT_NAME, host, port )
    c:add("packtpub", "secret", creds.State.VALID )

The official online documentation of the creds NSE library can be found at http://nmap.org/nsedoc/lib/creds.html.

vulns

This library is designed to help developers present the state of a host with regard to security vulnerabilities. It manages and presents consistent and human-readable reports for every vulnerability found in the system by NSE.
A report produced by this library looks like the following: PORT   STATE SERVICE REASON 80/tcp open  http    syn-ack http-phpself-xss:    VULNERABLE:    Unsafe use of $_SERVER["PHP_SELF"] in PHP files      State: VULNERABLE (Exploitable)      Description:        PHP files are not handling safely the variable $_SERVER["PHP_SELF"] causing Reflected Cross Site Scripting vulnerabilities.                    Extra information:           Vulnerable files with proof of concept:      http://calder0n.com/sillyapp/three.php/%27%22/%3E%3Cscript%3Ealert (1)%3C/script%3E      http://calder0n.com/sillyapp/secret/2.php/%27%22/%3E%3Cscript%3Eal ert(1)%3C/script%3E      http://calder0n.com/sillyapp/1.php/%27%22/%3E%3Cscript%3Ealert(1)% 3C/script%3E      http://calder0n.com/sillyapp/secret/1.php/%27%22/%3E%3Cscript%3Eal ert(1)%3C/script%3E    Spidering limited to: maxdepth=3; maxpagecount=20; withinhost=calder0n.com      References:        https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)       http://php.net/manual/en/reserved.variables.server.php The official online documentation of the vulns NSE library can be found at http://nmap.org/nsedoc/lib/vulns.html. http Nmap has become a powerful Web vulnerability scanner, and most of the tasks related to HTTP can be done with this library. The library is simple to use, allows raw header handling, and even has support to HTTP pipelining. It has methods such as http.head(), http.get(), and http.post(), corresponding to the common HTTP methods HEAD, GET, and POST, respectively, but it also has a generic method named http.generic_request() to provide more flexibility for developers who may want to try more obscure HTTP verbs. A simple HTTP GET call can be made with a single method call: local respo = http.get(host, port, uri) The official online documentation of the http NSE library can be found at http://nmap.org/nsedoc/lib/http.html. Summary In this article, you learned what information is available to NSE and how to work with this data to achieve different tasks with Nmap. You also learned how the main NSE API works and what the structures of scripts and libraries are like. We covered the process of developing new NSE libraries in C and Lua. Now you should have all of the knowledge in Lua and the inner workings of NSE required to start writing your own scripts and libraries. Resources for Article: Further resources on this subject: Nmap Fundamentals [article] Api With Mongodb And Node.JS [article] Creating a Restful Api [article]


Programming littleBits circuits with JavaScript Part 1

Anna Gerber
12 Feb 2015
6 min read
littleBits are electronic building blocks that snap together with magnetic connectors. They are great for getting started with electronics and robotics and for prototyping circuits. The littleBits Arduino Coding Kit includes an Arduino-compatible microcontroller, which means that you can use the Johnny-Five JavaScript Robotics programming framework to program your littleBits creations using JavaScript, the programming language of the web.

Setup

Plug the Arduino bit into your computer from the port at the top of the Arduino module. You'll need to supply power to the Arduino by connecting a blue power module to any of the input connectors. The Arduino will appear as a device with a name like /dev/cu.usbmodemfa131 on Mac, or COM3 on Windows.

Johnny-Five uses a communication protocol called Firmata to communicate with the Arduino microcontroller. We'll load the Standard Firmata sketch onto the Arduino the first time we go to use it, to make this communication possible.

Installing Firmata via the Chrome App

One of the easiest ways to get started programming with Johnny-Five is by using this app for Google Chrome. After you have installed it, open the 'Johnny-Five Chrome' app from the Chrome apps page. To send the Firmata sketch to your board using the extension, select the port corresponding to your Arduino bit from the drop-down menu and then hit the Install Firmata button. If the device does not appear in the list at first, try the app's refresh button.

Installing Firmata via the command line

If you would prefer not to use the Chrome app, you can skip straight to using Node.js via the command line. You'll need a recent version of Node.js installed. Create a folder for your project's code. On a Mac run the Terminal app, and on Windows run Command Prompt. From the command line change directory so you are inside your project folder, and then use npm to install the Johnny-Five library and nodebots-interchange:

    npm install johnny-five
    npm install -g nodebots-interchange

Use the interchange program from nodebots-interchange to send the StandardFirmata sketch to your Arduino:

    interchange install StandardFirmata -a leonardo -p /dev/cu.usbmodemfa131

Note: If you are familiar with Arduino IDE, you could alternatively use it to write Firmata to your Arduino. Open File > Examples > Firmata > StandardFirmata and select your port and Arduino Leonardo from Tools > Board then hit Upload.

Inputs and Outputs

Programming with hardware is all about I/O: inputs and outputs. These can be either analog (continuous values) or digital (discrete 0 or 1 values). littleBits input modules are color coded pink, while outputs are green. The Arduino Coding Kit includes analog inputs (dimmers) as well as a digital input module (button). The output modules included in the kit are a servo motor and an LED bargraph, which can be used as a digital output (i.e. on or off) or as an analog output to control the number of LEDs displayed, or with Pulse-Width-Modulation (PWM) - using a pattern of pulses on a digital output - to control LED brightness.

Building a circuit

Let's start with our output modules: the LED bargraph and servo. Connect a blue power module to any connector on the left-hand side of the Arduino. Connect the LED bargraph to the connector labelled d5 and the servo module to the connector labelled d9. Flick the switch next to both outputs to PWM. The mounting boards that come with the Arduino Coding Kit come in handy for holding your circuit together.
Blinking an LED bargraph

You can write the JavaScript program using the editor inside the Chrome app, or any text editor. We require the johnny-five library to create a board object with a "ready" handler. Our code for working with inputs and outputs will go inside the ready handler so that it will run after the Arduino has started up and communication has been established:

    var five = require("johnny-five");
    var board = new five.Board();

    board.on("ready", function() {
      // code for button, dimmers, servo etc goes here
    });

We'll treat the bargraph like a single output. It's connected to digital "pin" 5 (d5), so we'll need to provide this as a parameter when we create the Led object. The strobe function causes the LED to blink on and off. The parameter to the function indicates the number of milliseconds between toggling the LED on or off (one second in this case):

    var led = new five.Led(5);
    led.strobe( 1000 );

Running the code

Note: Make sure the power switch on your power module is switched on.

If you are using the Chrome app, hit the Run button to start the program. You should see the LED bargraph start blinking. Any errors will be printed to the console below the code. If you have unplugged your Arduino since the last time you ran code via the app, you'll probably need to hit refresh and select the port for your device again from the drop-down above the code editor.

The Chrome app is great for getting started, but eventually you'll want to switch to running programs using Node.js, because the Chrome app only supports a limited number of built-in libraries. Use a text editor to save your code to a file (e.g. blink.js) within your project directory, and run it from the command line using Node.js:

    node blink.js

You can hit control-D on Windows or command-D on Mac to end the program.

Controlling a Servo

Johnny-Five includes a Servo class, but this is for controlling servo motors directly using PWM. The littleBits servo module already takes care of that for us, so we can treat it like a simple motor. Create a Motor object on pin 9 to correspond to the servo. We can start moving it using the start function. The parameter is a number between 0 and 255, which controls the speed. The stop function stops the servo. We'll use the board's wait function to stop the servo after 5 seconds (i.e. 5000 milliseconds):

    var servo = new five.Motor(9);
    servo.start(255);

    this.wait(5000, function(){
      servo.stop();
    });

In Part 2, we'll read data from our littleBits input modules and use these values to trigger changes to the servo and bargraph.

About the author

Anna Gerber is a full-stack developer with 15 years of experience in the university sector. Specializing in Digital Humanities, she was a Technical Project Manager at the University of Queensland's eResearch centre, and she has worked at Brisbane's Distributed System Technology Centre as a Research Scientist. Anna is a JavaScript robotics enthusiast who enjoys tinkering with soft circuits and 3D printers.
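For reference, here is one way the snippets above might be combined into a single Johnny-Five program. The file name is just an example, and the pin numbers assume the wiring described in the Building a circuit section (bargraph on d5, servo on d9):

    // bargraph-servo.js - the blink and servo examples from this article in one program
    var five = require("johnny-five");
    var board = new five.Board();

    board.on("ready", function() {
      // LED bargraph on d5, treated as a single LED output
      var led = new five.Led(5);
      led.strobe(1000); // toggle every second

      // littleBits servo module on d9, treated as a simple motor
      var servo = new five.Motor(9);
      servo.start(255); // full speed

      // stop the servo after 5 seconds; the bargraph keeps blinking
      this.wait(5000, function() {
        servo.stop();
      });
    });

Save it in your project folder and run it with node bargraph-servo.js, as described in the Running the code section.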


Using NLP APIs

Packt
12 Feb 2015
22 min read
In this article by Richard M Reese, author of the book, Natural Language Processing with Java, we will demonstrate the NER process using OpenNLP, Stanford API, and LingPipe. They each provide alternate techniques that can often do a good job identifying entities in text. The following declaration will serve as the sample text to demonstrate the APIs: String sentences[] = {"Joe was the last person to see Fred. ", "He saw him in Boston at McKenzie's pub at 3:00 where he paid "    + "$2.45 for an ale. ",    "Joe wanted to go to Vermont for the day to visit a cousin who "    + "works at IBM, but Sally and he had to look for Fred"}; Using OpenNLP for NER We will demonstrate the use of the TokenNameFinderModel class to perform NLP using the OpenNLP API. In addition, we will demonstrate how to determine the probability that the entity identified is correct. The general approach is to convert the text into a series of tokenized sentences, create an instance of the TokenNameFinderModel class using an appropriate model, and then use the find method to identify the entities in the text. The next example demonstrates the use of the TokenNameFinderModel class. We will use a simple sentence initially and then use multiple sentences. The sentence is defined here:    String sentence = "He was the last person to see Fred."; We will use the models found in the en-token.bin and en-ner-person.bin files for the tokenizer and name finder models respectively. InputStream for these files is opened using a try-with-resources block as shown here:    try (InputStream tokenStream = new FileInputStream(            new File(getModelDir(), "en-token.bin"));            InputStream modelStream = new FileInputStream(                new File(getModelDir(), "en-ner-person.bin"));) {        ...      } catch (Exception ex) {        // Handle exceptions    } Within the try block, the TokenizerModel and Tokenizer objects are created:    TokenizerModel tokenModel = new TokenizerModel(tokenStream);    Tokenizer tokenizer = new TokenizerME(tokenModel); Next, an instance of the NameFinderME class is created using the person model:    TokenNameFinderModel entityModel =        new TokenNameFinderModel(modelStream);    NameFinderME nameFinder = new NameFinderME(entityModel); We can now use the tokenize method to tokenize the text and the find method to identify the person in the text. The find method will use the tokenized String array as input and return an array of the Span objects as follows:    String tokens[] = tokenizer.tokenize(sentence);    Span nameSpans[] = nameFinder.find(tokens); The following for statement displays the person found in the sentence. Its positional information and the person are displayed on separate lines:    for (int i = 0; i < nameSpans.length; i++) {        System.out.println("Span: " + nameSpans[i].toString());        System.out.println("Entity: "            + tokens[nameSpans[i].getStart()]);    } The output is as follows: Span: [7..9) person Entity: Fred Often we will work with multiple sentences. To demonstrate this we will use the previously defined sentences string array. The previous for statement is replaced with the following sequence. 
The tokenize method is invoked against each sentence and then the entity information is displayed as before:

    for (String sentence : sentences) {
        String tokens[] = tokenizer.tokenize(sentence);
        Span nameSpans[] = nameFinder.find(tokens);
        for (int i = 0; i < nameSpans.length; i++) {
            System.out.println("Span: " + nameSpans[i].toString());
            System.out.println("Entity: "
                + tokens[nameSpans[i].getStart()]);
        }
        System.out.println();
    }

The output is as follows. There is an extra blank line between the two people detected because the second sentence did not contain a person.

    Span: [0..1) person
    Entity: Joe
    Span: [7..9) person
    Entity: Fred

    Span: [0..1) person
    Entity: Joe
    Span: [19..20) person
    Entity: Sally
    Span: [26..27) person
    Entity: Fred

Determining the accuracy of the entity

When TokenNameFinderModel identifies entities in text, it computes a probability for that entity. We can access this information using the probs method as shown next. The method returns an array of doubles, which corresponds to the elements of the nameSpans array:

    double[] spanProbs = nameFinder.probs(nameSpans);

Add this statement to the previous example immediately after the use of the find method. Then add the next statement at the end of the nested for statement:

    System.out.println("Probability: " + spanProbs[i]);

When the example is executed, you will get the following output. The probability fields reflect the confidence level of the entity assignment. For the first entity, the model is 80.529 percent confident that Joe is a person:

    Span: [0..1) person
    Entity: Joe
    Probability: 0.8052914774025202
    Span: [7..9) person
    Entity: Fred
    Probability: 0.9042160889302772

    Span: [0..1) person
    Entity: Joe
    Probability: 0.9620970782763985
    Span: [19..20) person
    Entity: Sally
    Probability: 0.964568603518126
    Span: [26..27) person
    Entity: Fred
    Probability: 0.990383039618594

Using other entity types

OpenNLP supports different libraries as listed in the following table. These models can be downloaded from http://opennlp.sourceforge.net/models-1.5/. The prefix, en, specifies English as the language while ner indicates that the model is for NER.

    English finder models               File name
    Location name finder model          en-ner-location.bin
    Money name finder model             en-ner-money.bin
    Organization name finder model      en-ner-organization.bin
    Percentage name finder model        en-ner-percentage.bin
    Person name finder model            en-ner-person.bin
    Time name finder model              en-ner-time.bin

If we modify the statement to use a different model file, we can see how they work against the sample sentences:

    InputStream modelStream = new FileInputStream(
        new File(getModelDir(), "en-ner-time.bin"));) {

When the en-ner-money.bin model is used, the index into the tokens array in the earlier code sequence has to be increased by one. Otherwise, all that is returned is the dollar sign. The various outputs are shown in the following table.

    Model                     Output
    en-ner-location.bin       Span: [4..5) location
                              Entity: Boston
                              Probability: 0.8656908776583051
                              Span: [5..6) location
                              Entity: Vermont
                              Probability: 0.9732488014011262
    en-ner-money.bin          Span: [14..16) money
                              Entity: 2.45
                              Probability: 0.7200919701507937
    en-ner-organization.bin   Span: [16..17) organization
                              Entity: IBM
                              Probability: 0.9256970736336729
    en-ner-time.bin           The model was not able to detect time in this text sequence

The model failed to find the time entities in the sample text.
This illustrates that the model did not have enough confidence that it found any time entities in the text. Processing multiple entities types We can also handle multiple entity types at the same time. This involves creating instances of NameFinderME, based on each model within a loop and applying the model against each sentence, keeping track of the entities as they are found. We will illustrate this process with the next example. It requires rewriting the previous try block to create the InputStream within the block as shown here:    try {        InputStream tokenStream = new FileInputStream(            new File(getModelDir(), "en-token.bin"));        TokenizerModel tokenModel = new TokenizerModel(tokenStream);        Tokenizer tokenizer = new TokenizerME(tokenModel);        ...    } catch (Exception ex) {        // Handle exceptions    } Within the try block, we will define a string array to hold the names of the model files. As shown here, we will use models for people, locations, and organizations:    String modelNames[] = {"en-ner-person.bin",        "en-ner-location.bin", "en-ner-organization.bin"}; ArrayList is created to hold the entities as they are discovered:    ArrayList<String> list = new ArrayList(); A for-each statement is used to load one model at a time and then to create an instance of the NameFinderME class:    for(String name : modelNames) {        TokenNameFinderModel entityModel = new TokenNameFinderModel(            new FileInputStream(new File(getModelDir(), name)));        NameFinderME nameFinder = new NameFinderME(entityModel);        ...    } Previously, we did not try to identify which sentences the entities were found in. This is not hard to do, but we need to use a simple for statement instead of a for-each statement to keep track of the sentence indexes. This is shown next where the previous example has been modified to use the integer variable index to keep the sentences. Otherwise, the code works the same way as before:    for (int index = 0; index < sentences.length; index++) {        String tokens[] = tokenizer.tokenize(sentences[index]);        Span nameSpans[] = nameFinder.find(tokens);        for(Span span : nameSpans) {            list.add("Sentence: " + index                + " Span: " + span.toString() + " Entity: "                + tokens[span.getStart()]);        }    } The entities discovered are then displayed:    for(String element : list) {        System.out.println(element);    } The output is as follows: Sentence: 0 Span: [0..1) person Entity: Joe Sentence: 0 Span: [7..9) person Entity: Fred Sentence: 2 Span: [0..1) person Entity: Joe Sentence: 2 Span: [19..20) person Entity: Sally Sentence: 2 Span: [26..27) person Entity: Fred Sentence: 1 Span: [4..5) location Entity: Boston Sentence: 2 Span: [5..6) location Entity: Vermont Sentence: 2 Span: [16..17) organization Entity: IBM Using the Stanford API for NER We will demonstrate the CRFClassifier class as used to perform NER. This class implements what is known as a linear chain Conditional Random Field (CRF) sequence model. To demonstrate the use of the CRFClassifier class we will start with a declaration of the classifier file string as shown here:    String model = getModelDir() +        "\english.conll.4class.distsim.crf.ser.gz"; The classifier is then created using the model:    CRFClassifier<CoreLabel> classifier =        CRFClassifier.getClassifierNoExceptions(model); The classify method takes a single string representing the text to be processed. 
To use the sentences text, we need to convert it to a simple string:

    String sentence = "";
    for (String element : sentences) {
        sentence += element;
    }

The classify method is then applied to the text:

    List<List<CoreLabel>> entityList = classifier.classify(sentence);

The classify method returns a list of lists: each inner list contains the CoreLabel objects for one sentence, and the CoreLabel class represents a word with additional information attached to it. In the outer for-each statement in the next code sequence, the internalList variable represents one sentence of the text. In the inner for-each statement, each word in that inner list is displayed. The word method returns the word and the get method returns the type of the word. The words and their types are then displayed:

    for (List<CoreLabel> internalList: entityList) {
        for (CoreLabel coreLabel : internalList) {
            String word = coreLabel.word();
            String category = coreLabel.get(
                CoreAnnotations.AnswerAnnotation.class);
            System.out.println(word + ":" + category);
        }
    }

Part of the output is as follows. It has been truncated because every word is displayed. The O represents the Other category:

    Joe:PERSON
    was:O
    the:O
    last:O
    person:O
    to:O
    see:O
    Fred:PERSON
    .:O
    He:O
    ...
    look:O
    for:O
    Fred:PERSON

To filter out those words that are not relevant, replace the println statement with the following statements. This will eliminate the Other categories:

    if (!"O".equals(category)) {
        System.out.println(word + ":" + category);
    }

The output is simpler now:

    Joe:PERSON
    Fred:PERSON
    Boston:LOCATION
    McKenzie:PERSON
    Joe:PERSON
    Vermont:LOCATION
    IBM:ORGANIZATION
    Sally:PERSON
    Fred:PERSON

Using LingPipe for NER

Here we will demonstrate how named entity models and the ExactDictionaryChunker class are used to perform NER analysis.

Using LingPipe's named entity models

LingPipe has a few named entity models that we can use with chunking. These files consist of a serialized object that can be read from a file and then applied to text. These objects implement the Chunker interface. The chunking process results in a series of Chunking objects that identify the entities of interest. A list of the NER models is found in the following table. These models can be downloaded from http://alias-i.com/lingpipe/web/models.html.

    Genre             Corpus    File
    English News      MUC-6     ne-en-news-muc6.AbstractCharLmRescoringChunker
    English Genes     GeneTag   ne-en-bio-genetag.HmmChunker
    English Genomics  GENIA     ne-en-bio-genia.TokenShapeChunker

We will use the model found in the file, ne-en-news-muc6.AbstractCharLmRescoringChunker, to demonstrate how this class is used. We start with a try-catch block to deal with exceptions as shown next. The file is opened and used with the AbstractExternalizable class' static readObject method to create an instance of a Chunker class. This method will read in the serialized model:

    try {
        File modelFile = new File(getModelDir(),
            "ne-en-news-muc6.AbstractCharLmRescoringChunker");

        Chunker chunker = (Chunker)
            AbstractExternalizable.readObject(modelFile);
        ...
    } catch (IOException | ClassNotFoundException ex) {
        // Handle exception
    }

The Chunker and Chunking interfaces provide methods that work with a set of chunks of text. Its chunk method returns an object that implements the Chunking interface.
The following sequence displays the chunks found in each sentence of the text as shown here:    for (int i = 0; i < sentences.length; ++i) {        Chunking chunking = chunker.chunk(sentences[i]);        System.out.println("Chunking=" + chunking);    } The output of this sequence is as follows: Chunking=Joe was the last person to see Fred. : [0-3:PERSON@-Infinity, 31-35:ORGANIZATION@-Infinity] Chunking=He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. : [14-20:LOCATION@-Infinity, 24-32:PERSON@-Infinity] Chunking=Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred : [0-3:PERSON@-Infinity, 20-27:ORGANIZATION@-Infinity, 71-74:ORGANIZATION@-Infinity, 109-113:ORGANIZATION@-Infinity] Instead, we can use methods of the Chunk class to extract specific pieces of information as illustrated next. We will replace the previous for statement with the following for-each statement. This calls a displayChunkSet method:    for (String sentence : sentences) {        displayChunkSet(chunker, sentence);    } The output that follows shows the result. However, it did not always match the entity type correctly: Type: PERSON Entity: [Joe] Score: -Infinity Type: ORGANIZATION Entity: [Fred] Score: -Infinity Type: LOCATION Entity: [Boston] Score: -Infinity Type: PERSON Entity: [McKenzie] Score: -Infinity Type: PERSON Entity: [Joe] Score: -Infinity Type: ORGANIZATION Entity: [Vermont] Score: -Infinity Type: ORGANIZATION Entity: [IBM] Score: -Infinity Type: ORGANIZATION Entity: [Fred] Score: -Infinity Using the ExactDictionaryChunker class The ExactDictionaryChunker class provides an easy way to create a dictionary of entities and their types, which can be used to find them later in text. It uses a MapDictionary object to store entries and then the ExactDictionaryChunker class is used to extract chunks based on the dictionary. The AbstractDictionary interface supports basic operations for entities, category, and score. The score is used in the matching process. The MapDictionary and TrieDictionary classes implement the AbstractDictionary interface. The TrieDictionary class stores information using a character trie structure. This approach uses less memory when that is a concern. We will use the MapDictionary class for our example. To illustrate this approach we start with a declaration of the MapDictionary:    private MapDictionary<String> dictionary; The dictionary will contain the entities that we are interested in finding. We need to initialize the model as performed in the following initializeDictionary method. The DictionaryEntry constructor used here accepts three arguments: String: It gives the name of the entity String: It gives the category of the entity Double: It represents a score for the entity The score is used when determining matches. 
A few entities are declared and added to the dictionary:    private static void initializeDictionary() {         dictionary = new MapDictionary<String>();        dictionary.addEntry(            new DictionaryEntry<String>("Joe","PERSON",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("Fred","PERSON",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("Boston","PLACE",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("pub","PLACE",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("Vermont","PLACE",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("IBM","ORGANIZATION",1.0));        dictionary.addEntry(            new DictionaryEntry<String>("Sally","PERSON",1.0));    } An ExactDictionaryChunker instance will use this dictionary. The arguments of the ExactDictionaryChunker class are detailed here: Dictionary<String>: It is a dictionary containing the entities TokenizerFactory: It is a tokenizer used by the chunker boolean: If true, the chunker should return all matches boolean: If true, the matches are case sensitive Matches can be overlapping. For example, in the phrase, "The First National Bank", the entity "bank" could be used by itself or in conjunction with the rest of the phrase. The third parameter determines if all of the matches are returned. In the following sequence, the dictionary is initialized. We then create an instance of the ExactDictionaryChunker class using the Indo-European tokenizer where we return all matches and ignore the case of the tokens:    initializeDictionary();    ExactDictionaryChunker dictionaryChunker        = new ExactDictionaryChunker(dictionary,            IndoEuropeanTokenizerFactory.INSTANCE, true, false); The dictionaryChunker object is used with each sentence as shown next. We will use displayChunkSet:    for (String sentence : sentences) {        System.out.println("nTEXT=" + sentence);        displayChunkSet(dictionaryChunker, sentence);    } When executed, we get the following output: TEXT=Joe was the last person to see Fred. Type: PERSON Entity: [Joe] Score: 1.0 Type: PERSON Entity: [Fred] Score: 1.0   TEXT=He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. Type: PLACE Entity: [Boston] Score: 1.0 Type: PLACE Entity: [pub] Score: 1.0   TEXT=Joe wanted to go to Vermont for the day to visit a cousin who works at IBM, but Sally and he had to look for Fred Type: PERSON Entity: [Joe] Score: 1.0 Type: PLACE Entity: [Vermont] Score: 1.0 Type: ORGANIZATION Entity: [IBM] Score: 1.0 Type: PERSON Entity: [Sally] Score: 1.0 Type: PERSON Entity: [Fred] Score: 1.0 This does a pretty good job, but it requires a lot of effort to create the dictionary or a large vocabulary. Training a model We will use the OpenNLP to demonstrate how a model is trained. The training file used must have the following: Marks to demarcate the entities One sentence per line We will use the following model file named en-ner-person.train: <START:person> Joe <END> was the last person to see <START:person> Fred <END>. He saw him in Boston at McKenzie's pub at 3:00 where he paid $2.45 for an ale. <START:person> Joe <END> wanted to go to Vermont for the day to visit a cousin who works at IBM, but <START:person> Sally <END> and he had to look for <START:person> Fred <END>. Several methods of this example are capable of throwing exceptions. 
These statements will be placed in a try-with-resources block, as shown here, where the model's output stream is created:

    try (OutputStream modelOutputStream = new BufferedOutputStream(
            new FileOutputStream(new File("modelFile")));) {
        ...
    } catch (IOException ex) {
        // Handle exception
    }

Within the block, we create an ObjectStream<String> object using the PlainTextByLineStream class. This class' constructor takes a FileInputStream and returns each line as a String object. The en-ner-person.train file is used as the input file, as shown here. UTF-8 refers to the encoding used:

    ObjectStream<String> lineStream = new PlainTextByLineStream(
        new FileInputStream("en-ner-person.train"), "UTF-8");

The lineStream object holds lines that are annotated with tags delineating the entities in the text. These need to be converted to NameSample objects so that the model can be trained. This conversion is performed by the NameSampleDataStream class, as shown next. A NameSample object holds the names of the entities found in the text:

    ObjectStream<NameSample> sampleStream =
        new NameSampleDataStream(lineStream);

The train method can now be executed as shown next:

    TokenNameFinderModel model = NameFinderME.train(
        "en", "person", sampleStream,
        Collections.<String, Object>emptyMap(), 100, 5);

The arguments of the method are detailed in the following table.

"en" - Language code
"person" - Entity type
sampleStream - Sample data
Collections.<String, Object>emptyMap() - Resources (none are used in this example)
100 - The number of iterations
5 - The cutoff

The model is then serialized to the file:

    model.serialize(modelOutputStream);

The output of this sequence is as follows. It has been shortened to conserve space. Basic information about the model creation is detailed:

Indexing events using cutoff of 5
     Computing event counts... done. 53 events
    Indexing... done.
Sorting and merging events... done. Reduced 53 events to 46.
Done indexing.
Incorporating indexed data for training...
    Number of Event Tokens: 46
    Number of Outcomes: 2
    Number of Predicates: 34
...done.
Computing model parameters ...
Performing 100 iterations.
1: ... loglikelihood=-36.73680056967707 0.05660377358490566
2: ... loglikelihood=-17.499660626361216 0.9433962264150944
3: ... loglikelihood=-13.216835449617108 0.9433962264150944
4: ... loglikelihood=-11.461783667999262 0.9433962264150944
5: ... loglikelihood=-10.380239416084963 0.9433962264150944
6: ... loglikelihood=-9.570622475692486 0.9433962264150944
7: ... loglikelihood=-8.919945779143012 0.9433962264150944
...
99: ... loglikelihood=-3.513810438211968 0.9622641509433962
100: ... loglikelihood=-3.507213816708068 0.9622641509433962

Evaluating the model

The model can be evaluated using the TokenNameFinderEvaluator class. The evaluation process uses marked-up sample text to perform the evaluation. For this simple example, a file called en-ner-person.eval was created that contains the following text:

<START:person> Bill <END> went to the farm to see <START:person> Sally <END>. Unable to find <START:person> Sally <END> he went to town. There he saw <START:person> Fred <END> who had seen <START:person> Sally <END> at the book store with <START:person> Mary <END>.

The following code is used to perform the evaluation. The previous model is used as the argument of the TokenNameFinderEvaluator constructor. A NameSampleDataStream instance is created based on the evaluation file.
The TokenNameFinderEvaluator class' evaluate method performs the evaluation:

    TokenNameFinderEvaluator evaluator =
        new TokenNameFinderEvaluator(new NameFinderME(model));

    lineStream = new PlainTextByLineStream(
        new FileInputStream("en-ner-person.eval"), "UTF-8");
    sampleStream = new NameSampleDataStream(lineStream);
    evaluator.evaluate(sampleStream);

To determine how well the model worked with the evaluation data, the getFMeasure method is executed. The results are then displayed:

    FMeasure result = evaluator.getFMeasure();
    System.out.println(result.toString());

The following output displays the precision, recall, and F-Measure. It indicates that 50 percent of the entities found exactly match the evaluation data. The recall is the percentage of entities defined in the corpus that were found in the same location. The performance measure is the harmonic mean, defined as F1 = 2 * Precision * Recall / (Recall + Precision):

Precision: 0.5
Recall: 0.25
F-Measure: 0.3333333333333333

The data and evaluation sets should be much larger to create a better model. The intent here was to demonstrate the basic approach used to train and evaluate an NER model.

Summary

NER involves detecting entities and then classifying them. Common categories include names, locations, and things. This is an important task that many applications use to support searching, resolving references, and finding meaning in text. The process is frequently used in downstream tasks.

We investigated several techniques for performing NER. Regular expressions are one approach, supported both by core Java classes and by NLP APIs. This technique is useful for many applications, and there are a large number of regular expression libraries available. Dictionary-based approaches are also possible and work well for some applications, although they can require considerable effort to populate. We used LingPipe's MapDictionary class to illustrate this approach. Trained models can also be used to perform NER. We examined several of these and demonstrated how to train a model using the OpenNLP NameFinderME class.

Fronting an external API with Ruby on Rails: Part 2

Mike Ball
12 Feb 2015
10 min read
Historically, a conventional Ruby on Rails application leverages server-side business logic, a relational database, and a RESTful architecture to serve dynamically-generated HTML. However, JavaScript-intensive applications and widespread use of external web APIs somewhat challenge this architecture. In many cases, Rails is tasked with performing as an orchestration layer, collecting data from various backend services and serving re-formatted JSON or XML to clients. In such instances, how is Rails' model-view-controller architecture still relevant? In the second part of this series, we'll continue with the creation of Noterizer that we started in Part 1, creating a simple Rails backend that makes requests to an external XML-based web service and serves JSON. We'll use RSpec for tests and Jbuilder for view rendering.

Building out the Noterizer backend

Currently, NotesController serves an empty JSON document, yet the goal is to serve a JSON array representing the XML data served by the aforementioned NotesXmlService endpoints:

http://NotesXmlService.herokuapp.com/note-one
http://NotesXmlService.herokuapp.com/note-two

Creating a Note model

First, create a model to represent the note data returned by each of the NotesXmlService endpoints. In a more traditional Rails application, such a model represents data from a database. In Noterizer, the Note model provides a Ruby interface to each NotesXmlService XML document.

Create a Note model:

$ touch app/models/note.rb

On initialization, Note should perform an HTTP request to the URL it's passed on instantiation and expose relevant XML values via its public methods. The Note class will use Nokogiri to parse the XML. Require the nokogiri gem by adding the following to your Gemfile:

gem 'nokogiri'

Install nokogiri:

$ bundle install

Let's also store the NotesXmlService base URL in a config property. This allows us to easily change its value throughout the application should it ever change. Add the following to config/application.rb:

config.notes_xml_service_root = 'http://notesxmlservice.herokuapp.com'

Add the following to app/models/note.rb; note the require of net/http, which the class needs to perform HTTP requests, and the use of Rails.application.config to read the base URL we just configured:

require 'net/http'
require 'nokogiri'

class Note
  def initialize(path = nil)
    @uri = URI.parse("#{Rails.application.config.notes_xml_service_root}/#{path}")
    @xml = get_and_parse_response
    create_methods
  end

  private

  def get_and_parse_response
    Nokogiri::XML(get_response)
  end

  def get_response
    http = Net::HTTP.new(@uri.host, @uri.port)
    http.request(request).body
  end

  def request
    Net::HTTP::Get.new(@uri.request_uri)
  end

  def create_methods
    available_methods.each do |method|
      self.class.send(:define_method, method) { fetch_value method.to_s }
    end
  end

  def available_methods
    [
      :to,
      :from,
      :heading,
      :body
    ]
  end

  def fetch_value(value)
    @xml.xpath("//add[@key='#{value}']/@value").text
  end
end

The Note class now works like this:

It performs an HTTP request to the URL it's passed on initialization.
It leverages Nokogiri to parse the resulting XML.
It uses Nokogiri's support of XPATH expressions to dynamically create to, from, heading, and body methods based on the corresponding values in the XML.
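Before writing tests, it can be reassuring to exercise the model by hand. Assuming the NotesXmlService endpoint returns XML like the sample stubbed later in this article, a Rails console session might look like the following (the returned values are illustrative):

$ rails console
> note = Note.new('note-one')
> note.to       # => "Samantha"
> note.from     # => "David"
> note.heading  # => "Our Meeting"
> note.body     # => "Are you available to get started at 1pm?"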
Testing the Note model

Let's test Note by creating a corresponding model spec file:

$ rails g rspec:model note

First, test the Note#to method by adding the following to the newly created spec/models/note_spec.rb:

require 'rails_helper'

RSpec.describe Note, :type => :model do
  before do
    @note = Note.new('note-one')
  end

  describe '#to' do
    it 'returns the correct "to" value from the XML' do
      expect(@note.to).to eq 'Samantha'
    end
  end
end

Running rake spec reveals that the test passes, though it performs a real HTTP request. This is not ideal: such requests generate unwelcome traffic on NotesXmlService, make hard-coded assumptions about the XML returned by the /note-one endpoint, and rely upon an Internet connection to pass. Let's configure RSpec to use WebMock to fake HTTP requests.

Add webmock to the Gemfile; specify that it's part of the :test group:

gem 'webmock', group: :test

Install webmock:

$ bundle install

Add the following to spec/spec_helper.rb's RSpec.configure block to disable network requests during RSpec runs:

require 'webmock/rspec'

RSpec.configure do |config|
  WebMock.disable_net_connect!
end

Use WebMock to stub the NotesXmlService request/response in spec/models/note_spec.rb by editing its before block to the following:

before :each do
  path = 'note-one'
  stub_request(:get, "#{Rails.application.config.notes_xml_service_root}/#{path}").to_return(
    body: [
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<note type="work">',
        '<add key="to" value="Samantha"/>',
        '<add key="from" value="David"/>',
        '<add key="heading" value="Our Meeting"/>',
        '<add key="body" value="Are you available to get started at 1pm?"/>',
      '</note>'
    ].join(''))
  @note = Note.new(path)
end

Running rake spec should now run the full test suite, including the Note model spec, without performing real HTTP requests. Similar tests can be authored for Note's from, heading, and body methods. The complete examples are viewable in Noterizer's master branch.

Building up the controller

Now that we have a Note model, our Notes#index controller should create an instance variable, inside of which lives an array of Note models representing the data served by each NotesXmlService endpoint. Add the following to app/controllers/notes_controller.rb's index method:

def index
  @notes = [
    Note.new('note-one'),
    Note.new('note-two')
  ]
end

Testing the NotesController

Let's test the modifications to NotesController.

Creating a spec helper

To test the controller, we'll need to stub the NotesXmlService requests, just as was done in spec/models/note_spec.rb. Rather than repeat the stub_request code, let's abstract it into a helper that can be used throughout the specs.
Create a spec/support/helpers.rb file: $ mkdir spec/support && touch spec/support/helpers.rb Define the helper method by adding the following to the newly created spec/support/helpers.rb file: module Helpersdef stub_note_request(path)   base_url = Rails.application.config.notes_xml_service_root   stub_request(:get, "#{base_url}/#{path}").to_return(     body: [       '<?xml version="1.0" encoding="UTF-8"?>',        '<note type="work">',         '<add key="to" value="Samantha"/>',         '<add key="from" value="David"/>',         '<add key="heading" value="Our Meeting"/>',         '<add key="body" value="Are you available to get started at 1pm?"/>',       '</note>'     ].join('')   )endend Tweak the RSpec configuration such that it can be used by adding the following to the configure block in spec/rails_helper.rb: config.include Helpers Edit the spec/models/note_spec.rb's before block to use the #stub_note_request helper: before :each dopath = ‘note-one’stub_note_request(path)@note = Note.new(path)end Confirm that all tests continue passing by running rake spec. Writing the new NotesController tests Let's make use of the stub_note_request helper by changing spec/controllers/notes_controller_spec.rb's before block to the following: before :each dostub_note_request('note-one')stub_note_request('note-two')get :indexend Add the following to its #index tests to test the new functionality: context 'the @notes it assigns' doit 'is an array containing 2 items' do   expect(assigns(:notes).length).to eq 2endit 'is an array of Note models' do   assigns(:notes).each do |note|    expect(note).to be_a Note   endendend rake spec should now confirm that all tests pass. See Noterizer’s master branch for the complete example code. The Jbuilder view With a fully functional Note model and NotesController, Noterizer now needs a view to render the proper JSON. Much like ERB offers a Ruby HTML templating solution, Jbuilder offers a JSON templating solution. Writing the Jbuilder view templates Add the following to app/views/notes/index.json.jbuilder: json.array! @notes, partial: 'note' as: :note What does this do? This instructs Jbuilder to build an array of objects with the @notes array and render each one via a partial named _note. Create the app/views/notes/_note.json.jbuilder partial file: $ touch app/views/notes/_note.json.jbuilder Add the following: json.toField     note.tojson.fromField note.fromjson.heading   note.headingjson.body         note.body Now, http://localhost:3000/notes renders the following JSON: [{"toField": "Samantha","fromField": "David","heading": "Our Meeting","body": "Are you available to get started at 1pm?"},{"toField": "Melissa","fromField": "Chris","heading": "Saturday","body": "Are you still interested in going to the beach?"}] Testing the Jbuilder view templates First, let's test the app/views/notes/_note.json.jbuilder template. Create a spec file: $ mkdir spec/views/notes && touch spec/views/notes/_note.json.jbuilder_spec.rb Add the following to the newly created _note spec: require 'spec_helper'describe 'notes/_note' dolet(:note) do   double('Note',     to: 'Mike',     from: 'Sam',     heading: 'Tomorrow',     body: 'Call me after 3pm.',   )endbefore :each do   assign(:note, note)   render '/notes/note', note: noteendcontext 'verifying the JSON values it renders' do   subject { JSON.parse(rendered) }   describe "['toField']" do     subject { super()['toField'] }     it { is_expected.to eq 'Mike' }   endendend rake spec should confirm that all tests pass. 
You can write additional _note tests for fromField, heading, and body. Conclusion You have now completed this two part blog series, and you’ve built Noterizer, a basic example of a Rails application that fronts an external API. The application preserves an MVC architecture that separates concerns across the Note model, the NotesController, and JSON view templates. Noterizer also offers a simple example of using RSpec to test such an application and Jbuilder as a JSON templating language. Mike Ball is a Philadelphia-based software developer specializing in Ruby on Rails and JavaScript. He works for Comcast Interactive Media where he helps build web-based TV and video consumption applications.

Financial Management with Microsoft Dynamics AX 2012 R3

Packt
11 Feb 2015
4 min read
In this article by Mohamed Aamer, author of Microsoft Dynamics AX 2012 R3 Financial Management, we will see that the core foundation of Enterprise Resource Planning (ERP) is financial management; it is vital to comprehend the financial characteristics of Microsoft Dynamics AX 2012 R3 from a practical perspective grounded in how the application actually works. It is important to cover the following topics:

Understanding financial management aspects in Microsoft Dynamics AX 2012 R3
Covering the business rationale, basic setups, and configuration
Real-life business requirements and their solutions
Implementation tips and tricks, in addition to the key consideration points during analysis, design, deployment, and operation

The Microsoft Dynamics AX 2012 R3 Financial Management book covers the main characteristics of the general ledger and its integration with the other subledgers (Accounts payable, Accounts receivable, fixed assets, cash and bank management, and inventory). It also covers the core features of main accounts, the categorization of accounts and its controls, along with the opening balance process and concept, and the closing procedure. It then discusses subledger functionality (Accounts payable, Accounts receivable, fixed assets, cash and bank management, cash flow management, and inventory) in more detail by walking through the master data, controls, and transactions and their effects on the general ledger.

It also explores financial reporting, which is one of the basic implementation cornerstones. The main principles for reporting are the reliability of business information and the ability to get the right information at the right time to the right person. Reports that analyze ERP data in an expressive way represent the output of the ERP implementation; they are the cream of the implementation, the next level of value that solution stakeholders should target. This ultimate outcome results from building all reports on a single point of information.

Planning reporting needs for ERP

The Microsoft Dynamics AX implementation team should challenge the management's reporting needs in the analysis phase of the implementation, with a particular focus on exploring the data required to build reports. These data requirements should then be cross-checked with the real data entry activities that end users will execute, to ensure that business users will get vital information from the reports. The reporting levels are as follows:

Operational management
Middle management
Top management

Understanding the information technology value chain

The model of a management information system is most applicable to the Information Technology (IT) manager or Chief Information Officer (CIO) of a business. Business owners likely don't care as much about the specifics as long as these aspects of the solution deliver the required results. The following are the basic layers of the value chain:

Database management
Business processes
Business Intelligence
Frontend

Understanding Microsoft Dynamics AX information source blocks

This section explores the information sources that eventually determine the strategic value of Business Intelligence (BI) reporting and analytics. These are divided into three blocks.
Detailed transactions block
Business Intelligence block
Executive decisions block

Discovering Microsoft Dynamics AX reporting

The reporting options offered by Microsoft Dynamics AX are:

Inquiry forms
SQL Reporting Services (SSRS) reports
The original transaction
The Original document function
Audit trail

Reporting currency

Companies report their transactions in a specific currency that is known as accounting currency or local currency. It is normal to post transactions in a different currency, and this amount of money is translated to the home currency using the current exchange rate.

Autoreports

The Autoreport wizard is a user-friendly tool. The end user can easily generate a report starting from every form in Microsoft Dynamics AX. This wizard helps the user to create a report based on the information in the form and save the report.

Summary

In this article, we covered financial reporting from planning to consideration of reporting levels. We covered important points that affect reporting quality by considering the reporting value chain, which consists of infrastructure, database management, business processes, business intelligence, and the frontend. We also discussed the information source blocks, which consist of the detailed transactions block, business intelligence block, and executive decisions block. Then we learned about the reporting possibilities in Microsoft Dynamics AX such as inquiry forms and SSRS reports, and autoreport capabilities in Microsoft Dynamics AX 2012 R3.

Resources for Article:

Further resources on this subject:

Customization in Microsoft Dynamics CRM [Article]
Getting Started with Microsoft Dynamics CRM 2013 Marketing [Article]
SOP Module Setup in Microsoft Dynamics GP [Article]

Hive in Hadoop

Packt
10 Feb 2015
36 min read
In this article by Garry Turkington and Gabriele Modena, the authors of the book Learning Hadoop 2, we will see how MapReduce is a powerful paradigm that enables complex data processing that can reveal valuable insights. It does, however, require a different mindset and some training and experience in breaking processing analytics into a series of map and reduce steps. There are several products built atop Hadoop to provide higher-level or more familiar views of the data held within HDFS, and Pig is a very popular one. This article will explore the other most common abstraction implemented atop Hadoop: SQL.

In this article, we will cover the following topics:

What the use cases for SQL on Hadoop are and why it is so popular
HiveQL, the SQL dialect introduced by Apache Hive
Using HiveQL to perform SQL-like analysis of the Twitter dataset
How HiveQL can approximate common features of relational databases such as joins and views

Why SQL on Hadoop

So far we have seen how to write Hadoop programs using the MapReduce APIs and how Pig Latin provides a scripting abstraction and a wrapper for custom business logic by means of UDFs. Pig is a very powerful tool, but its dataflow-based programming model is not familiar to most developers or business analysts. The traditional tool of choice for such people to explore data is SQL.

Back in 2008, Facebook released Hive, the first widely used implementation of SQL on Hadoop. Instead of providing a way of more quickly developing map and reduce tasks, Hive offers an implementation of HiveQL, a query language based on SQL. Hive takes HiveQL statements and immediately and automatically translates the queries into one or more MapReduce jobs. It then executes the overall MapReduce program and returns the results to the user. This interface to Hadoop not only reduces the time required to produce results from data analysis, it also significantly widens the net as to who can use Hadoop. Instead of requiring software development skills, anyone who's familiar with SQL can use Hive.

The combination of these attributes means that HiveQL is often used as a tool by business and data analysts to perform ad hoc queries on the data stored on HDFS. With Hive, the data analyst can work on refining queries without the involvement of a software developer. Just as with Pig, Hive also allows HiveQL to be extended by means of User Defined Functions, enabling the base SQL dialect to be customized with business-specific functionality.

Other SQL-on-Hadoop solutions

Though Hive was the first product to introduce and support HiveQL, it is no longer the only one. There are others, but we will mostly discuss Hive and Impala as they have been the most successful. While introducing the core features and capabilities of SQL on Hadoop, however, we will give examples using Hive; even though Hive and Impala share many SQL features, they also have numerous differences. We don't want to constantly have to caveat each new feature with exactly how it is supported in Hive compared to Impala. We'll generally be looking at aspects of the feature set that are common to both, but if you use both products, it's important to read the latest release notes to understand the differences.

Prerequisites

Before diving into specific technologies, let's generate some data that we'll use in the examples throughout this article. We'll create a modified version of a former Pig script as the main functionality for this.
The script in this article assumes that the Elephant Bird JARs used previously are available in the /jar directory on HDFS. The full source code is at https://github.com/learninghadoop2/book-examples/ch7/extract_for_hive.pig, but the core of extract_for_hive.pig is as follows: -- load JSON data tweets = load '$inputDir' using com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad'); -- Tweets tweets_tsv = foreach tweets { generate    (chararray)CustomFormatToISO($0#'created_at', 'EEE MMMM d HH:mm:ss Z y') as dt,    (chararray)$0#'id_str', (chararray)$0#'text' as text,    (chararray)$0#'in_reply_to', (boolean)$0#'retweeted' as is_retweeted, (chararray)$0#'user'#'id_str' as user_id, (chararray)$0#'place'#'id' as place_id; } store tweets_tsv into '$outputDir/tweets' using PigStorage('u0001'); -- Places needed_fields = foreach tweets {    generate (chararray)CustomFormatToISO($0#'created_at', 'EEE MMMM d HH:mm:ss Z y') as dt,      (chararray)$0#'id_str' as id_str, $0#'place' as place; } place_fields = foreach needed_fields { generate    (chararray)place#'id' as place_id,    (chararray)place#'country_code' as co,    (chararray)place#'country' as country,    (chararray)place#'name' as place_name,    (chararray)place#'full_name' as place_full_name,    (chararray)place#'place_type' as place_type; } filtered_places = filter place_fields by co != ''; unique_places = distinct filtered_places; store unique_places into '$outputDir/places' using PigStorage('u0001');   -- Users users = foreach tweets {    generate (chararray)CustomFormatToISO($0#'created_at', 'EEE MMMM d HH:mm:ss Z y') as dt, (chararray)$0#'id_str' as id_str, $0#'user' as user; } user_fields = foreach users {    generate    (chararray)CustomFormatToISO(user#'created_at', 'EEE MMMM d HH:mm:ss Z y') as dt, (chararray)user#'id_str' as user_id, (chararray)user#'location' as user_location, (chararray)user#'name' as user_name, (chararray)user#'description' as user_description, (int)user#'followers_count' as followers_count, (int)user#'friends_count' as friends_count, (int)user#'favourites_count' as favourites_count, (chararray)user#'screen_name' as screen_name, (int)user#'listed_count' as listed_count;   } unique_users = distinct user_fields; store unique_users into '$outputDir/users' using PigStorage('u0001'); Run this script as follows: $ pig –f extract_for_hive.pig –param inputDir=<json input> -param outputDir=<output path> The preceding code writes data into three separate TSV files for the tweet, user, and place information. Notice that in the store command, we pass an argument when calling PigStorage. This single argument changes the default field separator from a tab character to unicode value U0001, or you can also use Ctrl +C + A. This is often used as a separator in Hive tables and will be particularly useful to us as our tweet data could contain tabs in other fields. Overview of Hive We will now show how you can import data into Hive and run a query against the table abstraction Hive provides over the data. In this example, and in the remainder of the article, we will assume that queries are typed into the shell that can be invoked by executing the hive command. Recently a client called Beeline also became available and will likely be the preferred CLI client in the near future. 
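For reference, the hive shell can also execute statements non-interactively. On a typical installation, invocations like the following work, though the exact options available depend on your Hive version and the script file name here is just an example:

$ hive                      # start the interactive shell
$ hive -e 'show tables;'    # run a single statement
$ hive -f my_queries.hql    # run the statements in a script file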
When importing any new data into Hive, there is generally a three-stage process: Create the specification of the table into which the data is to be imported Import the data into the created table Execute HiveQL queries against the table Most of the HiveQL statements are direct analogues to similarly named statements in standard SQL. We assume only a passing knowledge of SQL throughout this article, but if you need a refresher, there are numerous good online learning resources. Hive gives a structured query view of our data, and to enable that, we must first define the specification of the table's columns and import the data into the table before we can execute any queries. A table specification is generated using a CREATE statement that specifies the table name, the name and types of its columns, and some metadata about how the table is stored: CREATE table tweets ( created_at string, tweet_id string, text string, in_reply_to string, retweeted boolean, user_id string, place_id string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'u0001' STORED AS TEXTFILE; The statement creates a new table tweets defined by a list of names for columns in the dataset and their data type. We specify that fields are delimited by the Unicode U0001 character and that the format used to store data is TEXTFILE. Data can be imported from a location in HDFS tweets/ into hive using the LOAD DATA statement: LOAD DATA INPATH 'tweets' OVERWRITE INTO TABLE tweets; By default, data for Hive tables is stored on HDFS under /user/hive/warehouse. If a LOAD statement is given a path to data on HDFS, it will not simply copy the data into /user/hive/warehouse, but will move it there instead. If you want to analyze data on HDFS that is used by other applications, then either create a copy or use the EXTERNAL mechanism that will be described later. Once data has been imported into Hive, we can run queries against it. For instance: SELECT COUNT(*) FROM tweets; The preceding code will return the total number of tweets present in the dataset. HiveQL, like SQL, is not case sensitive in terms of keywords, columns, or table names. By convention, SQL statements use uppercase for SQL language keywords, and we will generally follow this when using HiveQL within files, as will be shown later. However, when typing interactive commands, we will frequently take the line of least resistance and use lowercase. If you look closely at the time taken by the various commands in the preceding example, you'll notice that loading data into a table takes about as long as creating the table specification, but even the simple count of all rows takes significantly longer. The output also shows that table creation and the loading of data do not actually cause MapReduce jobs to be executed, which explains the very short execution times. The nature of Hive tables Although Hive copies the data file into its working directory, it does not actually process the input data into rows at that point. Both the CREATE TABLE and LOAD DATA statements do not truly create concrete table data as such; instead, they produce the metadata that will be used when Hive generates MapReduce jobs to access the data conceptually stored in the table but actually residing on HDFS. Even though the HiveQL statements refer to a specific table structure, it is Hive's responsibility to generate code that correctly maps this to the actual on-disk format in which the data files are stored. This might seem to suggest that Hive isn't a real database; this is true, it isn't. 
Whereas a relational database will require a table schema to be defined before data is ingested and then ingest only data that conforms to that specification, Hive is much more flexible. The less concrete nature of Hive tables means that schemas can be defined based on the data as it has already arrived and not on some assumption of how the data should be, which might prove to be wrong. Though changeable data formats are troublesome regardless of technology, the Hive model provides an additional degree of freedom in handling the problem when, not if, it arises. Hive architecture Until version 2, Hadoop was primarily a batch system. Internally, Hive compiles HiveQL statements into MapReduce jobs. Hive queries have traditionally been characterized by high latency. This has changed with the Stinger initiative and the improvements introduced in Hive 0.13 that we will discuss later. Hive runs as a client application that processes HiveQL queries, converts them into MapReduce jobs, and submits these to a Hadoop cluster either to native MapReduce in Hadoop 1 or to the MapReduce Application Master running on YARN in Hadoop 2. Regardless of the model, Hive uses a component called the metastore, in which it holds all its metadata about the tables defined in the system. Ironically, this is stored in a relational database dedicated to Hive's usage. In the earliest versions of Hive, all clients communicated directly with the metastore, but this meant that every user of the Hive CLI tool needed to know the metastore username and password. HiveServer was created to act as a point of entry for remote clients, which could also act as a single access-control point and which controlled all access to the underlying metastore. Because of limitations in HiveServer, the newest way to access Hive is through the multi-client HiveServer2. HiveServer2 introduces a number of improvements over its predecessor, including user authentication and support for multiple connections from the same client. More information can be found at https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2. Instances of HiveServer and HiveServer2 can be manually executed with the hive --service hiveserver and hive --service hiveserver2 commands, respectively. In the examples we saw before and in the remainder of this article, we implicitly use HiveServer to submit queries via the Hive command-line tool. HiveServer2 comes with Beeline. For compatibility and maturity reasons, Beeline being relatively new, both tools are available on Cloudera and most other major distributions. The Beeline client is part of the core Apache Hive distribution and so is also fully open source. Beeline can be executed in embedded version with the following command: $ beeline -u jdbc:hive2:// Data types HiveQL supports many of the common data types provided by standard database systems. These include primitive types, such as float, double, int, and string, through to structured collection types that provide the SQL analogues to types such as arrays, structs, and unions (structs with options for some fields). Since Hive is implemented in Java, primitive types will behave like their Java counterparts. 
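As an illustration of mixing primitive and collection types (this table is not part of the Twitter schema used elsewhere in this article, and the column names are purely illustrative), a declaration might look like this:

CREATE TABLE type_examples (
  id int,
  tags array<string>,
  properties map<string, string>,
  geo struct<latitude: double, longitude: double>
);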
We can distinguish Hive data types into the following five broad categories: Numeric: tinyint, smallint, int, bigint, float, double, and decimal Date and time: timestamp and date String: string, varchar, and char Collections: array, map, struct, and uniontype Misc: boolean, binary, and NULL DDL statements HiveQL provides a number of statements to create, delete, and alter databases, tables, and views. The CREATE DATABASE <name> statement creates a new database with the given name. A database represents a namespace where table and view metadata is contained. If multiple databases are present, the USE <database name> statement specifies which one to use to query tables or create new metadata. If no database is explicitly specified, Hive will run all statements against the default database. SHOW [DATABASES, TABLES, VIEWS] displays the databases currently available within a data warehouse and which table and view metadata is present within the database currently in use: CREATE DATABASE twitter; SHOW databases; USE twitter; SHOW TABLES; The CREATE TABLE [IF NOT EXISTS] <name> statement creates a table with the given name. As alluded to earlier, what is really created is the metadata representing the table and its mapping to files on HDFS as well as a directory in which to store the data files. If a table or view with the same name already exists, Hive will raise an exception. Both table and column names are case insensitive. In older versions of Hive (0.12 and earlier), only alphanumeric and underscore characters were allowed in table and column names. As of Hive 0.13, the system supports unicode characters in column names. Reserved words, such as load and create, need to be escaped by backticks (the ` character) to be treated literally. The EXTERNAL keyword specifies that the table exists in resources out of Hive's control, which can be a useful mechanism to extract data from another source at the beginning of a Hadoop-based Extract-Transform-Load (ETL) pipeline. The LOCATION clause specifies where the source file (or directory) is to be found. The EXTERNAL keyword and LOCATION clause have been used in the following code: CREATE EXTERNAL TABLE tweets ( created_at string, tweet_id string, text string, in_reply_to string, retweeted boolean, user_id string, place_id string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'u0001' STORED AS TEXTFILE LOCATION '${input}/tweets'; This table will be created in metastore, but the data will not be copied into the /user/hive/warehouse directory. Note that Hive has no concept of primary key or unique identifier. Uniqueness and data normalization are aspects to be addressed before loading data into the data warehouse. The CREATE VIEW <view name> … AS SELECT statement creates a view with the given name. For example, we can create a view to isolate retweets from other messages, as follows: CREATE VIEW retweets COMMENT 'Tweets that have been retweeted' AS SELECT * FROM tweets WHERE retweeted = true; Unless otherwise specified, column names are derived from the defining SELECT statement. Hive does not currently support materialized views. The DROP TABLE and DROP VIEW statements remove both metadata and data for a given table or view. When dropping an EXTERNAL table or a view, only metadata will be removed and the actual data files will not be affected. Hive allows table metadata to be altered via the ALTER TABLE statement, which can be used to change a column type, name, position, and comment or to add and replace columns. 
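For example, adding a new column and then renaming it might look like the following; the column names here are only illustrative and are not part of the Twitter schema:

ALTER TABLE tweets ADD COLUMNS (lang string COMMENT 'ISO language code');
ALTER TABLE tweets CHANGE lang language string;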
When adding columns, it is important to remember that only metadata will be changed and not the dataset itself. This means that if we were to add a column in the middle of the table which didn't exist in older files, then while selecting from older data, we might get wrong values in the wrong columns. This is because we would be looking at old files with a new format.

Similarly, ALTER VIEW <view name> AS <select statement> changes the definition of an existing view.

File formats and storage

The data files underlying a Hive table are no different from any other file on HDFS. Users can directly read the HDFS files in the Hive tables using other tools. They can also use other tools to write to HDFS files that can be loaded into Hive through CREATE EXTERNAL TABLE or through LOAD DATA INPATH.

Hive uses the Serializer and Deserializer classes, SerDe, as well as FileFormat to read and write table rows. A native SerDe is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified in a CREATE TABLE statement. The DELIMITED clause instructs the system to read delimited files. Delimiter characters can be escaped using the ESCAPED BY clause.

Hive currently uses the following FileFormat classes to read and write HDFS files:

TextInputFormat and HiveIgnoreKeyTextOutputFormat: These will read/write data in plain text file format
SequenceFileInputFormat and SequenceFileOutputFormat: These classes read/write data in the Hadoop SequenceFile format

Additionally, the following SerDe classes can be used to serialize and deserialize data:

MetadataTypedColumnsetSerDe: This will read/write delimited records such as CSV or tab-separated records
ThriftSerDe and DynamicSerDe: These will read/write Thrift objects

JSON

As of version 0.13, Hive ships with the native org.apache.hive.hcatalog.data.JsonSerDe JSON SerDe. For older versions of Hive, Hive-JSON-Serde (found at https://github.com/rcongiu/Hive-JSON-Serde) is arguably one of the most feature-rich JSON serialization/deserialization modules. We can use either module to load JSON tweets without any need for preprocessing and just define a Hive schema that matches the content of a JSON document. In the following example, we use Hive-JSON-Serde. As with any third-party module, we load the SerDe JAR into Hive with the following code:

ADD JAR json-serde-1.3-jar-with-dependencies.jar;

Then, we issue the usual create statement, as follows:

CREATE EXTERNAL TABLE tweets (
   contributors string,
   coordinates struct <
     coordinates: array <float>,
     type: string>,
   created_at string,
   entities struct <
     hashtags: array <struct <
           indices: array <tinyint>,
           text: string>>,
   …
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'tweets';

With this SerDe, we can map nested documents (such as entities or users) to the struct or map types. We tell Hive that the data stored at LOCATION 'tweets' is text (STORED AS TEXTFILE) and that each row is a JSON object (ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'). In Hive 0.13 and later, we can express this property as ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'.

Manually specifying the schema for complex documents can be a tedious and error-prone process. The hive-json module (found at https://github.com/hortonworks/hive-json) is a handy utility to analyze large documents and generate an appropriate Hive schema. Depending on the document collection, further refinement might be necessary.
In our example, we used a schema generated with hive-json that maps the tweets JSON to a number of struct data types. This allows us to query the data using a handy dot notation. For instance, we can extract the screen name and description fields of a user object with the following code: SELECT user.screen_name, user.description FROM tweets_json LIMIT 10; Avro AvroSerde (https://cwiki.apache.org/confluence/display/Hive/AvroSerDe) allows us to read and write data in Avro format. Starting from 0.14, Avro-backed tables can be created using the STORED AS AVRO statement, and Hive will take care of creating an appropriate Avro schema for the table. Prior versions of Hive are a bit more verbose. This dataset was created using Pig's AvroStorage class, which generated the following schema: { "type":"record", "name":"record", "fields": [    {"name":"topic","type":["null","int"]},    {"name":"source","type":["null","int"]},    {"name":"rank","type":["null","float"]} ] } The table structure is captured in an Avro record, which contains header information (a name and optional namespace to qualify the name) and an array of the fields. Each field is specified with its name and type as well as an optional documentation string. For a few of the fields, the type is not a single value, but instead a pair of values, one of which is null. This is an Avro union, and this is the idiomatic way of handling columns that might have a null value. Avro specifies null as a concrete type, and any location where another type might have a null value needs to be specified in this way. This will be handled transparently for us when we use the following schema. With this definition, we can now create a Hive table that uses this schema for its table specification, as follows: CREATE EXTERNAL TABLE tweets_pagerank ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{    "type":"record",    "name":"record",    "fields": [        {"name":"topic","type":["null","int"]},        {"name":"source","type":["null","int"]},        {"name":"rank","type":["null","float"]}    ] }') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '${data}/ch5-pagerank'; Then, look at the following table definition from within Hive (note also that HCatalog): DESCRIBE tweets_pagerank; OK topic                 int                   from deserializer   source               int                   from deserializer   rank                 float                 from deserializer In the DDL, we told Hive that data is stored in Avro format using AvroContainerInputFormat and AvroContainerOutputFormat. Each row needs to be serialized and deserialized using org.apache.hadoop.hive.serde2.avro.AvroSerDe. The table schema is inferred by Hive from the Avro schema embedded in avro.schema.literal. Alternatively, we can store a schema on HDFS and have Hive read it to determine the table structure. Create the preceding schema in a file called pagerank.avsc—this is the standard file extension for Avro schemas. Then place it on HDFS; we prefer to have a common location for schema files such as /schema/avro. Finally, define the table using the avro.schema.url SerDe property WITH SERDEPROPERTIES ('avro.schema.url'='hdfs://<namenode>/schema/avro/pagerank.avsc'). If Avro dependencies are not present in the classpath, we need to add the Avro MapReduce JAR to our environment before accessing individual fields. 
Within Hive, on the Cloudera CDH5 VM: ADD JAR /opt/cloudera/parcels/CDH/lib/avro/avro-mapred-hadoop2.jar; We can also use this table like any other. For instance, we can query the data to select the user and topic pairs with a high PageRank: SELECT source, topic from tweets_pagerank WHERE rank >= 0.9; Columnar stores Hive can also take advantage of columnar storage via the ORC (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC) and Parquet (https://cwiki.apache.org/confluence/display/Hive/Parquet) formats. If a table is defined with very many columns, it is not unusual for any given query to only process a small subset of these columns. But even in a SequenceFile each full row and all its columns will be read from disk, decompressed, and processed. This consumes a lot of system resources for data that we know in advance is not of interest. Traditional relational databases also store data on a row basis, and a type of database called columnar changed this to be column-focused. In the simplest model, instead of one file for each table, there would be one file for each column in the table. If a query only needed to access five columns in a table with 100 columns in total, then only the files for those five columns will be read. Both ORC and Parquet use this principle as well as other optimizations to enable much faster queries. Queries Tables can be queried using the familiar SELECT … FROM statement. The WHERE statement allows the specification of filtering conditions, GROUP BY aggregates records, ORDER BY specifies sorting criteria, and LIMIT specifies the number of records to retrieve. Aggregate functions, such as count and sum, can be applied to aggregated records. For instance, the following code returns the top 10 most prolific users in the dataset: SELECT user_id, COUNT(*) AS cnt FROM tweets GROUP BY user_id ORDER BY cnt DESC LIMIT 10 The following are the top 10 most prolific users in the dataset: NULL 7091 1332188053 4 959468857 3 1367752118 3 362562944 3 58646041 3 2375296688 3 1468188529 3 37114209 3 2385040940 3 We can improve the readability of the hive output by setting the following: SET hive.cli.print.header=true; This will instruct hive, though not beeline, to print column names as part of the output. You can add the command to the .hiverc file usually found in the root of the executing user's home directory to have it apply to all hive CLI sessions. HiveQL implements a JOIN operator that enables us to combine tables together. In the Prerequisites section, we generated separate datasets for the user and place objects. Let's now load them into hive using external tables. 
We first create a user table to store user data, as follows: CREATE EXTERNAL TABLE user ( created_at string, user_id string, `location` string, name string, description string, followers_count bigint, friends_count bigint, favourites_count bigint, screen_name string, listed_count bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'u0001' STORED AS TEXTFILE LOCATION '${input}/users'; We then create a place table to store location data, as follows: CREATE EXTERNAL TABLE place ( place_id string, country_code string, country string, `name` string, full_name string, place_type string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'u0001' STORED AS TEXTFILE LOCATION '${input}/places'; We can use the JOIN operator to display the names of the 10 most prolific users, as follows: SELECT tweets.user_id, user.name, COUNT(tweets.user_id) AS cnt FROM tweets JOIN user ON user.user_id = tweets.user_id GROUP BY tweets.user_id, user.user_id, user.name ORDER BY cnt DESC LIMIT 10; Only equality, outer, and left (semi) joins are supported in Hive. Notice that there might be multiple entries with a given user ID but different values for the followers_count, friends_count, and favourites_count columns. To avoid duplicate entries, we count only user_id from the tweets tables. We can rewrite the previous query as follows: SELECT tweets.user_id, u.name, COUNT(*) AS cnt FROM tweets join (SELECT user_id, name FROM user GROUP BY user_id, name) u ON u.user_id = tweets.user_id GROUP BY tweets.user_id, u.name ORDER BY cnt DESC LIMIT 10; Instead of directly joining the user table, we execute a subquery, as follows: SELECT user_id, name FROM user GROUP BY user_id, name; The subquery extracts unique user IDs and names. Note that Hive has limited support for subqueries, historically only permitting a subquery in the FROM clause of a SELECT statement. Hive 0.13 has added limited support for subqueries within the WHERE clause also. HiveQL is an ever-evolving rich language, a full exposition of which is beyond the scope of this article. A description of its query and ddl capabilities can be found at  https://cwiki.apache.org/confluence/display/Hive/LanguageManual. Structuring Hive tables for given workloads Often Hive isn't used in isolation, instead tables are created with particular workloads in mind or needs invoked in ways that are suitable for inclusion in automated processes. We'll now explore some of these scenarios. Partitioning a table With columnar file formats, we explained the benefits of excluding unneeded data as early as possible when processing a query. A similar concept has been used in SQL for some time: table partitioning. When creating a partitioned table, a column is specified as the partition key. All values with that key are then stored together. In Hive's case, different subdirectories for each partition key are created under the table directory in the warehouse location on HDFS. It's important to understand the cardinality of the partition column. With too few distinct values, the benefits are reduced as the files are still very large. If there are too many values, then queries might need a large number of files to be scanned to access all the required data. Perhaps the most common partition key is one based on date. We could, for example, partition our user table from earlier based on the created_at column, that is, the date the user was first registered. 
Note that since partitioning a table by definition affects its file structure, we create this table now as a non-external one, as follows: CREATE TABLE partitioned_user ( created_at string, user_id string, `location` string, name string, description string, followers_count bigint, friends_count bigint, favourites_count bigint, screen_name string, listed_count bigint ) PARTITIONED BY (created_at_date string) ROW FORMAT DELIMITED FIELDS TERMINATED BY 'u0001' STORED AS TEXTFILE; To load data into a partition, we can explicitly give a value for the partition into which to insert the data, as follows: INSERT INTO TABLE partitioned_user PARTITION( created_at_date = '2014-01-01') SELECT created_at, user_id, location, name, description, followers_count, friends_count, favourites_count, screen_name, listed_count FROM user; This is at best verbose, as we need a statement for each partition key value; if a single LOAD or INSERT statement contains data for multiple partitions, it just won't work. Hive also has a feature called dynamic partitioning, which can help us here. We set the following three variables: SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict; SET hive.exec.max.dynamic.partitions.pernode=5000; The first two statements enable all partitions (nonstrict option) to be dynamic. The third one allows 5,000 distinct partitions to be created on each mapper and reducer node. We can then simply use the name of the column to be used as the partition key, and Hive will insert data into partitions depending on the value of the key for a given row: INSERT INTO TABLE partitioned_user PARTITION( created_at_date ) SELECT created_at, user_id, location, name, description, followers_count, friends_count, favourites_count, screen_name, listed_count, to_date(created_at) as created_at_date FROM user; Even though we use only a single partition column here, we can partition a table by multiple column keys; just have them as a comma-separated list in the PARTITIONED BY clause. Note that the partition key columns need to be included as the last columns in any statement being used to insert into a partitioned table. In the preceding code we use Hive's to_date function to convert the created_at timestamp to a YYYY-MM-DD formatted string. Partitioned data is stored in HDFS as /path/to/warehouse/<database>/<table>/key=<value>. In our example, the partitioned_user table structure will look like /user/hive/warehouse/default/partitioned_user/created_at=2014-04-01. If data is added directly to the filesystem, for instance by some third-party processing tool or by hadoop fs -put, the metastore won't automatically detect the new partitions. The user will need to manually run an ALTER TABLE statement such as the following for each newly added partition: ALTER TABLE <table_name> ADD PARTITION <location>; To add metadata for all partitions not currently present in the metastore we can use: MSCK REPAIR TABLE <table_name>; statement. On EMR, this is equivalent to executing the following statement: ALTER TABLE <table_name> RECOVER PARTITIONS; Notice that both statements will work also with EXTERNAL tables. Overwriting and updating data Partitioning is also useful when we need to update a portion of a table. Normally a statement of the following form will replace all the data for the destination table: INSERT OVERWRITE INTO <table>… If OVERWRITE is omitted, then each INSERT statement will add additional data to the table. 
Sometimes appending is desirable, but often the source data being ingested into a Hive table is intended to fully update a subset of the data and keep the rest untouched. If we perform an INSERT OVERWRITE statement (or a LOAD OVERWRITE statement) into a partition of a table, then only the specified partition will be affected. Thus, if we were inserting user data and only wanted to affect the partitions with data in the source file, we could achieve this by adding the OVERWRITE keyword to our previous INSERT statement. We can also add caveats to the SELECT statement. Say, for example, we only wanted to update data for a certain month:

INSERT INTO TABLE partitioned_user
PARTITION (created_at_date)
SELECT created_at, user_id, location, name, description, followers_count, friends_count, favourites_count, screen_name, listed_count, to_date(created_at) as created_at_date
FROM user
WHERE to_date(created_at) BETWEEN '2014-03-01' and '2014-03-31';

Bucketing and sorting

Partitioning a table is a construct that you take explicit advantage of by using the partition column (or columns) in the WHERE clause of queries against the table. There is another mechanism, called bucketing, that can further segment how a table is stored, and it does so in a way that allows Hive itself to optimize its internal query plans to take advantage of the structure. Let's create bucketed versions of our tweets and user tables; note the additional CLUSTERED BY and SORTED BY clauses in the following CREATE TABLE statements:

CREATE TABLE bucketed_tweets (
  tweet_id string,
  text string,
  in_reply_to string,
  retweeted boolean,
  user_id string,
  place_id string
) PARTITIONED BY (created_at string)
CLUSTERED BY (user_id) INTO 64 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
STORED AS TEXTFILE;

CREATE TABLE bucketed_user (
  user_id string,
  `location` string,
  name string,
  description string,
  followers_count bigint,
  friends_count bigint,
  favourites_count bigint,
  screen_name string,
  listed_count bigint
) PARTITIONED BY (created_at string)
CLUSTERED BY (user_id) SORTED BY (name) INTO 64 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
STORED AS TEXTFILE;

Note that we changed the tweets table to also be partitioned; the bucketing is then applied within each partition. Just as we need to specify a partition column when inserting into a partitioned table, we must also take care to ensure that data inserted into a bucketed table is correctly clustered. We do this by setting the following flag before inserting the data into the table:

SET hive.enforce.bucketing=true;

Just as with partitioned tables, you cannot apply the bucketing function when using the LOAD DATA statement; if you wish to load external data into a bucketed table, first insert it into a temporary table, and then use the INSERT…SELECT… syntax to populate the bucketed table. When data is inserted into a bucketed table, rows are allocated to a bucket based on the result of a hash function applied to the column specified in the CLUSTERED BY clause.

One of the greatest advantages of bucketing a table comes when we need to join two tables that are bucketed on the same column, as in the previous example. So, for example, any query of the following form would be vastly improved:

SET hive.optimize.bucketmapjoin=true;
SELECT …
FROM bucketed_user u JOIN bucketed_tweets t
ON u.user_id = t.user_id;

With the join being performed on the column used to bucket the tables, Hive can optimize the amount of processing as it knows that each bucket contains the same set of user_id values in both tables.
When determining which rows match, only the rows in the corresponding bucket need to be compared, not the whole table. This does require that both tables are clustered on the same column and that the bucket counts are either identical or one is a multiple of the other. In the latter case, with, say, one table clustered into 32 buckets and another into 64, the nature of the default hash function used to allocate data to a bucket means that the IDs in bucket 3 of the first table will be covered by buckets 3 and 35 in the second.

Sampling data

Bucketing a table can also help while using Hive's ability to sample data in a table. Sampling allows a query to gather only a specified subset of the overall rows in the table. This is useful when you have an extremely large table with moderately consistent data patterns. In such a case, applying a query to a small fraction of the data will be much faster and will still give a broadly representative result. Note, of course, that this only applies to queries where you are looking to determine table characteristics, such as pattern ranges in the data; if you are trying to count anything, then the result needs to be scaled to the full table size.

For a non-bucketed table, you can sample using a mechanism similar to what we saw earlier by specifying that the query should only be applied to a certain subset of the table:

SELECT max(friends_count)
FROM user TABLESAMPLE(BUCKET 2 OUT OF 64 ON name);

In this query, Hive will effectively hash the rows in the table into 64 buckets based on the name column. It will then only use the second bucket for the query. Multiple buckets can be specified, and if RAND() is given as the ON clause, then the entire row is used by the bucketing function. Though this works, it is highly inefficient, as the full table needs to be scanned to generate the required subset of data. If we sample on a bucketed table and ensure the number of buckets sampled is equal to or a multiple of the buckets in the table, then Hive will only read the buckets in question. For example:

SELECT MAX(friends_count)
FROM bucketed_user TABLESAMPLE(BUCKET 2 OUT OF 32 ON user_id);

In the preceding query against the bucketed_user table, which is created with 64 buckets on the user_id column, the sampling, since it is using the same column, will only read the required buckets. In this case, these will be buckets 2 and 34 from each partition.

A final form of sampling is block sampling. In this case, we can specify the required amount of the table to be sampled, and Hive will use an approximation of this by only reading enough source data blocks on HDFS to meet the required size. Currently, the data size can be specified as either a percentage of the table, as an absolute data size, or as a number of rows (in each block). The syntax for TABLESAMPLE is as follows, which will sample 0.5 percent of the table, 1 GB of data, or 100 rows per split, respectively:

TABLESAMPLE(0.5 PERCENT)
TABLESAMPLE(1G)
TABLESAMPLE(100 ROWS)

If these latter forms of sampling are of interest, then consult the documentation, as there are some specific limitations on the input format and file formats that are supported.

Writing scripts

We can place Hive commands in a file and run them with the -f option in the hive CLI utility:

$ cat show_tables.hql
show tables;
$ hive -f show_tables.hql

We can parameterize HiveQL statements by means of the hiveconf mechanism.
This allows us to reference a variable by name within the script and supply its value at the point of invocation. For example:

$ cat show_tables2.hql
show tables like '${hiveconf:TABLENAME}';
$ hive -hiveconf TABLENAME=user -f show_tables2.hql

The variable can also be set within the Hive script or an interactive session:

SET TABLENAME='user';

The preceding hiveconf argument will add any new variables in the same namespace as the Hive configuration options. As of Hive 0.8, there is a similar option called hivevar that adds any user variables into a distinct namespace. Using hivevar, the preceding command would be as follows:

$ cat show_tables3.hql
show tables like '${hivevar:TABLENAME}';
$ hive -hivevar TABLENAME=user -f show_tables3.hql

Or we can set the variable interactively:

SET hivevar:TABLENAME='user';

Summary

In this article, we learned that in its early days, Hadoop was sometimes erroneously seen as the latest supposed relational database killer. Over time, it has become more apparent that the more sensible approach is to view it as a complement to RDBMS technologies and that, in fact, the RDBMS community has developed tools such as SQL that are also valuable in the Hadoop world. HiveQL is an implementation of SQL on Hadoop and was the primary focus of this article. In regard to HiveQL and its implementations, we covered the following topics:

How HiveQL provides a logical model atop data stored in HDFS, in contrast to relational databases where the table structure is enforced in advance

How HiveQL offers the ability to extend its core set of operators with user-defined code, and how this contrasts with the Pig UDF mechanism

The recent history of Hive developments, such as the Stinger initiative, that have seen Hive transition to an updated implementation that uses Tez

Resources for Article: Further resources on this subject: Big Data Analysis [Article] Understanding MapReduce [Article] Amazon DynamoDB - Modelling relationships, Error handling [Article]

Jenkins Continuous Integration

Packt
09 Feb 2015
17 min read
This article by Alan Mark Berg, the author of Jenkins Continuous Integration Cookbook Second Edition, outlines the main themes surrounding the correct use of a Jenkins server. Overview Jenkins (http://jenkins-ci.org/) is a Java-based Continuous Integration (CI) server that supports the discovery of defects early in the software cycle. Thanks to over 1,000 plugins, Jenkins communicates with many types of systems building and triggering a wide variety of tests. CI involves making small changes to software and then building and applying quality assurance processes. Defects do not only occur in the code, but also appear in the naming conventions, documentation, how the software is designed, build scripts, the process of deploying the software to servers, and so on. CI forces the defects to emerge early, rather than waiting for software to be fully produced. If defects are caught in the later stages of the software development life cycle, the process will be more expensive. The cost of repair radically increases as soon the bugs escape to production. Estimates suggest it is 100 to 1,000 times cheaper to capture defects early. Effective use of a CI server, such as Jenkins, could be the difference between enjoying a holiday and working unplanned hours to heroically save the day. And as you can imagine, in my day job as a senior developer with aspirations to quality assurance, I like long boring days, at least for mission-critical production environments. Jenkins can automate the building of software regularly and trigger tests pulling in the results and failing based on defined criteria. Failing early via build failure lowers the costs, increases confidence in the software produced, and has the potential to morph subjective processes into an aggressive metrics-based process that the development team feels is unbiased. Jenkins is: A proven technology that is deployed at large scale in many organizations. Open source, so the code is open to review and has no licensing costs. Has a simple configuration through a web-based GUI. This speeds up job creation, improves consistency, and decreases the maintenance costs. A master slave topology that distributes the build and testing effort over slave servers with the results automatically accumulated on the master. This topology ensures a scalable, responsive, and stable environment. Has the ability to call slaves from the cloud. Jenkins can use Amazon services or an Application Service Provider (ASP) such as CloudBees. (http://www.cloudbees.com/). No fuss installation. Installation is as simple as running only a single downloaded file named jenkins.war. Has many plugins. Over 1,000 plugins supporting communication, testing, and integration to numerous external applications (https://wiki.jenkins-ci.org/display/JENKINS/Plugins). A straightforward plugin framework. For Java programmers, writing plugins is straightforward. Jenkins plugin framework has clear interfaces that are easy to extend. The framework uses XStream (http://xstream.codehaus.org/) for persisting configuration information as XML and Jelly (http://commons.apache.org/proper/commons-jelly/) for the creation of parts of the GUI. Runs Groovy scripts. Jenkins has the facility to support running Groovy scripts both in the master and remotely on slaves. This allows for consistent scripting across operating systems. However, you are not limited to scripting in Groovy. 
Many administrators like to use Ant, Bash, or Perl scripts and statisticians and developers with complex analytics requirements the R language. Not just Java. Though highly supportive of Java, Jenkins also supports other languages. Rapid pace of improvement. Jenkins is an agile project; you can see numerous releases in the year, pushing improvements rapidly at http://jenkins-ci.org/changelog.There is also a highly stable Long-Term Support Release for the more conservative. Jenkins pushes up code quality by automatically testing within a short period after code commit and then shouting loudly if build failure occurs. The importance of continuous testing In 2002, NIST estimated that software defects were costing America around 60 billion dollars per year (http://www.abeacha.com/NIST_press_release_bugs_cost.htm). Expect the cost to have increased considerably since. To save money and improve quality, you need to remove defects as early in the software lifecycle as possible. The Jenkins test automation creates a safety net of measurements. Another key benefit is that once you have added tests, it is trivial to develop similar tests for other projects. Jenkins works well with best practices such as Test Driven Development (TDD) or Behavior Driven Development (BDD). Using TDD, you write tests that fail first and then build the functionality needed to pass the tests. With BDD, the project team writes the description of tests in terms of behavior. This makes the description understandable to a wider audience. The wider audience has more influence over the details of the implementation. Regression tests increase confidence that you have not broken code while refactoring software. The more coverage of code by tests, the more confidence. There are a number of good introductions to software metrics. These include a wikibook on the details of the metrics (http://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Quality/Metrics). And a well written book is by Diomidis Spinellis Code Quality: The Open Source Perspective. Remote testing through Jenkins considerably increases the number of dependencies in your infrastructure and thus the maintenance effort. Remote testing is a problem that is domain specific, decreasing the size of the audience that can write tests. You need to make test writing accessible to a large audience. Embracing the largest possible audience improves the chances that the tests defend the intent of the application. The technologies highlighted in the Jenkins book include: Fitnesse: It is a wiki with which you can write different types of tests. Having a WIKI like language to express and change tests on the fly gives functional administrators, consultants, and the end-user a place to express their needs. You will be shown how to run Fitnesse tests through Jenkins. Fitnesse is also a framework where you can extend Java interfaces to create new testing types. The testing types are called fixtures; there are a number of fixtures available, including ones for database testing, running tools from the command line and functional testing of web applications. JMeter: It is a popular open source tool for stress testing. It can also be used to functionally test through the use of assertions. JMeter has a GUI that allows you to build test plans. The test plans are then stored in XML format. Jmeter is runnable through a Maven or Ant scripts. JMeter is very efficient and one instance is normally enough to hit your infrastructure hard. 
However, for super high load scenarios Jmeter can trigger an array of JMeter instances. Selenium: It is the de-facto industrial standard for functional testing of web applications. With Selenium IDE, you can record your actions within Firefox, saving them in HTML format to replay later. The tests can be re-run through Maven using Selenium Remote Control (RC). It is common to use Jenkins slaves with different OS's and browser types to run the tests. The alternative is to use Selenium Grid (https://code.google.com/p/selenium/wiki/Grid2). Selenium Webdriver and TestNG unit tests: A programmer-specific approach to functional testing is to write unit tests using the TestNG framework. The unit tests apply the Selenium WebDriver framework. Selenium RC is a proxy that controls the web browser. In contrast, the WebDriver framework uses native API calls to control the web browser. You can even run the HtmlUnit framework removing the dependency of a real web browser. This enables OS independent testing, but removes the ability to test for browser specific dependencies. WebDriver supports many different browser types. SoapUI: It simplifies the creation of functional tests for web services. The tool can read Web Service Definition Language (WSDL) files publicized by web services, using the information to generate the skeleton for functional tests. The GUI makes it easy to understand the process. Plugins Jenkins is not only a CI server, it is also a platform to create extra functionality. Once a few concepts are learned, a programmer can adapt available plugins to their organization's needs. If you see a feature that is missing, it is normally easier to adapt an existing one than to write from scratch. If you are thinking of adapting then the plugin tutorial (https://wiki.jenkins-ci.org/display/JENKINS/Plugin+tutorial) is a good starting point. The tutorial is relevant background information on the infrastructure you use daily. There is a large amount of information available on plugins. Here are some key points: There are many plugins and more will be developed. To keep up with these changes, you will need to regularly review the available section of the Jenkins plugin manager. Work with the community: If you centrally commit your improvements then they become visible to a wide audience. Under the careful watch if the community, the code is more likely to be reviewed and further improved. Don't reinvent the wheel: With so many plugins, in the majority of situations, it is easier to adapt an already existing plugin than write from scratch. Pinning a plugin occurs when you cannot update the plugin to a new version through the Jenkins plugin manager. Pinning helps to maintain a stable Jenkins environment. Most plugin workflows are easy to understand. However, as the number of plugins you use expands, the likelihood of an inadvertent configuration error increases. The Jenkins Maven Plugin allows you to run a test Jenkins server from within a Maven build without risk. Conventions save effort: The location of files in plugins matters. For example, you can find the description of a plugin displayed in Jenkins at the file location /src/main/resources/index.jelly. By keeping to Jenkins conventions, the amount of source code you write is minimized and the readability is improved. 
The three frameworks that are heavily used in Jenkins are as follows: Jelly for the creation of the GUI Stapler for the binding of the Java classes to the URL space XStream for persistence of configuration into XML Maintenance In the Jenkins' book, we will also provide recipes that support maintenance cycles. For large scale deployments of Jenkins within diverse enterprise infrastructure, proper maintenance of Jenkins is crucial to planning predictable software cycles. Proper maintenance lowers the risk of failures: Storage over-flowing with artifacts: If you keep a build history that includes artifacts such as WAR files, large sets of JAR files, or other types of binaries, then your storage space is consumed at a surprising rate. Storage costs have decreased tremendously, but storage usage equates to longer backup times and more communication from slave to master. To minimize the risk of disk, overflowing you will need to consider your backup and also restore policy and the associated build retention policy expressed in the advanced options of jobs. Script spaghetti: As jobs are written by various development teams, the location and style of the included scripts vary. This makes it difficult for you to keep track. Consider using well-defined locations for your scripts and a scripts repository managed through a plugin. Resource depletion: As memory is consumed or the number of intense jobs increases, then Jenkins slows down. Proper monitoring and quick reaction reduce impact. A general lack of consistency between Jobs due to organic growth: Jenkins is easy to install and use. The ability to seamlessly turn on plugins is addictive. The pace of adoption of Jenkins within an organization can be breath taking. Without a consistent policy, your teams will introduce lots of plugins and also lots of ways of performing the same work. Conventions improve consistency and readability of Jobs and thus decrease maintenance. New plugins causing exceptions: There are a lot of good plugins being written with rapid version change. In this situation, it is easy for you to accidentally add new versions of plugins with new defects. There have been a number of times during upgrading plugins that suddenly the plugin does not work. To combat the risk of plugin exceptions, consider using a sacrificial Jenkins instance before releasing to a critical system. Jack of all trades Jenkins has many plugins that allow it to integrate easily into complex and diverse environments. If there is a need that is not directly supported you can always use a scripting language of choice and wire that into your jobs. In this section, we'll explore the R plugin and see how it can help you generate great graphics. R is a popular programming language for statistics (http://en.wikipedia.org/wiki/R_programming_language). It has many hundreds of extensions and has a powerful set of graphical capabilities. In this recipe, we will show you how to use the graphical capabilities of R within your Jenkins Jobs and then point you to some excellent starter resources. For a full list of plugins that improve the UI of Jenkins including Jenkins' graphical capabilities, visit https://wiki.jenkins-ci.org/display/JENKINS/Plugins#Plugins-UIplugins. Getting ready Install the R plugin (https://wiki.jenkins-ci.org/display/JENKINS/R+Plugin). Review the R installation documentation (http://cran.r-project.org/doc/manuals/r-release/R-admin.html). How to do it... 
From the command line, install the R language: sudo apt-get install r-base Review the available R packages: apt-cache search r-cran | less Create a free-style job with the name ch4.powerfull.visualizations. In the Build section, under Add build step, select Execute R script. In the Script text area, add the following lines of code: paste('======================================='); paste('WORKSPACE: ', Sys.getenv('WORKSPACE')) paste('BUILD_URL: ', Sys.getenv('BUILD_URL')) print('ls /var/lib/jenkins/jobs/R-ME/builds/') paste('BUILD_NUMBER: ', Sys.getenv('BUILD_NUMBER')) paste('JOB_NAME: ', Sys.getenv('JOB_NAME')) paste('JENKINS_HOME: ', Sys.getenv('JENKINS_HOME')) paste( 'JOB LOCATION: ', Sys.getenv('JENKINS_HOME'),'/jobs/',Sys.getenv('JOB_NAME'),'/builds/', Sys.getenv('BUILD_NUMBER'),"/test.pdf",sep="") paste('=======================================');   filename<-paste('pie_',Sys.getenv('BUILD_NUMBER'),'.pdf',sep="") pdf(file=filename) slices<- c(1,2,3,3,6,2,2) labels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday","Saturday","Sunday") pie(slices, labels = labels, main="Number of failed jobs for each day of the week")   filename<-paste('freq_',Sys.getenv('BUILD_NUMBER'),'.pdf',sep="") pdf(file=filename) Number_OF_LINES_OF_ACTIVE_CODE=rnorm(10000, mean=200, sd=50) hist(Number_OF_LINES_OF_ACTIVE_CODE,main="Frequency plot of Class Sizes")   filename<-paste('scatter_',Sys.getenv('BUILD_NUMBER'),'.pdf',sep="") pdf(file=filename) Y <- rnorm(3000) plot(Y,main='Random Data within a normal distribution') Click on the Save button. Click on the Build Now icon Beneath Build History, click on the Workspace button. Review the generated graphics by clicking on the freq_1.pdf, pie_1.pdf, and scatter_1.pdf links, as shown in the following screenshot: The following screenshot is a histogram of the values from the random data generated by the R script during the build process. The data simulates class sizes within a large project: Another view is a pie chart. The fake data representing the number of failed jobs for each day of the week. If you make this plot against your own values, you might see particularly bad days, such as the day before or after the weekend. This might have implications about how developers work or motivation is distributed through the week. Perform the following steps: Run the job and review the Workspace. Click on Console Output. You will see output similar to: Started by user anonymousBuilding in workspace /var/lib/jenkins/workspace/ch4.Powerfull.Visualizations[ch4.Powerfull.Visualizations] $ Rscript /tmp/hudson6203634518082768146.R [1] "=======================================" [1] "WORKSPACE: /var/lib/jenkins/workspace/ch4.Powerfull.Visualizations" [1] "BUILD_URL: " [1] "ls /var/lib/jenkins/jobs/R-ME/builds/" [1] "BUILD_NUMBER: 9" [1] "JOB_NAME: ch4.Powerfull.Visualizations" [1] "JENKINS_HOME: /var/lib/jenkins" [1] "JOB LOCATION: /var/lib/jenkins/jobs/ch4.Powerfull.Visualizations/builds/9/test.pdf" [1] "=======================================" Finished: SUCCESS Click on Back to Project Click on Workspace. How it works... With a few lines of R code, you have generated three different well-presented PDF graphs. The R plugin ran a script as part of the build. The script printed out the WORKSPACE and other Jenkins environment variables to the console: paste ('WORKSPACE: ', Sys.getenv('WORKSPACE')) Next, a filename is set with the build number appended to the pie_ string. 
This allows the script to generate a different filename each time it is run, as shown: filename <-paste('pie_',Sys.getenv('BUILD_NUMBER'),'.pdf',sep="") The script now opens output to the location defined in the filename variable through the pdf(file=filename) command. By default, the output directory is the job's workspace. Next, we define fake data for the graph, representing the number of failed jobs on any given day of the week. Note that in the simulated world, Friday is a bad day: slices <- c(1,2,3,3,6,2,2)labels <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday","Saturday","Sunday") We can also plot a pie graph, as follows: pie(slices, labels = labels, main="Number of failed jobs for each day of the week") For the second graph, we generated 10,000 pieces of random data within a normal distribution.The fake data represents the number of lines of active code that ran for a give job: Number_OF_LINES_OF_ACTIVE_CODE=rnorm(10000, mean=200, sd=50) The hist command generates a frequency plot: hist(Number_OF_LINES_OF_ACTIVE_CODE,main="Frequency plot of Class Sizes") The third graph is a scatter plot with 3,000 data points generated at random within a normal distribution. This represents a typical sampling process, such as the number of potential defects found using Sonar or FindBugs: Y <- rnorm(3000)plot(Y,main='Random Data within a normal distribution') We will leave it to an exercise for the reader to link real data to the graphing capabilities of R. There's more... Here are a couple more points for you to think about. R studio or StatET A popular IDE for R is RStudio (http://www.rstudio.com/). The open source edition is free. The feature set includes a source code editor with code completion and syntax highlighting, integrated help, solid debugging features and a slew of other features. An alternative for the Eclipse environment is the StatET plugin (http://www.walware.de/goto/statet). Quickly getting help The first place to start to learn R is by typing help.start() from the R console. The command launches a browser with an overview of the main documentation If you want descriptions of R commands then typing ? before a command generates detailed help documentation. For example, in the recipe we looked at the use of the rnorm command. Typing ?rnorm produces documentation similar to: The Normal DistributionDescriptionDensity, distribution function, quantile function and random generationfor the normal distribution with mean equal to mean and standard deviation equal to sd.Usagednorm(x, mean = 0, sd = 1, log = FALSE)pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)rnorm(n, mean = 0, sd = 1) The Community Jenkins is not just a CI server, it is also a vibrant and highly active community. Enlightened self-interest dictates participation. There are a number of ways to do this: Participate on the mailing lists and Twitter (https://wiki.jenkins-ci.org/display/JENKINS/Mailing+Lists). First, read the postings and as you get to understand what is needed then participate in the discussions. Consistently reading the lists will generate many opportunities to collaborate. Improve code, write plugins (https://wiki.jenkins-ci.org/display/JENKINS/Help+Wanted). Test Jenkins and especially the plugins and write bug reports, donating your test plans. Improve documentation by writing tutorials and case studies. Sponsor and support events. 
Final Comments An efficient approach to learning how to effectively use Jenkins is to download and install the server and then trying out recipes you find in books, the Internet or developed by your fellow developers. I wish you good fortune and an interesting learning experience. Resources for Article: Further resources on this subject: Creating a basic JavaScript plugin [article] What is continuous delivery and DevOps? [article] Continuous Integration [article]

Controls and player movement

Packt
09 Feb 2015
17 min read
In this article by Miguel DeQuadros, author of the book GameSalad Essentials, you will learn how to create your playable character. We are going to cover a couple of different control schemes, depending on the platform you want to release your awesome game on. We will deal with some keyboard controls, mouse controls, and even some touch controls for mobile devices. (For more resources related to this topic, see here.) Let's go into our project, and open up level 1. First we are going to add some gravity to the scene. In the scene editor, click on the Scene button in the Inspector window. In the Attributes window, expand the Gravity drop down, and change the Y value to 900. Now, on to our actor! We have already created our player actor in the Library, so let's drag him into our level. Once he's placed exactly where you want him to be, double-click on him to open up the actor editor. Instead of clicking the lock button, click the Edit Prototype... button on the upper left-hand corner of the screen, just above the actor's image. After doing this, we will edit the main actor within the Library; so instead of having to program each actor in every scene, we just drag the main actor in the library and boom! It's programmed. For our player actor, we are going to keep things super organized, because honestly, he's going to have a lot of behaviors, and even more so if you decide to continue after we are done. Click the Create Group button, right beside the Create Rule button. This will now create a group which we will put all our keyboard control-based behaviors within, so let's name it Keyboard Controls by double-clicking the name of the group. Now let's start creating the rules for each button that will control our character. Each rule will be Actor receives event | key | (whichever key you want) | is down. To select the key, simply press the Keyboard button inside the rule, and it will open up a virtual keyboard. All you have to do is click the key you want to use. For our first rule, let's do Actor receives event | key | left | is down. Because our sprite has been created with Kevin facing the right side, we have to flip him whenever the player presses the left arrow key. So, we have to drag in a Change Attribute behavior, and change it to Change Attribute: self.Graphics.Flip Horizontally to true. Then drag in an Animate behavior for the walking animation, and drag in the corresponding walking images for our character. Now let's get Kevin moving! For each Move rule you create, drag in an Accelerate behavior, change it to the corresponding direction you want him to move in, and change the Acceleration value to 500. Again, you can play around with these moving values to whatever suits your style. When you create the Move Right rule, don't forget to change the Flip Horizontally attribute to false so that he flips back to normal when the right key is pressed. Test out the level to see if he's moving. Oops! Did he fall through the platforms? We need to fix that! We are going to do this the super easy way. Click on the home button and then to the Actors tab. On the bottom-left corner of the screen, you will find a + button, click on it. This will add a new group of actors, or tags; name it Platforms. Now all you have to do is, simply drag in each platform actor you want the character to collide against in this section. Now when we create our Collision behavior, we only have to do one for the whole set of platforms, instead of creating one per actor. Time saved, money made! 
Plus, it keeps things more organized. Let's go back to our Kevin actor, and outside of the controls group, drag in a Collide behavior, and all you have to do is change it to Bounce when colliding with | actor with tag: | Platforms. Now, whenever he collides with an actor that you dragged into the Platforms section, he will stop. Test it out to see how it works. Bounce! Bounce! Bounce! We need to adjust a few things in the Physics section of the Kevin actor, and also the Platform actors. Open up one of the Platform actors (it doesn't matter which one, because we are going to change them all), in the Attributes window, open up the Physics drop down, and change the Bounciness value to 0. Or if you want it to be a rubbery surface, you can leave it at 1, or even 100, whatever you like. If you want a slippery platform, change the Friction to a lower than one value, such as 0.1. Do the same with our Kevin actor. Change his Bounciness to 0, his Density to 100, and Friction to 50, and check off (if not already) the Fixed Rotation, and Movable options. Now test it out. Everything should work perfectly! If you change the Platform's Friction value and set it much higher, our actor won't move very well, which would be good for super sticky platforms. We are now going to work on jumping. You want him to jump only when he is colliding with the ground, otherwise the player could keep pressing the jump button and he would just continue to fly; which would be cool, but it wouldn't make the levels very challenging now, would it? Let's go back to editing our Kevin actor. We need to create a new attribute within the actor itself, so in the Attributes window, click on the + button, select Boolean, and name it OnGround, and leave it unchecked. Create a new rule, and change it to Actor receives event | overlaps of collides | with actor with tag: | Platforms. Then drag in another rule, change it to Attribute | self.Motion Linear Velocity.Y | < 1, then click on the + button within the rule to create another condition, and change it to Attribute | self.Motion Linear Velocity.Y > -1. Whoa there cowboy! What does this mean? Let's break it down. This is detecting the actor's up and down motion, so when he is going slower than 1, and faster than -1, we will change the attribute. Why do this? It's a bit of a fail safe, so when the actor jumps and the player keeps holding the jump key, he doesn't keep jumping. If we didn't add these conditions, the player could jump at times when he shouldn't. Within that rule, drag in a Change Attribute behavior, and change it to Change Attribute: | self.OnGround To 1. (1 being true, 0 being false, or you can type in true or false.) Now, inside the Keyboard Controls group, create a new rule and name it Jumping. Change the rule to Actor receives event | key | space | is down, and we need to check another condition, so click on the + button inside of the rule, and change the new condition to Attribute | self.OnGround is true. This will not only check if the space bar is down, but will also make sure that he is colliding with the ground. Drag in a Change Attribute behavior inside of this Jumping rule, and change it to self.OnGround to 0, which, you guessed it, turns off the OnGround condition so the player can't keep jumping! Now drag in a Timer behavior, change it to For | 0.1 seconds, then drag in an Accelerate behavior inside of the Timer, and change the Direction to 90, Acceleration to 5000. Again, play around with these values. 
If you want him to jump a little higher, or lower, just change the timing slower or faster. If you want to slow him down a bit, such as walking speed and falling speed (which is highly recommended), in the Attributes section in the Actor window, under Motion change the Max Speed (I ended up changing it to 500), then click on Apply Max Speed. If you put too low a value, he will pause before he jumps because the combined speed of him walking and jumping is over the max speed so GameSalad will throttle him down, and that doesn't look right. For the finishing touch, let's drag in a Control Camera behavior so the camera moves with Kevin. (Don't forget to change the camera's bounds in the scene!) Now Kevin should be shooting around the level like a maniac! If you want to use mobile controls, let's go through some details. If not, skip ahead to the next heading. Mobile controls Mobile controls in GameSalad are relatively simple to do. It involves creating a non-moveable layer within the scene, and creating actors to act as buttons on screen. Let's see what we're talking about. Go into your scene. Under the Inspector window, click on the Scene button, then the Layers tab. Click on the + button to add a new layer, rename it to Buttons, and uncheck scrollable. If you have a hard time renaming the new layer by double clicking on it, simply click on the layer and press Enter or the return key, you will now be able to rename it. Whether or not this is a bug, we will find out in further releases. In each button, you'll want to uncheck the Moveable option in the Physics section, so the button doesn't fall off the screen when you click on play. Drag them into your scene in the ways you want the buttons to be laid out. We are now going to create three new game attributes; each will correspond to the button being pressed. Go to our scene, and click on the Game button in the Inspector window, and under the Attributes tab add three new Boolean attributes, LeftButton, RightButton, and JumpButton. Now let's start editing each button in the Library, not in the scene. We'll start with the left arrow button. Create a new rule, name it Button Pressed, change it to Actor receives event | touch | is pressed. Inside that rule, drag in a change attribute behavior and change it to game.LeftButton | to true; then expand the Otherwise drop down in the rule and copy and paste the previous change attribute rule into the Otherwise section and change it from true to false. What this does is, any time the player isn't touching that button, the LeftButton attribute is false. Easy! Do the same for each of the other buttons, but change each attribute accordingly. If you want, you can even change the color or image of the actor when it's pressed so you can actually see what button you are pressing. Simply adding three Change Attribute behaviors to the rule and changing the colors to 1 when being touched, and 0 when not, will highlight the button on touch. It's not required, but hey, it looks a lot better! Let's go back into our Kevin actor, copy and paste the Keyboard Controls group, and rename it to Touch Controls. Now we have to change each rule from Actor receives event | key, to Attribute | game.LeftButton | is true (or whatever button is being pressed). Test it out to see if everything works. If it does, awesome! If not, go back and make sure everything was typed in correctly. 
Don't forget, when it comes to changing attributes, you can't simply type in game.LeftButton, you have to click on the … button beside the Attribute, and then click on Game, then click LeftButton. Now that Kevin is alive, let's make him armed and dangerous. Go back to the Scene button within the scene editor, and under the Layers tab click on the Background layer to start creating actors in that layer. If you forget to do this, GameSalad will automatically start creating actors in the last selected layer, which can be a fast way of creating objects, but annoying if you forget. Attacking Pretty much any game out there nowadays has some kind of shooting or attacking in it. Whether you're playing NHL, NFL, or Portal 2, there is some sort of attack, or shot that you can make, and Kevin will locate a special weapon in the first level. The Inter Dimensional Gravity Disruptor I created both, the weapon and the projectile that will be launched from the Inter Dimensional Gravity Disruptor. For the gun actor, don't forget to uncheck the Movable option in the Physics drop down to prevent the gun from moving around. Now let's place the gun somewhere near the end of the level. For now, let's create a new game-wide attribute, so within the scene, click on the Game button within the Inspector window, and open the Attributes tab. Now, click on the + button to add a new attribute. It's going to be a Boolean, and we will name it GunCollected. What we are going to do is, when Kevin collides with the collectable gun at the end of the level, the Boolean will switch to true to show it's been collected, then we will do some trickery that will allow us to use multiple weapons. We are going to add two more new game-wide attributes, both Integers; name one KevX, and the other KevY. I'll explain this in a minute. Open up the Kevin actor in the Inspector window, create a new Rule, and change it to: Actor receives event | overlaps or collides | with | actor of type | Inter-Dimensional-Gravity-Disruptor, or whatever you want to name it; then drag in a Change Attribute behavior into the rule, and change it to game.GunCollected | to true. Finally, outside of this rule, drag in two Constrain Attribute behaviors and change them to the following: game.KevX | to | self.position.xgame.KevY | to | self.position.y The Constrain attributes do exactly what it sounds like, it constantly changes the attribute, constraining it; whereas the Change Attribute does it once. That's all we need to do within our Kevin actor. Now let's go back to our gun actor inside the Inspector and create a new Rule, change it to Attribute | game.GunCollected | is true. Inside that rule, drag in two Constrain Attribute behaviors, and change them to the following: self.position.x | to | game.KevXself.position.y| to | game.KevY +2 Test out your game now and see if the gun follows Kevin when he collects it. If we were to put a Change Attribute instead of Constrict Attribute, the gun would quickly pop to his position, and that's it. You could fix this by putting the Change attributes inside of a Timer behavior where every 0.0001 second it changes the attributes, but when you test it, the gun will be following quite choppily and that doesn't look good. We now want the gun to flip at the same time Kevin does, so let's create two rules the same way we did with Kevin. When the Left arrow key is pressed, change the flipped attribute to true, and when the Right arrow key is pressed change the flipped attribute to false. 
This happens only when the GunCollected attribute is true, otherwise the gun will be flipping before you even collect it. So create these rules within the GunCollected is true rule. Now the gun will flip with Kevin! Keep in mind, if you haven't checked out the gun image I created (I drew the gun on top of the Kevin sprite) and saved it where it was positioned, no centering will happen. This way the gun is automatically positioned and you don't have to finagle positioning it in the behaviors. Now, for the pew pew. Open up our GravRound actor within the Inspector window, simply add in a Move behavior and make it pretty fast; I did 500. Change the Density, Friction, and Bounciness, all to zero. Open up our Gun actor. We are going to create a new Rule; it will have three conditions to the rule (for Shoot Left) and they are as follows: Attribute | game.GunCollected | is true Actor receives event | key | (whatever shoot button you want) | is down Attribute | self.graphic.Flip.Horizontally | is true Now drag in a Spawn Actor behavior into this rule, select the GravRound actor, Direction: 180, Position: -25. Copy and paste this rule and change it so flip is false, and the Direction of the Spawn Actor is 0.0, and Position is 25: Test it out to make sure he's shooting in the right directions. When developing for a while, it's easy to get tired and miss things. Don't forget to create a new button for shooting, if you are creating a mobile version of the game. Shooting with touch controls is done the exact same way as walking and jumping with touch buttons. Melee weapons If you want to create a game with melee weapons, I'll quickly take you through the process of doing it. Melee weapons can be made the same way as shooting a bullet in real. You can either have the blade or object as a separate actor positioned to the actor, and when the player presses the attack button, have the player and the melee weapon animate at the same time. This way when the melee weapon collides with the enemy, it will be more accurate. If you want to have the player draw the weapon in hand, simply animate the actor, and detect if there is a collision. This will be a little less accurate, as an enemy could run into the player when he is animating and get hurt, so what you can do is have the player spawn an invisible actor to detect the melee weapon collisions more accurately. I'll quickly go through this with you. When the player presses the attack button and the actor isn't flipped (so he'll be facing the right end of the screen), animate accordingly, and then spawn the actor in front of the player. Then, in the detector actor you spawn when attacking, have it automatically destroyed when the player has finished animating. Also, throw in a Move behavior, especially when there is a down swipe such as the attack. For this example, you'll want a detector to follow the edge of the blade. This way, all you have to do is detect collisions inside the enemies with the detector actor. It is way more accurate to do things this way; in fact you can even do collisions with walls. Let's say the actor punches a wall and this detector collides with the block, spawn some particles for debris, and change the image of the block into a cracked image. There are a lot of cool things you can do with melee weapons and with ranged weapons. You can even have the wall pieces have a sort of health bar, which is really a set amount of times a bullet or sword will have to hit it before it is destroyed. 
Kevin is now bouncing around our level, running like a silly little boy, and now he can shoot things up. Summary We had an unmovable, boring character—albeit a good looking one. We discussed how to get him moving using a keyboard, and mobile touch controls. We then discussed how to get our character to collect a weapon, have that weapon follow him around, and then start shooting it. What are we going to talk about, you ask? Well, we are going to make Kevin have health, lives, and power ups. We'll discuss an inventory system, scoring, and some more game play mechanics like pausing and such. Relax for a little bit, take a nice break. If the weather is as nice out there as it is at the time of me writing, go enjoy the lovely sun. I can't stress the importance of taking a break enough when it comes to game development. There have been many times when I've been programming for hours on end, and I end up making a mistake and because I'm so tired, I can't find the problem. As soon as I take rest, the problem is easy to fix and I'm more alert. Resources for Article: Further resources on this subject: Django 1.2 E-commerce: Generating PDF ArcGIS Spatial Analyst [Article] Sparrow iOS Game Framework - The Basics of Our Game [Article] Three.js - Materials and Texture [Article]

Part 3: Migrating a WordPress Blog to Middleman and Deploying to Amazon S3

Mike Ball
09 Feb 2015
9 min read
Part 3: Migrating WordPress blog content and deploying to production In parts 1 and 2 of this series, we created middleman-demo, a basic Middleman-based blog, imported content from WordPress, and deployed middleman-demo to Amazon S3. Now that middleman-demo has been deployed to production, let’s design a continuous integration workflow that automates builds and deployments. In part 3, we’ll cover the following: Testing middleman-demo with RSpec and Capybara Integrating with GitHub and Travis CI Configuring automated builds and deployments from Travis CI If you didn’t follow parts 1 and 2, or you no longer have your original middleman-demo code, you can clone mine and check out the part3 branch:  $ git clone http://github.com/mdb/middleman-demo && cd middleman-demo && git checkout part3 Create some automated tests In software development, the practice of continuous delivery serves to frequently deploy iterative software bug fixes and enhancements, such that users enjoy an ever-improving product. Automated processes, such as tests, assist in rapidly validating quality with each change. middleman-demo is a relatively simple codebase, though much of its build and release workflow can still be automated via continuous delivery. Let’s write some automated tests for middleman-demo using RSpec and Capybara. These tests can assert that the site continues to work as expected with each change. Add the gems to the middleman-demo Gemfile:  gem 'rspec'gem 'capybara' Install the gems: $ bundle install Create a spec directory to house tests: $ mkdir spec As is the convention in RSpec, create a spec/spec_helper.rb file to house the RSpec configuration: $ touch spec/spec_helper.rb Add the following configuration to spec/spec_helper.rb to run middleman-demo during test execution: require "middleman" require "middleman-blog" require 'rspec' require 'capybara/rspec' Capybara.app = Middleman::Application.server.inst do set :root, File.expand_path(File.join(File.dirname(__FILE__), '..')) set :environment, :development end Create a spec/features directory to house the middleman-demo RSpec test files: $ mkdir spec/features Create an RSpec spec file for the homepage: $ touch spec/features/index_spec.rb Let’s create a basic test confirming that the Middleman Demo heading is present on the homepage. Add the following to spec/features/index_spec.rb: require "spec_helper" describe 'index', type: :feature do before do visit '/' end it 'displays the correct heading' do expect(page).to have_selector('h1', text: 'Middleman Demo') end end Run the test and confirm that it passes: $ rspec You should see output like the following: Finished in 0.03857 seconds (files took 6 seconds to load) 1 example, 0 failures Next, add a test asserting that the first blog post is listed on the homepage; confirm it passes by running the rspec command: it 'displays the "New Blog" blog post' do expect(page).to have_selector('ul li a[href="/blog/2014/08/20/new-blog/"]', text: 'New Blog') end As an example, let’s add one more basic test, this time asserting that the New Blog text properly links to the corresponding blog post. Add the following to spec/features/index_spec.rb and confirm that the test passes: it 'properly links to the "New Blog" blog post' do click_link 'New Blog' expect(page).to have_selector('h2', text: 'New Blog') end middleman-demo can be further tested in this fashion. The extent to which the specs test every element of the site’s functionality is up to you. 
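If you want to go a step further, a similar spec can exercise an individual post page directly rather than navigating to it from the homepage. The following is only a sketch: the file name is arbitrary, it assumes the "New Blog" post imported earlier still lives at the same URL, and the second expectation assumes your layout links back to the root path, so adjust the selector to match your template.

# spec/features/blog_post_spec.rb
require "spec_helper"

describe 'a blog post page', type: :feature do
  before do
    # Visit the post created earlier in this series
    visit '/blog/2014/08/20/new-blog/'
  end

  it 'displays the post heading' do
    expect(page).to have_selector('h2', text: 'New Blog')
  end

  it 'links back to the homepage' do
    # Assumes the layout contains a link to "/"
    expect(page).to have_selector('a[href="/"]')
  end
end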
At what point can it be confidently asserted that the site looks good, works as expected, and can be publicly deployed to users? Push to GitHub Next, push your middleman-demo code to GitHub. If you forked my original github.com/mdb/middleman-demo repository, skip this section. 1. Create a GitHub repositoryIf you don’t already have a GitHub account, create one. Create a repository through GitHub’s web UI called middleman-demo. 2. What should you do if your version of middleman-demo is not a git repository?If your middleman-demo is already a git repository, skip to step 3. If you started from scratch and your code isn’t already in a git repository, let’s initialize one now. I’m assuming you have git installed and have some basic familiarity with it. Make a middleman-demo git repository: $ git init && git add . && git commit -m 'initial commit' Declare your git origin, where <your_git_url_from_step_1> is your GitHub middleman-demo repository URL: $ git remote add origin <your_git_url_from_step_1> Push to your GitHub repository: $ git push origin master You’re done; skip step 3 and move on to Integrate with Travis CI. 3. If you cloned my mdb/middleman-demo repository…If you cloned my middleman-demo git repository, you’ll need to add your newly created middleman-demo GitHub repository as an additional remote: $ git remote add my_origin <your_git_url_from_step_1> If you are working in a branch, merge all your changes to master. Then push to your GitHub repository: $ git push -u my_origin master Integrate with Travis CI Travis CI is a distributed continuous integration service that integrates with GitHub. It’s free for open source projects. Let’s configure Travis CI to run the middleman-demo tests when we push to the GitHub repository. Log in to Travis CI First, sign in to Travis CI using your GitHub credentials. Visit your profile. Find your middleman-demo repository in the “Repositories” list. Activate Travis CI for middleman-demo; click the toggle button “ON.” Create a .travis.ymlfile Travis CI looks for a .travis.yml YAML file in the root of a repository. YAML is a simple, human-readable markup language; it’s a popular option in authoring configuration files. The .travis.yml file informs Travis how to execute the project’s build. Create a .travis.yml file in the root of middleman-demo: $ touch .travis.yml Configure Travis CI to use Ruby 2.1 when building middleman-demo. Add the following YAML to the .travis.yml file: language: rubyrvm: 2.1 Next, declare how Travis CI can install the necessary gem dependencies to build middleman-demo; add the following: install: bundle install Let’s also add before_script, which runs the middleman-demo tests to ensure all tests pass in advance of a build: before_script: bundle exec rspec Finally, add a script that instructs Travis CI how to build middleman-demo: script: bundle exec middleman build At this point, the .travis.yml file should look like the following: language: ruby rvm: 2.1 install: bundle install before_script: bundle exec rspec script: bundle exec middleman build Commit the .travis.yml file: $ git add .travis.yml && git commit -m "added basic .travis.yml file" Now, after pushing to GitHub, Travis CI will attempt to install middleman-demo dependencies using Ruby 2.1, run its tests, and build the site. 
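Before pushing, it can be worth mirroring the same steps locally so that obvious failures never reach Travis CI; assuming you are on a comparable Ruby version, the commands below are exactly what the .travis.yml asks Travis CI to run:

$ bundle install
$ bundle exec rspec
$ bundle exec middleman build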
Travis CI’s command build output can be seen here: https://travis-ci.org/<your_github_username>/middleman-demo Add a build status badge Assuming the build passes, you should see a green build passing badge near the top right corner of the Travis CI UI on your Travis CI middleman-demo page. Let’s add this badge to the README.md file in middleman-demo, such that a build status badge reflecting the status of the most recent Travis CI build displays on the GitHub repository’s README. If one does not already exist, create a README.md file: $ touch README.md Add the following markdown, which renders the Travis CI build status badge: [![Build Status](https://travis-ci.org/<your_github_username>/middleman-demo.svg?branch=master)](https://travis-ci.org/<your_github_username>/middleman-demo) Configure continuous deployments Through continuous deployments, code is shipped to users as soon as a quality-validated change is committed. Travis CI can be configured to deploy a middleman-demo build with each successful build. Let’s configure Travis CI to continuously deploy middleman-demo to the S3 bucket created in part 2 of this tutorial series. First, install the travis command-line tools: $ gem install travis Use the travis command-line tools to set S3 deployments. Enter the following; you’ll be prompted for your S3 details (see the example below if you’re unsure how to answer): $ travis setup s3 An example response is: Access key ID: <your_aws_access_key_id> Secret access key: <your_aws_secret_access_key_id> Bucket: <your_aws_bucket> Local project directory to upload (Optional): build S3 upload directory (Optional): S3 ACL Settings (private, public_read, public_read_write, authenticated_read, bucket_owner_read, bucket_owner_full_control): |private| public_read Encrypt secret access key? |yes| yes Push only from <your_github_username>/middleman-demo? |yes| yes This automatically edits the .travis.yml file to include the following deploy information: deploy: provider: s3 access_key_id: <your_aws_access_key_id> secret_access_key: secure: <your_encrypted_aws_secret_access_key_id> bucket: <your_s3_bucket> local-dir: build acl: !ruby/string:HighLine::String public_read on: repo: <your_github_username>/middleman-demo Add one additional option, informing Travis to preserve the build directory for use during the deploy process: skip_cleanup: true The final .travis.yml file should look like the following: language: ruby rvm: 2.1 install: bundle install before_script: bundle exec rspec script: bundle exec middleman build deploy: provider: s3 access_key_id: <your_aws_access_key> secret_access_key: secure: <your_encrypted_aws_secret_access_key> bucket: <your_aws_bucket> local-dir: build skip_cleanup: true acl: !ruby/string:HighLine::String public_read on: repo: <your_github_username>/middleman-demo Confirm that your continuous integration works Commit your changes: $ git add .travis.yml && git commit -m "added travis deploy configuration" Push to GitHub and watch the build output on Travis CI: https://travis-ci.org/<your_github_username>/middleman-demo If all works as expected, Travis CI will run the middleman-demo tests, build the site, and deploy to the proper S3 bucket. Recap Throughout this series, we’ve examined the benefits of static site generators and covered some basics regarding Middleman blogging. We’ve learned how to use the wp2middleman gem to migrate content from a WordPress blog, and we’ve learned how to deploy Middleman to Amazon’s cloud-based Simple Storage Service (S3). 
Recap

Throughout this series, we’ve examined the benefits of static site generators and covered some basics regarding Middleman blogging. We’ve learned how to use the wp2middleman gem to migrate content from a WordPress blog, and we’ve learned how to deploy Middleman to Amazon’s cloud-based Simple Storage Service (S3). We’ve configured Travis CI to run automated tests, produce a build, and automate deployments.

Beyond what’s been covered within this series, there’s an extensive Middleman ecosystem worth exploring, as well as numerous additional features. Middleman’s custom extensions seek to extend basic Middleman functionality through third-party gems. Read more about Middleman at Middlemanapp.com.

About this author

Mike Ball is a Philadelphia-based software developer specializing in Ruby on Rails and JavaScript. He works for Comcast Interactive Media, where he helps build web-based TV and video consumption applications.

Home Page Structure

Packt
09 Feb 2015
15 min read
In this article by John Henry Krahenbuhl, author of the book, Learning Axure RP Interactive Prototypes, we will cover the following topics:

Logo and links
Global navigation
Shopping cart
Search

(For more resources related to this topic, see here.)

Logo and links

To create our logo element, we will drag the Placeholder widget onto the Home page in the design area. We will then enable an OnClick interaction that will cause the Home page to open in the current window when the Placeholder widget is clicked. To create the logo element, perform the following steps:

With the Home page opened in the design area, in the Widgets pane, click on the Placeholder widget. While holding down the mouse button, drag the Placeholder widget and place it at coordinates (10,20).

With the Placeholder widget selected, type Logo. We will see Logo in the center of the Placeholder widget, like so:

Next, we will name the Placeholder widget and add the OnClick interaction. With the Placeholder widget selected, perform the following steps: In the Widget Interactions and Notes pane, click in the Shape Name field and type CompanyLogo. In the Widget Interactions and Notes pane, click on the Interactions tab and then on Create Link…. In the Sitemap modal window, click on the Home page. You will see Case 1 added to the OnClick interaction, as follows:

Axure has numerous point updates, and as a result, in the Widget Interactions and Notes pane, your version may show Shape Name (or a similar label for the name field) instead of Shape Footnote and Name.

We will now create three new links in our header using a Dynamic Panel and the Label widget.

In the Widgets pane, click on the Dynamic Panel widget. While holding down the mouse button, drag the Dynamic Panel widget and place it at coordinates (570,10).

With the Dynamic Panel widget selected, perform the following steps: In the Widget Interactions and Notes pane, click in the Dynamic Panel Name field and type HeaderLinksDP. In the toolbar, change the width w: to 300 and the height h: to 25. In the Widget Manager pane, we will see the following:

In the Widget Manager pane, double-click on State1 to open it in the design area.

With State1 selected, in the Widgets pane, click on the Label widget. While holding down the mouse button, drag the Label widget and place it at coordinates (80,4).

With the Label widget selected, perform the following steps: Type Help. We will see Help displayed as text on the Label widget. In the Widget Interactions and Notes pane, click in the Shape Name field and type HelpLink. In the Widget Interactions and Notes pane, click on the Interactions tab and then click on Create Link…. In the Sitemap modal window, click on the Help page.

Repeat the previous two steps twice (dragging a new Label widget and then configuring it) to create two additional links, using the following table for the coordinates, text displayed, shape name, and link of each Label widget:

Coordinates | Text displayed | Shape name  | Create link...
(140,4)     | Support        | SupportLink | Support
(220,4)     | Sign In        | SignInLink  | Sign In

Slow double-click on State1 and rename it to Links.

When renaming a dynamic panel state, if the state is currently selected (that is, highlighted in blue), you only need to slow click on the state name to rename the state. If the state is not currently selected, you will need to slow double-click on the state name to rename the state.

We have now created the logo with three additional links. Our header should look like this:

Next, we will add global navigation using the Classic Menu - Horizontal widget.
Global navigation

We will now add global navigation using the Classic Menu - Horizontal widget. Once we have added the Classic Menu - Horizontal widget, our header should look like this:

Open the Home page in the design area. To create the global navigation element, perform the following steps:

In the Widgets pane, click on the Classic Menu - Horizontal widget. While holding down the left mouse button, drag the Classic Menu - Horizontal widget and place it at coordinates (240,80).

Right-click the first menu item labeled File, and in the flyout menu, click on Add Menu Item After. Your menu should look like this:

Repeat the previous step, adding one more menu item. You should now have a total of five menu items.

Click on the first menu item to select it and type Women. With the menu item selected, perform the following steps: In the Widget Interactions and Notes pane, click in the Menu Item Name field and type HzMenuWomen. In the Widget Interactions and Notes pane, click on the Interactions tab and then click on Create Link…. In the Sitemap modal window, click on the Women page.

Repeat this for menu items 2–5, changing the menu item displayed, the menu item name, and the link using the following table:

Menu item displayed | Menu item name    | Create link...
Men                 | HzMenuMen         | Men
Kids                | HzMenuKids        | Kids
Shoes               | HzMenuShoes       | Shoes
Accessories         | HzMenuAccessories | Accessories

We have now created the global navigation with five menu items. Our header should now look like this:

Next, we will add a shopping cart element using a Rectangle widget and an Image widget.

Shopping cart

We will now add a shopping cart element using a Rectangle widget and an image for a shopping bag icon. Our shopping cart element will look like this:

To create the Shopping Cart element, perform the following steps:

From the Widgets pane, drag the Rectangle widget and place it at coordinates (870,80). With the Rectangle widget selected, perform the following steps: Right-click on the Rectangle widget and click Edit Text. Type Shopping. In the toolbar, change the width w to 90 and the height h to 30. In the Widget Interactions and Notes pane, click in the Shape Name field and type ShoppingButton. In the Widget Properties and Style pane, with the Style tab selected, scroll to Alignment + Padding and change the padding by changing the value of R to 15.

From the Widgets pane, drag the Image widget and place it at coordinates (937,85). With the Image widget selected, perform the following steps: In the toolbar, change the width w to 20 and the height h to 20. In the Widget Interactions and Notes pane, click in the Image Name field and type ShoppingBagIcon. Double-click the image and select the image you would like to use (that is, a shopping bag or shopping cart image).

For our shopping bag icon, an image of a handbag emoji sized to 20 x 20 pixels was used. The handbag emoji as well as other useful emojis can be found at http://emojipedia.org.

Next, we will add an expandable search text field element using a dynamic panel widget with two states.

Search

One popular design pattern is to use an expandable search text field. To accomplish this, we will use a Dynamic Panel widget labeled ExpandingSearchDP with two states: Collapsed and Expanded. The Collapsed state is the default state and will contain a Text Field widget. The Text Field widget will respond to the OnMouseEnter interaction and will perform the following actions:

Move the HeaderLinksDP (Dynamic Panel) by x: -80 pixels.

Transition the Dynamic Panel to the Expanded state, using the slide left animation.
Set focus on the Text Field widget labeled SearchTextFieldExpanded. To create the Search text field, Dynamic Panel, and States, perform the following steps: In the Widgets pane, click on the Dynamic Panel widget. While holding down the mouse button, drag the Dynamic Panel widget and place it at coordinates (790,10). With the Dynamic Panel widget selected, perform the following steps: Right-click on the Dynamic Panel widget and click on Order, then click on Send to Back. In the Widget Interactions and Notes pane, click in the Dynamic Panel Name field and type ExpandingSearchDP. In the toolbar, change the width w: to 170 and the height h: to 25. In the Widget Manager pane, double-click on State1 to open it in the design area. With State1 selected, perform the following steps: In the Widgets pane, click on the Rectangle widget. While holding down the mouse button, drag the Rectangle widget and place at coordinates (80,0). With the Rectangle widget selected, In the toolbar change the values of w to 90 and h to 24. In the Widget Interactions and Notes pane, click in the Text Field Name field and type SearchRectangleCollapsed. From the Widgets pane, drag the Image widget and place at coordinates (149,2). In the toolbar, change the width w to 20 and the height h to 20. In the Widget Interactions and Notes pane, click in the Image Name field and type SearchIcon. Double-click the image and select the image you would like to use (that is, a left-pointing, magnifying glass image). For our search icon, an image of a left-pointing, magnifying glass emoji sized to 20 x 20 pixels was used. This emoji as well as other useful emojis can be found at http://emojipedia.org. In the Widgets pane, click on the Text Field widget. While holding down the left mouse button, drag the Text Field widget and place at coordinates (80,0). With the text field widget selected, perform the following steps: In the Widget Interactions and Notes pane, click in the Text Field Name field and type SearchTextFieldCollapsed. In the toolbar, change the value of w to 65 and h to 24. Right-click on the Text Field widget and click on Hide Border. In the Widget Properties and Style pane, with the Style tab selected, scroll to Borders, Lines, + Fills. Click on the down arrow next to the paint bucket icon. In the drop-down menu, click on the box with the red diagonal line to indicate no fill. The fill drop-down menu with no fill selected looks like this: Right-click on State1 and click Duplicate State. Slow click on State1 and rename it to Collapsed. Slow double-click on State2 and rename it to Expanded. In the Widget Manager pane, double-click on Expanded to open it in the design area. With Expanded selected, perform the following steps: Click on the rectangle widget labeled SearchRectangleCollapsed to select it and perform the following steps: The SearchRectangleCollapsed widget is at coordinates (80,0) and is directly beneath the SearchTextFieldCollapsed widget at coordinates (80,0). Slow-double-click on the design area near coordinates (90,10) to select the SearchRectangleCollapsed widget. Once selected in the Widget Interactions and Notes pane, in the Shape Name field, you will see the name SearchRectangleCollapsed. In the Widget Interactions and Notes pane, click in the Shape Name field and rename the widget SearchRectangleExpanded. In the toolbar, change x to 0 and w to 170. 
Click on the text field widget labeled SearchTextFieldCollapsed at coordinates (80,0) to select it and perform the following steps: In the Widget Interactions and Notes pane, click in the Text Field Name field and rename the widget SearchTextFieldExpanded. In the toolbar, change x to 0 and w to 145. With the search text field dynamic panel created, we are now ready to define the interactions that will cause the search text field element to expand and collapse. To create this effect, perform the steps given in the following sections: In the Widget Manager pane, double-click on the Collapsed state to open it in the design area. In the design area, click on the text field widget named SearchTextFieldCollapsed at coordinates (80,0). With the text field widget selected in the Widget Interactions and Notes pane, click on the Interactions tab, then on More Events, and, finally, click on OnMouseEnter. A Case Editor dialog box will open. In the Case Editor dialog box, perform the steps given in the following section. Create the first action: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to Set ExpandingSearchDP state. Change Select the State to Expanded. Change Animate In to slide left t: 250 ms. Create the second action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Move. Under Configure actions, click on the checkbox next to HeaderLinksDP. Change Move by x to -80. Create the third action: Under Click to add actions, scroll to the Miscellaneous drop-down menu and click on Wait. Under Configure actions, change Wait time: to 350 ms. Create the fourth action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Bring to Front/Back. Under Configure actions, click on the checkbox next to SearchTextFieldExpanded. Next to Order, click on the radio button next to Bring to Front. Create the fifth action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Focus. Under Configure actions, click on the checkbox next to SearchTextFieldExpanded. Click on OK. In the Widget Interactions and Notes pane, click on the Interactions tab and then click on Case 1. In the main menu, click on Edit and then click on Copy. In the design area, click on the rectangle widget named SearchRectangleCollapsed at coordinates (80,0) to select it. Recall that we must slow-double-click near coordinates (90,10) to select the SearchRectangleCollapsed since it is beneath the SearchTextFieldCollapsed widget. With the rectangle widget selected in the Widget Interactions and Notes pane, click on the Interactions tab, then click on More Events, and next to OnMouseEnter, click on the Paste button. The OnMouseEnter event with Case 1 will be shown as follows: In the Widget Manager pane, double-click on the Expanded state to open it in the design area. Click on the text field widget named SearchTextFieldExpanded near coordinates (0,0) to select it. With the text field widget selected in the Widget Interactions and Notes pane, click on the Interactions tab, then on More Events, and, finally, click on OnLostFocus. A Case Editor dialog box will open. In the Case Editor dialog box, perform the following steps: Create the condition. Click the Add Condition button. In the Condition Builder dialog box, in the outlined condition box perform the following steps: In the first dropdown, select cursor. In the second dropdown, select is not over. 
In the third dropdown, select area of widget. In the fourth text box dropdown, select SearchRectangle. Click OK. Create the first action: Under Click to add actions, scroll to the Dynamic Panels drop-down menu and click on Set Panel State. Under Configure actions, click on the checkbox next to Set ExpandingSearchDP state. Change Select the State to Collapsed. Change Animate In to slide right t: 200 ms. Create the second action: Under Click to add actions, scroll to the Miscellaneous drop-down menu and click on Wait. Under Configure actions, change Wait time: to 150 ms. Create the third action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Move. Under Configure actions, click on the checkbox next to HeaderLinksDP. Change Move by x: to 80. Create the fourth action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Set Text. Under Configure actions, click on the checkbox next to SearchTextFieldExpanded. Under Set text to, click on the first dropdown and select text on widget. Click on the second dropdown and select SearchTextFieldExpanded. Your case editor will look like this: Create the fifth action: Under Click to add actions, scroll to the Widgets drop-down menu and click on Bring to Front/Back. Under Configure actions, click on the checkbox next to HeaderLinksDP. Next to Order, click on the radio button next to Bring to Front. Click on OK. In the design area, click on the text field widget named SearchTextFieldExpanded to select it. Perform the following steps: Right-click on the SearchTextFieldExpanded widget and click on Assign Submit Button. In the Assign Submit Button dialog box, click on the checkbox next to SearchRectangleExpanded. Click on OK. In the design area, select the rectangle widget named SearchRectangleExpanded by slow-double-clicking near coordinates (10,10). With the Rectangle widget selected, go to the Widget Interactions and Notes pane, click on the Interactions tab, and click on Create Link…. In the Sitemap modal window, click on the Search page. We have now created an expandable search text field widget that retains the text typed into the widget when the dynamic panel changes states. With the design completed for our header, we need to convert these widgets into a header master that can be leveraged on each page of our design. To create a header master, open the Home page in the design area then navigate to Edit | Select All in the main menu. Right-click on any widget in the design area and click on Convert to Master. In the Convert to Master dialog box, type Header. For Drop Behavior, click on the radio button next to Lock to Master Location. Click on the Continue button. You will now see the header master appear in the Masters pane. With our header Master completed, next we will design an interactive carousel. Summary In this article, we focused on creating the home page. A home page should be intuitive; it should capture one's attention and encourage further engagement with the site. For the home page, we used the easily recognizable elements found on popular e-commerce sites. We created logo, links to navigate to different pages, and shopping cart. we also learned how to create an expanding search bar. Resources for Article: Further resources on this subject: Common design patterns and how to prototype them [Article] Axure RP 6 Prototyping Essentials: Advanced Interactions [Article] Viewing on Mobile Devices [Article]

Advanced Less Coding

Packt
09 Feb 2015
40 min read
In this article by Bass Jobsen, author of the book Less Web Development Cookbook, you will learn: Giving your rules importance with the !important statement Using mixins with multiple parameters Using duplicate mixin names Building a switch leveraging argument matching Avoiding individual parameters to leverage the @arguments variable Using the @rest... variable to use mixins with a variable number of arguments Using mixins as functions Passing rulesets to mixins Using mixin guards (as an alternative for the if…else statements) Building loops leveraging mixin guards Applying guards to the CSS selectors Creating color contrasts with Less Changing the background color dynamically Aggregating values under a single property (For more resources related to this topic, see here.) Giving your rules importance with the !important statement The !important statement in CSS can be used to get some style rules always applied no matter where that rules appears in the CSS code. In Less, the !important statement can be applied with mixins and variable declarations too. Getting ready You can write the Less code for this recipe with your favorite editor. After that, you can use the command-line lessc compiler to compile the Less code. Finally, you can inspect the compiled CSS code to see where the !important statements appear. To see the real effect of the !important statements, you should compile the Less code client side, with the client-side compiler less.js and watch the effect in your web browser. How to do it… Create an important.less file that contains the code like the following snippet: .mixin() { color: red; font-size: 2em; } p { &.important {    .mixin() !important; } &.unimportant {    .mixin(); } } After compiling the preceding Less code with the command-line lessc compiler, you will find the following code output produced in the console: p.important { color: red !important; font-size: 2em !important; } p.unimportant { color: red; font-size: 2em; } You can, for instance, use the following snippet of the HTML code to see the effect of the !important statements in your browser: <p class="important"   style="color:green;font-size:4em;">important</p> <p class="unimportant"   style="color:green;font-size:4em;">unimportant</p> Your HTML document should also include the important.less and less.js files, as follows: <link rel="stylesheet/less" type="text/css"   href="important.less"> <script src="less.js" type="text/javascript"></script> Finally, the result will look like that shown in the following screenshot:  How it works… In Less, you can use the !important statement not only for properties, but also with mixins. When !important is set for a certain mixin, all properties of this mixin will be declared with the !important statement. You can easily see this effect when inspecting the properties of the p.important selector, both the color and size property got the !important statement after compiling the code. There's more… You should use the !important statements with care as the only way to overrule an !important statement is to use another !important statement. The !important statement overrules the normal CSS cascading, specificity rules, and even the inline styles. Any incorrect or unnecessary use of the !important statements in your Less (or CCS) code will make your code messy and difficult to maintain. In most cases where you try to overrule a style rule, you should give preference to selectors with a higher specificity and not use the !important statements at all. 
With Less V2, you can also use the !important statement when declaring your variables. A declaration with the !important statement can look like the following code: @main-color: darkblue !important; Using mixins with multiple parameters In this section, you will learn how to use mixins with more than one parameter. Getting ready For this recipe, you will have to create a Less file, for instance, mixins.less. You can compile this mixins.less file with the command-line lessc compiler. How to do it… Create the mixins.less file and write down the following Less code into it: .mixin(@color; @background: black;) { background-color: @background; color: @color; } div { .mixin(red; white;); } Compile the mixins.less file by running the command shown in the console, as follows: lessc mixins.less Inspect the CSS code output on the console, and you will find that it looks like that shown, as follows: div { background-color: #ffffff; color: #ff0000; } How it works… In Less, parameters are either semicolon-separated or comma-separated. Using a semicolon as a separator will be preferred because the usage of the comma will be ambiguous. The comma separator is not used only to separate parameters, but is also used to define a csv list, which can be an argument itself. The mixin in this recipe accepts two arguments. The first parameter sets the @color variable, while the second parameter sets the @background variable and has a default value that has been set to black. In the argument list, the default values are defined by writing a colon behind the variable's name, followed by the value. Parameters with a default value are optional when calling the mixins. So the .color mixin in this recipe can also be called with the following line of code: .mixin(red); Because the second argument has a default value set to black, the .mixin(red); call also matches the .mixin(@color; @background:black){} mixin, as described in the Building a switch leveraging argument matching recipe. Only variables set as parameter of a mixin are set inside the scope of the mixin. You can see this when compiling the following Less code: .mixin(@color:blue){ color2: @color; } @color: red; div { color1: @color; .mixin; } The preceding Less code compiles into the following CSS code: div { color1: #ff0000; color2: #0000ff; } As you can see in the preceding example, setting @color inside the mixin to its default value does not influence the value of @color assigned in the main scope. So lazy loading is applied on only variables inside the same scope; nevertheless, you will have to note that variables assigned in a mixin will leak into the caller. The leaking of variables can be used to use mixins as functions, as described in the Using mixins as functions recipe. There's more… Consider the mixin definition in the following Less code: .mixin(@font-family: "Helvetica Neue", Helvetica, Arial,   sans-serif;) { font-family: @font-family; } The semicolon added at the end of the list prevents the fonts after the "Helvetica Neue" font name in the csv list from being read as arguments for this mixin. If the argument list contains any semicolon, the Less compiler will use semicolons as a separator. In the CSS3 specification, among others, the border and background shorthand properties accepts csv. Also, note that the Less compiler allows you to use the named parameters when calling mixins. 
This can be seen in the following Less code that uses the @color variable as a named parameter: .mixin(@width:50px; @color: yellow) { width: @width; color: @color; } span { .mixin(@color: green); } The preceding Less code will compile into the following CSS code: span { width: 50px; color: #008000; } Note that in the preceding code, #008000 is the hexadecimal representation for the green color. When using the named parameters, their order does not matter. Using duplicate mixin names When your Less code contains one or more mixins with the same name, the Less compiler compiles them all into the CSS code. If the mixin has parameters (see the Building a switch leveraging argument matching recipe) the number of parameters will also match. Getting ready Use your favorite text editor to create and edit the Less files used in this recipe. How to do it… Create a file called mixins.less that contains the following Less code: .mixin(){ height:50px; } .mixin(@color) { color: @color; }   .mixin(@width) { color: green; width: @width; }   .mixin(@color; @width) { color: @color; width: @width; }   .selector-1 { .mixin(red); } .selector-2 { .mixin(red; 500px); } Compile the Less code from step 1 by running the following command in the console: lessc mixins.less After running the command from the previous step, you will find the following Less code output on the console: .selector-1 { color: #ff0000; color: green; width: #ff0000; } .selector-2 { color: #ff0000; width: 500px; } How it works… The .selector-1 selector contains the .mixin(red); call. The .mixin(red); call does not match the .mixin(){}; mixin as the number of arguments does not match. On the other hand, both .mixin(@color){}; and .mixin(@width){}; match the color. For this reason, these mixins will compile into the CSS code. The .mixin(red; 500px); call inside the .selector-2 selector will match only the .mixin(@color; @width){}; mixin, so all other mixins with the same .mixin name will be ignored by the compiler when building the .selector-2 selector. The compiled CSS code for the .selector-1 selector also contains the width: #ff0000; property value as the .mixin(@width){}; mixin matches the call too. Setting the width property to a color value makes no sense in CSS as the Less compiler does not check for this kind of errors. In this recipe, you can also rewrite the .mixin(@width){}; mixin, as follows: .mixin(@width) when (ispixel(@width)){};. There's more… Maybe you have noted that the .selector-1 selector contains two color properties. The Less compiler does not remove duplicate properties unless the value also is the same. The CSS code sometimes should contain duplicate properties in order to provide a fallback for older browsers. Building a switch leveraging argument matching The Less mixin will compile into the final CSS code only when the number of arguments of the caller and the mixins match. This feature of Less can be used to build switches. Switches enable you to change the behavior of a mixin conditionally. In this recipe, you will create a mixin, or better yet, three mixins with the same name. Getting ready Use the command-line lessc compiler to evaluate the effect of this mixin. The compiler will output the final CSS to the console. You can use your favorite text editor to edit the Less code. This recipe makes use of browser-vendor prefixes, such as the -ms-transform prefix. CSS3 introduced vendor-specific rules, which offer you the possibility to write some additional CSS, applicable for only one browser. 
These rules allow browsers to implement proprietary CSS properties that would otherwise have no working standard (and might never actually become the standard). To find out which prefixes should be used for a certain property, you can consult the Can I use database (available at http://caniuse.com/). How to do it… Create a switch.less Less file, and write down the following Less code into it: @browserversion: ie9; .mixin(ie9; @degrees){ transform:rotate(@degrees); -ms-transform:rotate(@degrees); -webkit-transform:rotate(@degrees); } .mixin(ie10; @degrees){ transform:rotate(@degrees); -webkit-transform:rotate(@degrees); } .mixin(@_; @degrees){ transform:rotate(@degrees); } div { .mixin(@browserversion; 70deg); } Compile the Less code from step 1 by running the following command in the console: lessc switch.less Inspect the compiled CSS code that has been output to the console, and you will find that it looks like the following code: div { -ms-transform: rotate(70deg); -webkit-transform: rotate(70deg); transform: rotate(70deg); } Finally, run the following command and you will find that the compiled CSS wll indeed differ from that of step 2: lessc --modify-var="browserversion=ie10" switch.less Now the compiled CSS code will look like the following code snippet: div { -webkit-transform: rotate(70deg); transform: rotate(70deg); } How it works… The switch in this recipe is the @browserversion variable that can easily be changed just before compiling your code. Instead of changing your code, you can also set the --modify-var option of the compiler. Depending on the value of the @browserversion variable, the mixins that match will be compiled, and the other mixins will be ignored by the compiler. The .mixin(ie10; @degrees){} mixin matches the .mixin(@browserversion; 70deg); call only when the value of the @browserversion variable is equal to ie10. Note that the first ie10 argument of the mixin will be used only for matching (argument = ie10) and does not assign any value. You will note that the .mixin(@_; @degrees){} mixin will match each call no matter what the value of the @browserversion variable is. The .mixin(ie9,70deg); call also compiles the .mixin(@_; @degrees){} mixin. Although this should result in the transform: rotate(70deg); property output twice, you will find only one. Since the property got exactly the same value twice, the compiler outputs the property only once. There's more… Not only switches, but also mixin guards, as described in the Using mixin guards (as an alternative for the if…else statements) recipe, can be used to set some properties conditionally. Current versions of Less also support JavaScript evaluating; JavaScript code put between back quotes will be evaluated by the compiler, as can be seen in the following Less code example: @string: "example in lower case"; p { &:after { content: "`@{string}.toUpperCase()`"; } } The preceding code will be compiled into CSS, as follows: p:after { content: "EXAMPLE IN LOWER CASE"; } When using client-side compiling, JavaScript evaluating can also be used to get some information from the browser environment, such as the screen width (screen.width), but as mentioned already, you should not use client-side compiling for production environments. Because you can't be sure that future versions of Less still support JavaScript evaluating, and alternative compilers not written in JavaScript cannot evaluate the JavaScript code, you should always try to write your Less code without JavaScript. 
Avoiding individual parameters to leverage the @arguments variable In the Less code, the @arguments variable has a special meaning inside mixins. The @arguments variable contains all arguments passed to the mixin. In this recipe, you will use the @arguments variable together with the the CSS url() function to set a background image for a selector. Getting ready You can inspect the compiled CSS code in this recipe after compiling the Less code with the command-line lessc compiler. Alternatively, you can inspect the results in your browser using the client-side less.js compiler. When inspecting the result in your browser, you will also need an example image that can be used as a background image. Use your favorite text editor to create and edit the Less files used in this recipe. How to do it… Create a background.less file that contains the following Less code: .background(@color; @image; @repeat: no-repeat; @position:   top right;) { background: @arguments; }   div { .background(#000; url("./images/bg.png")); width:300px; height:300px; } Finally, inspect the compiled CSS code, and you will find that it will look like the following code snippet: div { background: #000000 url("./images/bg.png") no-repeat top     right; width: 300px; height: 300px; } How it works… The four parameters of the .background() mixin are assigned as a space-separated list to the @arguments variable. After that, the @arguments variable can be used to set the background property. Also, other CSS properties accept space-separated lists, for example, the margin and padding properties. Note that the @arguments variable does not contain only the parameters that have been set explicit by the caller, but also the parameters set by their default value. You can easily see this when inspecting the compiled CSS code of this recipe. The .background(#000; url("./images/bg.png")); caller doesn't set the @repeat or @position argument, but you will find their values in the compiled CSS code. Using the @rest... variable to use mixins with a variable number of arguments As you can also see in the Using mixins with multiple parameters and Using duplicate mixin names recipes, only matching mixins are compiled into the final CSS code. In some situations, you don't know the number of parameters or want to use mixins for some style rules no matter the number of parameters. In these situations, you can use the special ... syntax or the @rest... variable to create mixins that match independent of the number of parameters. Getting ready You will have to create a file called rest.less, and this file can be compiled with the command-line lessc compiler. You can edit the Less code with your favorite editor. How to do it… Create a file called rest.less that contains the following Less code: .mixin(@a...) { .set(@a) when (iscolor(@a)) {    color: @a; } .set(@a) when (length(@a) = 2) {    margin: @a; } .set(@a); } p{ .mixin(red); } p { .mixin(2px;4px); } Compile the rest.less file from step 1 using the following command in the console: lessc rest.less Inspect the CSS code output to the console that will look like the following code: p { color: #ff0000; } p { margin: 2px 4px; } How it works… The special ... syntax (three dots) can be used as an argument for a mixin. Mixins with the ... syntax in their argument list match any number of arguments. When you put a variable name starting with an @ in front of the ... syntax, all parameters are assigned to that variable. You will find a list of examples of mixins that use the special ... 
syntax as follows: .mixin(@a; ...){}: This mixin matches 1-N arguments .mixin(...){}: This mixin matches 0-N arguments; note that mixin() without any argument matches only 0 arguments .mixin(@a: 1; @rest...){}: This mixin matches 0-N arguments; note that the first argument is assigned to the @a variable, and all other arguments are assigned as a space-separated list to @rest Because the @rest... variable contains a space-separated list, you can use the Less built-in list function. Using mixins as functions People who are used to functional programming expect a mixin to change or return a value. In this recipe, you will learn to use mixins as a function that returns a value. In this recipe, the value of the width property inside the div.small and div.big selectors will be set to the length of the longest side of a right-angled triangle based on the length of the two shortest sides of this triangle using the Pythagoras theorem. Getting ready The best and easiest way to inspect the results of this recipe will be compiling the Less code with the command-line lessc compiler. You can edit the Less code with your favorite editor. How to do it… Create a file called pythagoras.less that contains the following Less code: .longestSide(@a,@b) { @length: sqrt(pow(@a,2) + pow(@b,2)); } div { &.small {    .longestSide(3,4);    width: @length; } &.big {    .longestSide(6,7);    width: @length; } } Compile the pythagoras.less file from step 1 using the following command in the console: lessc pyhagoras.less Inspect the CSS code output on the console after compilation and you will see that it looks like the following code snippet: div.small { width: 5; } div.big { width: 9.21954446; } How it works… Variables set inside a mixin become available inside the scope of the caller. This specific behavior of the Less compiler was used in this recipe to set the @length variable and to make it available in the scope of the div.small and div.big selectors and the caller. As you can see, you can use the mixin in this recipe more than once. With every call, a new scope is created and both selectors get their own value of @length. Also, note that variables set inside the mixin do not overwrite variables with the same name that are set in the caller itself. Take, for instance, the following code: .mixin() { @variable: 1; } .selector { @variable: 2; .mixin; property: @variable; } The preceding code will compile into the CSS code, as follows: .selector { property: 2; } There's more… Note that variables won't leak from the mixins to the caller in the following two situations: Inside the scope of the caller, a variable with the same name already has been defined (lazy loading will be applied) The variable has been previously defined by another mixin call (lazy loading will not be applied) Passing rulesets to mixins Since Version 1.7, Less allows you to pass complete rulesets as an argument for mixins. Rulesets, including the Less code, can be assigned to variables and passed into mixins, which also allow you to wrap blocks of the CSS code defined inside mixins. In this recipe, you will learn how to do this. Getting ready For this recipe, you will have to create a Less file called keyframes.less, for instance. You can compile this mixins.less file with the command-line lessc compiler. Finally, inspect the Less code output to the console. 
How to do it… Create the keyframes.less file, and write down the following Less code into it: // Keyframes .keyframe(@name; @roules) { @-webkit-keyframes @name {    @roules(); } @-o-keyframes @name {    @roules(); } @keyframes @name {    @roules(); } } .keyframe(progress-bar-stripes; { from { background-position: 40px 0; } to   { background-position: 0 0; } }); Compile the keyframes.less file by running the following command shown in the console: lessc keyframes.less Inspect the CSS code output on the console and you will find that it looks like the following code: @-webkit-keyframes progress-bar-stripes { from {    background-position: 40px 0; } to {    background-position: 0 0; } } @-o-keyframes progress-bar-stripes { from {    background-position: 40px 0; } to {    background-position: 0 0; } } @keyframes progress-bar-stripes { from {    background-position: 40px 0; } to {    background-position: 0 0; } } How it works… Rulesets wrapped between curly brackets are passed as an argument to the mixin. A mixin's arguments are assigned to a (local) variable. When you assign the ruleset to the @ruleset variable, you are enabled to call @ruleset(); to "mixin" the ruleset. Note that the passed rulesets can contain the Less code, such as built-in functions too. You can see this by compiling the following Less code: .mixin(@color; @rules) { @othercolor: green; @media (print) {    @rules(); } }   p { .mixin(red; {color: lighten(@othercolor,20%);     background-color:darken(@color,20%);}) } The preceding Less code will compile into the following CSS code: @media (print) { p {    color: #00e600;    background-color: #990000; } } A group of CSS properties, nested rulesets, or media declarations stored in a variable is called a detached ruleset. Less offers support for the detached rulesets since Version 1.7. There's more… As you could see in the last example in the previous section, rulesets passed as an argument can be wrapped in the @media declarations too. This enables you to create mixins that, for instance, wrap any passed ruleset into a @media declaration or class. Consider the example Less code shown here: .smallscreens-and-olderbrowsers(@rules) { .lt-ie9 & {    @rules(); } @media (min-width:768px) {    @rules(); } } nav { float: left; width: 20%; .smallscreens-and-olderbrowsers({    float: none;    width:100%; }); } The preceding Less code will compile into the CSS code, as follows: nav { float: left; width: 20%; } .lt-ie9 nav { float: none; width: 100%; } @media (min-width: 768px) { nav {    float: none;    width: 100%; } } The style rules wrapped in the .lt-ie9 class can, for instance, be used with Paul Irish's <html> conditional classes technique or Modernizr. Now you can call the .smallscreens-and-olderbrowsers(){} mixin anywhere in your code and pass any ruleset to it. All passed rulesets get wrapped in the .lt-ie9 class or the @media (min-width: 768px) declaration now. When your requirements change, you possibly have to change only these wrappers once. Using mixin guards (as an alternative for the if…else statements) Most programmers are used to and familiar with the if…else statements in their code. Less does not have these if…else statements. Less tries to follow the declarative nature of CSS when possible and for that reason uses guards for matching expressions. In Less, conditional execution has been implemented with guarded mixins. Guarded mixins use the same logical and comparison operators as the @media feature in CSS does. 
Getting ready You can compile the Less code in this recipe with the command-line lessc compiler. Also, check the compiler options; you can find them by running the lessc command in the console without any argument. In this recipe, you will have to use the –modify-var option. How to do it… Create a Less file named guards.less, which contains the following Less code: @color: white; .mixin(@color) when (luma(@color) >= 50%) { color: black; } .mixin(@color) when (luma(@color) < 50%) { color: white; }   p { .mixin(@color); } Compile the Less code in the guards.less using the command-line lessc compiler with the following command entered in the console: lessc guards.less Inspect the output written on the console, which will look like the following code: p { color: black; } Compile the Less code with different values set for the @color variable and see how to output change. You can use the command as follows: lessc --modify-var="color=green" guards.less The preceding command will produce the following CSS code: p {   color: white;   } Now, refer to the following command: lessc --modify-var="color=lightgreen" guards.less With the color set to light green, it will again produce the following CSS code: p {   color: black;   }   How it works… The use of guards to build an if…else construct can easily be compared with the switch expression, which can be found in the programming languages, such as PHP, C#, and pretty much any other object-oriented programming language. Guards are written with the when keyword followed by one or more conditions. When the condition(s) evaluates true, the code will be mixed in. Also note that the arguments should match, as described in the Building a switch leveraging argument matching recipe, before the mixin gets compiled. The syntax and logic of guards is the same as that of the CSS @media feature. A condition can contain the following comparison operators: >, >=, =, =<, and < Additionally, the keyword true is the only value that evaluates as true. Two or more conditionals can be combined with the and keyword, which is equivalent to the logical and operator or, on the other hand, with a comma as the logical or operator. The following code will show you an example of the combined conditionals: .mixin(@a; @color) when (@a<10) and (luma(@color) >= 50%) { } The following code contains the not keyword that can be used to negate conditions: .mixin(@a; @color) when not (luma(@color) >= 50%) { } There's more… Inside the guard conditions, (global) variables can also be compared. The following Less code example shows you how to use variables inside guards: @a: 10; .mixin() when (@a >= 10) {} The preceding code will also enable you to compile the different CSS versions with the same code base when using the modify-var option of the compiler. The effect of the guarded mixin described in the preceding code will be very similar with the mixins built in the Building a switch leveraging argument matching recipe. Note that in the preceding example, variables in the mixin's scope overwrite variables from the global scope, as can be seen when compiling the following code: @a: 10; .mixin(@a) when (@a < 10) {property: @a;} selector { .mixin(5); } The preceding Less code will compile into the following CSS code: selector { property: 5; } When you compare guarded mixins with the if…else constructs or switch expressions in other programming languages, you will also need a manner to create a conditional for the default situations. 
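Less covers this with the built-in default() function, which the next paragraph describes in more detail; as a minimal sketch of the pattern (assuming a Less version that supports default(), 1.6 or later), a fallback guard can look like this:

.mixin(@width) when (@width > 100px) {
  padding: 10px;
}
.mixin(@width) when (default()) {
  padding: 5px;
}

Here the second .mixin() variant is compiled only for calls where no other .mixin() definition with matching guards applies.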
The built-in Less default() function can be used to create such a default conditional that is functionally equal to the else statement in the if…else constructs or the default statement in the switch expressions. The default() function returns true when no other mixins match (matching also takes the guards into account) and can be evaluated as the guard condition.

Building loops leveraging mixin guards

Mixin guards, as described among others in the Using mixin guards (as an alternative for the if…else statements) recipe, can also be used to dynamically build a set of CSS classes. In this recipe, you will learn how to do this.

Getting ready

You can use your favorite editor to create the Less code in this recipe.

How to do it…

Create a shadesofblue.less Less file, and write down the following Less code into it:

.shadesofblue(@number; @blue: 100%) when (@number > 0) {
  .shadesofblue(@number - 1, @blue - 10%);
  @classname: e(%(".color-%a", @number));
  @{classname} {
    background-color: rgb(0, 0, @blue);
    height: 30px;
  }
}
.shadesofblue(10);

You can, for instance, use the following snippet of the HTML code to see the effect of the compiled Less code from the preceding step:

<div class="color-1"></div>
<div class="color-2"></div>
<div class="color-3"></div>
<div class="color-4"></div>
<div class="color-5"></div>
<div class="color-6"></div>
<div class="color-7"></div>
<div class="color-8"></div>
<div class="color-9"></div>
<div class="color-10"></div>

Your HTML document should also include the shadesofblue.less and less.js files, as follows:

<link rel="stylesheet/less" type="text/css" href="shadesofblue.less">
<script src="less.js" type="text/javascript"></script>

Finally, the result will look like that shown in this screenshot:

How it works…

The CSS classes in this recipe are built with recursion. The recursion here has been done by the .shadesofblue(){} mixin calling itself with different parameters. The loop starts with the .shadesofblue(10); call. When the compiler reaches the .shadesofblue(@number - 1, @blue - 10%); line of code, it stops the current code and starts compiling the .shadesofblue(){} mixin again with @number decreased by one and @blue decreased by 10 percent. The process will be repeated till @number < 1.

Finally, when the @number variable becomes equal to 0, the compiler tries to call the .shadesofblue(0,0); mixin, which does not match the when (@number > 0) guard. When no matching mixin is found, the compiler stops, compiles the rest of the code, and writes the first class into the CSS code, as follows:

.color-1 {
  background-color: #00001a;
  height: 30px;
}

Then, the compiler starts again where it stopped before, at the .shadesofblue(2,20); call, and writes the next class into the CSS code, as follows:

.color-2 {
  background-color: #000033;
  height: 30px;
}

The preceding code will be repeated until the tenth class.

There's more…

When inspecting the compiled CSS code, you will find that the height property has been repeated ten times, too. This kind of code repetition can be prevented using the :extend Less pseudo class. The following code will show you an example of the usage of the :extend Less pseudo class:

.baseheight {
  height: 30px;
}
.mixin(@i: 2) when (@i > 0) {
  .mixin(@i - 1);
  .class@{i} {
    width: 10*@i;
    &:extend(.baseheight);
  }
}
.mixin();

Alternatively, in this situation, you can create a more generic selector, which sets the height property, as follows:

div[class^="color-"] {
  height: 30px;
}

Recursive loops are also useful when iterating over a list of values.
Max Mikhailov, one of the members of the Less core team, wrote a wrapper mixin for recursive Less loops, which can be found at https://github.com/seven-phases-max. This wrapper contains the .for and .-each mixins that can be used to build loops. The following code will show you how to write a nested loop: @import "for"; #nested-loops { .for(3, 1); .-each(@i) {    .for(0, 2); .-each(@j) {      x: (10 * @i + @j);    } } } The preceding Less code will produce the following CSS code: #nested-loops { x: 30; x: 31; x: 32; x: 20; x: 21; x: 22; x: 10; x: 11; x: 12; } Finally, you can use a list of mixins as your data provider in some situations. The following Less code gives an example about using mixins to avoid recursion: .data() { .-("dark"; black); .-("light"; white); .-("accent"; pink); }   div { .data(); .-(@class-name; @color){    @class: e(@class-name);    &.@{class} {      color: @color;    } } } The preceding Less code will compile into the CSS code, as follows: div.dark { color: black; } div.light { color: white; }   div.accent { color: pink; } Applying guards to the CSS selectors Since Version 1.5 of Less, guards can be applied not only on mixins, but also on the CSS selectors. This recipe will show you how to apply guards on the CSS selectors directly to create conditional rulesets for these selectors. Getting ready The easiest way to inspect the effect of the guarded selector in this recipe will be using the command-line lessc compiler. How to do it… Create a Less file named darkbutton.less that contains the following code: @dark: true; button when (@dark){ background-color: black; color: white; } Compile the darkbutton.less file with the command-line lessc compiler by entering the following command into the console: lessc darkbutton.less Inspect the CSS code output on the console, which will look like the following code: button { background-color: black; color: white; } Now try the following command and you will find that the button selector is not compiled into the CSS code: lessc --modify-var="dark=false" darkbutton.less How it works… The guarded CSS selectors are ignored by the compiler and so not compiled into the CSS code when the guard evaluates false. Guards for the CSS selectors and mixins leverage the same comparison and logical operators. You can read in more detail how to create guards with these operators in Using mixin guards (as an alternative for the if…else statements) recipe. There's more… Note that the true keyword will be the only value that evaluates true. So the following command, which sets @dark equal to 1, will not generate the button selector as the guard evaluates false: lessc --modify-var="dark=1" darkbutton.less The following Less code will give you another example of applying a guard on a selector: @width: 700px; div when (@width >= 600px ){ border: 1px solid black; } The preceding code will output the following CSS code: div {   border: 1px solid black;   } On the other hand, nothing will be output when setting @width to a value smaller than 600 pixels. You can also rewrite the preceding code with the & feature referencing the selector, as follows: @width: 700px; div { & when (@width >= 600px ){    border: 1px solid black; } } Although the CSS code produced of the latest code does not differ from the first, it will enable you to add more properties without the need to repeat the selector. 
You can also add the code in a mixin, as follows: .conditional-border(@width: 700px) {    & when (@width >= 600px ){    border: 1px solid black; } width: @width; } Creating color contrasts with Less Color contrasts play an important role in the first impression of your website or web application. Color contrasts are also important for web accessibility. Using high contrasts between background and text will help the visually disabled, color blind, and even people with dyslexia to read your content more easily. The contrast() function returns a light (white by default) or dark (black by default) color depending on the input color. The contrast function can help you to write a dynamical Less code that always outputs the CSS styles that create enough contrast between the background and text colors. Setting your text color to white or black depending on the background color enables you to meet the highest accessibility guidelines for every color. A sample can be found at http://www.msfw.com/accessibility/tools/contrastratiocalculator.aspx, which shows you that either black or white always gives enough color contrast. When you use Less to create a set of buttons, for instance, you don't want some buttons with white text while others have black text. In this recipe, you solve this situation by adding a stroke to the button text (text shadow) when the contrast ratio between the button background and button text color is too low to meet your requirements. Getting ready You can inspect the results of this recipe in your browser using the client-side less.js compiler. You will have to create some HTML and Less code, and you can use your favorite editor to do this. You will have to create the following file structure: How to do it… Create a Less file named contraststrokes.less, and write down the following Less code into it: @safe: green; @danger: red; @warning: orange; @buttonTextColor: white; @ContrastRatio: 7; //AAA, small texts   .setcontrast(@backgroundcolor) when (luma(@backgroundcolor)   =< luma(@buttonTextColor)) and     (((luma(@buttonTextColor)+5)/     (luma(@backgroundcolor)+5)) < @ContrastRatio) { color:@buttonTextColor; text-shadow: 0 0 2px black; } .setcontrast(@backgroundcolor) when (luma(@backgroundcolor)   =< luma(@buttonTextColor)) and     (((luma(@buttonTextColor)+5)/     (luma(@backgroundcolor)+5)) >= @ContrastRatio) { color:@buttonTextColor; }   .setcontrast(@backgroundcolor) when (luma(@backgroundcolor)   >= luma(@buttonTextColor)) and     (((luma(@backgroundcolor)+5)/     (luma(@buttonTextColor)+5)) < @ContrastRatio) { color:@buttonTextColor; text-shadow: 0 0 2px white; } .setcontrast(@backgroundcolor) when (luma(@backgroundcolor)   >= luma(@buttonTextColor)) and     (((luma(@backgroundcolor)+5)/     (luma(@buttonTextColor)+5)) >= @ContrastRatio) { color:@buttonTextColor; }   button { padding:10px; border-radius:10px; color: @buttonTextColor; width:200px; }   .safe { .setcontrast(@safe); background-color: @safe; }   .danger { .setcontrast(@danger); background-color: @danger; }   .warning { .setcontrast(@warning); background-color: @warning; } Create an HTML file, and save this file as index.html. 
Write down the following HTML code into this index.html file: <!DOCTYPE html> <html> <head>    <meta charset="utf-8">    <title>High contrast buttons</title>    <link rel="stylesheet/less" type="text/css"       href="contraststrokes.less">    <script src="less.min.js"       type="text/javascript"></script> </head> <body>    <button style="background-color:green;">safe</button>    <button class="safe">safe</button><br>    <button style="background-color:red;">danger</button>    <button class="danger">danger</button><br>    <button style="background-color:orange;">     warning</button>    <button class="warning">warning</button> </body> </html> Now load the index.html file from step 2 in your browser. When all has gone well, you will see something like what's shown in the following screenshot: On the left-hand side of the preceding screenshot, you will see the original colored buttons, and on the right-hand side, you will find the high-contrast buttons. How it works… The main purpose of this recipe is to show you how to write dynamical code based on the color contrast ratio. Web Content Accessibility Guidelines (WCAG) 2.0 covers a wide range of recommendations to make web content more accessible. They have defined the following three conformance levels: Conformance Level A: In this level, all Level A success criteria are satisfied Conformance Level AA: In this level, all Level A and AA success criteria are satisfied Conformance Level AAA: In this level, all Level A, AA, and AAA success criteria are satisfied If you focus only on the color contrast aspect, you will find the following paragraphs in the WCAG 2.0 guidelines. 1.4.1 Use of Color: Color is not used as the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element. (Level A) 1.4.3 Contrast (Minimum): The visual presentation of text and images of text has a contrast ratio of at least 4.5:1 (Level AA) 1.4.6 Contrast (Enhanced): The visual presentation of text and images of text has a contrast ratio of at least 7:1 (Level AAA) The contrast ratio can be calculated with a formula that can be found at http://www.w3.org/TR/WCAG20/#contrast-ratiodef: (L1 + 0.05) / (L2 + 0.05) In the preceding formula, L1 is the relative luminance of the lighter of the colors, and L2 is the relative luminance of the darker of the colors. In Less, the relative luminance of a color can be found with the built-in luma() function. In the Less code of this recipe are the four guarded .setcontrast(){} mixins. The guard conditions, such as (luma(@backgroundcolor) =< luma(@buttonTextColor)) are used to find which of the @backgroundcolor and @buttonTextColor colors is the lighter one. Then the (((luma({the lighter color})+5)/(luma({the darker color})+5)) < @ContrastRatio) condition can, according to the preceding formula, be used to determine whether the contrast ratio between these colors meets the requirement (@ContrastRatio) or not. When the value of the calculated contrast ratio is lower than the value set by the @ContrastRatio, the text-shadow: 0 0 2px {color}; ruleset will be mixed in, where {color} will be white or black depending on the relative luminance of the color set by the @buttonTextColor variable. There's more… In this recipe, you added a stroke to the web text to improve the accessibility. First, you will have to bear in mind that improving the accessibility by adding a stroke to your text is not a proven method. 
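As a quick worked example of the contrast-ratio formula above, using luma percentages the way the guard conditions do: pure white has a luma of 100% and pure black a luma of 0%, so white text on a black background gives (100 + 5) / (0 + 5) = 21, the highest contrast ratio possible. White text on a pure red background (red has a luma of roughly 21%) gives (100 + 5) / (21 + 5), which is approximately 4.0; this is below both the 4.5:1 Level AA threshold and the @ContrastRatio of 7 used in this recipe, so the .setcontrast(){} mixin would add the text-shadow stroke for a red button.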
Bear in mind, too, that automated accessibility testing, which works by calculating color contrast ratios, cannot verify whether a text stroke actually makes the text easier to read. Other options to solve this issue are to increase the font size or to change the background color itself. You can read how to change the background color dynamically based on color contrast ratios in the Changing the background color dynamically recipe. When you read the exceptions listed under the 1.4.6 Contrast (Enhanced) paragraph of the WCAG 2.0 guidelines, you will find that large-scale text only requires a color contrast ratio of 4.5 instead of 7.0 to meet the AAA Level. Large-scale text is defined as at least 18 point, or 14 point bold, or a font size that yields an equivalent size for Chinese, Japanese, and Korean (CJK) fonts. To try this, you could replace the text-shadow properties in the Less code of step 1 of this recipe with font-size: 14pt; and font-weight: bold; declarations. After this, you can inspect the results in your browser again. Depending on the values you have chosen for the @buttonTextColor and @ContrastRatio variables (among other things), you will find something like the following screenshot: On the left-hand side of the preceding screenshot, you will see the original colored buttons, and on the right-hand side, you will find the high-contrast buttons. Note that when you set the @ContrastRatio variable to 7.0, the code does not check whether the larger font indeed meets the lower 4.5 contrast ratio requirement.

Changing the background color dynamically

When you define some basic colors to generate, for instance, a set of button elements, you can use the built-in contrast() function to set the font color. The built-in contrast() function provides the highest possible contrast, but it does not guarantee that the contrast ratio is also high enough to meet your accessibility requirements. In this recipe, you will learn how to change your basic colors automatically so that they meet the required contrast ratio.

Getting ready

You can inspect the results of this recipe in your browser using the client-side less.js compiler. Use your favorite editor to create the HTML and Less code in this recipe.
You will have to create the following file structure: How to do it… Create a Less file named backgroundcolors.less, and write down the following Less code into it: @safe: green; @danger: red; @warning: orange; @ContrastRatio: 7.0; //AAA @precision: 1%; @buttonTextColor: black; @threshold: 43;   .setcontrastcolor(@startcolor) when (luma(@buttonTextColor)   < @threshold) { .contrastcolor(@startcolor) when (luma(@startcolor) < 100     ) and (((luma(@startcolor)+5)/     (luma(@buttonTextColor)+5)) < @ContrastRatio) {    .contrastcolor(lighten(@startcolor,@precision)); } .contrastcolor(@startcolor) when (@startcolor =     color("white")),(((luma(@startcolor)+5)/     (luma(@buttonTextColor)+5)) >= @ContrastRatio) {    @contrastcolor: @startcolor; } .contrastcolor(@startcolor); }   .setcontrastcolor(@startcolor) when (default()) { .contrastcolor(@startcolor) when (luma(@startcolor) < 100     ) and (((luma(@buttonTextColor)+5)/     (luma(@startcolor)+5)) < @ContrastRatio) {    .contrastcolor(darken(@startcolor,@precision)); } .contrastcolor(@startcolor) when (luma(@startcolor) = 100     ),(((luma(@buttonTextColor)+5)/(luma(@startcolor)+5))       >= @ContrastRatio) {    @contrastcolor: @startcolor; } .contrastcolor(@startcolor); }   button { padding:10px; border-radius:10px; color:@buttonTextColor; width:200px; }   .safe { .setcontrastcolor(@safe); background-color: @contrastcolor; }   .danger { .setcontrastcolor(@danger); background-color: @contrastcolor; }   .warning { .setcontrastcolor(@warning); background-color: @contrastcolor; } Create an HTML file and save this file as index.html. Write down the following HTML code into this index.html file: <!DOCTYPE html> <html> <head>    <meta charset="utf-8">    <title>High contrast buttons</title>      <link rel="stylesheet/less" type="text/css"       href="backgroundcolors.less">    <script src="less.min.js"       type="text/javascript"></script> </head> <body>    <button style="background-color:green;">safe</button>    <button class="safe">safe</button><br>    <button style="background-color:red;">danger</button>    <button class="danger">danger</button><br>    <button style="background-color:orange;">warning     </button>    <button class="warning">warning</button> </body> </html> Now load the index.html file from step 2 in your browser. When all has gone well, you will see something like the following screenshot: On the left-hand side of the preceding figure, you will see the original colored buttons, and on the right-hand side, you will find the high contrast buttons. How it works… The guarded .setcontrastcolor(){} mixins are used to determine the color set depending upon whether the @buttonTextColor variable is a dark color or not. When the color set by @buttonTextColor is a dark color, with a relative luminance below the threshold value set by the @threshold variable, the background colors should be made lighter. For light colors, the background colors should be made darker. Inside each .setcontrastcolor(){} mixin, a second set of mixins has been defined. These guarded .contrastcolor(){} mixins construct a recursive loop, as described in the Building loops leveraging mixin guards recipe. In each step of the recursion, the guards test whether the contrast ratio that is set by the @ContrastRatio variable has been reached or not. When the contrast ratio does not meet the requirements, the @startcolor variable will darken or lighten by the number of percent set by the @precision variable with the built-in darken() and lighten() functions. 
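The recursion used by these .contrastcolor(){} mixins (the pattern from the Building loops leveraging mixin guards recipe) is easier to see in isolation. The following stripped-down sketch (the .loop and .step names are made up for illustration) shows the same shape: a guarded branch that keeps calling itself with a new value, and a second guard that acts as the stop condition:

.loop(@i) when (@i > 0) {
  .step-@{i} { padding-left: (@i * 10px); }
  .loop(@i - 1);               // recursive call with a smaller value
}
.loop(@i) when (@i = 0) { }    // stop condition: nothing left to do

.menu { .loop(3); }            // emits .menu .step-3, .menu .step-2, .menu .step-1

In the recipe, the value that changes on each pass is the color itself, nudged by @precision with darken() or lighten().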
When the required contrast ratio has been reached, or the color held by @startcolor has become white or black, the modified color value of @startcolor is assigned to the @contrastcolor variable. The guarded .contrastcolor(){} mixins are used as functions, as described in the Using mixins as functions recipe, to assign the @contrastcolor variable that is then used to set the background-color property of the button selectors.

There's more…

A small value for the @precision variable increases the number of recursions needed to find the required colors, as there will be more and smaller steps; the compilation time grows with the number of recursions. When you choose a bigger value for @precision, the contrast color found might differ from the start color more than is needed to meet the contrast ratio requirement. When you choose, for instance, a dark button text color that is not black, all or some of the base background colors will be pushed to white, and the chance of ending up at white increases with higher values of the @ContrastRatio variable. The recursion stops when white (or black) has been reached, as you cannot make white any lighter. When the recursion stops because it has reached white or black, the colors set by the mixins in this recipe do not meet the required color contrast ratios.

Aggregating values under a single property

The merge feature of Less enables you to merge property values into a list under a single property. Each list can be either space-separated or comma-separated. The merge feature is useful for properties that accept a list as their value; for instance, background accepts a comma-separated list of backgrounds.

Getting ready

For this recipe, you will need a text editor and a Less compiler.

How to do it…

Create a file called defaultfonts.less that contains the following Less code:

.default-fonts() {
  font-family+: Helvetica, Arial, sans-serif;
}
p {
  font-family+: "Helvetica Neue";
  .default-fonts();
}

Compile the defaultfonts.less file from step 1 using the following command in the console:

lessc defaultfonts.less

Inspect the CSS output on the console after compilation, and you will see that it looks like the following code:

p {
  font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}

How it works…

When the compiler finds the plus sign (+) before the assignment sign (:), it merges the values into a comma-separated list under a single property instead of writing out a second declaration in the CSS code.

There's more…

Since version 1.7 of Less, you can also merge a property's values separated by a space instead of a comma. For space-separated values, you should use the +_ sign instead of the + sign, as can be seen in the following code:

.text-overflow(@text-overflow: ellipsis) {
  text-overflow+_: @text-overflow;
}
p, .text-overflow {
  .text-overflow();
  text-overflow+_: ellipsis;
}

The preceding Less code compiles into the following CSS code:

p, .text-overflow {
  text-overflow: ellipsis ellipsis;
}

Note that the text-overflow property doesn't force an overflow to occur; you will have to explicitly set, for instance, the overflow property to hidden for the element.

Summary

This article walked you through parameterized mixins and showed you how to use guards. Guards can be used as if-else statements and make it possible to construct iterative loops in Less.
Resources for Article: Further resources on this subject: Web Application Testing [article] LESS CSS Preprocessor [article] Bootstrap 3 and other applications [article]


Looking Back, Looking Forward

Packt
09 Feb 2015
24 min read
In this article by Simon Jackson, author of the book Unity 3D UI Essentials we will see how the new Unity UI has long been sought by developers; it has been announced and re-announced over several years, and now it is finally here. The new UI system is truly awesome and (more importantly for a lot of developers on a shoestring budget) it's free. (For more resources related to this topic, see here.) As we start to look forward to the new UI system, it is very important to understand the legacy GUI system (which still exists for backwards compatibility) and all it has to offer, so you can fully understand just how powerful and useful the new system is. It's crucial to have this understanding, especially since most tutorials will still speak of the legacy GUI system (so you know what on earth they are talking about). With an understanding of the legacy system, you will then peer over the diving board and walk through a 10,000-foot view of the new system, so you get a feel of what to expect from the rest of this book. The following is the list of topics that will be covered in this article: A look back into what legacy Unity GUI is Tips, tricks, and an understanding of legacy GUI and what it has done for us A high level overview of the new UI features State of play You may not expect it, but the legacy Unity GUI has evolved over time, "adding new features and improving performance. However, because it has "kept evolving based on the its original implementation, it has been hampered "with many constraints and the ever pressing need to remain backwards compatible (just look at Windows, which even today has to cater for" programs written in BASIC (http://en.wikipedia.org/wiki/BASIC)). Not to say the old system is bad, it's just not as evolved as some of the newer features being added to the Unity 4.x and Unity 5.x series, which are based on newer and more enhanced designs, and more importantly, a new core. The main drawback of the legacy GUI system is that it is only drawn in screen space (drawn on" the screen instead of within it) on top of any 3D elements or drawing in your scenes. This is fine if you want menus or overlays in your title but if you want to integrate it further within your 3D scene, then it is a lot more difficult. For more information about world space and screen space, see this Unity Answers article (http://answers.unity3d.com/questions/256817/about-world-space-and-local-space.html). So before we can understand how good the new system is, we first need to get to grips with where we are coming from. (If you are already familiar with the legacy GUI system, feel free to skip over this section.) A point of reference Throughout this book, we will refer to the legacy GUI simply as GUI. When we talk about the new system, it will be referred to as UI or Unity UI, just so you don't get mixed-up when reading. When looking around the Web (or even in the Unity support forums), you may hear about or see references to uGUI, which was the development codename for the new Unity UI system. GUI controls The legacy" GUI controls provide basic and stylized controls for use in your titles. All legacy GUI controls are drawn during the GUI rendering phase from the built-in "OnGUI method. In the sample that accompanies this title, there are examples of all the controls in the AssetsBasicGUI.cs script. For GUI controls to function, a camera in the scene must have the GUILayer component attached to it. It is there by default on any Camera in a scene, so for most of the time you won't notice it. 
However, if you have removed it, then you will have to add it back for GUI to work. The component is just the hook for the OnGUI delegate handler, to ensure it has called each frame. Like "the Update method in scripts, the OnGUI method can be called several times per frame if rendering is slowing things down. Keep this in mind when building your own legacy GUI scripts. The controls that are available in the legacy GUI are: Label Texture Button Text fields (single/multiline and password variant) Box Toolbars Sliders ScrollView Window So let's go through them in more detail: All the following code is implemented in the sample project in the basic GUI script located in the AssetsScripts folder of the downloadable code. To experiment yourself, create a new project, scene, and script, placing the code for each control in the script and attach the script to the camera (by dragging it from the project view on to the Main Camera GameObject in the scene hierarchy). You can then either run the project or adorn the class in the script with the [ExecuteInEditMode] attribute to see it in the game view. The Label control Most "GUI systems start with a Label control; this simply "provides a stylized control to display read-only text on the screen, it is initiated by including the following OnGUI method in your script: void OnGUI() {GUI.Label(new Rect(25, 15, 100, 30), "Label");} This results in the following on-screen display: The Label control "supports altering its font settings through the" use of the guiText GameObject property (guiText.font) or GUIStyles. To set guiText.font in your script, you would simply apply the following in your script, either in the Awake/Start functions or before drawing the next section of text you want drawn in another font: public Font myFont = new Font("arial");guiText.font = myFont; You can also set the myFont property in Inspector using an "imported font. The Label control forms the basis for all controls to display text, and as such, "all other controls inherit from it and have the same behaviors for styling the displayed text. The Label control also supports using a Texture for its contents, but not both text "and a texture at the same time. However, you can layer Labels and other controls on top of each other to achieve the same effect (controls are drawn implicitly in the order they are called), for example: public Texture2D myTexture;void Start() {myTexture = new Texture2D(125, 15);}void OnGUI() {//Draw a textureGUI.Label(new Rect(125, 15, 100, 30), myTexture);//Draw some text on top of the texture using a labelGUI.Label(new Rect(125, 15, 100, 30), "Text overlay");} You can override the order in which controls are drawn by setting GUI.depth = /*<depth number>*/; in between calls; however, I would advise against this unless you have a desperate need. The texture" will then be drawn to fit the dimensions of the Label field, By" default it scales on the shortest dimension appropriately. This too can be altered using GUIStyle to alter the fixed width and height or even its stretch characteristics. Texture drawing Not "specifically a control in itself, the GUI framework" also gives you the ability to simply draw a Texture to the screen Granted there is little difference to using DrawTexture function instead of a Label with a texture or any other control. (Just another facet of the evolution of the legacy GUI). 
This is, in effect, the same as the previous Label control but instead of text it only draws a texture, for example: public Texture2D myTexture;void Start() {myTexture = new Texture2D(125, 15);}void OnGUI() {GUI.DrawTexture(new Rect(325, 15, 100, 15), myTexture);} Note that in all the examples providing a Texture, I have provided a basic template to initialize an empty texture. In reality, you would be assigning a proper texture to be drawn. You can also provide scaling and alpha blending values when drawing the texture to make it better fit in the scene, including the ability to control the aspect ratio that the texture is drawn in. A warning though, when you scale the image, it affects the rendering properties for that image under the legacy GUI system. Scaling the image can also affect its drawn position. You may have to offset the position it is drawn at sometimes. For" example: public Texture2D myTexture;void Start() {myTexture = new Texture2D(125, 15);}void OnGUI() {GUI.DrawTexture(new Rect(325, 15, 100, 15), myTexture, "   ScaleMode.ScaleToFit,true,0.5f);} This will do its best to draw the source texture with in the drawn area with alpha "blending (true) and an aspect ratio of 0.5. Feel free to play with these settings "to get your desired effect. This is useful in certain scenarios in your game when you want a simple way to "draw a full screen image on the screen on top of all the 3D/2D elements, such as pause or splash screen. However, like the Label control, it does not receive any "input events (see the Button control for that). There is also a variant of the DrawTexture function called DrawTextureWithTexCoords. This allows you to not only pick where you want the texture drawn on to the screen, but also from which part of the source texture you want to draw, for example: public Texture2D myTexture;void Start() {myTexture = new Texture2D(125, 15);}void OnGUI() {GUI.DrawTextureWithTexCoords (new Rect(325, 15, 100, 15), "   myTexture ,new Rect(10, 10, 50, 5));} This will pick a region from the source texture (myTexture) 50 pixels wide by "5 pixels high starting from position 10, 10 on the texture. It will then draw this to "the Rect region specified. However, the DrawTextureWithTexCoords function cannot perform scaling, it "can only perform alpha blending! It will simply draw to fit the selected texture region to the size specified in the initial Rect. DrawTextureWithTexCoords has also been used to draw individual sprites using the legacy GUI, which has a notion of what a sprite is. The Button control Unity "also provides a Button control, which comes in two" variants. The basic "Button control which only supports a single click, whereas RepeatButton supports "holding down the button. 
They are both instantiated the same way by using an if statement to capture "when the button is clicked, as shown in the following script: void OnGUI() {if (GUI.Button(new Rect(25, 40, 120, 30), "Button")){   //The button has clicked, holding does nothing}if (GUI.RepeatButton(new Rect(170, 40, 170, 30), "   "RepeatButton")){   //The button has been clicked or is held down}} Each will result in a simple button on screen as follows:   The controls also support using a Texture for the button content as well by providing a texture value for the second parameter as follows: public Texture2D myTexture;void Start() {myTexture = new Texture2D(125, 15);}void OnGUI() {if (GUI.Button(new Rect(25, 40, 120, 30), myTexture)){ }} Like the Label, the "font of the text can be altered using GUIStyle or" by setting the guiText property of the GameObject. It also supports using textures in the same "way as the Label. The easiest way to look at this control is that it is a Label that "can be clicked. The Text control Just as "there is a need to display text, there is also a need for a "user to be able to enter text, and the legacy GUI provides the following controls to do just that: Control Description TextField This" is a basic text box, it supports a single line of text, no new lines (although, if the text contains end of line characters, it will draw the extra lines down). TextArea This "is an extension of TextField that supports entering of multiple lines of text; new lines will be added when the user "hits the enter key. PasswordField This is" a variant of TextField; however, it won't display "the value entered, it will replace each character with a replacement character. Note, the password itself is still stored in text form and if you are storing users' passwords, you should encrypt/decrypt the actual password when using it. Never store characters in plain text.  Using the TextField control is simple, as it returns the final state of the value that has been entered and you have to pass that same variable as a parameter for the current text to be displayed. To use the controls, you apply them in script as follows: string textString1 = "Some text here";string textString2 = "Some more text here";string textString3 = "Even more text here";void OnGUI() {textString = GUI.TextField(new Rect(25, 100, 100, 30), "   textString1);textString = GUI.TextArea(new Rect(150, 100, 200, 75), "   textString2);textString = GUI.PasswordField(new Rect(375, 100, 90, 30), "   textString3, '*');} A note about strings in Unity scripts Strings are immutable. Every time you change their value they create a new string in memory by having the textString variable declared at the class level it is a lot more memory efficient. If you declare the textString variable in the OnGUI method, "it will generate garbage (wasted memory) in each frame. "Worth keeping in mind. When" displayed, the textbox by default looks like this:   As with" the Label and Button controls, the font of the text displayed can be altered using either a GUIStyle or guiText GameObject property. Note that text will overflow within the field if it is too large for the display area, but it will not be drawn outside the TextField dimensions. The same goes for multiple lines. The Box control In the "midst of all the controls is a generic purpose control that "seemingly "can be used for a variety of purposes. Generally, it's used for drawing a "background/shaded area behind all other controls. 
The Box control implements most of the features mentioned in the controls above controls (Label, Texture, and Text) in a single control with the same styling and layout options. It also supports text and images as the other controls do. You can draw it with its own content as follows: void OnGUI() {GUI.Box (new Rect (350, 350, 100, 130), "Settings");} This gives you the following result:   Alternatively, you" can use it to decorate the background" of other controls, "for example: private string textString = "Some text here";void OnGUI() {GUI.Box (new Rect (350, 350, 100, 130), "Settings");GUI.Label(new Rect(360, 370, 80, 30), "Label");textString = GUI.TextField(new Rect(360, 400, 80, 30), "   textString);if (GUI.Button (new Rect (360, 440, 80, 30), "Button")) {}} Note that using the Box control does not affect any controls you draw upon it. It is drawn as a completely separate control. This statement will be made clearer when you look at the Layout controls later in this article. The example will draw the box background and the Label, Text, and Button controls on top of the box area and looks like this:   The box control can" be useful to highlight groups of controls or providing "a simple background (alternatively, you can use an image instead of just text and color). As with the other controls, the Box control supports styling using GUIStyle. The Toggle/checkbox control If checking "on / checking off is your thing, then the legacy" GUI also has a checkbox control for you, useful for those times when" you need to visualize the on/off state "of an option. Like the TextField control, you "pass the variable that manages Togglestate as "a parameter and it returns the new value (if it changes), so it is applied in code "as follows: bool blnToggleState = false;void OnGUI() {blnToggleState = GUI.Toggle(new Rect(25, 150, 250, 30),blnToggleState, "Toggle");} This results in the following on-screen result: As with the Label and Button controls, the font of the text displayed can be altered using either a GUIStyle or guiText GameObject property. Toolbar panels The basic "GUI controls also come with some very basic" automatic layout panels: "the first handles an arrangement of buttons, however these buttons are grouped "and only one can be selected at a time. As with other controls, the style of the button can be altered using a GUIStyles definition, including fixing the width of the buttons and spacing. There are two layout options available, these are: The Toolbar control The Selection grid control The Toolbar control The Toolbar control "arranges buttons in a horizontal "pattern (vertical is "not supported). Note that it is possible to fake a vertical toolbar by using a selection grid with just one item per row. See the Selection grid section later in this article for more details. As with other controls, the Toolbar returns the index of the currently selected button in the toolbar. The buttons are also the same as the base button control so it also offers options to support either text or images for the button content. The RepeatButton control however is not supported. 
To implement the toolbar, you specify an array of the content you wish to display in each button and the integer value that controls the selected button, as follows: private int toolbarInt;private string[] toolbarStrings ;Void Start() {toolbarInt = 0;toolbarStrings = { "Toolbar1", "Toolbar2", "Toolbar3" };}void OnGUI() {toolbarInt = GUI.Toolbar(new Rect(25, 200, 200, 30), "   toolbarInt, toolbarStrings);} The main" difference between the preceding controls is that you" have to pass the currently selected index value in to the control and it then returns the new value. The Toolbar when drawn looks as follows:   As stated, you can also pass an array of textures as well and they will be displayed instead of the text content. The SelectionGrid control The "SelectionGrid control is a customization "of the previous standard Toolbar control, it is able to arrange the buttons in a grid layout and resize the buttons appropriately to fill the target area. In code, SelectionGrid is used in a very similar format to the Toolbar code shown previously, for example: private int selectionGridInt ;private string[] selectionStrings;Void Start() {selectionGridInt = 0;selectionStrings = { "Grid 1", "Grid 2", "Grid 3", "Grid 4" };}void OnGUI() {selectionGridInt = GUI.SelectionGrid(new Rect(250, 200, 200, 60), selectionGridInt, selectionStrings, 2);} The main difference between SelectionGrid and Toolbar in code is that with SelectionGrid you can specify the number of items in a single row and the control will automatically lay out the buttons accordingly (unless overridden using GUIStyle). The preceding code will result in the following layout:   On "their own, they provide a little more flexibility" than just using buttons alone. The Slider/Scrollbar controls When "you need to control a range in your games. GUI or add "a handle to control moving properties between two values, like" moving an object around in your scene, this is "where the Slider and Scrollbar controls come in. They provide two similar out-of-the-box implementations that give a scrollable region and a handle to control the value behind the control. Here, they are presented side by side:   The slimmer Slider and chunkier Scrollbar controls can both work in either horizontal or vertical modes and have presets for the minimum and maximum values allowed. Slider control code In code, the "Slider control code is represented as follows: private float fltSliderValue = 0.5f;void OnGUI() {fltSliderValue = GUI.HorizontalSlider(new Rect(25, 250, 100,30), "   fltSliderValue, 0.0f, 10.0f);fltSliderValue = GUI.VerticalSlider(new Rect(150, 250, 25, 50), "   fltSliderValue, 10.0f, 0.0f);} Scrollbar control code In code, the "Scrollbar control code is represented as follows: private float fltScrollerValue = 0.5f;void OnGUI() {fltScrollerValue = GUI.HorizontalScrollbar(new Rect(25, 285,"   100, 30), fltScrollerValue, 1.0f, 0.0f, 10.0f);fltScrollerValue = GUI.VerticalScrollbar(new Rect(200, 250, 25,"   50), fltScrollerValue, 1.0f, 10.0f, 0.0f);} Like Toolbar and SelectionGrid, you are required to pass in the current value and it will return the updated value. However, unlike all the other controls, you actually have two style points, so you can style the bar and the handle separately, giving you a little more freedom and control over the look and feel of the sliders. Normally, you would only use the Slider control; The main reason the Scrollbar is available is that it forms the basis for the next control, the ScrollView control. 
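Because the bar and the handle have separate style points, you can pass two GUIStyle values when you draw a slider. The following C# sketch is illustrative only (the field names are made up here); it copies the active skin's slider styles and tweaks them before use:

private float fltStyledSlider = 5.0f;
private GUIStyle sliderBarStyle;     // style for the bar
private GUIStyle sliderThumbStyle;   // style for the draggable handle

void OnGUI() {
  // GUI.skin is only valid inside OnGUI, so copy the defaults lazily.
  if (sliderBarStyle == null) {
    sliderBarStyle = new GUIStyle(GUI.skin.horizontalSlider);
    sliderThumbStyle = new GUIStyle(GUI.skin.horizontalSliderThumb);
    sliderBarStyle.fixedHeight = 20;
    sliderThumbStyle.fixedHeight = 20;
    sliderThumbStyle.fixedWidth = 20;
  }
  // The overload that takes two styles draws the bar with the first
  // style and the handle with the second.
  fltStyledSlider = GUI.HorizontalSlider(new Rect(25, 320, 150, 30),
    fltStyledSlider, 0.0f, 10.0f, sliderBarStyle, sliderThumbStyle);
}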
The ScrollView control The last" of the displayable controls is the ScrollView control, "which gives you masking abilities over GUI elements with" optional horizontal and vertical Scrollbars. Simply put, it allows you to define an area larger for controls behind a smaller window on the screen, for example:   Example list implementation using a ScrollView control Here we have a list that has many items that go beyond the drawable area of the ScrollView control. We then have the two scrollbars to move the content of the scroll viewer up/down and left/right to change the view. The background content is hidden behind a viewable mask that is the width and height of the ScrollView control's main window. Styling the "control is a little different as there is no base style" for it; it relies on the currently set default GUISkin. You can however set separate GUIStyles for each of the sliders but only over the whole slider, not its individual parts (bar and handle). If you don't specify styles for each slider, it will also revert to the base GUIStyle. Implementing a ScrollView is fairly easy, for example: Define the visible area along with a virtual background layer where the controls will be drawn using a BeginScrollView function. Draw your controls in the virtual area. Any GUI drawing between the ScrollView calls is drawn within the scroll area. Note that 0,0 in the ScrollView is from the top-left of the ScrollView area and not the top-left-hand corner of the screen. Complete it by closing the control with the EndScrollView function. For example, the previous example view was created with the following code: private Vector2 scrollPosition = Vector2.zero;private bool blnToggleState = false;void OnGUI(){scrollPosition = GUI.BeginScrollView(" new Rect(25, 325, 300, 200), " scrollPosition, " new Rect(0, 0, 400, 400)); for (int i = 0; i < 20; i++){   //Add new line items to the background   addScrollViewListItem(i, "I'm listItem number " + i);}GUI.EndScrollView();} //Simple function to draw each list item, a label and checkboxvoid addScrollViewListItem(int i, string strItemName){GUI.Label(new Rect(25, 25 + (i * 25), 150, 25), strItemName);blnToggleState = GUI.Toggle(" new Rect(175, 25 + (i * 25), 100, 25), " blnToggleState, "");} In this "code, we define a simple function (addScrollViewListItem) to "draw a list item (consisting of a label and checkbox). We then begin the ScrollView control by passing the visible area (the first Rect parameter), the starting scroll position, and finally the virtual area behind the control (the second Rect parameter). Then we "use that to draw 20 list items inside the virtual area of the ScrollView control using our helper function before finishing off and closing the control with the EndScrollView command. If you want to, you can also nest ScrollView controls within each other. The ScrollView control also has some actions to control its use like the ScrollTo command. This command will move the visible area to the coordinates of the virtual layer, bringing it into focus. (The coordinates for this are based on the top-left position of the virtual layer; 0,0 being top-left.) To use the ScrollTo function on ScrollView, you must use it within the Begin and End ScrollView commands. 
For example, we can add a button in ScrollView to jump to the top-left of the virtual area, for example: public Vector2 scrollPosition = Vector2.zero;void OnGUI(){scrollPosition = GUI.BeginScrollView(" new Rect(10, 10, 100, 50)," scrollPosition, " new Rect(0, 0, 220, 10)); if (GUI.Button(new Rect(120, 0, 100, 20), "Go to Top Left"))   GUI.ScrollTo(new Rect(0, 0, 100, 20));GUI.EndScrollView();} You can also additionally turn on/off the sliders on either side of the control by specifying "the BeginScrollView statement "using the alwayShowHorizontal and alwayShowVertical properties; these are highlighted here in an updated "GUI.BeginScrollView call: Vector2 scrollPosition = Vector2.zero;bool ShowVertical = false; // turn off vertical scrollbarbool ShowHorizontal = false; // turn off horizontal scrollbarvoid OnGUI() {scrollPosition = GUI.BeginScrollView(new Rect(25, 325, 300, 200),scrollPosition,new Rect(0, 0, 400, 400),ShowHorizontal,ShowVertical);GUI.EndScrollView ();}GUI.EndScrollView ();} Rich Text Formatting Now" having just plain text everywhere would not look" that great and would likely force developers to create images for all the text on their screens (granted a fair few still do so for effect). However, Unity does provide a way to enable richer text display using a style akin to HTML wherever you specify text on a control (only for label and display purposes; getting it to" work with input fields is not recommended). In this HTML style of writing text, we have the following tags we can use to liven up the text displayed. This gives" text a Bold format, for example: The <b>quick</b> brown <b>Fox</b> jumped over the <b>lazy Frog</b> This would result in: The quick brown Fox jumped over the lazy Frog Using this tag "will give text an Italic format, for example: The <b><i>quick</i></b> brown <b>Fox</b><i>jumped</i> over the <b>lazy Frog</b> This would result in: The quick brown Fox jumped over the lazy Frog As you can "probably guess, this tag will alter the Size of the text it surrounds. For reference, the default size for the font is set by the font itself. For example: The <b><i>quick</i></b> <size=50>brown <b>Fox</b></size> <i>jumped</i> over the <b>lazy Frog</b> This would result in: Lastly, you can specify different colors for text surrounded by the Color tag. "The color itself is" denoted using its 8-digit RGBA hex value, for example: The <b><i>quick</i></b> <size=50><color=#a52a2aff>brown</color> <b>Fox</b></size> <i>jumped</i> over the <b>lazy Frog</b> Note that the "color is defined using normal RGBA color space notation (http://en.wikipedia.org/wiki/RGBA_color_space) in hexadecimal form with two characters per color, for example, RRGGBBAA. Although the color property does also support the shorter RGB color space, which is the same notation "but without the A (Alpha) component, for example,. RRGGBB The preceding code would result in: (If you are reading this in print, the previous word brown is in the color brown.) You can also use a color name to reference it but the pallet is quite limited; for more "details, see the Rich Text manual reference page at http://docs.unity3d.com/Manual/StyledText.html. For text meshes, there are two additional tags: <material></material> <quad></quad> These only apply when associated to an existing mesh. The material is one of the materials assigned to the mesh, which is accessed using the mesh index number "(the array of materials applied to the mesh). 
When applied to a quad, you can also specify a size, position (x, y), width, and height to the text. The text mesh isn't well documented and is only here for reference; as we delve deeper into the new UI system, we will find much better ways of achieving this. Summary So now we have a deep appreciation of the past and a glimpse into the future. One thing you might realize is that there is no one stop shop when it comes to Unity; each feature has its pros and cons and each has its uses. It all comes down to what you want to achieve and the way you want to achieve it; never dismiss anything just because it's old. In this article, we covered GUI controls. It promises to be a fun ride, and once we are through that, we can move on to actually building some UI and then placing it in your game scenes in weird and wonderful ways. Now stop reading this and turn the page already!! Resources for Article: Further resources on this subject: What's Your Input? [article] Parallax scrolling [article] Unity Networking – The Pong Game [article]

Fronting an external API with Ruby on Rails: Part 1

Mike Ball
09 Feb 2015
6 min read
Historically, a conventional Ruby on Rails application leverages server-side business logic, a relational database, and a RESTful architecture to serve dynamically-generated HTML. JavaScript-intensive applications and the widespread use of external web APIs, however, somewhat challenge this architecture. In many cases, Rails is tasked with performing as an orchestration layer, collecting data from various backend services and serving re-formatted JSON or XML to clients. In such instances, how is Rails' model-view-controller architecture still relevant? In this two part post series, we'll create a simple Rails backend that makes requests to an external XML-based web service and serves JSON. We'll use RSpec for tests and Jbuilder for view rendering. What are we building? We'll create Noterizer, a simple Rails application that requests XML from externally hosted endpoints and re-renders the XML data as JSON at a single URL. To assist in this post, I've created NotesXmlService, a basic web application that serves two XML-based endpoints: http://NotesXmlService.herokuapp.com/note-onehttp://NotesXmlService.herokuapp.com/note-two Why is this necessary in a real-world scenario? Fronting external endpoints with an application like Noterizer opens up a few opportunities: Noterizer's endpoint could serve JavaScript clients who can't perform HTTP requests across domain names to the original, external API. Noterizer's endpoint could reformat the externally hosted data to better serve its own clients' data formatting preferences. Noterizer's endpoint is a single interface to the data; multiple requests are abstracted away by its backend. Noterizer provides caching opportunities. While it's beyond the scope of this series, Rails can cache external request data, thus offloading traffic to the external API and avoiding any terms of service or rate limit violations imposed by the external service. Setup For this series, I’m using Mac OS 10.9.4, Ruby 2.1.2, and Rails 4.1.4. I’m assuming some basic familiarity with Git and the command line. Clone and set up the repo I've created a basic Rails 4 Noterizer app. Clone its repo, enter the project directory, and check out its tutorial branch: $ git clone http://github.com/mdb/noterizer && cd noterizer && git checkout tutorial Install its dependencies: $ bundle install Set up the test framework Let’s install RSpec for testing. Add the following to the project's Gemfile: gem 'rspec-rails', '3.0.1' Install rspec-rails: $ bundle install There’s now an rspec generator available for the rails command. Let's generate a basic RSpec installation: $ rails generate rspec:install This creates a few new files in a spec directory: ├── spec│   ├── rails_helper.rb│   └── spec_helper.rb We’re going to make a few adjustments to our RSpec installation.  First, because Noterizer does not use a relational database, delete the following ActiveRecord reference in spec/rails_helper.rb: # Checks for pending migrations before tests are run. # If you are not using ActiveRecord, you can remove this line. ActiveRecord::Migration.maintain_test_schema! Next, configure RSpec to be less verbose in its warning output; such verbose warnings are beyond the scope of this series. Remove the following line from .rspec: --warnings The RSpec installation also provides a spec rake task. Test this by running the following: $ rake spec You should see the following output, as there aren’t yet any RSpec tests: No examples found. 
Finished in 0.00021 seconds (files took 0.0422 seconds to load) 0 examples, 0 failures Note that a default Rails installation assumes tests live in a tests directory. RSpec uses a spec directory. For clarity's sake, you’re free to delete the test directory from Noterizer. Building a basic route and controller Currently, Noterizer does not have any URLs; we’ll create a single/notes URL route.  Creating the controller First, generate a controller: $ rails g controller notes Note that this created quite a few files, including JavaScript files, stylesheet files, and a helpers module. These are not relevant to our NotesController; so let's undo our controller generation by removing all untracked files from the project. Note that you'll want to commit any changes you do want to preserve. $ git clean -f Now, open config/application.rb and add the following generator configuration: config.generators do |g| g.helper false g.assets false end Re-running the generate command will now create only the desired files: $ rails g controller notes Testing the controller Let's add a basic NotesController#index test to spec/controllers/notes_spec.rb. The test looks like this: require 'rails_helper' describe NotesController, :type => :controller do describe '#index' do before :each do get :index end it 'successfully responds to requests' do expect(response).to be_success end end end This test currently fails when running rake spec, as we haven't yet created a corresponding route. Add the following route to config/routes.rb get 'notes' => 'notes#index' The test still fails when running rake spec, because there isn't a proper #index controller action.  Create an empty index method in app/controllers/notes_controller.rb class NotesController < ApplicationController def index end end rake spec still yields failing tests, this time because we haven't yet created a corresponding view. Let's create a view: $ touch app/views/notes/index.json.jbuilder To use this view, we'll need to tweak the NotesController a bit. Let's ensure that requests to the /notes route always returns JSON via a before_filter run before each controller action: class NotesController < ApplicationController before_filter :force_json def index end private def force_json request.format = :json end end Now, rake spec yields passing tests: $ rake spec . Finished in 0.0107 seconds (files took 1.09 seconds to load) 1 example, 0 failures Let's write one more test, asserting that the response returns the correct content type. Add the following to spec/controllers/notes_controller_spec.rb it 'returns JSON' do expect(response.content_type).to eq 'application/json' end Assuming rake spec confirms that the second test passes, you can also run the Rails server via the rails server command and visit the currently-empty Noterizer http://localhost:3000/notes URL in your web browser. Conclusion In this first part of the series we have created the basic route and controller for Noterizer, which is a basic example of a Rails application that fronts an external API. In the next blog post (Part 2), you will learn how to build out the backend, test the model, build up and test the controller, and also test the app with JBuilder. About this Author Mike Ball is a Philadelphia-based software developer specializing in Ruby on Rails and JavaScript. He works for Comcast Interactive Media where he helps build web-based TV and video consumption applications.


Learning NServiceBus - Preparing for Failure

Packt
09 Feb 2015
19 min read
 In this article by David Boike, author of the book Learning NServiceBus Second Edition we will explore the tools that NServiceBus gives us to stare at failure in the face and laugh. We'll discuss error queues, automatic retries, and controlling how those retries occur. We'll also discuss how to deal with messages that may be transient and should not be retried in certain conditions. Lastly, we'll examine the difficulty of web service integrations that do not handle retries cleanly on their own. (For more resources related to this topic, see here.) Fault tolerance and transactional processing In order to understand the fault tolerance we gain from using NServiceBus, let's first consider what happens without it. Let's order something from a fictional website and watch what might happen to process that order. On our fictional website, we add Batman Begins to our shopping cart and then click on the Checkout button. While our cursor is spinning, the following process is happening: Our web request is transmitted to the web server. The web application knows it needs to make several database calls, so it creates a new transaction scope. Database Call 1 of 3: The shopping cart information is retrieved from the database. Database Call 2 of 3: An Order record is inserted. Database Call 3 of 3: We attempt to insert OrderLine records, but instead get Error Message: Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. This exception causes the transaction to roll back. This process is shown in the following diagram:   Ugh! If you're using SQL Server and you've never seen this, you haven't been coding long enough. It never happens during development; there just isn't enough load. It's even possible that this won't occur during load testing. It will likely occur during heavy load at the worst possible time, for example, right after your big launch. So obviously, we should log the error, right? But then what happens to the order? Well that's gone, and your boss may not be happy about losing that revenue. And what about our user? They will likely get a nasty error message. We won't want to divulge the actual exception message, so they will get something like, "An unknown error has occurred. The system administrator has been notified. Please try again later." However, the likelihood that they want to trust their credit card information to a website that has already blown up in their face once is quite low. So how can we do better? Here's how this scenario could have happened with NServiceBus:   The web request is transmitted to the web server. We add the shopping cart identifier to an NServiceBus command and send it through the Bus. We redirect the user to a new page that displays the receipt, even though the order has not yet been processed. Elsewhere, an Order service is ready to start processing a new message: The service creates a new transaction scope, and receives the message within the transaction. Database Call 1 of 3: The shopping cart information is retrieved from the database. Database Call 2 of 3: An Order record is inserted. Database Call 3 of 3: Deadlock! The exception causes the database transaction to roll back. The transaction controlling the message also rolls back. The order is back in the queue. This is great news! The message is back in the queue, and by default, NServiceBus will automatically retry this message a few times. 
Generally, deadlocks are a temporary condition, and simply trying again is all that is needed. After all, the SQL Server exception says Rerun the transaction. Meanwhile, the user has no idea that there was ever a problem. It will just take a little longer (in the order of milliseconds or seconds) to process the order. Error queues and replay Whenever you talk about automatic retries in a messaging environment, you must invariably consider poison messages. A poison message is a message that cannot be immediately resolved by a retry because it will consistently result in an error. A deadlock is a transient error. We can reasonably expect deadlocks and other transient errors to resolve by themselves without any intervention. Poison messages, on the other hand, cannot resolve themselves. Sometimes, this is because of an extended outage. At other times, it is purely our fault—an exception we didn't catch or an input condition we didn't foresee. Automatic retries If we retry poison messages in perpetuity, they will create a blockage in our incoming queue of messages. They will retry over and over, and valid messages will get stuck behind them, unable to make it through. For this reason, we must set a reasonable limit on retries, and after failing too many times, poison messages must be removed from the processing queue and stored someplace else. NServiceBus handles all of this for us. By default, NServiceBus will try to process a message five times, after which it will move the message to an error queue, configured by the MessageForwardingInCaseOfFaultConfig configuration section: <MessageForwardingInCaseOfFaultConfigErrorQueue="error" /> It is in this error queue that messages will wait for administrative intervention. In fact, you can even specify a different server to collect these messages, which allows you to configure one central point in a system where you watch for and deal with all failures: <MessageForwardingInCaseOfFaultConfigErrorQueue="error@SERVER" /> As mentioned previously, five failed attempts form the default metric for a failed message, but this is configurable via the TransportConfig configuration section: <section name="TransportConfig" type="NServiceBus.Config.TransportConfig, NServiceBus.Core" /> ... <TransportConfig MaxRetries="3" /> You could also generate the TransportConfig section using the Add-NServiceBusTransportConfig PowerShell cmdlet. Keep two things in mind: Depending upon how you read it, MaxRetries can be a somewhat confusing name. What it really means is the total number of tries, so a value of 5 will result in the initial attempt plus 4 retries. This has the odd side effect that MaxRetries="0" is the same as MaxRetries="1". In both instances, the message would be attempted once. During development, you may want to limit retries to MaxRetries="1" so that a single error doesn't cause a nausea-inducing wall of red that flushes your console window's buffer, leaving you unable to scroll up to see what came before. You can then enable retries in production by deploying the endpoint with a different configuration. Replaying errors What happens to those messages unlucky enough to fail so many times that they are unceremoniously dumped in an error queue? "I thought you said that Alfred would never give up on us!" you cry. As it turns out, this is just a temporary holding pattern that enables the rest of the system to continue functioning, while the errant messages await some sort of intervention, which can be human or automated based on your own business rules. 
Let's say our message handler divides two numbers from the incoming message, and we forget to account for the possibility that one of those numbers might be zero and that dividing by zero is frowned upon. At this point, we need to fix the error somehow. Exactly what we do will depend upon your business requirements: If the messages were sent in an error, we can fix the code that was sending them. In this case, the messages in the error queue are junk and can be discarded. We can check the inputs on the message handler, detect the divide-by-zero condition, and make compensating actions. This may mean returning from the message handler, effectively discarding any divide-by-zero messages that are processed, or it may mean doing new work or sending new messages. In this case, we may want to replay the error messages after we have deployed the new code. We may want to fix both the sending and receiving side. Second-level retries Automatically retrying error messages and sending repeated errors to an error queue is a pretty good strategy to manage both transient errors, such as deadlocks, and poison messages, such as an unrecoverable exception. However, as it turns out, there is a gray area in between, which is best referred to as semi-transient errors. These include incidents such as a web service being down for a few seconds, or a database being temporarily offline. Even with a SQL Server failover cluster, the failover procedure can take upwards of a minute depending on its size and traffic levels. During a time like this, the automatic retries will be executed immediately and great hordes of messages might go to the error queue, requiring an administrator to take notice and return them to their source queues. But is this really necessary? As it turns out, it is not. NServiceBus contains a feature called Second-Level Retries (SLR) that will add additional sets of retries after a wait. By default, the SLR will add three additional retry sessions, with an additional wait of 10 seconds each time. By contrast, the original set of retries is commonly referred to as First-Level Retries (FLR). Let's track a message's full path to complete failure, assuming default settings: Attempt to process the message five times, then wait for 10 seconds Attempt to process the message five times, then wait for 20 seconds Attempt to process the message five times, then wait for 30 seconds Attempt to process the message five times, and then send the message to the error queue Remember that by using five retries, NServiceBus attempts to process the message five times on every pass. Using second-level retries, almost every message should be able to be processed unless it is definitely a poison message that can never be successfully processed. Be warned, however, that using SLR has its downsides too. The first is ignorance of transient errors. If an error never makes it to an error queue and we never manually check out the error logs, there's a chance we might miss it completely. For this reason, it is smart to always keep an eye on error logs. A random deadlock now and then is not a big deal, but if they happen all the time, it is probably still worth some work to improve the code so that the deadlock is not as frequent. An additional risk lies in the time to process a true poison message through all the retry levels. 
Not accounting for any time taken to process the message itself 20 times or to wait for other messages in the queue, the use of second-level retries with the default settings results in an entire minute of waiting before you see the message in an error queue. If your business stakeholders require the message to either succeed or fail in 30 seconds, then you cannot possibly meet those requirements. Due to the asynchronous nature of messaging, we should be careful never to assume that messages in a distributed system will arrive in any particular order. However, it is still good to note that the concept of retries exacerbates this problem. If Message A and then Message B are sent in order, and Message B succeeds immediately but Message A has to wait in an error queue for awhile, then they will most certainly be processed out of order. Luckily, second-level retries are completely configurable. The configuration element is shown here with the default settings: <section name="SecondLevelRetriesConfig" type="NServiceBus.Config.SecondLevelRetriesConfig,   NServiceBus.Core"/> ... <SecondLevelRetriesConfig Enabled="true"                          TimeIncrease="00:00:10"                          NumberOfRetries="3" /> You could also generate the SecondLevelRetriesConfig section using the Add-NServiceBus SecondLevelRetriesConfig PowerShell cmdlet. Keep in mind that you may want to disable second-level retries, like first-level retries, during development for convenience, and then enable them in production. Messages that expire Messages that lose their business value after a specific amount of time are an important consideration with respect to potential failures. Consider a weather reporting system that reports the current temperature every few minutes. How long is that data meaningful? Nobody seems to care what the temperature was 2 hours ago; they want to know what the temperature is now! NServiceBus provides a method to cause messages to automatically expire after a given amount of time. Unlike storing this information in a database, you don't have to run any batch jobs or take any other administrative action to ensure that old data is discarded. You simply mark the message with an expiration date and when that time arrives, the message simply evaporates into thin air: [TimeToBeReceived("01:00:00")] public class RecordCurrentTemperatureCmd : ICommand { public double Temperature { get; set; } } This example shows that the message must be received within one hour of being sent, or it is simply deleted by the queuing system. NServiceBus isn't actually involved in the deletion at all, it simply tells the queuing system how long to allow the message to live. If a message fails, however, and arrives at an error queue, NServiceBus will not include the expiration date in order to give you a chance to debug the problem. It would be very confusing to try to find an error message that had disappeared into thin air! Another valuable use for this attribute is for high-volume message types, where a communication failure between servers or extended downtime could cause a huge backlog of messages to pile up either at the sending or the receiving side. Running out of disk space to store messages is a show-stopper for most message-queuing systems, and the TimeToBeReceived attribute is the way to guard against it. However, this means we are throwing away data, so we need to be very careful when applying this strategy. It should not simply be used as a reaction to low disk space! 
Auditing messages

At times, it can be difficult to debug a distributed system. Commands and events are sent all around, but after they are processed, they go away. We may be able to tell what will happen to a system in the future by examining queued messages, but how can we analyze what happened in the past?

For this reason, NServiceBus contains an auditing function that will enable an endpoint to send a copy of every message it successfully processes to a secondary location, a queue that is generally hosted on a separate server. This is accomplished by adding an attribute or two to the UnicastBusConfig section of an endpoint's configuration:

<UnicastBusConfig ForwardReceivedMessagesTo="audit@SecondaryServer"
                  TimeToBeReceivedOnForwardedMessages="1.00:00:00">
  <MessageEndpointMappings>
    <!-- Mappings go here -->
  </MessageEndpointMappings>
</UnicastBusConfig>

In this example, the endpoint will forward a copy of all successfully processed messages to a queue named audit on a server named SecondaryServer, and those messages will expire after one day. While it is not required to use the TimeToBeReceivedOnForwardedMessages parameter, it is highly recommended. Otherwise, it is possible (even likely) that messages will build up in your audit queue until you run out of available storage, which you would really like to avoid. The exact time limit you use depends upon the volume of messages in your system and how much storage your queuing system has available.

You don't even have to design your own tool to monitor these audit messages; the Particular Service Platform has that job covered for you. NServiceBus includes the auditing configuration in new endpoints by default so that ServiceControl, ServiceInsight, and ServicePulse can keep tabs on your system.

Web service integration and idempotence

When talking about managing failure, it's important to spend a few minutes discussing web services, because they are such a special case; they are just too good at failing. Compare them with a simpler integration, such as sending an email from a message handler. When the message is processed, the email either gets sent or it doesn't; there really aren't any in-between cases. In reality, when sending an email, it is technically possible that we could call the SMTP server, successfully send an email, and then the server could fail before we are able to finish marking the message as processed. However, in practice, this chance is so infinitesimal that we generally assume it to be zero. Even if it is not zero, we can assume in most cases that sending a user a duplicate email one time in a few million won't be the end of the world.

Web services are another story. There are just so many ways a web service can fail:

A DNS or network failure may not let us contact the remote web server at all.
The server may receive our request, but then throw an error before any state is modified on the server.
The server may receive our request and successfully process it, but a communication problem prevents us from receiving the 200 OK response.
The connection times out, so we never see any response the server may have been about to send us.

For this reason, it makes our lives a lot easier if all the web services we ever have to deal with are idempotent, meaning they can be invoked multiple times with no adverse effects. Any service that queries data without modifying it is inherently idempotent. We don't have to worry about how many times we call a service if doing so doesn't change any data. Where we start to get into trouble is when we begin mutating state. Sometimes, we can modify state safely.
Consider the example used previously regarding registering for alert notifications. Let's assume that on the first try, the third-party service technically succeeds in registering our user for alerts, but it takes too long to do so and we receive a timeout error. When we retry, we ask to subscribe the email address to alerts again, and the web service call succeeds. What's the net effect? Either way, the user is subscribed for alerts. This web service satisfies idempotence.

The classic example of a non-idempotent web service is a credit card transaction processor. If the first attempt to authorize a credit card succeeds on the server and we retry, we may double-charge our customer! This is not an acceptable business case, and you will quickly find many people angry with you.

In these cases, we need to do a little work ourselves because, unfortunately, it's impossible for NServiceBus to know whether your web service is idempotent or not. Generally, this work takes the form of recording each step we perform to durable storage in real time, and then querying that storage to see which steps have been attempted. In our example of credit card processing, the happy path approach would look like this:

1. Record our intent to make a web service call to durable storage.
2. Make the actual web service call.
3. Record the results of the web service call to durable storage.
4. Send commands or publish events with the results of the web service call.

Now, if the message is retried, we can inspect the durable storage and decide which step to jump to and whether any compensating actions need to be taken first. If we have recorded our intent to call the web service but do not see any evidence of a response, we can query the credit card processor based on an order or transaction identifier. Then we will know whether we need to retry the authorization or just fetch the results of the already completed authorization. If we see that we have already made the web service call and received the results, then we know that the web service call was successful but some exception happened before the resulting messages could be sent. In response, we can just take the results and send the messages without requiring any further web service invocations.

It's important to be able to handle the case where our durable storage throws an exception, rendering us unable to persist our state. This is why it's so important to record the intent to do something before attempting it, so that we know the difference between never having attempted something and having attempted it without necessarily knowing the result.

The process we have just discussed is admittedly a bit abstract, and can be visualized much more easily with the help of the following diagram:

The choice of durable storage for this process is up to you. If you choose to use a database, however, you must remember to exempt it from the message handler's ambient transaction, or those changes will also get rolled back if and when the handler fails.
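To make the bookkeeping a little more concrete, here is a minimal, hypothetical sketch of a handler wrapping a non-idempotent credit card authorization. The message type, the ICallLog store, the payment gateway client, and the published event are all invented for illustration; only the NServiceBus interfaces (ICommand, IEvent, IHandleMessages, IBus) are real:

using System;
using NServiceBus;

// Invented message and collaborator types, purely for illustration.
public class AuthorizeCreditCardCmd : ICommand
{
    public Guid OrderId { get; set; }
    public decimal Amount { get; set; }
}

public interface ICallLog          // durable storage for intent and results
{
    bool HasIntent(Guid orderId);
    string GetResult(Guid orderId);            // null if no result recorded yet
    void RecordIntent(Guid orderId);
    void RecordResult(Guid orderId, string authorizationCode);
}

public interface IPaymentGateway   // the non-idempotent web service
{
    string Authorize(Guid orderId, decimal amount);
    string LookUpAuthorization(Guid orderId);  // query by our own identifier
}

public class AuthorizeCreditCardHandler : IHandleMessages<AuthorizeCreditCardCmd>
{
    private readonly ICallLog log;
    private readonly IPaymentGateway gateway;
    private readonly IBus bus;

    public AuthorizeCreditCardHandler(ICallLog log, IPaymentGateway gateway, IBus bus)
    {
        this.log = log;
        this.gateway = gateway;
        this.bus = bus;
    }

    public void Handle(AuthorizeCreditCardCmd message)
    {
        // Writes to the call log must escape the ambient transaction
        // (see the TransactionScope example that follows).
        var authCode = log.GetResult(message.OrderId);

        if (authCode == null && log.HasIntent(message.OrderId))
        {
            // We tried before but never saw a response; ask the processor
            // what happened instead of blindly authorizing again.
            authCode = gateway.LookUpAuthorization(message.OrderId);
        }

        if (authCode == null)
        {
            log.RecordIntent(message.OrderId);
            authCode = gateway.Authorize(message.OrderId, message.Amount);
        }

        log.RecordResult(message.OrderId, authCode);

        // Finally, publish the outcome as usual.
        bus.Publish(new CreditCardAuthorizedEvent
        {
            OrderId = message.OrderId,
            AuthorizationCode = authCode
        });
    }
}

public class CreditCardAuthorizedEvent : IEvent
{
    public Guid OrderId { get; set; }
    public string AuthorizationCode { get; set; }
}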
In order to escape the transaction to write to durable storage, use a new TransactionScope object to suppress the transaction, like this:

public void Handle(CallNonIdempotentWebServiceCmd cmd)
{
  // Under control of the ambient transaction

  using (var ts = new TransactionScope(TransactionScopeOption.Suppress))
  {
    // Not under transaction control
    // Write updates to durable storage here
    ts.Complete();
  }

  // Back under control of the ambient transaction
}

Summary

In this article, we considered the inevitable failure of our software and how NServiceBus can help us to be prepared for it. You learned how NServiceBus promises fault tolerance within every message handler, so that messages are never dropped or forgotten but instead retried and then held in an error queue if they cannot be successfully processed. Once we fix the error, or take some other administrative action, we can replay those messages.

In order to avoid flooding our system with useless messages during a failure, you learned how to make messages that lose their business value after a specific amount of time expire. Finally, you learned how to build auditing into a system by forwarding a copy of all messages for later inspection, and how to properly deal with the challenges involved in calling external web services.

In this article, we dealt exclusively with NServiceBus endpoints hosted by the NServiceBus Host process.