
How-To Tutorials

7019 Articles

Module, Facts, Types and Reporting tools in Puppet

Packt
11 Jul 2014
11 min read
Module Files and Templates

Transferring files with Puppet is something best done within modules. When you define a file resource, you can use content => "something", or you can push a file from the Puppet master using source. As an example, using our judy database, we could have judy::config with the following file definition:

    class judy::config {
      file {'/etc/judy/judy.conf':
        source => 'puppet:///modules/judy/judy.conf'
      }
    }

Puppet will search for this file in the directory judy/files. It is also possible to use full paths and have your module mimic the filesystem; the previous source line would be changed to source => 'puppet:///modules/judy/etc/judy/judy.conf' and the file would be found in judy/files/etc/judy/judy.conf.

The puppet:/// URL given previously has three slashes; optionally, the name of a Puppet server may appear between the second and third slash. If left blank, the Puppet server performing catalog compilation is used to retrieve the file. You can alternatively specify the server using source => 'puppet://puppetfile.example.com/modules/judy/judy.conf'. Having files come from specific Puppet servers can make maintenance difficult: if you change the name of your Puppet server, you have to change all references to that name as well.

Templates are searched for in a similar fashion, in this example in judy/templates. When specifying the template, you use content => template('judy/template.erb') to have Puppet look for the template in your module's templates directory. As an example, another config file for judy could be:

    file {'/etc/judy/judyadm.conf':
      content => template('judy/judyadm.conf.erb')
    }

Puppet will look for the file 'judy/judyadm.conf.erb' in modulepath/judy/templates/judyadm.conf.erb. We haven't covered Ruby templates up to this point; templates are files that are parsed according to ERB syntax rules. If you need to distribute a file in which some settings change based on variables, then a template is the thing to use. ERB syntax is covered in detail at http://docs.puppetlabs.com/guides/templating.html.

In the next section, we will discuss module implementations in a large organization before writing custom modules.

Creating a Custom Fact for use in Hiera

The most useful custom facts are those that return a calculated value that you can use to organize your nodes. Such facts allow you to group your nodes into smaller groups, or into groups with shared functionality or locality. These facts allow you to separate the data component of your modules from the logic or code components, and such a fact can be used in your hiera.yaml file to add a level to the hierarchy.

One aspect of the system that can be used to determine information about the node is the ipaddress fact. Assuming you do not reuse IP addresses within your organization, the IP address can be used to determine on which part of the network a node resides: the zone. In this example, we will define three zones in which machines reside: production, development, and sandbox. The IP addresses in each zone are on different subnets. We'll start by building up a script to calculate the zone and then turn it into a fact like our last example. Our script will need to calculate IP ranges using netmasks, so we'll import the ipaddr library and use IPAddr objects to calculate ranges.
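Before writing the fact, it can help to confirm how IPAddr's containment test behaves. The following is a minimal shell one-liner sketch, assuming Ruby is available on the node; the subnet and address are simply the example values used in the script below:

    # Does 192.168.122.132 fall inside the production subnet 192.168.122.0/24?
    ruby -ripaddr -e 'puts IPAddr.new("192.168.122.0/24").include?(IPAddr.new("192.168.122.132"))'
    # prints "true"

With that behaviour confirmed, the script starts by pulling in the libraries it needs: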
    require('ipaddr')
    require('facter')
    require('puppet')

Next, we'll define a function that takes an IP address as the argument and returns the zone to which that IP address belongs:

    def zone(ip)
      zones = {
        'production'  => [IPAddr.new('192.168.122.0/24'),IPAddr.new('192.168.124.0/23')],
        'development' => [IPAddr.new('192.168.123.0/24'),IPAddr.new('192.168.126.0/23')],
        'sandbox'     => [IPAddr.new('192.168.128.0/22')]
      }
      for zone in zones.keys do
        for subnet in zones[zone] do
          if subnet.include?(ip)
            return zone
          end
        end
      end
      return 'undef'
    end

This function loops through the zones hash looking for a match on the IP address. If no match is found, the value 'undef' is returned. We then obtain the IP address for the machine using the ipaddress fact from Facter:

    ip = IPAddr.new(Facter.value('ipaddress'))

Then we call the zone function with this IP address to obtain the zone:

    print zone(ip),"\n"

Now we can make this script executable and test it:

    node1# facter ipaddress
    192.168.122.132
    node1# ./example_zone.rb
    production

Now all we have to do is replace print zone(ip),"\n" with the following to define the fact:

    Facter.add('example_zone') do
      setcode do
        zone(ip)
      end
    end

Now when we insert this code into our example_facts module and run Puppet on our nodes, the custom fact is available:

    # facter -p example_zone
    production

Now that we can define a zone based on a custom fact, we can go back to our hiera.yaml file and add %{::example_zone} to the hierarchy. The hiera.yaml hierarchy will now contain the following:

    ---
    :hierarchy:
      - "zones/%{::example_zone}"
      - "hosts/%{::hostname}"
      - "roles/%{::role}"
      - "%{::kernel}/%{::osfamily}/%{::lsbmajdistrelease}"
      - "is_virtual/%{::is_virtual}"
      - common

After restarting httpd to have the hiera.yaml file reread, we create a zones directory in hieradata and add production.yaml with the following contents:

    ---
    welcome: "example_zone - production"

Now when we run Puppet on node1, we see the motd updated with the new welcome message:

    node1# cat /etc/motd
    PRODUCTION
    example_zone - production
    Managed Node: node1
    Managed by Puppet version 3.4.2

Creating a few key facts that can be used to build up your hierarchy can greatly reduce the complexity of your modules. There are several workflows available: in addition to the custom fact we just described, you can use the /etc/facter/facts.d directory with static files or scripts, or you can have tasks run from other tools dump files into that directory to create custom facts.

When writing Ruby scripts, you can use any other fact by calling Facter.value('factname'). If you write your script in Ruby, you can access any Ruby library using require. Your custom fact could query the system using lspci or lsusb to determine what hardware is installed on that node. As an example, you could use lspci to determine the make and model of the graphics card on the machine and return that as a fact, such as videocard. In the next section, we'll write our own custom modules that will take such a fact and install the appropriate driver for the video card based on the custom fact.

Parameterized Classes

Parameterized classes are classes where you have defined several parameters that can be overridden when you instantiate the class for your node. The use case for parameterized classes is something that won't be repeated within a single node; you cannot define the same parameterized class more than once per node. As a simple example, we'll create a class that installs a database program and starts that database's service.
We'll call this class example::db; the definition will live in modules/example/manifests/db.pp:

    class example::db ($db) {
      case $db {
        'mysql': {
          $dbpackage = 'mysql-server'
          $dbservice = 'mysqld'
        }
        'postgresql': {
          $dbpackage = 'postgresql-server'
          $dbservice = 'postgresql'
        }
      }
      package { "$dbpackage": }
      service { "$dbservice":
        ensure  => true,
        enable  => true,
        require => Package["$dbpackage"]
      }
    }

This class takes a single parameter ($db) that specifies the type of the database, in this case either postgresql or mysql. To use this class, we have to instantiate it:

    class { 'example::db':
      db => 'mysql'
    }

Now when we apply this to a node, we see that mysql-server is installed and mysqld is started and enabled at boot. This works great for something like a database, since we don't expect to have more than one type of database server on a single node. If we try to instantiate the example::db class again with postgresql on our node, we'll get an error.

Types and Providers

Puppet separates the implementation of a type into the type definition and any one of many providers for that type. For instance, the package type in Puppet has multiple providers depending on the platform in use (apt, yum, rpm, and others). Early on in Puppet development there were only a few core types defined; since then, the core types have expanded to the point where most things that should be a type are already defined by core Puppet. The lvm module created a type for defining logical volumes, the concat module created types for defining file fragments, and the firewall module created a type for defining firewall rules. Each of these types represents something on the system with the following properties:

- unique
- searchable
- creatable
- destroyable
- atomic

When creating a new type, you have to make sure your new type has these properties. The resource defined by the type has to be unique; this is why the file type uses the path to a file as the naming variable (namevar). A system may have files with the same name (not unique), but it cannot have more than one file with an identical path. As an example, the LDAP configuration file for OpenLDAP is /etc/openldap/ldap.conf, while the LDAP configuration file for the name services library is /etc/ldap.conf; if you used the filename alone, they would both be the same resource. Resources must be unique.

By atomic, I mean that it is indivisible; it cannot be made of smaller components. For instance, the firewall module creates a type for single iptables rules. Creating a type for the tables (INPUT, OUTPUT, FORWARD) within iptables wouldn't be atomic, as each table is made up of multiple smaller parts, the rules.

Your type has to be searchable so that Puppet can determine the state of the thing you are modifying; a mechanism has to exist to know the current state of the thing in question. The last two properties are equally important: Puppet must be able to remove the thing (destroy it) and, likewise, Puppet must be able to create the thing anew.

Given these criteria, there are several modules that define new types. Some examples include types that manage:

- git repositories
- Apache virtual hosts
- LDAP entries
- network routes
- gem modules
- Perl CPAN modules
- databases
- Drupal multisites

Foreman

Foreman is more than just a Puppet reporting tool; it bills itself as a complete lifecycle management platform. Foreman can act as the ENC (external node classifier) for your entire installation and configure DHCP, DNS, and PXE booting. It's a one-stop shop.
We'll configure Foreman to be our report backend in this example.

mcollective

Mcollective is an orchestration tool created by Puppet Labs that is not specific to Puppet; plugins exist to work with other configuration management systems. Mcollective uses a message queue (MQ) with active connections from all active nodes to enable parallel job execution on large numbers of nodes. To understand how mcollective works, we'll work through its various components.

The configuration of mcollective is still somewhat involved and prone to errors. Still, once mcollective is working properly, the power it provides can become addictive; it will be worth the effort. The default MQ install for marionette uses ActiveMQ, and the ActiveMQ provided by the puppetlabs repo is known to work. Mcollective uses a message queue and can use your existing message queue infrastructure. If using ActiveMQ, a single server can handle around 800 nodes; after that, you'll need to spread out.

We'll cover the standard mcollective install using Puppet's certificate authority to provide SSL security to mcollective. The theory here is that since we already trust Puppet to configure the machines, we can trust it a little more to run arbitrary commands. We'll also require that users of mcollective have proper SSL authentication as well.

Summary

In this article, we looked at module files and templates, created a custom fact for use in Hiera, discussed types and providers, and introduced reporting and orchestration tools such as Foreman and mcollective.


A Virtual Machine for a Virtual World

Packt
11 Jul 2014
15 min read
Creating a VM from a template

Let us start by creating our second virtual machine from the Ubuntu template. Right-click on the template and select Clone, then give the new virtual machine a name. A VM name can only be alphanumeric, without any special characters. You can also clone any other VM you have already created in your own virtual environment.

Access the virtual machine through the Proxmox console after cloning and set up network connectivity, such as the IP address, hostname, and so on. For our Ubuntu virtual machine, we are going to edit interfaces in /etc/network/, hostname in /etc/, and hosts in /etc/.

Advanced configuration options for a VM

We will now look at some of the advanced configuration options we can use to extend the capability of a KVM virtual machine.

The hotplugging option for a VM

Although it is not a very common occurrence, a virtual machine can run out of storage unexpectedly, whether due to overprovisioning or improper storage requirement planning. For a physical server with hot swap bays, we can simply add a new hard drive and then partition it, and we are up and running. Imagine another situation where you have to add a virtual network interface to a VM right away, but you cannot afford to shut the VM down to add the vNIC. The hotplug option allows hotplugging virtual disks and virtual network interfaces without shutting down a VM.

Proxmox virtual machines do not support hotplugging by default. Some extra steps are needed in order to enable hotplugging for devices such as virtual disks and virtual network interfaces. Without the hotplugging option, the virtual machine needs to be completely powered off and then powered on after adding a new virtual disk or virtual interface; simply rebooting the virtual machine will not activate the newly added virtual device. In Proxmox 3.2 and later, the hotplug option is not shown on the Proxmox GUI. It has to be enabled through the CLI by adding options to the <vmid>.conf file.

Enabling the hotplug option for a virtual machine is a three-step process:

1. Shut down the VM and add the hotplug option to the <vmid>.conf file.
2. Power up the VM and then load the modules that will initiate the actual hotplugging.
3. Add a virtual disk or virtual interface to be hotplugged into the virtual machine.

The hotplugging option for <vmid>.conf

Shut down the cloned virtual machine we created earlier and then open its configuration file. Securely log in to the Proxmox node or use the console in the Proxmox GUI, then run the following command:

    # nano /etc/pve/nodes/<node_name>/qemu-server/102.conf

With the default options added during the virtual machine creation process, the VM configuration file looks like the following:

    balloon: 512
    bootdisk: virtio0
    cores: 1
    ide2: none, media=cdrom
    kvm: 0
    memory: 1024
    name: pmxUB01
    net0: e1000=56:63:C0:AC:5F:9D,bridge=vmbr0
    ostype: l26
    sockets: 1
    virtio0: vm-nfs-01:102/vm-102-disk-1.qcow2,format=qcow2,size=32G

Now, at the bottom of the 102.conf configuration file located under /etc/pve/nodes/<node_name>/qemu-server/, we will add the following option to enable hotplugging in the virtual machine:

    hotplug:

Save the configuration file and power up the virtual machine.
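It is worth confirming that the option was actually picked up before moving on. The following is a quick check from the Proxmox node's shell, assuming VM ID 102 and the node path used above; qm config simply prints the stored configuration for a VM:

    # Show the stored configuration for VM 102 and look for the hotplug line
    qm config 102 | grep -i hotplug

    # Or read the configuration file directly (substitute your node name)
    grep -i hotplug /etc/pve/nodes/<node_name>/qemu-server/102.conf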
Loading modules

After the hotplug option is added and the virtual machine is powered up, it is time to load two modules into the virtual machine, which will allow hotplugging a virtual disk at any time without rebooting the VM. Securely log in to the VM or use the Proxmox GUI console to get to the command prompt of the VM. Then, run the following commands to load the acpiphp and pci_hotplug modules. Do not load these modules on the Proxmox node itself:

    # sudo modprobe acpiphp
    # sudo modprobe pci_hotplug

The acpiphp and pci_hotplug modules are two hotplug drivers for the Linux operating system. These drivers allow the addition of a virtual disk image or virtual network interface card without shutting down a Linux-based virtual machine. The modules can also be loaded automatically during the virtual machine boot by inserting them in /etc/modules: simply add acpiphp and pci_hotplug on two separate lines in /etc/modules.

Adding a virtual disk/vNIC

After loading both the acpiphp and pci_hotplug modules, all that remains is adding a new virtual disk or virtual network interface to the virtual machine through the web GUI. After adding a new disk image, check that the virtual machine operating system recognizes the new disk with the following command:

    # sudo fdisk -l

For a virtual network interface, simply add a new virtual interface from the web GUI and the operating system will automatically recognize the new vNIC. After adding the interface, check that the vNIC is recognized with the following command:

    # sudo ifconfig -a

Please note that while the hotplugging option works great with Linux-based virtual machines, it is somewhat problematic on Windows XP/7-based VMs. Hotplug seems to work well with both 32- and 64-bit versions of Windows Server 2003/2008/2012 VMs. The best practice for a Windows XP/7-based virtual machine is to simply power cycle the virtual machine to activate newly added virtual disk images. Forcing a Windows VM to go through hotplugging can cause an unstable operating environment. This is a limitation of KVM itself.

Nested virtual environment

In simple terms, a virtual environment inside another virtual environment is known as a nested virtual environment. If the hardware resources permit, a nested virtual environment can open up whole new possibilities for a company. The most common scenario for a nested virtual environment is to set up a fully isolated test environment to test software such as a hypervisor, or operating system updates and patches, before applying them in a live environment. A nested environment can also be used as a training platform to teach computer and network virtualization, where students can set up their own virtual environment from the ground up without breaking the main system. This eliminates the high cost of hardware for each student or for the test environment. When an isolated test platform is needed, it is just a matter of cloning some real virtual machines and giving access to authorized users.

A nested virtual environment has the potential to give the network administrator an edge in the real world by allowing cost cutting and getting things done with limited resources. One very important thing to keep in mind is that a nested virtual environment will have significantly lower performance than a real virtual environment. If the nested virtual environment also has virtualized storage, performance will degrade further. The loss of performance can be offset by creating a nested environment with an SSD storage backend.
When a nested virtual environment is created, it usually also contains virtualized storage to provide virtual storage for the nested virtual machines. This allows for a fully isolated nested environment with its own subnet and virtual firewall. There are many debates about the viability of a nested virtual environment, and both pros and cons can be argued equally, but it comes down to the administrator's grasp of his or her existing virtual environment and a good understanding of the nature of the requirement.

This approach allowed us to build a fully functional Proxmox cluster from the ground up without using additional hardware. Seen side by side, on one hand we have the basic cluster we have been building so far, and on the other we have the actual physical nodes and virtual machines used to create the nested virtual environment. Our nested cluster is completely isolated from the rest of the physical cluster with a separate subnet. Internet connectivity is provided to the nested environment by using a virtualized firewall, 1001-scce-fw-01.

Like the hotplugging option, nesting is not enabled in a Proxmox cluster by default. Enabling nesting allows nested virtual machines to have KVM hardware virtualization, which increases their performance. To enable KVM hardware virtualization, we have to edit the modules file in /etc/ on the physical Proxmox node and the <vmid>.conf file of the virtual machine. By default, the option is disabled for our cloned nested virtual machine.

Enabling KVM hardware virtualization

KVM hardware virtualization can be added by performing the following additional steps:

1. On each Proxmox node, add the following line to the /etc/modules file:

       kvm-amd nested=1

2. Migrate or shut down all virtual machines on the Proxmox nodes and then reboot.
3. After the Proxmox nodes reboot, add the following argument to the <vmid>.conf file of the virtual machines used to create the nested virtual environment:

       args: -enable-nesting

4. Enable KVM hardware virtualization from the virtual machine's option menu in the GUI.
5. Restart the nested virtual machine.

Network virtualization

Network virtualization is a software approach to setting up and maintaining a network without physical hardware. Proxmox has great features to virtualize the network for both real and nested virtual environments. By using virtualized networking, management becomes simpler and centralized. Since there is no physical hardware to deal with, network capacity can be extended at a minute's notice. The use of virtualized networks is especially prominent in nested virtual environments, and in order to set up a successful nested virtual environment, a good grasp of the Proxmox network feature is required. With the introduction of Open vSwitch (www.openvswitch.org) in Proxmox 3.2 and later, network virtualization is now much more efficient.

Backing up a virtual machine

A good backup strategy is the last line of defense against disasters such as hardware failure, environmental damage, accidental deletions, and misconfigurations. In a virtual environment, a backup strategy can turn into a daunting task because of the number of machines that need to be backed up. In a busy production environment, virtual machines may be created and discarded as needed; without a proper backup plan, the entire backup task can go out of control.
Gone are the days when we only had a few pieces of server hardware to deal with and backing them up was an easy task. Today's backup solutions have to deal with several dozen or possibly several hundred virtual machines. Depending on the requirement, an administrator may have to back up all the virtual machines regularly instead of just the files inside them. Backing up an entire virtual machine takes up a very large amount of space after a while, depending on how many previous backups are kept. A granular file backup helps to quickly restore just the file needed, but it is a poor choice if the virtual server is damaged to the point that it becomes inaccessible. Here, we will see the different backup options available in Proxmox, along with their advantages and disadvantages.

Proxmox backup and snapshot options

Proxmox has the following two backup options:

- Full backup: This backs up the entire virtual machine.
- Snapshot: This only creates a snapshot image of the virtual machine.

Proxmox 3.2 and above can only do a full backup and cannot do any granular file backup from inside a virtual machine. Proxmox also does not use any backup agent.

Backing up a VM with a full backup

All full backups are in the .tar format, containing both the configuration file and the virtual disk image file. The TAR file is all you need to restore the virtual machine on any node and on any storage. Full backups can also be scheduled on a daily and weekly basis. Full virtual backup files are named based on the following format:

    vzdump-qemu-<vm_id>-YYYY_MM_DD-HH_MM_SS.vma.lzo

Proxmox 3.2 and above cannot do full backups on LVM and Ceph RBD storage. Full backups can only occur on local, Ceph FS, and NFS-based storages that are defined as backup storage during storage creation. Please note that Ceph FS and RBD are not the same type of storage, even though they both coexist on the same Ceph cluster. Backup-enabled attached storages can be seen in the storage view of the Proxmox GUI.

The backup menu in Proxmox is a true example of simplicity. With only three choices to make, backup storage, backup mode, and compression type, it is as easy as it can get.

Creating a schedule for backup

Schedules can be created from the virtual machine backup option. We will look at each option in detail in the following sections.

Node

By default, a backup job applies to all nodes. If you want to apply the backup job to a particular node, then select it here. With a node selected, the backup job will be restricted to that node only. If a virtual machine on node 1 was selected for backup and the virtual machine was later moved to node 2, it will not be backed up, since only node 1 was selected for this backup task.

Storage

Select a backup storage destination where all full backups will be stored. Typically, an NFS server is used for backup storage. NFS servers are easy to set up and do not require a lot of upfront investment due to their low performance requirements; backup servers are much leaner than compute nodes since they do not have to run any virtual machines. Backups are supported on local, NFS, and Ceph FS storage systems. Ceph FS storages are mounted locally on Proxmox nodes and selected as a local directory. Both Ceph FS and RBD coexist on the same Ceph cluster.
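As an aside, on a directory- or NFS-backed backup storage, the finished backups land in the storage's dump directory and follow the naming format shown earlier. The listing below is purely illustrative; the storage name nfs-backup, its mount path, and the VM IDs are assumptions for the sake of the example:

    # ls /mnt/pve/nfs-backup/dump/
    vzdump-qemu-101-2014_07_05-01_00_02.vma.lzo
    vzdump-qemu-102-2014_07_05-01_05_44.vma.lzo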
Day of Week

Select which day or days the backup task applies to. Days are selectable from a drop-down menu. If the backup task should run daily, then select all the days in the list.

Start Time

Unlike Day of Week, only one time slot can be selected. Selecting multiple backup times for different parts of the day is not possible. If the backup must run multiple times a day, create a separate task for each time slot.

Selection mode

The All selection mode will select all the virtual machines within the whole Proxmox cluster. The Exclude selected VMs mode will back up all VMs except the ones selected, while Include selected VMs will back up only the ones selected.

Send email to

Enter a valid e-mail address here so that the Proxmox backup task can send an e-mail upon completion or if there was any issue during backup. The e-mail includes the entire log of the backup task. It is highly recommended to enter an e-mail address here so that an administrator or backup operator can receive backup task feedback. This lets us find out whether there was an issue during backup, and how much time the backup actually takes, to see if any performance issue occurred during the backup window.

Compression

By default, the LZO compression method is selected. LZO (http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer) is based on a lossless data compression algorithm designed with decompression speed in mind; it is capable of fast compression and even faster decompression. GZIP will create smaller backup files at the cost of higher CPU usage to achieve a higher compression ratio; since a higher compression ratio is the main focal point, it is a slower backup process. Do not select the None compression option, since it will create large backups without compression. With the None method, a 200 GB RAW disk image with 50 GB used will produce a 200 GB backup image; with compression turned on, the backup image size will be around 70-80 GB.

Mode

Typically, all running virtual machine backups occur with the Snapshot option. Do not confuse this Snapshot option with Live Snapshots of a VM: the Snapshot mode allows a live backup while the virtual machine is turned on, while a Live Snapshot captures the state of the virtual machine at a certain point in time. With the Suspend or Stop mode, the backup task will try to suspend the running virtual machine or forcefully stop it prior to commencing the full backup. After the backup is done, Proxmox will resume or power up the VM. Since Suspend only freezes the VM during backup, it has less downtime than the Stop mode, because the VM does not need to go through an entire reboot cycle. Both the Suspend and Stop mode backups can be used for VMs that can tolerate partial or full downtime without disrupting regular infrastructure operation, while the Snapshot mode is used for VMs whose downtime would have a significant impact.
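The same choices made in the GUI (storage, mode, compression, and notification e-mail) can also be exercised on demand from the command line with the vzdump utility. The following is a minimal sketch; the storage name nfs-backup and the e-mail address are illustrative, and the exact option set may vary between Proxmox versions:

    # Live backup of VM 102 in snapshot mode with LZO compression,
    # stored on a backup-enabled storage, e-mailing the task log when done
    vzdump 102 --mode snapshot --compress lzo --storage nfs-backup --mailto admin@example.com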


Introduction to Mobile Forensics

Packt
10 Jul 2014
18 min read
In 2013, there were almost as many mobile cellular subscriptions as there were people on earth, according to the International Telecommunication Union (ITU). Mobile cellular subscriptions are growing at lightning speed and passed a whopping 7 billion early in 2014. Portio Research Ltd. predicts that mobile subscribers will reach 7.5 billion by the end of 2014 and 8.5 billion by the end of 2016.

Smartphones of today, such as the Apple iPhone, the Samsung Galaxy series, and BlackBerry phones, are compact forms of computers with high performance, huge storage, and enhanced functionality. Mobile phones are the most personal electronic devices a user accesses. They are used to perform simple communication tasks, such as calling and texting, while still providing support for Internet browsing, e-mail, taking photos and videos, creating and storing documents, identifying locations with GPS services, and managing business tasks. As new features and applications are incorporated into mobile phones, the amount of information stored on the devices is continuously growing. Mobile phones become portable data carriers, and they keep track of all your moves. With the increasing prevalence of mobile phones in people's daily lives and in crime, data acquired from phones has become an invaluable source of evidence for investigations relating to criminal, civil, and even high-profile cases. It is rare to conduct a digital forensic investigation that does not include a phone. Mobile device call logs and GPS data were used to help solve the attempted bombing in Times Square, New York, in 2010. The details of the case can be found at http://www.forensicon.com/forensics-blotter/cell-phone-email-forensics-investigation-cracks-nyc-times-square-car-bombing-case/.

The science behind recovering digital evidence from mobile phones is called mobile forensics. Digital evidence is defined as information and data that is stored on, received by, or transmitted by an electronic device and used in investigations. Digital evidence encompasses any and all digital data that can be used as evidence in a case.

Mobile forensics

Digital forensics is a branch of forensic science focusing on the recovery and investigation of raw data residing in electronic or digital devices. Mobile forensics is a branch of digital forensics related to the recovery of digital evidence from mobile devices. Forensically sound is a term used extensively in the digital forensics community to qualify and justify the use of a particular forensic technology or methodology. The main principle of a sound forensic examination of digital evidence is that the original evidence must not be modified. This is extremely difficult with mobile devices. Some forensic tools require a communication vector with the mobile device, so standard write protection will not work during forensic acquisition. Other forensic acquisition methods may involve removing a chip or installing a bootloader on the mobile device prior to extracting data for forensic examination. In cases where examination or data acquisition is not possible without changing the configuration of the device, the procedure and the changes must be tested, validated, and documented. Following proper methodology and guidelines is crucial in examining mobile devices, as it yields the most valuable data.
As with any evidence gathering, not following the proper procedure during the examination can result in the loss or damage of evidence or render it inadmissible in court. The mobile forensics process is broken into three main categories: seizure, acquisition, and examination/analysis.

Forensic examiners face some challenges while seizing a mobile device as a source of evidence. At the crime scene, if the mobile device is found switched off, the examiner should place the device in a faraday bag to prevent changes should the device automatically power on. Faraday bags are specifically designed to isolate the phone from the network. If the phone is found switched on, switching it off has a lot of concerns attached to it. If the phone is locked by a PIN or password, or is encrypted, the examiner will be required to bypass the lock or determine the PIN to access the device. Mobile phones are networked devices and can send and receive data through different sources, such as telecommunication systems, Wi-Fi access points, and Bluetooth. So, if the phone is in a running state, a criminal can securely erase the data stored on the phone by executing a remote wipe command. When a phone is switched on, it should be placed in a faraday bag. If possible, prior to placing the mobile device in the faraday bag, disconnect it from the network to protect the evidence by enabling flight mode and disabling all network connections (Wi-Fi, GPS, hotspots, and so on). This will also preserve the battery, which would otherwise drain while in the faraday bag, and protect against leaks in the faraday bag.

Once the mobile device is seized properly, the examiner may need several forensic tools to acquire and analyze the data stored on the phone. Mobile device forensic acquisition can be performed using multiple methods, which are defined later. Each of these methods affects the amount of analysis required. Should one method fail, another must be attempted. Multiple attempts and tools may be necessary in order to acquire the most data from the mobile device.

Mobile phones are dynamic systems that present many challenges to the examiner in extracting and analyzing digital evidence. The rapid increase in the number of different kinds of mobile phones from different manufacturers makes it difficult to develop a single process or tool to examine all types of devices. Mobile phones are continuously evolving as existing technologies progress and new technologies are introduced. Furthermore, each mobile is designed with a variety of embedded operating systems. Hence, special knowledge and skills are required from forensic experts to acquire and analyze the devices.

Mobile forensic challenges

One of the biggest forensic challenges when it comes to the mobile platform is the fact that data can be accessed, stored, and synchronized across multiple devices. As the data is volatile and can be quickly transformed or deleted remotely, more effort is required for the preservation of this data. Mobile forensics is different from computer forensics and presents unique challenges to forensic examiners. Law enforcement and forensic examiners often struggle to obtain digital evidence from mobile devices. The following are some of the reasons:

- Hardware differences: The market is flooded with different models of mobile phones from different manufacturers. Forensic examiners may come across many different mobile models, which differ in size, hardware, features, and operating system.
  Also, with a short product development cycle, new models emerge very frequently. As the mobile landscape changes with each passing day, it is critical for the examiner to adapt to these challenges and remain up to date on mobile device forensic techniques.

- Mobile operating systems: Unlike personal computers, where Windows has dominated the market for years, mobile devices widely use many more operating systems, including Apple's iOS, Google's Android, RIM's BlackBerry OS, Microsoft's Windows Mobile, HP's webOS, Nokia's Symbian OS, and many others.

- Mobile platform security features: Modern mobile platforms contain built-in security features to protect user data and privacy. These features act as a hurdle during forensic acquisition and examination. For example, modern mobile devices come with default encryption mechanisms from the hardware layer to the software layer. The examiner might need to break through these encryption mechanisms to extract data from the devices.

- Lack of resources: As mentioned earlier, with the growing number of mobile phones, the tools required by a forensic examiner also grow. Forensic acquisition accessories, such as USB cables, batteries, and chargers for different mobile phones, have to be maintained in order to acquire those devices.

- Generic state of the device: Even if a device appears to be in an off state, background processes may still run. For example, in most mobiles, the alarm clock still works even when the phone is switched off. A sudden transition from one state to another may result in the loss or modification of data.

- Anti-forensic techniques: Anti-forensic techniques, such as data hiding, data obfuscation, data forgery, and secure wiping, make investigations of digital media more difficult.

- Dynamic nature of evidence: Digital evidence may be easily altered either intentionally or unintentionally. For example, browsing an application on the phone might alter the data stored by that application on the device.

- Accidental reset: Mobile phones provide features to reset everything. Resetting the device accidentally while examining it may result in the loss of data.

- Device alteration: The possible ways to alter devices range from moving application data and renaming files to modifying the manufacturer's operating system. In this case, the expertise of the suspect should be taken into account.

- Passcode recovery: If the device is protected with a passcode, the forensic examiner needs to gain access to the device without damaging the data on it.

- Communication shielding: Mobile devices communicate over cellular networks, Wi-Fi networks, Bluetooth, and infrared. As device communication might alter the device data, the possibility of further communication should be eliminated after seizing the device.

- Lack of availability of tools: There is a wide range of mobile devices. A single tool may not support all the devices or perform all the necessary functions, so a combination of tools needs to be used. Choosing the right tool for a particular phone can be difficult.

- Malicious programs: The device might contain malicious software or malware, such as a virus or a Trojan. Such malicious programs may attempt to spread to other devices over either a wired or a wireless interface.

- Legal issues: Mobile devices might be involved in crimes that cross geographical boundaries. In order to tackle these multijurisdictional issues, the forensic examiner should be aware of the nature of the crime and the regional laws.
Mobile phone evidence extraction process

Evidence extraction and forensic examination may differ for each mobile device. However, following a consistent examination process will help the forensic examiner ensure that the evidence extracted from each phone is well documented and that the results are repeatable and defensible. There is no well-established standard process for mobile forensics, but the phases described below provide an overview of process considerations for the extraction of evidence from mobile devices. All methods used when extracting data from mobile devices should be tested, validated, and well documented. A great resource for handling and processing mobile devices can be found at http://digital-forensics.sans.org/media/mobile-device-forensic-process-v3.pdf.

The evidence intake phase

The evidence intake phase is the starting phase. It entails request forms and paperwork to document ownership information and the type of incident the mobile device was involved in, and it outlines the type of data or information the requester is seeking. Developing specific objectives for each examination is the critical part of this phase, as it serves to clarify the examiner's goals.

The identification phase

The forensic examiner should identify the following details for every examination of a mobile device:

- The legal authority
- The goals of the examination
- The make, model, and identifying information for the device
- Removable and external data storage
- Other sources of potential evidence

We will discuss each of them in the following sections.

The legal authority

It is important for the forensic examiner to determine and document what legal authority exists for the acquisition and examination of the device, as well as any limitations placed on the media prior to the examination of the device.

The goals of the examination

The examiner identifies how in-depth the examination needs to be based on the data requested. The goal of the examination makes a significant difference in selecting the tools and techniques to examine the phone and increases the efficiency of the examination process.

The make, model, and identifying information for the device

As part of the examination, identifying the make and model of the phone assists in determining which tools will work with the phone.

Removable and external data storage

Many mobile phones provide an option to extend the memory with removable storage devices, such as a TransFlash Micro SD memory expansion card. In cases where such a card is found in a mobile phone that is submitted for examination, the card should be removed and processed using traditional digital forensic techniques. It is wise to also acquire the card while it is in the mobile device, to ensure that data stored on both the handset memory and the card is linked for easier analysis.

Other sources of potential evidence

Mobile phones act as good sources of fingerprints and other biological evidence. Such evidence should be collected prior to the examination of the mobile phone to avoid contamination issues, unless the collection method would damage the device. Examiners should wear gloves when handling the evidence.

The preparation phase

Once the mobile phone model is identified, the preparation phase involves research regarding the particular mobile phone to be examined and the appropriate methods and tools to be used for acquisition and examination.
The isolation phase

Mobile phones are by design intended to communicate via cellular phone networks, Bluetooth, infrared, and wireless (Wi-Fi) networks. When the phone is connected to a network, new data is added to the phone through incoming calls, messages, and application data, which modifies the evidence on the phone. Complete destruction of data is also possible through remote access or remote wiping commands. For this reason, isolating the device from communication sources is important prior to the acquisition and examination of the device. Isolation of the phone can be accomplished through the use of faraday bags, which block radio signals to or from the phone. Past research has found inconsistencies in total communication protection with faraday bags, so additional network isolation is advisable. This can be done by placing the phone in radio frequency shielding cloth and then placing the phone into airplane or flight mode.

The processing phase

Once the phone has been isolated from the communication networks, the actual processing of the mobile phone begins. The phone should be acquired using a tested method that is repeatable and as forensically sound as possible. Physical acquisition is the preferred method, as it extracts the raw memory data and the device is commonly powered off during the acquisition process. On most devices, the fewest changes occur to the device during physical acquisition. If physical acquisition is not possible or fails, an attempt should be made to acquire the file system of the mobile device. A logical acquisition should always be obtained as well, as it may contain only the parsed data and provide pointers for examining the raw memory image.

The verification phase

After processing the phone, the examiner needs to verify the accuracy of the data extracted from the phone to ensure that the data has not been modified. The verification of the extracted data can be accomplished in several ways.

Comparing extracted data to the handset data

Check whether the data extracted from the device matches the data displayed by the device. The data extracted can be compared to the device itself or to a logical report, whichever is preferred. Remember, handling the original device may make changes to the only evidence: the device itself.

Using multiple tools and comparing the results

To ensure accuracy, use multiple tools to extract the data and compare the results.

Using hash values

All image files should be hashed after acquisition to ensure that the data remains unchanged. If file system extraction is supported, the examiner extracts the file system and then computes hashes for the extracted files. Later, any individually extracted file hash is calculated and checked against the original value to verify its integrity. Any discrepancy in a hash value must be explainable (for example, if the device was powered on and then acquired again, the hash values will differ).

The document and reporting phase

The forensic examiner is required to document the examination process throughout, in the form of contemporaneous notes relating to what was done during the acquisition and examination. Once the examiner completes the investigation, the results must go through some form of peer review to ensure that the data is checked and the investigation is complete.
The examiner's notes and documentation may include information such as the following:

- Examination start date and time
- The physical condition of the phone
- Photos of the phone and individual components
- Phone status when received (turned on or off)
- Phone make and model
- Tools used for the acquisition
- Tools used for the examination
- Data found during the examination
- Notes from peer review

The presentation phase

Throughout the investigation, it is important to make sure that the information extracted and documented from a mobile device can be clearly presented to any other examiner or to a court. Creating a forensic report of data extracted from the mobile device during acquisition and analysis is important. This may include data in both paper and electronic formats. Your findings must be documented and presented in a manner such that the evidence speaks for itself in court. The findings should be clear, concise, and repeatable. Timeline and link analysis, features offered by many commercial mobile forensics tools, will aid in reporting and explaining findings across multiple mobile devices. These tools allow the examiner to tie together the methods behind the communication of multiple devices.

The archiving phase

Preserving the data extracted from the mobile phone is an important part of the overall process. It is also important that the data is retained in a usable format for the ongoing court process, for future reference should the current evidence file become corrupt, and for record-keeping requirements. Court cases may continue for many years before the final judgment is reached, and most jurisdictions require that data be retained for long periods for the purposes of appeals. As the field and its methods advance, new techniques for pulling data out of a raw physical image may surface, and the examiner can then revisit the data by pulling a copy from the archives.

Practical mobile forensic approaches

Similar to any forensic investigation, there are several approaches that can be used for the acquisition and examination/analysis of data from mobile phones. The type of mobile device, the operating system, and the security settings generally dictate the procedure to be followed in a forensic process. Every investigation is distinct, with its own circumstances, so it is not possible to design a single definitive procedural approach for all cases. The following sections outline the general approaches followed in extracting data from mobile devices.

Mobile operating systems overview

One of the major factors in the data acquisition and examination/analysis of a mobile phone is the operating system. From low-end mobile phones to smartphones, mobile operating systems have come a long way with a lot of features. Mobile operating systems directly affect how the examiner can access the mobile device. For example, Android OS gives terminal-level access, whereas iOS does not give such an option. A comprehensive understanding of the mobile platform helps the forensic examiner make sound forensic decisions and conduct a conclusive investigation. While there is a large range of smart mobile devices, four main operating systems dominate the market, namely Google Android, Apple iOS, RIM BlackBerry OS, and Windows Phone. More information can be found at http://www.idc.com/getdoc.jsp?containerId=prUS23946013.

Android

Android is a Linux-based operating system and Google's open source platform for mobile phones. Android is the world's most widely used smartphone operating system.
Sources show that Apple's iOS is a close second (http://www.forbes.com/sites/tonybradley/2013/11/15/android-dominates-market-share-but-apple-makes-all-the-money/). Android has been developed by Google as an open and free option for hardware manufacturers and phone carriers. This makes Android the software of choice for companies that require a low-cost, customizable, lightweight operating system for their smart devices without developing a new OS from scratch. Android's open nature has further encouraged developers to build a large number of applications and upload them to Android Market; end users can then download applications from Android Market, which makes Android a powerful operating system.

iOS

iOS, formerly known as the iPhone operating system, is a mobile operating system developed and distributed solely by Apple Inc. It is evolving into a universal operating system for all Apple mobile devices, such as the iPad, iPod touch, and iPhone. iOS is derived from OS X, with which it shares the Darwin foundation, and is therefore a Unix-like operating system. iOS manages the device hardware and provides the technologies required to implement native applications. It also ships with various system applications, such as Mail and Safari, which provide standard system services to the user. iOS native applications are distributed through the App Store, which is closely monitored by Apple.

Windows Phone

Windows Phone is a proprietary mobile operating system developed by Microsoft for smartphones and pocket PCs. It is the successor to Windows Mobile and is primarily aimed at the consumer market rather than the enterprise market. The Windows Phone OS is similar to the Windows desktop OS, but it is optimized for devices with a small amount of storage.

BlackBerry OS

BlackBerry OS is a proprietary mobile operating system developed by BlackBerry Ltd., formerly known as Research In Motion (RIM), exclusively for its BlackBerry line of smartphones and mobile devices. BlackBerry mobiles are widely used in corporate companies and offer native support for corporate mail via MIDP, which enables wireless sync with Microsoft Exchange e-mail, contacts, calendar, and so on, when used along with the BlackBerry Enterprise Server. These devices are known for their security.


An Introduction to Microsoft Remote Desktop Services and VDI

Packt
09 Jul 2014
10 min read
Remote Desktop Services and VDI

What Terminal Services did was provide simultaneous access to a desktop running on a server or group of servers. The name was changed to Remote Desktop Services (RDS) with the release of Windows Server 2008 R2, but RDS actually encompasses both VDI and what was Terminal Services, now referred to as Session Virtualization. Session Virtualization is still available alongside VDI in Windows Server 2012 R2: each user who connects to the server is granted a Remote Desktop session (RD Session) on an RD Session Host and shares this same server-based desktop.

Microsoft refers to any kind of remote desktop as a Virtual Desktop, and these are grouped into Collections, made available to specific groups of users and managed as one; a Session Collection is a group of virtual desktops based on Session Virtualization. It's important to note here that what users see with Session Virtualization is the desktop interface delivered with Windows Server, which is similar to but not the same as the Windows client interface; for example, it does have a modern UI by default. We can add or remove user interface features in Windows Server to change the way it looks. One of these, the Desktop Experience option, is there specifically to make Windows Server look more like the Windows client, and in Windows Server 2012 R2, if you add this feature, you'll get the Windows Store just as you do in Windows 8.1.

VDI also provides remote desktops to our users over RDP, but does this in a completely different way. In VDI, each user accesses a VM running a Windows client OS, so the user experience looks exactly the same as it would on a laptop or physical desktop. In Windows VDI, these Collections of VMs run on Hyper-V, and our users connect to them with RDP just as they can connect to the RD Sessions described above. Other parts of RDS are common to both, as we'll see shortly, but what is important for now is that RDS manages the VDI VMs for us and organizes which users are connected to which desktops, and so on. So we don't directly create VMs in a Collection or directly set up security on them. Instead, a Collection is created from a template, which is a special VM that is never turned on, as its guest OS is sysprepped and turning it on would instantly negate that. The VMs in a Collection inherit this master copy of the guest OS and any installed applications as well. The settings for the template VM, such as CPU, memory, graphics, and networking, are also inherited by the virtual desktops in a Collection, and as we'll see, there are a lot of VM settings that specifically apply to VDI rather than to VMs running a server OS as part of our infrastructure.

To reiterate, in Windows Server 2012 R2, VDI is one option in an RDS deployment, and it's even possible to use VDI alongside RD Sessions for our users. For example, we might decide to give RD Sessions to our call center staff and use VDI for our remote workforce. Traditionally, RD Session Hosts have been set up on physical servers, as older versions of Hyper-V and VMware weren't capable of supporting heavy workloads like this. However, we can now put up to 4 TB of RAM and 64 logical processors into one VM (physical hardware permitting) and run large RD Session deployments virtually. Our users connect to our Virtual Desktop Collections of whatever kind with a Remote Desktop client, which connects to the RDS servers using the Remote Desktop Protocol (RDP).
When we connect to any server with the Microsoft Terminal Services Client (MSTSC.exe), we are using RDP, but without setting up RDS there are only two administrative sessions available per server. Many of the advantages and disadvantages of running any kind of remote desktop apply to both solutions.

Advantages of Remote Desktops

Given that the desktop computing for our users is now going to be done in the data center, we only need to deploy powerful desktops and laptops to those users who are going to have difficulty connecting to our RDS infrastructure. Everyone else could either be equipped with thin client devices or given access from devices they already have if working remotely, such as tablets or their home PCs and laptops. Thin client computing has evolved in line with advances in remote desktop computing, and the latest devices from 10Zig, Dell, and HP, among others, support multiple hi-res monitors, smart cards, and web cams for unified communications, which are also enabled in the latest version of RDP (8.1). Using remote desktops can also reduce overall power consumption for IT, as an efficiently cooled data center will consume less power than the sum of thin clients and servers in an RDS deployment; in one case I saw, this resulted in a green charity saving 90 percent of its IT power bill.

Broadly speaking, managing remote desktops ought to be easier than managing their physical equivalents; for a start, they'll be running on standard hardware, so installing and updating drivers won't be so much of an issue. RDS has specific tooling for this to create a largely automatic process for keeping our collections of remote desktops in a desired state. VDI doesn't exist in a vacuum, and there are other Microsoft technologies that make any desktop management easier with other types of virtualization:

User profiles: These have long been a problem for desktop specialists. There are dedicated technologies built into Session Virtualization and VDI to allow users' settings and data to be stored away from the VHD that holds the Guest OS. Techniques such as folder redirection can also help here, and new for Windows Server 2012 is User Experience Virtualization (UE-V), which provides a much richer set of tools that can work across VDI, RD Sessions, and physical desktops to ensure the user has the same experience no matter what they are using for a Windows client desktop.

Application Virtualization (App-V): This allows us to deploy applications to the users who need them, when and where they need them, so we don't need to create desktops for the different types of users who need special applications; we just need a few of these, and only deploy on them the generic applications and those that can't make use of App-V. Even if App-V is not deployed, VDI administrators have total control over remote desktops, as we have any number of security techniques at our disposal, and if the worst comes to the worst, we can remove any installed applications every time a user logs off!

The simple fact that the desktop and applications we are providing to our users are now running on servers under our direct control also increases our IT security. Patching is now easy, particularly for our remote workforce, as whether they are on site or working remotely their desktop is still in the data center. RDS in all its forms is then an ideal way of allowing a Bring Your Own Device (BYOD) policy. Users can bring whatever device they wish into work, or work at home on their own device (WHOYD is my own acronym for this!) by using an RD Client on that device and securely connecting with it. Then there are no concerns about partially or completely wiping users' own devices, or not being able to because they aren't in a connected state when they are lost or stolen.

VDI versus Session Virtualization

So why are there these two ways of providing remote desktops in Windows, and what are the advantages and disadvantages of each? First and foremost, Session Virtualization is always going to be much more efficient than VDI, as it provides more desktops on less hardware. This makes sense if we look at what's going on in each scenario when we have a hundred remote desktop users to provide for:

In Session Virtualization, we are running and sharing one operating system, and this will comfortably sit on 25 GB of disk. For memory, we need roughly 2 GB per CPU, and we can then allocate 100 MB of memory to each RD Session. So, on a quad-CPU server, a hundred RD Sessions will need 8 GB + (100 MB x 100), or around 18 GB of RAM.

If we want to support that same hundred users on VDI, we need to provision a hundred VMs, each with its own OS. To do this we need 2.5 TB of disk, and if we give each VM 1 GB of RAM, then 100 GB of RAM is needed. This is a little unfair on VDI in Windows Server 2012 R2, as we can cut down on the disk space needed and be much more efficient with memory than this, but even with these new technologies we would need 70 GB of RAM and, say, 400 GB of disk.

Remember that with RD Sessions our users are going to be running the desktop that comes with Windows Server 2012 R2, not Windows 8.1. Our RD Session users are sharing the one desktop, so they cannot be given administrative rights to that desktop. This may not be that important for them, but it can also affect what applications can be put onto the server desktop, and some applications just won't work on a server OS anyway. Users' sessions can't be moved from one session host to another while the user is logged in, so planned maintenance has to be carefully thought through and may mean working out of hours to patch servers and update applications if we want to avoid interrupting our users' work. VDI, on the other hand, means that our users are using Windows 8.1, which they will be happier with, and this may well be the deciding factor regardless of cost, as our users are paying for this and their needs should take precedence. VDI can also be easier to manage without interrupting our users, as we can move running VMs around our physical hosts without stopping them.

Remote applications in RDS

Another part of the RDS story is the ability to provide remote access to individual applications rather than serving up a whole desktop. This is called RemoteApp, and it simply provides a shortcut to an individual application running either on a virtual desktop or a remote session desktop. I have seen this used for legacy applications that won't run on the latest versions of Windows, and to provide access to secure applications, as it's a simple matter to prevent cut and paste or any sharing of data between a remote application and the local device it's displayed on. RemoteApp works by publishing only the specified applications installed on our RD Session Hosts or VDI VMs.

Summary

This article discussed the advantages of remote desktops and how they can be provided in Windows, and compared VDI to Session Virtualization.
Resources for Article:

Further resources on this subject:
Designing, Sizing, Building, and Configuring Citrix VDI-in-a-Box [article]
Installing Virtual Desktop Agent – server OS and desktop OS [article]
VMware View 5 Desktop Virtualization [article]

HTML5 Game Development – A Ball-shooting Machine with Physics Engine

Packt
07 Jul 2014
12 min read
(For more resources related to this topic, see here.)

Mission briefing

In this article, we focus on the physics engine. We will build a basketball court where the player needs to shoot the ball into the hoop. A player shoots the ball by pressing and holding the mouse button and then releasing it. The direction is visualized by an arrow, and the power is proportional to the duration of the press-and-hold. There are obstacles present between the ball and the hoop. The player either avoids the obstacles or makes use of them to put the ball into the hoop. Finally, we use CreateJS to visualize the physics world on the canvas.

You may visit http://makzan.net/html5-games/ball-shooting-machine/ to play a dummy game in order to have a better understanding of what we will be building throughout this article. The following screenshot shows a player shooting the ball towards the hoop, with a power indicator:

Why is it awesome?

When we build games without a physics engine, we create our own game loop and reposition each game object in every frame. For instance, if we move a character to the right, we manage the position and movement speed ourselves. Imagine that we are coding ball-throwing logic now. We need to keep track of several variables. We have to calculate the x and y velocity based on the time and force applied. We also need to take gravity into account; not to mention the different angles and materials we need to consider while calculating the bounce between two objects.

Now, let's think of a physics world. We just define how objects interact, and all the collisions happen automatically. It is similar to a real-world game; we focus on defining the rules, and the world handles everything else. Take basketball as an example. We define the height of the hoop, the size of the ball, and the distance of the three-point line. Then, the players just need to throw the ball. We never worry about the flying parabola or the bouncing on the board. Our space takes care of them by using the laws of physics.

This is exactly what happens in the simulated physics world; it allows us to apply physics properties to game objects. The objects are affected by gravity, and we can apply forces to them, making them collide with each other. With the help of the physics engine, we can focus on defining the gameplay rules and the relationships between the objects. Without the need to worry about collision and movement, we save time to explore different gameplay ideas, and can then elaborate on and develop the prototypes we like further.

We define the position of the hoop and the ball. Then, we apply an impulse force to the ball in the x and y dimensions. The engine will handle everything in between. Finally, we get an event triggered if the ball passes through the hoop. It is worth noting that some blockbuster games are also made with a physics engine. This includes games such as Angry Birds, Cut the Rope, and Where's My Water?

Your Hotshot objectives

We will divide the article into the following eight tasks:

Creating the simulated physics world
Shooting a ball
Handling collision detection
Defining levels
Launching a bar with power
Adding a cross obstacle
Visualizing graphics
Choosing a level

Mission checklist

We create a project folder that contains the index.html file and the scripts and styles folders. Inside the scripts folder, we create three files: physics.js, view.js, and game.js. The physics.js file is the most important file in this article.
It contains all the logic related to the physics world, including creating level objects, spawning dynamic balls, applying force to the objects, and handling collision. The view.js file is a helper for the view logic, including the scoreboard and the ball-shooting indicator. The game.js file, as usual, is the entry point of the game. It also manages the levels and coordinates between the physics world and the view.

Preparing the vendor files

We also need a vendors folder that holds the third-party libraries. This includes the CreateJS suite (EaselJS, MovieClip, TweenJS, and PreloadJS) and Box2D. Box2D is the physics engine that we are going to use in this article. We need to download the engine code from https://code.google.com/p/box2dweb/. It is a port from ActionScript to JavaScript. We need the Box2dWeb-2.1.a.3.min.js file, or its nonminified version for debugging. We put this file in the vendors folder.

Box2D is an open source physics-simulation engine that was created by Erin Catto. It was originally written in C++. Later, it was ported to ActionScript because of the popularity of Flash games, and then it was ported to JavaScript. There are different versions of the ports. The one we are using is called Box2DWeb, which was ported from ActionScript's Box2D 2.1. Using an old version may cause issues. Also, it will be difficult to find help online because most developers have switched to 2.1.

Creating a simulated physics world

Our first task is to create a simulated physics world and put two objects inside it.

Prepare for lift off

In the index.html file, the core part is the game section. We have two canvas elements in this game. The debug-canvas element is for the Box2D engine and canvas is for the CreateJS library:

<section id="game" class="row">
  <canvas id="debug-canvas" width="480" height="360"></canvas>
  <canvas id="canvas" width="480" height="360"></canvas>
</section>

We prepare a dedicated file for all the physics-related logic. We prepare the physics.js file with the following code:

;(function(game, cjs, b2d){
  // code here later
}).call(this, game, createjs, Box2D);

Engage thrusters

The following steps create the physics world as the foundation of the game:

The Box2D classes are put in different modules. We will need to reference some common classes as we go along. We use the following code to create aliases for these Box2D classes:

// alias
var b2Vec2 = Box2D.Common.Math.b2Vec2
  , b2AABB = Box2D.Collision.b2AABB
  , b2BodyDef = Box2D.Dynamics.b2BodyDef
  , b2Body = Box2D.Dynamics.b2Body
  , b2FixtureDef = Box2D.Dynamics.b2FixtureDef
  , b2Fixture = Box2D.Dynamics.b2Fixture
  , b2World = Box2D.Dynamics.b2World
  , b2MassData = Box2D.Collision.Shapes.b2MassData
  , b2PolygonShape = Box2D.Collision.Shapes.b2PolygonShape
  , b2CircleShape = Box2D.Collision.Shapes.b2CircleShape
  , b2DebugDraw = Box2D.Dynamics.b2DebugDraw
  , b2MouseJointDef = Box2D.Dynamics.Joints.b2MouseJointDef
  , b2RevoluteJointDef = Box2D.Dynamics.Joints.b2RevoluteJointDef
  ;

We prepare a variable that states how many pixels define 1 meter in the physics world. We also define a Boolean to determine if we need to draw the debug draw:

var pxPerMeter = 30; // 30 pixels = 1 meter. Box2D uses meters and we use pixels.
var shouldDrawDebug = false;

All the physics methods will be put into the game.physics object.
We create this literal object before we code our logic:

var physics = game.physics = {};

The first method in the physics object creates the world:

physics.createWorld = function() {
  var gravity = new b2Vec2(0, 9.8);
  this.world = new b2World(gravity, /*allow sleep= */ true);

  // create two temporary bodies
  var bodyDef = new b2BodyDef;
  var fixDef = new b2FixtureDef;

  bodyDef.type = b2Body.b2_staticBody;
  bodyDef.position.x = 100/pxPerMeter;
  bodyDef.position.y = 100/pxPerMeter;
  fixDef.shape = new b2PolygonShape();
  fixDef.shape.SetAsBox(20/pxPerMeter, 20/pxPerMeter);
  this.world.CreateBody(bodyDef).CreateFixture(fixDef);

  bodyDef.type = b2Body.b2_dynamicBody;
  bodyDef.position.x = 200/pxPerMeter;
  bodyDef.position.y = 100/pxPerMeter;
  this.world.CreateBody(bodyDef).CreateFixture(fixDef);
  // end of temporary code
}

The update method is the game loop's tick event for the physics engine. It calculates the world step and refreshes the debug draw. The world step advances the physics world; we'll discuss it later:

physics.update = function() {
  this.world.Step(1/60, 10, 10);
  if (shouldDrawDebug) {
    this.world.DrawDebugData();
  }
  this.world.ClearForces();
};

Before we can refresh the debug draw, we need to set it up. We pass a canvas reference to the Box2D debug draw instance and configure the drawing settings:

physics.showDebugDraw = function() {
  shouldDrawDebug = true;

  // set up debug draw
  var debugDraw = new b2DebugDraw();
  debugDraw.SetSprite(document.getElementById("debug-canvas").getContext("2d"));
  debugDraw.SetDrawScale(pxPerMeter);
  debugDraw.SetFillAlpha(0.3);
  debugDraw.SetLineThickness(1.0);
  debugDraw.SetFlags(b2DebugDraw.e_shapeBit | b2DebugDraw.e_jointBit);
  this.world.SetDebugDraw(debugDraw);
};

Let's move to the game.js file. We define the game-starting logic that sets up the EaselJS stage and Ticker. It creates the world and sets up the debug draw. The tick method calls the physics.update method:

;(function(game, cjs){

  game.start = function() {
    cjs.EventDispatcher.initialize(game); // allow the game object to listen and dispatch custom events.
    game.canvas = document.getElementById('canvas');
    game.stage = new cjs.Stage(game.canvas);

    cjs.Ticker.setFPS(60);
    cjs.Ticker.addEventListener('tick', game.stage); // adding game.stage to the ticker makes the stage.update call automatically.
    cjs.Ticker.addEventListener('tick', game.tick); // game loop

    game.physics.createWorld();
    game.physics.showDebugDraw();
  };

  game.tick = function(){
    if (cjs.Ticker.getPaused()) {
      return;
    }
    // run when not paused
    game.physics.update();
  };

  game.start();

}).call(this, game, createjs);

After these steps, we should have a result as shown in the following screenshot. It is a physics world with two bodies. One body stays in position and the other one falls to the bottom.

Objective complete – mini debriefing

We have defined our first physics world with one static object and one dynamic object that falls to the bottom. A static object is an object that is not affected by gravity or any other forces. On the other hand, a dynamic object is affected by all the forces.

Defining gravity

In reality, we have gravity on every planet. It's the same in the Box2D world; we need to define gravity for the world. This is a ball-shooting game, so we will follow the rules of gravity on Earth. We use 0 for the x axis and 9.8 for the y axis. It is worth noting that we do not need to use the 9.8 value.
For instance, we can set a smaller gravity value to simulate other planets in space, maybe even the moon; or we can set the gravity to zero to create a top-down ice hockey game, where we apply force to the puck and benefit from the collisions.

Debug draw

The physics engine focuses purely on the mathematical calculation. It doesn't care how the world will finally be presented, but it does provide a visual method in order to make debugging easier. This debug draw is very useful before we use our own graphics to represent the world. We won't use the debug draw in production. Actually, we can decide how we want to visualize this physics world. We have learned two ways to visualize the game: the first is by using DOM objects, and the second is by using the canvas drawing method. We will visualize the world with our own graphics in later tasks.

Understanding body definition and fixture definition

In order to define objects in the physics world, we need two definitions: a body definition and a fixture definition. The body is in charge of the physical properties, such as its position in the world, taking and applying force, its moving speed, and its angular speed when rotating. We use fixtures to handle the shape of the object. The fixture definition also defines the properties for how the object interacts with others while colliding, such as friction and restitution.

Defining shapes

Shapes are defined in a fixture. The two most common shapes in Box2D are the rectangle and the circle. We define a rectangle with the SetAsBox function by providing half of its width and height. The circle shape is defined by its radius. It is worth noting that the position of the body is at the center of the shape. This is different from EaselJS, where the default origin point is at the top-left corner.

Pixels per meter

When we define the dimension and location of a body, we use the meter as the unit. That's because Box2D uses metric units for its calculations to make the physics behavior realistic. But we usually calculate in pixels on the screen. So, we need to convert between pixels on the screen and meters in the physics world. That's why we need the pxPerMeter variable here. The value of this variable might change from project to project.

The update method

In the game tick, we update the physics world. The first thing we need to do is take the world to the next step. Box2D calculates objects based on steps. It is the same as what we see in the physical world when a second passes. If a ball is falling, at any fixed instant the ball is static, with its falling velocity as a property. In the next millisecond, or nanosecond, the ball falls to a new position. This is exactly how steps work in the Box2D world. In every single step, the objects are static with their physics properties. When we go a step further, Box2D takes the properties into consideration and applies them to the objects.

The Step function takes three arguments. The first argument is the time passed since the last step. Normally, it follows the frames-per-second parameter that we set for the game. The second and third arguments are the iterations for velocity and position. These are the maximum iterations Box2D tries when resolving a collision. Usually, we set them to a low value. The reason we clear the forces is that they will be applied indefinitely if we do not clear them. That means the object keeps receiving the force on each frame until we clear it. Normally, clearing forces on every frame makes the objects more manageable.
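To connect these pieces, here is a minimal sketch of how a dynamic ball body could be added using the aliases and the pxPerMeter conversion described above. The spawnBall name and the density, friction, and restitution values are illustrative assumptions, not code from the book:

physics.spawnBall = function(x, y, radius) {
  // position and size are given in pixels and converted to meters
  var bodyDef = new b2BodyDef;
  bodyDef.type = b2Body.b2_dynamicBody;
  bodyDef.position.x = x / pxPerMeter;
  bodyDef.position.y = y / pxPerMeter;

  var fixDef = new b2FixtureDef;
  fixDef.shape = new b2CircleShape(radius / pxPerMeter);
  fixDef.density = 1.0;      // affects the body's mass
  fixDef.friction = 0.5;     // sliding friction against other fixtures
  fixDef.restitution = 0.6;  // how bouncy collisions are

  return this.world.CreateBody(bodyDef).CreateFixture(fixDef).GetBody();
};

A call such as game.physics.spawnBall(240, 50, 15) would then drop a bouncing ball into the world on the following steps.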
Classified intel

We often need to represent a 2D vector in the physics world. Box2D uses b2Vec2 for this purpose. Like b2Vec2, we use quite a lot of Box2D functions and classes, which are modularized into namespaces. We need to alias the most common classes to make our code shorter.
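As a small illustration of b2Vec2 in use, the shot described earlier could be applied to the ball body roughly as follows. This is a sketch that assumes the Box2DWeb 2.1a API (ApplyImpulse and GetWorldCenter on b2Body); the power and angle variables are assumed to come from the player's input:

// apply an impulse at the ball's center of mass in the aimed direction
var impulse = new b2Vec2(power * Math.cos(angle), power * Math.sin(angle));
ballBody.ApplyImpulse(impulse, ballBody.GetWorldCenter());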

article-image-creating-maze-and-animating-cube

Creating the maze and animating the cube

Packt
07 Jul 2014
9 min read
(For more resources related to this topic, see here.)

A maze is a rather simple shape that consists of a number of walls and a floor. So, what we need is a way to create these shapes. Three.js, not very surprisingly, doesn't have a standard geometry that will allow you to create a maze, so we need to create this maze by hand. To do this, we need to take two different steps:

Find a way to generate the layout of the maze so that not all the mazes look the same.
Convert that to a set of cubes (THREE.BoxGeometry) that we can use to render the maze in 3D.

There are many different algorithms that we can use to generate a maze, and luckily there are also a number of open source JavaScript libraries that implement such an algorithm. So, we don't have to start from scratch. For the example in this book, I've used the random-maze-generator project that you can find on GitHub at the following link: https://github.com/felipecsl/random-maze-generator

Generating a maze layout

Without going into too much detail, this library allows you to generate a maze and render it on an HTML5 canvas. The result of this library looks something like the following screenshot. You can generate this by just using the following JavaScript:

var maze = new Maze(document, 'maze');
maze.generate();
maze.draw();

Even though this is a nice looking maze, we can't use it directly to create a 3D maze. What we need to do is change the code the library uses to write on the canvas so that it creates Three.js objects instead. This library draws the lines on the canvas in a function called drawLine:

drawLine: function(x1, y1, x2, y2) {
  self.ctx.beginPath();
  self.ctx.moveTo(x1, y1);
  self.ctx.lineTo(x2, y2);
  self.ctx.stroke();
}

If you're familiar with the HTML5 canvas, you can see that this function draws lines based on the input arguments. Now that we've got this maze, we need to convert it to a number of 3D shapes so that we can render them in Three.js.

Converting the layout to a 3D set of objects

To change this library to create Three.js objects, all we have to do is change the drawLine function to the following code snippet:

drawLine: function(x1, y1, x2, y2) {
  var lengthX = Math.abs(x1 - x2);
  var lengthY = Math.abs(y1 - y2);

  // since there are only 90-degree angles, one of these is always 0
  // to add a certain thickness to the wall, set it to 0.5
  if (lengthX === 0) lengthX = 0.5;
  if (lengthY === 0) lengthY = 0.5;

  // create a cube to represent the wall segment
  var wallGeom = new THREE.BoxGeometry(lengthX, 3, lengthY);
  var wallMaterial = new THREE.MeshPhongMaterial({
    color: 0xff0000,
    opacity: 0.8,
    transparent: true
  });

  // and create the complete wall segment
  var wallMesh = new THREE.Mesh(wallGeom, wallMaterial);

  // finally position it correctly
  wallMesh.position = new THREE.Vector3(
    x1 - ((x1 - x2) / 2) - (self.height / 2),
    wallGeom.height / 2,
    y1 - ((y1 - y2)) / 2 - (self.width / 2));

  self.elements.push(wallMesh);
  scene.add(wallMesh);
}

In this new drawLine function, instead of drawing on the canvas, we create a THREE.BoxGeometry object whose length and depth are based on the supplied arguments. Using this geometry, we create a THREE.Mesh object and use the position attribute to position the mesh at a specific point with the x, y, and z coordinates. Before we add the mesh to the scene, we add it to the self.elements array. Now we can just use the following code snippet to create a 3D maze:

var maze = new Maze(scene, 17, 100, 100);
maze.generate();
maze.draw();

As you can see, we've also changed the input arguments.
These properties now define the scene to which the maze should be added and the size of the maze. The result of these changes can be seen in the following screenshot. Every time you refresh, you'll see a newly generated random maze.

Now that we've got our generated maze, the next step is to add the object that we'll move through the maze.

Animating the cube

Before we dive into the code, let's first look at the result as shown in the following screenshot. Using the controls at the top-right corner, you can move the cube around. What you'll see is that the cube rotates around its edges, not around its center. In this section, we'll show you how to create that effect. Let's first look at the default rotation, which is along an object's central axis, and the translation behavior of Three.js.

The standard Three.js rotation behavior

Let's first look at all the properties you can set on THREE.Mesh. They are as follows:

position: This property refers to the position of an object, which is relative to the position of its parent. In all our examples so far, the parent is THREE.Scene.
rotation: This property defines the rotation of THREE.Mesh around its own x, y, or z axis.
scale: With this property, you can scale the object along its own x, y, and z axes.
translateX(amount): This function moves the object by the specified amount over the x axis.
translateY(amount): This function moves the object by the specified amount over the y axis.
translateZ(amount): This function moves the object by the specified amount over the z axis.

If we want to rotate a mesh around one of its own axes, we can just call the following line of code:

plane.rotation.x = -0.5 * Math.PI;

We've used this to rotate the ground area from a vertical position to a horizontal one. It is important to know that this rotation is done around the object's own internal axis, not the x, y, or z axis of the scene. So, if you do a number of rotations one after another, you have to keep track of the orientation of your mesh to make sure you get the required effect. Another point to note is that rotation is done around the center of the object, in this case the center of the cube. If we look at the effect we want to accomplish, we run into the following two problems:

First, we don't want to rotate around the center of the object; we want to rotate around one of its edges to create a walking-like animation.
Second, if we use the default rotation behavior, we have to continuously keep track of our orientation, since we're rotating around our own internal axis.

In the next section, we'll explain how you can solve these problems by using matrix-based transformations.
The following example will allow us to make a step to the right-hand side:

cubeGeometry.applyMatrix(new THREE.Matrix4().makeTranslation(0, width / 2, width / 2));
cube.position.y += -width / 2;
cube.position.z += -width / 2;

With the cubeGeometry.applyMatrix function, we can change the position of the individual vertices of our geometry. In this example, we create a translation (using makeTranslation) that offsets all the y and z coordinates by half the width of the cube. The result is that it looks like the cube moved a bit to the right-hand side and then up, but the actual center of the cube is now positioned at one of its lower edges. Next, we use the cube.position property to position the cube back at the ground plane, since the individual vertices were offset by the makeTranslation function.

Now that the edge of the object is positioned correctly, we can rotate the object. For rotation, we could use the standard rotation property, but then we would have to constantly keep track of the orientation of our cube. So, for rotations, we once again use a matrix transformation on the vertices of our cube:

cube.geometry.applyMatrix(new THREE.Matrix4().makeRotationX(amount));

As you can see, we use the makeRotationX function, which changes the position of our vertices. Now we can easily rotate our cube without having to worry about its orientation. The final step we need to take is to reset the cube to its original position; taking into account that we've moved a step to the right, we can then take the next step:

cube.position.y += width / 2;
cube.position.z += -width / 2;
cubeGeometry.applyMatrix(new THREE.Matrix4().makeTranslation(0, -width / 2, width / 2));

As you can see, this is the inverse of the first step; we've added half the width of the cube to position.y and subtracted half the width in the second argument of the translation to compensate for the step to the right-hand side we've taken. If we use the preceding code snippet, we will only see the result of the step to the right.

Summary

In this article, we have seen how to create a maze and animate a cube.

Resources for Article:

Further resources on this subject:
Working with the Basic Components That Make Up a Three.js Scene [article]
3D Websites [article]
Rich Internet Application (RIA) – Canvas [article]
article-image-validating-user-input-workflow-transitions

Validating user input in workflow transitions

Packt
07 Jul 2014
6 min read
(For more resources related to this topic, see here.)

For workflow transitions that have transition screens, you can add validation logic to make sure what the users put in is what you are expecting. This is a great way to ensure data integrity, and we can do this with workflow validators. In this recipe, we will add a validator to perform a date comparison between a custom field and the issue's create date, so the date value we select for the custom field must be after the issue's create date.

Getting ready

For this recipe, we need to have the JIRA Suite Utilities add-on installed. You can download it from the following link or install it directly using the Universal Plugin Manager: https://marketplace.atlassian.com/plugins/com.googlecode.jira-suiteutilities

Since we are also doing a date comparison, we need to create a new date custom field called Start Date and add it to the Workflow Screen.

How to do it…

Perform the following steps to add validation rules during a workflow transition:

1. Select and edit the workflow to configure.
2. Select the Diagram mode.
3. Select the Start Progress transition and click on the Validators link towards the right-hand side.
4. Click on the Add validator link and select Date Compare from the list.
5. Select the Start Date custom field for This date, > for Condition, and Created for Compare with, and click on Update to add the validator:
6. Click on Publish Draft to apply the change.

After we add the validator, if we now try to select a date that is before the issue's create date, JIRA will prompt us with an error message and stop the transition from going through, as shown in the following screenshot:

How it works…

Validators are run after the user has executed the transition but before the transition starts. This way, validators can intercept and prevent a transition from going through if one or more of the validation rules fail. If you have more than one validator, all of them must pass for the transition to go through.

See also

Validators can be used to make a field required only during workflow transitions.

Creating custom workflow transition logic

In the previous recipe, we looked at using workflow conditions, validators, and post functions that come out of the box with JIRA and from other third-party add-ons. In this recipe, we will take a look at how to use scripts to define our own validation rules for a workflow validator. We will address a common use case, which is to make a field required during a workflow transition only when another field is set to a certain value. So, our validation logic will be as follows:

If the Resolution field is set to Fixed, the Solution Details field will be required.
If the Resolution field is set to a value other than Fixed, the Solution Details field will not be required.

Getting ready

For this recipe, we need to have the Script Runner add-on installed. You can download it from the following link or install it directly using the Universal Plugin Manager: https://marketplace.atlassian.com/plugins/com.onresolve.jira.groovy.groovyrunner

You might also want to get familiar with Groovy scripting (http://groovy.codehaus.org).

How to do it…

Perform the following steps to set up a validator with custom-scripted logic:

1. Select and edit the Simple Workflow.
2. Select the Diagram mode.
3. Click on the Move to Backlog global workflow transition.
4. Click on the Validators link from the panel on the right-hand side.
5. Click on Add validator, select Script Validator from the list, and click on Add.
6. Now enter the following script code in the Condition textbox:

import com.opensymphony.workflow.InvalidInputException
import com.atlassian.jira.ComponentManager
import org.apache.commons.lang.StringUtils

def customFieldManager = ComponentManager.getInstance().getCustomFieldManager()
def solutionField = customFieldManager.getCustomFieldObjectByName("Solution Details")

def resolution = issue.getResolutionObject().getName()
def solution = issue.getCustomFieldValue(solutionField)

if(resolution == "Fixed" && StringUtils.isBlank(solution)) {
  false
} else {
  true
}

7. Enter the text "You must provide the Solution Details if the Resolution is set to Fixed." for the Error field.
8. Select Solution Details for Field.
9. Click on the Add button to complete the validator setup.
10. Click on Publish Draft to apply the change.

Your validator configuration should look something like the following screenshot:

Once we add our custom validator, every time the Move to Backlog transition is executed, the Groovy script will run. If the Resolution field is set to Fixed and the Solution Details field is empty, we will get the message from the Error field, as shown in the following screenshot:

How it works…

The Script Validator works just like any other validator, except that we can define our own validation logic using Groovy scripts. So, let's go through the script and see what it does. We first get the Solution Details custom field via its name, as shown in the following line of code. If you have more than one custom field with the same name, you need to use its ID instead of its name.

def solutionField = customFieldManager.getCustomFieldObjectByName("Solution Details")

We then select the resolution value and obtain the value entered for Solution Details during the transition, as follows:

def resolution = issue.getResolutionObject().getName()
def solution = issue.getCustomFieldValue(solutionField)

In our example, we check the resolution name; we can also check the ID of the resolution by changing getName() to getId(). If you have multiple custom fields with the same name, use getId(). Lastly, we check whether the Resolution value is Fixed and the Solution Details value is blank; in this case, we return a value of false, so the validation fails. All other cases return true, so the validation passes. We also use StringUtils.isBlank(solution) to check for blank values so that we can catch cases where users enter only empty spaces in the Solution Details field.

There's more…

You are not limited to creating scripted validators. With the Script Runner add-on, you can also create scripted conditions and post functions using Groovy scripts.

Summary

In this article, we studied how to work with workflows, including permissions and user input validation.

Resources for Article:

Further resources on this subject:
Getting Started with JIRA 4 [Article]
Gadgets in JIRA [Article]
Advanced JIRA 5.2 Features [Article]

article-image-baking-bits-yocto-project

Baking Bits with Yocto Project

Packt
07 Jul 2014
3 min read
(For more resources related to this topic, see here.)

Understanding the BitBake tool

The BitBake task scheduler started as a fork of Portage, which is the package management system used in the Gentoo distribution. However, nowadays the two projects have diverged a lot due to their different usage focuses. The Yocto Project and the OpenEmbedded Project are the best-known and most intensive users of BitBake, which remains a separate and independent project with its own development cycle and mailing list (bitbake-devel@lists.openembedded.org).

BitBake is a task scheduler that parses mixed Python and shell script code. The parsed code generates and runs tasks, which may have a complex dependency chain and are scheduled to allow parallel execution and maximize the use of computational resources. BitBake can be understood as a tool similar to GNU Make in some aspects.

Exploring metadata

The metadata used by BitBake can be classified into three major areas:

Configuration (the .conf files)
Classes (the .bbclass files)
Recipes (the .bb and .bbappend files)

The configuration files define the global content, which is used to provide information and configure how the recipes will work. One common example of a configuration file is the machine file, which has a list of settings that describe the board hardware.

The classes are used by the whole system and can be inherited by recipes, according to their needs or by default, with classes used to define the system behavior and provide the base methods. For example, kernel.bbclass helps with the process of building, installing, and packaging the Linux kernel, independent of version and vendor.

The recipes and classes are written in a mix of Python and shell scripting code. The classes and recipes describe the tasks to be run and provide the information needed to allow BitBake to generate the required task chain. The inheritance mechanism that permits a recipe to inherit one or more classes is useful to reduce code duplication and eases maintenance. A Linux kernel recipe example is linux-yocto_3.14.bb, which inherits a set of classes, including kernel.bbclass. These are BitBake's most commonly used aspects across all types of metadata (.conf, .bb, and .bbclass).

Summary

In this article, we studied BitBake and the classification of its metadata.

Resources for Article:

Further resources on this subject:
Installation and Introduction of K2 Content Construction Kit [article]
Linux Shell Script: Tips and Tricks [article]
Customizing a Linux kernel [article]

article-image-part-1-deploying-multiple-applications-capistrano-single-project

Part 1: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
9 min read
Capistrano is a deployment tool written in Ruby that is able to deploy projects using any language or framework, through a set of recipes, which are also written in Ruby. Capistrano expects an application to have a single repository, and it is able to run arbitrary commands on the server through a non-interactive SSH session.

Capistrano was designed assuming that an application is completely described by a single repository with all code belonging to it. For example, your web application is written with Ruby on Rails, and simply serving that application would be enough. But what if you decide to use a separate application for managing your users, in a separate language and framework? Or maybe some issue tracker application? You could set up a proxy server to deliver each request to the right application based upon the request path, for example. But the problem remains: how do you use Capistrano to manage more complex scenarios like this if it supports a single repository?

The typical approach is to integrate Capistrano into each of the component applications and then switch between those projects before deploying each component. Not only is this a lot of work to deploy all of these components, but it may also lead to duplicated settings. For example, if your main application and the user management application both use the same database for a given environment, you'd have to duplicate this setting in each of the components.

For the Market Tracker product, used by LexisNexis clients (which we develop at e-Core for Matterhorn Transactions Inc.), we were looking for a better way to manage many component applications across lots of environments and servers. We wanted to manage all of them from a single repository, instead of adding Capistrano integration to each of our components' repositories and having to worry about keeping the recipes in sync between each of the maintained repository branches.

Motivation

The Market Tracker application we maintain consists of three different applications: the main one, another to export search results to Excel files, and an administrative interface to manage users and other entities. We host the application on three servers: two for the real thing and another backup server. The first two are identical and allow us to have redundancy and zero-downtime deployments, except for a few cases where we change our database schema in ways that are incompatible with previous versions.

To add to the complexity of deploying our three component applications to each of those servers, we also need to deploy them multiple times for different environments, like production, certification, staging, and experimental. All of them run on the same server, on separate ports, and they use separate databases, Solr, and Redis instances. This is already complex enough to manage when you integrate Capistrano into each of your projects, but it gets worse. Sometimes you find bugs in production and have to release quick fixes, but you can't deploy the version in the master branch that has several other changes. At other times you find bugs in your Capistrano recipes themselves and fix them on master. Or maybe you are changing your deploy settings rather than the application's code. When you have to deploy to production, depending on how your Capistrano recipes work, you may have to change to the production branch, backport any changes to the Capistrano recipes from master, and finally deploy the latest fixes.
This happens, for example, if your recipe uses any project files as templates and they have moved to another place in the master branch. We decided to try another approach, similar to what we do with our database migrations. Instead of integrating the database migrations into the main application (the default on Rails, Django, Grails, and similar web frameworks), we prefer to handle them as a separate project. In our case we use the active_record_migrations gem, which brings standalone support for ActiveRecord migrations (the same migrations that are bundled with Rails apps by default). Our database is shared between the administrative interface project and the main web application, and we feel it's better to be able to manage our database schema independently of the projects using the database. We add the migrations project to the other applications as a submodule so that we know what database schema is expected to work for a particular commit of the application, but that's all.

We wanted to apply the same principles to our Capistrano recipes. We wanted to manage all of our applications on different servers and environments from a single project containing the Capistrano recipes. We also wanted to store the common settings in a single place to avoid code duplication, which makes it hard to add new environments or update existing ones.

Grouping all applications' Capistrano recipes in a single project

It seems we were not the first to want all Capistrano recipes for all of our applications in a single project. We first tried a project called caphub. It worked fine initially, and its inheritance model would allow us to avoid our code duplication. Well, not entirely. The problem is that we needed some kind of multiple inheritance, or mixins. We have some settings, like the token private key, that are unique to each environment, like Certification and Production. But we also have other settings that are common within a server. For example, the database host name will be the same for all applications and environments inside our colocation facility, but it will be different on our backup server at Amazon EC2. CapHub didn't help us get rid of the duplication in such cases, but it certainly helped us find a simple solution to get what we wanted. Let's explore how Capistrano 3 allows us to easily manage such complex scenarios, which are more common than you might think.

Capistrano stages

Since Capistrano 3, multistage support is built in (there was a multistage extension for Capistrano 2). That means you can write cap stage_name task_name, for example cap production deploy. By default, cap install will generate two stages: production and staging. You can generate as many as you want, for example:

cap install STAGES=production,cert,staging,experimental,integrator

But how do we deploy each of those stages to our multiple servers, since the settings for each stage may be different across the servers? Also, how can we manage separate applications? Even though those settings are called "stages" by Capistrano, you can use them as you want. For example, suppose our servers are named m1, m2, and ec2 and the applications are named web, exporter, and admin. We can create settings like m1_staging_web, ec2_production_admin, and so on. This will result in lots of files (specifically 45 = 5 x 3 x 3 to support five environments, three applications, and three servers), but it's not a big deal if you consider that the settings files can be really small, as the examples later in this article will demonstrate by using mixins.
Usually people will start with staging and production only, and then gradually add other environments. Also, they usually start with one or two servers and keep growing as they feel the need. So supporting 45 combinations is not such a pain, since you don't write all of them at once. On the other hand, if you have enough resources to have a separate server for each of your environments, Capistrano will allow you to add multiple "server" declarations and assign roles to them, which can be quite useful if you're running a cluster of servers. In our case, to avoid downtime we don't upgrade all servers in our cluster at once. We also don't have the budget to host 45 virtual machines, or even 15. So the little effort of generating 45 small settings files is paid for by the savings in hosting expenses.

Using mixins

My next post will create an example deployment project from scratch, providing detail for everything that has been discussed in this post. But first, let me introduce the concept of what we call a mixin in our project.

Capistrano 3 is simply a wrapper on top of Rake. Rake is a build tool written in Ruby, similar to "make": it has tasks, and tasks have prerequisites. This fits nicely with the way Capistrano works, where some deployment tasks will depend on other tasks. Instead of a Rakefile (Rake's equivalent of a Makefile), Capistrano uses a Capfile, but other than that it works almost the same way. The Domain Specific Language (DSL) in a Capfile is enhanced as you include Capistrano extensions to the Rake DSL. Here's a sample Capfile, generated by cap install, when you install Capistrano:

# Load DSL and Set Up Stages
require 'capistrano/setup'

# Includes default deployment tasks
require 'capistrano/deploy'

# Includes tasks from other gems included in your Gemfile
#
# For documentation on these, see for example:
#
# https://github.com/capistrano/rvm
# https://github.com/capistrano/rbenv
# https://github.com/capistrano/chruby
# https://github.com/capistrano/bundler
# https://github.com/capistrano/rails
#
# require 'capistrano/rvm'
# require 'capistrano/rbenv'
# require 'capistrano/chruby'
# require 'capistrano/bundler'
# require 'capistrano/rails/assets'
# require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.rake').each { |r| import r }

Just like a Rakefile, a Capfile is valid Ruby code, which you can easily extend using regular Ruby code. So, to support a mixin DSL, we simply need to extend the DSL, like this:

def mixin(path)
  load File.join('config', 'mixins', path + '.rb')
end

Pretty simple, right? We prefer to add this to a separate file, like lib/mixin.rb, and add this to the Capfile:

$:.unshift File.dirname(__FILE__)
require 'lib/mixin'

After that, calling mixin 'environments/staging' should load settings that are common for the staging environment from a file called config/mixins/environments/staging.rb in the root of the Capistrano-enabled project. This is the base for setting up the deployment project that we will create in the next post.
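As an illustration, the config/mixins/environments/staging.rb file mentioned above could hold just the settings that are specific to staging. The values below are only a sketch of the idea (the setting names mirror the ones used in the next part of this series and are not taken from a real project):

# config/mixins/environments/staging.rb
set :branch, 'master'
set :database_name, 'app_staging'
set :app1_port, 4000
set :app2_port, 4001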
About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years, Rodrigo has focused on building and maintaining single-page web applications. He is the author of some gems, including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre - RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

article-image-json-pojo-using-gson-android-studio

How to Convert POJO to JSON Using Gson in Android Studio

Troy Miles
01 Jul 2014
6 min read
JSON has become the de facto standard of data exchange on the web. Compared to its cousin XML, it is smaller in size and faster to both create and parse. In fact, it seems so simple that many developers roll their own code to convert plain old Java objects (POJOs) to and from JSON. For simple objects, it is fairly easy to write the conversion code, but as your objects grow more complex, your code's complexity grows as well. Do you really want to maintain a bunch of code whose functionality is not truly intrinsic to your app?

Luckily there is no reason for you to do so. There are quite a few alternatives to writing your own Java JSON serializer/deserializer; in fact, json.org lists 25 of them. One of them, Gson, was created by Google for use on internal projects and was later open sourced. Gson is hosted on Google Code and the source code is available in an SVN repo.

Create an Android app

The process of converting a POJO to JSON is called serialization. The reverse process is deserialization. A big reason that Gson is such a popular library is how simple it makes both processes. For both, the only thing you need is the Gson class. Let's create a simple Android app and see how simple Gson is to use.

1. Start Android Studio and select new project.
2. Change the Application name to GsonTest. Click Next.
3. Click Next again.
4. Click Finish.

At this point we have a complete Android hello world app. In past Android IDEs, we would add the Gson library at this point, but we don't do that anymore. Instead we add a Gson dependency to our build.gradle script, and that will take care of everything else for us. It is super important to edit the correct Gradle file. There is one in the root directory, but the one we want is in the app directory. Double-click it to open. Locate the dependencies section near the bottom of the script. After the last entry add the following line:

compile 'com.google.code.gson:gson:2.2.4'

After you add it, save the script and then click the Sync Project with Gradle Files icon. It is the fifth icon from the right-hand side in the toolbar. At this point, the Gson library is visible to your app. So let's build some test code.

Create test code with JSON

For our test we are going to use the JSON Test web service at https://www.jsontest.com/. It is a testing platform for JSON. Basically it gives us a place to send data to in order to test whether we are properly serializing and deserializing data. JSON Test has a lot of services, but we will use the validate service. You pass it a JSON string URL-encoded as a query string, and it will reply with a JSON object that indicates whether or not the JSON was encoded correctly, as well as some statistical information.

The first thing we need to do is create two classes. The first class, TestPojo, is the Java class that we are going to serialize and send to JSON Test. TestPojo doesn't do anything important. It is just for our test; however, it contains several different types of objects: ints, strings, and arrays of ints. Classes that you create can easily be much more complicated, but don't worry, Gson can handle it, for example:

package com.tekadept.gsontest.app;

public class TestPojo {
    private int value1 = 1;
    private String value2 = "abc";
    private int values[] = {1, 2, 3, 4};
    private transient int value3 = 3;

    // no args ctor
    TestPojo() {
    }
}

Gson will also respect the Java transient modifier, which specifies that a field should not be serialized. Any field with it will not appear in the JSON.
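To see this in action, here is a quick sketch of serializing TestPojo with Gson's toJson method; the log tag string is an arbitrary choice:

Gson gson = new Gson();
String json = gson.toJson(new TestPojo());
// value3 is skipped because it is transient, so json should be:
// {"value1":1,"value2":"abc","values":[1,2,3,4]}
Log.v("GsonTest", json);

The transient value3 field is left out of the output, while the other fields keep their declared names.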
The second class, JsonValidate, will hold the results of our call to JSON Test. In order to make it easy to parse, I've kept the field names exactly the same as those returned by the service, except for one. Gson has an annotation, @SerializedName; if you place it before a field name, you can have the name of the class field be different from the JSON name. For example, if we wanted to name the validate field isValid, all we would have to do is:

package com.tekadept.gsontest.app;

import com.google.gson.annotations.SerializedName;

public class JsonValidate {

    public String object_or_array;
    public boolean empty;
    public long parse_time_nanoseconds;
    @SerializedName("validate")
    public boolean isValid;
    public int size;
}

By using the @SerializedName annotation, our name for the JSON validate field becomes isValid. Just remember that you only need to use the annotation when you change the field's name.

In order to call JSON Test's validate service, we follow the best practice of not doing it on the UI thread by using an async task. An async task has four steps: onPreExecute, doInBackground, onProgressUpdate, and onPostExecute. The doInBackground method happens on another thread. It allows us to wait for the JSON Test service to respond without triggering the dreaded application not responding error. You can see this in action in the following code:

@Override
protected String doInBackground(String... notUsed) {
    TestPojo tp = new TestPojo();
    Gson gson = new Gson();
    String result = null;

    try {
        String json = URLEncoder.encode(gson.toJson(tp), "UTF-8");
        String url = String.format("%s%s", Constants.JsonTestUrl, json);
        result = getStream(url);
    } catch (Exception ex){
        Log.v(Constants.LOG_TAG, "Error: " + ex.getMessage());
    }
    return result;
}

To encode our Java object, all we need to do is create an instance of the Gson class, then call its toJson method, passing an instance of the class we wish to serialize.

Deserialization is nearly as simple. In the onPostExecute method, we get the string of JSON from the web service. We then call the convertFromJson method, which does the conversion. First it makes sure that it got a valid string, then it does the conversion by calling Gson's fromJson method, passing the string and the class, as follows:

@Override
protected void onPostExecute(String result) {

    // convert JSON string to a POJO
    JsonValidate jv = convertFromJson(result);
    if (jv != null) {
        Log.v(Constants.LOG_TAG, "Conversion Succeed: " + result);
    } else {
        Log.v(Constants.LOG_TAG, "Conversion Failed");
    }
}

private JsonValidate convertFromJson(String result) {
    JsonValidate jv = null;
    if (result != null && result.length() > 0) {
        try {
            Gson gson = new Gson();
            jv = gson.fromJson(result, JsonValidate.class);
        } catch (Exception ex) {
            Log.v(Constants.LOG_TAG, "Error: " + ex.getMessage());
        }
    }
    return jv;
}

Conclusion

For most developers this is all you need to know. There is a complete guide to Gson at https://sites.google.com/site/gson/gson-user-guide. The complete source code for the test app is at https://github.com/Rockncoder/GsonTest.

Discover more Android tutorials and extra content on our Android page - find it here.

Part 2: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
8 min read
In part 1, we covered Capistrano and why you would use it. We also covered mixins, which provide the base for what we will do in this post: deploy a sample project using Capistrano. For this project, suppose our user interface is a combination of two applications, app1 and app2. They should be deployed to two servers, do and ec2, and we'll provide two environments, production and cert. Make sure Ruby and Bundler are installed before you start.

First, we create a new directory for our project and add a Gemfile to it with capistrano as a dependency. Then we create the Capistrano directory structure:

mkdir capsample
cd capsample
bundle init
echo "gem 'capistrano'" >> Gemfile
bundle
bundle exec cap install STAGES="do_prod_app1,do_prod_app2,do_cert_app1,do_cert_app2,ec2_prod_app1,ec2_prod_app2,ec2_cert_app1,ec2_cert_app2"

This will create eight files under config/deploy, one for each server/environment/application combination. This is just to demonstrate the idea; we'll completely override their content later on. It will also create a Capfile, which works in a similar way to a regular Rakefile. With Rake, you can get a list of the available tasks with rake -T. With Capistrano you can get the same using:

bundle exec cap -T

Behind the scenes, cap is a binary distributed with the capistrano gem that runs Rake with Capfile set as the Rakefile, supporting a few other options like --roles.

Now create a new file, lib/mixin.rb, with the content mentioned in the "Using mixins" section in part 1. Then add this to the top of the Capfile:

$:.unshift File.dirname(__FILE__)
require 'lib/mixin'

Each of the files under config/deploy will look very similar to the others. For instance, ec2_prod_app1 would look like this:

mixin 'servers/ec2'
mixin 'environments/production'
mixin 'applications/app1'

Then config/mixins/servers/ec2.rb would look like this:

server 'ec2.mydomain.com', roles: [:main]
set :database_host, 'ec2-db.mydomain.com'

This file contains definitions that are valid (or default) for the whole server, no matter what environment or application we're deploying. In this example the database host is shared by all applications and environments hosted on our ec2 server.

Something to note here is that we're adding a single role named main to our server. If we specified all roles, like [:web, :db, :assets, :puma], then they would be shared with all recipes relying on this server mixin. A better approach is to add them in the application's recipe, if required. For instance, you might want to add something like set :server_name, 'ec2.mydomain.com' to your server definitions. Then you can dynamically set the role in the application's recipe by calling role :db, [fetch(:server_name)], and so on for all required roles. However, this is usually not necessary for third-party recipes, as they let you decide which role the recipe should act on. For example, if you want to deploy your application with Puma you can write set :puma_role, :main.

Before we discuss a full example for the application recipe, let's look at what config/mixins/environments/production.rb might look like:

set :branch, 'production'
set :encoding_key, '098f6bcd4621d373cade4e832627b4f6'
set :database_name, 'app_production'
set :app1_port, 3000
set :app2_port, 3001
set :redis_port, 6379
set :solr_port, 8080

In this example, we're assuming that the ports for app1, app2, Redis, and Solr will be the same for production on all servers, as well as the database name.
Finally, the recipes themselves, which tell Capistrano how to set up an application, will be defined by config/mixins/applications/app1.rb. Here's an example for a simple Rails application:

Rake::Task['load:defaults'].invoke
Rake::Task['load:defaults'].clear
require 'capistrano/rails'
require 'capistrano/puma'
Rake::Task['load:defaults'].reenable
Rake::Task['load:defaults'].invoke

set :application, 'app1'
set :repo_url, 'git@example.com:me/app1.git'
set :rails_env, 'production'
set :assets_roles, :main
set :migration_role, :main
set :puma_role, :main
set :puma_bind, "tcp://0.0.0.0:#{fetch :app1_port}"

namespace :rails do
  desc 'Generate settings file'
  task :generate_settings do
    on roles(:all) do
      template = "config/templates/database.yml.erb"
      dbconfig = StringIO.new(ERB.new(File.read template).result binding)
      upload! dbconfig, release_path.join('config', 'database.yml')
    end
  end
end

before 'deploy:migrate', 'rails:generate_settings'

# Create directories expected by Puma default settings:
before 'puma:restart', 'create_log_and_tmp' do
  on roles(:all) do
    within shared_path do
      execute :mkdir, '-p', 'log', 'tmp/pids'
    end
  end
end

Make sure you remove the lines that set application and repo_url in the config/deploy.rb file generated by cap install. Also, if you're deploying a Rails application using this recipe, you should add the capistrano-rails and capistrano3-puma gems to your Gemfile and run bundle again. If you're using rbenv or rvm to install Ruby on the server, make sure you include either the capistrano-rbenv or capistrano-rvm gem and require it in the recipe. You may also need to provide more information in this case; for rbenv you'd need to tell it which version to use with set :rbenv_ruby, '2.1.2', for example.

Sometimes you'll find that some settings are valid for all applications under all environments on all servers. The most important one to notice is the location of our applications, as they must not conflict with each other. Another setting that could be shared across all combinations is the private key used to connect to all servers. For such cases, you should add those settings directly to config/deploy.rb:

set :deploy_to, -> { "/home/vagrant/apps/#{fetch :environment}/#{fetch :application}" }
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

I strongly recommend connecting to your servers with a regular account rather than root. For our applications we use rbenv to manage our Ruby versions, so we're able to deploy them as regular users as long as our applications listen on high port numbers. We then set up our proxy server (nginx in our case) to forward requests on ports 80 and 443 to each application's port according to the requested domains and paths. This is set up by some Chef recipes, which run as root on our servers.

To connect using another user, just pass it in the server declaration. To connect to vagrant@192.168.33.10, this is how you'd set it up:

server '192.168.33.10', user: 'vagrant', roles: [:main]
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

Finally, we create a config/database.yml suited for our environment on demand, before running the migrations task.
Here's what the template config/templates/database.yml.erb could look like:

production:
  adapter: postgresql
  encoding: unicode
  pool: 30
  database: <%= fetch :database_name %>
  host: <%= fetch :database_host %>

I've omitted the settings for app2, but if it were another Rails application, we could extract the common logic between them into another common_rails mixin.

Also notice that because we're not requiring capistrano/rails and capistrano/puma in the Capfile, their default values won't be set, as Capistrano has already invoked the load:defaults task before our mixins are loaded. That's why we clear that task, require the recipes, and then re-enable and re-run the task so that the defaults for those recipes have the opportunity to load. Another approach is to require those recipes directly in the Capfile. But unless the recipes are carefully crafted to only run their commands for very specific roles, you can get unexpected behavior if you deploy one application with Rails, another with Grails, and yet another with NodeJS. If any of them has commands that run for all roles, or if the role names between them conflict somehow, you'd be in trouble. So, unless you have total control and understanding of all your third-party recipes, I'd recommend that you use the approach outlined in the examples above.

Conclusion

All the techniques presented here are used to manage our real, complex scenario at e-Core, where we support multiple applications in lots of environments replicated across three servers. We found that this allowed us to quickly add new environments or servers as needed and recreate our application stack in no time. Also, I'd like to thank Juan Ibiapina, who worked with me on all these recipes to ensure our deployment procedures are fully automated. Almost: we still manage our databases and documents manually, because we prefer to.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of several gems, including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

MapReduce with OpenStack Swift and ZeroVM

Lars Butler
30 Jun 2014
6 min read
Originally coined in a 2004 research paper produced by Google, the term "MapReduce" was defined as a "programming model and an associated implementation for processing and generating large datasets". While Google's proprietary implementation of this model is known simply as "MapReduce", the term has since become overloaded to refer to any software that follows the same general computation model.

The philosophy behind the MapReduce computation model is based on a divide-and-conquer approach: the input dataset is divided into small pieces and distributed among a pool of computation nodes for processing. The advantage here is that many nodes can run in parallel to solve a problem. This can be much quicker than a single machine, provided that the cost of communication (loading input data, relaying intermediate results between compute nodes, and writing final results to persistent storage) does not outweigh the computation time. Indeed, the cost of moving data between storage and computation nodes can diminish the advantage of distributed parallel computation if task payloads are too granular. On the other hand, they must be granular enough to be distributed evenly (although achieving perfect distribution is nearly impossible if the exact computational complexity of each task cannot be known in advance).

A distributed MapReduce solution can be effective for processing large datasets, provided that each task is sufficiently long-running to offset communication costs. If the need for data transfer (between compute and storage nodes) could somehow be reduced or eliminated, the task size limitations would all but disappear, enabling a wider range of use cases. One way to achieve this is to perform the computation closer to the data, that is, on the same physical machine. Stored procedures in relational database systems, for example, are one way to achieve data-local computation. Simpler yet, computation could be run directly on the system where the data is stored in files. In both cases, the same problems are present: changes to the code require direct access to the system, and thus access needs to be carefully controlled (through group management, read/write permissions, and so on). Scaling in these scenarios is also problematic: vertical scaling (beefing up a single server) is the only feasible way to gain performance while maintaining the efficiency of data-local computation. Achieving horizontal scalability by adding more machines is not feasible unless the storage system inherently supports it.

Enter Swift and ZeroVM

OpenStack Swift is one such example of a horizontally scalable data store. Swift can store petabytes of data in millions of objects across thousands of machines. Swift's storage operations can also be extended by writing custom middleware applications, which makes it a good foundation for building a "smart" storage system capable of running computations directly on the data.

ZeroCloud is a middleware application for Swift that does just that. It embeds ZeroVM inside Swift and exposes a new API method for executing jobs. ZeroVM provides sufficient program isolation and a guarantee of security such that arbitrary code can be run safely without compromising the storage system. Access to data is governed by Swift's inherent access control, so there is no need to further lock down the storage system, even in a multi-tenant environment. The combination of Swift and ZeroVM results in something like stored procedures, but much more powerful.
First and foremost, user application code can be updated at any time without the need to worry about malicious or otherwise destructive side effects. (The worst thing that can happen is that a user crashes their own machines and deletes their own data, but not another user's.) Second, ZeroVM instances can start very quickly: in about five milliseconds. This means that to process one thousand objects, ZeroCloud can instantaneously spawn one thousand ZeroVM instances (one for each file), run the job, and destroy the instances very quickly. This also means that any job, big or small, can feasibly run on the system. Finally, the networking component of ZeroVM allows intercommunication between instances, enabling the creation of arbitrary job pipelines and multi-stage computations.

This converged compute and storage solution makes for a good alternative to popular MapReduce frameworks like Hadoop, because little effort is required to set up and tear down computation nodes for a job. Also, once a job is complete, there is no need to extract result data, pipe it across the network, and then save it in a separate persistent data store (which can be an expensive operation if there is a large quantity of data to save). With ZeroCloud, the results can simply be saved back into Swift. The same principle applies to the setup of a job: there is no need to move or copy data to the computation cluster; the data is already where it needs to be.

Limitations

Currently, ZeroCloud uses a fairly naïve scheduling algorithm. To perform a data-local computation on an object, ZeroCloud determines which machines in the cluster contain a replicated copy of the object and then randomly chooses one to execute a ZeroVM process. While this functions well as a proof of concept for data-local computing, it is quite possible for jobs to be spread unevenly across a cluster, resulting in an inefficient use of resources. A research group at the University of Texas at San Antonio (UTSA), sponsored by Rackspace, is currently working on better algorithms for scheduling workloads running on ZeroVM-based infrastructure. A short video of the research proposal can be found here: http://link.brightcove.com/services/player/bcpid3530100726001?bckey=AQ~~,AAACa24Pu2k~,Q186VLPcl3-oLBDP8npyqxjCNB5jgYcT.

Further reading

Rackspace has launched a ZeroCloud playground service called Zebra for developers to try out the platform. At the time this was written, Zebra was still in a private beta and invitations were limited, but it is also possible to install and run your own copy of ZeroCloud for testing and development; basic installation instructions are here: https://github.com/zerovm/zerocloud/blob/icehouse/doc/Hacking.md. There are also some tutorials (including sample applications) for creating, packaging, deploying, and executing applications on ZeroCloud: http://zerovm.readthedocs.org/en/latest/zebra/tutorial.html. The tutorials are intended for use with the Zebra service, but can be run on any deployment of ZeroCloud.

About the author

Lars Butler is a software developer at Rackspace, the open cloud company. He has worked as a software developer in avionics, seismic hazard research, and most recently, on ZeroVM. He can be reached @larsbutler.

Making the Most of Your Hadoop Data Lake, Part 1: Data Compression

Kristen Hardwick
30 Jun 2014
6 min read
In the world of big data, the Data Lake concept reigns supreme. Hadoop users are encouraged to keep all data in order to prepare for future use cases and as-yet-unknown data integration points. This concept is part of what makes Hadoop and HDFS so appealing, so it is important to make sure that the data is being stored in a way that prolongs that behavior. In the first part of this two-part series, "Making the Most of Your Hadoop Data Lake", we will address one important factor in improving manageability: data compression.

Data compression is an area that is often overlooked in the context of Hadoop. In many cluster environments, compression is disabled by default, putting the burden on the user. In this post, we will discuss the tradeoffs involved in deciding how to take advantage of compression techniques, and the advantages and disadvantages of specific compression codec options with respect to Hadoop.

To compress or not to compress

Whenever data is converted to something other than its raw format, that naturally implies some overhead in completing the conversion process. When data compression is being discussed, it is important to weigh that overhead against the benefits of reducing the data footprint.

One obvious benefit is that compressed data reduces the amount of disk space required to store a particular dataset. In the big data environment, this benefit is especially significant: either the Hadoop cluster will be able to keep data for a larger time range, or storing data for the same time range will require fewer nodes, or the disk usage ratios will remain lower for longer. In addition, the smaller file sizes will mean lower data transfer times, either internally for MapReduce jobs or when performing exports of data results.

The cost of these benefits, however, is that the data must be decompressed at every point where it needs to be read, and compressed before being inserted into HDFS. With respect to MapReduce jobs, this processing overhead at both the map phase and the reduce phase will increase the CPU processing time. Fortunately, by making informed choices about the specific compression codecs used at any given phase in the data transformation process, the cluster administrator or user can ensure that the advantages of compression outweigh the disadvantages.

Choosing the right codec for each phase

Hadoop provides the user with some flexibility on which compression codec is used at each step of the data transformation process. It is important to realize that certain codecs are optimal for some stages and non-optimal for others. In the next sections, we will cover some important notes for each choice.

zlib

The major benefit of this codec is that it is the easiest way to get the benefits of data compression from a cluster and job configuration standpoint: the zlib codec is the default compression option. From the data transformation perspective, this codec will decrease the data footprint on disk, but will not provide much of a benefit in terms of job performance.

gzip

The gzip codec available in Hadoop is the same one that is used outside of the Hadoop ecosystem. It is common practice to use it for compressing the final output from a job, simply for the benefit of being able to share the compressed result with others (possibly outside of Hadoop) using a standard file format.

bzip2

There are two important benefits of the bzip2 codec.
First, if reducing the data footprint is a high priority, this algorithm will compress the data more than the default zlib option. Second, this is the only supported codec that produces "splittable" compressed data. A major characteristic of Hadoop is the idea of splitting the data so that it can be handled on each node independently. With the other compression codecs, there is an initial requirement to gather all parts of the compressed file in order to have all the information necessary to decompress the data. With this format, the data can be decompressed in parallel. This splittable quality makes the format ideal for compressing data that will be used as input to a map function, either in a single step or as part of a series of chained jobs.

LZO, LZ4, Snappy

These three codecs are ideal for compressing intermediate data: the data output from the mappers that will be immediately read in by the reducers. All three codecs heavily favor compression speed over file size ratio, but the detailed specifications for each algorithm should be examined based on the specific licensing, cluster, and job requirements.

Enabling compression

Once the appropriate compression codec for any given transformation phase has been selected, there are a few configuration properties that need to be adjusted in order to have the changes take effect in the cluster.

Intermediate data to reducer:

mapreduce.map.output.compress = true
(Optional) mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.SnappyCodec

Final output from a job:

mapreduce.output.fileoutputformat.compress = true
(Optional) mapreduce.output.fileoutputformat.compress.codec = org.apache.hadoop.io.compress.BZip2Codec

These compression codecs are also available within some of the ecosystem tools like Hive and Pig. In most cases, the tools will default to the Hadoop-configured values for particular codecs, but the tools also provide the option to compress the data generated between steps.

Pig:

pig.tmpfilecompression = true
(Optional) pig.tmpfilecompression.codec = snappy

Hive:

hive.exec.compress.intermediate = true
hive.exec.compress.output = true

Conclusion

This post detailed the benefits and disadvantages of data compression, along with some helpful guidelines on how to choose a codec and enable it at various stages in the data transformation workflow. In the next post, we will go through some additional techniques that can be used to ensure that users can make the most of the Hadoop Data Lake.

About the author

Kristen Hardwick has been gaining professional experience with software development in parallel computing environments in the private, public, and government sectors since 2007. She has interfaced with several different parallel paradigms including Grid, Cluster, and Cloud. She started her software development career with Dynetics in Huntsville, AL, and then moved to Baltimore, MD, to work for Dynamics Research Corporation. She now works at Spry where her focus is on designing and developing big data analytics for the Hadoop ecosystem.
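As a supplement to the property settings in the Enabling compression section above, the same choices can also be made programmatically in a MapReduce driver. The following minimal sketch is not from the original article; it assumes the Hadoop 2.x mapreduce API, and the class and job names are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressionDriver {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate (map) output with Snappy.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compression-example");
        // Compress the final job output with bzip2 (the splittable option).
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
        return job;
    }
}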

Making the Most of Your Hadoop Data Lake, Part 2: Optimized File Formats

Kristen Hardwick
30 Jun 2014
5 min read
One major factor in making the conversion to Hadoop is the concept of the Data Lake. That idea suggests that users keep as much data as possible in HDFS in order to prepare for future use cases and as-yet-unknown data integration points. As your data grows, it is important to make sure that the data is being stored in a way that prolongs that behavior. Data compression is not the only technique that can be used to speed up job performance and improve cluster organization. In addition to the Text and SequenceFile options that are typically used by default, Hadoop offers a few more optimized file formats that are specifically designed to improve the process of interacting with the data. In the second part of this two-part series, "Making the Most of Your Hadoop Data Lake", we will address another important factor in improving manageability: optimized file formats.

Using a smarter file for your data: RCFile

RCFile stands for Record Columnar File, and it serves as an ideal format for storing relational data that will be accessed through Hive. This format offers performance improvements by storing the data in an optimized way. First, the data is partitioned horizontally, into groups of rows. Then each row group is partitioned vertically, into collections of columns. Finally, the data in each column collection is compressed and stored in column-row format, as if it were a column-oriented database.

The first benefit of this altered storage mechanism is apparent at the row level. All HDFS blocks used to form RCFiles are made up of the horizontally partitioned collections of rows. This is significant because it ensures that no row of data will be split across multiple blocks, and each row will therefore always be on the same machine. This is not the case for traditional HDFS file formats, which typically use data size to split the file. This optimized data storage reduces the amount of network bandwidth required to serve queries.

The second benefit comes from optimizations at the column level, in the form of disk I/O reduction. Since the columns are stored vertically within each row group, the system is able to seek directly to the required column position in the file, rather than being required to scan across all columns and filter out data that is not necessary. This is extremely useful in queries that only require access to a small subset of the existing columns.

RCFiles can be used natively in both Hive and Pig with very little configuration.

In Hive:

CREATE TABLE … STORED AS RCFILE;
ALTER TABLE … SET FILEFORMAT RCFILE;
SET hive.default.fileformat=RCFile;

In Pig:

register …/piggybank.jar;
a = LOAD '/user/hive/warehouse/table' USING org.apache.pig.piggybank.storage.hiverc.HiveRCInputFormat(…);

The Pig jar file referenced here is just one option for enabling the RCFile. At the time of writing, there was also an RCFilePigStorage class available through Twitter's Elephant Bird open source library.

Hortonworks' ORCFile and Cloudera's Parquet formats

RCFiles provide optimization for relational files primarily by implementing modifications at the storage level. New innovations have provided improvements on the RCFile format, namely the ORCFile format from Hortonworks and the Parquet format from Cloudera. When storing data using the Optimized Row Columnar (ORC) file or Parquet formats, several pieces of metadata are automatically written at the column level within each row group; for example, minimum and maximum values for numeric data types and dictionary-style metadata for text data types.
The specific metadata is also configurable. One use case would be to configure the dataset to be sorted on a given set of columns for efficient access. This extra metadata allows queries to take advantage of an improvement over the original RCFiles: predicate pushdown. That technique allows Hive to evaluate the WHERE clause during the record-gathering process, instead of filtering data after all records have been collected. The predicate pushdown technique evaluates the conditions of the query against the metadata associated with a particular row group, allowing it to skip over entire file blocks if possible, or to seek directly to the correct row. One major benefit of this process is that the more complex a particular WHERE clause is, the more potential there is for row groups and columns to be filtered out as irrelevant to the final result.

Cloudera's Parquet format is typically used in conjunction with Impala, but just like with RCFiles, ORCFiles can be incorporated into both Hive and Pig. HCatalog can be used as the primary method to read and write ORCFiles using Pig. The commands for Hive are provided below.

In Hive:

CREATE TABLE … STORED AS ORC;
ALTER TABLE … SET FILEFORMAT ORC;
SET hive.default.fileformat=Orc;

Conclusion

This post has detailed the alternatives to the default file formats that can be used in Hadoop in order to optimize data access and storage. This information, combined with the compression techniques described in the previous post (part 1), provides some guidelines that can be used to ensure that users can make the most of the Hadoop Data Lake.

About the author

Kristen Hardwick has been gaining professional experience with software development in parallel computing environments in the private, public, and government sectors since 2007. She has interfaced with several different parallel paradigms including Grid, Cluster, and Cloud. She started her software development career with Dynetics in Huntsville, AL, and then moved to Baltimore, MD, to work for Dynamics Research Corporation. She now works at Spry where her focus is on designing and developing big data analytics for the Hadoop ecosystem.
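As a supplementary illustration (not from the original article), the ORC DDL shown above can also be issued programmatically through HiveServer2's JDBC interface. The host, port, credentials, and table schema in this sketch are assumptions; adjust them for your cluster:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateOrcTable {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the host and database below are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "", "");
        try {
            Statement stmt = conn.createStatement();
            // Store the table as ORC so the column-level metadata
            // (min/max values, dictionaries) is written automatically.
            stmt.execute("CREATE TABLE page_views (user_id BIGINT, url STRING) STORED AS ORC");
            stmt.close();
        } finally {
            conn.close();
        }
    }
}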

Component Communication in React.js

Richard Feldman
30 Jun 2014
5 min read
You can get a long way in React.js solely by having parent components create child components with varying props, and having each component deal only with its own state. But what happens when a child wants to affect its parent's state or props? Or when a child wants to inspect that parent's state or props? Or when a parent wants to inspect its child's state? With the right techniques, you can handle communication between React components without introducing unnecessary coupling.

Child Elements Altering Parents

Suppose you have a list of buttons, and when you click one, a label elsewhere on the page updates to reflect which button was most recently clicked. Although any button's click handler can alter that button's state, the handler has no intrinsic knowledge of the label that we need to update. So how can we give it access to do what we need? The idiomatic approach is to pass a function through props. Like so:

var ExampleParent = React.createClass({
  getInitialState: function() {
    return {lastLabelClicked: "none"};
  },
  render: function() {
    var me = this;
    var setLastLabel = function(label) {
      me.setState({lastLabelClicked: label});
    };
    return <div>
      <p>Last clicked: {this.state.lastLabelClicked}</p>
      <LabeledButton label="Alpha Button" setLastLabel={setLastLabel}/>
      <LabeledButton label="Beta Button" setLastLabel={setLastLabel}/>
      <LabeledButton label="Delta Button" setLastLabel={setLastLabel}/>
    </div>;
  }
});

var LabeledButton = React.createClass({
  handleClick: function() {
    this.props.setLastLabel(this.props.label);
  },
  render: function() {
    return <button onClick={this.handleClick}>{this.props.label}</button>;
  }
});

Note that this does not actually affect the label's state directly; rather, it affects the parent component's state, and doing so will cause the parent to re-render the label as appropriate.

What if we wanted to avoid using state here, and instead modify the parent's props? Since props are externally specified, this would be a lot of extra work. Rather than telling the parent to change, the child would necessarily have to tell its parent's parent (its grandparent, in other words) to change that grandparent's child. This is not a route worth pursuing; besides being less idiomatic, there is no real benefit to changing the parent's props when you could change its state instead.

Inspecting Props

Once created, the only way for a child's props to "change" is for the child to be recreated when the parent's render method is called again. This helpfully guarantees that the parent's render method has all the information needed to determine the child's props, not only in the present, but for the indefinite future as well. Thus if another of the parent's methods needs to know the child's props, for example a click handler, it's simply a matter of making sure that data is available outside the parent's render method. An easy way to do this is to record it in the parent's state:

var ExampleComponent = React.createClass({
  handleClick: function() {
    var buttonStatus = this.state.buttonStatus;
    // ...do something based on buttonStatus
  },
  render: function() {
    // Pretend it took some effort to determine this value
    var buttonStatus = "btn-disabled";
    this.setState({buttonStatus: buttonStatus});
    return <button className={buttonStatus} onClick={this.handleClick}>
      Click this button!
    </button>;
  }
});

It's even easier to let a child know about its parent's props: simply have the parent pass along whatever information is necessary when it creates the child.
It's cleaner to pass along only what the child needs to know, but if all else fails you can go as far as to pass in the parent's entire set of props:

var ParentComponent = React.createClass({
  render: function() {
    return <ChildComponent parentProps={this.props} />;
  }
});

Inspecting State

State is trickier to inspect, because it can change on the fly. But is it ever strictly necessary for components to inspect each other's states, or might there be a universal workaround?

Suppose you have a child whose click handler cares about its parent's state. Is there any way we could refactor things such that the child could always know that value, without having to ask the parent directly? Absolutely! Simply have the parent pass the current value of its state to the child as a prop. Whenever the parent's state changes, it will re-run its render method, so the child (including its click handler) will automatically be recreated with the new prop. Now the child's click handler will always have up-to-date knowledge of the parent's state, just as we wanted.

Suppose instead that we have a parent that cares about its child's state. As we saw earlier with the buttons-and-labels example, children can affect their parents' states, so we can use that technique again here to refactor our way into a solution. Simply include in the child's props a function that updates the parent's state, and have the child incorporate that function into its relevant state changes. With the child thus keeping the parent's state up to speed on relevant changes to the child's state, the parent can obtain whatever information it needs simply by inspecting its own state.

Takeaways

Idiomatic communication between parent and child components can be easily accomplished by passing state-altering functions through props. When it comes to inspecting props and state, a combination of passing props on a need-to-know basis and refactoring state changes can ensure the relevant parties have all the information they need, whenever they need it.

About the Author

Richard Feldman is a functional programmer who specializes in pushing the limits of browser-based UIs. He's built a framework that performantly renders hundreds of thousands of shapes in HTML5 canvas, a writing web app that functions like a desktop app in the absence of an Internet connection, and much more in between.