C10K – A Non-blocking Web Server in Go

Packt
18 Jul 2014
17 min read
This article by Nathan Kozyra, author of Mastering Concurrency in Go, tackles one of the Internet's most famous challenges and attempts to solve it with core Go packages.

We've built a few usable applications: things we can start with and leapfrog into real systems for everyday use. In doing so, we've been able to demonstrate the basic and intermediate-level patterns involved in Go's concurrent syntax and methodology. However, it's about time we took on a real-world problem, one that has vexed developers (and their managers and VPs) for much of the early history of the Web. In addressing and, hopefully, solving this problem, we'll be able to develop a high-performance web server that can handle a very large volume of live, active traffic.

For many years, the only solution to this problem was to throw hardware or intrusive caching systems at it; so solving it with programming methodology alone should excite any programmer. We'll be using every technique and language construct we've learned so far, but in a more structured and deliberate way than before. Everything we've explored so far will come into play, including the following points:

- Creating a visual representation of our concurrent application
- Utilizing goroutines to handle requests in a way that will scale
- Building robust channels to manage communication between goroutines and the loop that will manage them
- Profiling and benchmarking tools (JMeter, ab) to examine the way our event loop actually works
- Timeouts and concurrency controls, when necessary, to ensure data and request consistency

Attacking the C10K problem

The genesis of the C10K problem is rooted in serial, blocking programming, which makes it ideal for demonstrating the strength of concurrent programming, especially in Go.
When Dan Kegel first posed the C10K question in 1999, serving 10,000 concurrent visitors was, for many server admins and engineers, something to be solved with hardware. The notion that a single server on common hardware could handle that kind of CPU and network bandwidth without falling over seemed foreign to most. The crux of his proposed solutions relied on producing non-blocking code. Of course, in 1999, concurrency patterns and libraries were not widespread. C++ had some polling and queuing options available via third-party libraries and the earliest predecessors of multithreaded syntax, later available through Boost and then C++11. Over the following years, solutions to the problem began pouring in across various languages, programming designs, and general approaches.

Any performance and scalability problem will ultimately be bound by the underlying hardware, so, as always, your mileage may vary. Squeezing 10,000 concurrent connections out of a 486 processor with 500 MB of RAM will certainly be more challenging than doing so on a barebones Linux server stacked with memory and multiple cores. It's also worth noting that a simple echo server could obviously handle more concurrent connections than a functional web server that returns larger amounts of data and accepts greater complexity in requests, sessions, and so on, as we'll be dealing with here.

Failing of servers at 10,000 concurrent connections

When the Web was born and the Internet commercialized, the level of interactivity was pretty minimal. If you're a graybeard, you may recall the transition from NNTP/IRC and the like, and how extraordinarily rudimentary the Web was. To address the basic proposition of [page request] → [HTTP response], the requirements on a web server in the early 1990s were pretty lenient.
Ignoring all of the error responses, header reading and setting, and other essential (but unrelated to the in → out mechanism) functions, the essence of the early servers was shockingly simple, at least compared to modern web servers.

The first web server was developed by the father of the Web, Tim Berners-Lee. Developed at CERN (as was WWW/HTTP itself), CERN httpd handled many of the things you would expect in a web server today; hunting through the code, you'll find a lot of notation that will remind you that the very core of the HTTP protocol is largely unchanged. Unlike most technologies, HTTP has had an extraordinarily long shelf life.

Written in C in 1990, it was unable to utilize many of the concurrency strategies available in languages such as Erlang. Frankly, doing so was probably unnecessary; the majority of web traffic was a matter of basic file retrieval and protocol. The meat and potatoes of a web server was not dealing with traffic, but rather dealing with the rules surrounding the protocol itself.

You can still access the original CERN httpd site and download the source code for yourself from http://www.w3.org/Daemon/. I highly recommend that you do so, both as a history lesson and as a way to see how the earliest web server addressed some of the earliest problems.

However, the Web in 1990 and the Web when the C10K question was first posed were two very different environments. By 1999, most sites had some level of secondary or tertiary latency introduced by third-party software, CGI, databases, and so on, all of which further complicated the matter. The notion of serving 10,000 flat files concurrently is a challenge in itself, but try doing so with them running on top of a Perl script that accesses a MySQL database without any caching layer, and the challenge is immediately exacerbated.
By the mid-1990s, the Apache web server had taken hold and largely controlled the market (by 2009, it had become the first server software to serve more than 100 million websites). Apache's approach was rooted heavily in the earliest days of the Internet. At its launch, connections were handled first in, first out. Soon, each connection was assigned a thread from a thread pool. There are two problems with the Apache model:

- Blocking connections can lead to a domino effect, wherein one or more slowly resolved connections can avalanche into inaccessibility
- Apache had hard limits on the number of threads/workers you could utilize, irrespective of hardware constraints

It's easy to see the opportunity here, at least in retrospect. A concurrent server that utilizes actors (Erlang), agents (Clojure), or goroutines (Go) seems to fit the bill perfectly. Concurrency does not solve the C10K problem by itself, but it absolutely provides a methodology that facilitates a solution.

The most notable and visible example of an approach to the C10K problem today is Nginx, which was developed using concurrency patterns widely available in C by 2002 to address, and ultimately solve, the C10K problem. Nginx today represents either the #2 or #3 web server in the world, depending on the source.

Using concurrency to attack C10K

There are two primary approaches to handling a large volume of concurrent requests. The first involves allocating a thread per connection. This is what Apache (and a few others) do. On the one hand, allocating a thread to a connection makes a lot of sense: it's isolated, controllable via the application's and kernel's context switching, and can scale with increased hardware. One problem for Linux servers, on which the majority of the Web lives, is that each allocated thread reserves 8 MB of memory for its stack by default. This can (and should) be redefined, but at the default it imposes a largely unattainable amount of memory for a single server.
Even if you set the default stack size to 1 MB, we're dealing with a minimum of 10 GB of memory just to handle the overhead. This is an extreme example that's unlikely to be a real issue, for a couple of reasons: first, because you can dictate the maximum amount of resources available to each thread, and second, because you can just as easily load balance across a few servers and instances rather than add 10 GB to 80 GB of RAM. Even in a threaded server environment, though, we're fundamentally bound to an issue that can lead to performance decreases (to the point of a crash).

First, let's look at a server with connections bound to threads (as shown in the following diagram), and visualize how this can lead to logjams and, eventually, crashes:

This is obviously what we want to avoid. Any I/O, network, or external process that imposes a slowdown can bring about the avalanche effect we talked about, such that our available threads are taken (or backlogged) and incoming requests begin to stack up. We can spawn more threads in this model, but as mentioned earlier, there are potential risks there too, and even this will fail to mitigate the underlying problem.

Taking another approach

In an attempt to create a web server that can handle 10,000 concurrent connections, we'll obviously leverage our goroutine/channel mechanism to put an event loop in front of our content delivery, keeping new channels recycled or created constantly.

For this example, we'll assume we're building a corporate website and infrastructure for a rapidly expanding company. To do this, we'll need to be able to serve both static and dynamic content. The reason we want to introduce dynamic content is not just for the purposes of demonstration; we want to challenge ourselves to show 10,000 truly concurrent connections even when a secondary process gets in the way. As always, we'll attempt to map our concurrency strategy directly to goroutines and channels.
In a lot of other languages and applications, this is directly analogous to an event loop, and we'll approach it as such. Within our loop, we'll manage the available goroutines, expire or reuse completed ones, and spawn new ones where necessary. In this example visualization, we show how an event loop (and corresponding goroutines) allows us to scale our connections without employing too many hard resources, such as CPU threads or RAM:

The most important step for us here is to manage that event loop. We'll want to create an open, infinite loop to manage the creation and expiration of our goroutines and their respective channels. As part of this, we will also want to do some internal logging of what's happening, both for benchmarking and for debugging our application.

Building our C10K web server

Our web server will be responsible for taking requests, routing them, and serving either flat files or dynamic files with templates parsed against a few different data sources. As mentioned earlier, if we exclusively served flat files and removed much of the processing and network latency, we'd have a much easier time handling 10,000 concurrent connections. Our goal is to approach as much of a real-world scenario as we can; very little of the Web operates on a single server in a static fashion. Most websites and applications utilize databases, CDNs (Content Delivery Networks), dynamic and uncached template parsing, and so on. We need to replicate them whenever possible.
For the sake of simplicity, we'll separate our content by type and filter it through URL routing, as follows:

- /static/[request]: This will serve request.html directly
- /template/[request]: This will serve request.tpl after it's been parsed through Go
- /dynamic/[request][number]: This will also serve request.tpl, parsing it against a database source's record

By doing this, we should get a better mixture of possible HTTP request types that could impede the ability to serve large numbers of users simultaneously, especially in a blocking web server environment.

You can find Go's exceptional library for generating safe data-driven templates at http://golang.org/pkg/html/template/. By safe, we're largely referring to the ability to accept data and move it directly into templates without worrying about the sort of injection issues that are behind a large amount of malware and cross-site scripting.

For the database source, we'll use MySQL here, but feel free to experiment with other databases if you're more comfortable with them. As with the html/template package, we're not going to put a lot of time into outlining MySQL and/or its variants.

Benchmarking against a blocking web server

It's only fair to get some starting benchmarks against a blocking web server first, so that we can measure the effect of concurrent versus nonconcurrent architecture. For our starting benchmarks, we'll eschew any framework and go with our old stalwart, Apache.

For the sake of completeness, we'll be using an Intel i5 3 GHz machine with 8 GB of RAM. While we'll benchmark our final product on Ubuntu, Windows, and OS X, we'll focus on Ubuntu for our example. Our localhost domain will have three plain HTML files in /static, each trimmed to 80 KB. As we're not using a framework, we don't need to worry about raw dynamic requests, but only about static and dynamic requests in addition to data source requests.
For all examples, we'll use a MySQL database (named master) with a table called articles that will contain 10,000 duplicate entries. Our structure is as follows:

CREATE TABLE articles (
  article_id INT NOT NULL AUTO_INCREMENT,
  article_title VARCHAR(128) NOT NULL,
  article_text VARCHAR(128) NOT NULL,
  PRIMARY KEY (article_id)
);

With ID indexes ranging sequentially from 0-10,000, we'll be able to generate random number requests, but for now, we just want to see what kind of basic response we can get out of Apache serving static pages with this machine.

For this test, we'll use Apache's ab tool and then gnuplot to sequentially map the request time against the number of concurrent requests and pages; we'll do this for our final product as well, but we'll also go through a few other benchmarking tools to get some better detail.

Apache's ab tool comes with the Apache web server itself. You can read more about it at http://httpd.apache.org/docs/2.2/programs/ab.html. You can download Apache for Linux, Windows, OS X, and more from http://httpd.apache.org/download.cgi. The gnuplot utility is available for the same operating systems at http://www.gnuplot.info/.

So, let's see how we did. Have a look at the following graph:

Ouch! Not even close. There are things we can do to tune the connections available (and the respective threads/workers) within Apache, but this is not really our goal. Mostly, we want to know what happens with an out-of-the-box Apache server. In these benchmarks, we start to drop or refuse connections at around 800 concurrent connections. More troubling is that as these requests start stacking up, we see some that exceed 20 seconds or more. When this happens in a blocking server, each request behind it is queued; requests behind that are similarly queued, and the entire thing starts to fall apart. Even if we cannot hit 10,000 concurrent connections, there's a lot of room for improvement.
While a single server of any capacity is no longer the way we expect a web server environment to be designed, being able to squeeze as much performance as possible out of that server, ostensibly with our concurrent, event-driven approach, should be our goal.

Handling requests

The Gorilla toolkit certainly makes this easier, but we should also know how to intercept the functionality to impose our own custom handler. Here is a simple web router wherein we handle and direct requests using a custom http.Server struct, as shown in the following code:

var routes []string

type customRouter struct {
}

func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
  fmt.Println(r.URL.Path)
}

func main() {
  var cr customRouter

  server := &http.Server{
    Addr:           ":9000",
    Handler:        cr,
    ReadTimeout:    10 * time.Second,
    WriteTimeout:   10 * time.Second,
    MaxHeaderBytes: 1 << 20,
  }

  server.ListenAndServe()
}

Here, instead of using a built-in URL routing muxer and dispatcher, we're creating a custom server and a custom handler type to accept URLs and route requests. This allows us to be a little more robust with our URL handling. In this case, we created a basic, empty struct called customRouter and passed it to our custom server creation call. We can add more elements to our customRouter type, but we really don't need to for this simple example. All we need is to be able to access the URLs and pass them along to handler functions. We'll have three: one for static content, one for dynamic content, and one for dynamic content from a database.

Before we go that far, though, we should probably see what an absolute barebones HTTP server written in Go does when presented with the same traffic that we sent Apache's way. By old school, we mean that the server will simply accept a request and pass along a static, flat file.
You could do this using a custom router as we did earlier, taking requests, opening files, and then serving them, but Go provides a much simpler way to handle this basic task in the http.FileServer method. So, to get some benchmarks for the most basic of Go servers against Apache, we'll utilize a simple FileServer and test it against a test.html page (which contains the same 80 KB file that we had with Apache).

As our goal with this test is to improve our performance in serving flat and dynamic pages, the actual specs of the test suite are somewhat immaterial. We'd expect that while the metrics will not match from environment to environment, we should see a similar trajectory. That said, it's only fair that we supply the environment used for these tests; in this case, we used a MacBook Air with a 1.4 GHz i5 processor and 4 GB of memory.

First, we'll do this with the absolute best performance we got out of the box with Apache, which had 850 concurrent connections and 900 total requests.

The results are certainly encouraging as compared to Apache. Neither of our test systems was tweaked much (Apache as installed, and a basic FileServer in Go), but Go's FileServer handles 1,000 concurrent connections without so much as a blip, with the slowest clocking in at 411 ms. Apache has made a great number of strides pertaining to concurrency and performance options in the last five years, but getting there does require a bit of tuning and testing. The intent of this experiment is not to denigrate Apache, which is well tested and established. Instead, it's to compare the out-of-the-box performance of the world's number 1 web server against what we can do with Go.

To really get a baseline of what we can achieve in Go, let's see if Go's FileServer can hit 10,000 connections on a single, modest machine out of the box:

ab -n 10500 -c 10000 -g test.csv http://localhost:8080/a.html

We will get the following output:

Success!
Go's FileServer by itself will easily handle 10,000 concurrent connections, serving flat, static content. Of course, this is not the goal of this particular project; we'll be implementing real-world obstacles such as template parsing and database access, but this alone should show you the kind of starting point Go provides for anyone who needs a responsive server that can handle a large quantity of basic web traffic.

Routing requests

So, let's take a step back and look again at routing our traffic through a traditional web server to include not only our static content, but also the dynamic content. We'll want to create three functions that will route traffic from our customRouter:

- serveStatic(): This will read and serve a flat file
- serveRendered(): This will parse a template to display
- serveDynamic(): This will connect to MySQL, apply data to a struct, and parse a template

To take our requests and reroute them, we'll change the ServeHTTP method of our customRouter struct to handle three regular expressions. For the sake of brevity and clarity, we'll only be returning data on our three possible requests. Anything else will be ignored. In a real-world scenario, we can take this approach to aggressively and proactively reject connections for requests we think are invalid. This would include spiders and nefarious bots and processes, which offer no real value as nonusers.

HDFS and MapReduce

Packt
17 Jul 2014
11 min read
Essentials of HDFS

HDFS is a distributed filesystem that has been designed to run on top of a cluster of industry-standard hardware. The architecture of HDFS is such that there is no specific need for high-end hardware. HDFS is a highly fault-tolerant system and can handle failures of nodes in a cluster without loss of data. The primary goal behind the design of HDFS is to serve large data files efficiently. HDFS achieves this efficiency and high throughput in data transfer by enabling streaming access to the data in the filesystem.

The following are the important features of HDFS:

- Fault tolerance: Many computers working together as a cluster are bound to have hardware failures. Hardware failures such as disk failures, network connectivity issues, and RAM failures could disrupt processing and cause major downtime. This could lead to data loss as well as slippage of critical SLAs. HDFS is designed to withstand such hardware failures by detecting faults and taking recovery actions as required. The data in HDFS is split across the machines in the cluster as chunks of data called blocks. These blocks are replicated across multiple machines of the cluster for redundancy. So, even if a node/machine becomes completely unusable and shuts down, the processing can go on with the copy of the data present on the nodes where the data was replicated.
- Streaming data: Streaming access enables data to be transferred in the form of a steady and continuous stream. This means that if data from a file in HDFS needs to be processed, HDFS starts sending the data as it reads the file and does not wait for the entire file to be read. The client consuming this data starts processing it immediately, as it receives the stream from HDFS. This makes data processing really fast.
- Large data store: HDFS is used to store large volumes of data. HDFS functions best when the individual files stored are large, rather than a large number of small files. File sizes in most Hadoop clusters range from gigabytes to terabytes. The storage scales linearly as more nodes are added to the cluster.
- Portable: HDFS is a highly portable system. Since it is built on Java, any machine or operating system that can run Java should be able to run HDFS. Even at the hardware layer, HDFS is flexible and runs on most of the commonly available hardware platforms. Most production-level clusters have been set up on commodity hardware.
- Easy interface: The HDFS command-line interface is very similar to that of any Linux/Unix system, and the commands are similar in most cases. So, if one is comfortable performing basic file actions in Linux/Unix, using commands with HDFS should be very easy.

The following two daemons are responsible for operations on HDFS:

- Namenode
- Datanode

The namenode and datanode daemons talk to each other via TCP/IP.

Configuring HDFS

All HDFS-related configuration is done by adding/updating the properties in the hdfs-site.xml file, which is found in the conf folder under the Hadoop installation folder. The following are the different properties that are part of the hdfs-site.xml file:

dfs.namenode.servicerpc-address: This specifies the unique namenode RPC address in the cluster. Services/daemons such as the secondary namenode and datanode daemons use this address to connect to the namenode daemon whenever they need to communicate. This property is shown in the following code snippet:

<property>
  <name>dfs.namenode.servicerpc-address</name>
  <value>node1.hcluster:8022</value>
</property>

dfs.namenode.http-address: This specifies the URL that can be used to monitor the namenode daemon from a browser.
This property is shown in the following code snippet:

<property>
  <name>dfs.namenode.http-address</name>
  <value>node1.hcluster:50070</value>
</property>

dfs.replication: This specifies the replication factor for data block replication across the datanode daemons. The default is 3, as shown in the following code snippet:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

dfs.blocksize: This specifies the block size. In the following example, the size is specified in bytes (134,217,728 bytes is 128 MB):

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>

fs.permissions.umask-mode: This specifies the umask value that will be used when creating files and directories in HDFS. This property is shown in the following code snippet:

<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
</property>

The read/write operational flow in HDFS

To get a better understanding of HDFS, we need to understand the flow of operations for the following two scenarios:

- A file is written to HDFS
- A file is read from HDFS

HDFS uses a single-write, multiple-read model, where files are written once and read several times. The data cannot be altered once written. However, data can be appended to a file by reopening it. All files in HDFS are saved as data blocks.

Writing files in HDFS

The following sequence of steps occurs when a client tries to write a file to HDFS:

- The client informs the namenode daemon that it wants to write a file.
- The namenode daemon checks whether the file already exists. If it does, an appropriate message is sent back to the client. If it does not, the namenode daemon makes a metadata entry for the new file.
- The file to be written is split into data packets at the client end, and a data queue is built. The packets in the queue are then streamed to the datanodes in the cluster. The list of datanodes is given by the namenode daemon and is prepared based on the data replication factor configured.
- A pipeline is built to perform the writes to all datanodes provided by the namenode daemon. The first packet from the data queue is transferred to the first datanode daemon. The block is stored on the first datanode daemon and is then copied to the next datanode daemon in the pipeline. This process goes on until the packet is written to the last datanode daemon in the pipeline. The sequence is repeated for all the packets in the data queue.
- For every packet written on a datanode daemon, a corresponding acknowledgement is sent back to the client.
- If a packet fails to write to one of the datanodes, that datanode daemon is removed from the pipeline and the remainder of the packets is written to the good datanodes. The namenode daemon notices the under-replication of the block and arranges for another datanode daemon where the block can be replicated.
- After all the packets are written, the client performs a close action, indicating that the packets in the data queue have been completely transferred. The client informs the namenode daemon that the write operation is now complete.

The following diagram shows the data block replication process across the datanodes during a write operation in HDFS:

Reading files in HDFS

The following steps occur when a client tries to read a file in HDFS:

- The client contacts the namenode daemon to get the locations of the data blocks of the file it wants to read.
- The namenode daemon returns the list of addresses of the datanodes for the data blocks. For any read operation, HDFS tries to return the node with the data block that is closest to the client. Here, closest refers to network proximity between the datanode daemon and the client.
- Once the client has the list, it connects to the closest datanode daemon and starts reading the data block using a stream.
After the block is read completely, the connection to the datanode is terminated, the datanode daemon that hosts the next block in the sequence is identified, and that data block is streamed. This goes on until the last data block of the file is read. The following diagram shows the read operation of a file in HDFS:

Understanding the namenode UI

Hadoop provides web interfaces for each of its services. The namenode UI, or the namenode web interface, is used to monitor the status of the namenode and can be accessed using the following URL:

http://<namenode-server>:50070/

The namenode UI has the following sections:

Overview: The general information section provides basic information about the namenode, with options to browse the filesystem and the namenode logs. The following is a screenshot of the Overview section of the namenode UI:

The Cluster ID parameter displays the identification number of the cluster. This number is the same across all the nodes within the cluster.

A block pool is a set of blocks that belong to a single namespace. The Block Pool Id parameter is used to segregate the block pools in case there are multiple namespaces configured when using HDFS federation. In HDFS federation, multiple namenodes are configured to scale the name service horizontally. These namenodes are configured to share datanodes amongst themselves. We will be exploring HDFS federation in detail a bit later.

Summary: The following is a screenshot of the cluster's summary section from the namenode web interface:

Under the Summary section, the first parameter relates to the security configuration of the cluster. If Kerberos (the authorization and authentication system used in Hadoop) is configured, the parameter will show as Security is on; if not, it will show as Security is off. The next parameter displays information related to files and blocks in the cluster. Along with this information, the heap and non-heap memory utilization is also displayed.
The other parameters displayed in the Summary section are as follows:

- Configured Capacity: This displays the total capacity (storage space) of HDFS.
- DFS Used: This displays the total space used in HDFS.
- Non DFS Used: This displays the amount of space used by other files that are not part of HDFS. This is the space used by the operating system and other files.
- DFS Remaining: This displays the total space remaining in HDFS.
- DFS Used%: This displays the total HDFS space utilization as a percentage.
- DFS Remaining%: This displays the total HDFS space remaining as a percentage.
- Block Pool Used: This displays the total space utilized by the current namespace.
- Block Pool Used%: This displays the total space utilized by the current namespace as a percentage. As you can see in the preceding screenshot, in this case, the value matches that of the DFS Used% parameter. This is because there is only one namespace (one namenode) and HDFS is not federated.
- DataNodes usages% (Min, Median, Max, stdDev): This displays the usages across all datanodes in the cluster. This helps administrators identify unbalanced nodes, which may occur when data is not uniformly placed across the datanodes. Administrators have the option to rebalance the datanodes using a balancer.
- Live Nodes: This link displays all the datanodes in the cluster, as shown in the following screenshot:
- Dead Nodes: This link displays all the datanodes that are currently in a dead state in the cluster. A datanode daemon is considered dead when it is not running or has not sent a heartbeat message to the namenode daemon. Datanodes are unable to send heartbeats if there is a network connection issue between the machines that host the datanode and namenode daemons. Excessive swapping on the datanode machine can also make the machine unresponsive, which prevents the datanode daemon from sending heartbeats.
- Decommissioning Nodes: This link lists all the datanodes that are being decommissioned.
- Number of Under-Replicated Blocks: This represents the number of blocks that have not been replicated as per the replication factor configured in the hdfs-site.xml file.

Namenode Journal Status: The journal status provides location information of the fsimage file and the state of the edits logfile. The following screenshot shows the NameNode Journal Status section:

NameNode Storage: The namenode storage table provides the location of the fsimage file along with the type of the location. In this case, it is IMAGE_AND_EDITS, which means the same location is used to store the fsimage file as well as the edits logfile. The other types of locations are IMAGE, which stores only the fsimage file, and EDITS, which stores only the edits logfile. The following screenshot shows the NameNode Storage information:
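The capacity figures in the Summary section are related by simple arithmetic: Configured Capacity is the sum of DFS Used, Non DFS Used, and DFS Remaining, and the percentage parameters are ratios against the configured capacity. A quick sketch of these relationships (the sample values are illustrative, not taken from a real cluster):

```python
# Illustrative figures in gigabytes (not from a real cluster).
dfs_used = 120.0
non_dfs_used = 30.0
dfs_remaining = 850.0

# Configured Capacity = DFS Used + Non DFS Used + DFS Remaining
configured_capacity = dfs_used + non_dfs_used + dfs_remaining

# DFS Used% and DFS Remaining% are ratios against the configured capacity
dfs_used_pct = 100.0 * dfs_used / configured_capacity
dfs_remaining_pct = 100.0 * dfs_remaining / configured_capacity

print(f"Configured Capacity: {configured_capacity:.1f} GB")  # 1000.0 GB
print(f"DFS Used%: {dfs_used_pct:.1f}%")                     # 12.0%
print(f"DFS Remaining%: {dfs_remaining_pct:.1f}%")           # 85.0%
```

With a single, non-federated namenode, Block Pool Used% covers the same blocks as DFS Used%, which is why the two values match in the Summary section described above.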
Packt
17 Jul 2014
9 min read

Keeping the Site Secure

(For more resources related to this topic, see here.)

Choosing a web host that meets your security requirements

In this article, you'll learn what you, as the site administrator, can do to keep your site safe. However, there are also some basic but critical security measures that your web hosting company should take. You'll probably have a shared hosting account, where multiple sites are hosted on one server computer and each site has access to its part of the available disk space and resources. Although this is much cheaper than hiring a dedicated server to host your site, it does involve some security risks. Good web hosting companies take precautions to minimize these risks. When selecting your web host, it's worth checking whether they have a good reputation for keeping their security up to standard. The official Joomla resources site (http://resources.joomla.org/directory/support-services/hosting.html) features hosting companies that fully meet the security requirements of a typical Joomla-powered site.

Tip 1 – Download from reliable sources

To avoid installing corrupted versions, it's a good idea to download the Joomla software itself only from the official website (www.joomla.org) or from reliable local Joomla community sites. This is also true when downloading third-party extensions. Use only extensions that have a good reputation; you can check the reviews at www.extensions.joomla.org. Preferably, download extensions only from the original developer's website or from Joomla community websites that have a good reputation.

Tip 2 – Update regularly

The Joomla development team regularly releases updates to fix bugs and security issues. Fortunately, Joomla makes keeping your site up to date effortless. In the backend control panel, you'll find a MAINTENANCE section in the quick icons column displaying the current status of both the Joomla software itself and of installed extensions.
This is shown in the following screenshot: If updates are found, the quick icon text will prompt you to update by adding an Update now! link, as shown in the following screenshot:

Clicking on this link takes you to the Joomla! Update component (which can also be found by navigating to Components | Joomla! Update). In this window, you'll see the details of the update. Just click on Install the update; the process is fully automated.

Before you upgrade Joomla to an updated version, it's a good idea to create a backup of your current site. If anything goes wrong, you can quickly have it up and running again. See the Tip 6 – Protect files and directories section in this article for more information on creating backups.

If updates for installed extensions are available, you'll also see a notice stating this in the MAINTENANCE section of the control panel. However, you can also check for updates manually: in the backend, navigate to Extensions | Extension Manager and click on Update in the menu on the left-hand side. Click on Find Updates to search for updates. After you've clicked on the Find Updates button, you'll see a notice informing you whether updates are available. Select the update you want to install and click on the Update button. Be patient; you may not see much happening for a while. After completion, a message is displayed stating that the available updates have been installed successfully.

The update functionality only works for extensions that support it. This feature is expected to be widely supported by extension developers, but for other extensions, you'll still have to check for updates manually by visiting the extension developer's website.

The Joomla update packages are stored in your website's tmp directory before the update is installed on your site. After installation, you can remove these files from the tmp directory to avoid running into disk space problems.
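Clearing those leftover update packages can be scripted. The sketch below simulates a Joomla tmp directory so that it runs anywhere; in practice you would point JOOMLA_TMP at your site's real tmp path instead:

```shell
# Simulate a Joomla tmp directory (in practice: JOOMLA_TMP=/path/to/your/site/tmp)
JOOMLA_TMP=$(mktemp -d)
touch "$JOOMLA_TMP/pkg_joomla_update.zip" "$JOOMLA_TMP/install_51abc2.tmp"

# Remove everything inside tmp/, but keep the directory itself
find "$JOOMLA_TMP" -mindepth 1 -delete

# The directory is now empty
ls -A "$JOOMLA_TMP"
```

The `-mindepth 1` option keeps `find` from deleting the tmp directory itself, so Joomla can keep using it for future updates.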
Tip 3 – Choose a safe administrator username

When you install Joomla, you also choose and enter a username for your login account (the critical account of the almighty Super User). Although you can enter any administrator username when installing Joomla, many people enter a generic administrator username, for example, admin. However, this username is far too common, and therefore poses a security risk; hackers only have to guess your password to gain access to your site. If you haven't come up with something different during the installation process, you can change the administrator username later on, using the following steps:

1. In the backend of the site, navigate to Users | User Manager.
2. Select the Super User user record.
3. In the Edit Profile screen, enter a new Login Name. Be creative!
4. Click on Save & Close to apply changes.
5. Log out and log in to the backend with the new username.

Tip 4 – Pick a strong password

Pick an administrator password that isn't easy to guess. It's best to have a password that's not too short; 8 or more characters is fine. Use a combination of uppercase letters, lowercase letters, numbers, and special characters. This should guarantee a strong password. Don't use the same username and password you use for other online accounts, and regularly change your password. You can create a new password anytime in the backend User Manager section, in the same way as you enter or change a username (see Tip 3 – Choose a safe administrator username).

Tip 5 – Use Two-Factor authentication

By default, logging in to the administrative interface of your site just requires the correct combination of username and password. A much more secure way to log in is using the Two-Factor authentication system, a recent addition to Joomla. It requires you to log in with not just your username and password, but also with a six-digit security code.
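That six-digit code is a time-based one-time password (TOTP, RFC 6238), the scheme implemented by apps such as Google Authenticator: a shared secret and the current 30-second time window are hashed into a short code. A minimal sketch of the derivation, using the RFC's published test secret rather than a real account key:

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, for_time: float = None, step: int = 30, digits: int = 6) -> str:
    """Derive a TOTP code from a shared secret and a 30-second time window."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time) // step
    # HMAC-SHA1 over the big-endian 8-byte counter (RFC 4226 / RFC 6238)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: take 4 bytes at an offset given by the last nibble
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test secret; at t=59s the expected 6-digit code is 287082
print(totp(b"12345678901234567890", for_time=59))
```

Because the counter changes every 30 seconds, a captured code becomes useless moments later, which is exactly why the app keeps rotating it.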
To get this code, you need to use the Google Authenticator app, which is available for most Android, iOS, Blackberry, Windows 8, and Windows Mobile devices. The app doesn't store the six-digit code; it changes the code every 30 seconds. Two-Factor authentication is a great solution, but it does require a little extra effort every time you log in to your site. You need to have the app ready every time you log in to generate a new access code. However, you can selectively choose which users will require this system. You can decide, for example, that only the site administrator has to log in using Two-Factor authentication.

Enabling the Two-Factor authentication system of Joomla!

To enable Joomla's Two-Factor authentication, where a user has to enter an additional secret key when logging in to the site, follow these steps:

1. To use this system on your site, first enable the Two-Factor authentication plugin that comes with Joomla. In the Joomla backend, navigate to Extensions | Plugin Manager, select Two Factor Authentication – Google Authenticator, and enable it.
2. Next, download and install the Google Authenticator app for your device.
3. Once this is set up, you can set the authentication procedure for any user in the Joomla backend. In the user manager section, click on the account name. Then, click on the Two Factor Authentication tab and select Google Authenticator from the Authentication method dropdown. This is shown in the following screenshot:
4. Joomla will display the account name and the key for this account. Enter these account details in the Google Authenticator app on your device. The app will generate a security code.
5. In the Joomla backend, enter this code in the Security Code field, as shown in the following screenshot:
6. Save your changes.

From now on, the Joomla login screen will ask you for your username, password, and a secret key. This is shown in the following screenshot: There are other secure login systems available besides the Google Authenticator app.
Joomla also supports Yubico's YubiKey, a USB stick that generates an additional password every time it is used. After entering their usual password, the user inserts the YubiKey into the USB port of the computer and presses the YubiKey button. Automatically, the extra one-time password will be entered in Joomla's Secret Key field. For more information on YubiKey, visit http://www.yubico.com.

Tip 6 – Protect files and directories

Obviously, you don't want everybody to be able to access the Joomla files and folders on the web server. You can protect files and folders by setting access permissions using the Change Mode (CHMOD) commands. Basically, the CHMOD settings tell the web server who has access to a file or folder and who is allowed to read it, write to it, or execute it (run it as a program). Once your Joomla site is set up and everything works OK, you can use CHMOD to change permissions. You don't use Joomla to change the CHMOD settings; these are set with FTP software. You can make this work using the following steps:

1. In your FTP program, right-click on the name of the file or directory you want to protect. In this example, we'll use the open source FTP program FileZilla.
2. In the right-click menu, select File Permissions. You'll be presented with a pop-up screen. Here, you can check permissions and change them by selecting the appropriate options, as shown in the following screenshot:

As you can see, it's possible to set permissions for the file owner (that's you), for group members (that's likely to be only you too), and for the public (everyone else). The public permissions are the tricky part; you should restrict public permissions as much as possible. When changing the permission settings, the file permissions number (the value in the Numeric value: field in the previous screenshot) will change accordingly. Every combination of settings has its particular number. In the previous example, this number is 644.
Click on OK to execute the CHMOD command and set file permissions.

Setting file permissions

What files should you protect and what CHMOD settings should you choose? Here are a few pointers:

- By default, permissions for files are set to 644. Don't change this; it's a safe value.
- For directories, a safe setting is 750 (which doesn't allow any public access). However, some extensions may need access to certain directories. In such cases, the 750 setting may result in error messages. In this case, set permissions to 755.
- Never leave permissions for a file or directory set to 777. This allows everybody to write data to it.

You can also block direct access to critical directories using a .htaccess file. This is a special file containing instructions for the Apache web server. Among other things, it tells the web server who's allowed access to the directory contents. You can add a .htaccess file to any folder on the server by using specific instructions. This is another way to instruct the web server to restrict access. Check the Joomla security documentation at www.joomla.org for instructions.
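The numeric values used above (644, 750, 755, 777) encode the owner, group, and public rwx flags as octal digits, with r=4, w=2, and x=1. A small sketch of the mapping:

```python
# Each rwx triplet maps to one octal digit: r=4, w=2, x=1.
def numeric(perms: str) -> str:
    """Convert a 9-character permission string (owner, group, public) to its CHMOD number."""
    assert len(perms) == 9
    digits = []
    for i in range(0, 9, 3):
        triplet = perms[i:i + 3]
        value = (4 if triplet[0] == "r" else 0) \
              + (2 if triplet[1] == "w" else 0) \
              + (1 if triplet[2] == "x" else 0)
        digits.append(str(value))
    return "".join(digits)

print(numeric("rw-r--r--"))  # 644 -- the safe default for files
print(numeric("rwxr-x---"))  # 750 -- a directory with no public access
print(numeric("rwxr-xr-x"))  # 755 -- directories some extensions need
print(numeric("rwxrwxrwx"))  # 777 -- never leave this set
```

Reading the digits left to right gives the owner, group, and public permissions, which is why restricting the last digit matters most.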
Packt
16 Jul 2014
13 min read

Service Invocation

Service Invocation

Time for action – creating the book warehousing process

Let's create the BookWarehousingBPEL BPEL process:

1. We will open the SOA composite by double-clicking on the Bookstore in the process tree. In the composite, we can see both Bookstore BPEL processes and their WSDL interfaces.
2. We will add a new BPEL process by dragging-and-dropping the BPEL Process service component from the right-hand side toolbar. An already-familiar dialog window will open, where we will specify the BPEL 2.0 version, enter the process name as BookWarehousingBPEL, modify the namespace to http://packtpub.com/Bookstore/BookWarehousingBPEL, and select Synchronous BPEL Process. We will leave all other fields to their default values: Insert image 8963EN_02_03.png
3. Next, we will wire the BookWarehousingBPEL component with the BookstoreABPEL and BookstoreBBPEL components. This way, we will establish a partner link between them. First, we will create the wire to the BookstoreBBPEL component (although the order doesn't matter). To do this, you have to click on the bottom-right side of the component. Once you place the mouse pointer above the component, circles on the edges will appear. You need to start with the circle labelled as Drag to add a new Reference and connect it with the service interface circle, as shown in the following figure: Insert image 8963EN_02_04.png
4. You do the same to wire the BookWarehousingBPEL component with the BookstoreABPEL component: Insert image 8963EN_02_05.png

We should see the following screenshot. Please notice that the BookWarehousingBPEL component is wired to the BookstoreABPEL and BookstoreBBPEL components: Insert image 8963EN_02_06.png

What just happened?

We have added the BookWarehousingBPEL component to our SOA composite and wired it to the BookstoreABPEL and BookstoreBBPEL components. Creating the wires between components means that references have been created and relations between components have been established.
In other words, we have expressed that the BookWarehousingBPEL component will be able to invoke the BookstoreABPEL and BookstoreBBPEL components. This is exactly what we want to achieve with our BookWarehousingBPEL process, which will orchestrate both bookstore BPELs. Once we have created the components and wired them accordingly, we are ready to implement the BookWarehousingBPEL process.

What just happened?

We have implemented the process flow logic for the BookWarehousingBPEL process. It actually orchestrated two other services. It invoked the BookstoreA and BookstoreB BPEL processes that we've developed in the previous chapter. Here, these two BPEL processes acted like services.

First, we have created the XML schema elements for the request and response messages of the BPEL process. We have created four variables: BookstoreARequest, BookstoreAResponse, BookstoreBRequest, and BookstoreBResponse. The BPEL code for the variable declaration looks like the code shown in the following screenshot: Insert image 8963EN_02_18b.png

Then, we have added the <assign> activity to prepare a request for both the bookstore BPEL processes. Then, we have copied the BookData element from inputVariable to the BookstoreARequest and BookstoreBRequest variables. The following BPEL code has been generated: Insert image 8963EN_02_18c.png

Next, we have invoked both the bookstore BPEL services using the <invoke> activity. The BPEL source code reads like the following screenshot: Insert image 8963EN_02_18d.png

Finally, we have added the <if> activity to select the bookstore with the lowest stock quantity: Insert image 8963EN_02_18e.png

With this, we have concluded the development of the book warehousing BPEL process and are ready to deploy and test it.

Deploying and testing the book warehousing BPEL

We will deploy the project to the SOA Suite process server the same way as we did in the previous chapter, by right-clicking on the project and selecting Deploy.
Then, we will navigate through the options. As we redeploy the whole SOA composite, make sure the Overwrite option is selected. You should check if the compilation and the deployment have been successful. Then, we will log in to the Enterprise Manager console, select the project bookstore, and click on the Test button. Be sure to select bookwarehousingbpel_client for the test. After entering the parameters, as shown in the following figure, we can click on the Test Web Service button to initiate the BPEL process, and we should receive an answer (Bookstore A or Bookstore B, depending on the data that we have entered).

Remember that our BookWarehousingBPEL process actually orchestrated two other services. It invoked the BookstoreA and BookstoreB BPEL processes. To verify this, we can launch the flow trace (click on the Launch Flow Trace button) in the Enterprise Manager console, and we will see that the two bookstore BPEL processes have been invoked, as shown: Insert image 8963EN_02_19.png

An even more interesting view is the flow view, which also shows that both bookstore services have been invoked. Click on BookWarehousingBPEL to open up the audit trail: Insert image 8963EN_02_20.png

Understanding partner links

When invoking services, we have often mentioned partner links. Partner links denote all the interactions that a BPEL process has with the external services. There are two possibilities:

- The BPEL process invokes operations on other services.
- The BPEL process receives invocations from clients. One of the clients is the user of the BPEL process, who makes the initial invocation. Other clients are services, for example, those that have been invoked by the BPEL process, but may return replies.

Links to all the parties that BPEL interacts with are called partner links. Partner links can be links to services that are invoked by the BPEL process. These are sometimes called invoked partner links.
Partner links can also be links to clients, and can invoke the BPEL process. Such partner links are sometimes called client partner links. Note that each BPEL process has at least one client partner link, because there has to be a client that first invokes the BPEL process. Usually, a BPEL process will also have several invoked partner links, because it will most likely invoke several services. In our case, the BookWarehousingBPEL process has one client partner link and two invoked partner links, BookstoreABPEL and BookstoreBBPEL. You can observe this on the SOA composite design view and on the BPEL process itself, where the client partner links are located on the left-hand side, while the invoked partner links are located on the right-hand side of the design view.

Partner link types

Describing situations where the service is invoked by the process and vice versa requires selecting a certain perspective. We can select the process perspective and describe the process as requiring portTypeA on the service and providing portTypeB to the service. Alternatively, we select the service perspective and describe the service as offering portTypeA to the BPEL process and requiring portTypeB from the process.

Partner link types allow us to model such relationships as a third party. We are not required to take a certain perspective; rather, we just define roles. A partner link type must have at least one role and can have at most two roles. The latter is the usual case. For each role, we must specify a portType that is used for interaction. Partner link types are defined in the WSDLs. If we take a closer look at the BookWarehousingBPEL.wsdl file, we can see the following partner link type definition: Insert image 8963EN_02_22.png

If we specify only one role, we express the willingness to interact with the service, but do not place any additional requirements on the service. Sometimes, existing services will not define a partner link type.
Then, we can wrap the WSDL of the service and define partner link types ourselves. Now that we have become familiar with the partner link types and know that they belong to WSDL, it is time to go back to the BPEL process definition, more specifically to the partner links.

Defining partner links

Partner links are concrete references to services that a BPEL business process interacts with. They are specified near the beginning of the BPEL process definition document, just after the <process> tag. Several <partnerLink> definitions are nested within the <partnerLinks> element:

<process ...>
   <partnerLinks>
      <partnerLink ... />
      <partnerLink ... />
      ...
   </partnerLinks>
   <sequence>
      ...
   </sequence>
</process>

For each partner link, we have to specify the following:

- name: This serves as a reference for interactions via that partner link.
- partnerLinkType: This defines the type of the partner link.
- myRole: This indicates the role of the BPEL process itself.
- partnerRole: This indicates the role of the partner.
- initializePartnerRole: This indicates whether the BPEL engine should initialize the partner link's partner role value. This is an optional attribute and should only be used with partner links that specify the partner role.

We define both roles (myRole and partnerRole) only if the partnerLinkType specifies two roles. If the partnerLinkType specifies only one role, the partnerLink also has to specify only one role; we omit the one that is not needed. Let's go back to our example, where we have defined the BookstoreABPEL partner link type. To define a partner link, we need to specify the partner roles, because it is a synchronous relation. The definition is shown in the following code excerpt: Insert image 8963EN_02_23.png

With this, we have concluded the discussion on partner links and partner link types. We will continue with the parallel service invocation.

Parallel service invocation

BPEL also supports parallel service invocation.
In our example, we have invoked Bookstore A and Bookstore B sequentially. This way, we need to wait for the response from the first service and then for the response from the second service. If these services take longer to execute, the response times will be added together. If we invoke both services in parallel, we only need to wait for the duration of the longer-lasting call, as shown in the following screenshot: Insert image 8963EN_02_24.png

BPEL has several possibilities for parallel flows, which will be described in detail in Chapter 8, Dynamic Parallel Invocations. Here, we present the basic parallel service invocation using the <flow> activity. To invoke services in parallel, or to perform any other activities in parallel, we can use the <flow> activity. Within the <flow> activity, we can nest an arbitrary number of flows, and all will execute in parallel. Let's try and modify our example so that Bookstore A and B will be invoked in parallel.

Time for action – developing parallel flows

Let's now modify the BookWarehousingBPEL process so that the BookstoreA and BookstoreB services will be invoked in parallel. We should do the following:

1. To invoke the BookstoreA and BookstoreB services in parallel, we need to add the Flow structured activity to the process flow just before the first invocation, as shown in the following screenshot: Insert image 8963EN_02_25.png
2. We see that two parallel branches have been created. We simply drag-and-drop both the invoke activities into the parallel branches: Insert image 8963EN_02_26.png

That's all! We can create more parallel branches if we need, by clicking on the Add Sequence icon.

What just happened?

We have modified the BookWarehousingBPEL process so that the BookstoreA and BookstoreB <invoke> activities are executed in parallel. A corresponding <flow> activity has been created in the BPEL source code. Within the <flow> activity, both <invoke> activities are nested.
Please notice that each <invoke> activity is placed within its own <sequence> activity. This would make sense if we required more than one activity in each parallel branch. The BPEL source code looks like the one shown in the following screenshot: Insert image 8963EN_02_27.png

Deploying and testing the parallel invocation

We will deploy the project to the SOA Suite process server the same way we did in the previous sample. Then, we will log in to the Enterprise Manager console, select the project Bookstore, and click on the Test Web Service button. To observe that the services have been invoked in parallel, we can launch the flow trace (click on the Launch Flow Trace button) in the Enterprise Manager console, click on the book warehousing BPEL processes, and activate the flow view, which shows that both bookstore services have been invoked in parallel.

Understanding parallel flow

To invoke services concurrently, we can use the <flow> construct. In the following example, the three <invoke> operations will perform concurrently:

<process ...>
   ...
   <sequence>
      <!-- Wait for the incoming request to start the process -->
      <receive ... />
      <!-- Invoke a set of related services, concurrently -->
      <flow>
         <invoke ... />
         <invoke ... />
         <invoke ... />
      </flow>
      ...
      <!-- Return the response -->
      <reply ... />
   </sequence>
</process>

We can also combine and nest the <sequence> and <flow> constructs, which allows us to define several sequences that execute concurrently. In the following example, we have defined two sequences, one that consists of three invocations, and one with two invocations. Both sequences will execute concurrently:

<process ...>
   ...
   <sequence>
      <!-- Wait for the incoming request to start the process -->
      <receive ... />
      <!-- Invoke two sequences concurrently -->
      <flow>
         <!-- The three invokes below execute sequentially -->
         <sequence>
            <invoke ... />
            <invoke ... />
            <invoke ... />
         </sequence>
         <!-- The two invokes below execute sequentially -->
         <sequence>
            <invoke ... />
            <invoke ... />
         </sequence>
      </flow>
      ...
      <!-- Return the response -->
      <reply ... />
   </sequence>
</process>

We can use other activities as well within the <flow> activity to achieve parallel execution. With this, we have concluded our discussion on the parallel invocation.

Pop quiz – service invocation

Q1. Which activity is used to invoke services from BPEL processes?
   <receive>
   <invoke>
   <sequence>
   <flow>
   <process>
   <reply>

Q2. Which parameters do we need to specify for the <invoke> activity?
   endpointURL
   partnerLink
   operationName
   operation
   portType
   portTypeLink

Q3. In which file do we declare partner link types?

Q4. Which activity is used to execute service invocation and other BPEL activities in parallel?
   <receive>
   <invoke>
   <sequence>
   <flow>
   <process>
   <reply>

Summary

In this chapter, we learned how to invoke services and orchestrate services. We explained the primary mission of BPEL: service orchestration. It follows the concept of programming-in-the-large. We have developed a BPEL process that has invoked two services and orchestrated them. We have become familiar with the <invoke> activity and understood the service invocation's background, particularly partner links and partner link types. We also learned that from BPEL, it is very easy to invoke services in parallel. To achieve this, we use the <flow> activity. Within <flow>, we can not only nest several <invoke> activities but also other BPEL activities. Now, we are ready to learn about variables and data manipulation, which we will do in the next chapter.
Packt
15 Jul 2014
6 min read

Securing the WAL Stream

(For more resources related to this topic, see here.)

The primary mechanism that PostgreSQL uses to provide a data durability guarantee is its Write Ahead Log (WAL). All transactional data is written to this location before ever being committed to database files. Once WAL files are no longer necessary for crash recovery, PostgreSQL will either delete or archive them. For the purposes of a highly available server, we recommend that you keep these important files as long as possible. There are several reasons for this; they are as follows:

- Archived WAL files can be used for Point In Time Recovery (PITR)
- If you are using streaming replication, interrupted streams can be re-established by applying WAL files until the replica has caught up
- WAL files can be reused to service multiple server copies

In order to gain these benefits, we need to enable PostgreSQL WAL archiving and save these files until we no longer need them. This section will address our recommendations for long-term storage of WAL files.

Getting ready

In order to properly archive WAL files, we recommend that you provision a server dedicated to backups or file storage. Depending on the transaction volume, an active PostgreSQL database might produce thousands of these on a daily basis. At 16 MB apiece, this is not an idle concern. For instance, for a 1 TB database, we recommend at least 3 TB of storage space. In addition, we will be using rsync as a daemon on this archive server. To install this on a Debian-based server, execute the following command as a root-level user:

sudo apt-get install rsync

Red-Hat-based systems will need this command instead:

sudo yum install rsync xinetd

How to do it...

Our archive server has a 3 TB mount at the /db directory and is named arc_server on our network. The PostgreSQL source server resides at 192.168.56.10. Follow these steps for long-term storage of important WAL files on an archive server:

Enable rsync to run as a daemon on the archive server.
On Debian-based systems, edit the /etc/default/rsync file and change the RSYNC_ENABLE variable to true. On Red-Hat-based systems, edit the /etc/xinet.d/rsync file and change the disable parameter to no.

Create a directory to store archived WAL files as the postgres user with these commands:

sudo mkdir /db/pg_archived
sudo chown postgres:postgres /db/pg_archived

Create a file named /etc/rsyncd.conf and fill it with the following contents:

[wal_store]
path = /db/pg_archived
comment = DB WAL Archives
uid = postgres
gid = postgres
read only = false
hosts allow = 192.168.56.10
hosts deny = *

Start the rsync daemon. Debian-based systems should execute the following command:

sudo service rsync start

Red-Hat-based systems can start rsync with this command instead:

sudo service xinetd start

Change the archive_mode and archive_command parameters in postgresql.conf to read the following:

archive_mode = on
archive_command = 'rsync -aq %p arc_server::wal_store/%f'

Restart the PostgreSQL server with a command similar to this:

pg_ctl -D $PGDATA restart

How it works

The rsync utility is normally used to transfer files between two servers. However, we can take advantage of using it as a daemon to avoid the connection overhead imposed by using SSH as an rsync protocol. Our first step is to ensure that the service is not disabled in some manner, which would make the rest of this guide moot. Next, we need a place to store archived WAL files on the archive server. Assuming that we have 3 TB of space in the /db directory, we simply claim /db/pg_archived as the desired storage location. There should be enough space to use /db for backups as well, but we won't discuss that here.

Next, we create a file named /etc/rsyncd.conf, which will configure how rsync operates as a daemon. Here, we name the /db/pg_archived directory wal_store so that we can address the path by its name when sending files.
We give it a human-readable name and ensure that files are owned by the postgres user, as this user also controls most of the PostgreSQL-related services. The next, and possibly the most important, step is to block all hosts but the primary PostgreSQL server from writing to this location. We set hosts deny to *, which blocks every server. Then, we set hosts allow to the primary database server's IP address so that only it has access. If everything goes well, we can start the rsync (or xinetd on Red Hat systems) service, as we can see in the following screenshot:

Next, we enable archive_mode by setting it to on. With archive mode enabled, we can specify a command that will execute when PostgreSQL no longer needs a WAL file for crash recovery. In this case, we invoke the rsync command with the -a parameter to preserve elements such as file ownership, timestamps, and so on. In addition, we specify the -q setting to suppress output, as PostgreSQL only checks the command exit status to determine its success. In the archive_command setting, the %p value represents the full path to the WAL file, and %f resolves to the filename. In this context, we're sending the WAL file to the archive server at the wal_store module we defined in rsyncd.conf.

Once we restart PostgreSQL, it will start storing all the old WAL files by sending them to the archive server. In case any rsync command fails because the archive server is unreachable, PostgreSQL will keep trying to send it until it is successful. If the archive server is unreachable for too long, we suggest that you change the archive_command setting to store files elsewhere. This prevents accidentally overfilling the PostgreSQL server storage.

There's more...

As we will likely want to use the WAL files on other servers, we suggest that you make a list of all the servers that could need WAL files.
Then, modify the rsyncd.conf file on the archive server and add this section:

[wal_fetch]
    path = /db/pg_archived
    comment = DB WAL Archive Retrieval
    uid = postgres
    gid = postgres
    read only = true
    hosts allow = host1, host2, host3
    hosts deny = *

Now, we can fetch WAL files from any of the hosts in hosts allow. As these are dedicated PostgreSQL replicas, recovery servers, or other defined roles, this makes the archive server a central location for all our WAL needs.

See also

We suggest that you read more about the archive_mode and archive_command settings on the PostgreSQL site. We've included a link here: http://www.postgresql.org/docs/9.3/static/runtime-config-wal.html

The rsyncd.conf file also has its own manual page. Read it with this command to learn more about the available settings:

man rsyncd.conf

Summary

In this article, we successfully learned how to secure the WAL stream by archiving WAL files to a dedicated server.

Resources for Article:

Further resources on this subject:
PostgreSQL 9: Reliable Controller and Disk Setup [article]
Backup in PostgreSQL 9 [article]
Recovery in PostgreSQL 9 [article]
Packt
14 Jul 2014
16 min read

Progressive Mockito

(For more resources related to this topic, see here.)

Drinking Mockito

Download the latest Mockito binary from the following link and add it to the project dependency: http://code.google.com/p/mockito/downloads/list

As of February 2014, the latest Mockito version is 1.9.5.

Configuring Mockito

To add Mockito JAR files to the project dependency, perform the following steps:

Extract the JAR files into a folder.
Launch Eclipse.
Create an Eclipse project named Chapter04.
Go to the Libraries tab in the project build path.
Click on the Add External JARs... button and browse to the Mockito JAR folder.
Select all JAR files and click on OK.

We worked with Gradle and Maven and built a project with the JUnit dependency. In this section, we will add Mockito dependencies to our existing projects. The following code snippet will add a Mockito dependency to a Maven project and download the JAR file from the central Maven repository (http://mvnrepository.com/artifact/org.mockito/mockito-core):

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <version>1.9.5</version>
    <scope>test</scope>
</dependency>

The following Gradle script snippet will add a Mockito dependency to a Gradle project:

testCompile 'org.mockito:mockito-core:1.9.5'

Mocking in action

This section demonstrates mock objects with a stock quote example. In the real world, people invest money on the stock market—they buy and sell stocks. A stock symbol is an abbreviation used to uniquely identify shares of a particular stock on a particular market; for example, stocks of Facebook are registered on NASDAQ as FB and stocks of Apple as AAPL. We will build a stock broker simulation program. The program will watch the market statistics, and depending on the current market data, you can perform any of the following actions:

Buy stocks
Sell stocks
Hold stocks

The domain classes that will be used in the program are Stock, MarketWatcher, Portfolio, and StockBroker. Stock represents a real-world stock.
It has a symbol, company name, and price. MarketWatcher looks up the stock market and returns the quote for the stock. A real implementation of a market watcher can be built from http://www.wikijava.org/wiki/Downloading_stock_market_quotes_from_Yahoo!_finance. Note that the real implementation will connect to the Internet and download the stock quote from a provider.

Portfolio represents a user's stock data, such as the number of stocks and price details. Portfolio exposes APIs for getting the average stock price and buying and selling stocks. Suppose on day one someone buys a share at a price of $10.00, and on day two, the customer buys the same share at a price of $8.00. So, on day two the person has two shares and the average price of the share is $9.00.

The following screenshot represents the Eclipse project structure. You can download the project from the Packt Publishing website and work with the files:

The following code snippet represents the StockBroker class. StockBroker collaborates with the MarketWatcher and Portfolio classes. The perform() method of StockBroker accepts a Portfolio and a Stock object:

public class StockBroker {
    private final static BigDecimal LIMIT = new BigDecimal("0.10");
    private final MarketWatcher market;

    public StockBroker(MarketWatcher market) {
        this.market = market;
    }

    public void perform(Portfolio portfolio, Stock stock) {
        Stock liveStock = market.getQuote(stock.getSymbol());
        BigDecimal avgPrice = portfolio.getAvgPrice(stock);
        BigDecimal priceGained = liveStock.getPrice().subtract(avgPrice);
        BigDecimal percentGain = priceGained.divide(avgPrice);
        if (percentGain.compareTo(LIMIT) > 0) {
            portfolio.sell(stock, 10);
        } else if (percentGain.compareTo(LIMIT.negate()) < 0) {
            portfolio.buy(stock);
        }
    }
}

Look at the perform method. It takes a Portfolio object and a Stock object, calls the getQuote method of MarketWatcher, and passes a stock symbol.
Then, it gets the average stock price from portfolio and compares the current market price with the average stock price. If the current stock price is 10 percent greater than the average price, then the StockBroker program sells 10 stocks from Portfolio; however, if the current stock price goes down by 10 percent, then the program buys shares from the market to average out the loss. Why do we sell 10 stocks? This is just an example and 10 is just a number; this could be anything you want. StockBroker depends on Portfolio and MarketWatcher; a real implementation of Portfolio should interact with a database, and MarketWatcher needs to connect to the Internet. So, if we write a unit test for the broker, we need to execute the test with a database and an Internet connection. A database connection will take time and Internet connectivity depends on the Internet provider. So, the test execution will depend on external entities and will take a while to finish. This will violate the quick test execution principle. Also, the database state might not be the same across all test runs. This is also applicable for the Internet connection service. Each time the database might return different values, and therefore asserting a specific value in your unit test is very difficult. We'll use Mockito to mock the external dependencies and execute the test in isolation. So, the test will no longer be dependent on real external service, and therefore it will be executed quickly. 
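Before mocking the collaborators away, it is worth pinning down the broker arithmetic itself. The following is not the book's StockBroker class but a hedged, JDK-only sketch of the same decision rule. It compares a loss against the negated limit (matching the prose about a 10 percent drop), and it gives divide() an explicit scale, since BigDecimal.divide() without one throws ArithmeticException on non-terminating decimal expansions:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PercentGainSketch {
    private static final BigDecimal LIMIT = new BigDecimal("0.10");

    // Returns SELL above +10%, BUY below -10%, HOLD otherwise.
    static String decide(BigDecimal livePrice, BigDecimal avgPrice) {
        // Without an explicit scale, divide() throws ArithmeticException
        // for non-terminating results (for example, 1.00 / 3.00).
        BigDecimal percentGain = livePrice.subtract(avgPrice)
                .divide(avgPrice, 4, RoundingMode.HALF_UP);
        if (percentGain.compareTo(LIMIT) > 0) {
            return "SELL";
        } else if (percentGain.compareTo(LIMIT.negate()) < 0) {
            return "BUY";   // price dropped by more than 10 percent
        }
        return "HOLD";
    }

    public static void main(String[] args) {
        System.out.println(decide(new BigDecimal("11.20"), new BigDecimal("10.00"))); // +12% -> SELL
        System.out.println(decide(new BigDecimal("8.50"), new BigDecimal("10.00")));  // -15% -> BUY
        System.out.println(decide(new BigDecimal("10.50"), new BigDecimal("10.00"))); // +5%  -> HOLD
    }
}
```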
Mocking objects

A mock can be created with the help of the static mock() method as follows:

import org.mockito.Mockito;

public class StockBrokerTest {
    MarketWatcher marketWatcher = Mockito.mock(MarketWatcher.class);
    Portfolio portfolio = Mockito.mock(Portfolio.class);
}

Otherwise, you can use Java's static import feature and statically import the mock method of the org.mockito.Mockito class as follows:

import static org.mockito.Mockito.mock;

public class StockBrokerTest {
    MarketWatcher marketWatcher = mock(MarketWatcher.class);
    Portfolio portfolio = mock(Portfolio.class);
}

There's another alternative; you can use the @Mock annotation as follows:

import org.mockito.Mock;

public class StockBrokerTest {
    @Mock MarketWatcher marketWatcher;
    @Mock Portfolio portfolio;
}

However, to work with the @Mock annotation, you are required to call MockitoAnnotations.initMocks(this) before using the mocks, or use MockitoJUnitRunner as a JUnit runner. The following code snippet uses MockitoAnnotations to create mocks:

import static org.junit.Assert.assertNotNull;
import org.mockito.MockitoAnnotations;

public class StockBrokerTest {
    @Mock MarketWatcher marketWatcher;
    @Mock Portfolio portfolio;

    @Before
    public void setUp() {
        MockitoAnnotations.initMocks(this);
    }

    @Test
    public void sanity() throws Exception {
        assertNotNull(marketWatcher);
        assertNotNull(portfolio);
    }
}

The following code snippet uses the MockitoJUnitRunner JUnit runner:

import org.mockito.runners.MockitoJUnitRunner;

@RunWith(MockitoJUnitRunner.class)
public class StockBrokerTest {
    @Mock MarketWatcher marketWatcher;
    @Mock Portfolio portfolio;

    @Test
    public void sanity() throws Exception {
        assertNotNull(marketWatcher);
        assertNotNull(portfolio);
    }
}

Before we dive deep into the Mockito world, there are a few things to remember. Mockito cannot mock or spy the following constructs: final classes, final methods, enums, static methods, private methods, the hashCode() and equals() methods, anonymous classes, and primitive types.
PowerMock (an extension of EasyMock) and PowerMockito (an extension of the Mockito framework) allow you to mock static and private methods; PowerMockito even allows you to set expectations on new invocations for private member classes, inner classes, and local or anonymous classes. However, by design, you should not opt for mocking private/static properties—it violates encapsulation. Instead, you should refactor the offending code to make it testable.

Change the Portfolio class to be final and rerun the test; the test will fail because the Portfolio class is final, and Mockito cannot mock a final class. The following screenshot shows the JUnit output:

Stubbing methods

We read about stubs in Test Doubles. The stubbing process defines the behavior of a mock method, such as the value to be returned or the exception to be thrown when the method is invoked. The Mockito framework supports stubbing and allows us to return a given value when a specific method is called. This can be done using Mockito.when() along with thenReturn(). The following is the syntax of importing when:

import static org.mockito.Mockito.when;

The following code snippet stubs the getQuote(String symbol) method of MarketWatcher and returns a specific Stock object:

import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.when;

@RunWith(MockitoJUnitRunner.class)
public class StockBrokerTest {
    @Mock MarketWatcher marketWatcher;
    @Mock Portfolio portfolio;

    @Test
    public void marketWatcher_Returns_current_stock_status() {
        Stock uvsityCorp = new Stock("UV", "Uvsity Corporation", new BigDecimal("100.00"));
        when(marketWatcher.getQuote(anyString())).thenReturn(uvsityCorp);
        assertNotNull(marketWatcher.getQuote("UV"));
    }
}

A uvsityCorp stock object is created with a stock price of $100.00, and the getQuote method is stubbed to return uvsityCorp whenever the getQuote method is called.
Note that anyString() is passed to the getQuote method, which means that whenever the getQuote method is called with any String value, the uvsityCorp object will be returned. The when() method represents the trigger, that is, when to stub. The following methods are used to represent what to do when the trigger fires:

thenReturn(x): This returns the x value.
thenThrow(x): This throws an x exception.
thenAnswer(Answer answer): Unlike returning a hardcoded value, dynamic user-defined logic is executed. It's more like a fake test double; Answer is an interface.
thenCallRealMethod(): This method calls the real method on the mock object.

The following code snippet stubs the external dependencies and creates a test for the StockBroker class:

import com.packt.trading.dto.Stock;
import static org.junit.Assert.assertNotNull;
import static org.mockito.Matchers.anyString;
import static org.mockito.Matchers.isA;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

@RunWith(MockitoJUnitRunner.class)
public class StockBrokerTest {
    @Mock MarketWatcher marketWatcher;
    @Mock Portfolio portfolio;
    StockBroker broker;

    @Before
    public void setUp() {
        broker = new StockBroker(marketWatcher);
    }

    @Test
    public void when_ten_percent_gain_then_the_stock_is_sold() {
        //Portfolio's getAvgPrice is stubbed to return $10.00
        when(portfolio.getAvgPrice(isA(Stock.class))).
            thenReturn(new BigDecimal("10.00"));
        //A stock object is created with current price $11.20
        Stock aCorp = new Stock("A", "A Corp", new BigDecimal("11.20"));
        //getQuote method is stubbed to return the stock
        when(marketWatcher.getQuote(anyString())).thenReturn(aCorp);
        //perform method is called; as the stock price increased
        //by 12%, the broker should sell the stocks
        broker.perform(portfolio, aCorp);
        //verifying that the broker sold the stocks
        verify(portfolio).sell(aCorp, 10);
    }
}

The test method name is when_ten_percent_gain_then_the_stock_is_sold; a test name should explain the intention of the test. We use underscores to make the test name readable. We will use the when_<<something happens>>_then_<<the action is taken>> convention for the tests. In the preceding test example, the getAvgPrice() method of portfolio is stubbed to return $10.00, and then the getQuote method is stubbed to return a hardcoded stock object with a current stock price of $11.20. The broker logic should sell the stock, as the stock price has gone up by 12 percent. The portfolio object is a mock object, so unless we stub a method, by default, all the methods of portfolio are autostubbed to return a default value, and for void methods, no action is performed. The sell method is a void method; so, instead of connecting to a database to update the stock count, the autostub will do nothing. However, how will we test whether the sell method was invoked? We use Mockito.verify.

The verify() method is a static method, which is used to verify method invocations. If the method is not invoked, or the argument doesn't match, then the verify method will raise an error to indicate that the code logic has issues.

Verifying the method invocation

To verify a redundant method invocation, or to verify whether a stubbed method was not called but was important from the test perspective, we should manually verify the invocation; for this, we need to use the static verify method.

Why do we use verify?
Mock objects are used to stub external dependencies. We set an expectation, and a mock object returns an expected value. In some conditions, a behavior or method of a mock object should not be invoked, or sometimes, we may need to call the method N (a number) times. The verify method verifies the invocation of mock objects. Mockito does not automatically verify all stubbed calls; if a stubbed behavior should not be called but is called due to a bug in the code, verify flags the error, but only if we perform that verification manually. Also, void methods don't return values, so you cannot assert their returned values. Hence, verify is very handy for testing void methods.

Verifying in depth

The verify() method has an overloaded version that takes Times as an argument. Times is a Mockito framework class of the org.mockito.internal.verification package, and it takes wantedNumberOfInvocations as an integer argument. If 0 is passed to Times, it infers that the method will not be invoked in the testing path. We can pass 0 to Times(0) to make sure that the sell or buy methods are not invoked. If a negative number is passed to the Times constructor, Mockito throws org.mockito.exceptions.base.MockitoException with the Negative value is not allowed here error.

The following methods are used in conjunction with verify:

times(int wantedNumberOfInvocations): The method must be invoked exactly wantedNumberOfInvocations times; if it is not, the test fails.
never(): This signifies that the stubbed method is never called; you can also use times(0) to represent the same scenario. If the stubbed method is invoked at least once, then the test fails.
atLeastOnce(): The method must be invoked at least once; it works fine if the method is invoked multiple times, but the operation fails if the method is not invoked.
atLeast(int minNumberOfInvocations): The method must be called at least minNumberOfInvocations times; it works fine if the method is invoked more often, but the operation fails if the method is not called at least minNumberOfInvocations times.
atMost(int maxNumberOfInvocations): The method must be called at most maxNumberOfInvocations times; the operation fails if the method is called more than maxNumberOfInvocations times.
only(): The test fails if any method other than the verified one is called on the mock object. In our example, if we use verify(portfolio, only()).sell(aCorp,10);, the test will fail with the following output:

The test fails in line 15 as portfolio.getAvgPrice(stock) is called.

timeout(int millis): This verifies that the interaction occurred within the specified time range.

Verifying zero and no more interactions

The verifyZeroInteractions(Object... mocks) method verifies whether no interactions happened on the given mocks. The following test code directly calls verifyZeroInteractions and passes the two mock objects. Since no methods are invoked on the mock objects, the test passes:

@Test
public void verify_zero_interaction() {
    verifyZeroInteractions(marketWatcher, portfolio);
}

The verifyNoMoreInteractions(Object... mocks) method checks whether any of the given mocks has any unverified interaction. We can use this method after verifying a mock method to make sure that nothing else was invoked on the mock. The following test code demonstrates verifyNoMoreInteractions:

@Test
public void verify_no_more_interaction() {
    Stock noStock = null;
    portfolio.getAvgPrice(noStock);
    portfolio.sell(null, 0);
    verify(portfolio).getAvgPrice(eq(noStock));
    //this will fail as the sell method was invoked
    verifyNoMoreInteractions(portfolio);
}

The following is the JUnit output:

The following are the rationale and examples of argument matchers.

Using argument matchers

ArgumentMatcher is a Hamcrest matcher with a predefined describeTo() method.
ArgumentMatcher extends org.hamcrest.BaseMatcher. It verifies the indirect inputs into a mocked dependency. The Matchers.argThat(Matcher) method is used in conjunction with the verify method to verify whether a method is invoked with a specific argument value. ArgumentMatcher plays a key role in mocking. The following section describes the context of ArgumentMatcher.

Mock objects return expected values, but when they need to return different values for different arguments, argument matchers come into play. Suppose we have a method that takes a player name as input and returns the total number of runs (a run is a point scored in a cricket match) scored as output. We want to stub it and return 100 for Sachin and 10 for xyz. We have to use an argument matcher to stub this.

Mockito returns expected values when a method is stubbed. If the method takes arguments, the arguments must match during the execution; for example, the getValue(int someValue) method is stubbed in the following way:

when(mockObject.getValue(1)).thenReturn(expected value);

Here, if the getValue method is called with mockObject.getValue(100), then the parameter doesn't match (it is expected that the method will be called with 1, but at runtime, it encounters 100), so the mock object fails to return the expected value. It will instead return the default value of the return type—if the return type is Boolean, it'll return false; if the return type is an object, then null; and so on.

Mockito verifies argument values in natural Java style by using an equals() method. Sometimes, we use argument matchers when extra flexibility is required. Mockito provides built-in matchers such as anyInt(), anyDouble(), anyString(), anyList(), and anyCollection().
More built-in matchers and examples of custom argument matchers or Hamcrest matchers can be found at the following link: http://docs.mockito.googlecode.com/hg/latest/org/mockito/Matchers.html

Examples of other matchers are isA(java.lang.Class<T> clazz), any(java.lang.Class<T> clazz), and eq(T) or eq(primitive value). The isA argument checks whether the passed object is an instance of the class type passed in the isA argument. The any(T) argument also works in the same way.

Why do we need wildcard matchers?

Wildcard matchers are used to verify the indirect inputs to the mocked dependencies. The following example describes the context. In the following code snippet, an object is passed to a method, and then a request object is created and passed to service. Now, from a test, if we call the someMethod method and service is a mocked object, then from the test, we cannot stub callMethod with a specific request, as the request object is local to someMethod:

public void someMethod(Object obj) {
    Request req = new Request();
    req.setValue(obj);
    Response resp = service.callMethod(req);
}

If we are using argument matchers, all arguments have to be provided by matchers. Here, we're passing three arguments and all of them are passed using matchers:

verify(mock).someMethod(anyInt(), anyString(), eq("third argument"));

The following example will fail because the first and third arguments are not passed using a matcher:

verify(mock).someMethod(1, anyString(), "third argument");
Packt
11 Jul 2014
11 min read

Module, Facts, Types and Reporting tools in Puppet

(For more resources related to this topic, see here.)

Module Files and Templates

Transferring files with Puppet is something best done within modules. When you define a file resource, you can use content => "something", or you can push a file from the Puppet master using source. As an example, using our judy database, we could have judy::config with the following file definition:

class judy::config {
    file {'/etc/judy/judy.conf':
        source => 'puppet:///modules/judy/judy.conf'
    }
}

Now Puppet will search for this file in the directory judy/files. It is also possible to use full paths and have your module mimic the filesystem; the previous source line would be changed to source => 'puppet:///modules/judy/etc/judy/judy.conf', and the file would be found in judy/files/etc/judy/judy.conf.

The puppet:/// URL given previously has three slashes; optionally, the name of a Puppet server may appear between the second and third slash. If left blank, the Puppet server performing catalog compilation is used to retrieve the file. You can alternatively specify the server using source => 'puppet://puppetfile.example.com/modules/judy/judy.conf'. Having files come from specific Puppet servers can make maintenance difficult. If you change the name of your Puppet server, you have to change all references to that name as well.

Templates are searched for in a similar fashion, in this example in judy/templates. When specifying the template, you use content => template('judy/template.erb') to have Puppet look for the template in your module's templates directory. As an example, another config file for judy could be:

file {'/etc/judy/judyadm.conf':
    content => template('judy/judyadm.conf.erb')
}

Puppet will look for the file 'judy/judyadm.conf.erb' in modulepath/judy/templates/judyadm.conf.erb. We haven't covered Ruby templates up to this point; templates are files that are parsed according to ERB syntax rules.
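The article does not list the contents of judyadm.conf.erb itself, so the following fragment is only an illustration of ERB syntax; the @dbhost, @dbport, and @debug variables are hypothetical stand-ins for variables that would be set in the judy module:

```
# judyadm.conf.erb -- ERB tags are evaluated on the Puppet master
# at catalog compilation time.
host = <%= @dbhost %>
port = <%= @dbport %>
<% if @debug -%>
log_level = debug
<% end -%>
```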
If you need to distribute a file in which you need to change some settings based on variables, then a template is the thing to use. ERB syntax is covered in detail at http://docs.puppetlabs.com/guides/templating.html. In the next section, we will discuss module implementations in a large organization before writing custom modules.

Creating a Custom Fact for use in Hiera

The most useful custom facts are those that return a calculated value that you can use to organize your nodes. Such facts allow you to group your nodes into smaller groups or create groups with common functionality or locality. These facts allow you to separate the data component of your modules from the logic or code components, and such a fact can be used in your hiera.yaml file to add a level to the hierarchy. One aspect of the system that can be used to determine information about the node is the IP address. Assuming you do not reuse IP addresses within your organization, the IP address can be used to determine on which part of the network a node resides: the zone. In this example, we will define three zones in which machines reside: production, development, and sandbox. The IP addresses in each zone are on different subnets. We'll start by building up a script to calculate the zone and then turn it into a fact like our last example. Our script will need to calculate IP ranges using netmasks, so we'll import the ipaddr library and use IPAddr objects to calculate ranges.
require('ipaddr')
require('facter')
require('puppet')

Next, we'll define a function that takes an IP address as the argument and returns the zone to which that IP address belongs:

def zone(ip)
  zones = {
    'production'  => [IPAddr.new('192.168.122.0/24'), IPAddr.new('192.168.124.0/23')],
    'development' => [IPAddr.new('192.168.123.0/24'), IPAddr.new('192.168.126.0/23')],
    'sandbox'     => [IPAddr.new('192.168.128.0/22')]
  }
  for zone in zones.keys do
    for subnet in zones[zone] do
      if subnet.include?(ip)
        return zone
      end
    end
  end
  return 'undef'
end

This function will loop through the zones hash looking for a match on the IP address. If no match is found, the value 'undef' is returned. We then obtain the IP address for the machine using the ipaddress fact from facter:

ip = IPAddr.new(Facter.value('ipaddress'))

Then, we call the zone function with this IP address to obtain the zone:

print zone(ip),"\n"

Now we can make this script executable and test it:

node1# facter ipaddress
192.168.122.132
node1# ./example_zone.rb
production

Now all we have to do is replace print zone(ip),"\n" with the following to define the fact:

Facter.add('example_zone') do
  setcode do
    zone(ip)
  end
end

Now, when we insert this code into our example_facts module and run puppet on our nodes, the custom fact is available:

# facter -p example_zone
production

Now that we can define a zone based on a custom fact, we can go back to our hiera.yaml file and add %{::example_zone} to the hierarchy. The hiera.yaml hierarchy will now contain the following:

---
:hierarchy:
  - "zones/%{::example_zone}"
  - "hosts/%{::hostname}"
  - "roles/%{::role}"
  - "%{::kernel}/%{::osfamily}/%{::lsbmajdistrelease}"
  - "is_virtual/%{::is_virtual}"
  - common

After restarting httpd to have the hiera.yaml file reread, we create a zones directory in hieradata and add production.yaml with the following contents.
---
welcome: "example_zone - production"

Now, when we run puppet on our node1, we see the motd updated with the new welcome message:

node1# cat /etc/motd
PRODUCTION
example_zone - production
Managed Node: node1
Managed by Puppet version 3.4.2

Creating a few key facts that can be used to build up your hierarchy can greatly reduce the complexity of your modules. There are several workflows available; in addition to the custom fact we just described, you can use the /etc/facter/facts.d directory with static files or scripts, or you can have tasks run from other tools dump files into that directory to create custom facts. When writing Ruby scripts, you can use any other fact by calling Facter.value('factname'), and you can access any Ruby library using require. Your custom fact could query the system using lspci or lsusb to determine what hardware is installed on that node. As an example, you could use lspci to determine the make and model of the graphics card on the machine and return that as a fact, such as videocard. In the next section, we'll write our own custom modules that will take such a fact and install the appropriate driver for the video card based on the custom fact.

Parameterized Classes

Parameterized classes are classes where you have defined several parameters that can be overridden when you instantiate the class for your node. The use case for parameterized classes is when you have something that won't be repeated within a single node. You cannot define the same parameterized class more than once per node. As a simple example, we'll create a class that installs a database program and starts that database's service.
We'll call this class example::db; the definition will live in modules/example/manifests/db.pp:

class example::db ($db) {
    case $db {
        'mysql': {
            $dbpackage = 'mysql-server'
            $dbservice = 'mysqld'
        }
        'postgresql': {
            $dbpackage = 'postgresql-server'
            $dbservice = 'postgresql'
        }
    }
    package { "$dbpackage": }
    service { "$dbservice":
        ensure  => true,
        enable  => true,
        require => Package["$dbpackage"]
    }
}

This class takes a single parameter ($db) that specifies the type of the database: in this case, either postgresql or mysql. To use this class, we have to instantiate it:

class { 'example::db':
    db => 'mysql'
}

Now, when we apply this to a node, we see that mysql-server is installed and mysqld is started and enabled at boot. This works great for something like a database, since we don't expect to have more than one type of database server on a single node. If we try to instantiate the example::db class with postgresql on our node, we'll get an error as follows:

Types and Providers

Puppet separates the implementation of a type into the type definition and any one of many providers for that type. For instance, the package type in Puppet has multiple providers depending on the platform in use (apt, yum, rpm, and others). Early on in Puppet development, there were only a few core types defined. Since then, the core types have expanded to the point where anything that I feel should be a type is already defined by core Puppet. The lvm module created a type for defining logical volumes, the concat module created types for defining file fragments, and the firewall module created a type for defining firewall rules. Each of these types represents something on the system with the following properties:

unique
searchable
creatable
destroyable
atomic

When creating a new type, you have to make sure your new type has these properties.
The resource defined by the type has to be unique, which is why the file type uses the path to a file as the naming variable (namevar). A system may have files with the same name (not unique), but it cannot have more than one file with an identical path. As an example, the ldap configuration file for openldap is /etc/openldap/ldap.conf, while the ldap configuration file for the name services library is /etc/ldap.conf; if you used the filename, then they would both be the same resource. Resources must be unique.

By atomic, I mean that it is indivisible; it cannot be made of smaller components. For instance, the firewall module creates a type for single iptables rules. Creating a type for the tables (INPUT, OUTPUT, FORWARD) within iptables wouldn't be atomic; each table is made up of multiple smaller parts, the rules.

Your type has to be searchable so that Puppet can determine the state of the thing you are modifying. A mechanism has to exist to know what the current state of the thing in question is. The last two properties are equally important: Puppet must be able to remove the thing (destroy it), and likewise, Puppet must be able to create the thing anew.

Given these criteria, there are several modules that define new types; some examples include types that manage:

git repositories
Apache virtual hosts
LDAP entries
network routes
gem modules
Perl CPAN modules
databases
Drupal multisites

Foreman

Foreman is more than just a Puppet reporting tool; it bills itself as a complete lifecycle management platform. Foreman can act as the ENC (external node classifier) for your entire installation and configure DHCP, DNS, and PXE booting. It's a one-stop shop. We'll configure Foreman to be our report backend in this example.

mcollective

Mcollective is an orchestration tool created by Puppet Labs that is not specific to Puppet; plugins exist to work with other configuration management systems.
Mcollective uses a message queue (MQ) with active connections from all active nodes to enable parallel job execution on large numbers of nodes. To understand how mcollective works, we'll consider the following high-level diagram and work through the various components. The configuration of mcollective is still somewhat involved and prone to errors. Still, once mcollective is working properly, the power it provides can become addicting; it will be worth the effort. The default MQ install for mcollective (the Marionette Collective) uses activemq; the activemq provided by the puppetlabs repo is known to work. Mcollective uses a message queue and can use your existing message queue infrastructure. If using activemq, a single server can handle 800 nodes; after that, you'll need to spread out. We'll cover the standard mcollective install using puppet's certificate authority to provide ssl security to mcollective. The theory here is that since we already trust puppet to configure the machines, we can trust it a little more to run arbitrary commands. We'll also require that users of mcollective have proper ssl authentication as well.

Summary

In this article we learned how to deal with puppet. We also created a custom fact for use in Hiera and covered different topics like Foreman, mcollective, and much more.

Resources for Article:

Further resources on this subject:
Puppet: Integrating External Tools [article]
Quick start – Using the core Puppet resource types [article]
External Tools and the Puppet Ecosystem [article]
Packt
11 Jul 2014
15 min read

A Virtual Machine for a Virtual World

(For more resources related to this topic, see here.)

Creating a VM from a template

Let us start by creating our second virtual machine from the Ubuntu template. Right-click on the template and select Clone, as shown in the following screenshot: Use the settings shown in the following screenshot for the new virtual machine. You can also use any virtual machine name you like. A VM name can only be alphanumeric, without any special characters. You can also use any other VM you have already created in your own virtual environment. Access the virtual machine through the Proxmox console after cloning and set up network connectivity, such as the IP address, hostname, and so on. For our Ubuntu virtual machine, we are going to edit /etc/network/interfaces, /etc/hostname, and /etc/hosts.

Advanced configuration options for a VM

We will now look at some of the advanced configuration options we can use to extend the capability of a KVM virtual machine.

The hotplugging option for a VM

Although it is not a very common occurrence, a virtual machine can run out of storage unexpectedly, whether due to over-provisioning or improper storage requirement planning. For a physical server with hot swap bays, we can simply add a new hard drive, partition it, and be up and running. Imagine another situation, in which you have to add a virtual network interface to the VM right away but cannot afford to shut down the VM to add the vNIC. The hotplug option also allows hotplugging virtual network interfaces without shutting down a VM. Proxmox virtual machines by default do not support hotplugging. Some extra steps need to be followed in order to enable hotplugging for devices such as virtual disks and virtual network interfaces. Without the hotplugging option, the virtual machine needs to be completely powered off and then powered on after adding a new virtual disk or virtual interface.
Simply rebooting the virtual machine will not activate the newly added virtual device. In Proxmox 3.2 and later, the hotplug option is not shown on the Proxmox GUI. It has to be enabled through the CLI by adding options to the <vmid>.conf file. Enabling the hotplug option for a virtual machine is a three-step process:

1. Shut down the VM and add the hotplug option to the <vmid>.conf file.
2. Power up the VM and then load the modules that will initiate the actual hotplugging.
3. Add a virtual disk or virtual interface to be hotplugged into the virtual machine.

The hotplugging option for <vmid>.conf

Shut down the cloned virtual machine we created earlier and then open the configuration file from the following location. Securely log in to the Proxmox node or use the console in the Proxmox GUI, using the following command:

# nano /etc/pve/nodes/<node_name>/qemu-server/102.conf

With the default options added during the virtual machine creation process, the following is what the VM configuration file looks like:

balloon: 512
bootdisk: virtio0
cores: 1
ide2: none,media=cdrom
kvm: 0
memory: 1024
name: pmxUB01
net0: e1000=56:63:C0:AC:5F:9D,bridge=vmbr0
ostype: l26
sockets: 1
virtio0: vm-nfs-01:102/vm-102-disk-1.qcow2,format=qcow2,size=32G

Now, at the bottom of the 102.conf configuration file located under /etc/pve/nodes/<node_name>/qemu-server/, we will add the following option to enable hotplugging in the virtual machine:

hotplug: 1

Save the configuration file and power up the virtual machine.

Loading modules

After the hotplug option is added and the virtual machine is powered up, it is time to load two modules into the virtual machine, which will allow hotplugging a virtual disk at any time without rebooting the VM. Securely log in to the VM or use the Proxmox GUI console to get to the command prompt of the VM. Then, run the following commands to load the acpiphp and pci_hotplug modules.
Do not load these modules on the Proxmox node itself:

# sudo modprobe acpiphp
# sudo modprobe pci_hotplug

The acpiphp and pci_hotplug modules are two hotplug drivers for the Linux operating system. These drivers allow the addition of a virtual disk image or virtual network interface card without shutting down a Linux-based virtual machine. The modules can also be loaded automatically during the virtual machine boot by inserting them in /etc/modules. Simply add acpiphp and pci_hotplug on two separate lines in /etc/modules.

Adding virtual disk/vNIC

After loading both the acpiphp and pci_hotplug modules, all that remains is adding a new virtual disk or virtual network interface to the virtual machine through the web GUI. After adding a new disk image, check that the virtual machine operating system recognizes the new disk with the following command:

# sudo fdisk -l

For a virtual network interface, simply add a new virtual interface from the web GUI and the operating system will automatically recognize the new vNIC. After adding the interface, check that the vNIC is recognized with the following command:

# sudo ifconfig -a

Please note that while the hotplugging option works great with Linux-based virtual machines, it is somewhat problematic on Windows XP/7-based VMs. Hotplug seems to work great with both 32- and 64-bit versions of the Windows Server 2003/2008/2012 VMs. The best practice for a Windows XP/7-based virtual machine is to just power cycle the virtual machine to activate newly added virtual disk images. Forcing the Windows VM to go through hotplugging will cause an unstable operating environment. This is a limitation of KVM itself.
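Before attempting a hotplug, it can be useful to confirm that both drivers are actually loaded. The helper below is our own sketch (not from the text): it greps a modules listing, defaulting to /proc/modules but accepting a path so it can be exercised against any file:

```shell
# check_hotplug_modules: report whether the acpiphp and pci_hotplug
# drivers appear in a modules listing (defaults to /proc/modules).
check_hotplug_modules() {
    modfile="${1:-/proc/modules}"
    for mod in acpiphp pci_hotplug; do
        if grep -q "^${mod} " "$modfile"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: missing (try: sudo modprobe ${mod})"
        fi
    done
}
```

Run inside the guest (not on the Proxmox node), this prints one status line per module, so a missing driver is caught before a hotplugged disk silently fails to appear.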
The most common scenario for a nested virtual environment is to set up a fully isolated test environment to test software such as hypervisors, or operating system updates/patches, before applying them in a live environment. A nested environment can also be used as a training platform to teach computer and network virtualization, where students can set up their own virtual environments from the ground up without breaking the main system. This eliminates the high cost of hardware for each student or for the test environment. When an isolated test platform is needed, it is just a matter of cloning some real virtual machines and giving access to authorized users. A nested virtual environment has the potential to give the network administrator an edge in the real world by allowing cost cutting and getting things done with limited resources. One very important thing to keep in mind is that a nested virtual environment will have significantly lower performance than a real virtual environment. If the nested virtual environment also has virtualized storage, performance will degrade even further. The loss of performance can be offset by creating a nested environment with an SSD storage backend. When a nested virtual environment is created, it usually also contains virtualized storage to provide virtual storage for nested virtual machines. This allows for a fully isolated nested environment with its own subnet and virtual firewall. There are many debates about the viability of a nested virtual environment; both pros and cons can be argued equally. In the end, it comes down to the administrator's grasp of his or her existing virtual environment and a good understanding of the nature of the requirement. Nesting allowed us to build a fully functional Proxmox cluster from the ground up without using additional hardware.
The following screenshot is a side-by-side representation of a nested virtual environment scenario: In the previous comparison, on the right-hand side we have the basic cluster we have been building so far. On the left-hand side we have the actual physical nodes and virtual machines used to create the nested virtual environment. Our nested cluster is completely isolated from the rest of the physical cluster with a separate subnet. Internet connectivity is provided to the nested environment by a virtualized firewall, 1001-scce-fw-01. Like the hotplugging option, nesting is not enabled in the Proxmox cluster by default. Enabling nesting will allow nested virtual machines to have KVM hardware virtualization, which increases the performance of nested virtual machines. To enable KVM hardware virtualization, we have to edit /etc/modules on the physical Proxmox node and <vmid>.conf of the virtual machine. We can see that the option is disabled for our cloned nested virtual machine in the following screenshot:

Enabling KVM hardware virtualization

KVM hardware virtualization can be enabled by performing the following few additional steps:

1. On each Proxmox node, add the following line to the /etc/modules file:
kvm-amd nested=1
2. Migrate or shut down all virtual machines on the Proxmox nodes and then reboot.
3. After the Proxmox nodes reboot, add the following argument to the <vmid>.conf file of the virtual machines used to create the nested virtual environment:
args: -enable-nesting
4. Enable KVM hardware virtualization from the virtual machine option menu through the GUI.
5. Restart the nested virtual machine.

Network virtualization

Network virtualization is a software approach to setting up and maintaining a network without physical hardware. Proxmox has great features to virtualize the network for both real and nested virtual environments. By using virtualized networking, management becomes simpler and centralized.
Since there is no physical hardware to deal with, the network can be extended at a minute's notice. Especially in a nested virtual environment, the use of a virtualized network is very prominent. In order to set up a successful nested virtual environment, a better grasp of the Proxmox network feature is required. With the introduction of Open vSwitch (www.openvswitch.org) in Proxmox 3.2 and later, network virtualization is now much more efficient.

Backing up a virtual machine

A good backup strategy is the last line of defense against disasters such as hardware failure, environmental damage, accidental deletion, and misconfiguration. In a virtual environment, a backup strategy turns into a daunting task because of the number of machines that need to be backed up. In a busy production environment, virtual machines may be created and discarded whenever needed. Without a proper backup plan, the entire backup task can go out of control. Gone are the days when we only had a few pieces of server hardware to deal with and backing them up was an easy task. Today's backup solutions have to deal with several dozen or possibly several hundred virtual machines. Depending on the requirement, an administrator may have to back up all the virtual machines regularly instead of just the files inside them. Backing up an entire virtual machine takes up a very large amount of space after a while, depending on how many previous backups we keep. A granular file backup helps to quickly restore just the file needed, but is a bad choice if the virtual server is damaged to the point that it becomes inaccessible. Here, we will see the different backup options available in Proxmox, along with their advantages and disadvantages.

Proxmox backup and snapshot options

Proxmox has the following two backup options:

Full backup: This backs up the entire virtual machine.
Snapshot: This only creates a snapshot image of the virtual machine.
Proxmox 3.2 and above can only do a full backup and cannot do any granular file backup from inside a virtual machine. Proxmox also does not use any backup agent.

Backing up a VM with a full backup

All full backups are in the .tar format, containing both the configuration file and the virtual disk image file. The TAR file is all you need to restore the virtual machine on any node and on any storage. Full backups can also be scheduled on a daily and weekly basis. Full virtual backup files are named based on the following format:

vzdump-qemu-<vm_id>-YYYY_MM_DD-HH_MM_SS.vma.lzo

The following screenshot shows what a typical list of virtual machine backups looks like: Proxmox 3.2 and above cannot do full backups on LVM and Ceph RBD storage. Full backups can only occur on local, Ceph FS, and NFS-based storages, which are defined as backup storage during storage creation. Please note that Ceph FS and RBD are not the same type of storage even though they both coexist on the same Ceph cluster. The following screenshot shows the storage feature through the Proxmox GUI with backup-enabled attached storages: The backup menu in Proxmox is a true example of simplicity. With only three choices to select, it is as easy as it can get. The following screenshot is an example of a Proxmox backup menu. Just select the backup storage, backup mode, and compression type, and that's it:

Creating a schedule for Backup

Schedules can be created from the virtual machine backup option. We will see each option box in detail in the following sections. The options are shown in the following screenshot:

Node

By default, a backup job applies to all nodes. If you want to apply the backup job to a particular node, then select it here. With a node selected, the backup job will be restricted to that node only. If a virtual machine on node 1 was selected for backup and later the virtual machine was moved to node 2, it will not be backed up, since only node 1 was selected for this backup task.
Storage

Select a backup storage destination where all full backups will be stored. Typically, an NFS server is used for backup storage. They are easy to set up and do not require a lot of upfront investment due to their low performance requirements. Backup servers are much leaner than computing nodes since they do not have to run any virtual machines. Backups are supported on local, NFS, and Ceph FS storage systems. Ceph FS storages are mounted locally on Proxmox nodes and selected as a local directory. Both Ceph FS and RBD coexist on the same Ceph cluster.

Day of Week

Select which day or days the backup task applies to. Days are selectable in a drop-down menu. If the backup task should run daily, then select all the days from the list.

Start Time

Unlike Day of Week, only one time slot can be selected. Multiple time selections to back up at different times of the day are not possible. If the backup must run multiple times a day, create a separate task for each time slot.

Selection mode

The All selection mode will select all the virtual machines within the whole Proxmox cluster. The Exclude selected VMs mode will back up all VMs except the ones selected. Include selected VMs will back up only the ones selected.

Send email to

Enter a valid e-mail address here so that the Proxmox backup task can send an e-mail upon backup task completion or if there was any issue during backup. The e-mail includes the entire log of the backup task. It is highly recommended to enter an e-mail address here so that an administrator or backup operator can receive backup task feedback e-mails. This will allow us to find out whether there was an issue during backup, and how much time the backup actually takes, to see if any performance issue occurred during backup. The following screenshot is a sample of a typical e-mail received after a backup task:

Compression

By default, the LZO compression method is selected.
LZO (http://en.wikipedia.org/wiki/Lempel–Ziv–Oberhumer) is based on a lossless data compression algorithm, designed with decompression speed in mind. LZO is capable of fast compression and even faster decompression. GZIP will create smaller backup files at the cost of high CPU usage to achieve a higher compression ratio. Since a higher compression ratio is the main focal point, it is a slow backup process. Do not select the None compression option, since it will create large backups without compression. With the None method, a 200 GB RAW disk image with 50 GB used will have a 200 GB backup image. With compression turned on, the backup image size will be around 70-80 GB.

Mode

Typically, all running virtual machine backups occur with the Snapshot option. Do not confuse this Snapshot option with Live Snapshots of a VM. The Snapshot mode allows live backup while the virtual machine is turned on, while a Live Snapshot captures the state of the virtual machine at a certain point in time. With the Suspend or Stop mode, the backup task will try to suspend the running virtual machine or forcefully stop it prior to commencing the full backup. After the backup is done, Proxmox will resume or power up the VM. Since Suspend only freezes the VM during backup, it has less downtime than the Stop mode, because the VM does not need to go through the entire reboot cycle. Both the Suspend and Stop mode backups can be used for VMs that can have partial or full downtime without disrupting regular infrastructure operation, while the Snapshot mode is used for VMs whose downtime would have a significant impact.
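The compression trade-off described above is easy to demonstrate on any Linux host. The snippet below is a rough illustration only (real vzdump behavior differs): it compresses a file of zeros, standing in for the unused portion of a disk image, to show why the None method wastes space:

```shell
# Create a 1 MiB file of zeros (stands in for empty disk space), compress
# it with gzip, and compare sizes. An uncompressed backup stays at the
# full size of the raw image, used or not.
demo_compression() {
    src=$(mktemp)
    dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null
    gzip -c "$src" > "${src}.gz"
    echo "original=$(wc -c < "$src") compressed=$(wc -c < "${src}.gz")"
    rm -f "$src" "${src}.gz"
}
```

On mostly-empty images the compressed output is a tiny fraction of the original, which is the same effect that shrinks a 200 GB RAW image to a 70-80 GB backup.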
Packt
10 Jul 2014
18 min read

Introduction to Mobile Forensics

(For more resources related to this topic, see here.) In 2013, there were almost as many mobile cellular subscriptions as there were people on earth, according to the International Telecommunication Union (ITU). The following figure shows the global mobile cellular subscriptions from 2005 to 2013. Mobile cellular subscriptions are growing at lightning speed and passed a whopping 7 billion early in 2014. Portio Research Ltd. predicts that mobile subscribers will reach 7.5 billion by the end of 2014 and 8.5 billion by the end of 2016. Mobile cellular subscription growth from 2005 to 2013 Smartphones of today, such as the Apple iPhone, Samsung Galaxy series, and BlackBerry phones, are compact forms of computers with high performance, huge storage, and enhanced functionalities. Mobile phones are the most personal electronic devices users access. They are used to perform simple communication tasks, such as calling and texting, while still providing support for Internet browsing, e-mail, taking photos and videos, creating and storing documents, identifying locations with GPS services, and managing business tasks. As new features and applications are incorporated into mobile phones, the amount of information stored on the devices is continuously growing. Mobile phones have become portable data carriers, and they keep track of all your moves. With the increasing prevalence of mobile phones in people's daily lives and in crime, data acquired from phones has become an invaluable source of evidence for investigations relating to criminal, civil, and even high-profile cases. It is rare to conduct a digital forensic investigation that does not include a phone. Mobile device call logs and GPS data were used to help solve the attempted bombing in Times Square, New York, in 2010. The details of the case can be found at http://www.forensicon.com/forensics-blotter/cell-phone-email-forensics-investigation-cracks-nyc-times-square-car-bombing-case/.
The science behind recovering digital evidence from mobile phones is called mobile forensics. Digital evidence is defined as information and data that is stored on, received, or transmitted by an electronic device and that is used for investigations. Digital evidence encompasses any and all digital data that can be used as evidence in a case.

Mobile forensics

Digital forensics is a branch of forensic science focusing on the recovery and investigation of raw data residing in electronic or digital devices. Mobile forensics is a branch of digital forensics related to the recovery of digital evidence from mobile devices. Forensically sound is a term used extensively in the digital forensics community to qualify and justify the use of a particular forensic technology or methodology. The main principle for a sound forensic examination of digital evidence is that the original evidence must not be modified. This is extremely difficult with mobile devices. Some forensic tools require a communication vector with the mobile device; thus, standard write protection will not work during forensic acquisition. Other forensic acquisition methods may involve removing a chip or installing a bootloader on the mobile device prior to extracting data for forensic examination. In cases where the examination or data acquisition is not possible without changing the configuration of the device, the procedure and the changes must be tested, validated, and documented. Following proper methodology and guidelines is crucial in examining mobile devices, as it yields the most valuable data. As with any evidence gathering, not following the proper procedure during the examination can result in loss or damage of evidence or render it inadmissible in court. The mobile forensics process is broken into three main categories: seizure, acquisition, and examination/analysis. Forensic examiners face some challenges while seizing the mobile device as a source of evidence.
At the crime scene, if the mobile device is found switched off, the examiner should place the device in a faraday bag to prevent changes should the device automatically power on. Faraday bags are specifically designed to isolate the phone from the network. If the phone is found switched on, switching it off raises a number of concerns. If the phone is locked by a PIN or password, or is encrypted, the examiner will be required to bypass the lock or determine the PIN to access the device. Mobile phones are networked devices and can send and receive data through different sources, such as telecommunication systems, Wi-Fi access points, and Bluetooth. So if the phone is in a running state, a criminal can securely erase the data stored on the phone by executing a remote wipe command. When a phone is switched on, it should be placed in a faraday bag. If possible, prior to placing the mobile device in the faraday bag, disconnect it from the network to protect the evidence by enabling flight mode and disabling all network connections (Wi-Fi, GPS, hotspots, and so on). This will also preserve the battery, which will otherwise drain while in a faraday bag, and protect against leaks in the faraday bag. Once the mobile device is seized properly, the examiner may need several forensic tools to acquire and analyze the data stored on the phone. Mobile device forensic acquisition can be performed using multiple methods, which are defined later. Each of these methods affects the amount of analysis required. Should one method fail, another must be attempted. Multiple attempts and tools may be necessary in order to acquire the most data from the mobile device. Mobile phones are dynamic systems that present a lot of challenges to the examiner in extracting and analyzing digital evidence. The rapid increase in the number of different kinds of mobile phones from different manufacturers makes it difficult to develop a single process or tool to examine all types of devices.
Mobile phones are continuously evolving as existing technologies progress and new technologies are introduced. Furthermore, mobile phones are designed with a variety of embedded operating systems. Hence, special knowledge and skills are required from forensic experts to acquire and analyze the devices.

Mobile forensic challenges

One of the biggest forensic challenges when it comes to the mobile platform is the fact that data can be accessed, stored, and synchronized across multiple devices. As the data is volatile and can be quickly transformed or deleted remotely, more effort is required for the preservation of this data. Mobile forensics is different from computer forensics and presents unique challenges to forensic examiners. Law enforcement and forensic examiners often struggle to obtain digital evidence from mobile devices. The following are some of the reasons:

Hardware differences: The market is flooded with different models of mobile phones from different manufacturers. Forensic examiners may come across different types of mobile models, which differ in size, hardware, features, and operating system. Also, with a short product development cycle, new models emerge very frequently. As the mobile landscape is changing with each passing day, it is critical for the examiner to adapt to all the challenges and remain updated on mobile device forensic techniques.

Mobile operating systems: Unlike personal computers, where Windows has dominated the market for years, mobile devices use a much wider variety of operating systems, including Apple's iOS, Google's Android, RIM's BlackBerry OS, Microsoft's Windows Mobile, HP's webOS, Nokia's Symbian OS, and many others.

Mobile platform security features: Modern mobile platforms contain built-in security features to protect user data and privacy. These features act as a hurdle during the forensic acquisition and examination. For example, modern mobile devices come with default encryption mechanisms from the hardware layer to the software layer.
The examiner might need to break through these encryption mechanisms to extract data from the devices.

Lack of resources: As mentioned earlier, with the growing number of mobile phones, the tools required by a forensic examiner also increase. Forensic acquisition accessories, such as USB cables, batteries, and chargers for different mobile phones, have to be maintained in order to acquire those devices.

Generic state of the device: Even if a device appears to be in an off state, background processes may still run. For example, in most mobiles, the alarm clock still works even when the phone is switched off. A sudden transition from one state to another may result in the loss or modification of data.

Anti-forensic techniques: Anti-forensic techniques, such as data hiding, data obfuscation, data forgery, and secure wiping, make investigations on digital media more difficult.

Dynamic nature of evidence: Digital evidence may be easily altered either intentionally or unintentionally. For example, browsing an application on the phone might alter the data stored by that application on the device.

Accidental reset: Mobile phones provide features to reset everything. Resetting the device accidentally while examining it may result in the loss of data.

Device alteration: The possible ways to alter devices may range from moving application data and renaming files to modifying the manufacturer's operating system. In this case, the expertise of the suspect should be taken into account.

Passcode recovery: If the device is protected with a passcode, the forensic examiner needs to gain access to the device without damaging the data on the device.

Communication shielding: Mobile devices communicate over cellular networks, Wi-Fi networks, Bluetooth, and Infrared. As device communication might alter the device data, the possibility of further communication should be eliminated after seizing the device.

Lack of availability of tools: There is a wide range of mobile devices.
A single tool may not support all the devices or perform all the necessary functions, so a combination of tools needs to be used. Choosing the right tool for a particular phone might be difficult.

Malicious programs: The device might contain malicious software or malware, such as a virus or a Trojan. Such malicious programs may attempt to spread to other devices over either a wired interface or a wireless one.

Legal issues: Mobile devices might be involved in crimes that cross geographical boundaries. In order to tackle these multijurisdictional issues, the forensic examiner should be aware of the nature of the crime and the regional laws.

Mobile phone evidence extraction process

Evidence extraction and forensic examination of each mobile device may differ. However, following a consistent examination process will assist the forensic examiner in ensuring that the evidence extracted from each phone is well documented and that the results are repeatable and defensible. There is no well-established standard process for mobile forensics. However, the following figure provides an overview of process considerations for the extraction of evidence from mobile devices. All methods used when extracting data from mobile devices should be tested, validated, and well documented. A great resource for handling and processing mobile devices can be found at http://digital-forensics.sans.org/media/mobile-device-forensic-process-v3.pdf.

Mobile phone evidence extraction process

The evidence intake phase

The evidence intake phase is the starting phase; it entails request forms and paperwork to document ownership information and the type of incident the mobile device was involved in, and outlines the type of data or information the requester is seeking. Developing specific objectives for each examination is the critical part of this phase. It serves to clarify the examiner's goals.
The identification phase

The forensic examiner should identify the following details for every examination of a mobile device:

The legal authority
The goals of the examination
The make, model, and identifying information for the device
Removable and external data storage
Other sources of potential evidence

We will discuss each of them in the following sections.

The legal authority

It is important for the forensic examiner to determine and document what legal authority exists for the acquisition and examination of the device, as well as any limitations placed on the media prior to the examination of the device.

The goals of the examination

The examiner will identify how in-depth the examination needs to be based upon the data requested. The goal of the examination makes a significant difference in selecting the tools and techniques to examine the phone and increases the efficiency of the examination process.

The make, model, and identifying information for the device

As part of the examination, identifying the make and model of the phone assists in determining what tools would work with the phone.

Removable and external data storage

Many mobile phones provide an option to extend the memory with removable storage devices, such as the Trans Flash Micro SD memory expansion card. In cases when such a card is found in a mobile phone that is submitted for examination, the card should be removed and processed using traditional digital forensic techniques. It is wise to also acquire the card while it is in the mobile device to ensure that data stored on both the handset memory and the card are linked for easier analysis.

Other sources of potential evidence

Mobile phones act as good sources of fingerprints and other biological evidence. Such evidence should be collected prior to the examination of the mobile phone to avoid contamination issues, unless the collection method will damage the device. Examiners should wear gloves when handling the evidence.
The preparation phase

Once the mobile phone model is identified, the preparation phase involves research regarding the particular mobile phone to be examined and the appropriate methods and tools to be used for acquisition and examination.

The isolation phase

Mobile phones are by design intended to communicate via cellular networks, Bluetooth, infrared, and wireless (Wi-Fi) connections. When the phone is connected to a network, new data is added to the phone through incoming calls, messages, and application data, which modifies the evidence on the phone. Complete destruction of data is also possible through remote access or remote wiping commands. For this reason, isolating the device from communication sources is important prior to its acquisition and examination. Isolation can be accomplished through the use of Faraday bags, which block radio signals to or from the phone. However, past research has found inconsistencies in the communication protection provided by Faraday bags alone, so additional network isolation is advisable. This can be done by wrapping the phone in radio frequency shielding cloth and then placing it into airplane or flight mode.

The processing phase

Once the phone has been isolated from the communication networks, the actual processing of the mobile phone begins. The phone should be acquired using a tested method that is repeatable and as forensically sound as possible. Physical acquisition is the preferred method, as it extracts the raw memory data and the device is commonly powered off during the acquisition process; on most devices, the fewest changes occur during physical acquisition. If physical acquisition is not possible or fails, an attempt should be made to acquire the file system of the mobile device. A logical acquisition should always be obtained as well; although it may contain only the parsed data, it provides pointers for examining the raw memory image.
The verification phase

After processing the phone, the examiner needs to verify the accuracy of the data extracted from the phone to ensure that it has not been modified. The verification of the extracted data can be accomplished in several ways.

Comparing extracted data to the handset data

Check whether the data extracted from the device matches the data displayed by the device. The extracted data can be compared to the device itself or to a logical report, whichever is preferred. Remember, handling the original device may make changes to the only evidence: the device itself.

Using multiple tools and comparing the results

To ensure accuracy, use multiple tools to extract the data and compare the results.

Using hash values

All image files should be hashed after acquisition to ensure that the data remains unchanged. If file system extraction is supported, the examiner extracts the file system and then computes hashes for the extracted files. Later, the hash of any individually extracted file is calculated and checked against the original value to verify its integrity. Any discrepancy in a hash value must be explainable (for example, the device was powered on and then acquired again, so the hash values differ).

The document and reporting phase

The forensic examiner is required to document the examination process throughout, in the form of contemporaneous notes relating to what was done during the acquisition and examination. Once the examiner completes the investigation, the results must go through some form of peer review to ensure that the data is checked and the investigation is complete.
The examiner's notes and documentation may include information such as the following:

- Examination start date and time
- The physical condition of the phone
- Photos of the phone and individual components
- Phone status when received: turned on or off
- Phone make and model
- Tools used for the acquisition
- Tools used for the examination
- Data found during the examination
- Notes from peer review

The presentation phase

Throughout the investigation, it is important to make sure that the information extracted and documented from a mobile device can be clearly presented to any other examiner or to a court. Creating a forensic report of the data extracted from the mobile device during acquisition and analysis is important; this may include data in both paper and electronic formats. Your findings must be documented and presented in a manner such that the evidence speaks for itself in court. The findings should be clear, concise, and repeatable. Timeline and link analysis, features offered by many commercial mobile forensics tools, will aid in reporting and explaining findings across multiple mobile devices. These tools allow the examiner to tie together the methods behind the communication of multiple devices.

The archiving phase

Preserving the data extracted from the mobile phone is an important part of the overall process. It is also important that the data is retained in a usable format for the ongoing court process, for future reference should the current evidence file become corrupt, and for record-keeping requirements. Court cases may continue for many years before the final judgment is reached, and most jurisdictions require that data be retained for long periods for the purposes of appeals. As the field and its methods advance, new techniques for pulling data out of a raw, physical image may surface, and the examiner can then revisit the data by pulling a copy from the archives.
Practical mobile forensic approaches

Similar to any forensic investigation, there are several approaches that can be used for the acquisition and examination/analysis of data from mobile phones. The type of mobile device, the operating system, and the security settings generally dictate the procedure to be followed in a forensic process. Every investigation is distinct, with its own circumstances, so it is not possible to design a single definitive procedural approach for all cases. The following sections outline the general approaches followed in extracting data from mobile devices.

Mobile operating systems overview

One of the major factors in the data acquisition and examination/analysis of a mobile phone is the operating system. From low-end mobile phones to smartphones, mobile operating systems have come a long way with a lot of features. The mobile operating system directly affects how the examiner can access the device. For example, Android gives terminal-level access whereas iOS does not. A comprehensive understanding of the mobile platform helps the forensic examiner make sound forensic decisions and conduct a conclusive investigation. While there is a large range of smart mobile devices, four main operating systems dominate the market: Google Android, Apple iOS, RIM BlackBerry OS, and Windows Phone. More information can be found at http://www.idc.com/getdoc.jsp?containerId=prUS23946013.

Android

Android is a Linux-based operating system and a Google open source platform for mobile phones. Android is the world's most widely used smartphone operating system; sources show that Apple's iOS is a close second (http://www.forbes.com/sites/tonybradley/2013/11/15/android-dominates-market-share-but-apple-makes-all-the-money/). Android was developed by Google as an open and free option for hardware manufacturers and phone carriers.
This makes Android the software of choice for companies that require a low-cost, customizable, lightweight operating system for their smart devices without developing a new OS from scratch. Android's open nature has further encouraged developers to build a large number of applications and upload them to Android Market. End users can then download these applications from Android Market, which makes Android a powerful operating system.

iOS

iOS, formerly known as the iPhone operating system, is a mobile operating system developed and distributed solely by Apple Inc. iOS is evolving into a universal operating system for all Apple mobile devices, such as the iPad, iPod touch, and iPhone. iOS is derived from OS X, with which it shares the Darwin foundation, and is therefore a Unix-like operating system. iOS manages the device hardware and provides the technologies required to implement native applications. iOS also ships with various system applications, such as Mail and Safari, which provide standard system services to the user. iOS native applications are distributed through the App Store, which is closely monitored by Apple.

Windows Phone

Windows Phone is a proprietary mobile operating system developed by Microsoft for smartphones and pocket PCs. It is the successor to Windows Mobile and is primarily aimed at the consumer market rather than the enterprise market. The Windows Phone OS is similar to the Windows desktop OS, but it is optimized for devices with a small amount of storage.

BlackBerry OS

BlackBerry OS is a proprietary mobile operating system developed by BlackBerry Ltd., formerly known as Research In Motion (RIM), exclusively for its BlackBerry line of smartphones and mobile devices. BlackBerry mobiles are widely used in corporations and offer native support for corporate mail via MIDP, which enables wireless synchronization with Microsoft Exchange for e-mail, contacts, calendar, and so on, when used along with the BlackBerry Enterprise Server.
These devices are known for their security.
Packt
09 Jul 2014
10 min read
An Introduction to Microsoft Remote Desktop Services and VDI

(For more resources related to this topic, see here.)

Remote Desktop Services and VDI

What Terminal Services did was provide simultaneous access to the desktop running on a server or group of servers. The name was changed to Remote Desktop Services (RDS) with the release of Windows Server 2008 R2, but RDS actually encompasses both VDI and what was Terminal Services, now referred to as Session Virtualization. Session Virtualization is still available alongside VDI in Windows Server 2012 R2; each user who connects to the server is granted a Remote Desktop session (RD Session) on an RD Session Host and shares this same server-based desktop.

Microsoft refers to any kind of remote desktop as a Virtual Desktop. These are grouped into Collections, which are made available to specific groups of users and managed as one; a Session Collection is a group of virtual desktops based on Session Virtualization. It's important to note that what users see with Session Virtualization is the desktop interface delivered with Windows Server, which is similar to, but not the same as, the Windows client interface; for example, it does have a modern UI by default. We can add or remove user interface features in Windows Server to change the way it looks. One of these, the Desktop Experience option, exists specifically to make Windows Server look more like the Windows client, and in Windows Server 2012 R2, adding this feature gives you the Windows Store just as you get it in Windows 8.1.

VDI also provides remote desktops to our users over RDP, but it does this in a completely different way. In VDI, each user accesses a VM running the Windows client OS, so the user experience looks exactly the same as it would on a laptop or physical desktop. In Windows VDI, these Collections of VMs run on Hyper-V, and our users connect to them with RDP just as they can connect to the RD Sessions described above.
Other parts of RDS are common to both, as we'll see shortly, but what is important for now is that RDS manages the VDI VMs for us and organizes which users are connected to which desktops, and so on. So we don't directly create VMs in a Collection or directly set up security on them. Instead, a Collection is created from a template, which is a special VM that is never turned on: its Guest OS is sysprepped, and turning it on would instantly negate that. The VMs in a Collection inherit this master copy of the Guest OS and any installed applications as well. The settings of the template VM (CPU, memory, graphics, networking, and so on) are also inherited by the virtual desktops in a Collection, and as we'll see, there are a lot of VM settings that specifically apply to VDI rather than to VMs running a server OS as part of our infrastructure.

To reiterate, in Windows Server 2012 R2, VDI is one option in an RDS deployment, and it's even possible to use VDI alongside RD Sessions. For example, we might decide to give RD Sessions to our call center staff and use VDI for our remote workforce. Traditionally, RD Session Hosts have been set up on physical servers, as older versions of Hyper-V and VMware weren't capable of supporting heavy workloads like this. However, we can now put up to 4 TB of RAM and 64 logical processors into one VM (physical hardware permitting) and run large RD Session deployments virtually.

Our users connect to our Virtual Desktop Collections of whatever kind with a Remote Desktop client, which connects to the RDS servers using the Remote Desktop Protocol (RDP). When we connect to any server with the Microsoft Terminal Services Client (MSTSC.exe), we are using RDP, but without setting up RDS there are only two administrative sessions available per server. Many of the advantages and disadvantages of running any kind of remote desktop apply to both solutions.
Advantages of Remote Desktops

Given that the desktop computing for our users is now going to be done in the data center, we only need to deploy powerful desktops and laptops to those users who would have difficulty connecting to our RDS infrastructure. Everyone else could either be equipped with thin client devices or given access from devices they already have when working remotely, such as tablets or their home PCs and laptops. Thin client computing has evolved in line with advances in remote desktop computing, and the latest devices from 10Zig, Dell, and HP, among others, support multiple high-resolution monitors, smart cards, and webcams for unified communications, which are also enabled in the latest version of RDP (8.1).

Using remote desktops can also reduce overall power consumption for IT, as an efficiently cooled data center will consume less power than the sum of the thin clients and servers in an RDS deployment; in one case I saw, this saving resulted in a green charity cutting 90 percent of its IT power bill.

Broadly speaking, managing remote desktops ought to be easier than managing their physical equivalents. For a start, they'll be running on standard hardware, so installing and updating drivers won't be so much of an issue, and RDS has specific tooling to create a largely automatic process for keeping our collections of remote desktops in a desired state. VDI doesn't exist in a vacuum, and there are other Microsoft technologies that make desktop management easier with other types of virtualization:

User profiles: These have long been a problem for desktop specialists. There are dedicated technologies built into Session Virtualization and VDI to allow users' settings and data to be stored away from the VHD containing the Guest OS.
Techniques such as folder redirection can also help here, and new for Windows Server 2012 is User Environment Virtualization (UE-V), which provides a much richer set of tools that work across VDI, RD Sessions, and physical desktops to ensure the user has the same experience no matter what they are using for a Windows client desktop.

Application Virtualization (App-V): This allows us to deploy applications to the users who need them, when and where they need them. We don't need to create desktops for each type of user who needs special applications; we just need a few desktop types, deploying only generic applications plus those that can't make use of App-V. Even if App-V is not deployed, VDI administrators have total control over remote desktops, as we have any number of security techniques at our disposal, and if the worst comes to the worst, we can remove any installed applications every time a user logs off!

The simple fact that the desktops and applications we provide to our users now run on servers under our direct control also increases our IT security. Patching is now easy, particularly for our remote workforce: whether they are on site or working remotely, their desktop is still in the data center. RDS in all its forms is then an ideal way of allowing a Bring Your Own Device (BYOD) policy. Users can bring whatever device they wish into work, or work at home on their own device (WHOYD is my own acronym for this!), by using an RD client on that device and securely connecting with it. There are then no concerns about partially or completely wiping users' own devices, or not being able to because they aren't in a connected state when lost or stolen.

VDI versus Session Virtualization

So why are there these two ways of providing remote desktops in Windows, and what are the advantages and disadvantages of each?
First and foremost, Session Virtualization is always going to be much more efficient than VDI, as it provides more desktops on less hardware. This makes sense if we look at what's going on in each scenario when we have a hundred remote desktop users to provide for:

In Session Virtualization, we are running and sharing one operating system, which will comfortably sit on 25 GB of disk. For memory, we need roughly 2 GB per CPU, and we can then allocate 100 MB of memory to each RD Session. So, on a quad-CPU server, a hundred RD Sessions will need 8 GB + (100 MB * 100), which is less than 18 GB of RAM.

If we want to support that same hundred users on VDI, we need to provision a hundred VMs, each with its own OS. To do this we need 2.5 TB of disk, and if we give each VM 1 GB of RAM, then 100 GB of RAM is needed. This is a little unfair on VDI in Windows Server 2012 R2, as we can cut down on the disk space needed and be much more efficient with memory than this, but even with these new technologies we would need around 70 GB of RAM and, say, 400 GB of disk.

Remember that with RD Sessions our users are going to be running the desktop that comes with Windows Server 2012 R2, not Windows 8.1. Our RDS users share the one desktop, so they cannot be given administrative rights to it. This may not matter much to them, but it can also affect which applications can be put onto the server desktop, and some applications just won't work on a server OS anyway. Users' sessions can't be moved from one Session Host to another while the user is logged in, so planned maintenance has to be carefully thought through and may mean working out of hours to patch servers and update applications if we want to avoid interrupting our users' work.

VDI, on the other hand, means that our users are using Windows 8.1, which they will be happier with, and this may well be the deciding factor regardless of cost, as our users are paying for this and their needs should take precedence.
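The back-of-the-envelope sizing above can be reproduced with a few lines of arithmetic. These are the article's rough figures (25 GB per OS image, 100 MB per RD Session, 1 GB and 25 GB per VDI VM), not sizing guidance:

```javascript
// Rough capacity comparison for 100 remote desktop users.
const users = 100;

// Session Virtualization: one shared OS on ~25 GB of disk; RAM is
// ~2 GB per CPU on a quad-CPU host plus ~100 MB per RD Session.
const sessionRamMB = 2 * 4 * 1024 + 100 * users; // 8192 + 10000 = 18192 MB
const sessionDiskGB = 25;                        // one shared OS image

// VDI: one full VM per user, ~25 GB of disk and ~1 GB of RAM each.
const vdiRamGB = 1 * users;   // 100 GB
const vdiDiskGB = 25 * users; // 2500 GB = 2.5 TB

console.log((sessionRamMB / 1024).toFixed(1) + ' GB RAM vs ' + vdiRamGB + ' GB RAM');
// → "17.8 GB RAM vs 100 GB RAM": "less than 18 GB", as the article puts it
```

The disk gap is even wider (25 GB shared versus 2.5 TB), which is why the article calls Session Virtualization the more efficient option per unit of hardware.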
VDI can be easier to manage without interrupting our users, as we can move running VMs around our physical hosts without stopping them.

Remote applications in RDS

Another part of the RDS story is the ability to provide remote access to individual applications rather than serving up a whole desktop. This is called RemoteApp, and it simply provides a shortcut to an individual application running on either a virtual desktop or a session-based remote desktop. I have seen this used for legacy applications that won't run on the latest versions of Windows, and to provide access to secure applications, as it's a simple matter to prevent cut and paste or any sharing of data between a remote application and the local device it displays on. RemoteApps work by publishing only the specified applications installed on our RD Session Hosts or VDI VMs.

Summary

This article discussed the advantages of remote desktops and how they can be provided in Windows, and compared VDI to Session Virtualization.

Resources for Article:

Further resources on this subject:

Designing, Sizing, Building, and Configuring Citrix VDI-in-a-Box [article]
Installing Virtual Desktop Agent – server OS and desktop OS [article]
VMware View 5 Desktop Virtualization [article]
Packt
07 Jul 2014
12 min read
HTML5 Game Development – A Ball-shooting Machine with Physics Engine

(For more resources related to this topic, see here.)

Mission briefing

In this article, we focus on the physics engine. We will build a basketball court where the player needs to shoot the ball into the hoop. A player shoots the ball by keeping the mouse button pressed and then releasing it. The direction is visualized by an arrow, and the power is proportional to the duration of the mouse press-and-hold event. There are obstacles present between the ball and the hoop. The player either avoids the obstacles or makes use of them to put the ball into the hoop. Finally, we use CreateJS to visualize the physics world on the canvas. You may visit http://makzan.net/html5-games/ball-shooting-machine/ to play a dummy game in order to have a better understanding of what we will be building throughout this article. The following screenshot shows a player shooting the ball towards the hoop, with a power indicator:

Why is it awesome?

When we build games without a physics engine, we create our own game loop and reposition each game object in every frame. For instance, if we move a character to the right, we manage the position and movement speed ourselves. Imagine that we are coding ball-throwing logic now. We need to keep track of several variables. We have to calculate the x and y velocity based on the time and force applied. We also need to take gravity into account, not to mention the different angles and materials we need to consider while calculating the bounce between two objects.

Now, let's think of a physics world instead. We just define how objects interact, and all the collisions happen automatically. It is similar to a real-world game: we focus on defining the rules and the world handles everything else. Take basketball as an example. We define the height of the hoop, the size of the ball, and the distance of the three-point line. Then, the players just need to throw the ball. We never worry about the flying parabola or the bounce off the board.
Our space takes care of them by using the laws of physics. This is exactly what happens in a simulated physics world: it allows us to apply physics properties to game objects. The objects are affected by gravity, and we can apply forces to them, making them collide with each other. With the help of the physics engine, we can focus on defining the game-play rules and the relationships between the objects. Without the need to worry about collision and movement, we save time to explore different game plays, and we can then elaborate and develop the most promising prototypes further. We define the position of the hoop and the ball, then apply an impulse force to the ball in the x and y dimensions, and the engine handles everything in between. Finally, we get an event trigger if the ball passes through the hoop. It is worth noting that some blockbuster games are also made with a physics engine, including Angry Birds, Cut the Rope, and Where's My Water?

Your Hotshot objectives

We will divide the article into the following eight tasks:

- Creating the simulated physics world
- Shooting a ball
- Handling collision detection
- Defining levels
- Launching a bar with power
- Adding a cross obstacle
- Visualizing graphics
- Choosing a level

Mission checklist

We create a project folder that contains the index.html file and the scripts and styles folders. Inside the scripts folder, we create three files: physics.js, view.js, and game.js. The physics.js file is the most important file in this article. It contains all the logic related to the physics world, including creating level objects, spawning dynamic balls, applying force to the objects, and handling collisions. The view.js file is a helper for the view logic, including the scoreboard and the ball-shooting indicator. The game.js file, as usual, is the entry point of the game. It also manages the levels and coordinates between the physics world and the view.
Preparing the vendor files

We also need a vendors folder that holds the third-party libraries. This includes the CreateJS suite (EaselJS, MovieClip, TweenJS, and PreloadJS) and Box2D. Box2D is the physics engine that we are going to use in this article. We need to download the engine code from https://code.google.com/p/box2dweb/. It is a port from ActionScript to JavaScript. We need the Box2dWeb-2.1.a.3.min.js file, or its non-minified version for debugging, and we put this file in the vendors folder.

Box2D is an open source physics-simulation engine that was created by Erin Catto. It was originally written in C++. Later, it was ported to ActionScript because of the popularity of Flash games, and then it was ported to JavaScript. There are different ports; the one we are using is called Box2DWeb, which was ported from the ActionScript version of Box2D 2.1. Using an older version may cause issues, and it will be difficult to find help online because most developers have switched to 2.1.

Creating a simulated physics world

Our first task is to create a simulated physics world and put two objects inside it.

Prepare for lift off

In the index.html file, the core part is the game section. We have two canvas elements in this game. The debug-canvas element is for the Box2D engine and canvas is for the CreateJS library:

```html
<section id="game" class="row">
  <canvas id="debug-canvas" width="480" height="360"></canvas>
  <canvas id="canvas" width="480" height="360"></canvas>
</section>
```

We prepare a dedicated file for all the physics-related logic. We start the physics.js file with the following code:

```javascript
;(function(game, cjs, b2d){
  // code here later
}).call(this, game, createjs, Box2D);
```

Engage thrusters

The following steps create the physics world as the foundation of the game: The Box2D classes are put in different modules, so we will need to reference some common classes as we go along.
We use the following code to create aliases for these Box2D classes:

```javascript
// alias
var b2Vec2 = Box2D.Common.Math.b2Vec2
  , b2AABB = Box2D.Collision.b2AABB
  , b2BodyDef = Box2D.Dynamics.b2BodyDef
  , b2Body = Box2D.Dynamics.b2Body
  , b2FixtureDef = Box2D.Dynamics.b2FixtureDef
  , b2Fixture = Box2D.Dynamics.b2Fixture
  , b2World = Box2D.Dynamics.b2World
  , b2MassData = Box2D.Collision.Shapes.b2MassData
  , b2PolygonShape = Box2D.Collision.Shapes.b2PolygonShape
  , b2CircleShape = Box2D.Collision.Shapes.b2CircleShape
  , b2DebugDraw = Box2D.Dynamics.b2DebugDraw
  , b2MouseJointDef = Box2D.Dynamics.Joints.b2MouseJointDef
  , b2RevoluteJointDef = Box2D.Dynamics.Joints.b2RevoluteJointDef
  ;
```

We prepare a variable that states how many pixels represent 1 meter in the physics world. We also define a Boolean that determines whether we need to render the debug draw:

```javascript
var pxPerMeter = 30; // 30 pixels = 1 meter. Box2D uses meters and we use pixels.
var shouldDrawDebug = false;
```

All the physics methods will be put into the game.physics object. We create this object literal before we code our logic:

```javascript
var physics = game.physics = {};
```

The first method in the physics object creates the world:

```javascript
physics.createWorld = function() {
  var gravity = new b2Vec2(0, 9.8);
  this.world = new b2World(gravity, /*allow sleep= */ true);

  // create two temporary bodies
  var bodyDef = new b2BodyDef;
  var fixDef = new b2FixtureDef;

  bodyDef.type = b2Body.b2_staticBody;
  bodyDef.position.x = 100/pxPerMeter;
  bodyDef.position.y = 100/pxPerMeter;
  fixDef.shape = new b2PolygonShape();
  fixDef.shape.SetAsBox(20/pxPerMeter, 20/pxPerMeter);
  this.world.CreateBody(bodyDef).CreateFixture(fixDef);

  bodyDef.type = b2Body.b2_dynamicBody;
  bodyDef.position.x = 200/pxPerMeter;
  bodyDef.position.y = 100/pxPerMeter;
  this.world.CreateBody(bodyDef).CreateFixture(fixDef);
  // end of temporary code
}
```

The update method is the game loop's tick event for the physics engine. It calculates the world step and refreshes the debug draw. The world step advances the physics world.
We'll discuss it later:

```javascript
physics.update = function() {
  this.world.Step(1/60, 10, 10);
  if (shouldDrawDebug) {
    this.world.DrawDebugData();
  }
  this.world.ClearForces();
};
```

Before we can refresh the debug draw, we need to set it up. We pass a canvas reference to the Box2D debug draw instance and configure the drawing settings:

```javascript
physics.showDebugDraw = function() {
  shouldDrawDebug = true;

  // set up debug draw
  var debugDraw = new b2DebugDraw();
  debugDraw.SetSprite(document.getElementById("debug-canvas").getContext("2d"));
  debugDraw.SetDrawScale(pxPerMeter);
  debugDraw.SetFillAlpha(0.3);
  debugDraw.SetLineThickness(1.0);
  debugDraw.SetFlags(b2DebugDraw.e_shapeBit | b2DebugDraw.e_jointBit);
  this.world.SetDebugDraw(debugDraw);
};
```

Let's move to the game.js file. We define the game-starting logic that sets up the EaselJS stage and Ticker. It creates the world and sets up the debug draw. The tick method calls the physics.update method:

```javascript
;(function(game, cjs){
  game.start = function() {
    // allow the game object to listen for and dispatch custom events.
    cjs.EventDispatcher.initialize(game);

    game.canvas = document.getElementById('canvas');
    game.stage = new cjs.Stage(game.canvas);

    cjs.Ticker.setFPS(60);
    // adding game.stage to the ticker makes the stage.update call happen automatically.
    cjs.Ticker.addEventListener('tick', game.stage);
    cjs.Ticker.addEventListener('tick', game.tick); // game loop

    game.physics.createWorld();
    game.physics.showDebugDraw();
  };

  game.tick = function(){
    if (cjs.Ticker.getPaused()) {
      return; // run only when not paused
    }
    game.physics.update();
  };

  game.start();
}).call(this, game, createjs);
```

After these steps, we should have a result as shown in the following screenshot: a physics world with two bodies. One body stays in position and the other falls to the bottom.
Objective complete – mini debriefing

We have defined our first physics world with one static object and one dynamic object that falls to the bottom. A static object is an object that is not affected by gravity or any other force. A dynamic object, on the other hand, is affected by all the forces.

Defining gravity

In reality, there is gravity on every planet; it's the same in the Box2D world. We need to define gravity for the world. As this is a ball-shooting game, we follow the rules of gravity on Earth: 0 for the x-axis and 9.8 for the y-axis. It is worth noting that we do not need to use the 9.8 value. For instance, we can set a smaller gravity value to simulate other planets in space, or even the moon; or we can set the gravity to zero to create a top-down-view ice hockey game, where we apply force to the puck and benefit from the collisions.

Debug draw

The physics engine focuses purely on the mathematical calculation. It doesn't care how the world will finally be presented, but it does provide a visual method to make debugging easier. This debug draw is very useful before we use our own graphics to represent the world; we won't use it in production. We can decide how we want to visualize the physics world. We have learned two ways to visualize a game: using DOM objects and using the canvas drawing methods. We will visualize the world with our own graphics in later tasks.

Understanding body definitions and fixture definitions

In order to define objects in the physics world, we need two definitions: a body definition and a fixture definition. The body is in charge of the physical properties, such as its position in the world, taking and applying force, moving speed, and the angular speed when rotating. We use fixtures to handle the shape of the object. The fixture definition also defines the properties governing how the object interacts with others while colliding, such as friction and restitution.

Defining shapes

Shapes are defined in a fixture. The two most common shapes in Box2D are the rectangle and the circle.
We define a rectangle with the SetAsBox function by providing half of its width and height, and a circle shape is defined by its radius. It is worth noting that the position of a body is at the center of its shape. This differs from EaselJS, where the default origin point is at the top-left corner.

Pixels per meter

When we define the dimension and location of a body, we use the meter as the unit. That's because Box2D uses metric units in its calculations to make the physics behavior realistic. But we usually calculate in pixels on the screen, so we need to convert between pixels on the screen and meters in the physics world. That's why we need the pxPerMeter variable here. The value of this variable might change from project to project.

The update method

In the game tick, we update the physics world. The first thing we need to do is take the world to the next step. Box2D calculates objects based on steps. It is the same as what we see in the physical world as time passes: if a ball is falling, at any fixed instant the ball is static, carrying its falling velocity as a property; in the next millisecond, or nanosecond, the ball has fallen to a new position. This is exactly how steps work in the Box2D world. Within every single step, the objects are static with their physics properties; when we go a step further, Box2D takes those properties into consideration and applies them to the objects. The Step function takes three arguments. The first argument is the time passed since the last step; normally, it follows the frames-per-second value that we set for the game. The second and third arguments are the velocity and position iterations: the maximum numbers of iterations Box2D tries when resolving a collision. Usually, we set them to low values. The reason we clear the forces is that otherwise a force would be applied indefinitely: the object keeps receiving the force on each frame until we clear it.
Normally, clearing the forces on every frame makes the objects more manageable.

Classified intel

We often need to represent a 2D vector in the physics world; Box2D uses b2Vec2 for this purpose. Like b2Vec2, the many Box2D functions and classes we use are modularized into namespaces, so we alias the most common classes to make our code shorter.
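Neither the pxPerMeter conversion nor the idea of stepping a world forward depends on Box2D itself, so both can be sketched without the library. The following is an illustrative, dependency-free approximation (Box2D's constraint solver does far more per step than this); the scale of 30 pixels per meter is an assumed value, not one from the book:

```javascript
// Hypothetical, dependency-free sketch of a fixed time step and unit conversion.
// pxPerMeter = 30 is an assumed scale; real projects pick their own.
var pxPerMeter = 30;

function metersToPx(m) { return m * pxPerMeter; }
function pxToMeters(px) { return px / pxPerMeter; }

// One Euler step: gravity of 9.8 m/s^2 pulls the body down the y-axis,
// mirroring world.Step(1/60, ...) in spirit only.
function step(body, dt) {
  body.vy += 9.8 * dt;      // apply gravity to the velocity
  body.y  += body.vy * dt;  // integrate the position
}

var ball = { y: 0, vy: 0 };                       // positions in meters
for (var i = 0; i < 60; i++) step(ball, 1 / 60);  // simulate one second

// After ~1s of free fall the ball has dropped roughly 4.9 m;
// metersToPx(ball.y) is what we would actually draw on screen.
```

This also shows why pxPerMeter matters: the simulation stays in meters the whole time, and pixels only appear at the drawing boundary.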
Creating the maze and animating the cube

Packt
07 Jul 2014
9 min read
(For more resources related to this topic, see here.)

A maze is a rather simple shape that consists of a number of walls and a floor. So, what we need is a way to create these shapes. Three.js, not very surprisingly, doesn't have a standard geometry that will allow you to create a maze, so we need to create this maze by hand. To do this, we need to take two different steps:

1. Find a way to generate the layout of the maze so that not all the mazes look the same.
2. Convert that to a set of cubes (THREE.BoxGeometry) that we can use to render the maze in 3D.

There are many different algorithms that we can use to generate a maze, and luckily there are also a number of open source JavaScript libraries that implement such an algorithm. So, we don't have to start from scratch. For the example in this book, I've used the random-maze-generator project that you can find on GitHub at the following link: https://github.com/felipecsl/random-maze-generator

Generating a maze layout

Without going into too much detail, this library allows you to generate a maze and render it on an HTML5 canvas. The result of this library looks something like the following screenshot. You can generate this by just using the following JavaScript:

```javascript
var maze = new Maze(document, 'maze');
maze.generate();
maze.draw();
```

Even though this is a nice-looking maze, we can't use it directly to create a 3D maze. What we need to do is change the code the library uses to draw on the canvas so that it creates Three.js objects instead. This library draws the lines on the canvas in a function called drawLine:

```javascript
drawLine: function(x1, y1, x2, y2) {
  self.ctx.beginPath();
  self.ctx.moveTo(x1, y1);
  self.ctx.lineTo(x2, y2);
  self.ctx.stroke();
}
```

If you're familiar with the HTML5 canvas, you can see that this function draws lines based on the input arguments. Now that we've got this maze, we need to convert it to a number of 3D shapes so that we can render them in Three.js.
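The book doesn't show the library's internals, but a maze layout generator is short enough to sketch. The following is a hypothetical recursive-backtracking generator (one common algorithm; the random-maze-generator project may well use a different one), returning the list of carved passages between grid cells:

```javascript
// Hypothetical maze generator using recursive backtracking on a grid.
// Each carved pair of cells corresponds to a removed wall segment.
function generateMaze(width, height) {
  var visited = [];
  for (var i = 0; i < width * height; i++) visited.push(false);

  function neighbors(x, y) {
    return [[x - 1, y], [x + 1, y], [x, y - 1], [x, y + 1]]
      .filter(function(c) {
        return c[0] >= 0 && c[0] < width && c[1] >= 0 && c[1] < height;
      });
  }

  var stack = [[0, 0]];
  visited[0] = true;
  var carved = []; // pairs of connected cells, i.e. removed walls

  while (stack.length > 0) {
    var cell = stack[stack.length - 1];
    var next = neighbors(cell[0], cell[1]).filter(function(c) {
      return !visited[c[1] * width + c[0]];
    });
    if (next.length === 0) {
      stack.pop(); // dead end: backtrack
    } else {
      var chosen = next[Math.floor(Math.random() * next.length)];
      visited[chosen[1] * width + chosen[0]] = true;
      carved.push([cell, chosen]); // knock down the wall between them
      stack.push(chosen);
    }
  }
  return carved; // a spanning tree: width * height - 1 openings
}
```

Because the result is a spanning tree of the grid, every cell is reachable and there is exactly one path between any two cells, which is what makes the layout feel like a maze.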
Converting the layout to a 3D set of objects

To change this library to create Three.js objects, all we have to do is change the drawLine function to the following code snippet:

```javascript
drawLine: function(x1, y1, x2, y2) {
  var lengthX = Math.abs(x1 - x2);
  var lengthY = Math.abs(y1 - y2);

  // since there are only 90 degree angles, one of these is always 0;
  // to add a certain thickness to the wall, set it to 0.5
  if (lengthX === 0) lengthX = 0.5;
  if (lengthY === 0) lengthY = 0.5;

  // create a cube to represent the wall segment
  var wallGeom = new THREE.BoxGeometry(lengthX, 3, lengthY);
  var wallMaterial = new THREE.MeshPhongMaterial({
    color: 0xff0000,
    opacity: 0.8,
    transparent: true
  });

  // and create the complete wall segment
  var wallMesh = new THREE.Mesh(wallGeom, wallMaterial);

  // finally, position it correctly
  wallMesh.position = new THREE.Vector3(
    x1 - ((x1 - x2) / 2) - (self.height / 2),
    wallGeom.height / 2,
    y1 - ((y1 - y2) / 2) - (self.width / 2));

  self.elements.push(wallMesh);
  scene.add(wallMesh);
}
```

In this new drawLine function, instead of drawing on the canvas, we create a THREE.BoxGeometry object whose length and depth are based on the supplied arguments. Using this geometry, we create a THREE.Mesh object and use the position attribute to place the mesh at a specific point with x, y, and z coordinates. Before we add the mesh to the scene, we add it to the self.elements array. Now we can just use the following code snippet to create a 3D maze:

```javascript
var maze = new Maze(scene, 17, 100, 100);
maze.generate();
maze.draw();
```

As you can see, we've also changed the input arguments. These properties now define the scene to which the maze should be added and the size of the maze. The result of these changes can be seen in the following screenshot: every time you refresh, you'll see a newly generated random maze. Now that we've got our generated maze, the next step is to add the object that we'll move through the maze.
Animating the cube

Before we dive into the code, let's first look at the result as shown in the following screenshot. Using the controls at the top-right corner, you can move the cube around. What you'll see is that the cube rotates around its edges, not around its center. In this section, we'll show you how to create that effect. Let's first look at the default rotation, which is along an object's central axis, and the translation behavior of Three.js.

The standard Three.js rotation behavior

Let's first look at all the properties you can set on THREE.Mesh:

- position: the position of an object, relative to the position of its parent. In all our examples so far, the parent is THREE.Scene.
- rotation: the rotation of THREE.Mesh around its own x, y, or z axis.
- scale: scales the object along its own x, y, and z axes.
- translateX(amount): moves the object by the specified amount over the x axis.
- translateY(amount): moves the object by the specified amount over the y axis.
- translateZ(amount): moves the object by the specified amount over the z axis.

If we want to rotate a mesh around one of its own axes, we can just call the following line of code:

```javascript
plane.rotation.x = -0.5 * Math.PI;
```

We've used this to rotate the ground area from a horizontal position to a vertical one. It is important to know that this rotation is done around the object's own internal axis, not the x, y, or z axis of the scene. So, if you do a number of rotations one after another, you have to keep track of the orientation of your mesh to make sure you get the required effect. Another point to note is that rotation is done around the center of the object, in this case the center of the cube.
If we look at the effect we want to accomplish, we run into the following two problems:

- First, we don't want to rotate around the center of the object; we want to rotate around one of its edges to create a walking-like animation.
- Second, if we use the default rotation behavior, we have to continuously keep track of our orientation, since we're rotating around our own internal axis.

In the next section, we'll explain how you can solve these problems by using matrix-based transformations.

Creating an edge rotation using matrix-based transformation

If we want to perform edge rotations, we have to take the following few steps:

1. To rotate around an edge, change the center point of the object to the edge we want to rotate around.
2. Since we don't want to keep track of all the rotations we've done, make sure that after each rotation the vertices of the cube represent the correct position.
3. After we've rotated around the edge, do the inverse of the first step, so that the center point of the object is back in the center of the cube, ready for the next step.

So, the first thing we need to do is change the center point of the cube. The approach we use is to offset the position of all individual vertices and then change the position of the cube in the opposite way. The following example will allow us to make a step to the right-hand side:

```javascript
cubeGeometry.applyMatrix(new THREE.Matrix4().makeTranslation(0, width / 2, width / 2));
cube.position.y += -width / 2;
cube.position.z += -width / 2;
```

With the cubeGeometry.applyMatrix function, we can change the position of the individual vertices of our geometry. In this example, we create a translation (using makeTranslation) that offsets all the y and z coordinates by half the width of the cube.
The result is that it will look like the cube moved a bit to the right-hand side and then up, but the actual center of the cube is now positioned at one of its lower edges. Next, we use the cube.position property to position the cube back at the ground plane, since the individual vertices were offset by the makeTranslation function. Now that the edge of the object is positioned correctly, we can rotate the object. For rotation, we could use the standard rotation property, but then we would have to constantly keep track of the orientation of our cube. So, for rotations, we once again use a matrix transformation on the vertices of our cube:

```javascript
cube.geometry.applyMatrix(new THREE.Matrix4().makeRotationX(amount));
```

As you can see, we use the makeRotationX function, which changes the position of our vertices. Now we can easily rotate our cube without having to worry about its orientation. The final step we need to take is to reset the cube to its original position; taking into account that we've moved a step to the right, we can take the next step:

```javascript
cube.position.y += width / 2;
cube.position.z += -width / 2;
cubeGeometry.applyMatrix(new THREE.Matrix4().makeTranslation(0, -width / 2, width / 2));
```

As you can see, this is the inverse of the first step; we've added the width of the cube to position.y and subtracted the width from the second argument of the translation to compensate for the step to the right-hand side we've taken. If we use the preceding code snippet, we will only see the result of the step to the right.

Summary

In this article, we have seen how to create a maze and animate a cube.

Resources for Article: Further resources on this subject: Working with the Basic Components That Make Up a Three.js Scene [article] 3D Websites [article] Rich Internet Application (RIA) – Canvas [article]
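The translate, rotate, translate-back pattern is easier to see in two dimensions. This dependency-free sketch (not from the book; the names are hypothetical) applies the same three steps to a single point, where the pivot plays the role of the cube edge we rotate around:

```javascript
// Hypothetical 2D illustration of the translate/rotate/translate-back pattern.
function rotateAroundPivot(point, pivot, angle) {
  // step 1: move the pivot to the origin
  var x = point[0] - pivot[0];
  var y = point[1] - pivot[1];
  // step 2: rotate around the origin
  var cos = Math.cos(angle), sin = Math.sin(angle);
  var rx = x * cos - y * sin;
  var ry = x * sin + y * cos;
  // step 3: undo the translation
  return [rx + pivot[0], ry + pivot[1]];
}

// Rotating (2, 0) a quarter turn around the "edge" (1, 0) lands on (1, 1).
var p = rotateAroundPivot([2, 0], [1, 0], Math.PI / 2);
```

The Three.js version does exactly this, except that the translation is applied to every vertex of the geometry with applyMatrix, and the rotation is around the x axis in 3D.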

Baking Bits with Yocto Project

Packt
07 Jul 2014
3 min read
(For more resources related to this topic, see here.)

Understanding the BitBake tool

The BitBake task scheduler started as a fork of Portage, the package management system used in the Gentoo distribution. Nowadays, however, the two projects have diverged a lot due to their different usage focuses. The Yocto Project and the OpenEmbedded Project are the best-known and most intensive users of BitBake, which remains a separate and independent project with its own development cycle and mailing list (bitbake-devel@lists.openembedded.org). BitBake is a task scheduler that parses mixed Python and shell script code. The parsed code generates and runs tasks, which may have a complex dependency chain and are scheduled to allow parallel execution and maximize the use of computational resources. In some aspects, BitBake can be understood as a tool similar to GNU Make.

Exploring metadata

The metadata used by BitBake can be classified into three major areas:

- Configuration (the .conf files)
- Classes (the .bbclass files)
- Recipes (the .bb and .bbappend files)

The configuration files define the global content, which is used to provide information and configure how the recipes will work. One common example of a configuration file is the machine file, which has a list of settings describing the board hardware. The classes are used by the whole system and can be inherited by recipes, according to their needs or by default; classes are used to define the system behavior and provide the base methods. For example, kernel.bbclass helps with the process of building, installing, and packaging the Linux kernel, independent of version and vendor. The recipes and classes are written in a mix of Python and shell scripting code. The classes and recipes describe the tasks to be run and provide the information needed to allow BitBake to generate the required task chain. The inheritance mechanism that permits a recipe to inherit one or more classes is useful for reducing code duplication and easing maintenance. A Linux kernel recipe example is linux-yocto_3.14.bb, which inherits a set of classes, including kernel.bbclass. These are BitBake's most commonly used types of metadata (.conf, .bb, and .bbclass).

Summary

In this article, we studied BitBake and the classification of metadata.

Resources for Article: Further resources on this subject: Installation and Introduction of K2 Content Construction Kit [article] Linux Shell Script: Tips and Tricks [article] Customizing a Linux kernel [article]
Validating user input in workflow transitions

Packt
07 Jul 2014
6 min read
(For more resources related to this topic, see here.)

For workflow transitions that have transition screens, you can add validation logic to make sure what the users put in is what you are expecting. This is a great way to ensure data integrity, and we can do this with workflow validators. In this recipe, we will add a validator to perform a date comparison between a custom field and the issue's create date, so the date value we select for the custom field must be after the issue's create date.

Getting ready

For this recipe, we need to have the JIRA Suite Utilities add-on installed. You can download it from the following link or install it directly using the Universal Plugin Manager: https://marketplace.atlassian.com/plugins/com.googlecode.jira-suiteutilities

Since we are also doing a date comparison, we need to create a new date custom field called Start Date and add it to the Workflow Screen.

How to do it…

Perform the following steps to add validation rules during a workflow transition:

1. Select and edit the workflow to configure.
2. Select the Diagram mode.
3. Select the Start Progress transition and click on the Validators link towards the right-hand side.
4. Click on the Add validator link and select Date Compare from the list.
5. Select the Start Date custom field for This date, > for Condition, and Created for Compare with, and click on Update to add the validator.
6. Click on Publish Draft to apply the change.

After we add the validator, if we try to select a date that is before the issue's create date, JIRA will prompt us with an error message and stop the transition from going through, as shown in the following screenshot.

How it works…

Validators run after the user has executed the transition but before the transition starts. This way, validators can intercept and prevent a transition from going through if one or more of the validation rules fail. If you have more than one validator, all of them must pass for the transition to go through.
See also

Validators can be used to make a field required only during workflow transitions.

Creating custom workflow transition logic

In the previous recipe, we looked at using workflow conditions, validators, and post functions that come out of the box with JIRA and from other third-party add-ons. In this recipe, we will take a look at how to use scripts to define our own validation rules for a workflow validator. We will address a common use case: making a field required during a workflow transition only when another field is set to a certain value. So, our validation logic will be as follows:

- If the Resolution field is set to Fixed, the Solution Details field will be required.
- If the Resolution field is set to a value other than Fixed, the Solution Details field will not be required.

Getting ready

For this recipe, we need to have the Script Runner add-on installed. You can download it from the following link or install it directly using the Universal Plugin Manager: https://marketplace.atlassian.com/plugins/com.onresolve.jira.groovy.groovyrunner

You might also want to get familiar with Groovy scripting (http://groovy.codehaus.org).

How to do it…

Perform the following steps to set up a validator with custom-scripted logic:

1. Select and edit the Simple Workflow.
2. Select the Diagram mode.
3. Click on the Move to Backlog global workflow transition.
4. Click on the Validators link from the panel on the right-hand side.
5. Click on Add validator, select Script Validator from the list, and click on Add.
Now enter the following script code in the Condition textbox:

```groovy
import com.opensymphony.workflow.InvalidInputException
import com.atlassian.jira.ComponentManager
import org.apache.commons.lang.StringUtils

def customFieldManager = ComponentManager.getInstance().getCustomFieldManager()
def solutionField = customFieldManager.getCustomFieldObjectByName("Solution Details")

def resolution = issue.getResolutionObject().getName()
def solution = issue.getCustomFieldValue(solutionField)

if (resolution == "Fixed" && StringUtils.isBlank(solution)) {
    false
} else {
    true
}
```

Enter the text "You must provide the Solution Details if the Resolution is set to Fixed." for the Error field. Select Solution Details for Field. Click on the Add button to complete the validator setup, and click on Publish Draft to apply the change. Your validator configuration should look something like the following screenshot. Once we add our custom validator, every time the Move to Backlog transition is executed, the Groovy script will run. If the Resolution field is set to Fixed and the Solution Details field is empty, we will get the message from the Error field, as shown in the following screenshot.

How it works…

The Script Validator works just like any other validator, except that we can define our own validation logic using Groovy scripts. So, let's go through the script and see what it does. We first get the Solution Details custom field via its name, as shown in the following line of code. If you have more than one custom field with the same name, you need to use its ID instead of its name.
```groovy
def solutionField = customFieldManager.getCustomFieldObjectByName("Solution Details")
```

We then read the resolution value and obtain the value entered for Solution Details during the transition:

```groovy
def resolution = issue.getResolutionObject().getName()
def solution = issue.getCustomFieldValue(solutionField)
```

In our example, we check the resolution name; we can also check the ID of the resolution by changing getName() to getId(). Lastly, we check whether the Resolution value is Fixed and the Solution Details value is blank; in this case, we return false, so the validation fails. All other cases return true, so the validation passes. We use StringUtils.isBlank(solution) to check for blank values so that we also catch cases where users enter only spaces in the Solution Details field.

There's more…

You are not limited to creating scripted validators. With the Script Runner add-on, you can create scripted conditions and post functions using Groovy scripts.

Summary

In this article, we studied how to work with workflows, including permissions and user input validation.

Resources for Article: Further resources on this subject: Getting Started with JIRA 4 [Article] Gadgets in JIRA [Article] Advanced JIRA 5.2 Features [Article]
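Stripped of the JIRA APIs, the rule itself is just a two-input predicate. Here is a hypothetical, framework-free sketch of that core rule (written in JavaScript rather than Groovy, purely for illustration; the function names are not part of any JIRA API):

```javascript
// Hypothetical stand-in for the scripted validator's core logic:
// Solution Details is mandatory only when Resolution is "Fixed".
function isBlank(s) {
  // treat null, undefined, and whitespace-only strings as blank,
  // mirroring StringUtils.isBlank in the Groovy script
  return s === null || s === undefined || s.trim() === '';
}

function validate(resolution, solutionDetails) {
  if (resolution === 'Fixed' && isBlank(solutionDetails)) {
    return false; // block the transition
  }
  return true;    // let it through
}
```

For example, validate('Fixed', '   ') fails because only spaces were entered, while validate("Won't Fix", '') passes because the resolution is not Fixed.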

article-image-part-1-deploying-multiple-applications-capistrano-single-project
Rodrigo Rosenfeld
01 Jul 2014
9 min read
Save for later

Part 1: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
9 min read
Capistrano is a deployment tool written in Ruby that is able to deploy projects using any language or framework, through a set of recipes, which are also written in Ruby. Capistrano expects an application to have a single repository, and it is able to run arbitrary commands on the server through a non-interactive SSH session. Capistrano was designed assuming that an application is completely described by a single repository with all code belonging to it. For example, your web application is written with Ruby on Rails, and simply serving that application would be enough. But what if you decide to use a separate application for managing your users, in a separate language and framework? Or maybe an issue tracker application? You could set up a proxy server to properly deliver each request to the right application based upon the request path, for example. But the problem remains: how do you use Capistrano to manage more complex scenarios like this when it supports only a single repository? The typical approach is to integrate Capistrano into each of the component applications and then switch between those projects before deploying each component. Not only is this a lot of work, but it may also lead to a duplication of settings. For example, if your main application and the user management application both use the same database for a given environment, you'd have to duplicate this setting in each of the components. For the Market Tracker product, used by LexisNexis clients (which we develop at e-Core for Matterhorn Transactions Inc.), we were looking for a better way to manage many component applications, across lots of environments and servers. We wanted to manage all of them from a single repository, instead of adding Capistrano integration to each of our components' repositories and having to worry about keeping the recipes in sync between each of the maintained repository branches.
Motivation

The Market Tracker application we maintain consists of three different applications: the main one, another to export search results to Excel files, and an administrative interface to manage users and other entities. We host the application on three servers: two for the real deployment and a backup server. The first two are identical, which gives us redundancy and zero-downtime deployments, except for a few cases where we change our database schema in ways that are incompatible with previous versions. To add to the complexity of deploying our three component applications to each of those servers, we also need to deploy them multiple times for different environments, like production, certification, staging, and experimental. All of them run on the same server, on separate ports, with separate database, Solr, and Redis instances. This is already complex enough to manage when you integrate Capistrano into each of your projects, but it gets worse. Sometimes you find bugs in production and have to release quick fixes, but you can't deploy the version in the master branch that has several other changes. At other times you find bugs in your Capistrano recipes themselves and fix them on master. Or maybe you are changing your deploy settings rather than the application's code. When you have to deploy to production, depending on how your Capistrano recipes work, you may have to switch to the production branch, backport any changes to the Capistrano recipes from master, and finally deploy the latest fixes. This happens, for example, if your recipe uses a project file as a template and that file has moved to another place in the master branch. We decided to try another approach, similar to what we do with our database migrations. Instead of integrating the database migrations into the main application (the default in Rails, Django, Grails, and similar web frameworks), we prefer to handle them as a separate project.
In our case we use the active_record_migrations gem, which brings standalone support for ActiveRecord migrations (the same ones bundled with Rails apps by default). Our database is shared between the administrative interface project and the main web application, and we feel it's better to be able to manage our database schema independently from the projects using the database. We add the migrations project to the other applications as submodules so that we know what database schema is expected to work for a particular commit of the application, but that's all. We wanted to apply the same principles to our Capistrano recipes: manage all of our applications on different servers and environments from a single project containing the recipes, and store the common settings in a single place to avoid code duplication, which makes it hard to add new environments or update existing ones.

Grouping all applications' Capistrano recipes in a single project

It seems we were not the first to want all our Capistrano recipes in a single project. We first tried a project called CapHub. It worked fine initially, and its inheritance model would allow us to avoid our code duplication. Well, not entirely. The problem is that we needed some kind of multiple inheritance, or mixins. We have some settings, like the token private key, that are unique per environment, like Certification and Production. But we also have other settings that are common within a server. For example, the database host name will be the same for all applications and environments inside our colocation facility, but it will be different on our backup server at Amazon EC2. CapHub didn't help us get rid of the duplication in such cases, but it certainly helped us find a simple solution to get what we wanted. Let's explore how Capistrano 3 allows us to easily manage such complex scenarios, which are more common than you might think.
Capistrano stages

Since Capistrano 3, multistage support is built in (there was a multistage extension for Capistrano 2). That means you can write "cap stage_name task_name", for example, "cap production deploy". By default, "cap install" will generate two stages: production and staging. You can generate as many as you want, for example:

cap install STAGES=production,cert,staging,experimental,integrator

But how do we deploy each of those stages to our multiple servers, since the settings for each stage may differ across servers? Also, how can we manage separate applications? Even though those settings are called "stages" by Capistrano, you can use them as you want. For example, suppose our servers are named m1, m2, and ec2, and the applications are named web, exporter, and admin. We can create settings like m1_staging_web, ec2_production_admin, and so on. This will result in lots of files (specifically 45 = 5 x 3 x 3, to support five environments, three applications, and three servers), but it's not a big deal if you consider that the settings files can be really small, as the examples will demonstrate later on in this article by using mixins. Usually people start with staging and production only, and then gradually add other environments. Also, they usually start with one or two servers and keep growing as they feel the need. So supporting 45 combinations is not such a pain, since you don't write all of them at once. On the other hand, if you have enough resources to have a separate server for each of your environments, Capistrano will allow you to add multiple "server" declarations and assign roles to them, which can be quite useful if you're running a cluster of servers. In our case, to avoid downtime we don't upgrade all the servers in our cluster at once. We also don't have the budget to host 45 virtual machines, or even 15. So the little effort needed to generate 45 small settings files compensates for the savings in hosting expenses.
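The arithmetic behind those 45 files is just a cross product of servers, environments, and applications. A throwaway sketch (in JavaScript here, purely for illustration; the project itself is Ruby) makes the naming scheme explicit:

```javascript
// Enumerate every server/environment/application combination;
// each name would become one small Capistrano stage file.
var servers = ['m1', 'm2', 'ec2'];
var environments = ['production', 'cert', 'staging', 'experimental', 'integrator'];
var applications = ['web', 'exporter', 'admin'];

var stages = [];
servers.forEach(function(server) {
  environments.forEach(function(env) {
    applications.forEach(function(app) {
      stages.push(server + '_' + env + '_' + app);
    });
  });
});

// 3 servers x 5 environments x 3 applications = 45 stage names
```

Each generated name, such as m1_staging_web, corresponds to one config/deploy/*.rb stage file, and the mixins described next keep those files tiny.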
Using mixins

My next post will create an example deployment project from scratch, providing detail for everything that has been discussed in this post. But first, let me introduce the concept of what we call a mixin in our project. Capistrano 3 is simply a wrapper on top of Rake. Rake is a build tool written in Ruby, similar to "make"; it has targets, and targets have prerequisites. This fits nicely with the way Capistrano works, where some deployment tasks depend on other tasks. Instead of a Rakefile (Rake's Makefile), Capistrano uses a Capfile, but other than that it works almost the same way. The Domain Specific Language (DSL) in a Capfile is enhanced as you include Capistrano extensions to the Rake DSL. Here's a sample Capfile, generated by cap install, when you install Capistrano:

```ruby
# Load DSL and Setup Up Stages
require 'capistrano/setup'

# Includes default deployment tasks
require 'capistrano/deploy'

# Includes tasks from other gems included in your Gemfile
#
# For documentation on these, see for example:
#
# https://github.com/capistrano/rvm
# https://github.com/capistrano/rbenv
# https://github.com/capistrano/chruby
# https://github.com/capistrano/bundler
# https://github.com/capistrano/rails
#
# require 'capistrano/rvm'
# require 'capistrano/rbenv'
# require 'capistrano/chruby'
# require 'capistrano/bundler'
# require 'capistrano/rails/assets'
# require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.rake').each { |r| import r }
```

Just like a Rakefile, a Capfile is valid Ruby code, which you can easily extend using regular Ruby code. So, to support a mixin DSL, we simply need to extend the DSL, like this:

```ruby
def mixin(path)
  load File.join('config', 'mixins', path + '.rb')
end
```

Pretty simple, right?
We prefer to add this to a separate file, like lib/mixin.rb, and add this to the Capfile:

```ruby
$:.unshift File.dirname(__FILE__)
require 'lib/mixin'
```

After that, calling mixin 'environments/staging' should load settings that are common for the staging environment from a file called config/mixins/environments/staging.rb in the root of the Capistrano-enabled project. This is the base of the deployment project that we will create in the next post.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of some gems, including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre - RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.