Backups in the VMware View Infrastructure

Packt
17 Sep 2014
18 min read
In this article, by Chuck Mills and Ryan Cartwright, authors of the book VMware Horizon 6 Desktop Virtualization Solutions, we will look at the backup options available in VMware Horizon View and provide guidance on scheduling appropriate backups of a Horizon View environment.

While a single point of failure should not exist in the VMware View environment, it is still important to take regular backups so that a quick recovery is possible when failures occur. A backup can also be used to restore to a previous point in time if a setting becomes corrupted or is changed. Backups of the VMware View environment should be performed on a regular basis, in line with the organization's existing backup methodology. A VMware View environment contains both files and databases. The main backup points are as follows:

- VMware View Connection Server (ADAM database)
- VMware View Security Server
- VMware View Composer database
- Remote Desktop Service host servers
- Remote Desktop Service host templates and virtual machines
- Virtual desktop templates and parent VMs
- Virtual desktops: linked clones (stateless) and full clones (stateful)
- ThinApp repository
- Persona Management
- VMware vCenter

This article also covers restoring the VMware View environment and Business Continuity and Disaster Recovery.

With a backup of all of the preceding components, the VMware View Server infrastructure can be recovered after a failure. To maximize the chances of a successful recovery, it is advisable to take backups of the View ADAM database, the View Composer database, and the vCenter database at the same time to avoid discrepancies. Backups can be scheduled and automated or executed manually; ideally, scheduled backups should be used so that they are performed and completed regularly.

Proper design dictates that there should always be two or more View Connection Servers. As all View Connection Servers in the same replica pool contain the same configuration data, it is only necessary to back up one View Connection Server. This backup is typically configured for the first View Connection Server installed in standard mode in an environment.

VMware View Connection Server – ADAM database backup

View Connection Server stores its configuration data in the View LDAP repository, and View Composer stores the configuration data for linked clone desktops in the View Composer database. When you use View Administrator to perform backups, the Connection Server backs up both the View LDAP configuration data and the View Composer database, and both sets of backup files are stored in the same location. The LDAP data is exported in LDAP Data Interchange Format (LDIF).

If you have multiple View Connection Servers in a replicated group, you only need to export data from one of the instances, because all replicated instances contain the same configuration data. However, it is not good practice to rely on replicated instances of View Connection Server as your backup mechanism: because the Connection Server synchronizes data across the instances, any data lost on one instance may be lost on all the members of the group.

If the View Connection Server uses multiple vCenter Server instances and multiple View Composer services, it will back up all the View Composer databases associated with those vCenter Server instances. View Connection Server backups are configured from the VMware View Admin console.
The backups dump the configuration files and the database information to a location on the View Connection Server. That data must then be backed up through normal mechanisms, such as a backup agent and a scheduled job. The procedure for a View Connection Server backup is as follows:

1. Schedule VMware View backup runs that export to C:\View_Backup.
2. Use your third-party backup solution on the View Connection Server and have it back up the System State, Program Files, and the C:\View_Backup folder created in step 1.

From within the View Admin console, there are three primary options that must be configured to back up the View Connection Server settings:

- Automatic backup frequency: This is the frequency at which backups are automatically taken. The recommendation is every day: as most server backups are performed daily, if the automatic View Connection Server backup is taken before the full backup of the Windows server, it will be included in the nightly backup. Adjust this as necessary.
- Backup time: This displays the time at which the backup will run, based on the automatic backup frequency (Every day produces a time of 12 midnight).
- Maximum number of backups: This is the maximum number of backups that can be stored on the View Connection Server. Once the maximum has been reached, backups are rotated out based on age, with the oldest backup being replaced by the newest. The recommendation is 30, which ensures that approximately one month of backups is retained on the server. Adjust this as necessary.
- Folder location: This is the location on the View Connection Server where the backups are stored. Ensure that the third-party backup solution backs up this location.

The following screenshot shows the Backup tab:

Performing a manual backup of the View database

Use the following steps to perform a manual backup of your View database:

1. Log in to the View Administrator console.
2. Expand the Catalog option under Inventory (on the left-hand side of the console).
3. Select the first pool and right-click on it.
4. Select Disable Provisioning, as shown in the following screenshot.
5. Continue to disable provisioning for each of the pools. This ensures that no new information is added to the ADAM database.

After you disable provisioning for all the pools, there are two ways to perform the backup: the View Administrator console, or running a command at the command prompt.

The View Administrator console

Follow these steps to perform a backup:

1. Log in to the View Administrator console.
2. Expand View Configuration under Inventory.
3. Select Servers, which displays all the servers found in your environment.
4. Select the Connection Servers tab.
5. Right-click on one of the Connection Servers and choose Backup Now, as shown in the following screenshot.
6. After the backup process is complete, enable provisioning for the pools.

Using the command prompt

You can export the ADAM database by executing a built-in export tool at the command prompt. Perform the following steps:

1. Connect directly to the View Connection Server with a remote desktop utility such as RDP.
2. Open a command prompt and use the cd command to navigate to C:\Program Files\VMware\VMware View\Server\tools\bin.
3. Execute the vdmexport.exe command and use the -f option to specify a location and filename, as shown in the following screenshot (for this example, C:\View_Backup is the location and vdmBackup.ldf is the filename).

Once a backup has been run automatically or executed manually, there will be two types of files saved in the backup location:

- LDF files: These are the LDIF exports from the VMware View Connection Server ADAM database and store the configuration settings of the VMware View environment.
- SVI files: These are the backups of the VMware View Composer database.

The backup process of the View Connection Server is fairly straightforward. While the process is easy, it should not be overlooked.

Security Server considerations

Surprisingly, there is no option to back up the VMware View Security Server via the VMware View Admin console. For View Connection Servers, backup is configured by selecting the server, selecting Edit, and then clicking on Backup; highlighting the View Security Server provides no such functionality. Instead, the Security Server should be backed up via normal third-party mechanisms. The installation directory is of primary concern, which is C:\Program Files\VMware\VMware View\Server by default. The .config file is in the ...\sslgateway\conf directory, and it includes the following settings:

- pcoipClientIPAddress: This is the public address used by the Security Server.
- pcoipClientUDPPort: This is the port used for UDP traffic (the default is 4172).

In addition, the settings file is located in this directory and includes settings such as the following:

- maxConnections: This is the maximum number of concurrent connections the View Security Server can have at one time (the default is 2000).
- serverID: This is the hostname used by the Security Server.

Custom certificates and logfiles are also stored within the installation directory of the VMware View Security Server. It is therefore important to back up this data regularly if the logfile data is to be maintained (and is not being ingested into a larger enterprise logfile solution).

The View Composer database

The View Composer database used for linked clones is backed up using the following steps:

1. Log in to the View Administrator console.
2. Expand the Catalog option under Inventory (on the left-hand side of the console).
3. Select the first pool and right-click on it.
4. Select Disable Provisioning.
5. Connect directly to the server where View Composer is installed, using a remote desktop utility such as RDP.
6. Stop the View Composer service, as shown in the following screenshot. This prevents provisioning requests that would change the Composer database.
7. After the service is stopped, use the standard database backup practice for the current environment.
8. Restart the Composer service after the backup completes.

Remote Desktop Service host servers

VMware View 6 uses virtual machines to deliver hosted applications and desktops. In some cases, tuning and optimization, or other customer-specific configurations to the environment or applications, may be built on the Remote Desktop Service (RDS) host. Use the Windows Server Backup tool or the backup software currently deployed in your environment.

RDS Server host templates and virtual machines

The virtual machine templates and virtual machines are an important part of the Horizon View infrastructure and need protection in the event that the system needs to be recovered. Back up the RDS host templates when changes are made and testing/validation is completed.
The production RDS host machines should be backed up at frequent intervals if they contain user data or any other elements that require protection. Third-party backup solutions are used in this case.

Virtual desktop templates and parent VMs

Horizon View uses virtual machine templates to create the desktops in pools of full virtual machines, and it uses parent VMs to create the desktops in a linked clone desktop pool. These virtual machine templates and parent VMs are another important part of the View infrastructure that needs protection. These backups are a crucial part of being able to quickly restore the desktop pools and the RDS hosts in the event of data loss. While standard virtual machines change frequently, the virtual machine templates and parent VMs only need backing up after new changes have been made to the template and parent VM images. These backups should be readily available for rapid redeployment when required.

For environments that use full cloning as the provisioning technique for the vDesktops, the gold template should be backed up regularly. The gold template is the master vDesktop from which all other vDesktops are cloned. The VMware KB article, Backing up and restoring virtual machine templates using VMware APIs, covers the steps to both back up and restore a template. In short, most backup solutions require that the gold template is converted from a template to a regular virtual machine, after which it can be backed up. You can find more information at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2009395.

Backing up the parent VM can be tricky, as it is a virtual machine, often with many different point-in-time snapshots. The most common technique is to collapse the virtual machine snapshot tree at a given point-in-time snapshot, and then back up or copy the newly created virtual machine to a second datastore. By storing the parent VM on a redundant storage solution, it is quite unlikely that the parent VM will be lost. What is more likely is that a point-in-time snapshot of the parent VM is created while it is in a nonfunctional or less-than-ideal state.

Virtual desktops

There are three types of virtual desktops in a Horizon View environment:

- Linked clone desktops
- Stateful desktops
- Stateless desktops

Linked clone desktops

Virtual desktops created by View Composer using the linked clone technology present special challenges for backup and restoration. In many cases, a linked clone desktop will also be considered a stateless desktop. The dynamic nature of a linked clone desktop and the underlying structure of the virtual machine itself mean that linked clone desktops are not good candidates for backup and restoration. However, the same qualities that impede the use of a standard backup solution provide an advantage for rapid reprovisioning of virtual desktops. When the underlying infrastructure for things such as the delivery of applications and user data, along with the parent VMs, is restored, linked clone desktop pools can be recreated and made available within a short amount of time, thereby lessening the impact of an outage or data loss.

Stateful desktops

In the stateful desktop pool scenario, all of the virtual desktops retain user data when the user logs back in to the virtual desktop.
So, in this case, backing up the virtual machines with third-party tools, like any other virtual machine in vSphere, is considered the optimal method for protection and recovery.

Stateless desktops

With the stateless desktop architecture, the virtual desktops do not retain the desktop state when the user logs back in to the virtual desktop. By their nature, stateless desktops do not require, nor do they directly contain, any data that requires a backup. All the user data in a stateless desktop is stored on a file share. The user data includes any files the user creates, changes, or copies within the virtual infrastructure, along with the user persona data. Therefore, because no user data is stored within the virtual desktop, there is no need to back up the desktop. File shares should be included in the standard backup strategy so that all user data and persona information is included in the existing daily backups.

The ThinApp repository

The ThinApp repository is similar in nature to the user data on the stateless desktops in that it should reside on a redundant file share that is backed up regularly. If the ThinApp packages are configured to preserve each user's sandbox, the ThinApp repository should likely be backed up nightly.

Persona Management

With the View Persona Management feature, the user's remote profile is dynamically downloaded after the user logs in to a virtual desktop. A secure, centralized repository can be configured in which Horizon View stores user profiles. The standard practice is to back up the network shares on which View Persona Management stores the profile repository. View Persona Management ensures that user profiles are backed up to the remote profile share, eliminating the need for additional tools to back up user data on the desktops. Therefore, backup software to protect the user profile on the View desktop is unnecessary.

VMware vCenter

Most established IT departments use backup tools from their storage or backup vendor to protect the datastores where the VMs are stored. This makes the recovery of the base vSphere environment faster and easier. The central piece of vCenter is the vCenter database. If the database is lost entirely, you lose all the configuration information of vSphere, including the configuration specific to View (for example, users, folders, and many more).

Another important point to understand is that even if you rebuild your vCenter using the same folder and resource pool names, your View environment will not reconnect to and use the new vCenter. The reason is that each object in vSphere has what is called a Managed Object Reference (MoRef), and these are stored in the vSphere database. View uses the MoRef information to talk to vCenter. As View and vSphere rely on each other, making a backup of your View environment without backing up your vSphere environment does not make sense.

Restoring the VMware View environment

If your environment has multiple Connection Servers, the best approach is to delete all the servers but one, and then use the following steps to restore the ADAM database:

1. Connect directly to the server where the View Connection Server is located, using a remote desktop utility such as RDP.
2. Stop the View Connection Server service, as shown in the following screenshot.
3. Locate the backup (or exported) ADAM database file that has the .ldf extension.
4. Decrypt the file: open a command prompt, use the cd command to navigate to C:\Program Files\VMware\VMware View\Server\tools\bin, and run the following command:

   vdmimport -f View_Backup\vdmBackup.ldf -d > View_Backup\vmdDecrypt.ldf

   You will be prompted to enter the password for the account you used to create the backup file.
5. Import the decrypted file with the vdmimport -f [decrypted filename] command (in the preceding example, the filename is vmdDecrypt.ldf).
6. After the ADAM database is updated, restart the View Connection Server service.
7. Replace the deleted Connection Servers by running the Connection Server installation and using the Replica option.

To reinstall the View Composer database, connect to the server where Composer is installed, stop the View Composer service, and use your standard procedure for restoring a database. After the restore, start the View Composer service.

While this provides the steps to restore the main components of the Connection Server, the steps to perform a complete View Connection Server restore can be found in the VMware KB article, Performing an end-to-end backup and restore for VMware View Manager, at http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008046.

Reconciliation after recovery

One of the main factors to consider when performing a restore in a Horizon View infrastructure is the possibility that the Connection Server environment is out of sync with the current View state, in which case a reconciliation is required. After restoring the Connection Server ADAM database, the Connection Server Admin user interface may show desktops that no longer exist if the following actions were executed after the backup but before the restore:

- The administrator deleted pools or desktops.
- A desktop pool was recomposed, which resulted in the removal of the unassigned desktops.

Missing desktops or pools can be manually removed from the Connection Server Admin UI. Some automated desktops may become disassociated from their pools if a pool was created between the time of the backup and the time of the restore. View administrators may be able to make them usable by cloning the linked clone desktop to a full clone desktop using vCenter Server. These would be created as individual desktops in the Connection Server and then assigned to specific users.

Business Continuity and Disaster Recovery

It is important to ensure that the virtual desktops, along with the application delivery infrastructure, are included and prioritized in the Business Continuity and Disaster Recovery plan. It is also important to ensure that the recovery procedures are tested and validated on a regular cycle, and that procedures and mechanisms are in place to ensure that critical data (images, software media, data backups, and so on) is always stored and ready in an alternate location. This ensures an efficient and timely recovery.

Ideally, the disaster recovery and business continuity plan recovers the essential services to an alternate "standby" data center. This allows the data to be backed up and available offsite at the alternate facility for an additional measure of protection. The alternate data center could have "hot" standby capacity for the virtual desktops and application delivery infrastructure.
This site would then provide 50 percent capacity in the event of a disaster, as well as 50 percent additional capacity in the event of a business continuity event that prevents users from accessing the main facility. The additional capacity also provides a rollback option if updates to the main data center fail. Operational procedures should ensure that the desktop and server images are available to the alternate facility when changes are made to the main VMware View system. Desktop and application pools should also be updated in the alternate data center whenever maintenance procedures are executed and validated in the main data center.

Summary

As expected, it is important to back up the fundamental components of a VMware View solution. While a resilient design should mitigate most types of failure, there are still occasions when a backup may be needed to bring an environment back to an operational level. This article covered the major components of View and provided some of the basic options for creating backups of those components. The Connection Server and Composer databases, along with vCenter, were explained, and an overview was given of the options used to protect the different types of virtual desktops. The ThinApp repository and Persona Management were also covered, as were the basic recovery options and where to find information on the complete View recovery procedures.

What is REST?

Packt
17 Sep 2014
12 min read
This article by Bhakti Mehta, the author of RESTful Java Patterns and Best Practices, starts with the basic concepts of REST, how to design RESTful services, and best practices around designing REST resources. It also covers the architectural aspects of REST.

Where REST has come from

The confluence of social networking, cloud computing, and the era of mobile applications has created a generation of emerging technologies that allow different networked devices to communicate with each other over the Internet. In the past, there were traditional and proprietary approaches for building solutions encompassing different devices and components communicating with each other over an unreliable network or through the Internet. Some of these approaches, such as RPC, CORBA, and SOAP-based web services, which evolved as different implementations of Service Oriented Architecture (SOA), required tighter coupling between components along with greater complexity in integration.

As the technology landscape evolves, today's applications are built on the notion of producing and consuming APIs instead of using web frameworks that invoke services and produce web pages. This requirement enforces the need for easier exchange of information between distributed services, along with predictable, robust, well-defined interfaces. API-based architecture enables agile development, easier adoption and prevalence, scale, and integration with applications within and outside the enterprise.

HTTP 1.1 is defined in RFC 2616 and is ubiquitously used as the standard protocol for distributed, collaborative, hypermedia information systems. Representational State Transfer (REST) is inspired by HTTP and can be used wherever HTTP is used. The widespread adoption of REST and JSON opens up the possibility of applications incorporating and leveraging functionality from other applications as needed. The popularity of REST is mainly because it enables building lightweight, simple, cost-effective modular interfaces that can be consumed by a variety of clients.

This article covers the following topics:

- Introduction to REST
- Safety and idempotence
- HTTP verbs and REST
- Best practices when designing RESTful services
- REST architectural components

Introduction to REST

REST is an architectural style that conforms to web standards such as using HTTP verbs and URIs. It is bound by the following principles:

- All resources are identified by URIs.
- All resources can have multiple representations.
- All resources can be accessed/modified/created/deleted by standard HTTP methods.
- There is no state on the server.

REST is extensible due to the use of URIs for identifying resources. For example, a URI to represent a collection of book resources could look like this:

http://foo.api.com/v1/library/books

A URI to represent a single book identified by its ISBN could be as follows:

http://foo.api.com/v1/library/books/isbn/12345678

A URI to represent a coffee order resource could be as follows:

http://bar.api.com/v1/coffees/orders/1234

A user in a system can be represented like this:

http://some.api.com/v1/user

A URI to represent all the book orders for a user could be:

http://bar.api.com/v1/user/5034/book/orders

All the preceding samples show a clear, readable pattern that can be interpreted by the client. All these resources could have multiple representations. The resource examples shown here can be represented by JSON or XML and can be manipulated by the HTTP methods GET, PUT, POST, and DELETE.
The following table summarizes the HTTP methods and the actions they take on a resource, using a simple example of a collection of books in a library:

- GET /library/books: Gets the list of books
- GET /library/books/isbn/12345678: Gets the book identified by ISBN "12345678"
- POST /library/books: Creates a new book order
- DELETE /library/books/isbn/12345678: Deletes the book identified by ISBN "12345678"
- PUT /library/books/isbn/12345678: Updates the book identified by ISBN "12345678"
- PATCH /library/books/isbn/12345678: Partially updates the book identified by ISBN "12345678"

REST and statelessness

REST is bound by the principle of statelessness. Each request from the client to the server must carry all the details needed to understand the request. This helps to improve visibility, reliability, and scalability for requests. Visibility is improved because the system monitoring the requests does not have to look beyond one request to get its details. Reliability is improved because there is no check-pointing or resuming to be done in the case of partial failures. Scalability is improved because the number of requests that can be processed increases, as the server is not responsible for storing any state. Roy Fielding's dissertation on the REST architectural style provides details on the statelessness of REST; see http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.

With this initial introduction to the basics of REST, we shall cover the different maturity levels and where REST falls within them in the following section.

Richardson Maturity Model

The Richardson Maturity Model, developed by Leonard Richardson, describes the basics of REST in terms of resources, verbs, and hypermedia controls. The starting point for the maturity model is to use the HTTP layer as the transport.

Level 0 – Remote Procedure Invocation: This level covers SOAP or XML-RPC sending data as POX (Plain Old XML). Only POST methods are used. This is the most primitive way of building SOA applications, with a single POST method and XML used to communicate between services.

Level 1 – REST resources: This uses POST methods and, instead of using a function and passing arguments, uses REST URIs. It still uses only one HTTP method. It is better than Level 0 in that it breaks a complex functionality into multiple resources with one method.

Level 2 – More HTTP verbs: This level uses other HTTP verbs such as GET, HEAD, DELETE, and PUT along with POST. Level 2 is the real use case of REST, which advocates using different verbs based on the HTTP request methods, and the system can have multiple resources.

Level 3 – HATEOAS: Hypermedia as the Engine of Application State (HATEOAS) is the most mature level of Richardson's model. The responses to client requests contain hypermedia controls, which help the client decide what action it can take next. Level 3 encourages easy discoverability and makes it easy for responses to be self-explanatory.

Safety and idempotence

This section discusses safe and idempotent methods in detail.

Safe methods

Safe methods are methods that do not change state on the server. GET and HEAD are safe methods; for example, GET /v1/coffees/orders/1234 is a safe request. Safe methods can be cached. The PUT method is not safe, as it will create or modify a resource on the server; POST is not safe for the same reason, and DELETE is not safe, as it deletes a resource on the server.
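Before moving on to idempotency, the verb-to-resource mapping summarized earlier in this section can be expressed directly in code. The following is a minimal sketch using the standard JAX-RS (javax.ws.rs) annotations; the Book class and the in-memory bookStore map are hypothetical placeholders for illustration only, not code from the book:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/library/books")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class BookResource {

    // Hypothetical in-memory store keyed by ISBN; a real service would use a repository.
    private static final Map<String, Book> bookStore = new ConcurrentHashMap<>();

    @GET
    public Response listBooks() {                              // GET /library/books
        return Response.ok(bookStore.values()).build();
    }

    @GET
    @Path("/isbn/{isbn}")
    public Response getBook(@PathParam("isbn") String isbn) {  // GET /library/books/isbn/{isbn}
        Book book = bookStore.get(isbn);
        return book == null
            ? Response.status(Response.Status.NOT_FOUND).build()
            : Response.ok(book).build();
    }

    @POST
    public Response createBook(Book book) {                    // POST /library/books
        bookStore.put(book.getIsbn(), book);
        return Response.status(Response.Status.CREATED).build();
    }

    @PUT
    @Path("/isbn/{isbn}")
    public Response updateBook(@PathParam("isbn") String isbn, Book book) {  // PUT /library/books/isbn/{isbn}
        bookStore.put(isbn, book);
        return Response.ok(book).build();
    }

    @DELETE
    @Path("/isbn/{isbn}")
    public Response deleteBook(@PathParam("isbn") String isbn) {  // DELETE /library/books/isbn/{isbn}
        bookStore.remove(isbn);
        return Response.noContent().build();
    }
}

// Hypothetical book representation (in a real project this would live in its own file).
class Book {
    private String isbn;
    private String title;
    public String getIsbn() { return isbn; }
    public void setIsbn(String isbn) { this.isbn = isbn; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

Each annotated method maps one HTTP verb to one action on the books collection or on a single book, which is exactly the pattern the table above summarizes.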
Idempotent methods

An idempotent method is a method that produces the same result irrespective of how many times it is called. For example, the GET method is idempotent, as multiple calls to the GET resource will always return the same response. The PUT method is idempotent, as calling PUT multiple times will update the same resource and not change the outcome. POST is not idempotent: calling POST multiple times can have different results and will create new resources. DELETE is idempotent because once the resource is deleted it is gone, and calling the method multiple times will not change the outcome.

HTTP verbs and REST

HTTP verbs inform the server what to do with the data sent as part of the URL.

GET

GET is the simplest HTTP verb and provides access to a resource. Whenever the client clicks a URL in the browser, it sends a GET request to the address specified by the URL. GET is safe and idempotent. GET requests are cached, and query parameters can be used in GET requests. For example, a simple GET request is as follows:

curl http://api.foo.com/v1/user/12345

POST

POST is used to create a resource. POST requests are neither idempotent nor safe; multiple invocations of a POST request can create multiple resources. POST requests should invalidate a cache entry if one exists. Query parameters with POST requests are not encouraged. For example, a POST request to create a user can be:

curl -X POST -d '{"name":"John Doe","username":"jdoe","phone":"412-344-5644"}' http://api.foo.com/v1/user

PUT

PUT is used to update a resource. PUT is idempotent but not safe; multiple invocations of a PUT request should produce the same result by updating the resource. PUT requests should invalidate the cache entry if one exists. For example, a PUT request to update a user can be:

curl -X PUT -d '{"phone":"413-344-5644"}' http://api.foo.com/v1/user

DELETE

DELETE is used to delete a resource. DELETE is idempotent but not safe. DELETE is idempotent because, based on RFC 2616, "the side effects of N > 0 requests is the same as for a single request". This means that once the resource is deleted, calling DELETE multiple times will get the same response. For example, a request to delete a user is as follows:

curl -X DELETE http://foo.api.com/v1/user/1234

HEAD

HEAD is similar to a GET request; the difference is that only the HTTP headers are returned and no content. HEAD is idempotent and safe. For example, a HEAD request with curl is as follows:

curl -X HEAD http://foo.api.com/v1/user

It can be useful to send a HEAD request to see whether a resource has changed before trying to retrieve a large representation with a GET request.

PUT vs POST

According to the RFC, the difference between PUT and POST lies in the Request URI. The URI identified by a POST request defines the entity that will handle the POST request, whereas the URI in a PUT request includes the entity itself. So POST /v1/coffees/orders means create a new resource and return an identifier that describes it, whereas PUT /v1/coffees/orders/1234 means update the resource identified by "1234" if it exists, or else create a new order and use the URI orders/1234 to identify it.

Best practices when designing resources

This section highlights some of the best practices when designing RESTful resources:

- The API developer should use nouns to navigate resources and express verbs only through the HTTP method. For example, the URI /user/1234/books is better than /user/1234/getBook.
- Use associations in the URIs to identify subresources. For example, to get the authors of book 5678 for user 1234, use the URI /user/1234/books/5678/authors.
- For specific variations, use query parameters. For example, to get all the books with 10 reviews, use /user/1234/books?reviews_counts=10.
- Allow partial responses as part of query parameters where possible. For example, to get only the name and age of a user, the client can specify ?fields as a query parameter listing the fields the server should return: /users/1234?fields=name,age.
- Have a default output format for the response in case the client does not specify which format it is interested in. Most API developers choose JSON as the default response MIME type.
- Use camelCase or underscores for attribute names.
- Support a standard API for counts, for example users/1234/books/count, so that the client gets an idea of how many objects to expect in the response. This also helps the client with pagination queries.
- Support a pretty-printing option, for example users/1234?pretty_print. It is also good practice not to cache queries that carry the pretty-print query parameter.
- Avoid chattiness by being as complete as possible in the response: if the server does not provide enough detail, the client needs to make more calls to get additional details, which wastes network resources and counts against the client's rate limits.

(See the query-parameter sketch after the following list for an illustration of the partial-response, count, and pagination practices.)

REST architecture components

This section covers the various components that must be considered when building RESTful APIs. As seen in the preceding screenshot, REST services can be consumed from a variety of clients and applications running on different platforms and devices, such as mobile devices and web browsers. These requests are sent through a proxy server. The HTTP requests are sent to the resources, and based on the CRUD operation required, the right HTTP method is selected. On the response side there can be pagination, to ensure the server sends a subset of results, and the server can do asynchronous processing, thus improving responsiveness and scale. There can be links in the response, which relates to HATEOAS.

Here is a summary of the various REST architectural components:

- HTTP requests use the REST API with HTTP verbs for the uniform interface constraint.
- Content negotiation allows selecting a representation for a response when multiple representations are available.
- Logging helps provide traceability to analyze and debug issues.
- Exception handling allows sending application-specific exceptions with HTTP codes.
- Authentication and authorization with OAuth 2.0 gives access control to other applications so that they can take actions without the user having to send their credentials.
- Validation provides support for sending back detailed messages with error codes to the client, as well as validation of the inputs received in the request.
- Rate limiting ensures the server is not burdened with too many requests from a single client.
- Caching helps to improve application responsiveness.
- Asynchronous processing enables the server to send responses back to the client asynchronously.
- Microservices comprise breaking up a monolithic service into fine-grained services.
- HATEOAS improves usability, understandability, and navigability by returning a list of links in the response.
- Pagination allows clients to specify the items in a dataset that they are interested in.
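As promised above, here is a hedged JAX-RS sketch of the partial-response, count, and pagination practices; the resource path, parameter names, and the stubbed method bodies are illustrative assumptions rather than code from the book:

import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/users/{userId}/books")
@Produces(MediaType.APPLICATION_JSON)
public class UserBooksResource {

    @GET
    public Response listBooks(@PathParam("userId") String userId,
                              @QueryParam("reviews_counts") Integer reviewsCount, // optional filter
                              @QueryParam("fields") String fields,                // partial response, e.g. fields=name,age
                              @QueryParam("page") @DefaultValue("1") int page,    // pagination
                              @QueryParam("per_page") @DefaultValue("20") int perPage) {
        // A real implementation would filter by review count, project only the
        // requested fields, and return the requested page of the user's books.
        return Response.ok().build();
    }

    @GET
    @Path("/count")
    public Response countBooks(@PathParam("userId") String userId) {  // e.g. /users/1234/books/count
        return Response.ok("{\"count\": 0}").build();
    }
}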
The REST architectural components in the image can be chained one after the other, as shown previously. For example, there can be a filter chain consisting of filters related to authentication, rate limiting, caching, and logging. This takes care of authenticating the user, checking whether the requests from the client are within rate limits, and then checking whether the request can be served from the cache. This can be followed by a logging filter, which logs the details of the request. For more details, check RESTful Java Patterns and Best Practices.
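To make the filter chain concrete, here is a minimal JAX-RS 2.0 sketch of two of the filters mentioned above (authentication and logging); rate-limiting and caching filters would follow the same pattern. This is an illustrative assumption, not code from the book:

import java.io.IOException;
import java.util.logging.Logger;
import javax.annotation.Priority;
import javax.ws.rs.Priorities;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.Provider;

// Runs early in the chain: rejects requests that carry no credentials.
@Provider
@Priority(Priorities.AUTHENTICATION)
class AuthenticationFilter implements ContainerRequestFilter {
    @Override
    public void filter(ContainerRequestContext request) throws IOException {
        if (request.getHeaderString("Authorization") == null) {
            request.abortWith(Response.status(Response.Status.UNAUTHORIZED).build());
        }
    }
}

// Runs later in the chain: records the method and URI of every request.
@Provider
@Priority(Priorities.USER)
class LoggingFilter implements ContainerRequestFilter {
    private static final Logger LOG = Logger.getLogger(LoggingFilter.class.getName());

    @Override
    public void filter(ContainerRequestContext request) throws IOException {
        LOG.info(request.getMethod() + " " + request.getUriInfo().getRequestUri());
    }
}

The @Priority values determine the order in which the filters run, which is how the chaining described above is expressed in JAX-RS.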

Standard Functionality

Packt
16 Sep 2014
16 min read
In this article by Mark Brummel, author of Microsoft Dynamics NAV 2013 Application Design, we will learn how to search the standard functionality and reuse parts of it in our own software. For this part, we will look at resources in Microsoft Dynamics NAV. Resources are similar to items in that they can be used as products, but they are far less complex, which makes them easier to look at and learn from.

Squash court master data

Our company has 12 courts that we want to register in Microsoft Dynamics NAV. This master data is comparable to resources, so we'll go ahead and copy this functionality. Resources are not attached to the contact table like the vendor/squash player tables. We need the number series again, so we'll add a new number series to our Squash Setup table. The Squash Court table should look like this after creation:

Chapter objects

The Object Designer window shows the Page tab, as shown in the following screenshot:

After the import process is completed, make sure that your current database is the default database for the Role Tailored Client and run page 123456701, Squash Setup. From this page, select the action Initialize Squash Application. This will execute the C/AL code in the InitSquashApp function of this page, which will prepare the demo data for us to play with. The objects are prepared and tested in a Microsoft Dynamics NAV 2013 R2 W1 database.

Reservations

When running a squash court, we want to be able to keep track of reservations. Looking at standard Dynamics NAV functionality, it might be a good idea to create a squash player journal. The journal can create entries for reservations that can be invoiced. A journal requires a set of supporting objects. Creating a new journal from scratch is a lot of work and can easily lead to mistakes. It is easier and safer to copy an existing journal structure from the standard application that is similar to the journal we need for our design. In our example, we have copied the Resource Journals:

You can export these objects in text format and then rename and renumber them for easy reuse. The Squash Journal objects are renumbered and renamed from the Resource Journal. All journals have the same structure: the template, batch, and register tables are almost always the same, whereas the journal line and ledger entry tables contain function-specific fields. Let's have a look at the first of these.

The Journal Template has several fields, as shown in the following screenshot. Let's discuss these fields in more detail:

- Name: This is the unique name. It is possible to define as many templates as required, but usually one template per form ID and one for recurring will do. If you want journals with different source codes, you need more templates.
- Description: A readable and understandable description of its purpose.
- Test Report ID: All templates have a test report that allows the user to check for posting errors.
- Form ID: For some journals, more UI objects are required. For example, the General Journals have a special form for bank and cash.
- Posting Report ID: This report is printed when a user selects Post and Print.
- Force Posting Report: Use this option when a posting report is mandatory.
- Source Code: Here you can enter a trail code for all the postings done via this journal.
- Reason Code: This functionality is similar to Source Code.
- Recurring: Whenever you post lines from a recurring journal, new lines are automatically created with a posting date defined by the recurring date formula.
- No. Series: When you use this feature, the Document No. in the journal line is automatically populated with a new number from this number series.
- Posting No. Series: Use this feature for recurring journals.

The Journal Batch has various fields, as shown in the following screenshot. Let's discuss these fields in more detail:

- Journal Template Name: The name of the journal template this batch refers to.
- Name: Each batch should have a unique code.
- Description: A readable description explaining the purpose of this batch.
- Reason Code: When populated, this Reason Code overrules the Reason Code from the Journal Template.
- No. Series: When populated, this No. Series overrules the No. Series from the Journal Template.
- Posting No. Series: When populated, this Posting No. Series overrules the Posting No. Series from the Journal Template.

The Register table has various fields, as shown in the following screenshot. Terms from the Journal Register tab that you need to know are:

- No.: This field is automatically and incrementally populated for each transaction with this journal; there are no gaps between the numbers.
- From Entry No.: A reference to the first ledger entry created with this transaction.
- To Entry No.: A reference to the last ledger entry created with this transaction.
- Creation Date: Always populated with the real date when the transaction was posted.
- User ID: The ID of the end user who posted the transaction.

The journal

The journal line has a number of mandatory fields that are required for all journals, and some fields that are required for its designed functionality. In our case, the journal should create a reservation that can then be invoiced. This requires some information to be populated in the lines.

Reservation

The reservation process is a logistical process that requires us to know the number of the squash court, the date, and the time of the reservation. We also need to know how long the players want to play. To check the reservation, it might also be useful to store the number of the squash player.

Invoicing

For the invoicing part, we need to know the price we need to invoice. It might also be useful to store the cost, to see our profit. For the system to figure out the proper G/L account for the turnover, we also need to define a General Product Posting Group.

Let's discuss these fields in more detail:

- Journal Template Name: This is a reference to the current Journal Template.
- Line No.: Each journal has a virtually unlimited number of lines; this number is automatically incremented by 10000, allowing lines to be created in between.
- Entry Type: This is Reservation or Invoice.
- Document No.: This number can be given to the squash player as a reservation number. When the Entry Type is Invoice, it is the invoice number.
- Posting Date: This is usually the reservation date, but when the Entry Type is Invoice, it might be the date of the invoice, which can differ from the posting date in the general ledger.
- Squash Player No.: This is a reference to the squash player who has made the reservation.
- Squash Court No.: This is a reference to the squash court.
- Description: This is automatically updated with the number of the squash court and the reservation date and times, but can be changed by the user.
- Reservation Date: This is the actual date of the reservation.
- From Time: This is the starting time of the reservation. We only allow whole and half hours.
- To Time: This is the ending time of the reservation. We only allow whole and half hours.
  This is automatically populated when the user enters a quantity.
- Quantity: This is the number of hours of playing time. We only allow units of 0.5 to be entered here. This is automatically calculated when the times are populated.
- Unit Cost: This is the cost to run a squash court for one hour.
- Total Cost: This is the cost for this reservation.
- Unit Price: This is the invoice price per hour for this reservation. This depends on whether or not the squash player is a member.
- Total Price: This is the total invoice price for this reservation.
- Shortcut Dimension Code 1 & 2: This is a reference to the dimensions used for this transaction.
- Applies-to Entry No.: When a reservation is invoiced, this is the reference to the Squash Entry No. of the reservation.
- Source Code: This is inherited from the journal batch or template and used when posting the transaction.
- Chargeable: When this option is not used, there will not be an invoice for the reservation.
- Journal Batch Name: This is a reference to the journal batch that is used for this transaction.
- Reason Code: This is inherited from the journal batch or template and used when posting the transaction.
- Recurring Method: When the journal is a recurring journal, you can use this field to determine whether the Amount field is blanked after posting the lines.
- Recurring Frequency: This field determines the new posting date after the recurring lines are posted.
- Gen. Bus. Posting Group: The combination of general business and product posting groups determines the G/L account for turnover when we invoice the reservation. The Gen. Bus. Posting Group is inherited from the bill-to customer.
- Gen. Prod. Posting Group: This is inherited from the squash player.
- External Document No.: When a squash player wants us to note a reference number, we can store it here.
- Posting No. Series: When the Journal Template has a Posting No. Series, it is populated here to be used when posting.
- Bill-to Customer No.: This determines who is paying for the reservation. We inherit this from the squash player.

So now we have a place to enter reservations, but there are a few things to do before we can start using it. Some fields were determined to be inherited or calculated:

- The time fields need calculation logic to prevent users from entering wrong values
- The Unit Price should be calculated
- The Unit Cost, posting groups, and Bill-to Customer No. need to be inherited
- As the final cherry on top, we will look at implementing dimensions

Time calculation

For the times, we only want to allow specific starting and ending times. Our squash courts can be used in blocks of half an hour. The Quantity field should be calculated based on the entered times and vice versa. To have the most flexible solution possible, we will create a new table with the allowed starting and ending times. This table has two fields: Reservation Time and Duration. The Duration field is a decimal field that we will promote to a SumIndexField. This enables us to use SIFT to calculate the quantity. When populated, the table looks like this:

The time fields in the squash journal table will now get a table relation with this table. This prevents a user from entering values that are not in the table, so only valid starting and ending times are allowed. This is all done without any C/AL code and remains flexible if the times change later.
Now we need some code that calculates the quantity based on the input:

From Time - OnValidate()
CalcQty;

To Time - OnValidate()
CalcQty;

CalcQty()
// Only calculate when both times are entered and the range is valid.
IF ("From Time" <> 0T) AND ("To Time" <> 0T) THEN BEGIN
  IF "To Time" <= "From Time" THEN
    FIELDERROR("To Time");
  // Filter the allowed time slots between the two times.
  ResTime.SETRANGE("Reservation Time", "From Time", "To Time");
  // Exclude the slot at the ending time so only the blocks actually played are summed.
  ResTime.FIND('+');
  ResTime.NEXT(-1);
  ResTime.SETRANGE("Reservation Time", "From Time", ResTime."Reservation Time");
  // Sum the durations of the remaining slots using SIFT.
  ResTime.CALCSUMS(Duration);
  VALIDATE(Quantity, ResTime.Duration);
END;

When a user enters a value in the From Time or To Time field, the CalcQty function is executed. It checks whether both fields have a value and then checks that To Time is later than From Time. Then we place a filter on the Reservation Time table. When a user makes a reservation from 8:00 to 9:00, there are three records in the filter, which would make the result of CALCSUMS (the total of all records) a duration of 1.5. Therefore, we step back to the previous reservation time and use that, so the quantity becomes 1. This example shows how easy it is to use built-in Microsoft Dynamics NAV functionality such as table relations and CALCSUMS instead of the complex time calculations we could otherwise have used.

Price calculation

There is a special technique to determine prices. Prices are stored in a table with all possible parameters as fields, and by filtering down on these fields the best price is determined, with extra logic if required to find the lowest (or highest) price when more than one price is found. To look at, learn, and love this part of the standard application, we have used table Sales Price (7002) and codeunit Sales Price Calc. Mgt. (7000), even though we only need a small part of this functionality. This mechanism of price calculation is used throughout the application and offers a normalized way of calculating sales prices. A similar construction is used for purchase prices, with table Purchase Price (7012) and codeunit Purch. Price Calc. Mgt. (7010).

Squash prices

In our case, we have already determined that we have a special rate for members, but let's say we also have special rates for daytime and evening in winter and summer. This could make our table look like this:

We can make special prices for members on dates for winter and summer and make a price valid only until a certain time. We can also make a special price for a court. This table could be creatively expanded with all kinds of codes until we end up with table Sales Price (7002) in the standard product, which was the template for our example.

Price Calc Mgt. codeunit

To calculate the price, we need a codeunit similar to the one in the standard product. This codeunit is called with a squash journal line record; it stores all valid prices in a buffer table and then finds the lowest price if there is overlap.
FindSquashPrice()
WITH FromSquashPrice DO BEGIN
  // Only prices valid on the reservation date.
  SETFILTER("Ending Date",'%1|>=%2',0D,StartingDate);
  SETRANGE("Starting Date",0D,StartingDate);

  ToSquashPrice.RESET;
  ToSquashPrice.DELETEALL;

  SETRANGE(Member, IsMember);

  // Prices without an ending time: generic, then court-specific.
  SETRANGE("Ending Time", 0T);
  SETRANGE("Squash Court No.", '');
  CopySquashPriceToSquashPrice(FromSquashPrice,ToSquashPrice);

  SETRANGE("Ending Time", 0T);
  SETRANGE("Squash Court No.", CourtNo);
  CopySquashPriceToSquashPrice(FromSquashPrice,ToSquashPrice);

  // Prices limited to a time of day: generic, then court-specific.
  SETRANGE("Squash Court No.", '');
  IF StartingTime <> 0T THEN BEGIN
    SETFILTER("Ending Time",'%1|>=%2',000001T,StartingTime);
    CopySquashPriceToSquashPrice(FromSquashPrice,ToSquashPrice);
  END;

  SETRANGE("Squash Court No.", CourtNo);
  IF StartingTime <> 0T THEN BEGIN
    SETFILTER("Ending Time",'%1|>=%2',000001T,StartingTime);
    CopySquashPriceToSquashPrice(FromSquashPrice,ToSquashPrice);
  END;
END;

If there is no price in the filter, it uses the unit price from the squash court, as shown:

CalcBestUnitPrice()
WITH SquashPrice DO BEGIN
  FoundSquashPrice := FINDSET;
  IF FoundSquashPrice THEN BEGIN
    BestSquashPrice := SquashPrice;
    REPEAT
      // Keep the lowest price found in the buffer.
      IF SquashPrice."Unit Price" < BestSquashPrice."Unit Price" THEN
        BestSquashPrice := SquashPrice;
    UNTIL NEXT = 0;
  END;
END;

// No price found in agreement
IF BestSquashPrice."Unit Price" = 0 THEN
  BestSquashPrice."Unit Price" := SquashCourt."Unit Price";

SquashPrice := BestSquashPrice;

Inherited data

To use the journal for the product part of the application, we want to inherit some of the fields from the master data tables. To make that possible, we need to copy and paste these fields from other tables into our master data table and populate them. In our example, we can copy and paste the fields from the Resource table (156). We also need to add code to the OnValidate triggers in the journal line table. The squash court table, for example, is expanded with the fields Unit Cost, Unit Price, Gen. Prod. Posting Group, and VAT Prod. Posting Group, as shown in the preceding screenshot. We can now add code to the OnValidate trigger of the Squash Court No. field in the journal line table:

Squash Court No. - OnValidate()
IF SquashCourt.GET("Squash Court No.") THEN BEGIN
  // Inherit the description, cost, and posting group from the squash court.
  Description := SquashCourt.Description;
  "Unit Cost" := SquashCourt."Unit Cost";
  "Gen. Prod. Posting Group" := SquashCourt."Gen. Prod. Posting Group";
  FindSquashPlayerPrice;
END;

Please note that the unit price is handled in the Squash Price Calc. Mgt. codeunit, which is executed from the FindSquashPlayerPrice function.

Dimensions

In Microsoft Dynamics NAV, dimensions are defined on master data and posted to the ledger entries to be used in analysis view entries. We will now discuss how to analyze the data generated by dimensions. Along the way, dimensions move through a number of tables:

- Table 348, Dimension: This is where the main dimension codes are defined.
- Table 349, Dimension Value: Each dimension can have an unlimited number of values.
- Table 350, Dimension Combination: In this table, we can block certain combinations of dimension codes.
- Table 351, Dimension Value Combination: In this table, we can block certain combinations of dimension values. If this table is populated, the value Limited is shown in the dimension combination table for these dimensions.
- Table 352, Default Dimension: This table is populated for all master data that has dimensions defined.
- Table 354, Default Dimension Priority: When more than one master data record in a transaction has the same dimensions, priorities can be set here.
- Table 480, Dimension Set Entry: This table contains a matrix of all used dimension combinations.
- Codeunit 408, Dimension Management: This codeunit is the single point in the application through which all dimension movement is done.

In our application, dimensions are moved from the squash player, squash court, and customer tables via the squash journal line to the squash ledger entries. When we create an invoice, we move the dimensions from the ledger entries to the sales line table.

Master data

To connect dimensions to master data, we first need to allow this by changing codeunit 408, Dimension Management:

SetupObjectNoList()
TableIDArray[1] := DATABASE::"Salesperson/Purchaser";
TableIDArray[2] := DATABASE::"G/L Account";
TableIDArray[3] := DATABASE::Customer;
...
TableIDArray[22] := DATABASE::"Service Item Group";
TableIDArray[23] := DATABASE::"Service Item";

//* Squash Application
TableIDArray[49] := DATABASE::"Squash Player";
TableIDArray[50] := DATABASE::"Squash Court";
//* Squash Application

Object.SETRANGE(Type,Object.Type::Table);

FOR Index := 1 TO ARRAYLEN(TableIDArray) DO BEGIN
...

The TableIDArray variable has a default length of 23, which we have changed to 50. By leaving gaps, we allow Microsoft to add master data tables in the future without us having to change our code. Without this change, the system would return the following error message when we try to use dimensions:

The next change is to add the Global Dimension fields to the master data tables. They can be copied and pasted from other master data tables. When these fields are validated, the ValidateShortcutDimCode function is executed as follows:

ValidateShortcutDimCode()
DimMgt.ValidateDimValueCode(FieldNumber,ShortcutDimCode);
DimMgt.SaveDefaultDim(DATABASE::"Squash Player","No.",FieldNumber,ShortcutDimCode);
MODIFY;

Summary

In this article, we learned how journals and ledger entries work throughout the system and how to create your own journal application. You also learned how to reverse engineer the standard application in order to learn from it and apply this to your own customizations.

Using R for Statistics, Research, and Graphics

Packt
16 Sep 2014
12 min read
In this article by David Alexander Lillis, author of R Graph Essentials, we will talk about R. Developed by Professor Ross Ihaka and Dr. Robert Gentleman at Auckland University (New Zealand) during the early 1990s, the R statistics environment is a real success story. R is open source software, which you can download in a couple of minutes from the Comprehensive R Archive Network (CRAN) website (http://cran.r-project.org/), and combines a powerful programming language, outstanding graphics, and a comprehensive range of useful statistical functions. If you need a statistics environment that includes a programming language, R is ideal. It's true that the learning curve is longer than for spreadsheet-based packages, but once you master the R programming syntax, you can develop your own very powerful analytic tools. Many contributed packages are available on the web for use with R, and very often the analytic tools you need can be downloaded at no cost. (For more resources related to this topic, see here.) The main problem for those new to R is the time required to master the programming language, but several nice graphical user interfaces, such as John Fox's R Commander package, are available, which make it much easier for the newcomer to develop proficiency in R than it used to be. For many statisticians and researchers, R is the package of choice because of its powerful programming language, the easy availability of code, and because it can import Excel spreadsheets, comma separated variable (.csv) spreadsheets, and text files, as well as SPSS files, STATA files, and files produced within other statistical packages. You may be looking for a tool for your own data analysis. If so, let's take a brief look at what R can do for you.
Some basic R syntax
Data can be created in R or else read in from .csv or other files as objects. For example, you can read in the data contained within a .csv file called mydata.csv as follows:
A <- read.csv("mydata.csv", h=T)
A
The object A now contains all the data of the original file. The syntax A[3,7] picks out the element in row 3 and column 7. The syntax A[14, ] selects the fourteenth row and A[,6] selects the sixth column. The functions mean(A) and sd(A) find the mean and standard deviation of each column. The statement B <- 3*A + 7 triples each element of A, adds 7 to each element, and stores the new array as the object B. Now you could save this array as a .csv file called Outputfile.csv as follows:
write.csv(B, file="Outputfile.csv")
Statistical modeling
R provides a comprehensive range of basic statistical functions relating to the commonly used distributions (normal distribution, t-distribution, Poisson, gamma, and so on), and many less well-known distributions. It also provides a range of non-parametric tests that are appropriate when your data are not distributed normally. Linear and non-linear regressions are easy to perform, and finding the optimum model (that is, by eliminating non-significant independent variables and non-significant factor interactions) is particularly easy. Implementing Generalized Linear Models and other commonly used models such as Analysis of Variance, Multivariate Analysis of Variance, and Analysis of Covariance is also straightforward and, once you know the syntax, you may find that such tasks can be done more quickly in R than in other packages.
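To make the modeling workflow concrete, here is a minimal sketch using simulated data (the data frame, variable names, and coefficients are purely illustrative) that fits a linear model and then lets R drop the terms that do not contribute:
# Simulated example: weight modeled on height and age
set.seed(1)
patients <- data.frame(height = rnorm(50, 170, 10), age = rnorm(50, 40, 12))
patients$weight <- 0.9 * patients$height + rnorm(50, 0, 5)   # age has no real effect here
full <- lm(weight ~ height + age, data = patients)   # full model with both predictors
best <- step(full, trace = FALSE)                    # stepwise removal of uninformative terms (AIC-based)
summary(best)                                        # inspect the simplified model
The same pattern carries over to glm() for Generalized Linear Models; essentially only the family argument (for example, family = poisson) changes.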
The usual post-hoc tests for identifying factor levels that are significantly different from the other levels (for example, Tukey and Sheffe tests) are available, and testing for interactions between factors is easy. Factor Analysis, and the related Principal Components Analysis, are well known data reduction techniques that enable you to explain your data in terms of smaller sets of independent variables (or factors). Both methods are available in R, and code for complex designs, including One and Two Way Repeated Measures, and Four Way ANOVA (for example, two repeated measures and two between-subjects), can be written relatively easily or downloaded from various websites (for example, http://www.personality-project.org/r/). Other analytic tools include Cluster Analysis, Discriminant Analysis, Multidimensional Scaling, and Correspondence Analysis. R also provides various methods for fitting analytic models to data and smoothing (for example, lowess and spline-based methods). Miscellaneous packages for specialist methods You can find some very useful packages of R code for fields as diverse as biometry, epidemiology, astrophysics, econometrics, financial and actuarial modeling, the social sciences, and psychology. For example, if you are interested in Astrophysics, Penn State Astrophysics School offers a nice website that includes both tutorials and code (http://www.iiap.res.in/astrostat/RTutorials.html). Here I'll mention just a few of the popular techniques: Monte Carlo methods A number of sources give excellent accounts of how to perform Monte Carlo simulations in R (that is, drawing samples from multidimensional distributions and estimating expected values). A valuable text is Christian Robert's book Introducing Monte Carlo Methods with R. Murali Haran gives another interesting Astrophysical example in the CAStR website (http://www.stat.psu.edu/~mharan/MCMCtut/MCMC.html). Structural Equation Modeling Structural Equation Modelling (SEM) is becoming increasingly popular in the social sciences and economics as an alternative to other modeling techniques such as multiple regression, factor analysis and analysis of covariance. Essentially, SEM is a kind of multiple regression that takes account of factor interactions, nonlinearities, measurement error, multiple latent independent variables, and latent dependent variables. Useful references for conducting SEM in R include those of Revelle, Farnsworth (2008), and Fox (2002 and 2006). Data mining A number of very useful resources are available for anyone contemplating data mining using R. For example, Luis Torgo has just published a book on data mining using R, and presents case studies, along with the datasets and code, which the interested student can work through. Torgo's book provides the usual analytic and graphical techniques used every day by data miners, including visualization techniques, dealing with missing values, developing prediction models, and methods for evaluating the performance of your models. Also of interest to the data miner is the Rattle GUI (R Analytical Tool to Learn Easily). Rattle is a data mining facility for analyzing very large data sets. It provides many useful statistical and graphical data summaries, presents mechanisms for developing a variety of models, and summarizes the performance of your models. Graphics in R Quite simply, the quality and range of graphics available through R is superb and, in my view, vastly superior to those of any other package I have encountered. 
Of course, you have to write the necessary code, but once you have mastered this skill, you have access to wonderful graphics. You can write your own code from scratch, but many websites provide helpful examples, complete with code, which you can download and modify to suit your own needs. R's base graphics (graphics created without the use of any additional contributed packages) are superb, but various graphics packages such as ggplot2 (and the associated qplot function) help you to create wonderful graphs. R's graphics capabilities include, but are not limited to, the following: Base graphics in R Basic graphics techniques and syntax Creating scatterplots and line plots Customizing axes, colors, and symbols Adding text – legends, titles, and axis labels Adding lines – interpolation lines, regression lines, and curves Increasing complexity – graphing three variables, multiple plots, or multiple axes Saving your plots to multiple formats – PDF, postscript, and JPG Including mathematical expressions on your plots Making graphs clear and pretty – including a grid, point labels, and shading Shading and coloring your plot Creating bar charts, histograms, boxplots, pie charts, and dotcharts Adding loess smoothers Scatterplot matrices R's color palettes Adding error bars Creating graphs using qplot Using basic qplot graphics techniques and syntax to customize in easy steps Creating scatterplots and line plots in qplot Mapping symbol size, symbol type and symbol color to categorical data Including regressions and confidence intervals on your graphs Shading and coloring your graph Creating bar charts, histograms, boxplots, pie charts, and dotcharts Labelling points on your graph Creating graphs using ggplot Ploting options – backgrounds, sizes, transparencies, and colors Superimposing points Controlling symbol shapes and using pretty color schemes Stacked, clustered, and paneled bar charts Methods for detailed customization of lines, point labels, smoothers, confidence bands, and error bars The following graph records information on the heights in centimeters and weights in kilograms of patients in a medical study. The curve in red gives a smoothed version of the data, created using locally weighted scatterplot smoothing. Both the graph and the modelling required to produce the smoothed curve, were performed in R. Here is another graph. It gives the heights and body masses of female patients receiving treatment in a hospital. Each patient is identified by name. This graph was created very easily using ggplot, and shows the default background produced by ggplot (a grey plotting background and white grid lines). Next, we see a histogram of patients' heights and body masses, partitioned by gender. The bars are given in an orange and an ivory color. The ggplot package provides a wide range of colors and hues, as well as a wide range of color palettes. Finally, we see a line graph of height against age for a group of four children. The graph includes both points and lines and we have a unique color for each child. The ggplot package makes it possible to create attractive and effective graphs for research and data analysis. Summary For many scientists and data analysts, mastery of R could be an investment for the future, particularly for those who are beginning their careers. The technology for handling scientific computation is advancing very quickly, and is a major impetus for scientific advance. 
Some level of mastery of R has become, for many applications, essential for taking advantage of these developments. Spatial analysis, where R provides an integrated framework access to abilities that are spread across many different computer programs, is a good example. A few years ago, I would not have recommended R as a statistics environment for generalist data analysts or postgraduate students, except those working directly in areas involving statistical modeling. However, many tutorials are downloadable from the Internet and a number of organizations provide online tutorials and/or face-to-face workshops (for example, The Analysis Factor http://www.theanalysisfactor.com/). In addition, the appearance of GUIs, such as R Commander and the new iNZight GUI33 (designed for use in schools), makes it easier for non-specialists to learn and use R effectively. I am most happy to provide advice to anyone contemplating learning to use this outstanding statistical and research tool. References Some useful material on R are as follows: L'analyse des donn´ees. Tome 1: La taxinomie, Tome 2: L'analyse des correspondances, Dunod, Paris, Benz´ecri, J. P (1973). Computation of Correspondence Analysis, Blasius J, Greenacre, M. J (1994). In M J Greenacre, J Blasius (eds.), Correspondence Analysis in the Social Sciences, pp. 53–75, Academic Press, London. Statistics: An Introduction using R, Crawley, M. J. (m.crawley@imperial.ac.uk), Imperial College, Silwood Park, Ascot, Berks, Published in 2005 by John Wiley & Sons, Ltd. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470022973,subjectCd-ST05.html (ISBN 0-470-02297-3). http://www3.imperial.ac.uk/naturalsciences/research/statisticsusingr. Structural Equation Models Appendix to An R and S-PLUS Companion to Applied Regression, Fox, John, http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-sems.pdf. Getting Started with the R Commander, Fox, John, 26 August 2006. The R Commander: A Basic-Statistics Graphical User Interface to R, Fox, John, Journal of Statistical Software, September 2005, Volume 14, Issue 9. http://www.jstatsoft.org/. Structural Equation Modeling With the sem Package in R, Fox, John, Structural Equation Modeling, 13(3), 465–486. Lawrence Erlbaum Associates, Inc. 2006. Biplots in Biomedical Research, Gabriel, K, R and Odoroff, C, 9, 469–485, Statistics in Medicine, 1990. Theory and Applications of Correspondence Analysis, Greenacre M. J., Academic Press, London, 1984. Using R for Data Analysis and Graphics Introduction, Code and Commentary, Maindonald, J. H, Centre for Mathematics and its Applications, Australian National University. Introducing Monte Carlo Methods with R, Series Use R, Robert, Christian P., Casella, George, 2010, XX, 284 p., Softcover, ISBN 978-1-4419-1575-7. <p>Useful tutorials available on the web are as follows:</p> An Introduction to R: examples for Actuaries, De Silva, N, 2006, http://toolkit.pbworks.com/f/R%20Examples%20for%20Actuaries%20v0.1-1.pdf. Econometrics in R, Farnsworth, Grant, V, October 26, 2008, http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf. An Introduction to the R Language, Harte, David, Statistics Research Associates Limited, www.statsresearch.co.nz. Quick R, Kabakoff, Rob, http://www.statmethods.net/index.html. R for SAS and SPSS Users, Muenchen, Bob, http://RforSASandSPSSusers.com. Statistical Analysis with R - a quick start, Nenadi´,C and Zucchini, Walter. 
R for Beginners, Paradis, Emannuel (paradis@isem.univ-montp2.fr), Institut des Sciences de l' Evolution, Universite Montpellier II, F-34095 Montpellier c_edex 05, France. Data Mining with R learning by case studies, Torgo, Luis, http://www.liaad.up.pt/~ltorgo/DataMiningWithR/. SimpleR - Using R for Introductory Statistics, Verzani, John, http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf. Time Series Analysis and Its Applications: With R Examples, http://www.stat.pitt.edu/stoffer/tsa2/textRcode.htm#ch2. The irises of the Gaspé peninsula, E. Anderson, Bulletin of the American Iris Society, 59, 2-5. 1935. Introducing Monte Carlo Methods with R, Series Use R, Robert, Christian P., Casella, George. 2010, XX, 284 p., Softcover, ISBN: 978-1-4419-1575-7. Resources for Article: Further resources on this subject: Aspects of Data Manipulation in R [Article] Learning Data Analytics with R and Hadoop [Article] First steps with R [Article]

article-image-security-settings-salesforce
Packt
11 Sep 2014
10 min read
Save for later

Security Settings in Salesforce

Packt
11 Sep 2014
10 min read
In the article by Rakesh Gupta and Sagar Pareek, authors of Salesforce.com Customization Handbook, we will discuss Organization-Wide Default (OWD) and various ways to share records. We will also discuss the various security settings in Salesforce. The following topics will be covered in this article: (For more resources related to this topic, see here.) Concepts of OWD The sharing rule Field-Level Security and its effect on data visibility Setting up password polices Concepts of OWD Organization-Wide Default is also known as OWD. This is the base-level sharing and setting of objects in your organization. By using this, you can secure your data so that other users can't access data that they don't have access to. The following diagram shows the basic database security in Salesforce. In this, OWD plays a key role. It's a base-level object setting in the organization, and you can't go below this. So here, we will discuss OWD in Salesforce. Let's start with an example. Sagar Pareek is the system administrator in Appiuss. His manager Sara Barellies told him that the user who has created or owns the account records as well as the users that are higher in the role hierarchy can access the records. Here, you have to think first about OWD because it is the basic thing to restrict object-level access in Salesforce. To achieve this, Sagar Pareek has to set Organization-Wide Default for the account object to private. Setting up OWD To change or update OWD for your organization, follow these steps: Navigate to Setup | Administer | Security Controls | Sharing Settings. From the Manage sharing settings for drop-down menu, select the object for which you want to change OWD. Click on Edit. From the Default Access drop-down menu, select an access as per your business needs. For the preceding scenario, select Private to grant access to users who are at a high position in the role hierarchy, by selecting Grant access using hierarchy. For standard objects, it is automatically selected, and for custom objects, you have the option to select it. Click on Save. The following table describes the various types of OWD access and their respective description: OWD access Description Private Only the owner of the records and the higher users in the role hierarchy are able to access and report on the records. Public read only All users can view the records, but only the owners and the users higher in the role hierarchy can edit them. Public read/write All users can view, edit, and report on all records. Public read/write/ transfer All users can view, edit, transfer, and report on all records. This is only available for case and lead objects. Controlled by parent This says that access on the child object's records is controlled by the parent. Public full access This is available for campaigns. In this, all users can view, edit, transfer, and report on all records.   You can assign this access to campaigns, accounts, cases, contacts, contracts, leads, opportunities, users, and custom objects. This feature is only available for Professional, Enterprise, Unlimited, Performance, Developer, and Database Editions. Basic OWD settings for objects Whenever you buy your Salesforce Instance, it comes with the predefined OWD settings for standard objects. You can change them anytime by following the path Setup | Administer | Security Controls | Sharing Settings. 
The following table describes the default access to objects: Object Default access Account Public read/write Activity Private Asset Public read/write Campaign Public full access Case Public read/write transfer Contact Controlled by parent (that is, account) Contract Public read/write Custom Object Public read/write Lead Public read/write transfer Opportunity Public read only Users Public read only and private for external users Let's continue with another example. Sagar Pareek is the system administrator in Appiuss. His manager Sara Barellies told him that only the users who created the record for the demo object can access the records, and no one else can have the power to view/edit/delete it. To do this, you have to change OWD for a demo object to private, and don't select Grant Access Using Hierarchy. When you select the Grant Access Using Hierarchy field, it provides access to people who are above in the role hierarchy. Sharing Rule To open the record-level access for a group of users, roles, or roles and subordinates beyond OWD, you can use Sharing Rule. Sharing Rule is used for open access; you can't use Sharing Rule to restrict access. Let's start with an example where Sagar Pareek is the system administrator in Appiuss. His manager Sara Barellies wants every user in the organization to be able to view the account records but only a group of users (all the users do not belong to the same role or have the same profile) can edit it. To solve the preceding business requirement, you have to follow these steps: First, change the OWD account to Public Read Only by following the path Setup | Administer | Security Controls | Sharing Settings, so all users from the organization can view the account records. Now, create a public group Account access and add users as per the business requirement. To create a public group, follow the path Name | Setup | Administration Setup | Manage Users | Public Groups. Finally, you have to create a sharing rule. To create sharing rules, follow the path Setup | Administer | Security Controls | Sharing Settings, and navigate to the list related to Account Sharing Rules: Click on New, and it will redirect you to a new window where you have to enter Label, Rule Name, and Description (always write a description so that other administrators or developers get to know why this rule was created). Then, for Rule Type, select Based on criteria. Select the criteria by which records are to be shared and create a criterion so that all records fall under it (such as Account Name not equal to null). Select Public Groups in the Share with option and your group name. Select the level of access for the users. Here, select Read/Write from the drop-down menu of Default Account, Contract and Asset Access. Finally, it will look like the following screenshot: Types of Sharing Rules What we did to solve the preceding business requirement is called Sharing Rule. There is a limitation on Sharing Rules; you can write only 50 Sharing Rules (criteria-based) and 300 Sharing Rules (both owner- and criteria-based) per object. The following are the types of Sharing Rules in Salesforce: Manual Sharing: Only when OWD is set to Private or Public Read for any object will a sharing button be enabled in the record detail page. Record owners or users, who are at a higher position in role and hierarchy, can share records with other users. For the last business use case, we changed the account OWD to Public Read Only. 
If you navigate to the Account records detail page, you can see the Sharing button: Click on the Sharing button and it will redirect you to a new window. Now, click on Add and you are ready to share records with the following: Public groups Users Roles Roles and subordinates Select the access type for each object and click on Save. It will look like what is shown in the following screenshot: The Lead and Case Sharing buttons will be enabled when OWD is Private, Public Read Only, and Public Read/Write. Apex Sharing: When all other Sharing Rules can't fulfill your requirements, then you can use the Apex Sharing method to share records. It gives you the flexibility to handle complex sharing. Apex-managed sharing is a type of programmatic sharing that allows you to define a custom sharing reason to associate with your programmatic share. Standard Salesforce objects support programmatic sharing while custom objects support Apex-managed sharing. Field-Level Security and its effect on data visibility Data on fields is very important for any organization. They want to show some data to the field-specific users. In Salesforce, you can use Field-Level Security to make fields hidden or read-only for a specific profile. There are three ways in Salesforce to set Field-Level Security: From an object-field From a profile Field accessibility From an object-field Let's start with an example where Sagar Pareek is the system administrator in Appiuss. His manager Sara Barellies wants to create a field (phone) on an account object and make this field read-only for all users and also allowing system administrators to edit the field. To solve this business requirement, follow these steps: Navigate to Setup | Customize | Account | Fields and then click on the Phone (it's a hyperlink) field. It will redirect you to the detail page of the Phone field; you will see a page like the following screenshot: Click on the Set Field-Level Security button, and it will redirect you to a new page where you can set the Field-Level Security. Select Visible and Read-Only for all the profiles other than that of the system administrator. For the system administrator, select only Visible. Click on Save. If you select Read-Only, the visible checkbox will automatically get selected. From a profile Similarly, in Field-Level settings, you can also achieve the same results from a profile. Let's follow the preceding business use case to be achieved through the profile. To do this, follow these steps: Navigate to Setup | Administer | Manage Users | Profile, go to the System Administrator profile, and click on it. Now, you are on the profile detail page. Navigate to the Field-Level Security section. It will look like the following screenshot: Click on the View link beside the Account option. It will open Account Field-Level Security for the profile page. Click on the Edit button and edit Field-Level Security as we did in the previous section. Field accessibility We can achieve the same outcome by using field accessibility. To do this, follow these steps: Navigate to Setup | Administer | Security Controls | Field Accessibility. Click on the object name; in our case, it's Account. It will redirect you to a new page where you can select View by Fields or View by Profiles: In our case, select View by Fields and then select the field Phone. Click on the editable link as shown in the following screenshot: It will open the Access Settings for Account Field page, where you can edit the Field-Level Security. Once done, click on Save. 
Setting up password policies For security purposes, Salesforce provides an option to set password policies for the organization. Let's start with an example. Sagar Pareek, the system administrator of an organization, has decided to create a policy regarding the password for the organization, where the password of each user must be of 10 characters and must be a combination of alphanumeric and special characters. To do this, he will have to follow these steps: Navigate to Setup | Security Controls | Password Policies. It will open the Password Policies setup page: In the Minimum password length field, select 10 characters. In the Password complexity requirement field, select Must mix Alpha, numeric and special characters. Here, you can also decide when the password should expire under the User password expire in option. Enforce the password history under the option enforce password history, and set a password question requirement as well as the number of invalid attempts allowed and the lock-out period. Click on Save. Summary In this article, we have gone through various security setting features available on Salesforce. Starting from OWD, followed by Sharing Rules and Field-Level Security, we also covered password policy concepts. Resources for Article: Further resources on this subject: Introducing Salesforce Chatter [Article] Salesforce CRM Functions [Article] Adding a Geolocation Trigger to the Salesforce Account Object [Article]

article-image-livecode-loops-and-timers
Packt
10 Sep 2014
9 min read
Save for later

LiveCode: Loops and Timers

Packt
10 Sep 2014
9 min read
In this article by Dr Edward Lavieri, author of LiveCode Mobile Development Cookbook, you will learn how to use timers and loops in your mobile apps. Timers can be used for many different functions, including a basketball shot clock, car racing time, the length of time logged into a system, and so much more. Loops are useful for counting and iterating through lists. All of this will be covered in this article. (For more resources related to this topic, see here.) Implementing a countdown timer To implement a countdown timer, we will create two objects: a field to display the current timer and a button to start the countdown. We will code two handlers: one for the button and one for the timer. How to do it... Perform the following steps to create a countdown timer: Create a new main stack. Place a field on the stack's card and name it timerDisplay. Place a button on the stack's card and name it Count Down. Add the following code to the Count Down button: on mouseUp local pTime put 19 into pTime put pTime into fld "timerDisplay" countDownTimer pTime end mouseUp Add the following code to the Count Down button: on countDownTimer currentTimerValue subtract 1 from currentTimerValue put currentTimerValue into fld "timerDisplay" if currentTimerValue > 0 then send "countDownTimer" && currentTimerValue to me in 1 sec end if end countDownTimer Test the code using a mobile simulator or an actual device. How it works... To implement our timer, we created a simple callback situation where the countDownTimer method will be called each second until the timer is zero. We avoided the temptation to use a repeat loop because that would have blocked all other messages and introduced unwanted app behavior. There's more... LiveCode provides us with the send command, which allows us to transfer messages to handlers and objects immediately or at a specific time, such as this recipe's example. Implementing a count-up timer To implement a count-up timer, we will create two objects: a field to display the current timer and a button to start the upwards counting. We will code two handlers: one for the button and one for the timer. How to do it... Perform the following steps to implement a count-up timer: Create a new main stack. Place a field on the stack's card and name it timerDisplay. Place a button on the stack's card and name it Count Up. Add the following code to the Count Up button: on mouseUp local pTime put 0 into pTime put pTime into fld "timerDisplay" countUpTimer pTime end mouseUp Add the following code to the Count Up button: on countUpTimer currentTimerValue add 1 to currentTimerValue put currentTimerValue into fld "timerDisplay" if currentTimerValue < 10 then send "countUpTimer" && currentTimerValue to me in 1 sec end if end countUpTimer Test the code using a mobile simulator or an actual device. How it works... To implement our timer, we created a simple callback situation where the countUpTimer method will be called each second until the timer is at 10. We avoided the temptation to use a repeat loop because that would have blocked all other messages and introduced unwanted app behavior. There's more... Timers can be tricky, especially on mobile devices. For example, using the repeat loop control when working with timers is not recommended because repeat blocks other messages. Pausing a timer It can be important to have the ability to stop or pause a timer once it is started. The difference between stopping and pausing a timer is in keeping track of where the timer was when it was interrupted. 
In this recipe, you'll learn how to pause a timer. Of course, if you never resume the timer, then the act of pausing it has the same effect as stopping it. How to do it... Use the following steps to create a count-up timer and pause function: Create a new main stack. Place a field on the stack's card and name it timerDisplay. Place a button on the stack's card and name it Count Up. Add the following code to the Count Up button: on mouseUp local pTime put 0 into pTime put pTime into fld "timerDisplay" countUpTimer pTime end mouseUp Add the following code to the Count Up button: on countUpTimer currentTimerValue add 1 to currentTimerValue put currentTimerValue into fld "timerDisplay" if currentTimerValue < 60 then send "countUpTimer" && currentTimerValue to me in 1 sec end if end countUpTimer Add a button to the card and name it Pause. Add the following code to the Pause button: on mouseUp repeat for each line i in the pendingMessages cancel (item 1 of i) end repeat end mouseUp In LiveCode, the pendingMessages option returns a list of currently scheduled messages. These are messages that have been scheduled for delivery but are yet to be delivered. To test this, first click on the Count Up button, and then click on the Pause button before the timer reaches 60. How it works... We first created a timer that counts up from 0 to 60. Next, we created a Pause button that, when clicked, cancels all pending system messages, including the call to the countUpTimer handler. Resuming a timer If you have a timer as part of your mobile app, you will most likely want the user to be able to pause and resume a timer, either directly or through in-app actions. See previous recipes in this article to create and pause a timer. This recipe covers how to resume a timer once it is paused. How to do it... Perform the following steps to resume a timer once it is paused: Create a new main stack. Place a field on the stack's card and name it timerDisplay. Place a button on the stack's card and name it Count Up. Add the following code to the Count Up button: on mouseUp local pTime put 0 into pTime put pTime into fld "timerDisplay" countUpTimer pTime end mouseUp on countUpTimer currentTimerValue add 1 to currentTimerValue put currentTimerValue into fld "timerDisplay" if currentTimerValue < 60 then send "countUpTimer" && currentTimerValue to me in 1 sec end if end countUpTimer Add a button to the card and name it Pause. Add the following code to the Pause button: on mouseUp repeat for each line i in the pendingMessages cancel (item 1 of i) end repeat end mouseUp Place a button on the card and name it Resume. Add the following code to the Resume button: on mouseUp local pTime put the text of fld "timerDisplay" into pTime countUpTimer pTime end mouseUp on countUpTimer currentTimerValue add 1 to currentTimerValue put currentTimerValue into fld "timerDisplay" if currentTimerValue <60 then send "countUpTimer" && currentTimerValue to me in 1 sec end if end countUpTimer To test this, first, click on the Count Up button, then click on the Pause button before the timer reaches 60. Finally, click on the Resume button. How it works... We first created a timer that counts up from 0 to 60. Next, we created a Pause button that, when clicked, cancels all pending system messages, including the call to the countUpTimer handler. When the Resume button is clicked on, the current value of the timer, based on the timerDisplay button, is used to continue incrementing the timer. 
In LiveCode, pendingMessages returns a list of currently scheduled messages. These are messages that have been scheduled for delivery but are yet to be delivered. Using a loop to count There are numerous reasons why you might want to implement a counter in a mobile app. You might want to count the number of items on a screen (that is, cold pieces in a game), the number of players using your app simultaneously, and so on. One of the easiest methods of counting is to use a loop. This recipe shows you how to easily implement a loop. How to do it... Use the following steps to instantiate a loop that counts: Create a new main stack. Rename the stack's default card to MainScreen. Drag a label field to the card and name it counterDisplay. Drag five checkboxes to the card and place them anywhere. Change the names to 1, 2, 3, 4, and 5. Drag a button to the card and name it Loop to Count. Add the following code to the Loop to Count button: on mouseUp local tButtonNumber put the number of buttons on this card into tButtonNumber if tButtonNumber > 0 then repeat with tLoop = 1 to tButtonNumber set the label of btn value(tLoop) to "Changed " & tLoop end repeat put "Number of button's changed: " & tButtonNumber into fld "counterDisplay" end if end mouseUp Test the code by running it in a mobile simulator or on an actual device. How it works... In this recipe, we created several buttons on a card. Next, we created code to count the number of buttons and a repeat control structure to sequence through the buttons and change their labels. Using a loop to iterate through a list In this recipe, we will create a loop to iterate through a list of text items. Our list will be a to-do or action list. Our loop will process each line and number them on screen. This type of loop can be useful when you need to process lists of unknown lengths. How to do it... Perform the following steps to create an iterative loop: Create a new main stack. Drag a scrolling list field to the stack's card and name it myList. Change the contents of the myList field to the following, paying special attention to the upper- and lowercase values of each line: Wash Truck Write Paper Clean Garage Eat Dinner Study for Exam Drag a button to the card and name it iterate. Add the following code to the iterate button: on mouseUp local tLines put the number of lines of fld "myList" into tLines repeat with tLoop = 1 to tLines put tLoop & " - " & line tLoop of fld "myList"into line tLoop of fld "myList" end repeat end mouseUp Test the code by clicking on the iterate button. How it works... We used the repeat control structure to iterate through a list field one line at a time. This was accomplished by first determining the number of lines in that list field, and then setting the repeat control structure to sequence through the lines. Summary In this article we examined the LiveCode scripting required to implement and control count-up and countdown timers. We also learnt how to use loops to count and iterate through a list. Resources for Article:  Further resources on this subject: Introduction to Mobile Forensics [article] Working with Pentaho Mobile BI [article] Building Mobile Apps [article]
article-image-working-data-access-and-file-formats-using-nodejs
Packt
04 Sep 2014
27 min read
Save for later

Working with Data Access and File Formats Using Node.js

Packt
04 Sep 2014
27 min read
In this article by Surendra Mohan, the author of Node.js Essentials, we will cover the following concepts: Reading and writing files using Node.js MySQL database handling using Node.js Working with data formats using Node.js Let's get started! (For more resources related to this topic, see here.) Reading and writing files The easiest and most convenient way of reading a file in a PHP application is by using the PHP file_get_contents() API function. Let's look into the following example PHP code snippet, wherein we intend to read a sample text file named sampleaf.txt that resides in the same directory as that of our PHP file (sampleaf.php): <?php $text = file_get_contents('sampleaf.txt'); print $text; ?> In the preceding code snippet, if the source file, sampleaf.txt, exists or can be read, the content of this file is assigned to the $text variable (long string type); otherwise, it will result in a Boolean value as false. All PHP API functions are blocking in nature, so is the file_get_contents() API function. Thus, the PHP code that is supposed to be executed after the file_get_contents() API function call gets blocked until the former code either executes successfully or completely fails. There is no callback mechanism available for this PHP API. Let's convert the preceding PHP code snippet into its corresponding Node.js code. Because the readFileSync() API function in the fs module is the closest Node.js equivalent to that of the PHP file_get_contents() API function, let's use it. Our Node.js code equivalent looks something like the following code snippet (sampleaf.njs): var fs = require('fs'); var text = false; try { text = fs.readFileSync(__dirname+'/'+'sampleaf.txt', 'utf8'); } catch (err) { // No action } console.log(text); Node.js functions come in both asynchronous as well as synchronous forms, asynchronous being the default. In our preceding Node.js code, we have appended the Sync term in our Node.js fs.readFile() API function, which is asynchronous in nature, and gets converted to its synchronous version once Sync gets appended to the end of it. The asynchronous version is nonblocking in nature, and depends upon the callbacks to take care of the Node.js API function call results. On the other hand, the synchronous version is blocking in nature (same as of our PHP code), which results in blocking of the Node.js code that is supposed to be executed after the API function till it completely succeeds or fails. If we look into the arguments passed with the Node.js fs.readFileSync() API function, we find the source file sampleaf.txt (prepended with the _dirname variable) that needs to be read, and utf8 stating the encoding we intend to use. The _dirname variable holds the directory name where our Node.js code file, sampleaf.njs, resides. The use of the _dirname variable instructs the Node.js fs.readFileSync() API function to locate the source file in the directory returned by this variable. If this variable is missing, our API function will try to find the source file in the directory where the Node.js server was started. By default, the second argument doesn't encode, which results in the function to return a raw buffer of bytes instead of a string. In order to enable the function to return a string, we pass the utf8 string (for UTF-8 encoding) as the second parameter to the function. Because we are dealing with the synchronous version of the API function, we handled Node.js exceptions by using the Node.js try and catch keywords. 
In this case, if the try block code gets executed successfully, the catch block code will be ignored and never get executed; otherwise, the try block code will immediately stop executing, thereby helping the catch block code to get executed. While replacing the PHP file_get_contents() API function with its corresponding Node.js API function, it is recommended to use the Node.js fs.readFile() API function instead of fs.readFileSync() due to the fact that synchronous API functions (in our case, fs.readFileSync()) are blocking in nature, whereas asynchronous API functions (in our case, fs.readFile()) are not. So, let's try converting the preceding PHP code snippet to its corresponding asynchronous Node.js code by using the fs.readFile() API function. We write the following Node.js code snippet and save it in a new file with the filename sampleafa.njs: var fs = require('fs'); var text = false; fs.readFile(__dirname+'/'+'sampleaf.txt', 'utf8', function(err, fo) { if (!err) { text = fo; } console.log(text); }); Our preceding asynchronous Node.js fs.readFile() API function accepts a callback function as its third argument that can return both the data as well as error, whichever is applicable. There is another way to read a file in Node.js. However, it doesn't match with any features available in PHP so far. We do so by creating a Node.js stream that will help us read the file. While the stream is read, events such as data, error, and close, are sent along with the stream. In such scenarios, we need to set up event handlers that would take care of such events. The PHP file() API function The file() API function helps us read content of a file and returns it as an indexed array. The content in the array are stored in such a way that each value of the array holds a single line of the file. If we want to print the first line of our source file (sampleaf.txt) in PHP, we write the following code snippet that includes the End Of Line (EOL) character sequence at the end of the line: $x = file('sampleaf.txt'); print $x[0]; If we are using PHP5 version, we get an opportunity to include the second and optional parameter (the flags parameter) to our file() API function. The flags parameter provides us with three options that can be either used individually or can be combined together using the OR operator (|), and they are as follows: FILE_IGNORE_NEW_LINES FILE_SKIP_EMPTY_LINES FILE_USE_INCLUDE_PATH The FILE_IGNORE_NEW_LINES flag option is normally used, and it instructs the PHP file() API function to eradicate EOL characters from the end of each line. Let's rework on our preceding PHP code snippet such that it prints the first line of the sampleaf.txt file, but eradicates the EOL character sequence at the end of each value in the array. So, our modified PHP code snippet will look like the following: $x = file('sampleaf.txt', FILE_IGNORE_NEW_LINES); print $x[0]; Now it's time to convert the preceding PHP code snippet into its corresponding Node.js code snippet. Converting the PHP file() API function is a bit complicated as compared to that of converting the PHP file_get_contents() API function. 
The following code demonstrates the converted Node.js code snippet corresponding to our PHP code snippet:
var fs = require('fs');
var FILE_IGNORE_NEW_LINES = 0x2;
var x = false;
var flag = FILE_IGNORE_NEW_LINES;
fs.readFile(__dirname+'/'+'sampleaf.txt', 'utf8', function(err, data) {
  if (!err) {
    x = data.replace(/\r\n?/g, '\n');
    x = x.split('\n');
    x.neol = x.length - 1;
    if ((x.length > 0) && (x[x.length-1] === '')) {
      x.splice(x.length-1, 1);
    }
    if ((flag & FILE_IGNORE_NEW_LINES) === 0) {
      for (var i=0; i < x.neol; ++i) {
        x[i] += '\n';
      }
    }
    delete x.neol;
  }
  console.log(x[0]);
});
In the preceding Node.js code, the code embedded within the if (!err) statement is what actually makes this conversion complicated. Let's now walk through the preceding Node.js code snippet, especially the statements embedded in the if block:
First of all, we converted the EOL characters used by Linux, Windows, and Mac text files into a single newline character (\n), using the following code chunk:
x = data.replace(/\r\n?/g, '\n');
Then, we converted the string to an array of lines that complies with the PHP file() API function standards, using the following line of code:
x = x.split('\n');
Then, we handled the last line of the file by implementing the following code snippet:
x.neol = x.length - 1;
if ((x.length > 0) && (x[x.length-1] === '')) {
  x.splice(x.length-1, 1);
}
Finally, in the following code snippet, we check whether FILE_IGNORE_NEW_LINES has been specified or not. If it hasn't been specified, the EOL character will be added back to the end of each line:
if ((flag & FILE_IGNORE_NEW_LINES) === 0) {
  for (var i=0; i < x.neol; ++i) {
    x[i] += '\n';
  }
}
File handling APIs
The core set of file handling APIs in PHP and Node.js is modeled on the C language file handling functions. For instance, the PHP fopen() API function opens a file in different modes, such as read, write, and append. The Node.js fs.open() API function is the equivalent of the PHP fopen() API function, and both of these API functions are shaped after fopen() from the C language. Let's consider the following PHP code snippet that opens a file for reading purposes and reads the first 500 bytes of content from the file:
$fo = fopen('sampleaf.txt', 'r');
$text = fread($fo, 500);
fclose($fo);
The buf argument is an alias of the b variable. Both PHP and the Node.js maintain a file pointer that indicates the next bytes that should be read from the file. We can cross-check the end of the file using the PHP feof() API function. In order to implement the PHP feof() API function, we write the following PHP code snippet: $fo = fopen('sampleaf.txt', 'r'); $text = ''; while (!feof($fo)) { $text .= fread($fo, 500); } fclose($fo); print $text; Node.js doesn't have anything that is equivalent to the PHP feof() API function. Instead, we use the bytesRead argument that is passed to the callback function and is compared with the number of bytes requested in order to read the file. We land to the following Node.js code snippet when we convert our preceding and modified PHP code snippet: var fs = require('fs'); var Buffer = require('buffer').Buffer; fs.open(__dirname+'/'+'sampleaf.txt', 'r', function(err, fo) { var text = ''; var b = new Buffer(500); var fread = function() { fs.read(fo, b, 0, b.length, null, function(err, bytesRead, buf) { var eof = (bytesRead != b.length); if (!eof) { text += buf.toString(); fread(); } else { if (bytesRead > 0) { var bufs = buf.slice(0, bytesRead); text += bufs.toString(); } fs.close(fo, function() { console.log(text); }); } }); }; fread(); }) Due to callbacks in Node.js, the fread function variable must be defined in such a way that it can be called if the file size is greater than the b buffer variable. The fread function variable is triggered continuously till the end of the file. By the end of the file, the partially occupied buffer is proceeded, and then the Node.js fs.close() API function is triggered. We also use the linearity concept, where the Node.js console.log() API function call is embedded in the callback of the Node.js fs.close() API function. MySQL access In the previous section, we learned how to access the data from files using PHP code and exercised how to convert such PHP code to its corresponding Node.js code. As an alternative to what we discussed earlier, we can even access the necessary data from our database, and write to it as a record or set of records. Because the database server can be accessed remotely, PHP and Node.js are capable enough to connect to the intended database, regardless of whether it is running on the same server or remote server. You must be aware that data in a database is arranged in rows and columns. This makes it easy to organize and store data such as usernames. On the other hand, it is quite complex to organize and store certain types of data, such as image files or any other media files. In this section, we will use the MySQL database with PHP, and learn how to convert our PHP code that uses the MySQL database into its equivalent Node.js code based on different scenarios. The reason behind choosing the MySQL database for our exercise is that it is quite popular in the database and hosting market. Moreover, PHP applications have a special bond with MySQL database. We assume that the MySQL server has already been installed, so that we can create and use the MySQL database with the PHP and Node.js code during our exercise. In order to access our database through a PHP or Node.js application, our web application server (where PHP or Node.js is running) needs some tweaking, so that necessary accesses are granted to the database. If you are running the Apache2 web server on a Linux environment, the phpx-myql extension needs to be installed, where x denotes the PHP version you are using. 
For instance, when using PHP 5.x, the required and related extension that needs to be installed would be php5-mysql. Likewise, the php4-mysql and php6-mysql extensions are necessary for PHP versions 4.x and 6.x, respectively. On the other hand, if you are using the Apache2 web server on a Windows environment, you need to install the PHP-to-MySQL extension during the Apache2 web server installation. Database approaches Node.js doesn't have a built-in module that can help a Node.js application access the MySQL database. However, we have a number of modules that are provided by the Node.js npm package and can be installed in order to achieve database access in variety of approaches. Using the MySQL socket protocol MySQL socket protocol is one of the easiest approaches that can be implemented in Node.js using the Node.js npm package. This npm package uses the built-in net module to open a network socket for the MySQL server to connect with the application and exchange packets in a format that is supported and expected by the MySQL server. The Node.js npm package bluffs and surpasses other MySQL drivers (shipped with the MySQL server installer), unaware of the fact that it is communicating to the Node.js driver instead of the default driver that has been built in C language. There are a number of ways MySQL socket protocol can be implemented in Node.js. The most popular implementation is using the Node.js node-mysql npm package. In order to install this npm package, you can either retrieve it from its GitHub repository at http://github.com/felixge/node-mysql or run npm install mysql on the command line. An alternative to the Node.js implementation of this protocol is the Node.js mysql-native npm package. In order to install this package, you can either retrieve it from its GitHub repository at http://github.com/sidorares/nodejs-mysql-native, or run npm install mysql-native on the command line. In order to play around with database records, the SQL language that needs to be applied constitutes of commands such as SELECT, INSERT, UPDATE, and DELETE along with other commands. However, Node.js stores data in the database as properties on a Node.js object. Object-relational mapping (ORM or O/R mapping) is a set of planned actions to read and write objects (in our case, Node.js objects) to a SQL-based database (in our case, the MySQL database). This ORM is implemented on the top of other database approaches. Thus, the Node.js ORM npm package can use any other Node.js npm packages (for instance, node-mysql and mysql-native) to access and play around with the database. Normally, ORM npm packages use SQL statements during implementation; however, it provides a better and logical set of API functions to do the database access and data exchange job. The following are a couple of Node.js npm modules that provide object-relational mapping support to Node.js: The Node.js persistencejs npm module: This is an asynchronous JavaScript-based ORM library. We recommend you to refer its GitHub repository documentation at https://github.com/coresmart/persistencejs, in case you wish to learn about it. The Node.js sequelize npm module : This is a JavaScript-based ORM library that provides access to databases such as MySQL, SQLite, PostgreSQL, and so on, by mapping database records to objects and vice versa. If you want to learn more about the sequelize library, we recommend that you refer to its documentation at http://sequelizejs.com/. 
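To give a feel for the ORM style, the following is a rough sketch of what a sequelize model definition might look like; the database name, credentials, and column are purely illustrative, and the exact call signatures vary between sequelize releases, so treat this as an outline rather than a definitive recipe:
var Sequelize = require('sequelize');
var sequelize = new Sequelize('desiredDB', 'adminuser', 'password', {
  host: '192.168.0.100',
  dialect: 'mysql'
});
// Define a User model; sequelize maps it to a table and generates the SQL for us
var User = sequelize.define('User', {
  user: Sequelize.TEXT
});
sequelize.sync();                    // creates the backing table if it does not exist
User.create({ user: 'adminuser' }); // an INSERT expressed as an object operation
Behind the scenes, calls such as sync() and create() are translated into the same CREATE TABLE and INSERT statements that we would otherwise write by hand.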
Normally, an object-relational mapping layer makes the Node.js world quite simple, convenient, and developer friendly.
Using the node-mysql Node.js npm package
In this section, we will learn how to implement the Node.js node-mysql npm package, which is the most popular way of accessing a MySQL database using Node.js. In order to use the node-mysql package, we need to install it. This Node.js npm module can be installed in the same way as we install other Node.js npm packages. So, to install it, we run the following command:
npm install mysql
As soon as the node-mysql package gets installed, we need to make this package available for use by using the Node.js require() API function. To do so, we create a mysql variable to access the Node.js node-mysql module, as demonstrated in the following line of Node.js code:
var mysql = require('mysql');
Before you can use database records to read or write, it is mandatory to connect your PHP or Node.js application to the database. In order to connect our PHP application to our MySQL database, PHP offers three sets of API functions (mysql, mysqli, and PDO, the last of which uses the PDO_MySQL driver). Let's write the following PHP code snippet:
$sql_host = '192.168.0.100';
$sql_user = 'adminuser';
$sql_pass = 'password';
$conn = mysql_connect($sql_host, $sql_user, $sql_pass);
In the preceding code snippet, the $conn variable holds the database connection. In order to establish the database connection, we used the PHP mysql_connect() API function that accepts three arguments: the database server as an IP address or DNS name (in our case, the IP address is 192.168.0.100), the database username (in our case, adminuser), and the password associated with the database user (in our case, password). When working with the node-mysql Node.js npm package, the Node.js createClient() API function is used as the equivalent of the PHP mysql_connect() API function. Unlike the PHP API function, the Node.js API function accepts a Node.js object with three properties as its parameter. Moreover, we want our Node.js code to load the mysql Node.js npm package. Thus, we use Node.js require() to achieve this. Let's write the following Node.js code snippet that is equivalent to our preceding PHP code snippet:
var mysql = require('mysql');
var sql_host = '192.168.0.100';
var sql_user = 'adminuser';
var sql_pass = 'password';
var sql_conn = {host: sql_host, user: sql_user, password: sql_pass};
var conn = mysql.createClient(sql_conn);
We can even merge the last two statements into a single statement. Thus, we replace them with the following one:
var conn = mysql.createClient({host: sql_host, user: sql_user, password: sql_pass});
In the case of both the PHP $conn and Node.js conn variables, a meaningful value is assigned to these variables if the MySQL server is accessible; otherwise, they are assigned a false value.
Once the database is no longer needed, it needs to be disconnected from our PHP and Node.js code. Using PHP, the MySQL connection variable (in our case, $conn) needs to be closed using the PHP mysql_close() API function by implementing the following PHP code statement:
$disconn = mysql_close($conn);
The PHP mysql_close() API function returns a Boolean value that indicates whether the connection has been closed successfully or failed.
In the case of Node.js, we call the destroy() method on the conn object to close the database connection, using the following Node.js code statement: conn.destroy();

Once our applications are connected to the MySQL server, the desired database needs to be selected. In PHP, the mysql_select_db() API function does this job. The following PHP code snippet demonstrates how we select the desired database: $sql_db = 'desiredDB'; $selectedDB = mysql_select_db($sql_db, $conn); When PHP code is going to be converted into equivalent Node.js code, it helps to write it so that the $conn variable is explicitly passed to every mysql API function, rather than relying on PHP's implicit last-opened connection. Once the database is selected, we use the PHP mysql_query() API function to work with the data in it; in Node.js, the equivalent is the query() method on the connection object.

To select a database with plain SQL, the USE command is used, as demonstrated in the following code line: USE desiredDB; The trailing semicolon (;) is optional when a single statement is sent; it acts as a separator when several statements are sent together. When working with Node.js, the USE desiredDB SQL command needs to be sent through the query() method. Let's write the following Node.js code snippet that selects the desiredDB database on our MySQL server via the conn variable: var sql_db = 'desiredDB'; var sql_db_select = 'USE '+sql_db; conn.query(sql_db_select, function(err) { if (!err) { // Selects the desired MySQL database, that is, desiredDB } else { // MySQL database selection error } }); We can even merge the highlighted statements in the preceding code snippet into one statement. Our Node.js code snippet will look something like the following once these statements are merged: var sql_db = 'desiredDB'; conn.query('USE '+sql_db, function(err) { if (!err) { // Selects the desired MySQL database, that is, desiredDB } else { // MySQL database selection error } });

By now, we are able to connect our PHP and Node.js applications to the MySQL server and select the desired MySQL database to work with. The data in the selected database can be read and written using the usual SQL commands, such as CREATE TABLE, DROP TABLE, SELECT, INSERT, UPDATE, and DELETE.

Creating a table

Let's consider the CREATE TABLE SQL command. In PHP, the CREATE TABLE SQL command is triggered using the PHP mysql_query() API function, as demonstrated in the following PHP code snippet: $sql_prefix = 'desiredDB_'; $sql_cmd = 'CREATE TABLE `'.$sql_prefix.'users` (`id` int AUTO_INCREMENT KEY, `user` text)'; $tabCreated = mysql_query($sql_cmd, $conn); In the preceding PHP code snippet, we created a table, desiredDB_users, which consists of two columns: id and user. The id column holds an integer value and carries the SQL AUTO_INCREMENT and KEY options. These options indicate that the MySQL server should set a unique value for the id column of each new row, and that the user has no control over this value. The user column holds string values and is set to whatever value the requester supplies when a new row is inserted into our MySQL database.
Let's write the following Node.js code snippet, which is equivalent to our preceding PHP code: var sql_prefix = 'desiredDB_'; var sql_cmd = 'CREATE TABLE `'+sql_prefix+'users` (`id` int AUTO_INCREMENT KEY, `user` text)'; var tabCreated = false; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { tabCreated = true; } }); Here, the err parameter passed to our query() method's callback function indicates whether any error has been triggered.

Deleting a table

Let's now learn how to delete a table by attempting to delete the same users table we just created. This activity is quite similar to creating a table, which we recently discussed. In PHP, we use the DROP TABLE SQL command to achieve our purpose, as demonstrated in the following PHP code snippet: $sql_prefix = 'desiredDB_'; $sql_cmd = 'DROP TABLE `'.$sql_prefix.'users`'; $tabDropped = mysql_query($sql_cmd, $conn); Converting the preceding PHP code snippet into its corresponding Node.js code snippet, we follow the same pattern as we did while creating the table. Our converted Node.js code snippet will look like the following: var sql_prefix = 'desiredDB_'; var sql_cmd = 'DROP TABLE `'+sql_prefix+'users`'; var tabDropped = false; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { tabDropped = true; } });

Using a SELECT statement

The SQL SELECT statement is used to read data from database tables, and it works a bit differently from the preceding commands. In PHP, the mysql_query() API function triggers the statement and returns a result object that is stored in the $sql_result variable. To access the actual data, the PHP mysql_fetch_assoc() API function is called in a loop in order to fetch the data row by row. For this example, we assume that the users table was not actually dropped and still holds its records. Our PHP code snippet will look like the following one: $sql_prefix = 'desiredDB_'; $sql_cmd = 'SELECT user FROM `'.$sql_prefix.'users`'; $sql_result = mysql_query($sql_cmd, $conn); while ($row = mysql_fetch_assoc($sql_result)) { $user = $row['user']; print $user; } It is good practice to extract the values of the PHP $row array into simple scalar variables (in our case, $user), because doing so reduces the complexity of converting the PHP code to its corresponding Node.js code. When converting the preceding PHP code snippet into a Node.js code snippet, we use the Node.js query() method to trigger the statement, and the data is returned as arguments to the callback function. The rows parameter of the callback holds an indexed array of rows, each of which carries the values of that row keyed by column name. Our Node.js code snippet for the preceding PHP code snippet will look like the following one: var sql_prefix = 'desiredDB_'; var sql_cmd = 'SELECT user FROM `'+sql_prefix+'users`'; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { for (var i=0; i < rows.length; ++i) { var row = rows[i]; var user = row['user']; console.log(user); } } }); In the preceding Node.js code snippet, we could have written rows[i]['user'] directly to emphasize that the Node.js rows variable is effectively a two-dimensional structure.

Using the UPDATE statement

Now, let's try out the SQL UPDATE statement, which is used to modify the data of a table.
In PHP, we trigger the SQL UPDATE statement by using the PHP mysql_query() API function, as demonstrated in the following code snippet: $sql_prefix = 'desiredDB_'; $sql_cmd = 'UPDATE `'.$sql_prefix.'users` SET `user`="mohan" WHERE `user`="surendra"'; $tabUpdated = mysql_query($sql_cmd, $conn); if ($tabUpdated) { $rows_updated = mysql_affected_rows($conn); print 'Updated '.$rows_updated.' rows.'; } Here, the PHP mysql_affected_rows() API function returns the number of rows that were modified by the SQL UPDATE statement. In Node.js, we use the same SQL UPDATE statement. Additionally, we use the affectedRows property of the rows result object, which holds the same value that the PHP mysql_affected_rows() API function returns. Our converted Node.js code snippet will look like the following one: var sql_prefix = 'desiredDB_'; var sql_cmd = 'UPDATE `'+sql_prefix+'users` SET `user`="mohan" WHERE `user`="surendra"'; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { var rows_updated = rows.affectedRows; console.log('Updated '+rows_updated+' rows.'); } });

Using the INSERT statement

Now it is time to write the PHP code that inserts data into a table, and then convert that code to its equivalent Node.js code. To insert data into a table, we use the SQL INSERT statement. In PHP, the SQL INSERT statement is triggered using the PHP mysql_query() API function, as demonstrated in the following PHP code snippet: $sql_prefix = 'desiredDB_'; $sql_cmd = 'INSERT INTO `'.$sql_prefix.'users` (`id`, `user`) VALUES (0, "surmohan")'; $tabInserted = mysql_query($sql_cmd, $conn); if ($tabInserted) { $inserted_id = mysql_insert_id($conn); print 'Successfully inserted row with id='.$inserted_id.'.'; } Here, the PHP mysql_insert_id() API function returns the value of id that is associated with the newly inserted row. In Node.js, we use the same SQL INSERT statement. Additionally, we use the insertId property of the rows result object, which holds the same value that the PHP mysql_insert_id() API function returns. The Node.js code snippet that is equivalent to the preceding PHP code snippet looks like the following one: var sql_prefix = 'desiredDB_'; var sql_cmd = 'INSERT INTO `'+sql_prefix+'users` (`id`, `user`) VALUES (0, "surmohan")'; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { var inserted_id = rows.insertId; console.log('Successfully inserted row with id='+inserted_id+'.'); } });

Using the DELETE statement

Finally, we have reached the last activity of this section: the SQL DELETE statement. Like the SQL statements we discussed earlier in this section, the SQL DELETE statement is also triggered using the PHP mysql_query() API function, as demonstrated in the following PHP code snippet: $sql_prefix = 'desiredDB_'; $sql_cmd = 'DELETE FROM `'.$sql_prefix.'users` WHERE `user`="surmohan"'; $tabDeleted = mysql_query($sql_cmd, $conn); if ($tabDeleted) { $rows_deleted = mysql_affected_rows($conn); print 'Successfully deleted '.$rows_deleted.' rows.'; } In Node.js, we use the same SQL DELETE statement. We also use the affectedRows property, which serves us in the same way as discussed for the SQL UPDATE statement. The equivalent Node.js code snippet will look like the following one: var sql_prefix = 'desiredDB_'; var sql_cmd = 'DELETE FROM `'+sql_prefix+'users` WHERE `user`="surmohan"'; conn.query(sql_cmd, function(err, rows, fields) { if (!err) { var rows_deleted = rows.affectedRows; console.log('Successfully deleted '+rows_deleted+' rows.'); } });
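One practical refinement, not covered in the original text, is worth a short sketch: rather than concatenating values into SQL strings as the snippets above do, the node-mysql query() method can also take an array of values that it escapes and substitutes for ? placeholders, which protects the statements against SQL injection. The sketch below reuses the conn object and the desiredDB_users table from this section and is only a minimal illustration:

var sql_prefix = 'desiredDB_';
// The ? placeholders are replaced by the escaped values from the array argument.
var sql_cmd = 'INSERT INTO `' + sql_prefix + 'users` (`id`, `user`) VALUES (?, ?)';
conn.query(sql_cmd, [0, 'surmohan'], function(err, rows, fields) {
  if (!err) {
    console.log('Safely inserted row with id=' + rows.insertId + '.');
  }
});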

The physics engine

Packt
04 Sep 2014
9 min read
In this article by Martin Varga, the author of Learning AndEngine, we will look at the physics in AndEngine. (For more resources related to this topic, see here.) AndEngine uses the Android port of the Box2D physics engine. Box2D is very popular in games, including the most popular ones such as Angry Birds, and many game engines and frameworks use Box2D to simulate physics. It is free, open source, and written in C++, and it is available on multiple platforms. AndEngine offers a Java wrapper API for the C++ Box2D backend, and therefore, no prior C++ knowledge is required to use it. Box2D can simulate 2D rigid bodies. A rigid body is a simplification of a solid body with no deformations. Such objects do not exist in reality, but if we limit ourselves to bodies moving much slower than the speed of light, we can treat solid bodies as rigid. Box2D uses real-world units and works with physics terms. A position in a scene in AndEngine is defined in pixel coordinates, whereas in Box2D, it is defined in meters. AndEngine uses a pixel to meter conversion ratio; the default value is 32 pixels per meter.

Basic terms

Box2D works with something we call a physics world. There are bodies and forces in the physics world. Every body in the simulation has a few basic properties: position, orientation, mass (in kilograms), velocity (in meters per second), and angular velocity (in radians per second). Forces are applied to bodies, and Newton's laws of motion apply. The first law, an object that is not moving or moving with constant velocity will stay that way until a force is applied to it, can be tweaked a bit. The second law, force is equal to mass multiplied by acceleration, is especially important for understanding what will happen when we apply force to different objects. The third law, for every action, there is an equal and opposite reaction, is a bit flexible when using different types of bodies.

Body types

There are three different body types in Box2D, and each one is used for a different purpose. Static body: This doesn't have velocity, and forces do not apply to a static body. If another body collides with a static body, the static body will not move. Static bodies do not collide with other static and kinematic bodies. Static bodies usually represent walls, floors, and other immobile things; in our case, they will represent platforms which don't move. Kinematic body: This has velocity, but forces don't apply to it. If a kinematic body is moving and a dynamic body collides with it, the kinematic body will continue in its original direction. Kinematic bodies also do not collide with other static and kinematic bodies. Kinematic bodies are useful for creating moving platforms, which is exactly how we are going to use them. Dynamic body: A dynamic body has velocity, and forces apply to it. Dynamic bodies are the closest to real-world bodies and they collide with all types of bodies. We are going to use a dynamic body for our main character. It is important to understand the consequences of choosing each body type. When we define gravity in Box2D, it will pull all dynamic bodies in the direction of the gravitational acceleration, but static bodies will remain still and kinematic bodies will either remain still or keep moving in their set direction as if there were no gravity.

Fixtures

Every body is composed of one or more fixtures.
Each fixture has the following four basic properties: Shape: In Box2D, fixtures can be circles, rectangles, and polygons Density: This determines the mass of the fixture Friction: This plays a major role in body interactions Elasticity: This is sometimes called restitution and determines how bouncy the object is There are also special properties of fixtures such as filters and filter categories and a single Boolean property called sensor. Shapes The position of fixtures and their shapes in the body determine the overall shape, mass, and the center of gravity of the body. The upcoming figure is an example of a body that consists of three fixtures. The fixtures do not need to connect. They are part of one body, and that means their positions relative to each other will not change. The red dot represents the body's center of gravity. The green rectangle is a static body and the other three shapes are part of a dynamic body. Gravity pulls the whole body down, but the square will not fall. Density Density determines how heavy the fixtures are. Because Box2D is a two-dimensional engine, we can imagine all objects to be one meter deep. In fact, it doesn't matter as long as we are consistent. There are two bodies, each with a single circle fixture, in the following figure. The left circle is exactly twice as big as the right one, but the right one has double the density of the first one. The triangle is a static body and the rectangle and the circles are dynamic, creating a simple scale. When the simulation is run, the scales are balanced. Friction Friction defines how slippery a surface is. A body can consist of multiple fixtures with different friction values. When two bodies collide, the final friction is calculated from the point of collision based on the colliding fixtures. Friction can be given a value between 0 and 1, where 0 means completely frictionless and 1 means super strong friction. Let's say we have a slope which is made of a body with a single fixture that has a friction value of 0.5, as shown in the following figure: The other body consists of a single square fixture. If its friction is 0, the body slides very fast all the way down. If the friction is more than 0, then it would still slide, but slow down gradually. If the value is more than 0.25, it would still slide but not reach the end. Finally, with friction close to 1, the body will not move at all. Elasticity The coefficient of restitution is a ratio between the speeds before and after a collision, and for simplicity, we can call the material property elasticity. In the following figure, there are three circles and a rectangle representing a floor with restitution 0, which means not bouncy at all. The circles have restitutions (from left to right) of 1, 0.5, and 0. When this simulation is started, the three balls will fall with the same speed and touch the floor at the same time. However, after the first bounce, the first one will move upwards and climb all the way to the initial position. The middle one will bounce a little and keep bouncing less and less until it stops. The right one will not bounce at all. The following figure shows the situation after the first bounce: Sensor When we need a fixture that detects collisions but is otherwise not affected by them and doesn't affect other fixtures and bodies, we use a sensor. A goal line in a 2D air hockey top-down game is a good example of a sensor. We want it to detect the disc passing through, but we don't want it to prevent the disc from entering the goal. 
The physics world

The physics world is the whole simulation, including all bodies with their fixtures, gravity, and other settings that influence the performance and quality of the simulation. Tweaking the physics world settings is important for large simulations with many objects. These settings include the number of steps performed per second and the number of velocity and position iterations per step. The most important setting is gravity, which is determined by a vector of gravitational acceleration. Gravity in Box2D is simplified, but for the purpose of games, it is usually enough. Box2D works best when simulating a relatively small scene where objects are a few tens of meters big at most. To simulate, for example, a planet's (radial) gravity, we would have to implement our own gravitational force and turn the Box2D built-in gravity off.

Forces and impulses

Both forces and impulses are used to make a body move. Gravity is nothing else but a constant application of a force. While it is possible to set the position and velocity of a body in Box2D directly, it is not the right way to do it, because it makes the simulation unrealistic. To move a body properly, we need to apply a force or an impulse to it. These two things are almost the same. While forces are added to all the other forces and change the body velocity over time, impulses change the body velocity immediately. In fact, an impulse is defined as a force applied over a period of time. We can imagine a foam ball falling from the sky. When the wind starts blowing from the left, the ball will slowly change its trajectory. An impulse is more like a tennis racket that hits the ball in flight and changes its trajectory immediately. There are two types of forces and impulses: linear and angular. Linear forces make the body move left, right, up, and down, and angular forces make the body spin around its center. Angular force is called torque. Linear forces and impulses are applied at a given point, which will have different effects based on the position. The following figure shows a simple body with two fixtures and quite high friction, something like a cardboard box on a carpet. First, we apply force to the center of the large square fixture. When the force is applied, the body simply moves on the ground to the right a little. This is shown in the following figure: Second, we try to apply force to the upper-right corner of the large box. This is shown in the following figure: Using the same force at a different point, the body will be toppled to the right side. This is shown in the following figure:

More Line Charts, Area Charts, and Scatter Plots

Packt
26 Aug 2014
13 min read
In this article by Scott Gottreu, the author of Learning jqPlot, we'll learn how to import data from remote sources. We will discuss what area charts, stacked area charts, and scatter plots are. Then we will learn how to implement these newly learned charts. We will also learn about trend lines. (For more resources related to this topic, see here.) Working with remote data sources We return from lunch and decide to start on our line chart showing social media conversions. With this chart, we want to pull the data in from other sources. You start to look for some internal data sources, coming across one that returns the data as an object. We can see an excerpt of data returned by the data source. We will need to parse the object and create data arrays for jqPlot: { "twitter":[ ["2012-11-01",289],...["2012-11-30",225] ], "facebook":[ ["2012-11-01",27],...["2012-11-30",48] ] } We solve this issue using a data renderer to pull our data and then format it properly for jqPlot. We can pass a function as a variable to jqPlot and when it is time to render the data, it will call this new function. We start by creating the function to receive our data and then format it. We name it remoteDataSource. jqPlot will pass the following three parameters to our function: url: This is the URL of our data source. plot: The jqPlot object we create is passed by reference, which means we could modify the object from within remoteDataSource. However, it is best to treat it as a read-only object. options: We can pass any type of option in the dataRendererOptions option when we create our jqPlot object. For now, we will not be passing in any options: <script src="../js/jqplot.dateAxisRenderer.min.js"></script> <script> $(document).ready(function(){ var remoteDataSource = function(url, plot, options) { Next we create a new array to hold our formatted data. Then, we use the $.ajax method in jQuery to pull in our data. We set the async option to false. If we don't, the function will continue to run before getting the data and we'll have an empty chart: var data = new Array; $.ajax({ async: false, We set the url option to the url variable that jqPlot passed in. We also set the data type to json: url: url, dataType:"json", success: function(remoteData) { Then we will take the twitter object in our JSON and make that the first element of our data array and make facebook the second element. We then return the whole array back to jqPlot to finish rendering our chart: data.push(remoteData.twitter); data.push(remoteData.facebook); } }); return data; }; With our previous charts, after the id attribute, we would have passed in a data array. This time, instead of passing in a data array, we pass in a URL. Then, within the options, we declare the dataRenderer option and set remoteDataSource as the value. 
Now when our chart is created, it will call our renderer and pass in all the three parameters we discussed earlier: var socialPlot = $.jqplot ('socialMedia', "./data/social_shares.json", { title:'Social Media Shares', dataRenderer: remoteDataSource, We create labels for both our data series and enable the legend: series:[ { label: 'Twitter' }, { label: 'Facebook' } ], legend: { show: true, placement: 'outsideGrid' }, We enable DateAxisRenderer for the x axis and set min to 0 on the y axis, so jqPlot will not extend the axis below zero: axes:{ xaxis:{ renderer:$.jqplot.DateAxisRenderer, label: 'Days in November' }, yaxis: { min:0, label: 'Number of Shares' } } }); }); </script> <div id="socialMedia" style="width:600px;"></div> If you are running the code samples from your filesystem in Chrome, you will get an error message similar to this: No 'Access-Control-Allow-Origin' header is present on the requested resource. The security settings do not allow AJAX requests to be run against files on the filesystem. It is better to use a local web server such as MAMP, WAMP, or XAMPP. This way, we avoid the access control issues. Further information about cross-site HTTP requests can be found at the Mozilla Developer Network at https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS. We load this new chart in our browser and can see the result. We are likely to run into cross-domain issues when trying to access remote sources that do not allow cross-domain requests. The common practice to overcome this hurdle would be to use the JSONP data type in our AJAX call. jQuery will only run JSONP calls asynchronously. This keeps your web page from hanging if a remote source stops responding. However, because jqPlot requires all the data from the remote source before continuing, we can't use cross-domain sources with our data renderers. We start to think of ways we can use external APIs to pull in data from all kinds of sources. We make a note to contact the server guys to write some scripts to pull from the external APIs we want and pass along the data to our charts. By doing it in this way, we won't have to implement OAuth (OAuth is a standard framework used for authentication), http://oauth.net/2, in our web app or worry about which sources allow cross-domain access. Adding to the project's scope As we continue thinking up new ways to work with this data, Calvin stops by. "Hey guys, I've shown your work to a few of the regional vice-presidents and they love it." Your reply is that all of this is simply an experiment and was not designed for public consumption. Calvin holds up his hands as if to hold our concerns at bay. "Don't worry, they know it's all in beta. They did have a couple of ideas. Can you insert in the expenses with the revenue and profit reports? They also want to see those same charts but formatted differently." He continues, "One VP mentioned that maybe we could have one of those charts where everything under the line is filled in. Oh, and they would like to see these by Wednesday ahead of the meeting." With that, Calvin turns around and makes his customary abrupt exit. Adding a fill between two lines We talk through Calvin's comments. Adding in expenses won't be too much of an issue. We could simply add the expense line to one of our existing reports but that will likely not be what they want. Visually, the gap on our chart between profit and revenue should be the total amount of expenses. You mention that we could fill in the gap between the two lines. 
We decide to give this a try: We leave the plugins and the data arrays alone. We pass an empty array into our data array as a placeholder for our expenses. Next, we update our title. After this, we add a new series object and label it Expenses: ... var rev_profit = $.jqplot ('revPrfChart', [revenue, profit, [] ], { title:'Monthly Revenue & Profit with Highlighted Expenses', series:[ { label: 'Revenue' }, { label: 'Profit' }, { label: 'Expenses' } ], legend: { show: true, placement: 'outsideGrid' }, To fill in the gap between the two lines, we use the fillBetween option. The only two required options are series1 and series2. These require the positions of the two data series in the data array. So in our chart, series1 would be 0 and series2 would be 1. The other three optional settings are: baseSeries, color, and fill. The baseSeries option tells jqPlot to place the fill on a layer beneath the given series. It will default to 0. If you pick a series above zero, then the fill will hide any series below the fill layer: fillBetween: { series1: 0, series2: 1, We want to assign a different value to color because it will default to the color of the first data series option. The color option will accept either a hexadecimal value or the rgba option, which allows us to change the opacity of the fill. Even though the fill option defaults to true, we explicitly set it. This option also gives us the ability to turn off the fill after the chart is rendered: color: "rgba(232, 44, 12, 0.5)", fill: true }, The settings for the rest of the chart remain unchanged: axes:{ xaxis:{ renderer:$.jqplot.DateAxisRenderer, label: 'Months' }, yaxis:{ label: 'Totals Dollars', tickOptions: { formatString: "$%'d" } } } }); }); </script> <div id="revPrfChart" style="width:600px;"></div> We switch back to our web browser and load the new page. We see the result of our efforts in the following screenshot. This chart layout works but we think Calvin and the others will want something else. We decide we need to make an area chart. Understanding area and stacked area charts Area charts come in two varieties. The default type of area chart is simply a modification of a line chart. Everything from the data point on the y axis all the way to zero is shaded. In the event your numbers are negative, then the data above the line up to zero is shaded in. Each data series you have is laid upon the others. Area charts are best to use when we want to compare similar elements, for example, sales by each division in our company or revenue among product categories. The other variation of an area chart is the stacked area chart. The chart starts off being built in the same way as a normal area chart. The first line is plotted and shaded below the line to zero. The difference occurs with the remaining lines. We simply stack them. To understand what happens, consider this analogy. Each shaded line represents a wall built to the height given in the data series. Instead of building one wall behind another, we stack them on top of each other. What can be hard to understand is the y axis. It now denotes a cumulative total, not the individual data points. For example, if the first y value of a line is 4 and the first y value on the second line is 5, then the second point will be plotted at 9 on our y axis. Consider this more complicated example: if the y value in our first line is 2, 7 for our second line, and 4 for the third line, then the y value for our third line will be plotted at 13. That's why we need to compare similar elements. 
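To make the cumulative behavior concrete, here is a small standalone JavaScript sketch, not taken from the original article, that reproduces the arithmetic described above for first-point values of 2, 7, and 4; this is roughly what jqPlot works out internally when stacking is enabled:

var firstPoints = [2, 7, 4]; // the first y value of each series, bottom to top
var plotted = [];
var runningTotal = 0;
for (var i = 0; i < firstPoints.length; i++) {
  runningTotal += firstPoints[i];
  plotted.push(runningTotal); // where each point is actually drawn on the y axis
}
console.log(plotted); // [2, 9, 13]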
Creating an area chart We grab the quarterly report with the divisional profits we created this morning. We will extend the data to a year and plot the divisional profits as an area chart: We remove the data arrays for revenue and the overall profit array. We also add data to the three arrays containing the divisional profits: <script src="../js/jqplot.dateAxisRenderer.min.js"></script> <script> $(document).ready(function(){ var electronics = [["2011-11-20", 123487.87], ...]; var media = [["2011-11-20", 66449.15], ...]; var nerd_corral = [["2011-11-20", 2112.55], ...]; var div_profit = $.jqplot ('division_profit', [ media, nerd_corral, electronics ], { title:'12 Month Divisional Profits', Under seriesDefaults, we assign true to fill and fillToZero. Without setting fillToZero to true, the fill would continue to the bottom of the chart. With the option set, the fill will extend downward to zero on the y axis for positive values and stop. For negative data points, the fill will extend upward to zero: seriesDefaults: { fill: true, fillToZero: true }, series:[ { label: 'Media & Software' }, { label: 'Nerd Corral' }, { label: 'Electronics' } ], legend: { show: true, placement: 'outsideGrid' }, For our x axis, we set numberTicks to 6. The rest of our options we leave unchanged: axes:{ xaxis:{ label: 'Months', renderer:$.jqplot.DateAxisRenderer, numberTicks: 6, tickOptions: { formatString: "%B" } }, yaxis: { label: 'Total Dollars', tickOptions: { formatString: "$%'d" } } } }); }); </script> <div id="division_profit" style="width:600px;"></div> We review the results of our changes in our browser. We notice something is wrong: only the Electronics series, shown in brown, is showing. This goes back to how area charts are built. Revisiting our wall analogy, we have built a taller wall in front of our other two walls. We need to order our data series from largest to smallest: We move the Electronics series to be the first one in our data array: var div_profit = $.jqplot ('division_profit', [ electronics, media, nerd_corral ], It's also hard to see where some of the lines go when they move underneath another layer. Thankfully, jqPlot has a fillAlpha option. We pass in a percentage in the form of a decimal and jqPlot will change the opacity of our fill area: ... seriesDefaults: { fill: true, fillToZero: true, fillAlpha: .6 }, ... We reload our chart in our web browser and can see the updated changes. Creating a stacked area chart with revenue Calvin stops by while we're taking a break. "Hey guys, I had a VP call and they want to see revenue broken down by division. Can we do that?" We tell him we can. "Great" he says, before turning away and leaving. We discuss this new request and realize this would be a great chance to use a stacked area chart. We dig around and find the divisional revenue numbers Calvin wanted. We can reuse the chart we just created and just change out the data and some options. We use the same variable names for our divisional data and plug in revenue numbers instead of profit. We use a new variable name for our chart object and a new id attribute for our div. 
We update our title and add the stackSeries option and set it to true: var div_revenue = $.jqplot('division_revenue', [electronics, media, nerd_corral], { title: '12 Month Divisional Revenue', stackSeries: true, We leave our series' options alone, and the only option we change on our x axis is to set numberTicks back to 3: seriesDefaults: { fill: true, fillToZero: true }, series:[ { label: 'Electronics' }, { label: 'Media & Software' }, { label: 'Nerd Corral' } ], legend: { show: true, placement: 'outsideGrid' }, axes:{ xaxis:{ label: 'Months', renderer:$.jqplot.DateAxisRenderer, numberTicks: 3, tickOptions: { formatString: "%B" } }, We finish our changes by updating the ID of our div container: yaxis: { label: 'Total Dollars', tickOptions: { formatString: "$%'d" } } } }); }); </script> <div id="division_revenue" style="width:600px;"></div> With our changes complete, we load this new chart in our browser. As we can see in the following screenshot, we have a chart with each of the data series stacked on top of each other. Because of the nature of a stacked chart, the individual data points are no longer decipherable; however, with the visualization, this is less of an issue. We decide that this is a good place to stop for the day. We'll start on scatterplots and trend lines tomorrow morning. As we begin gathering our things, Calvin stops by on his way out and we show him our recent work. "This is amazing. You guys are making great progress." We tell him we're going to move on to trend lines tomorrow. "Oh, good," Calvin says. "I've had requests to show trending data for our revenue and profit. Someone else mentioned they would love to see trending data of shares on Twitter for our daily deals site. But, like you said, that can wait till tomorrow. Come on, I'll walk with you two."

Introducing LLVM Intermediate Representation

Packt
26 Aug 2014
18 min read
In this article by Bruno Cardoso Lopez and Rafael Auler, the authors of Getting Started with LLVM Core Libraries, we will look into some basic concepts of the LLVM intermediate representation (IR). (For more resources related to this topic, see here.) LLVM IR is the backbone that connects frontends and backends, allowing LLVM to parse multiple source languages and generate code to multiple targets. Frontends produce the IR, while backends consume it. The IR is also the point where the majority of LLVM target-independent optimizations takes place. Overview The choice of the compiler IR is a very important decision. It determines how much information the optimizations will have to make the code run faster. On one hand, a very high-level IR allows optimizers to extract the original source code intent with ease. On the other hand, a low-level IR allows the compiler to generate code tuned for a particular hardware more easily. The more information you have about the target machine, the more opportunities you have to explore machine idiosyncrasies. Moreover, the task at lower levels must be done with care. As the compiler translates the program to a representation that is closer to machine instructions, it becomes increasingly difficult to map program fragments to the original source code. Furthermore, if the compiler design is exaggerated using a representation that represents a specific target machine very closely, it becomes awkward to generate code for other machines that have different constructs. This design trade-off has led to different choices among compilers. Some compilers, for instance, do not support code generation for multiple targets and focus on only one machine architecture. This enables them to use specialized IRs throughout their entire pipeline that make the compiler efficient with respect to a single architecture, which is the case of the Intel C++ Compiler (icc). However, writing compilers that generate code for a single architecture is an expensive solution if you aim to support multiple targets. In these cases, it is unfeasible to write a different compiler for each architecture, and it is best to design a single compiler that performs well on a variety of targets, which is the goal of compilers such as GCC and LLVM. For these projects, called retargetable compilers, there are substantially more challenges to coordinate the code generation for multiple targets. The key to minimizing the effort to build a retargetable compiler lies in using a common IR, the point where different backends share the same understanding about the source program to translate it to a divergent set of machines. Using a common IR, it is possible to share a set of target-independent optimizations among multiple backends, but this puts pressure on the designer to raise the level of the common IR to not overrepresent a single machine. Since working at higher levels precludes the compiler from exploring target-specific trickery, a good retargetable compiler also employs other IRs to perform optimizations at different, lower levels. The LLVM project started with an IR that operated at a lower level than the Java bytecode, thus, the initial acronym was Low Level Virtual Machine. The idea was to explore low-level optimization opportunities and employ link-time optimizations. The link-time optimizations were made possible by writing the IR to disk, as in a bytecode. The bytecode allows the user to amalgamate multiple modules in the same file and then apply interprocedural optimizations. 
In this way, the optimizations will act on multiple compilation units as if they were in the same module. LLVM, nowadays, is neither a Java competitor nor a virtual machine, and it has other intermediate representations to achieve efficiency. For example, besides the LLVM IR, which is the common IR where target-independent optimizations work, each backend may apply target-dependent optimizations when the program is represented with the MachineFunction and MachineInstr classes. These classes represent the program using target-machine instructions. On the other hand, the Function and Instruction classes are, by far, the most important ones because they represent the common IR that is shared across multiple targets. This intermediate representation is mostly target-independent (but not entirely) and the official LLVM intermediate representation. To avoid confusion, while LLVM has other levels to represent a program, which technically makes them IRs as well, we do not refer to them as LLVM IRs; however, we reserve this name for the official, common intermediate representation by the Instruction class, among others. This terminology is also adopted by the LLVM documentation. The LLVM project started as a set of tools that orbit around the LLVM IR, which justifies the maturity of the optimizers and the number of optimizers that act at this level. This IR has three equivalent forms: An in-memory representation (the Instruction class, among others) An on-disk representation that is encoded in a space-efficient form (the bitcode files) An on-disk representation in a human-readable text form (the LLVM assembly files) LLVM provides tools and libraries that allow you to manipulate and handle the IR in all forms. Hence, these tools can transform the IR back and forth, from memory to disk as well as apply optimizations, as illustrated in the following diagram: Understanding the LLVM IR target dependency The LLVM IR is designed to be as target-independent as possible, but it still conveys some target-specific aspects. Most people blame the C/C++ language for its inherent, target-dependent nature. To understand this, consider that when you use standard C headers in a Linux system, for instance, your program implicitly imports some header files from the bits Linux headers folder. This folder contains target-dependent header files, including macro definitions that constrain some entities to have a particular type that matches what the syscalls of this kernel-machine expect. Afterwards, when the frontend parses your source code, it needs to also use different sizes for int, for example, depending on the intended target machine where this code will run. Therefore, both library headers and C types are already target-dependent, which makes it challenging to generate an IR that can later be translated to a different target. If you consider only the target-dependent, C standard library headers, the parsed AST for a given compilation unit is already target-dependent, even before the translation to the LLVM IR. Furthermore, the frontend generates IR code using type sizes, calling conventions, and special library calls that match the ones defined by each target ABI. Still, the LLVM IR is quite versatile and is able to cope with distinct targets in an abstract way. Exercising basic tools to manipulate the IR formats We mention that the LLVM IR can be stored on disk in two formats: bitcode and assembly text. We will now learn how to use them. 
Consider the sum.c source code: int sum(int a, int b) { return a+b; } To make Clang generate the bitcode, you can use the following command: $ clang sum.c -emit-llvm -c -o sum.bc To generate the assembly representation, you can use the following command: $ clang sum.c -emit-llvm -S -c -o sum.ll You can also assemble the LLVM IR assembly text, which will create a bitcode: $ llvm-as sum.ll -o sum.bc To convert from bitcode to IR assembly, which is the opposite, you can use the disassembler: $ llvm-dis sum.bc -o sum.ll The llvm-extract tool allows the extraction of IR functions, globals, and also the deletion of globals from the IR module. For instance, extract the sum function from sum.bc with the following command: $ llvm-extract -func=sum sum.bc -o sum-fn.bc Nothing changes between sum.bc and sum-fn.bc in this particular example since sum is already the sole function in this module. Introducing the LLVM IR language syntax Observe the LLVM IR assembly file, sum.ll: target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.7.0" define i32 @sum(i32 %a, i32 %b) #0 { entry: %a.addr = alloca i32, align 4 %b.addr = alloca i32, align 4 store i32 %a, i32* %a.addr, align 4 store i32 %b, i32* %b.addr, align 4 %0 = load i32* %a.addr, align 4 %1 = load i32* %b.addr, align 4 %add = add nsw i32 %0, %1 ret i32 %add } attributes #0 = { nounwind ssp uwtable ... } The contents of an entire LLVM file, either assembly or bitcode, are said to define an LLVM module. The module is the LLVM IR top-level data structure. Each module contains a sequence of functions, which contains a sequence of basic blocks that contain a sequence of instructions. The module also contains peripheral entities to support this model, such as global variables, the target data layout, and external function prototypes as well as data structure declarations. LLVM local values are the analogs of the registers in the assembly language and can have any name that starts with the % symbol. Thus, %add = add nsw i32 %0, %1 will add the local value %0 to %1 and put the result in the new local value, %add. You are free to give any name to the values, but if you are short on creativity, you can just use numbers. In this short example, we can already see how LLVM expresses its fundamental properties: It uses the Static Single Assignment (SSA) form. Note that there is no value that is reassigned; each value has only a single assignment that defines it. Each use of a value can immediately be traced back to the sole instruction responsible for its definition. This has an immense value to simplify optimizations, owing to the trivial use-def chains that the SSA form creates, that is, the list of definitions that reaches a user. If LLVM had not used the SSA form, we would need to run a separate data flow analysis to compute the use-def chains, which are mandatory for classical optimizations such as constant propagation and common subexpression elimination. Code is organized as three-address instructions. Data processing instructions have two source operands and place the result in a distinct destination operand. It has an infinite number of registers. Note how LLVM local values can be any name that starts with the % symbol, including numbers that start at zero, such as %0, %1, and so on, that have no restriction on the maximum number of distinct values. 
The target datalayout construct contains information about endianness and type sizes for target triple that is described in target host. Some optimizations depend on knowing the specific data layout of the target to transform the code correctly. Observe how the layout declaration is done: target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-apple-macosx10.7.0" We can extract the following facts from this string: The target is an x86_64 processor with macOSX 10.7.0. It is a little-endian target, which is denoted by the first letter in the layout (a lowercase e). Big-endian targets need to use an uppercase E. The information provided about types is in the format type:<size>:<abi>:<preferred>. In the preceding example, p:64:64:64 represents a pointer that is 64 bits wide in size, with the abi and preferred alignments set to the 64-bit boundary. The ABI alignment specifies the minimum required alignment for a type, while the preferred alignment specifies a potentially larger value, if this will be beneficial. The 32-bit integer types i32:32:32 are 32 bits wide in size, 32-bit abi and preferred alignment, and so on. The function declaration closely follows the C syntax: define i32 @sum(i32 %a, i32 %b) #0 { This function returns a value of the type i32 and has two i32 arguments, %a and %b. Local identifiers always need the % prefix, whereas global identifiers use @. LLVM supports a wide range of types, but the most important ones are the following: Arbitrary-sized integers in the iN form; common examples are i32, i64, and i128. Floating-point types, such as the 32-bit single precision float and 64-bit double precision double. Vectors types in the format <<# elements> x <elementtype>>. A vector with four i32 elements is written as <4 x i32>. The #0 tag in the function declaration maps to a set of function attributes, also very similar to the ones used in C/C++ functions and methods. The set of attributes is defined at the end of the file: attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false""no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true""no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false""use-soft-float"="false" } For instance, nounwind marks a function or method as not throwing exceptions, and ssp tells the code generator to use a stack smash protector in an attempt to increase the security of this code against attacks. The function body is explicitly divided into basic blocks (BBs), and a label is used to start a new BB. A label relates to a basic block in the same way that a value identifier relates to an instruction. If a label declaration is omitted, the LLVM assembler automatically generates one using its own naming scheme. A basic block is a sequence of instructions with a single entry point at its first instruction, and a single exit point at its last instruction. In this way, when the code jumps to the label that corresponds to a basic block, we know that it will execute all of the instructions in this basic block until the last instruction, which will change the control flow by jumping to another basic block. 
Basic blocks and their associated labels need to adhere to the following conditions: Each BB needs to end with a terminator instruction, one that jumps to other BBs or returns from the function The first BB, called the entry BB, is special in an LLVM function and must not be the target of any branch instructions Our LLVM file, sum.ll, has only one BB because it has no jumps, loops, or calls. The function start is marked with the entry label, and it ends with the return instruction, ret: entry: %a.addr = alloca i32, align 4 %b.addr = alloca i32, align 4 store i32 %a, i32* %a.addr, align 4 store i32 %b, i32* %b.addr, align 4 %0 = load i32* %a.addr, align 4 %1 = load i32* %b.addr, align 4 %add = add nsw i32 %0, %1 ret i32 %add The alloca instruction reserves space on the stack frame of the current function. The amount of space is determined by element type size, and it respects a specified alignment. The first instruction, %a.addr = alloca i32, align 4, allocates a 4-byte stack element, which respects a 4-byte alignment. A pointer to the stack element is stored in the local identifier, %a.addr. The alloca instruction is commonly used to represent local (automatic) variables. The %a and %b arguments are stored in the stack locations %a.addr and %b.addr by means of store instructions. The values are loaded back from the same memory locations by load instructions, and they are used in the addition, %add = add nsw i32 %0, %1. Finally, the addition result, %add, is returned by the function. The nsw flag specifies that this add operation has "no signed wrap", which indicates instructions that are known to have no overflow, allowing for some optimizations. If you are interested in the history behind the nsw flag, a worthwhile read is the LLVMdev post at http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045730.html by Dan Gohman. In fact, the load and store instructions are redundant, and the function arguments can be used directly in the add instruction. Clang uses -O0 (no optimizations) by default, and the unnecessary loads and stores are not removed. If we compile with -O1 instead, the outcome is a much simpler code, which is reproduced here: define i32 @sum(i32 %a, i32 %b) ... { entry: %add = add nsw i32 %b, %a ret i32 %add } ... Using the LLVM assembly directly is very handy when writing small examples to test target backends and as a means to learn basic LLVM concepts. However, a library is the recommended interface for frontend writers to build the LLVM IR, which is the subject of our next section. You can find a complete reference to the LLVM IR assembly syntax at http://llvm.org/docs/LangRef.html. Introducing the LLVM IR in-memory model The in-memory representation closely models the LLVM language syntax that we just presented. The header files for the C++ classes that represent the IR are located at include/llvm/IR. The following is a list of the most important classes: The Module class aggregates all of the data used in the entire translation unit, which is a synonym for "module" in LLVM terminology. It declares the Module::iterator typedef as an easy way to iterate across the functions inside this module. You can obtain these iterators via the begin() and end() methods. View its full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1Module.html. The Function class contains all objects related to a function definition or declaration. In the case of a declaration (use the isDeclaration() method to check whether it is a declaration), it contains only the function prototype. 
In both cases, it contains a list of the function parameters accessible via the getArgumentList() method or the pair of arg_begin() and arg_end(). You can iterate through them using the Function::arg_iterator typedef. If your Function object represents a function definition, and you iterate through its contents via the for (Function::iterator i = function.begin(), e = function.end(); i != e; ++i) idiom, you will iterate across its basic blocks. View its full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1Function.html. The BasicBlock class encapsulates a sequence of LLVM instructions, accessible via the begin()/end() idiom. You can directly access its last instruction using the getTerminator() method, and you also have a few helper methods to navigate the CFG, such as accessing predecessor basic blocks via getSinglePredecessor(), when the basic block has a single predecessor. However, if it does not have a single predecessor, you need to work out the list of predecessors yourself, which is also not difficult if you iterate through basic blocks and check the target of their terminator instructions. View its full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1BasicBlock.html. The Instruction class represents an atom of computation in the LLVM IR, a single instruction. It has some methods to access high-level predicates, such as isAssociative(), isCommutative(), isIdempotent(), or isTerminator(), but its exact functionality can be retrieved with getOpcode(), which returns a member of the llvm::Instruction enumeration, which represents the LLVM IR opcodes. You can access its operands via the op_begin() and op_end() pair of methods, which are inherited from the User superclass that we will present shortly. View its full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1Instruction.html. We have still not presented the most powerful aspect of the LLVM IR (enabled by the SSA form): the Value and User interfaces; these allow you to easily navigate the use-def and def-use chains. In the LLVM in-memory IR, a class that inherits from Value means that it defines a result that can be used by others, whereas a subclass of User means that this entity uses one or more Value interfaces. Function and Instruction are subclasses of both Value and User, while BasicBlock is a subclass of just Value. To understand this, let's analyze these two classes in depth: The Value class defines the use_begin() and use_end() methods to allow you to iterate through Users, offering an easy way to access its def-use chain. For every Value class, you can also access its name through the getName() method. This models the fact that any LLVM value can have a distinct identifier associated with it. For example, %add1 can identify the result of an add instruction, BB1 can identify a basic block, and myfunc can identify a function. Value also has a powerful method called replaceAllUsesWith(Value *), which navigates through all of the users of this value and replaces it with some other value. This is a good example of how the SSA form allows you to easily substitute instructions and write fast optimizations. You can view the full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1Value.html. The User class has the op_begin() and op_end() methods that allows you to quickly access all of the Value interfaces that it uses. Note that this represents the use-def chain. You can also use a helper method called replaceUsesOfWith(Value *From, Value *To) to replace any of its used values. 
You can view the full interface at http://llvm.org/docs/doxygen/html/classllvm_1_1User.html. Summary In this article, we acquainted ourselves with the concepts and components related to the LLVM intermediate representation. Resources for Article: Further resources on this subject: Creating and Utilizing Custom Entities [Article] Getting Started with Code::Blocks [Article] Program structure, execution flow, and runtime objects [Article]

What is a content provider?

Packt
26 Aug 2014
5 min read
This article is written by Sunny Kumar Aditya and Vikash Kumar Karn, the authors of Android SQLite Essentials. There are four essential components in an Android application: activity, service, broadcast receiver, and content provider. A content provider is used to manage access to a structured set of data. Content providers encapsulate the data and provide an abstraction as well as a mechanism for defining data security. However, content providers are primarily intended to be used by other applications that access the provider through a provider client object. Together, providers and provider clients offer a consistent, standard interface to data that also handles inter-process communication and secure data access. (For more resources related to this topic, see here.) A content provider allows one app to share data with other applications. By design, an Android SQLite database created by an application is private to that application; this is excellent from a security point of view but troublesome when you want to share data across different applications. This is where a content provider comes to the rescue: you can easily share data by building your own content provider. It is important to note that although our discussion focuses on a database, a content provider is not limited to it. It can also be used to serve data that normally goes into files, such as photos, audio, or video. The interaction between Application A and Application B while exchanging data can be seen in the following diagram: Here we have an Application A whose activity needs to access the database of Application B. As we already read, the database of Application B is stored in its internal storage and cannot be directly accessed by Application A. This is where the content provider comes into the picture; it allows us to share data with other applications and control how they access it. The content provider implements methods for querying, inserting, updating, and deleting data in databases. Application A now requests the content provider to perform the desired operations on its behalf. We will explore the use of a content provider to fetch contacts from a phone's contacts database.

Using existing content providers

Android ships with a number of standard content providers that we can use. Some of them are Browser, CalendarContract, CallLog, Contacts, ContactsContract, MediaStore, UserDictionary, and so on. We will fetch contacts from a phone's contact list with the help of the system's existing content provider and a ContentResolver. We will be using the ContactsContract provider for this purpose.

What is a content resolver?

The ContentResolver object in the application's context is used to communicate with the provider as a client. The ContentResolver object communicates with the provider object—an instance of a class that implements ContentProvider. The provider object receives data requests from clients, performs the requested action, and returns the results. ContentResolver is the single, global instance in our application that provides access to other applications' content providers; we do not need to worry about handling inter-process communication. The ContentResolver methods provide the basic CRUD (create, retrieve, update, and delete) functions of persistent storage; it has methods that call identically named methods in the provider object without knowing their implementation.
In the corresponding AddNewContactActivity class, we initiate picking a contact by building an intent with the Intent.ACTION_PICK action, which allows us to pick an item from a data source; all we need to know is the URI of the provider, which in our case is ContactsContract.Contacts.CONTENT_URI:

public void pickContact() {
    try {
        Intent cIntent = new Intent(Intent.ACTION_PICK,
                ContactsContract.Contacts.CONTENT_URI);
        startActivityForResult(cIntent, PICK_CONTACT);
    } catch (Exception e) {
        e.printStackTrace();
        Log.i(TAG, "Exception while picking contact");
    }
}

The code used in this article is placed at GitHub: https://github.com/sunwicked/Ch3-PersonalContactManager/tree/master

This functionality is also provided by Messaging, Gallery, and Contacts. The Contacts screen will pop up, allowing us to browse or search for the contacts we want to migrate to our new application. Our next stop is onActivityResult, where we handle the result of our request to pick a contact. Let's look at the code we have to add to pick contacts from Android's contacts provider:

protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    . .
    else if (requestCode == PICK_CONTACT) {
        if (resultCode == Activity.RESULT_OK) {
            Uri contactData = data.getData();
            Cursor c = getContentResolver().query(contactData, null, null, null, null);
            if (c.moveToFirst()) {
                String id = c.getString(
                        c.getColumnIndexOrThrow(ContactsContract.Contacts._ID));
                String hasPhone = c.getString(
                        c.getColumnIndex(ContactsContract.Contacts.HAS_PHONE_NUMBER));
                if (hasPhone.equalsIgnoreCase("1")) {
                    Cursor phones = getContentResolver().query(
                            ContactsContract.CommonDataKinds.Phone.CONTENT_URI, null,
                            ContactsContract.CommonDataKinds.Phone.CONTACT_ID + " = " + id,
                            null, null);
                    phones.moveToFirst();
                    contactPhone.setText(phones.getString(
                            phones.getColumnIndex("data1")));
                    contactName.setText(phones.getString(
                            phones.getColumnIndex(ContactsContract.Contacts.DISPLAY_NAME)));
                }
    …..

We start by checking whether the request code matches ours, and then we check the result code. We get the content resolver object by calling getContentResolver() on the Context object; it is a method of the android.content.Context class. As we are in an activity, which inherits from Context, we do not need to be explicit when calling it; the same goes for services. We then check whether the contact we picked has a phone number or not. After verifying the necessary details, we pull the data we require, such as the contact's name and phone number, and set them in the relevant fields.

Summary

This article reflects on how to access and share data in Android via content providers and how to construct a content provider. We also talked about content resolvers and how they are used to communicate with providers as a client.

Resources for Article: Further resources on this subject: Reversing Android Applications [article] Saying Hello to Unity and Android [article] Android Fragmentation Management [article]

Classifying Text

Packt
26 Aug 2014
23 min read
In this article by Jacob Perkins, author of Python 3 Text Processing with NLTK 3 Cookbook, we will learn how to transform text into feature dictionaries, and how to train a text classifier for sentiment analysis. (For more resources related to this topic, see here.) Bag of words feature extraction Text feature extraction is the process of transforming what is essentially a list of words into a feature set that is usable by a classifier. The NLTK classifiers expect dict style feature sets, so we must therefore transform our text into a dict. The bag of words model is the simplest method; it constructs a word presence feature set from all the words of an instance. This method doesn't care about the order of the words, or how many times a word occurs, all that matters is whether the word is present in a list of words. How to do it... The idea is to convert a list of words into a dict, where each word becomes a key with the value True. The bag_of_words() function in featx.py looks like this: def bag_of_words(words): return dict([(word, True) for word in words]) We can use it with a list of words; in this case, the tokenized sentence the quick brown fox: >>> from featx import bag_of_words >>> bag_of_words(['the', 'quick', 'brown', 'fox']) {'quick': True, 'brown': True, 'the': True, 'fox': True} The resulting dict is known as a bag of words because the words are not in order, and it doesn't matter where in the list of words they occurred, or how many times they occurred. All that matters is that the word is found at least once. You can use different values than True, but it is important to keep in mind that the NLTK classifiers learn from the unique combination of (key, value). That means that ('fox', 1) is treated as a different feature than ('fox', 2). How it works... The bag_of_words() function is a very simple list comprehension that constructs a dict from the given words, where every word gets the value True. Since we have to assign a value to each word in order to create a dict, True is a logical choice for the value to indicate word presence. If we knew the universe of all possible words, we could assign the value False to all the words that are not in the given list of words. But most of the time, we don't know all the possible words beforehand. Plus, the dict that would result from assigning False to every possible word would be very large (assuming all words in the English language are possible). So instead, to keep feature extraction simple and use less memory, we stick to assigning the value True to all words that occur at least once. We don't assign the value False to any word since we don't know what the set of possible words are; we only know about the words we are given. There's more... In the default bag of words model, all words are treated equally. But that's not always a good idea. As we already know, some words are so common that they are practically meaningless. If you have a set of words that you want to exclude, you can use the bag_of_words_not_in_set() function in featx.py: def bag_of_words_not_in_set(words, badwords): return bag_of_words(set(words) - set(badwords)) This function can be used, among other things, to filter stopwords. Here's an example where we filter the word the from the quick brown fox: >>> from featx import bag_of_words_not_in_set >>> bag_of_words_not_in_set(['the', 'quick', 'brown', 'fox'], ['the']) {'quick': True, 'brown': True, 'fox': True} As expected, the resulting dict has quick, brown, and fox, but not the. 
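If you do want word counts rather than simple presence, a small variant is easy to write. The following sketch is not part of featx.py—it is shown only to illustrate the point about (key, value) pairs: because the classifier treats ('fox', 1) and ('fox', 2) as different features, count-valued features behave quite differently from the all-True bag of words (the key order in the printed dict may vary):

from collections import Counter

def bag_of_word_counts(words):
    # Hypothetical variant of bag_of_words(): values are occurrence
    # counts instead of True.
    return dict(Counter(words))

>>> bag_of_word_counts(['the', 'quick', 'brown', 'fox', 'the'])
{'the': 2, 'quick': 1, 'brown': 1, 'fox': 1}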
Filtering stopwords Stopwords are words that are often useless in NLP, in that they don't convey much meaning, such as the word the. Here's an example of using the bag_of_words_not_in_set() function to filter all English stopwords: from nltk.corpus import stopwords def bag_of_non_stopwords(words, stopfile='english'): badwords = stopwords.words(stopfile) return bag_of_words_not_in_set(words, badwords) You can pass a different language filename as the stopfile keyword argument if you are using a language other than English. Using this function produces the same result as the previous example: >>> from featx import bag_of_non_stopwords >>> bag_of_non_stopwords(['the', 'quick', 'brown', 'fox']) {'quick': True, 'brown': True, 'fox': True} Here, the is a stopword, so it is not present in the returned dict. Including significant bigrams In addition to single words, it often helps to include significant bigrams. As significant bigrams are less common than most individual words, including them in the bag of words model can help the classifier make better decisions. We can use the BigramCollocationFinder class to find significant bigrams. The bag_of_bigrams_words() function found in featx.py will return a dict of all words along with the 200 most significant bigrams: from nltk.collocations import BigramCollocationFinder from nltk.metrics import BigramAssocMeasures def bag_of_bigrams_words(words, score_fn=BigramAssocMeasures.chi_sq, n=200): bigram_finder = BigramCollocationFinder.from_words(words) bigrams = bigram_finder.nbest(score_fn, n) return bag_of_words(words + bigrams) The bigrams will be present in the returned dict as (word1, word2) and will have the value as True. Using the same example words as we did earlier, we get all words plus every bigram: >>> from featx import bag_of_bigrams_words >>> bag_of_bigrams_words(['the', 'quick', 'brown', 'fox']) {'brown': True, ('brown', 'fox'): True, ('the', 'quick'): True, 'fox': True, ('quick', 'brown'): True, 'quick': True, 'the': True} You can change the maximum number of bigrams found by altering the keyword argument n. See also In the next recipe, we will train a NaiveBayesClassifier class using feature sets created with the bag of words model. Training a Naive Bayes classifier Now that we can extract features from text, we can train a classifier. The easiest classifier to get started with is the NaiveBayesClassifier class. It uses the Bayes theorem to predict the probability that a given feature set belongs to a particular label. The formula is: P(label | features) = P(label) * P(features | label) / P(features) The following list describes the various parameters from the previous formula: P(label): This is the prior probability of the label occurring, which is the likelihood that a random feature set will have the label. This is based on the number of training instances with the label compared to the total number of training instances. For example, if 60/100 training instances have the label, the prior probability of the label is 60%. P(features | label): This is the prior probability of a given feature set being classified as that label. This is based on which features have occurred with each label in the training data. P(features): This is the prior probability of a given feature set occurring. This is the likelihood of a random feature set being the same as the given feature set, and is based on the observed feature sets in the training data. 
For example, if the given feature set occurs twice in 100 training instances, the prior probability is 2%. P(label | features): This tells us the probability that the given features should have that label. If this value is high, then we can be reasonably confident that the label is correct for the given features. Getting ready We are going to be using the movie_reviews corpus for our initial classification examples. This corpus contains two categories of text: pos and neg. These categories are exclusive, which makes a classifier trained on them a binary classifier. Binary classifiers have only two classification labels, and will always choose one or the other. Each file in the movie_reviews corpus is composed of either positive or negative movie reviews. We will be using each file as a single instance for both training and testing the classifier. Because of the nature of the text and its categories, the classification we will be doing is a form of sentiment analysis. If the classifier returns pos, then the text expresses a positive sentiment, whereas if we get neg, then the text expresses a negative sentiment. How to do it... For training, we need to first create a list of labeled feature sets. This list should be of the form [(featureset, label)], where the featureset variable is a dict and label is the known class label for the featureset. The label_feats_from_corpus() function in featx.py takes a corpus, such as movie_reviews, and a feature_detector function, which defaults to bag_of_words. It then constructs and returns a mapping of the form {label: [featureset]}. We can use this mapping to create a list of labeled training instances and testing instances. The reason to do it this way is to get a fair sample from each label. It is important to get a fair sample, because parts of the corpus may be (unintentionally) biased towards one label or the other. Getting a fair sample should eliminate this possible bias: import collections def label_feats_from_corpus(corp, feature_detector=bag_of_words): label_feats = collections.defaultdict(list) for label in corp.categories(): for fileid in corp.fileids(categories=[label]): feats = feature_detector(corp.words(fileids=[fileid])) label_feats[label].append(feats) return label_feats Once we can get a mapping of label | feature sets, we want to construct a list of labeled training instances and testing instances. 
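Before splitting, it can help to sanity-check the mapping returned by label_feats_from_corpus(). This quick interactive check is not part of the book's code; it simply confirms that both categories contributed the expected number of feature sets:

>>> from nltk.corpus import movie_reviews
>>> from featx import label_feats_from_corpus
>>> lfeats = label_feats_from_corpus(movie_reviews)
>>> sorted(lfeats.keys())
['neg', 'pos']
>>> [len(lfeats[label]) for label in sorted(lfeats)]
[1000, 1000]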
The split_label_feats() function in featx.py takes a mapping returned from label_feats_from_corpus() and splits each list of feature sets into labeled training and testing instances: def split_label_feats(lfeats, split=0.75): train_feats = [] test_feats = [] for label, feats in lfeats.items(): cutoff = int(len(feats) * split) train_feats.extend([(feat, label) for feat in feats[:cutoff]]) test_feats.extend([(feat, label) for feat in feats[cutoff:]]) return train_feats, test_feats Using these functions with the movie_reviews corpus gives us the lists of labeled feature sets we need to train and test a classifier: >>> from nltk.corpus import movie_reviews >>> from featx import label_feats_from_corpus, split_label_feats >>> movie_reviews.categories() ['neg', 'pos'] >>> lfeats = label_feats_from_corpus(movie_reviews) >>> lfeats.keys() dict_keys(['neg', 'pos']) >>> train_feats, test_feats = split_label_feats(lfeats, split=0.75) >>> len(train_feats) 1500 >>> len(test_feats) 500 So there are 1000 pos files, 1000 neg files, and we end up with 1500 labeled training instances and 500 labeled testing instances, each composed of equal parts of pos and neg. If we were using a different dataset, where the classes were not balanced, our training and testing data would have the same imbalance. Now we can train a NaiveBayesClassifier class using its train() class method: >>> from nltk.classify import NaiveBayesClassifier >>> nb_classifier = NaiveBayesClassifier.train(train_feats) >>> nb_classifier.labels() ['neg', 'pos'] Let's test the classifier on a couple of made up reviews. The classify() method takes a single argument, which should be a feature set. We can use the same bag_of_words() feature detector on a list of words to get our feature set: >>> from featx import bag_of_words >>> negfeat = bag_of_words(['the', 'plot', 'was', 'ludicrous']) >>> nb_classifier.classify(negfeat) 'neg' >>> posfeat = bag_of_words(['kate', 'winslet', 'is', 'accessible']) >>> nb_classifier.classify(posfeat) 'pos' How it works... The label_feats_from_corpus() function assumes that the corpus is categorized, and that a single file represents a single instance for feature extraction. It iterates over each category label, and extracts features from each file in that category using the feature_detector() function, which defaults to bag_of_words(). It returns a dict whose keys are the category labels, and the values are lists of instances for that category. If we had label_feats_from_corpus() return a list of labeled feature sets instead of a dict, it would be much harder to get balanced training data. The list would be ordered by label, and if you took a slice of it, you would almost certainly be getting far more of one label than another. By returning a dict, you can take slices from the feature sets of each label, in the same proportion that exists in the data. Now we need to split the labeled feature sets into training and testing instances using split_label_feats(). This function allows us to take a fair sample of labeled feature sets from each label, using the split keyword argument to determine the size of the sample. The split argument defaults to 0.75, which means the first 75% of the labeled feature sets for each label will be used for training, and the remaining 25% will be used for testing. Once we have gotten our training and testing feats split up, we train a classifier using the NaiveBayesClassifier.train() method. This class method builds two probability distributions for calculating prior probabilities. 
These are passed into the NaiveBayesClassifier constructor. The label_probdist constructor contains the prior probability for each label, or P(label). The feature_probdist constructor contains P(feature name = feature value | label). In our case, it will store P(word=True | label). Both are calculated based on the frequency of occurrence of each label and each feature name and value in the training data. The NaiveBayesClassifier class inherits from ClassifierI, which requires subclasses to provide a labels() method, and at least one of the classify() or prob_classify() methods. The following diagram shows other methods, which will be covered shortly: There's more... We can test the accuracy of the classifier using nltk.classify.util.accuracy() and the test_feats variable created previously: >>> from nltk.classify.util import accuracy >>> accuracy(nb_classifier, test_feats) 0.728 This tells us that the classifier correctly guessed the label of nearly 73% of the test feature sets. The code in this article is run with the PYTHONHASHSEED=0 environment variable so that accuracy calculations are consistent. If you run the code with a different value for PYTHONHASHSEED, or without setting this environment variable, your accuracy values may differ. Classification probability While the classify() method returns only a single label, you can use the prob_classify() method to get the classification probability of each label. This can be useful if you want to use probability thresholds for classification: >>> probs = nb_classifier.prob_classify(test_feats[0][0]) >>> probs.samples() dict_keys(['neg', 'pos']) >>> probs.max() 'pos' >>> probs.prob('pos') 0.9999999646430913 >>> probs.prob('neg') 3.535688969240647e-08 In this case, the classifier says that the first test instance is nearly 100% likely to be pos. Other instances may have more mixed probabilities. For example, if the classifier says an instance is 60% pos and 40% neg, that means the classifier is 60% sure the instance is pos, but there is a 40% chance that it is neg. It can be useful to know this for situations where you only want to use strongly classified instances, with a threshold of 80% or greater. Most informative features The NaiveBayesClassifier class has two methods that are quite useful for learning about your data. Both methods take a keyword argument n to control how many results to show. The most_informative_features() method returns a list of the form [(feature name, feature value)] ordered by most informative to least informative. In our case, the feature value will always be True: >>> nb_classifier.most_informative_features(n=5)[('magnificent', True), ('outstanding', True), ('insulting', True),('vulnerable', True), ('ludicrous', True)] The show_most_informative_features() method will print out the results from most_informative_features() and will also include the probability of a feature pair belonging to each label: >>> nb_classifier.show_most_informative_features(n=5) Most Informative Features magnificent = True pos : neg = 15.0 : 1.0 outstanding = True pos : neg = 13.6 : 1.0 insulting = True neg : pos = 13.0 : 1.0 vulnerable = True pos : neg = 12.3 : 1.0 ludicrous = True neg : pos = 11.8 : 1.0 The informativeness, or information gain, of each feature pair is based on the prior probability of the feature pair occurring for each label. More informative features are those that occur primarily in one label and not on the other. The less informative features are those that occur frequently with both labels. 
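One practical use of this list is crude feature selection: keep only the words the classifier found informative and discard the rest. The helper below is a sketch of my own, not a recipe from the book; it reuses bag_of_words() and the trained nb_classifier from the earlier examples, and because 'ludicrous' is retained, the classification should still come out as 'neg' (your exact rankings may differ):

def bag_of_informative_words(words, classifier, n=2000):
    # Keep only words that appear among the classifier's top-n most
    # informative features; everything else is dropped.
    informative = set(name for name, value in classifier.most_informative_features(n))
    return bag_of_words([w for w in words if w in informative])

>>> feats = bag_of_informative_words(['the', 'plot', 'was', 'ludicrous'], nb_classifier)
>>> nb_classifier.classify(feats)
'neg'

Whether this actually improves accuracy depends on the corpus, so treat it as an experiment rather than a default step.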
Informativeness can also be stated in terms of entropy: the entropy of the classifier decreases more when a more informative feature is used. See https://en.wikipedia.org/wiki/Information_gain_in_decision_trees for more on information gain and entropy (while it specifically mentions decision trees, the same concepts are applicable to all classifiers).

Training estimator

During training, the NaiveBayesClassifier class constructs probability distributions for each feature using an estimator parameter, which defaults to nltk.probability.ELEProbDist. The estimator is used to calculate the probability of a label given a specific feature. In ELEProbDist, ELE stands for Expected Likelihood Estimate, and the formula for calculating the label probabilities for a given feature is (c+0.5)/(N+B/2). Here, c is the count of times a single feature occurs, N is the total number of feature outcomes observed, and B is the number of bins or unique features in the feature set. In cases where the feature values are all True, N == B. In other cases, where the number of times a feature occurs is recorded, N >= B.

You can use any estimator parameter you want, and there are quite a few to choose from. The only constraints are that it must inherit from nltk.probability.ProbDistI and its constructor must take a bins keyword argument. Here's an example using the LaplaceProbDist class, which uses the formula (c+1)/(N+B):

>>> from nltk.probability import LaplaceProbDist
>>> nb_classifier = NaiveBayesClassifier.train(train_feats, estimator=LaplaceProbDist)
>>> accuracy(nb_classifier, test_feats)
0.716

As you can see, accuracy is slightly lower, so choose your estimator parameter carefully. You cannot use nltk.probability.MLEProbDist as the estimator, or any ProbDistI subclass that does not take the bins keyword argument. Training will fail with TypeError: __init__() got an unexpected keyword argument 'bins'.

Manual training

You don't have to use the train() class method to construct a NaiveBayesClassifier. You can instead create the label_probdist and feature_probdist variables manually. The label_probdist variable should be an instance of ProbDistI, and should contain the prior probabilities for each label. The feature_probdist variable should be a dict whose keys are tuples of the form (label, feature name) and whose values are instances of ProbDistI that have the probabilities for each feature value. In our case, each ProbDistI should have only one value, True=1. Here's a very simple example using a manually constructed DictionaryProbDist class:

>>> from nltk.probability import DictionaryProbDist
>>> label_probdist = DictionaryProbDist({'pos': 0.5, 'neg': 0.5})
>>> true_probdist = DictionaryProbDist({True: 1})
>>> feature_probdist = {('pos', 'yes'): true_probdist, ('neg', 'no'): true_probdist}
>>> classifier = NaiveBayesClassifier(label_probdist, feature_probdist)
>>> classifier.classify({'yes': True})
'pos'
>>> classifier.classify({'no': True})
'neg'

See also

In the next recipe, we will train the DecisionTreeClassifier class.

Training a decision tree classifier

The DecisionTreeClassifier class works by creating a tree structure, where each node corresponds to a feature name and the branches correspond to the feature values. Tracing down the branches, you get to the leaves of the tree, which are the classification labels.

How to do it...
Using the same train_feats and test_feats variables we created from the movie_reviews corpus in the previous recipe, we can call the DecisionTreeClassifier.train() class method to get a trained classifier. We pass binary=True because all of our features are binary: either the word is present or it's not. For other classification use cases where you have multivalued features, you will want to stick to the default binary=False. In this context, binary refers to feature values, and is not to be confused with a binary classifier. Our word features are binary because the value is either True or the word is not present. If our features could take more than two values, we would have to use binary=False. A binary classifier, on the other hand, is a classifier that only chooses between two labels. In our case, we are training a binary DecisionTreeClassifier on binary features. But it's also possible to have a binary classifier with non-binary features, or a non-binary classifier with binary features. The following is the code for training and evaluating the accuracy of a DecisionTreeClassifier class: >>> dt_classifier = DecisionTreeClassifier.train(train_feats,binary=True, entropy_cutoff=0.8, depth_cutoff=5, support_cutoff=30)>>> accuracy(dt_classifier, test_feats)0.688 The DecisionTreeClassifier class can take much longer to train than the NaiveBayesClassifier class. For that reason, I have overridden the default parameters so it trains faster. These parameters will be explained later. How it works... The DecisionTreeClassifier class, like the NaiveBayesClassifier class, is also an instance of ClassifierI, as shown in the following diagram: During training, the DecisionTreeClassifier class creates a tree where the child nodes are also instances of DecisionTreeClassifier. The leaf nodes contain only a single label, while the intermediate child nodes contain decision mappings for each feature. These decisions map each feature value to another DecisionTreeClassifier, which itself may contain decisions for another feature, or it may be a final leaf node with a classification label. The train() class method builds this tree from the ground up, starting with the leaf nodes. It then refines itself to minimize the number of decisions needed to get to a label by putting the most informative features at the top. To classify, the DecisionTreeClassifier class looks at the given feature set and traces down the tree, using known feature names and values to make decisions. Because we are creating a binary tree, each DecisionTreeClassifier instance also has a default decision tree, which it uses when a known feature is not present in the feature set being classified. This is a common occurrence in text-based feature sets, and indicates that a known word was not in the text being classified. This also contributes information towards a classification decision. There's more... The parameters passed into DecisionTreeClassifier.train() can be tweaked to improve accuracy or decrease training time. Generally, if you want to improve accuracy, you must accept a longer training time and if you want to decrease the training time, the accuracy will most likely decrease as well. But be careful not to optimize for accuracy too much. A really high accuracy may indicate overfitting, which means the classifier will be excellent at classifying the training data, but not so good on data it has never seen. See https://en.wikipedia.org/wiki/Over_fitting for more on this concept. 
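A quick way to explore that trade-off—this is not part of the recipe, just a sketch reusing the train_feats and test_feats variables from earlier—is to train with a few different depth_cutoff values and compare the accuracies yourself; expect training time to rise with the cutoff, and do not assume the deepest tree generalizes best:

>>> from nltk.classify import DecisionTreeClassifier
>>> from nltk.classify.util import accuracy
>>> for depth in (3, 5, 10):
...     clf = DecisionTreeClassifier.train(train_feats, binary=True,
...         entropy_cutoff=0.8, depth_cutoff=depth, support_cutoff=30)
...     print(depth, accuracy(clf, test_feats))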
Controlling uncertainty with entropy_cutoff Entropy is the uncertainty of the outcome. As entropy approaches 1.0, uncertainty increases. Conversely, as entropy approaches 0.0, uncertainty decreases. In other words, when you have similar probabilities, the entropy will be high as each probability has a similar likelihood (or uncertainty of occurrence). But the more the probabilities differ, the lower the entropy will be. The entropy_cutoff value is used during the tree refinement process. The tree refinement process is how the decision tree decides to create new branches. If the entropy of the probability distribution of label choices in the tree is greater than the entropy_cutoff value, then the tree is refined further by creating more branches. But if the entropy is lower than the entropy_cutoff value, then tree refinement is halted. Entropy is calculated by giving nltk.probability.entropy() a MLEProbDist value created from a FreqDist of label counts. Here's an example showing the entropy of various FreqDist values. The value of 'pos' is kept at 30, while the value of 'neg' is manipulated to show that when 'neg' is close to 'pos', entropy increases, but when it is closer to 1, entropy decreases: >>> from nltk.probability import FreqDist, MLEProbDist, entropy >>> fd = FreqDist({'pos': 30, 'neg': 10}) >>> entropy(MLEProbDist(fd)) 0.8112781244591328 >>> fd['neg'] = 25 >>> entropy(MLEProbDist(fd)) 0.9940302114769565 >>> fd['neg'] = 30 >>> entropy(MLEProbDist(fd)) 1.0 >>> fd['neg'] = 1 >>> entropy(MLEProbDist(fd)) 0.20559250818508304 What this all means is that if the label occurrence is very skewed one way or the other, the tree doesn't need to be refined because entropy/uncertainty is low. But when the entropy is greater than entropy_cutoff, then the tree must be refined with further decisions to reduce the uncertainty. Higher values of entropy_cutoff will decrease both accuracy and training time. Controlling tree depth with depth_cutoff The depth_cutoff value is also used during refinement to control the depth of the tree. The final decision tree will never be deeper than the depth_cutoff value. The default value is 100, which means that classification may require up to 100 decisions before reaching a leaf node. Decreasing the depth_cutoff value will decrease the training time and most likely decrease the accuracy as well. Controlling decisions with support_cutoff The support_cutoff value controls how many labeled feature sets are required to refine the tree. As the DecisionTreeClassifier class refines itself, labeled feature sets are eliminated once they no longer provide value to the training process. When the number of labeled feature sets is less than or equal to support_cutoff, refinement stops, at least for that section of the tree. Another way to look at it is that support_cutoff specifies the minimum number of instances that are required to make a decision about a feature. If support_cutoff is 20, and you have less than 20 labeled feature sets with a given feature, then you don't have enough instances to make a good decision, and refinement around that feature must come to a stop. See also The previous recipe covered the creation of training and test feature sets from the movie_reviews corpus. Summary In this article, we learned how to transform text into feature dictionaries, and how to train a text classifier for sentiment analysis. 
Resources for Article: Further resources on this subject: Python Libraries for Geospatial Development [article] Python Testing: Installing the Robot Framework [article] Ten IPython essentials [article]

Using OSGi Services

Packt
26 Aug 2014
14 min read
This article, created by Dr Alex Blewitt, the author of Mastering Eclipse Plug-in Development, will present OSGi services as a means to communicate with and connect applications. Unlike the Eclipse extension point mechanism, OSGi services can have multiple versions available at runtime and can work in other OSGi environments, such as Felix or other commercial OSGi runtimes. (For more resources related to this topic, see here.)

Overview of services

In an Eclipse or OSGi runtime, each individual bundle is its own separate module, which has explicit dependencies on library code via Import-Package, Require-Bundle, or Require-Capability. These express static relationships and provide a way of configuring the bundle's classpath. However, this presents a problem. If bundles are independent, how can they use contributions provided by other bundles? In Eclipse's case, the extension registry provides a means for code to look up providers. In a standalone OSGi environment, OSGi services provide a similar mechanism.

A service is an instance of a class that implements a service interface. When a service is created, it is registered with the services framework under one (or more) interfaces, along with a set of properties. Consumers can then get the service by asking the framework for implementers of that specific interface. Services can also be registered under an abstract class, but this is not recommended. Providing a service interface exposed as an abstract class can lead to unnecessary coupling of the client to the implementation. The following diagram gives an overview of services:

This separation allows the consumer and producer to depend on a common API bundle, but otherwise be completely decoupled from one another. This allows both the consumer and producer to be mocked out or exchanged for different implementations in the future.

Registering a service programmatically

To register a service, an instance of the implementation class needs to be created and registered with the framework. Interactions with the framework are performed with an instance of BundleContext—typically provided in the BundleActivator.start method and stored for later use. The *FeedParser classes will be extended to support registration as a service instead of the Equinox extension registry.

Creating an activator

A bundle's activator is a class that is instantiated and coupled to the lifetime of the bundle. When a bundle is started, if a manifest entry Bundle-Activator exists, then the corresponding class is instantiated. As long as it implements the BundleActivator interface, the start method will be called. This method is passed an instance of BundleContext, which is the bundle's connection to the hosting OSGi framework.

Create a class in the com.packtpub.e4.advanced.feeds project called com.packtpub.e4.advanced.feeds.internal.FeedsActivator, which implements the org.osgi.framework.BundleActivator interface. The quick fix may suggest adding org.osgi.framework as an imported package. Accept this, and modify the META-INF/MANIFEST.MF file as follows:

Import-Package: org.osgi.framework
Bundle-Activator: com.packtpub.e4.advanced.feeds.internal.FeedsActivator

The framework will automatically invoke the start method of the FeedsActivator when the bundle is started, and correspondingly, the stop method when the bundle is stopped.
Test this by inserting a pair of println calls: public class FeedsActivator implements BundleActivator { public void start(BundleContext context) throws Exception { System.out.println("Bundle started"); } public void stop(BundleContext context) throws Exception { System.out.println("Bundle stopped"); } } Now run the project as an OSGi framework with the feeds bundle, the Equinox console, and the Gogo shell. The required dependencies can be added by clicking on Add Required Bundles, although the Include optional dependencies checkbox does not need to be selected. Ensure that the other workspace and target bundles are deselected with the Deselect all button, as shown in the following screenshot: The required bundles are as follows: com.packtpub.e4.advanced.feeds org.apache.felix.gogo.command org.apache.felix.gogo.runtime org.apache.felix.gogo.shell org.eclipse.equinox.console org.eclipse.osgi On the console, when the bundle is started (which happens automatically if the Default Auto-Start is set to true), the Bundle started message should be seen. If the bundle does not start, ss in the console will print a list of bundles and start 2 will start the bundle with the ID 2. Afterwards, stop 2 can be used to stop bundle 2. Bundles can be stopped/started dynamically in an OSGi framework. Registering the service Once the FeedsActivator instance is created, a BundleContext instance will be available for interaction with the framework. This can be persisted for subsequent use in an instance field and can also be used directly to register a service. The BundleContext class provides a registerService method, which takes an interface, an instance, and an optional Dictionary instance of key/value pairs. This can be used to register instances of the feed parser at runtime. Modify the start method as follows: public void start(BundleContext context) throws Exception { context.registerService(IFeedParser.class, new RSSFeedParser(), null); context.registerService(IFeedParser.class, new AtomFeedParser(), null); context.registerService(IFeedParser.class, new MockFeedParser(), null); } Now start the framework again. In the console that is launched, look for the bundle corresponding to the feeds bundle: osgi> bundles | grep feeds com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=56} {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=57} {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=58} This shows that bundle 4 has started three services, using the interface com.packtpub.e4.advanced.feeds.IFeedParser, and with service IDs 56, 57, and 58. It is also possible to query the runtime framework for services of a known interface type directly using the services command and an LDAP style filter: osgi> services (objectClass=com.packtpub.e4.advanced.feeds.IFeedParser) {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=56} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=57} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." {com.packtpub.e4.advanced.feeds.IFeedParser}={service.id=58} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." The results displayed represent the three services instantiated. 
They can be introspected using the service command passing the service.id: osgi> service 56 com.packtpub.e4.advanced.feeds.internal.RSSFeedParser@52ba638e osgi> service 57 com.packtpub.e4.advanced.feeds.internal.AtomFeedParser@3e64c3a osgi> service 58 com.packtpub.e4.advanced.feeds.internal.MockFeedParser@49d5e6da Priority of services Services have an implicit order, based on the order in which they were instantiated. Each time a service is registered, a global service.id is incremented. It is possible to define an explicit service ranking with an integer property. This is used to ensure relative priority between services, regardless of the order in which they are registered. For services with equal service.ranking values, the service.id values are compared. OSGi R6 adds an additional property, service.bundleid, which is used to denote the ID of the bundle that provides the service. This is not used to order services, and is for informational purposes only. Eclipse Luna uses OSGi R6. To pass a priority into the service registration, create a helper method called priority, which takes an int value and stores it in a Hashtable with the key service.ranking. This can be used to pass a priority to the service registration methods. The following code illustrates this: private Dictionary<String,Object> priority(int priority) { Hashtable<String, Object> dict = new Hashtable<String,Object>(); dict.put("service.ranking", new Integer(priority)); return dict; } public void start(BundleContext context) throws Exception { context.registerService(IFeedParser.class, new RSSFeedParser(), priority(1)); context.registerService(IFeedParser.class, new MockFeedParser(), priority(-1)); context.registerService(IFeedParser.class, new AtomFeedParser(), priority(2)); } Now when the framework starts, the services are displayed in order of priority: osgi> services | (objectClass=com.packtpub.e4.advanced.feeds.IFeedParser) {com.packtpub.e4.advanced.feeds.IFeedParser}={service.ranking=2, service.id=58} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." {com.packtpub.e4.advanced.feeds.IFeedParser}={service.ranking=1, service.id=56} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." {com.packtpub.e4.advanced.feeds.IFeedParser}={service.ranking=-1, service.id=57} "Registered by bundle:" com.packtpub.e4.advanced.feeds_1.0.0.qualifier [4] "No bundles using service." Dictionary was the original Java Map interface, and Hashtable the original HashMap implementation. They fell out of favor in Java 1.2 when Map and HashMap were introduced (mainly because they weren't synchronized by default) but OSGi was developed to run on early releases of Java (JSR 8 proposed adding OSGi as a standard for the Java platform). Not only that, early low-powered Java mobile devices didn't support the full Java platform, instead exposing the original Java 1.1 data structures. Because of this history, many APIs in OSGi refer to only Java 1.1 data structures so that low-powered devices can still run OSGi systems. Using the services The BundleContext instance can be used to acquire services as well as register them. FeedParserFactory, which originally used the extension registry, can be upgraded to refer to services instead. To obtain an instance of BundleContext, store it in the FeedsActivator.start method as a static variable. That way, classes elsewhere in the bundle will be able to acquire the context. 
An accessor method provides an easy way to do this: public class FeedsActivator implements BundleActivator { private static BundleContext bundleContext; public static BundleContext getContext() { return bundleContext; } public void start(BundleContext context) throws Exception { // register methods as before bundleContext = context; } public void stop(BundleContext context) throws Exception { bundleContext = null; } } Now the FeedParserFactory class can be updated to acquire the services. OSGi services are represented via a ServiceReference instance (which is a sharable object representing a handle to the service) and can be used to acquire a service instance: public class FeedParserFactory { public List<IFeedParser> getFeedParsers() { List<IFeedParser> parsers = new ArrayList<IFeedParser>(); BundleContext context = FeedsActivator.getContext(); try { Collection<ServiceReference<IFeedParser>> references = context.getServiceReferences(IFeedParser.class, null); for (ServiceReference<IFeedParser> reference : references) { parsers.add(context.getService(reference)); context.ungetService(reference); } } catch (InvalidSyntaxException e) { // ignore } return parsers; } } In this case, the service references are obtained from the bundle context with a call to context.getServiceReferences(IFeedParser.class,null). The service references can be used to access the service's properties, and to acquire the service. The service instance is acquired with the context.getService(ServiceReference) call. The contract is that the caller "borrows" the service, and when finished, should return it with an ungetService(ServiceReference) call. Technically, the service is only supposed to be used between the getService and ungetService calls as its lifetime may be invalid afterwards; instead of returning an array of service references, the common pattern is to pass in a unit of work that accepts the service and then call ungetService afterwards. However, to fit in with the existing API, the service is acquired, added to the list, and then released immediately afterwards. Lazy activation of bundles Now run the project as an Eclipse application, with the feeds and feeds.ui bundles installed. When a new feed is created by navigating to File | New | Other | Feeds | Feed, and a feed such as http://alblue.bandlem.com/atom.xml is entered, the feeds will be shown in the navigator view. When drilling down, a NullPointerException may be seen in the logs, as shown in the following: !MESSAGE An exception occurred invoking extension: com.packtpub.e4.advanced.feeds.ui.feedNavigatorContent for object com.packtpub.e4.advanced.feeds.Feed@770def59 !STACK 0 java.lang.NullPointerException at com.packtpub.e4.advanced.feeds.FeedParserFactory. getFeedParsers(FeedParserFactory.java:31) at com.packtpub.e4.advanced.feeds.ui.FeedContentProvider. getChildren(FeedContentProvider.java:80) at org.eclipse.ui.internal.navigator.extensions. SafeDelegateTreeContentProvider. getChildren(SafeDelegateTreeContentProvider.java:96) Tracing through the code indicates that the bundleContext is null, which implies that the feeds bundle has not yet been started. This can be seen in the console of the running Eclipse application by executing the following code: osgi> ss | grep feeds 866 ACTIVE com.packtpub.e4.advanced.feeds.ui_1.0.0.qualifier 992 RESOLVED com.packtpub.e4.advanced.feeds_1.0.0.qualifier While the feeds.ui bundle is active, the feeds bundle is not. Therefore, the services haven't been instantiated, and bundleContext has not been cached. 
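The "unit of work" pattern mentioned above could be wrapped in a small helper like the following sketch. It is not part of the feeds project—the class, interface, and method names are invented for illustration—but the BundleContext calls it makes (getServiceReference, getService, and ungetService) are the standard OSGi APIs:

import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;

// Illustrative helper: borrows a service, hands it to a unit of work,
// and guarantees that ungetService() is always called afterwards.
public final class ServiceRunner {

    public interface ServiceWork<T> {
        void use(T service);
    }

    public static <T> void withService(BundleContext context, Class<T> type,
            ServiceWork<T> work) {
        ServiceReference<T> reference = context.getServiceReference(type);
        if (reference == null) {
            return; // no implementation of the interface is registered
        }
        T service = context.getService(reference);
        try {
            if (service != null) {
                work.use(service);
            }
        } finally {
            context.ungetService(reference); // return the borrowed service
        }
    }
}

A caller could then write ServiceRunner.withService(FeedsActivator.getContext(), IFeedParser.class, work) and never risk forgetting the ungetService call, at the cost of restructuring code such as FeedParserFactory around callbacks.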
By default, bundles are not started when they are accessed for the first time. If the bundle needs its activator to be called prior to using any of the classes in the package, it needs to be marked as having an activation policy of lazy. This is done by adding the following entry to the MANIFEST.MF file: Bundle-ActivationPolicy: lazy The manifest editor can be used to add this configuration line by selecting Activate this plug-in when one of its classes is loaded, as shown in the following screenshot: Now, when the application is run, the feeds will resolve appropriately. Comparison of services and extension points Both mechanisms (using the extension registry and using the services) allow for a list of feed parsers to be contributed and used by the application. What are the differences between them, and are there any advantages to one or the other? Both the registry and services approaches can be used outside of an Eclipse runtime. They work the same way when used in other OSGi implementations (such as Felix) and can be used interchangeably. The registry approach can also be used outside of OSGi, although that is far less common. The registry encodes its information in the plugin.xml file by default, which means that it is typically edited as part of a bundle's install (it is possible to create registry entries from alternative implementations if desired, but this rarely happens). The registry has a notification system, which can listen to contributions being added and removed. The services approach uses the OSGi framework to store and maintain a list of services. These services don't have an explicit configuration file and, in fact, can be contributed by code (such as the registerService calls) or by declarative representations. The separation of how the service is created versus how the service is registered is a key difference between the service and the registry approach. Like the registry, the OSGi services system can generate notifications when services come and go. One key difference in an OSGi runtime is that bundles depending on the Eclipse registry must be declared as singletons; that is, they have to use the ;singleton:=true directive on Bundle-SymbolicName. This means that there can only be one version of a bundle that exposes registry entries in a runtime, as opposed to multiple versions in the case of general services. While the registry does provide mechanisms to be able to instantiate extensions from factories, these typically involve simple configurations and/or properties that are hard-coded in the plugin.xml files themselves. They would not be appropriate to store sensitive details such as passwords. On the other hand, a service can be instantiated from whatever external configuration information is necessary and then registered, such as a JDBC connection for a database. Finally, extensions in the registry are declarative by default and are activated on demand. This allows Eclipse to start quickly because it does not need to build the full set of class loader objects or run code, and then bring up services on demand. Although the approach previously didn't use declarative services, it is possible to do this. Summary This article introduced OSGi services as a means to extend an application's functionality. It also shed light on how to register a service programmatically. 
Resources for Article: Further resources on this subject: Apache Maven and m2eclipse [article] Introducing an Android platform [article] Installing and Setting up JavaFX for NetBeans and Eclipse IDE [article]

Components in Unity

Packt
26 Aug 2014
13 min read
In this article by Simon Jackson, author of Mastering Unity 2D Game Development, we will walk through the new 2D system and other new features. We will then look at some of the Unity components in depth, before digging into animation and its components. (For more resources related to this topic, see here.)

Unity 4.3 improvements

Unity 4.3 was not just about the new 2D system; there is also a host of other improvements and features in this release. The major highlights of Unity 4.3 are covered in the following sections.

Improved Mecanim performance

Mecanim is a powerful tool for both 2D and 3D animations. In Unity 4.3, there have been many improvements and enhancements, including a new game object optimizer that ensures objects are more tightly bound to their skeletal systems and removes unnecessary transform holders, making Mecanim animations lighter and smoother. Refer to the following screenshot:

In Unity 4.3, Mecanim also adds greater control when blending animations together, allowing the addition of curves for smooth transitions, and it now also includes events that can be hooked into at every step.

The Windows Phone API improvements and Windows 8.1 support

Unity 4.2 introduced Windows Phone and Windows 8 support; since then, things have been going wild, especially since Microsoft has thrown its support behind the movement and offered free licensing to existing Pro owners. Refer to the following screenshot:

Unity 4.3 builds solidly on the v4 foundations by bringing additional platform support, and it closes some more gaps between the existing platforms. Some of the advantages are as follows:

The emulator is now fully supported with Windows Phone (new x86 phone build)
It has more orientation support, which allows even the splash screens to rotate properly and enables a pixel perfect display
It has trial application APIs for both Phone and Windows 8
It has improved sensors and location support

On top of this, with the recent release of Windows 8.1, Unity 4.3 now also supports Windows 8.1 fully; additionally, Unity 4.5.3 will introduce support for Windows Phone 8.1 and universal projects.

Dynamic Nav Mesh (Pro version only)

If you have only been using the free version of Unity till now, you will not be aware of what a Nav Mesh agent is. Nav Meshes are invisible meshes that are created for your 3D environment at build time to simplify pathfinding and navigation for movable entities. Refer to the following screenshot:

You can, of course, create simplified models for your environment and use them in your scenes; however, every time you change your scene, you need to update your navigation model. Nav Meshes simply remove this overhead. Nav Meshes are crucial, especially in larger environments where collision and navigation calculations can make the difference between your game running well or not. Unity 4.3 has improved this by allowing more runtime changes to the dynamic Nav Mesh, allowing you to destroy parts of your scene that alter the walkable parts of your terrain. Nav Mesh calculations are also now multithreaded to give an even better speed boost to your game. Also, there have been many other under-the-hood fixes and tweaks.

Editor updates

The Unity editor received a host of updates in Unity 4.3 to improve the performance and usability of the editor, as you can see in the following demo screenshot. Granted, most of the improvements are behind the scenes.
The improved Unity Editor GUI with huge improvements The editor refactored a lot of the scripting features on the platform, primarily to reduce the code complexity required for a lot of scripting components, such as unifying parts of the API into single components. For example, the LookLikeControls and LookLikeInspector options have been unified into a single LookLike function, which allows easier creation of the editor GUI components. Further simplification of the programmable editor interface is an ongoing task and a lot of headway is being made in each release. Additionally, the keyboard controls have been tweaked to ensure that the navigation works in a uniform way and the sliders/fields work more consistently. MonoDevelop 4.01 Besides the editor features, one of the biggest enhancements has to be the upgrade of the MonoDevelop editor (http://monodevelop.com/), which Unity supports and is shipped with. This has been a long running complaint for most developers simply due to the brand new features in the later editions. Refer to the following screenshot: MonoDevelop isn't made by Unity; it's an open source initiative run by Xamarin hosted on GitHub (https://github.com/mono/monodevelop) for all the willing developers to contribute and submit fixes to. Although the current stable release is 4.2.1, Unity is not fully up to date. Hopefully, this recent upgrade will mean that Unity can keep more in line with the future versions of this free tool. Sadly, this doesn't mean that Unity has yet been upgraded from the modified V2 version of the Mono compiler (http://www.mono-project.com/Main_Page) it uses to the current V3 branch, most likely, due to the reduced platform and the later versions of the Mono support. Movie textures Movie textures is not exactly a new feature in Unity as it has been available for some time for platforms such as Android and iOS. However, in Unity 4.3, it was made available for both the new Windows 8 and Windows Phone platforms. This adds even more functionality to these platforms that were missing in the initial Unity 4.2 release where this feature was introduced. Refer to the following screenshot: With movie textures now added to the platform, other streaming features are also available, for example, webcam (or a built-in camera in this case) and microphone support were also added. Understanding components Components in Unity are the building blocks of any game; almost everything you will use or apply will end up as a component on a GameObject inspector in a scene. Until you build your project, Unity doesn't know which components will be in the final game when your code actually runs (there is some magic applied in the editor). So, these components are not actually attached to your GameObject inspector but rather linked to them. Accessing components using a shortcut Now, in the previous Unity example, we added some behind-the-scenes trickery to enable you to reference a component without first discovering it. We did this by adding shortcuts to the MonoBehavior class that the game object inherits from. 
You can access the components with the help of the following code: this.renderer.collider.attachedRigidbody.angularDrag = 0.2f; What Unity then does behind the scenes for you is that it converts the preceding code to the following code: var renderer = this.GetComponent<Renderer>(); var collider = renderer.GetComponent<Collider>(); var ridgedBody = collider.GetComponent<Rigidbody>(); ridgedBody.angularDrag = 0.2f; The preceding code will also be the same as executing the following code: GetComponent<Renderer>().GetComponent<Collider>().GetComponent<Rigidbody>().angularDrag = 0.2f; Now, while this is functional and working, it isn't very performant or even a best practice as it creates variables and destroys them each time you use them; it also calls GetComponent for each component every time you access them. Using GetComponent in the Start or Awake methods isn't too bad as they are only called once when the script is loaded; however, if you do this on every frame in the update method, or even worse, in FixedUpdate methods, the problem multiplies; not to say you can't, you just need to be aware of the potential cost of doing so. A better way to use components – referencing Now, every programmer knows that they have to worry about garbage and exactly how much memory they should allocate to objects for the entire lifetime of the game. To improve things based on the preceding shortcut code, we simply need to manually maintain the references to the components we want to change or affect on a particular object. So, instead of the preceding code, we could simply use the following code: Rigidbody myScriptRigidBody; void Awake() { var renderer = this.GetComponent<Renderer>(); var collider = renderer.GetComponent<Collider>(); myScriptRigidBody = collider.GetComponent<Rigidbody>(); } void Update() { myScriptRigidBody.angularDrag = 0.2f * Time.deltaTime; } This way the RigidBody object that we want to affect can simply be discovered once (when the scripts awakes); then, we can just update the reference each time a value needs to be changed instead of discovering it every time. An even better way Now, it has been pointed out (by those who like to test such things) that even the GetComponent call isn't as fast as it should be because it uses C# generics to determine what type of component you are asking for (it's a two-step process: first, you determine the type and then get the component). However, there is another overload of the GetComponent function in which instead of using generics, you just need to supply the type (therefore removing the need to discover it). To do this, we will simply use the following code instead of the preceding GetComponent<>: myScriptRigidBody =(Rigidbody2D)GetComponent(typeof(Rigidbody2D)); The code is slightly longer and arguably only gives you a marginal increase, but if you need to use every byte of the processing power, it is worth keeping in mind. If you are using the "." shortcut to access components, I recommend that you change that practice now. In Unity 5, they are being removed. There will, however, be a tool built in the project's importer to upgrade any scripts you have using the shortcuts that are available for you. This is not a huge task, just something to be aware of; act now if you can! Animation components All of the animation in the new 2D system in Unity uses the new Mecanim system (introduced in Version 4) for design and control, which once you get used to is very simple and easy to use. 
It is broken up into three main parts: animation controllers, animation clips, and the Animator component.
Animation controllers
Animation controllers are simply state machines that are used to control when an animation should be played and how often, including the conditions that control the transition between each state. In the new 2D system, there must be at least one controller per animation for it to play, and controllers can contain many animations, as you can see here with three states and transition lines between them.
Animation clips
Animation clips are the heart of the animation system and have come very far from their previous implementation in Unity. Previously, clips were used just to hold the crafted animations of 3D models, with only a limited ability to tweak them for use on a complete 3D model.
The new animation dope sheet system is very advanced; in fact, it now tracks almost every change in the inspector for sprites, allowing you to animate just about everything. You can even control which sprite from a spritesheet is used for each frame of the animation. For example, a three-frame sprite animation with a modified x position for the middle frame gives a hopping effect to the sprite as it runs. This ability of the dope sheet system means there is less burden on art designers to craft complex animations, as the animation system itself can be used to produce great effects.
Sprites don't have to be picked from the same spritesheet to be animated. They can come from individual textures or be picked from any spritesheet you have imported.
The Animator component
To use the new animation prepared in a controller, you need to apply it to a game object in the scene. This is done through the Animator component. The only property we actually care about in 2D is the Controller property. This is where we attach the controller we just created. The other properties only apply to 3D humanoid models, so we can ignore them for 2D. For more information about the complete 3D Mecanim system, refer to the Unity Learn guide at http://unity3d.com/learn/tutorials/modules/beginner/animation.
Animation is just one of the uses of the Mecanim system.
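Although most of the Animator setup happens in the editor, a script typically drives the controller at runtime by setting its parameters. The following is a minimal sketch, not taken from any particular project: it assumes the attached controller defines a bool parameter named "Running" (a hypothetical name) and that the Animator sits on the same game object; the transitions themselves remain defined in the controller, and the script only flips the parameter.
using UnityEngine;

// Minimal sketch: drives an animator controller from a script by setting a
// parameter; the state transitions are still handled by the controller itself.
public class PlayerAnimationDriver : MonoBehaviour
{
    private Animator animator;

    void Awake()
    {
        // Cache the Animator reference once, as recommended earlier
        animator = GetComponent<Animator>();
    }

    void Update()
    {
        // "Running" is a hypothetical bool parameter defined in the controller
        bool isRunning = Mathf.Abs(Input.GetAxis("Horizontal")) > 0.1f;
        animator.SetBool("Running", isRunning);
    }
}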
Setting up animation controllers
So, to start creating animations, you first need an animation controller in order to define your animation clips. As stated before, this is just a state machine that controls the execution of animations, even if there is only one animation. In this case, the controller runs the selected animation for as long as it's told to.
If you are browsing around the components that can be added to a game object, you will come across the Animation component, which takes a single animation clip as a parameter. This is the legacy animation system, kept for backward compatibility only. Any new animation clip created and set on this component will not work; it will simply generate a console log item stating The AnimationClip used by the Animation component must be marked as Legacy. So, in Unity 4.3 onwards, just avoid this.
Creating an animation controller is just as easy as creating any other asset. In the Project view, simply right-click on the view and select Create | Animator Controller. Opening the new controller will show you the blank animator controller in the Mecanim state manager window.
There is a lot of functionality in the Mecanim state engine, which is largely outside the scope of this article. For more depth, check out dedicated books on the subject, such as Unity 4 Character Animation with Mecanim, Jamie Dean, Packt Publishing.
If you have any existing clips, you can just drag them to the Mecanim controller's Edit window; alternatively, you can select them in the Project view, right-click on them, and select From selected clip under Create. However, we will cover more of this later in practice.
Once you have a controller, you can add it to any game object in your project by clicking on Add Component in the inspector or by navigating to Component | Miscellaneous | Animator and selecting it. Then, you can select your new controller as the Controller property of the Animator. Alternatively, you can just drag your new controller onto the game object you wish to add it to.
Clips in a controller are bound to the spritesheet texture of the object the controller is attached to. Changing or removing this texture will prevent the animation from being displayed correctly, although it will appear as if it is still running.
So, with a controller in place, let's add some animation to it.
Summary
In this article, we did a detailed analysis of the new 2D features added in Unity 4.3. We then gave an overview of the main Unity components.
Resources for Article:
Further resources on this subject:
Parallax scrolling [article]
What's Your Input? [article]
Unity 3-0 Enter the Third Dimension [article]

Stream Grouping

Packt
26 Aug 2014
7 min read
In this article, by Ankit Jain and Anand Nalya, the authors of the book Learning Storm, we will cover different types of stream groupings. (For more resources related to this topic, see here.)
When defining a topology, we create a graph of computation with a number of bolts processing streams. At a more granular level, each bolt executes as multiple tasks in the topology. A stream will be partitioned into a number of partitions and divided among the bolts' tasks. Thus, each task of a particular bolt will only get a subset of the tuples from the subscribed streams.
Stream grouping in Storm provides complete control over how this partitioning of tuples happens among the many tasks of a bolt subscribed to a stream. Grouping for a bolt can be defined on the instance of the backtype.storm.topology.InputDeclarer class returned when defining bolts using the backtype.storm.topology.TopologyBuilder.setBolt method.
Storm supports the following types of stream groupings:
Shuffle grouping
Fields grouping
All grouping
Global grouping
Direct grouping
Local or shuffle grouping
Custom grouping
Now, we will look at each of these groupings in detail.
Shuffle grouping
Shuffle grouping distributes tuples in a uniform, random way across the tasks, so an equal number of tuples will be processed by each task. This grouping is ideal when you want to distribute your processing load uniformly across the tasks and there is no requirement for any data-driven partitioning.
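As a minimal sketch of how a shuffle grouping is wired up, the grouping is simply chained onto the InputDeclarer returned by setBolt. The SentenceSpout and WordCountBolt classes here are hypothetical placeholders standing in for any spout and bolt, and the parallelism hint of four is just an example value:
TopologyBuilder builder = new TopologyBuilder();

// SentenceSpout and WordCountBolt are hypothetical placeholder classes
builder.setSpout("sentences", new SentenceSpout());

// setBolt returns an InputDeclarer, so the grouping can be chained directly;
// with a parallelism hint of 4, tuples are spread evenly across the 4 tasks
builder.setBolt("counter", new WordCountBolt(), 4)
       .shuffleGrouping("sentences");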
Fields grouping
Fields grouping enables you to partition a stream on the basis of some of the fields in the tuples. For example, if you want all the tweets from a particular user to go to a single task, you can partition the tweet stream using fields grouping on the username field in the following manner:
builder.setSpout("1", new TweetSpout());
builder.setBolt("2", new TweetCounter())
       .fieldsGrouping("1", new Fields("username"));
Fields grouping is calculated with the following function:
hash(fields) % (no. of tasks)
Here, hash is a hashing function. It does not guarantee that each task will get tuples to process. For example, if you have applied fields grouping on a field, say X, with only two possible values, A and B, and created two tasks for the bolt, then it is possible that both hash(A) % 2 and hash(B) % 2 are equal, which will result in all the tuples being routed to a single task while the other task stays completely idle.
Another common usage of fields grouping is to join streams. Since partitioning happens solely on the basis of field values and not the stream type, we can join two streams on any common join fields. The names of the fields do not need to be the same. For example, we can join the Order and ItemScanned streams when an order is completed:
builder.setSpout("1", new OrderSpout());
builder.setSpout("2", new ItemScannedSpout());
builder.setBolt("joiner", new OrderJoiner())
       .fieldsGrouping("1", new Fields("orderId"))
       .fieldsGrouping("2", new Fields("orderRefId"));
All grouping
All grouping is a special grouping that does not partition the tuples but replicates them to all the tasks; that is, each tuple will be sent to each of the bolt's tasks for processing.
One common use case of all grouping is for sending signals to bolts. For example, if you are doing some kind of filtering on the streams, then you have to pass the filter parameters to all the bolts. This can be achieved by sending those parameters over a stream that is subscribed to by all the bolts' tasks with all grouping. Another example is to send a reset message to all the tasks in an aggregation bolt. The following is an example of all grouping:
builder.setSpout("1", new TweetSpout());
builder.setSpout("signals", new SignalSpout());
builder.setBolt("2", new TweetCounter())
       .fieldsGrouping("1", new Fields("username"))
       .allGrouping("signals");
Here, we are subscribing to the signals stream for all the TweetCounter bolt's tasks. Now, we can send different signals to the TweetCounter bolt using SignalSpout.
Global grouping
Global grouping does not partition the stream but sends the complete stream to the bolt's task with the smallest ID. A general use case for this is when there needs to be a reduce phase in your topology, where you want to combine results from previous steps in the topology in a single bolt. Global grouping might seem redundant at first, as you can achieve the same result by defining the parallelism for the bolt as one and setting the number of input streams to one. However, when you have multiple streams of data coming through different paths, you might want only one of the streams to be reduced and the others to be processed in parallel.
For example, consider a topology in which you want to route all the tuples coming from Bolt C to a single Bolt D task, while you still want parallelism for tuples coming from Bolt E to Bolt D. This can be achieved with the following code snippet:
builder.setSpout("a", new SpoutA());
builder.setSpout("b", new SpoutB());
builder.setBolt("c", new BoltC());
builder.setBolt("e", new BoltE());
builder.setBolt("d", new BoltD())
       .globalGrouping("c")
       .shuffleGrouping("e");
Direct grouping
In direct grouping, the emitter decides where each tuple will go for processing. For example, say we have a log stream and we want to process each log entry using a specific bolt task on the basis of the type of resource. In this case, we can use direct grouping.
Direct grouping can only be used with direct streams. To declare a stream as a direct stream, use the backtype.storm.topology.OutputFieldsDeclarer.declareStream method, which takes a boolean direct parameter, in the following way in your spout:
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declareStream("directStream", true, new Fields("field1"));
}
Now, we need the number of tasks of the target component so that we can specify the taskId parameter while emitting the tuple. This can be done using the backtype.storm.task.TopologyContext.getComponentTasks method in the prepare method of the bolt. The following snippet stores the number of tasks in a bolt field:
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.numOfTasks = context.getComponentTasks("my-stream").size();
    this.collector = collector;
}
Once you have a direct stream to emit to, use the backtype.storm.task.OutputCollector.emitDirect method instead of the emit method. The emitDirect method takes a taskId parameter to specify the task. In the following snippet, we are emitting to one of the tasks randomly:
public void execute(Tuple input) {
    collector.emitDirect(new Random().nextInt(this.numOfTasks), process(input));
}
Local or shuffle grouping
If the tuple source and the target bolt tasks are running in the same worker, this grouping acts as a shuffle grouping only between the target tasks running on that worker, thus minimizing network hops and increasing performance. If there are no target bolt tasks running in the source worker process, this grouping acts like the shuffle grouping described earlier.
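Declaring this grouping looks just like shuffle grouping; only the method name changes. The following is a minimal sketch in the same style as the earlier snippets, again using the hypothetical SentenceSpout and WordCountBolt placeholders:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new SentenceSpout());   // hypothetical spout

// Prefer target tasks in the same worker process; fall back to a normal
// shuffle grouping when no local task of the bolt is available
builder.setBolt("counter", new WordCountBolt(), 4)    // hypothetical bolt
       .localOrShuffleGrouping("sentences");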
Custom grouping
If none of the preceding groupings fit your use case, you can define your own custom grouping by implementing the backtype.storm.grouping.CustomStreamGrouping interface. The following is a sample custom grouping that partitions a stream on the basis of the category in the tuples:
public class CategoryGrouping implements CustomStreamGrouping, Serializable {
    // Mapping of category to integer values for grouping
    private static final Map<String, Integer> categories = ImmutableMap.of(
        "Financial", 0,
        "Medical", 1,
        "FMCG", 2,
        "Electronics", 3
    );

    // Number of tasks, initialized in the prepare method
    private int tasks = 0;

    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // Initialize the number of tasks
        tasks = targetTasks.size();
    }

    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        // Return the task for a given category
        String category = (String) values.get(0);
        return ImmutableList.of(categories.get(category) % tasks);
    }
}
Now, we can use this grouping in our topologies with the following code snippet:
builder.setSpout("a", new SpoutA());
builder.setBolt("b", (IRichBolt) new BoltB())
       .customGrouping("a", new CategoryGrouping());
Summary
In this article, we discussed stream grouping in Storm and its types.
Resources for Article:
Further resources on this subject:
Integrating Storm and Hadoop [article]
Deploying Storm on Hadoop for Advertising Analysis [article]
Photo Stream with iCloud [article]