How-To Tutorials - Programming

1083 Articles

Web Scraping with Python

Packt
17 Feb 2010
5 min read
To perform this task, three basic steps are usually followed:

1. Explore the website to find out where the desired information is located in the HTML DOM tree
2. Download as many web pages as needed
3. Parse the downloaded web pages and extract the information from the places found in the exploration step

The exploration step is performed manually, with the aid of some tools that make it easier to locate the information and reduce the development time of the following steps. The download and parsing steps are usually performed in an iterative cycle, since they are interrelated: the next page to download may depend on a link found in the current page, so not every web page can be downloaded without first looking into an earlier one.

This article will show an example covering the three steps mentioned and how they can be carried out in Python. The code displayed is guaranteed to work at the time of writing; however, it should be taken into account that it may stop working in the future if the presentation format changes. The reason is that web scraping depends on the DOM tree being stable enough; that is to say, as happens with regular expressions, it will keep working for slight changes in the information being parsed, but when the presentation format is completely changed, the web scraping scripts have to be modified to match the new DOM tree.

Explore

Let's say you are a fan of the Packt Publishing article network and you want to keep a list of the titles of all the articles published so far, together with the links to them. First of all, you will need to connect to the main article network page (http://www.packtpub.com/article-network) and start exploring the web page to get an idea of where the information you want to extract is located.

There are many ways to perform this task, such as viewing the source code directly in your browser, or downloading it and inspecting it with your favorite editor. However, HTML pages often contain auto-generated code and are not as readable as they should be, so using a specialized tool can be quite helpful. In my opinion, the best one for this task is the Firebug add-on for the Firefox browser. With this add-on, instead of searching through the code for some string, all you have to do is press the Inspect button, move the pointer to the area you are interested in, and click. After that, the HTML code for the marked area and the location of the tag in the DOM tree are clearly displayed.

For example, the links to the different pages containing all the articles are located inside a "right" tag, and, in every page, the links to the articles are contained as list items in an unordered list. In addition, the article URLs, as you have probably noticed while reading other articles, start with http://www.packtpub.com/article/.

So, our scraping strategy will be:

1. Get the list of links to all pages containing articles
2. Follow all the links so as to extract the article information from every page

One small optimization here is that the main article network page is the same as the one pointed to by the first page link, so we will take this into account to avoid loading the same page twice when we develop the code.

Download

Before parsing any web page, its contents must be downloaded.
As usual, there are many ways to do this:

- Creating your own HTTP requests using the urllib2 standard Python library
- Using a more advanced library, such as mechanize, that provides the capability to navigate through a website simulating a browser

In this article mechanize will be covered, as it is the easiest choice. mechanize is a library that provides a Browser class that lets the developer interact with a website in a similar way to a real browser. In particular, it provides methods to open pages, follow links, change form data, and submit forms.

Recalling our scraping strategy, the first thing we would like to do is download the main article network page. To do that, we create a Browser instance and then open the page:

>>> import mechanize
>>> BASE_URL = "http://www.packtpub.com/article-network"
>>> br = mechanize.Browser()
>>> data = br.open(BASE_URL).get_data()
>>> links = scrape_links(BASE_URL, data)

The result of the open method is an HTTP response object, and its get_data method returns the contents of the web page. The scrape_links function will be explained later. For now, as pointed out in the introduction, bear in mind that the downloading and parsing steps are usually performed iteratively, since some of the contents to be downloaded depend on parsing that was done on previously downloaded contents, as in this case.
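The excerpt stops before the parsing code is shown; the article's own scrape_links implementation appears later in the original piece. Purely as an illustrative sketch of what such a helper could look like (the class and constant names below are hypothetical and not from the article), a minimal version can be built with the standard-library HTMLParser, collecting every link that starts with the article URL prefix noted in the exploration step:

    # scrape_links_sketch.py -- illustrative only; not the article's version
    from HTMLParser import HTMLParser   # Python 2 standard library
    from urlparse import urljoin

    ARTICLE_PREFIX = "http://www.packtpub.com/article/"

    class LinkCollector(HTMLParser):
        """Collect the href of every <a> tag that looks like an article link."""
        def __init__(self, base_url):
            HTMLParser.__init__(self)
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    url = urljoin(self.base_url, href)
                    if url.startswith(ARTICLE_PREFIX):
                        self.links.append(url)

    def scrape_links(base_url, data):
        parser = LinkCollector(base_url)
        parser.feed(data)
        return parser.links

A helper along these lines keeps the download step (mechanize) and the parsing step cleanly separated, which fits the iterative download-and-parse cycle described in the introduction.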


Configuration in Salesforce CRM

Packt
04 Nov 2011
13 min read
(For more resources on this topic, see here.) We will look at the mechanisms for storing data in Salesforce and at the concepts of objects and fields. The features that allow these data to be grouped and arranged within the application are then considered by looking at Apps, Tabs, Page Layouts, and Record Types. Finally, we take a look at some of the features that allow views of data to be presented and customized by looking in detail at related lists and list views. Relationship between profile and the features that it controls The following diagram describes the relationship that exists between the profile and the features that it controls: The profile is used to: Control access to the type of license specified for the user and any login hours or IP address restrictions that are set. Control access to objects and records using the role and sharing model. If the appropriate object-level permission is not set on the user's profile, then the user will be unable to gain access to the records of that object type in the application. In this article, we will look at the configurable elements that are set in conjunction with the profile. These are used to control the structure and the user interface for the Salesforce CRM application. Objects Objects are a key element in Salesforce CRM as they provide a structure for storing data and are incorporated in the interface, allowing users to interact with the data. Similar in nature to a database table, objects have properties such as: Fields which are similar in concept to a database column Records which are similar in concept to a database row Relationships to other objects Optional tabs which are user interface components to display the object data Standard objects Salesforce provides standard objects in the application when you sign up and these include Account, Contact, Opportunity, and so on. These are the tables that contain the data records in any standard tab such as Accounts, Contacts, or Opportunities. In addition to the standard objects, you can create custom objects and custom tabs. Custom objects Custom objects are the tables you create to store your data. You can create a custom object to store data specific to your organization. Once you have the custom objects and have created records for these objects, you can also create reports and dashboards based on the record data in your custom object. Fields Fields in Salesforce are similar in concept to a database column and store the data for the object records. An object record is analogous to a row in a database table. Standard fields Standard fields are predefined fields that are included as standard within the Salesforce CRM application. Standard fields cannot be deleted but non-required standard fields can be removed from page layouts whenever necessary. With standard fields, you can customize visual elements that are associated to the field such as field labels and field-level help as well certain data definitions such as picklist values, the formatting of auto-number fields (which are used as unique identifiers for the records), and setting of field history tracking. Some aspects, however, such as the field name cannot be customized and some standard fields (such as Opportunity Probability) do not allow the changing of the field label. Custom fields Custom fields are unique to your business needs and can not only be added and amended, but also deleted. Creating custom fields allow you to store the information that is necessary for your organization. 
Both standard and custom fields can be customized to include custom help text to help users understand how to use the field: Object relationships Object relationships can be set on both standard and custom objects and are used to define how records in one object relates to records in another object. Accounts, for example, can have a one-to-many relationship with opportunities and these relationships are presented in the application as related lists. Apps An app in Salesforce is a container for all the objects, tabs, processes, and services associated with a business function. There are standard and custom apps that are accessed using the App menu located at the top-right of the Salesforce page as shown in the following screenshot: When users select an app from the App menu, their screen changes to present the objects associated with that app. For example, when switching from an app that contains the Campaign tab to one that does not, the Campaign tab no longer appears. This feature is applied to both standard and custom apps. Standard apps Salesforce provides standard apps such as Sales, Call Center, and Marketing. Custom apps A custom app can optionally include a custom logo. Both standard and custom apps consist of a name, a description, and an ordered list of tabs. Tabs A tab is a user-interface element which, when clicked, displays the record data on a page specific to that object. Hiding and showing tabs To customize your personal tab settings follow the path Your Name Setup | My Personal Settings | Change My Display | Customize My Tabs|. Now, choose the tabs that will display in each of your apps by moving the tab name between the Available Tabs and the Selected Tabs sections and click Save. The following shows the section of tabs for the Sales app: To customize the tab settings of your users, follow the path Your Name Setup | Administration Setup | Manage Users | Profiles|. Now select a profile and click Edit. Scroll down to the tab settings section of the page as shown in the following screenshot: Standard tabs Salesforce provides tabs for each of the standard objects that are provided in the application when you sign up. For example, there are standard tabs for Accounts, Contacts, Opportunities, and so on: Visibility of the tab depends on the setting on the tab display setting for the app. Custom tabs You can create three different types of custom tabs: Custom Object Tabs, Web Tabs, and Visualforce Tabs. Custom Object Tabs allow you to create, read, update, and delete the data records in your custom objects. Web Tabs display any web URL in a tab within your Salesforce application. Visualforce Tabs display custom user-interface pages created using Visualforce. Creating custom tabs: The text displayed on the custom tab is set from the Plural label of the custom object which is entered when creating the custom object. If the tab text needs to be changed this can be done by changing the Plural label stored on the custom object. Salesforce.com recommends selecting the Append tab to users' existing personal customizations checkbox. This benefits your users as they will automatically be presented with the new tab and can immediately access the corresponding functionality without having to first customize their personal settings themselves. It is recommended that you do not show tabs by setting appropriate permissions so that the users in your organization cannot see any of your changes until you are ready to make them available. 
You can create up to 25 custom tabs in Enterprise Edition and as many as you require in Unlimited Edition. To create custom tabs for a custom object, follow the path Your Name Setup | App Setup | Create | Tabs|. Now select the appropriate tab type and/or object from the available selections as shown in the following screenshot: (Move the mouse over the image to enlarge.) Creating custom objects Custom objects are database tables that allow you to store data specific to your organization in Salesforce.com. You can use custom objects to extend Salesforce functionality or to build new application functionality. You can create up to 200 custom objects in Enterprise Edition and 2000 in Unlimited Edition. Once you have created a custom object, you can create a custom tab, custom-related lists, reports, and dashboards for users to interact with the custom object data. To create a custom object, follow the path Your Name Setup | App Setup | Create | Objects|. Now click New Custom Object, or click Edit to modify an existing custom object. The following screenshot shows the resulting screen: On the Custom Object Definition Edit page, you can enter the following: Label: This is the visible name that is displayed for the object within the Salesforce CRM user interface and shown on pages, views, and reports, for example. Plural Label: This is the plural name specified for the object which is used within the application in places such as reports and on tabs if you create a tab for the object. Gender (language dependent): This field appears if your organization-wide default language expects gender. This is used for organizations where the default language settings is for example, Spanish, French, Italian, German among many others. Your personal language preference setting does not affect whether the field appears or not. For example, if your organization's default language is English but your personal language is French, you will not be prompted for gender when creating a custom object. Starts with a vowel sound: Use of this setting depends on your organization's default language and is a linguistic check to allow you to specify whether your label is to be preceded by "an" instead of "a". For example, resulting in reference to the object as "an Order" instead of "a Order" as an example. Object Name: A unique name used to refer to the object. Here, the Object Name field must be unique and can only contain underscores and alphanumeric characters. It must also begin with a letter, not contain spaces, not contain two consecutive underscores, and not end with an underscore. Description: An optional description of the object. A meaningful description will help to explain the purpose for your custom objects when you are viewing them in a list. Context-Sensitive Help Setting: Defines what information is displayed when your users click the Help for this Page context-sensitive help link from the custom object record home (overview), edit, and detail pages, as well as list views and related lists. The Help & Training link at the top of any page is not affected by this setting. It always opens the Salesforce Help & Training window. Record Name: This is the name that is used in areas such page layouts, search results, key lists, and related lists as shown next. Data Type: The type of field for the record name. Here the data type can be either text or auto-number. If the data type is set to be text, then when a record is created, users must enter a text value which does not need to be unique. 
If the data type is set to be Auto Number, it becomes a read-only field whereby new records are automatically assigned a unique number: Display Format: As in the preceding example, this option only appears when the Data Type is set to Auto Number. It allows you to specify the structure and appearance of the Auto Number field. For example: {YYYY}{MM}-{000} is a display format that produces a 4-digit year, 2-digit month prefix to a number with leading zeros padded to 3 digits. Example data output would include: 201203-001; 201203-066; 201203-999; 201203-1234. It is worth noting that although you can specify the number to be 3 digits if the number of records created becomes over 999 the record will still be saved but the automatically incremented number becomes 1000, 1001, and so on. Starting Number: As described, Auto Number fields in Salesforce CRM are automatically incremented for each new record. Here you must enter the starting number for the incremental count (which does not have to be set to start from 1). Allow Reports: This setting is required if you want to include the record data from the custom object in any report or dashboard analytics. Such relationships can be either a lookup or a master-detail. Lookup relationships create a relationship between two records so you can associate them with each other. Master-detail relationship creates a relationship between records where the master record controls certain behaviors of the detail record such as record deletion and security. When the custom object has a master-detail relationship with a standard object or is a lookup object on a standard object, a new report type will appear in the standard report category. The new report type allows the user to create reports that relate the standard object to the custom object which is done by selecting the standard object for the report type category instead of the custom object. Allow Activities: Allows users to include tasks and events related to the custom object records which appear as a related list on the custom object page. Track Field History: Enables the tracking of data field changes on the custom object records, such as who changed the value of a field and when it was changed. Fields history tracking also stores the value of the field before and after the fields edit. This feature is useful for auditing and data quality measurement and is also available within the reporting tools. Deployment Status: Indicates whether the custom object is now visible and available for use by other users. This is useful as you can easily set the status to In Development until you are happy for users to start working with the new object. Add Notes & Attachments: This setting allows your users to record notes and attach files to the custom object records. When this is specified, a related list with New Note and Attach File buttons automatically appears on the custom object record page where your users can enter notes and attach documents. The Add Notes & Attachments option is only available when you create a new object. Launch the New Custom Tab Wizard: Starts the custom tab wizard after you save the custom object. The New Custom Tab Wizard option is only available when you create a new object. 
Creating custom object relationships

Considerations to be observed when creating object relationships:

- Create the object relationships as a first step, before starting to build the custom fields, page layouts, and any related lists.
- The Related To entry cannot be modified after you have saved the object relationship.
- Each custom object can have up to two master-detail relationships and up to 25 total relationships.
- When planning to create a master-detail relationship on an object, be aware that it can only be created before the object contains record data.
- Clicking Edit List Layout allows you to choose columns for the key views and lookups.
- The Standard Name field is required on all custom object related lists and also on any page layouts.


Welcome to the Spring Framework

Packt
30 Apr 2015
17 min read
In this article by Ravi Kant Soni, author of the book Learning Spring Application Development, you will be closely acquainted with the Spring Framework. Spring is an open source framework created by Rod Johnson to address the complexity of enterprise application development. Spring has long been the de facto standard for Java enterprise software development. The framework was designed with developer productivity in mind, and this makes it easier to work with the existing Java and JEE APIs. Using Spring, we can develop standalone applications, desktop applications, two-tier applications, web applications, distributed applications, enterprise applications, and so on.

(For more resources related to this topic, see here.)

Features of the Spring Framework

- Lightweight: Spring is described as a lightweight framework when it comes to size and transparency. Lightweight frameworks reduce complexity in application code and also avoid unnecessary complexity in their own functioning.
- Non-intrusive: Non-intrusive means that your domain logic code has no dependencies on the framework itself. Spring is designed to be non-intrusive.
- Container: Spring's container is a lightweight container, which contains and manages the life cycle and configuration of application objects.
- Inversion of Control (IoC): Inversion of Control is an architectural pattern. It describes Dependency Injection performed by external entities instead of the component creating its own dependencies.
- Aspect-oriented programming (AOP): Aspect-oriented programming refers to the programming paradigm that isolates supporting functions from the main program's business logic. It allows developers to build the core functionality of a system without making it aware of the secondary requirements of that system.
- JDBC exception handling: The JDBC abstraction layer of the Spring Framework offers an exception hierarchy that simplifies the error handling strategy.
- Spring MVC Framework: Spring comes with an MVC web application framework to build robust and maintainable web applications.
- Spring Security: Spring Security offers a declarative security mechanism for Spring-based applications, which is a critical aspect of many applications.

ApplicationContext

ApplicationContext is defined by the org.springframework.context.ApplicationContext interface. BeanFactory provides basic functionality, while ApplicationContext provides the advanced features that make our Spring applications enterprise-level applications. Create an ApplicationContext by using the ClassPathXmlApplicationContext framework API. This API loads the bean configuration file and takes care of creating and initializing all the beans mentioned in the configuration file:

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class MainApp {
  public static void main(String[] args) {
    ApplicationContext context =
        new ClassPathXmlApplicationContext("beans.xml");
    HelloWorld helloWorld =
        (HelloWorld) context.getBean("helloworld");
    helloWorld.getMessage();
  }
}

Autowiring modes

There are five modes of autowiring that can be used to instruct the Spring container to use autowiring for Dependency Injection. You use the autowire attribute of the <bean/> element to specify the autowire mode for a bean definition.
The following table explains the different modes of autowire: Mode Description no By default, the Spring bean autowiring is turned off, meaning no autowiring is to be performed. You should use the explicit bean reference called ref for wiring purposes. byName This autowires by the property name. If the bean property is the same as the other bean name, autowire it. The setter method is used for this type of autowiring to inject dependency. byType Data type is used for this type of autowiring. If the data type bean property is compatible with the data type of the other bean, autowire it. Only one bean should be configured for this type in the configuration file; otherwise, a fatal exception will be thrown. constructor This is similar to the byType autowire, but here a constructor is used to inject dependencies. autodetect Spring first tries to autowire by constructor; if this does not work, then it tries to autowire by byType. This option is deprecated. Stereotype annotation Generally, @Component, a parent stereotype annotation, can define all beans. The following table explains the different stereotype annotations: Annotation Use Description @Component Type This is a generic stereotype annotation for any Spring-managed component. @Service Type This stereotypes a component as a service and is used when defining a class that handles the business logic. @Controller Type This stereotypes a component as a Spring MVC controller. It is used when defining a controller class, which composes of a presentation layer and is available only on Spring MVC. @Repository Type This stereotypes a component as a repository and is used when defining a class that handles the data access logic and provide translations on the exception occurred at the persistence layer. Annotation-based container configuration For a Spring IoC container to recognize annotation, the following definition must be added to the configuration file: <?xml version="1.0" encoding="UTF-8"?> <beans xsi_schemaLocation="http://www.springframework.org/schema/beans    http://www.springframework.org/schema/beans/spring-beans.xsd    http://www.springframework.org/schema/context    http://www.springframework.org/schema/context/spring-context-    3.2.xsd">   <context:annotation-config />                             </beans> Aspect-oriented programming (AOP) supports in Spring AOP is used in Spring to provide declarative enterprise services, especially as a replacement for EJB declarative services. Application objects do what they're supposed to do—perform business logic—and nothing more. They are not responsible for (or even aware of) other system concerns, such as logging, security, auditing, locking, and event handling. AOP is a methodology of applying middleware services, such as security services, transaction management services, and so on on the Spring application. Declaring an aspect An aspect can be declared by annotating the POJO class with the @Aspect annotation. This aspect is required to import the org.aspectj.lang.annotation.aspect package. The following code snippet represents the aspect declaration in the @AspectJ form: import org.aspectj.lang.annotation.Aspect; import org.springframework.stereotype.Component;   @Aspect @Component ("myAspect") public class AspectModule { // ... } JDBC with the Spring Framework The DriverManagerDataSource class is used to configure the DataSource for application, which is defined in the Spring.xml configuration file. 
The central class of Spring JDBC's abstraction framework is the JdbcTemplate class that includes the most common logic in using the JDBC API to access data (such as handling the creation of connection, creation of statement, execution of statement, and release of resources). The JdbcTemplate class resides in the org.springframework.jdbc.core package. JdbcTemplate can be used to execute different types of SQL statements. DML is an abbreviation of data manipulation language and is used to retrieve, modify, insert, update, and delete data in a database. Examples of DML are SELECT, INSERT, or UPDATE statements. DDL is an abbreviation of data definition language and is used to create or modify the structure of database objects in a database. Examples of DDL are CREATE, ALTER, and DROP statements. The JDBC batch operation in Spring The JDBC batch operation allows you to submit multiple SQL DataSource to process at once. Submitting multiple SQL DataSource together instead of separately improves the performance: JDBC with batch processing Hibernate with the Spring Framework Data persistence is an ability of an object to save its state so that it can regain the same state. Hibernate is one of the ORM libraries that is available to the open source community. Hibernate is the main component available for a Java developer with features such as POJO-based approach and supports relationship definitions. The object query language used by Hibernate is called as Hibernate Query Language (HQL). HQL is an SQL-like textual query language working at a class level or a field level. Let's start learning the architecture of Hibernate. Hibernate annotations is the powerful way to provide the metadata for the object and relational table mapping. Hibernate provides an implementation of the Java Persistence API so that we can use JPA annotations with model beans. Hibernate will take care of configuring it to be used in CRUD operations. The following table explains JPA annotations: JPA annotation Description @Entity The javax.persistence.Entity annotation is used to mark a class as an entity bean that can be persisted by Hibernate, as Hibernate provides the JPA implementation. @Table The javax.persistence.Table annotation is used to define table mapping and unique constraints for various columns. The @Table annotation provides four attributes, which allows you to override the name of the table, its catalogue, and its schema. This annotation also allows you to enforce unique constraints on columns in the table. For now, we will just use the table name as Employee. @Id Each entity bean will have a primary key, which you annotate on the class with the @Id annotation. The javax.persistence.Id annotation is used to define the primary key for the table. By default, the @Id annotation will automatically determine the most appropriate primary key generation strategy to be used. @GeneratedValue javax.persistence.GeneratedValue is used to define the field that will be autogenerated. It takes two parameters, that is, strategy and generator. The GenerationType.IDENTITY strategy is used so that the generated id value is mapped to the bean and can be retrieved in the Java program. @Column javax.persistence.Column is used to map the field with the table column. We can also specify the length, nullable, and uniqueness for the bean properties. Object-relational mapping (ORM, O/RM, and O/R mapping) ORM stands for Object-relational Mapping. ORM is the process of persisting objects in a relational database such as RDBMS. 
ORM bridges the gap between object and relational schemas, allowing object-oriented application to persist objects directly without having the need to convert object to and from a relational format: Hibernate Query Language (HQL) Hibernate Query Language (HQL) is an object-oriented query language that works on persistence object and their properties instead of operating on tables and columns. To use HQL, we need to use a query object. Query interface is an object-oriented representation of HQL. The query interface provides many methods; let's take a look at a few of them: Method Description public int executeUpdate() This is used to execute the update or delete query public List list() This returns the result of the relation as a list public Query setFirstResult(int rowno) This specifies the row number from where a record will be retrieved public Query setMaxResult(int rowno) This specifies the number of records to be retrieved from the relation (table) public Query setParameter(int position, Object value) This sets the value to the JDBC style query parameter public Query setParameter(String name, Object value) This sets the value to a named query parameter The Spring Web MVC Framework Spring Framework supports web application development by providing comprehensive and intensive support. The Spring MVC framework is a robust, flexible, and well-designed framework used to develop web applications. It's designed in such a way that development of a web application is highly configurable to Model, View, and Controller. In an MVC design pattern, Model represents the data of a web application, View represents the UI, that is, user interface components, such as checkbox, textbox, and so on, that are used to display web pages, and Controller processes the user request. Spring MVC framework supports the integration of other frameworks, such as Struts and WebWork, in a Spring application. This framework also helps in integrating other view technologies, such as Java Server Pages (JSP), velocity, tiles, and FreeMarker in a Spring application. The Spring MVC Framework is designed around a DispatcherServlet. The DispatcherServlet dispatches the http request to handler, which is a very simple controller interface. The Spring MVC Framework provides a set of the following web support features: Powerful configuration of framework and application classes: The Spring MVC Framework provides a powerful and straightforward configuration of framework and application classes (such as JavaBeans). Easier testing: Most of the Spring classes are designed as JavaBeans, which enable you to inject the test data using the setter method of these JavaBeans classes. The Spring MVC framework also provides classes to handle the Hyper Text Transfer Protocol (HTTP) requests (HttpServletRequest), which makes the unit testing of the web application much simpler. Separation of roles: Each component of a Spring MVC Framework performs a different role during request handling. A request is handled by components (such as controller, validator, model object, view resolver, and the HandlerMapping interface). The whole task is dependent on these components and provides a clear separation of roles. No need of the duplication of code: In the Spring MVC Framework, we can use the existing business code in any component of the Spring MVC application. Therefore, no duplicity of code arises in a Spring MVC application. Specific validation and binding: Validation errors are displayed when any mismatched data is entered in a form. 
DispatcherServlet in Spring MVC The DispatcherServlet of the Spring MVC Framework is an implementation of front controller and is a Java Servlet component for Spring MVC applications. DispatcherServlet is a front controller class that receives all incoming HTTP client request for the Spring MVC application. DispatcherServlet is also responsible for initializing the framework components that will be used to process the request at various stages. The following code snippet declares the DispatcherServlet in the web.xml deployment descriptor: <servlet> <servlet-name>SpringDispatcher</servlet-name> <servlet-class>    org.springframework.web.DispatcherServlet </servlet-class> <load-on-startup>1</load-on-startup> </servlet>   <servlet-mapping> <servlet-name>SpringDispatcher</servlet-name> <url-pattern>/</url-pattern> </servlet-mapping> In the preceding code snippet, the user-defined name of the DispatcherServlet class is SpringDispatcher, which is enclosed with the <servlet-name> element. When our newly created SpringDispatcher class is loaded in a web application, it loads an application context from an XML file. DispatcherServlet will try to load the application context from a file named SpringDispatcher-servlet.xml, which will be located in the application's WEB-INF directory: <beans xsi_schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context- 3.0.xsd http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-3.0.xsd">   <mvc:annotation-driven />   <context:component-scan base- package="org.packt.Spring.chapter7.springmvc" />   <beanclass="org.springframework.web.servlet.view. InternalResourceViewResolver">    <property name="prefix" value="/WEB-INF/views/" />    <property name="suffix" value=".jsp" /> </bean>   </beans> Spring Security The Spring Security framework is the de facto standard to secure Spring-based applications. The Spring Security framework provides security services for enterprise Java software applications by handling authentication and authorization. The Spring Security framework handles authentication and authorization at the web request and the method invocation level. The two major operations provided by Spring Security are as follows: Authentication: Authentication is the process of assuring that a user is the one who he/she claims to be. It's a combination of identification and verification. The identification process can be performed in a number of different ways, that is, username and password that can be stored in a database, LDAP, or CAS (single sign-out protocol), and so on. Spring Security provides a password encoder interface to make sure that the user's password is hashed. Authorization: Authorization provides access control to an authenticated user. It's the process of assurance that the authenticated user is allowed to access only those resources that he/she is authorized for use. Let's take a look at an example of the HR payroll application, where some parts of the application have access to HR and to some other parts, all the employees have access. The access rights given to user of the system will determine the access rules. In a web-based application, this is often done by URL-based security and is implemented using filters that play an primary role in securing the Spring web application. 
Sometimes, URL-based security is not enough in web application because URLs can be manipulated and can have relative pass. So, Spring Security also provides method level security. An authorized user will only able to invoke those methods that he is granted access for. Securing web application's URL access HttpServletRequest is the starting point of Java's web application. To configure web security, it's required to set up a filter that provides various security features. In order to enable Spring Security, add filter and their mapping in the web.xml file: <!—Spring Security --> <filter> <filter-name>springSecurityFilterChain</filter-name> <filter-class>org.springframework.web.filter. DelegatingFilterProxy</filter-class> </filter>   <filter-mapping> <filter-name>springSecurityFilterChain</filter-name> <url-pattern>/*</url-pattern> </filter-mapping> Logging in to a web application There are multiple ways supported by Spring security for users to log in to a web application: HTTP basic authentication: This is supported by Spring Security by processing the basic credentials presented in the header of the HTTP request. It's generally used with stateless clients, who on each request pass their credential. Form-based login service: Spring Security supports the form-based login service by providing a default login form page for users to log in to the web application. Logout service: Spring Security supports logout services that allow users to log out of this application. Anonymous login: This service is provided by Spring Security that grants authority to an anonymous user, such as a normal user. Remember-me support: This is also supported by Spring Security and remembers the identity of a user across multiple browser sessions. Encrypting passwords Spring Security supports some hashing algorithms such as MD5 (Md5PasswordEncoder), SHA (ShaPasswordEncoder), and BCrypt (BCryptPasswordEncoder) for password encryption. To enable the password encoder, use the <password-encoder/> element and set the hash attribute, as shown in the following code snippet: <authentication-manager> <authentication-provider>    <password-encoder hash="md5" />    <jdbc-user-service data-source-    ref="dataSource"    . . .   </authentication-provider> </authentication-manager> Mail support in the Spring Framework The Spring Framework provides a simplified API and plug-in for full e-mail support, which minimizes the effect of the underlying e-mailing system specifications. The Sprig e-mail supports provide an abstract, easy, and implementation independent API to send e-mails. The Spring Framework provides an API to simplify the use of the JavaMail API. The classes handle the initialization, cleanup operations, and exceptions. The packages for the JavaMail API provided by the Spring Framework are listed as follows: Package Description org.springframework.mail This defines the basic set of classes and interfaces to send e-mails. org.springframework.mail.java This defines JavaMail API-specific classes and interfaces to send e-mails. Spring's Java Messaging Service (JMS) Java Message Service is a Java Message-oriented middleware (MOM) API responsible for sending messages between two or more clients. JMS is a part of the Java enterprise edition. JMS is a broker similar to a postman who acts like a middleware between the message sender and the receiver. Message is nothing, but just bytes of data or information exchanged between two parties. By taking different specifications, a message can be described in various ways. 
However, it's nothing, but an entity of communication. A message can be used to transfer a piece of information from one application to another, which may or may not run on the same platform. The JMS application Let's look at the sample JMS application pictorial, as shown in the following diagram: We have a Sender and a Receiver. The Sender is responsible for sending a message and the Receiver is responsible for receiving a message. We need a broker or MOM between the Sender and Receiver, who takes the sender's message and passes it from the network to the receiver. Message oriented middleware (MOM) is basically an MQ application such as ActiveMQ or IBM-MQ, which are two different message providers. The sender promises loose coupling and it can be .NET or mainframe-based application. The receiver can be Java or Spring-based application and it sends back the message to the sender as well. This is a two-way communication, which is loosely coupled. Summary This article covered the architecture of Spring Framework and how to set up the key components of the Spring application development environment. Resources for Article: Further resources on this subject: Creating an Extension in Yii 2 [article] Serving and processing forms [article] Time Travelling with Spring [article]


Containerizing a Web Application with Docker Part 1

Darwin Corn
10 Jun 2016
4 min read
Congratulations, you’ve written a web application! Now what?

Part one of this post deals with steps to take after development, more specifically the creation of a Docker image that contains the application. In part two, I’ll lay out deploying that image to the Google Cloud Platform, as well as some further reading that'll help you descend into the rabbit hole that is DevOps.

For demonstration purposes, let’s say that you’re me and you want to share your adventures in TrapRap and Death Metal (not simultaneously, thankfully!) with the world. I’ve written a simple Ember frontend for this purpose, and through the course of this post I will explain how I go about containerizing it. Of course, the beauty of this procedure is that it will work with any frontend application, and you are certainly welcome to Bring Your Own Code. Everything I use is publicly available on GitHub, however, and you’re certainly welcome to work through this post with the material presented as well.

So, I’ve got this web app. You can get it here, or you can run the following wherever you keep your source code:

$ git clone https://github.com/ndarwincorn/docker-demo.git

You’ll need ember-cli and some familiarity with Ember to customize it yourself, or you can just cut to the chase and build the Docker image, which is what I’m going to do in this post. I’m using Docker 1.10, but there’s no reason this wouldn’t work on a Mac running Docker Toolbox (or even Boot2Docker, but don’t quote me on that) or a less bleeding-edge Linux distro. Since installing Docker is well documented, I won’t get into that here; I’ll continue with the assumption that you have a working, up-to-date Docker installed on your machine and that the Docker daemon is running.

If you’re working with your own app, feel free to skip below to my explanation of the process and then come back here once you’ve got a Dockerfile in the root of your application. In the root of the application, run the following (make sure you don’t have any locally-installed web servers already listening on port 80):

# docker build -t docker-demo .
# docker run -d -p 80:80 --name demo docker-demo

Once the command finishes by printing a container ID, launch a web browser and navigate to http://localhost. Hey! Now you can listen to my music served from a LXC container running on your very own computer.

How did we accomplish this? Let’s take it piece-by-piece (here’s where to start reading again if you’ve approached this article with your own app).

I created a simple Dockerfile using the official Nginx image, because I have a deep-seated mistrust of Canonical and don’t want to use the Dockerfile here. Here’s what it looks like in my project:

docker-demo/Dockerfile

FROM nginx
COPY dist /usr/share/nginx/html

Running the docker build command reads the Dockerfile and uses it to configure a Docker image based on the nginx image. During image configuration, it copies the contents of the dist folder in my project into /usr/share/nginx/html in the container, which is the directory the image's nginx configuration serves. The -t flag tells Docker to ‘tag’ (name) the image we’ve just created as ‘docker-demo’.

The docker run command takes that image and builds a container from it. The -d flag is short for ‘detach’: it runs the /usr/bin/nginx command built into the image from our Dockerfile and leaves the container running. The -p flag maps a port on the host to a port in the container, and --name names the container for later reference.
The command should return a container ID that can be used to manipulate the container later.

In part two, I’ll show you how to push the image we created to the Google Cloud Platform and then launch it as a container in a specially-purposed VM on their Compute Engine.

About the Author

Darwin Corn is a Systems Analyst for the Consumer Direct Care Network. He is a mid-level professional with diverse experience in the Information Technology world.
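As a side note (not part of the original post), the same two commands can also be driven from Python with the Docker SDK (the docker package), which can be handy once this build-and-run step becomes part of a larger deployment pipeline. This is only a rough sketch; it assumes the docker package is installed and the Docker daemon is running locally:

    # build_and_run.py -- illustrative sketch using the Docker SDK for Python
    import docker

    client = docker.from_env()

    # Equivalent of: docker build -t docker-demo .
    image, build_logs = client.images.build(path=".", tag="docker-demo")

    # Equivalent of: docker run -d -p 80:80 --name demo docker-demo
    container = client.containers.run(
        "docker-demo",
        detach=True,
        ports={"80/tcp": 80},   # host port 80 -> container port 80
        name="demo",
    )

    print(container.id)  # the container ID, as printed by the CLI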


Overview of Process Management in Microsoft Visio 2013

Packt
20 Nov 2013
6 min read
(For more resources related to this topic, see here.) When Visio was first conceived of over 20 years ago, its first stated marketing aim was to outsell ABC Flowcharter, the best-selling process diagramming tool at the time. Therefore, Visio had to have all of the features from the start that are core in the creation of flowcharts, namely the ability to connect one shape to another and to have the lines route themselves around shapes. Visio soon achieved its aim, and looked for other targets to reach. So, process flow diagrams have long been a cornerstone of Visio's popularity and appeal and, although there have been some usability improvements over the years, there have been few enhancements to turn the diagrams into models that can be managed efficiently. Microsoft Visio 2010 saw the introduction of two features, structured diagrams and validation rules, that make process management achievable and customizable, and Microsoft Visio 2013 sees these features enhanced. In this article, you will be introduced to the new features that have been added to Microsoft Visio to support structured diagrams and validation. You will see where Visio fits in the Process Management stack, and explore the relevant out of the box content. Exploring the new process management features in Visio 2013 Firstly, Microsoft Visio 2010 introduced a new Validation API for structured diagrams and provided several examples of this in use, for example with the BPMN (Business Process Modeling Notation) Diagram and Microsoft SharePoint Workflow templates and the improvements to the Basic Flowchart and Cross-Functional Flowchart templates, all of which are found in the Flowchart category. Microsoft Visio 2013 has updated the version of BPMN from 1.1 to 2.0, and has introduced a new SharePoint 2013 Workflow template, in addition to the 2010 one. Templates in Visio consist of a predefined Visio document that has one or more pages, and may have a series of docked stencils (usually positioned on the left-hand side of workspace area). The template document may have an associated list of add-ons that are active while it is in use, and, with Visio 2013 Professional edition, an associated list of structured diagram validation rulesets as well. Most of the templates that contain validation rules in Visio 2013 are in the Flowchart category, as seen in the following screenshot, with the exception being the Six Sigma template in the Business category. Secondly, the concept of a Subprocess was introduced in Visio 2010. This enables processes to hyperlink to other pages describing the subprocesses in the same document, or even across documents. This latter point is necessary if subprocesses are stored in a document library, such as Microsoft SharePoint. The following screenshot illustrates how an existing subprocess can be associated with a shape in a larger process, selecting an existing shape in the diagram, before selecting the existing page that it links to from the drop-down menu on the Link to Existing button. In addition, a subprocess page can be created from an existing shape, or a selection of shapes, in which case they will be moved to the newly-created page. There were also a number of ease-of-use features introduced in Microsoft Visio 2010 to assist in the creation and revision of process flow diagrams. 
These include:

- Easy auto-connection of shapes
- Aligning and spacing of shapes
- Insertion and deletion of connected shapes
- Improved cross-functional flowcharts
- Subprocesses
- An infinite page option, so you need not go over the edge of the paper ever again

Microsoft Visio 2013 has added two more notable features:

- Commenting (a replacement for the old reviewer's comments)
- Co-authoring

However, this book is not about teaching the user how to use these features, since there will be many other authors willing to show you how to perform tasks that only need to be explained once. This book is about understanding the Validation API in particular, so that you can create, or amend, the rules to match the business logic that your business requires.

Reviewing Visio Process Management capabilities

Microsoft Visio now sits at the top of the Microsoft Process Management Product Stack, providing a Business Process Analysis (BPA) or Business Process Modeling (BPM) tool for business analysts, process owners/participants, and line of business software architects/developers.

Understanding the Visio BPM Maturity Model

If we look at the Visio BPM Maturity Model that Microsoft has previously presented to its partners, then we can see that Visio 2013 has filled some of the gaps that were still there after Visio 2010. However, we can also see that there are plenty of opportunities for partners to provide solutions on top of the Visio platform.

The maturity model shows how Visio initially provided the means to capture paper-drawn business processes in electronic format, and included the ability to encapsulate data into each shape and infer the relationship and order between elements through connectors. Visio 2007 Professional added the ability to easily link shapes, which represent processes, tasks, decisions, gateways, and so on, with a data source. Along with that, data graphics were provided to enable shape data to be displayed simply as icons, data bars, and text, or to be colored by value. This enriched the user experience and provided a quicker visual representation of data, thus increasing the comprehension of the data in the diagrams. Generic templates for specific types of business modeling were provided.

Visio has had a built-in report writer for many versions, which provided the ability to export to Excel or XML, but Visio 2010 Premium introduced the concept of validation and structured diagrams, which meant that the information could be verified before exporting. Some templates for specific types of business modeling were provided. Visio 2010 Premium also saw the introduction of Visio Services on SharePoint, which provided the automatic (without involving the Visio client) refreshing of data graphics that were linked to specific types of data sources.

Throughout this book we will be going into detail about Level 5 (Validation) in Visio 2013, because it is important to understand the core capabilities provided in Visio 2013. We will then be able to take the opportunity to provide custom Business Rule Modeling and Visualization.
The Microsoft Visio team analyzed the requirements for adding structure to diagrams and came up with a number of features that needed to be added to the Visio product to achieve this:

- Container Management: The ability to add labeled boxes around shapes to visually organize them
- Callout Management: The ability to associate callouts with shapes to display notes
- List Management: To provide order to shapes within a container
- Validation API: The ability to test the business logic of a diagram
- Connectivity API: The ability to create, remove, or traverse connections easily

The following diagram demonstrates the use of Containers and Callouts in the construction of a basic flowchart that has been validated using the Validation API, which in turn uses the Connectivity API.


Working with Geo-Spatial Data in Python

Packt
30 Dec 2010
7 min read
Python Geospatial Development

If you want to follow through the examples in this article, make sure you have the following Python libraries installed on your computer:

- GDAL/OGR version 1.7 or later (http://gdal.org)
- pyproj version 1.8.6 or later (http://code.google.com/p/pyproj)
- Shapely version 1.2 or later (http://trac.gispython.org/lab/wiki/Shapely)

Reading and writing geo-spatial data

In this section, we will look at some examples of tasks you might want to perform that involve reading and writing geo-spatial data in both vector and raster format.

Task: Calculate the bounding box for each country in the world

In this slightly contrived example, we will make use of a Shapefile to calculate the minimum and maximum latitude/longitude values for each country in the world. This "bounding box" can be used, among other things, to generate a map of a particular country.

Start by downloading the World Borders Dataset from:

http://thematicmapping.org/downloads/world_borders.php

Decompress the .zip archive and place the various files that make up the Shapefile (the .dbf, .prj, .shp, and .shx files) together in a suitable directory.

We next need to create a Python program that can read the borders of each country. Fortunately, using OGR to read through the contents of a Shapefile is trivial:

import osgeo.ogr

shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")
layer = shapefile.GetLayer(0)

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)

The feature consists of a geometry and a set of fields. For this data, the geometry is a polygon that defines the outline of the country, while the fields contain various pieces of information about the country. According to the Readme.txt file, the fields in this Shapefile include the ISO-3166 three-letter code for the country (in a field named ISO3) as well as the name of the country (in a field named NAME). This allows us to obtain the country code and name like this:

countryCode = feature.GetField("ISO3")
countryName = feature.GetField("NAME")

We can also obtain the country's border polygon using:

geometry = feature.GetGeometryRef()

There are all sorts of things we can do with this geometry, but in this case we want to obtain the bounding box or envelope for the polygon:

minLong,maxLong,minLat,maxLat = geometry.GetEnvelope()

Let's put all this together into a complete working program:

# calcBoundingBoxes.py

import osgeo.ogr

shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")
layer = shapefile.GetLayer(0)

countries = []  # List of (name, code, minLat, maxLat, minLong, maxLong) tuples.

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    countryCode = feature.GetField("ISO3")
    countryName = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    minLong,maxLong,minLat,maxLat = geometry.GetEnvelope()
    countries.append((countryName, countryCode,
                      minLat, maxLat, minLong, maxLong))

countries.sort()

for name,code,minLat,maxLat,minLong,maxLong in countries:
    print "%s (%s) lat=%0.4f..%0.4f, long=%0.4f..%0.4f" % (name, code,
                                                           minLat, maxLat,
                                                           minLong, maxLong)

Running this program produces the following output:

% python calcBoundingBoxes.py
Afghanistan (AFG) lat=29.4061..38.4721, long=60.5042..74.9157
Albania (ALB) lat=39.6447..42.6619, long=19.2825..21.0542
Algeria (DZA) lat=18.9764..37.0914, long=-8.6672..11.9865
...
Task: Save the country bounding boxes into a Shapefile While the previous example simply printed out the latitude and longitude values, it might be more useful to draw the bounding boxes onto a map. To do this, we have to convert the bounding boxes into polygons, and save these polygons into a Shapefile. Creating a Shapefile involves the following steps: Define the spatial reference used by the Shapefile's data. In this case, we'll use the WGS84 datum and unprojected geographic coordinates (that is, latitude and longitude values). This is how you would define this spatial reference using OGR: import osgeo.osr spatialReference = osgeo.osr.SpatialReference() spatialReference.SetWellKnownGeogCS('WGS84') We can now create the Shapefile itself using this spatial reference: import osgeo.ogr driver = osgeo.ogr.GetDriverByName("ESRI Shapefile") dstFile = driver.CreateDataSource("boundingBoxes.shp") dstLayer = dstFile.CreateLayer("layer", spatialReference) After creating the Shapefile, you next define the various fields that will hold the metadata for each feature. In this case, let's add two fields to store the country name and its ISO-3166 code: fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString) fieldDef.SetWidth(50) dstLayer.CreateField(fieldDef) fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString) fieldDef.SetWidth(3) dstLayer.CreateField(fieldDef) We now need to create the geometry for each feature—in this case, a polygon defining the country's bounding box. A polygon consists of one or more linear rings; the first linear ring defines the exterior of the polygon, while additional rings define "holes" inside the polygon. In this case, we want a simple polygon with a square exterior and no holes: linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing) linearRing.AddPoint(minLong, minLat) linearRing.AddPoint(maxLong, minLat) linearRing.AddPoint(maxLong, maxLat) linearRing.AddPoint(minLong, maxLat) linearRing.AddPoint(minLong, minLat) polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon) polygon.AddGeometry(linearRing) You may have noticed that the coordinate (minLong, minLat) was added to the linear ring twice. This is because we are defining line segments rather than just points—the first call to AddPoint() defines the starting point, and each subsequent call to AddPoint() adds a new line segment to the linear ring. In this case, we start in the lower-left corner and move counter-clockwise around the bounding box until we reach the lower-left corner again. Once we have the polygon, we can use it to create a feature: feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn()) feature.SetGeometry(polygon) feature.SetField("COUNTRY", countryName) feature.SetField("CODE", countryCode) dstLayer.CreateFeature(feature) feature.Destroy() Notice how we use the SetField() method to store the feature's metadata. We also have to call the Destroy() method to close the feature once we have finished with it; this ensures that the feature is saved into the Shapefile. Finally, we call the Destroy() method to close the output Shapefile: dstFile.Destroy() Putting all this together, and combining it with the code from the previous recipe to calculate the bounding boxes for each country in the World Borders Dataset Shapefile, we end up with the following complete program: # boundingBoxesToShapefile.py import os, os.path, shutil import osgeo.ogr import osgeo.osr # Open the source shapefile. srcFile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp") srcLayer = srcFile.GetLayer(0) # Open the output shapefile.
if os.path.exists("bounding-boxes"): shutil.rmtree("bounding-boxes") os.mkdir("bounding-boxes") spatialReference = osgeo.osr.SpatialReference() spatialReference.SetWellKnownGeogCS('WGS84') driver = osgeo.ogr.GetDriverByName("ESRI Shapefile") dstPath = os.path.join("bounding-boxes", "boundingBoxes.shp") dstFile = driver.CreateDataSource(dstPath) dstLayer = dstFile.CreateLayer("layer", spatialReference) fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString) fieldDef.SetWidth(50) dstLayer.CreateField(fieldDef) fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString) fieldDef.SetWidth(3) dstLayer.CreateField(fieldDef) # Read the country features from the source shapefile. for i in range(srcLayer.GetFeatureCount()): feature = srcLayer.GetFeature(i) countryCode = feature.GetField("ISO3") countryName = feature.GetField("NAME") geometry = feature.GetGeometryRef() minLong,maxLong,minLat,maxLat = geometry.GetEnvelope() # Save the bounding box as a feature in the output # shapefile. linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing) linearRing.AddPoint(minLong, minLat) linearRing.AddPoint(maxLong, minLat) linearRing.AddPoint(maxLong, maxLat) linearRing.AddPoint(minLong, maxLat) linearRing.AddPoint(minLong, minLat) polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon) polygon.AddGeometry(linearRing) feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn()) feature.SetGeometry(polygon) feature.SetField("COUNTRY", countryName) feature.SetField("CODE", countryCode) dstLayer.CreateFeature(feature) feature.Destroy() # All done. srcFile.Destroy() dstFile.Destroy() The only unexpected twist in this program is the use of a sub-directory called bounding-boxes to store the output Shapefile. Because a Shapefile is actually made up of multiple files on disk (a .dbf file, a .prj file, a .shp file, and a .shx file), it is easier to place these together in a sub-directory. We use the Python Standard Library module shutil to delete the previous contents of this directory, and then os.mkdir() to create it again. If you aren't storing the TM_WORLD_BORDERS-0.3.shp Shapefile in the same directory as the script itself, you will need to add the directory where the Shapefile is stored to your osgeo.ogr.Open() call. You can also store the boundingBoxes.shp Shapefile in a different directory if you prefer, by changing the path where this Shapefile is created. Running this program creates the bounding box Shapefile, which we can then draw onto a map. For example, here is the outline of Thailand along with a bounding box taken from the boundingBoxes.shp Shapefile:  
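Before pulling the new Shapefile into a mapping program, it can be worth re-opening it to confirm that the features and attribute fields were written as expected. Here is a small, optional verification script that follows the same reading pattern used earlier in this article; it only assumes that boundingBoxes.shp was created inside the bounding-boxes sub-directory, as in the program above. # checkBoundingBoxes.py -- list the features saved by the previous program.
import os.path
import osgeo.ogr

shapefile = osgeo.ogr.Open(os.path.join("bounding-boxes", "boundingBoxes.shp"))
layer = shapefile.GetLayer(0)

print("%d bounding boxes saved" % layer.GetFeatureCount())

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    minLong,maxLong,minLat,maxLat = feature.GetGeometryRef().GetEnvelope()
    print("%s (%s) lat=%0.4f..%0.4f, long=%0.4f..%0.4f" % (
          feature.GetField("COUNTRY"), feature.GetField("CODE"),
          minLat, maxLat, minLong, maxLong))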
How is Python code organized

Packt
19 Feb 2016
8 min read
Python is an easy to learn yet a powerful programming language. It has efficient high-level data structures and effective approach to object-oriented programming. Let's talk a little bit about how Python code is organized. In this paragraph, we'll start going down the rabbit hole a little bit more and introduce a bit more technical names and concepts. Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module. If you're on Windows or Mac, which typically hide file extensions to the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion. It would be impractical to save all the code that it is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are quite shorter than that). A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3 __init__.py is not strictly required any more). As always, an example will make all of this much clearer. I have created an example structure in my project, and when I type in my Linux console: $ tree -v example Here's how a structure of a real simple application could look like: example/ ├── core.py ├── run.py └── util ├── __init__.py ├── db.py ├── math.py └── network.py You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks. As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder. Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module only example under the ch1/files_only folder, see it for yourself: $ tree -v files_only This shows us a completely different picture: files_only/ ├── core.py ├── db.py ├── math.py ├── network.py └── run.py It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules. 
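To make this layout a little more concrete, here is a minimal sketch of what run.py might contain. Only the module layout matches the example above; the functions being called (connect(), fetch(), average(), and start()) are invented purely for illustration and do not exist in the example project. # run.py -- a sketch of how the example layout could be wired together.
# The helper functions called here are hypothetical; only the imports
# reflect the package/module structure shown above.
from util import db, network
from util import math as util_math  # renamed so it doesn't shadow the standard library module

import core

def main():
    connection = db.connect("example.sqlite")            # hypothetical helper in util/db.py
    payload = network.fetch("http://example.com/data")   # hypothetical helper in util/network.py
    score = util_math.average([1, 2, 3, 4])              # hypothetical helper in util/math.py
    core.start(connection, payload, score)               # hypothetical entry point in core.py

if __name__ == '__main__':
    main()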
How do we use modules and packages When a developer is writing an application, it is very likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not. Regardless of how the logic for this kind of validation is written, it's very likely that it will be needed in more than one place. For example in a poll application, where the user is asked many question, it's likely that several of them will require a numeric answer. For example: What is your age How many pets do you own How many children do you have How many times have you been married It would be very bad practice to copy paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the DRY (Don't Repeat Yourself) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (got the irony?). There are several reasons why repeating the same piece of logic can be very bad, the most important ones being: There could be a bug in the logic, and therefore, you would have to correct it in every place that logic is applied. You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied. You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application. Your code would be longer than needed, for no good reason. Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called function. I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code which is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. Functions are the building blocks of modularity in your application, and they are almost indispensable (unless you're writing a super simple script, you'll use functions all the time). Python comes with a very extensive library, as I already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language. For example, within Python's math library we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number. In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as: 5! = 5 * 4 * 3 * 2 * 1 = 120 The factorial of 0 is 0! = 1, to respect the convention for an empty product. 
So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling is not very clear for now, please just concentrate on the import part. We use a library by importing what we need from it, and then we use it. In Python, to calculate the factorial of number 5, we just need the following code: >>> from math import factorial >>> factorial(5) 120 Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120). So, let's go back to our example, the one with core.py, run.py, util, and so on. In our example, the package util is our utility library. Our custom utility belt that holds all those reusable tools (that is, functions), which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and therefore, we had to code them for ourselves. Summary In this article, we started to explore the world of programming and that of Python. We saw how Python code can be organized using modules and packages. For more information on Python, refer the following books recomended by Packt Publishing: Learning Python (https://www.packtpub.com/application-development/learning-python) Python 3 Object-oriented Programming - Second Edition (https://www.packtpub.com/application-development/python-3-object-oriented-programming-second-edition) Python Essentials (https://www.packtpub.com/application-development/python-essentials) Resources for Article: Further resources on this subject: Test all the things with Python [article] Putting the Fun in Functional Python [article] Scraping the Web with Python - Quick Start [article]
Querying and Selecting Data

Packt
17 Apr 2013
13 min read
(For more resources related to this topic, see here.) Constructing proper attribute query syntax The construction of property attribute queries is critical to your success in creating geoprocessing scripts that query data from feature classes and tables. All attribute queries that you execute against feature classes and tables will need to have the correct SQL syntax and also follow various rules depending upon the datatype that you execute the queries against. Getting ready Creating the syntax for attribute queries is one of the most difficult and time-consuming tasks that you'll need to master when creating Python scripts that incorporate the use of the Select by Attributes tool. These queries are basically SQL statements along with a few idiosyncrasies that you'll need to master. If you already have a good understanding of creating queries in ArcMap or perhaps an experience with creating SQL statements in other programming languages, then this will be a little easier for you. In addition to creating valid SQL statements, you also need to be aware of some specific Python syntax requirements and some datatype differences that will result in a slightly altered formatting of your statements for some datatypes. In this recipe, you'll learn how to construct valid query syntax and understand the nuances of how different datatypes alter the syntax as well as some Python-specific constructs. How to do it… Initially, we're going to take a look at how queries are constructed in ArcMap, so that you can get a feel of how they are structured. In ArcMap, open C:ArcpyBookCh8Crime_Ch8.mxd. Right-click on the Burglaries in 2009 layer and select Open Attribute Table. You should see an attribute table similar to the following screenshot. We're going to be querying the SVCAREA field: With the attribute table open, select the Table Options button and then Select by Attributes to display a dialog box that will allow you to construct an attribute query. Notice the Select * FROM Burglary WHERE: statement on the query dialog box (shown in the following screenshot). This is a basic SQL statement that will return all the columns from the attribute table for Burglary that meet the condition that we define through the query builder. The asterisk (*) simply indicates that all fields will be returned: Make sure that Create a new selection is the selected item in the Method dropdown list. This will create a new selection set. Double-click on SVCAREA from the list of fields to add the field to the SQL statement builder, as follows: Click on the = button. Click on the Get Unique Values button. From the list of values generated, double-click on 'North' to complete the SQL statement, as shown in the following screenshot: Click on the Apply button to execute the query. This should select 7520 records. Many people mistakenly assume that you can simply take a query that has been generated in this fashion and paste it into a Python script. That is not the case. There are some important differences that we'll cover next. Close the Select by Attributes window and the Burglaries in 2009 table. Clear the selected feature set by clicking on Selection | Clear Selected Features. Open the Python window and add the code to import arcpy. 
import arcpy Create a new variable to hold the query and add the exact same statement that you created earlier: qry = "SVCAREA" = 'North' Press Enter on your keyboard and you should see an error message similar to the following: Runtime error SyntaxError: can't assign to literal (<string>, line1) Python interprets SVCAREA and North as strings, but the equal to sign between the two is not part of the string used to set the qry variable. There are several things we need to do to generate a syntactically correct statement for the Python interpreter. One important thing has already been taken care of though. Each field name used in a query needs to be surrounded by double quotes. In this case, SVCAREA is the only field used in the query and it has already been enclosed by double quotes. This will always be the case when you're working with shapefiles, file geodatabases, or ArcSDE geodatabases. Here is where it gets a little confusing though. If you're working with data from a personal geodatabase, the field names will need to be enclosed by square brackets instead of double quotes as shown in the following code example. This can certainly lead to confusion for script developers. qry = [SVCAREA] = 'North' Now, we need to deal with the single quotes surrounding 'North'. When querying data from fields that have a text datatype, the string being evaluated must be enclosed by quotes. If you examine the original query, you'll notice that we have in fact already enclosed the word North with quotes, so everything should be fine, right? Unfortunately, it's not that simple with Python. Quotes, along with a number of other characters, must be escaped with a backslash followed by the character being escaped. In this case, the escape sequence would be \'. Alter your query syntax to incorporate the escape sequence: qry = "SVCAREA" = \'North\' Finally, the entire query statement should be enclosed with quotes: qry = '"SVCAREA" = \'North\'' In addition to the = sign, which tests for equality, there are a number of additional operators that you can use with strings and numeric data, including not equal (<>), greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=). Wildcard characters, including % and _, can also be used for shapefiles, file geodatabases, and ArcSDE geodatabases; % represents any number of characters. The LIKE operator is often used with wildcard characters to perform partial string matching. For example, the following query would find all records with a service area that begins with N and has any number of characters after it: qry = '"SVCAREA" LIKE \'N%\'' The underscore character (_) can be used to represent a single character. For personal geodatabases, the asterisk (*) is used to represent a wildcard character for any number of characters, while (?) represents a single character. You can also query for the absence of data, also known as NULL values. A NULL value is often mistaken for a value of zero, but that is not the case. NULL values indicate the absence of data, which is different from a value of zero. Null operators include IS NULL and IS NOT NULL. The following code example will find all records where the SVCAREA field contains no data: qry = '"SVCAREA" IS NULL' The final topic that we'll cover in this section is the set of operators used for combining expressions where multiple query conditions need to be met. The AND operator requires that both query conditions be met for the query result to be true, resulting in selected records.
The OR operator requires that at least one of the conditions be met. How it works… The creation of syntactically correct queries is one of the most challenging aspects of programming ArcGIS with Python. However, once you understand some basic rules, it gets a little easier. In this section, we'll summarize these rules. One of the more important things to keep in mind is that field names must be enclosed with double quotes for all datasets, with the exception of personal geodatabases, which require square brackets surrounding field names. There is also an AddFieldDelimiters() function that you can use to add the correct delimiter to a field based on the datasource supplied as a parameter to the function. The syntax for this function is as follows: AddFieldDelimiters(dataSource,field) Additionally, most people, especially those new to programming with Python, struggle with the issue of adding single quotes to string values being evaluated by the query. In Python, quotes have to be escaped with a backslash followed by the quote. Using this escape sequence will ensure that Python does in fact see that as a quote rather than the end of the string. Finally, take some time to familiarize yourself with the wildcard characters. For datasets other than personal geodatabases, you'll use the (%) character for multiple characters and an underscore (_) character for a single character. If you're using a personal geodatabase, the (*) character is used to match multiple characters and the (?) character is used to match a single character. Obviously, the syntax differences between personal geodatabases and all other types of datasets can lead to some confusion. Creating feature layers and table views Feature layers and table views serve as intermediate datasets held in memory for use specifically with tools such as Select by Location and Select by Attributes. Although these temporary datasets can be saved, they are not needed in most cases. Getting ready Feature classes are physical representations of geographic data and are stored as files (shapefiles, personal geodatabases, and file geodatabases) or within a geodatabase. ESRI defines a feature class as "a collection of features that shares a common geometry (point, line, or polygon), attribute table, and spatial reference." Feature classes can contain default and user-defined fields. Default fields include the SHAPE and OBJECTID fields. These fields are maintained and updated automatically by ArcGIS. The SHAPE field holds the geometric representation of a geographic feature, while the OBJECTID field holds a unique identifier for each feature. Additional default fields will also exist depending on the type of feature class. A line feature class will have a SHAPE_LENGTH field. A polygon feature class will have both a SHAPE_LENGTH and a SHAPE_AREA field. Optional fields are created by end users of ArcGIS and are not automatically updated by GIS. These contain attribute information about the features. These fields can also be updated by your scripts. Tables are physically represented as standalone DBF tables or within a geodatabase. Both tables and feature classes contain attribute information. However, a table contains only attribute information. There isn't a SHAPE field associated with a table, and they may or may not contain an OBJECTID field. Standalone Python scripts that use the Select by Attributes or Select by Location tool require that you create an intermediate dataset rather than using feature classes or tables.
These intermediate datasets are temporary in nature and are called Feature Layers or Table Views. Unlike feature classes and tables, these temporary datasets do not represent actual files on disk or within a geodatabase. Instead, they are "in memory" representations of feature classes and tables. These datasets are active only while a Python script is running. They are removed from memory after the tool has executed. However, if the script is run from within ArcGIS as a script tool, then the temporary layer can be saved either by right-clicking on the layer in the table of contents and selecting Save As Layer File or simply by saving the map document file. Feature layers and table views must be created as a separate step in your Python scripts, before you can call the Select by Attributes or Select by Location tools. The Make Feature Layer tool generates the "in-memory" representation of a feature class, which can then be used to create queries and selection sets, as well as to join tables. After this step has been completed, you can use the Select by Attributes or Select by Location tool. Similarly, the Make Table View tool is used to create an "in-memory" representation of a table. The function of this tool is the same as Make Feature Layer. Both the Make Feature Layer and Make Table View tools require an input dataset, an output layer name, and an optional query expression, which can be used to limit the features or rows that are a part of the output layer. In addition, both tools can be found in the Data Management Tools toolbox. The syntax for using the Make Feature Layer tool is as follows: arcpy.MakeFeatureLayer_management(<input feature layer>, <output layer name>,{where clause}) The syntax for using the Make Table View tool is as follows: Arcpy.MakeTableView_management(<input table>, <output table name>, {where clause}) In this recipe, you will learn how to use the Make Feature Layer and Make Table View tools. These tasks will be done inside ArcGIS, so that you can see the in-memory copy of the layer that is created. How to do it… Follow these steps to learn how to use the Make Feature Layer and Make Table View tools: Open c:ArcpyBookCh8Crime_Ch8.mxd in ArcMap. Open the Python window. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/data/CityOfSanAntonio.gdb" Start a try block: try: Make an in-memory copy of the Burglary feature class using the Make Feature Layer tool. Make sure you indent this line of code: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") Add an except block and a line of code to print an error message in the event of a problem: except: print "An error occurred during creation" The entire script should appear as follows: import arcpy arcpy.env.workspace = "c:/ArcpyBook/data/CityOfSanAntonio.gdb" try: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") except: print "An error occurred during creation" Save the script to c:ArcpyBookCh8CreateFeatureLayer.py. Run the script. The new Burglary_Layer file will be added to the ArcMap table of contents: The Make Table View tool functionality is equivalent to the Make Feature Layer tool. The difference is that it works against standalone tables instead of feature classes. 
Remove the following line of code: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") Add the following line of code in its place: tView = arcpy.MakeTableView_management("Crime2009Table", "Crime2009TView") Run the script to see the table view added to the ArcMap table of contents. How it works... The Make Feature Layer and Make Table View tools create in-memory representations of feature classes and tables respectively. Both the Select by Attributes and Select by Location tools require that these temporary, in-memory structures be passed in as parameters when called from a Python script. Both tools also require that you pass in a name for the temporary structures. There's more... You can also apply a query to either the Make Feature Layer or Make Table View tools to restrict the records returned in the feature layer or table view. This is done through the addition of a where clause when calling either of the tools from your script. This query is much the same as if you'd set a definition query on the layer through Layer Properties | Definition Query. The syntax for adding a query is as follows: MakeFeatureLayer(in_features, out_layer, where_clause) MakeTableView(in_table, out_view, where_clause)
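As a rough sketch of how the two recipes in this article fit together, the feature layer created from the Burglary feature class can be passed straight to the Select by Attributes tool, using a query built with the quoting and escaping rules covered earlier. This assumes the same CityOfSanAntonio.gdb workspace and SVCAREA field used throughout; it is an illustration rather than part of the original recipe. # selectNorthBurglaries.py -- create a feature layer, then select by attributes.
import arcpy

arcpy.env.workspace = "c:/ArcpyBook/data/CityOfSanAntonio.gdb"

try:
    # The in-memory feature layer required by the selection tools.
    arcpy.MakeFeatureLayer_management("Burglary", "Burglary_Layer")

    # Query built with the field-delimiter and escaping rules described above.
    qry = '"SVCAREA" = \'North\''
    arcpy.SelectLayerByAttribute_management("Burglary_Layer", "NEW_SELECTION", qry)

    # GetCount reports the number of selected records on the layer.
    print(arcpy.GetCount_management("Burglary_Layer"))
except:
    print("An error occurred during selection")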
Asynchronous Programming with Python

Packt
26 Aug 2015
20 min read
 In this article by Giancarlo Zaccone, the author of the book Python Parallel Programming Cookbook, we will cover the following topics: Introducing Asyncio GPU programming with Python Introducing PyCUDA Introducing PyOpenCL (For more resources related to this topic, see here.) An asynchronous model is of fundamental importance along with the concept of event programming. The execution model of asynchronous activities can be implemented using a single stream of main control, both in uniprocessor systems and multiprocessor systems. In the asynchronous model of a concurrent execution, various tasks intersect with each other along the timeline, and all of this happens under the action of a single flow of control (single-threaded). The execution of a task can be suspended and then resumed alternating in time with any other task. The asynchronous programming model As you can see in the preceding figure, the tasks (each with a different color) are interleaved with one another, but they are in a single thread of control. This implies that when one task is in execution, the other tasks are not. A key difference between a multithreaded programming model and single-threaded asynchronous concurrent model is that in the first case, the operating system decides on the timeline whether to suspend the activity of a thread and start another, while in the second case, the programmer must assume that a thread may be suspended and replaced with another at almost any time. Introducing Asyncio The Python module Asyncio provides facilities to manage events, coroutines, tasks and threads, and synchronization primitives to write concurrent code. When a program becomes very long and complex, it is convenient to divide it into subroutines, each of which realizes a specific task, for which the program implements a suitable algorithm. The subroutine cannot be executed independently but only at the request of the main program, which is then responsible for coordinating the use of subroutines. Coroutines are a generalization of the subroutine. Like a subroutine, the coroutine computes a single computational step, but unlike subroutines, there is no main program that is used to coordinate the results. This is because the coroutines link themselves together to form a pipeline without any supervising function responsible for calling them in a particular order. In a coroutine, the execution point can be suspended and resumed later, having kept track of its local state in the intervening time. In this example, we see how to use the coroutine mechanism of Asyncio to simulate a finite state machine of five states. A Finite-state automaton (FSA) is a mathematical model that is not only widely used in engineering disciplines but also in sciences, such as mathematics and computer science. The automata we want to simulate the behavior is as follows: Finite State Machine We have indicated with S0, S1, S2, S3, and S4 the states of the system, with 0 and 1 as the values for which the automata can pass from one state to the next state (this operation is called a transition). So for example, the state S0 can be passed to the state S1 only for the value 1 and S0 can pass the state S2 only to the value 0. 
The Python code that follows simulates a transition of the automaton from the state S0, the so-called Start State, up to the state S4, the End State: #Asyncio Finite State Machine import asyncio import time from random import randint @asyncio.coroutine def StartState(): print ("Start State called n") input_value = randint(0,1) time.sleep(1) if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) print("Resume of the Transition : nStart State calling " + result) @asyncio.coroutine def State1(transition_value): outputValue = str(("State 1 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State3(input_value) else : result = yield from State2(input_value) result = "State 1 calling " + result return (outputValue + str(result)) @asyncio.coroutine def State2(transition_value): outputValue = str(("State 2 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State1(input_value) else : result = yield from State3(input_value) result = "State 2 calling " + result return (outputValue + str(result)) @asyncio.coroutine def State3(transition_value): outputValue = str(("State 3 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State1(input_value) else : result = yield from EndState(input_value) result = "State 3 calling " + result return (outputValue + str(result)) @asyncio.coroutine def EndState(transition_value): outputValue = str(("End State with transition value = %s n" %(transition_value))) print("...Stop Computation...") return (outputValue ) if __name__ == "__main__": print("Finite State Machine simulation with Asyncio Coroutine") loop = asyncio.get_event_loop() loop.run_until_complete(StartState()) After running the code, we have an output similar to this: C:Python CookBookChapter 4- Asynchronous Programmingcodes - Chapter 4>python asyncio_state_machine.py Finite State Machine simulation with Asyncio Coroutine Start State called ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Stop Computation... Resume of the Transition : Start State calling State 1 with transition value = 1 State 1 calling State 3 with transition value = 0 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 3 with transition value = 1 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 1 with transition value = 0 State 1 calling State 3 with transition value = 0 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 3 with transition value = 1 State 3 calling End State with transition value = 1 Each state of the automata has been defined with the annotation @asyncio.coroutine. 
For example, the state S0 is: @asyncio.coroutine def StartState(): print ("Start State called n") input_value = randint(0,1) time.sleep(1) if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) The transition to the next state is determined by input_value, which is defined by the randint(0,1) function of Python's module random. This function randomly provides the value 0 or 1, where it randomly determines to which state the finite-state machine will be passed: input_value = randint(0,1) After determining the value at which state the finite state machine will be passed, the coroutine calls the next coroutine using the command yield from: if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) The variable result is the value that each coroutine returns. It is a string, and at the end of the computation, we can reconstruct [NV1] the transition from the initial state of the automaton, the Start State, up to the final state, the End State. The main program starts the evaluation inside the event loop: if __name__ == "__main__": print("Finite State Machine simulation with Asyncio Coroutine") loop = asyncio.get_event_loop() loop.run_until_complete(StartState()) GPU programming with Python A graphics processing unit (GPU) is an electronic circuit that specializes in processing data to render images from polygonal primitives. Although they were designed to carry out rendering images, GPUs have continued to evolve, becoming more complex and efficient in serving both real-time and offline rendering community. GPUs have continued to evolve, becoming more complex and efficient in performing any scientific computation. Each GPU is indeed composed of several processing units called streaming multiprocessor (SM), representing the first logic level of parallelism; each SM in fact, works simultaneously and independently from the others. The GPU architecture Each SM is in turn divided into a group of Stream Processors (SP), each of which has a core of real execution and can run a thread sequentially. SP represents the smallest unit of execution logic and the level of finer parallelism. The division in SM and SP is structural in nature, but it is possible to outline a further logical organization of the SP of a GPU, which are grouped together in logical blocks characterized by a particular mode of execution—all cores that make up a group run at the same time with the same instructions. This is just the SIMD (Single Instruction, Multiple Data) model. The programming paradigm that characterizes GPU computing is also called stream processing because the data can be viewed as a homogeneous flow of values that are applied synchronously to the same operations. Currently, the most efficient solutions to exploit the computing power provided by the GPU cards are the software libraries CUDA and OpenCL. Introducing PyCUDA PyCUDA is a Python wrapper for CUDA (Compute Unified Device Architecture), the software library developed by NVIDIA for GPU programming. The PyCuda programming model is designed for the common execution of a program on the CPU and GPU so as to allow you to perform the sequential parts on the CPU and the numeric parts that are more intensive on the GPU. The phases to be performed in the sequential mode are implemented and executed on the CPU (host), while the steps to be performed in parallel are implemented and executed on the GPU (device). 
The functions to be performed in parallel on the device are called kernels. The skeleton general for the execution of a generic function kernel on the device is as follows: Allocation of memory on the device. Transfer of data from the host memory to that allocated on the device. Running the device: Running the configuration. Invocation of the kernel function. Transfer of the results from the memory on the device to the host memory. Release of the memory allocated on the device. The PyCUDA programming model To show the PyCuda workflow, let's consider a 5 × 5 random array and the following procedure: Create the array 5×5 on the CPU. Transfer the array to the GPU. Perform a Task[NV2]  on the array in the GPU (double all the items in the array). Transfer the array from the GPU to the CPU. Print the results. The code for this is as follows: import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule import numpy a = numpy.random.randn(5,5) a = a.astype(numpy.float32) a_gpu = cuda.mem_alloc(a.nbytes) cuda.memcpy_htod(a_gpu, a) mod = SourceModule(""" __global__ void doubleMatrix(float *a) { int idx = threadIdx.x + threadIdx.y*4; a[idx] *= 2; } """) func = mod.get_function("doubleMatrix") func(a_gpu, block=(5,5,1)) a_doubled = numpy.empty_like(a) cuda.memcpy_dtoh(a_doubled, a_gpu) print ("ORIGINAL MATRIX") print a print ("DOUBLED MATRIX AFTER PyCUDA EXECUTION") print a_doubled The example output should be like this : C:Python CookBookChapter 6 - GPU Programming with Python >python PyCudaWorkflow.py ORIGINAL MATRIX [[-0.59975582 1.93627465 0.65337795 0.13205571 -0.46468592] [ 0.01441949 1.40946579 0.5343408 -0.46614054 -0.31727529] [-0.06868593 1.21149373 -0.6035406 -1.29117763 0.47762445] [ 0.36176383 -1.443097 1.21592784 -1.04906416 -1.18935871] [-0.06960868 -1.44647694 -1.22041082 1.17092752 0.3686313 ]] DOUBLED MATRIX AFTER PyCUDA EXECUTION [[-1.19951165 3.8725493 1.3067559 0.26411143 -0.92937183] [ 0.02883899 2.81893158 1.0686816 -0.93228108 -0.63455057] [-0.13737187 2.42298746 -1.2070812 -2.58235526 0.95524889] [ 0.72352767 -1.443097 1.21592784 -1.04906416 -1.18935871] [-0.06960868 -1.44647694 -1.22041082 1.17092752 0.3686313 ]] The code starts with the following imports: import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule The pycuda.autoinit import automatically picks a GPU to run on based on the availability and number. It also creates a GPU context for subsequent code to run in. Both the chosen device and the created context are available from pycuda.autoinit as importable symbols if needed. While the SourceModule component is the object where a C-like code for the GPU must be written. The first step is to generate the input 5 × 5 matrix. Since most GPU computations involve large arrays of data, the NumPy module must be imported: import numpy a = numpy.random.randn(5,5) Then, the items in the matrix are converted in a single precision mode, many NVIDIA cards support only single precision: a = a.astype(numpy.float32) The first operation to be done in order to implement a GPU loads the input array from the host memory (CPU) to the device (GPU). This is done at the beginning of the operation and consists two steps that are performed by invoking two functions provided PyCuda[NV3] . Memory allocation on the device is done via the cuda.mem_alloc function. The device and host memory may not ever communicate while performing a function kernel. 
This means that to run a function in parallel on the device, the data relating to it must be present in the memory of the device itself. Before you copy data from the host memory to the device memory, you must allocate the memory required on the device: a_gpu = cuda.mem_alloc(a.nbytes). Copy of the matrix from the host memory to that of the device with the function: call cuda.memcpy_htod : cuda.memcpy_htod(a_gpu, a). We also note that a_gpu is one dimensional, and on the device, we need to handle it as such. All these operations do not require the invocation of a kernel and are made directly by the main processor. The SourceModule entity serves to define the (C-like) kernel function doubleMatrix that multiplies each array entry by 2: mod = SourceModule(""" __global__ void doubleMatrix(float *a) { int idx = threadIdx.x + threadIdx.y*4; a[idx] *= 2; } """) The __global__ qualifier is a directive that indicates that the doubleMatrix function will be processed on the device. It will be just the compiler Cuda nvcc that will be used to perform this task. Let's look at the function's body, which is as follows: int idx = threadIdx.x + threadIdx.y*4; The idx parameter is the matrix index that is identified by the thread coordinates threadIdx.x and threadIdx.y. Then, the element matrix with the index idx is multiplied by 2: a[idx] *= 2; We also note that this kernel function will be executed once in 16 different threads. Both the variables threadIdx.x and threadIdx.y contain indices between 0 and 3 , and the pair[NV4]  is different for each thread. Threads scheduling is directly linked to the GPU architecture and its intrinsic parallelism. A block of threads is assigned to a single SM. Here, threads are further divided into groups called warps. The size of a warp depends on the architecture under consideration. The threads of the same warp are managed by the control unit called the warp scheduler. To take full advantage of the inherent parallelism of the SM, the threads of the same warp must execute the same instruction. If this condition does not occur, we speak of divergence of threads. If the same warp threads execute different instructions, the control unit cannot handle all the warps. It must follow the sequences of instructions for every single thread (or for homogeneous subsets of threads) in a serial mode. Let's observe how the thread block is divided in various warps—threads are divided by the value of threadIdx. The threadIdx structure consists of 3 fields: threadIdx.x, threadIdx.y, and threadIdx.z. Thread blocks subdivision: T(x,y), where x = threadIdx.x and y = threadIdx.y We can see again that the code in the kernel function will be automatically compiled by the nvcc cuda compiler. If there are no errors, a pointer to this compiled function will be created. In fact, the mod.get_function[NV5] ("doubleMatrix") returns an identifier to the function created func: func = mod.get_function("doubleMatrix ") To perform a function on the device, you must first configure the execution appropriately. This means that we need to determine the size of the coordinates to identify and distinguish the thread belonging to different blocks. This will be done using the block parameter inside the func call: func(a_gpu, block = (5, 5, 1)) The block = (5, 5, 1) tells us that we are calling a kernel function with a_gpu linearized input matrix and a single thread block of size, 5 threads in the x direction, 5 threads in the y direction, and 1 thread in the z direction, 16 threads in total. 
This structure is designed with parallel implementation of the algorithm of interest. The division of the workload results is an early form of parallelism that is sufficient and necessary to make use of the computing resources provided by the GPU. Once you've configured the kernel's invocation, you can invoke the kernel function that executes instructions in parallel on the device. Each thread executes the same code kernel. After the computation in the gpu device, we use an array to store the results: a_doubled = numpy.empty_like(a) cuda.memcpy_dtoh(a_doubled, a_gpu) Introducing PyOpenCL As for programming with PyCuda, the first step to build a program for PyOpenCL is the encoding of the host application. In fact, this is performed on the host computer (typically, the user's PC) and then, it dispatches the kernel application on the connected devices (GPU cards). The host application must contain five data structures, which are as follows: Device: This identifies the hardware where the kernel code must be executed. A PyOpenCL application can be executed not only on CPU and GPU cards but also on embedded devices such as FPGA (Field Programmable Gate Array). Program: This is a group of kernels. A program selects which kernel must be executed on the device. Kernel: This is the code to be executed on the device. A kernel is essentially a (C-like) function that enables it to be compiled for an execution on any device that supports OpenCL drivers. The kernel is the only way the host can call a function that will run on a device. When the host invokes a kernel, many work items start running on the device. Each work item runs the code of the kernel but works on a different part of the dataset. Command queue: Each device receives kernels through this data structure. A command queue orders the execution of kernels on the device. Context: This is a group of devices. A context allows devices to receive kernels and transfer data. PyOpenCL programming The preceding figure shows how these data structures can work in a host application. Let's remember again that a program can contain multiple functions that are to be executed on the device and each kernel encapsulates only a single function from the program. In this example, we show you the basic steps to build a PyOpenCL program. The task to be executed is the parallel sum of two vectors. In order to maintain a readable output, let's consider two vectors, each of one out of 100 elements. The resulting vector will be for each element's i[NV6] th, which is the sum of the ith element vector_a plus the ith element vector_b. 
Of course, to be able to appreciate the parallel execution of this code, you can also increase some orders of magnitude the size of the input vector_dimension:[NV7]  import numpy as np import pyopencl as cl import numpy.linalg as la vector_dimension = 100 vector_a = np.random.randint(vector_dimension, size=vector_dimension) vector_b = np.random.randint(vector_dimension, size=vector_dimension) platform = cl.get_platforms()[0] device = platform.get_devices()[0] context = cl.Context([device]) queue = cl.CommandQueue(context) mf = cl.mem_flags a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a) b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b) program = cl.Program(context, """ __kernel void vectorSum(__global const int *a_g, __global const int *b_g, __global int *res_g) { int gid = get_global_id(0); res_g[gid] = a_g[gid] + b_g[gid]; } """).build() res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes) program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g) res_np = np.empty_like(vector_a) cl.enqueue_copy(queue, res_np, res_g) print ("PyOPENCL SUM OF TWO VECTORS") print ("Platform Selected = %s" %platform.name ) print ("Device Selected = %s" %device.name) print ("VECTOR LENGTH = %s" %vector_dimension) print ("INPUT VECTOR A") print vector_a print ("INPUT VECTOR B") print vector_b print ("OUTPUT VECTOR RESULT A + B ") print res_np assert(la.norm(res_np - (vector_a + vector_b))) < 1e-5 The output from Command Prompt should be like this: C:Python CookBook Chapter 6 - GPU Programming with PythonChapter 6 - codes>python PyOpenCLParallellSum.py Platform Selected = NVIDIA CUDA Device Selected = GeForce GT 240 VECTOR LENGTH = 100 INPUT VECTOR A [ 0 29 88 46 68 93 81 3 58 44 95 20 81 69 85 25 89 39 47 29 47 48 20 86 59 99 3 26 68 62 16 13 63 28 77 57 59 45 52 89 16 6 18 95 30 66 19 29 31 18 42 34 70 21 28 0 42 96 23 86 64 88 20 26 96 45 28 53 75 53 39 83 85 99 49 93 23 39 1 89 39 87 62 29 51 66 5 66 48 53 66 8 51 3 29 96 67 38 22 88] INPUT VECTOR B [98 43 16 28 63 1 83 18 6 58 47 86 59 29 60 68 19 51 37 46 99 27 4 94 5 22 3 96 18 84 29 34 27 31 37 94 13 89 3 90 57 85 66 63 8 74 21 18 34 93 17 26 9 88 38 28 14 68 88 90 18 6 40 30 70 93 75 0 45 86 15 10 29 84 47 74 22 72 69 33 81 31 45 62 81 66 69 14 71 96 91 51 35 4 63 36 28 65 10 41] OUTPUT VECTOR RESULT A + B [ 98 72 104 74 131 94 164 21 64 102 142 106 140 98 145 93 108 90 84 75 146 75 24 180 64 121 6 122 86 146 45 47 90 59 114 151 72 134 55 179 73 91 84 158 38 140 40 47 65 111 59 60 79 109 66 28 56 164 111 176 82 94 60 56 166 138 103 53 120 139 54 93 114 183 96 167 45 111 70 122 120 118 107 91 132 132 74 80 119 149 157 59 86 7 92 132 95 103 32 129] In the first line of the code after the required module import, we defined the input vectors: vector_dimension = 100 vector_a = np.random.randint(vector_dimension, size= vector_dimension) vector_b = np.random.randint(vector_dimension, size= vector_dimension) Each vector contain 100 integers items that are randomly selected thought the NumPy function: np.random.randint(max integer , size of the vector) Then, we must select the device to run the kernel code. To do this, we must first select the platform using the get_platform() PyOpenCL statement: platform = cl.get_platforms()[0] This platform, as you can see from the output, corresponds to the NVIDIA CUDA platform. 
Then, we must select the device using the get_device() platform's method: device = platform.get_devices()[0] In the following steps, the context and the queue are defined, PyOpenCL provides the method context (device selected) and queue (context selected): context = cl.Context([device]) queue = cl.CommandQueue(context) To perform the computation in the device, the input vector must be transferred to the device's memory. So, two input buffers in the device memory must be created: mf = cl.mem_flags a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a) b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b) Also, we prepare the buffer for the resulting vector: res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes) Finally, the core of the script, the kernel code is defined inside a program as follows: program = cl.Program(context, """ __kernel void vectorSum(__global const int *a_g, __global const int *b_g, __global int *res_g) { int gid = get_global_id(0); res_g[gid] = a_g[gid] + b_g[gid]; } """).build() The kernel's name is vectorSum. The parameter list defines the data types of the input arguments (vectors of integers) and output data type (a vector of integer again). Inside the kernel, the sum of the two vectors is simply defined as: Initialize the vector index: int gid = get_global_id(0) Sum the vector's components: res_g[gid] = a_g[gid] + b_g[gid]; In OpenCL and PyOpenCL, buffers are attached to a context[NV8]  and are only moved to a device once the buffer is used on that device. Finally, we execute vectorSum in the device: program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g) To visualize the results, an empty vector is built: res_np = np.empty_like(vector_a) Then, the result is copied into this vector: cl.enqueue_copy(queue, res_np, res_g) Finally, the results are displayed: print ("VECTOR LENGTH = %s" %vector_dimension) print ("INPUT VECTOR A") print vector_a print ("INPUT VECTOR B") print vector_b print ("OUTPUT VECTOR RESULT A + B ") print res_np To check the result, we use the assert statement. It tests the result and triggers an error if the condition is false: assert(la.norm(res_np - (vector_a + vector_b))) < 1e-5 Summary In this article we discussed about Asyncio, GPU programming with Python, PyCUDA, and PyOpenCL. Resources for Article: Further resources on this subject: Bizarre Python[article] Scientific Computing APIs for Python[article] Optimization in Python [article]
Managing the IT Portfolio using Troux Enterprise Architecture

Packt
12 Aug 2010
16 min read
(For more resources on Troux, see here.) Managing the IT Portfolio using Troux Enterprise Architecture Almost every company today is totally dependent on IT for day-to-day operations. Large companies literally spend billions on IT-related personnel, software, equipment, and facilities. However, do business leaders really know what they get in return for these investments? Upper management knows that a successful business model depends on information technology. Whether the company is focused on delivery of services or development of products, management depends on its IT team to deliver solutions that meet or exceed customer expectations. However, even though companies continue to invest heavily in various technologies, for most companies, knowing the return-on-investment in technology is difficult or impossible. When upper management asks where the revenues are for the huge investments in software, servers, networks, and databases, few IT professionals are able to answer. There are questions that are almost impossible to answer without guessing, such as: Which IT projects in the portfolio of projects will actually generate revenue? What are we getting for spending millions on vendor software? When will our data center run out of capacity? This article will explore how IT professionals can be prepared when management asks the difficult questions. By being prepared, IT professionals can turn conversations with management about IT costs into discussions about the value IT provides. Using consolidated information about the majority of the IT portfolio, IT professionals can work with business leaders to select revenue-generating projects, decrease IT expenses, and develop realistic IT plans. The following sections will describe what IT professionals can do to be ready with accurate information in response to the most challenging questions business leaders might ask. Management repositories IT has done a fine job of delivering solutions for years. However, pressure to deliver business projects quickly has created a mentality in most IT organizations of "just put it in and we will go back and do the clean up later." This has led to a layering effect where older "legacy" technology remains in place, while new technology is adopted. With this complex mix of legacy solutions and emerging technology, business leaders have a hard time understanding how everything fits together and what value is provided from IT investments. Gone are the days when the Chief Information Officer (CIO) could say "just trust me" when business people asked questions about IT spending. In addition, new requirements for corporate compliance combined with the expanding use of web-based solutions makes managing technology more difficult than ever. With the advent of Software-as-a-Service (SaaS) or cloud computing, the technical footprint, or ecosystem, of IT has extended beyond the enterprise itself. Virtualization of platforms and service-orientation adds to the mind-numbing mix of technologies available to IT. However, there are many systems available to help companies manage their technological portfolio. Unfortunately, multiple teams within the business and within IT see the problem of managing the IT portfolio differently. In many companies, there is no centralized effort to gather and store IT portfolio information. Teams with a need for IT asset information tend to purchase or build a repository specific to their area of responsibility. 
Some examples of these include:
- Business goals repository
- Change management database
- Configuration management database
- Business process management database
- Fixed assets database
- Metadata repository
- Project portfolio management database
- Service catalog
- Service registry
While each of these repositories provides valuable information about the IT portfolio, each is optimized to meet a specific set of requirements. The following table shows the main types of information stored in each of these repositories, along with a brief statement about its functional purpose:
Repository | Main content | Main purpose
Business goals | Goal statements and assignments | Documents business goals and who is responsible
Change management database | Change request tickets, application owners | Captures change requests and who can authorize change
Configuration management database | Identifies actual hardware and software in use across the enterprise | Supports Information Technology Infrastructure Library (ITIL) processes
Business process management database | Business processes, information flows, and process owners | Used to develop applications and document business processes
Fixed assets database | Asset identifiers for hardware and software, asset life, purchase cost, and depreciation amounts | Documents cost and depreciable life of IT assets
Metadata repository | Data about the company databases and files | Documents the names, definitions, data types, and locations of the company data
Project portfolio management database | Project names, classifications, assignments, business value, and scope | Used to manage IT workload and assess the value of IT projects to the business
Service catalog | Defines hardware and compatible software available for project use | Used to manage hardware and software implementations assigned to the IT department
Service registry | Names and details of reusable software services | Used to manage, control, and report on reusable software
It is easy to see that while each of these repositories serves a specific purpose, none supports an overarching view across the others. For example, one might ask: How many SQL Server databases do we have installed, and what hardware do they run on? To answer this question, IT managers would have to extract data from the metadata repository and combine it with data from the Configuration Management Database (CMDB). The question could be extended: How much will it cost in early expense write-offs if we retire the SQL Server DB servers into a new virtual grid of servers? To answer this question, IT managers need to determine not only how many servers host SQL Server, but how old they are, what they cost at purchase time, and how much depreciation is left on them. Now the query must span at least three systems (CMDB, fixed assets, and metadata repository). The accuracy of the answer will also depend on the relative validity of the data in each repository. There could be overlapping data in some, and outright errors in others.
Changing the conversation
When upper management asks difficult questions, they are usually interested in cost, risk management, or IT agility. Not knowing a great deal about IT, they are curious about why they need to spend millions on technology and what they get for their investments. The conversation ends up being primarily about cost and how to reduce expenses. This is not a good position to be in if you are running a support function like Enterprise Architecture. How can you explain IT investments in a way that management can understand?
If you are not prepared with facts, management has no choice but to assume that costs are out of control and they can be reduced, usually by dramatic amounts. As a good corporate citizen, it is your job to help reduce costs. Like everyone in management, getting the most out of the company's assets is your responsibility. However, as we in IT know, it's just as important to be ready for changes in technology and to be on top of technology trends. As technology leaders, it is our job to help the company stay current through investments that may pay off in the future rather than show an immediate return. The following diagram shows various management functions and technologies that are used to manage the business of IT: The dimensions of these tools and processes span systems that run the business to change the business and from the ones using operational information to using strategic information. Various technologies that support data about IT assets are shown. These include: Business process analytics and management information Service-oriented architecture governance Asset-liability management Information technology systems management Financial management information Project portfolio and management information The key to changing the conversation about IT is having the ability to bring the information of these disciplines into a single view. The single view provides the ability to actually discuss IT in a strategic way. Gathering data and reporting on the actual metrics of IT, in a way business leaders can understand, supports strategic planning. The strategic planning process combined with fact-based metrics establishes credibility with upper management and promotes improved decision making on a daily basis. Troux Technologies Solving the IT-business communication problem has been difficult until recently. Troux Technologies (www.troux.com) developed a new open-architected repository and software solution, called the Troux Transformation Platform, to help IT manage the vast array of technology deployed within the company. Troux customers use the suite of applications and advanced integration platform within the product architecture to deliver bottom-line results. By locating where IT expenses are redundant, or out-of-step with business strategy, Troux customers experience significant cost savings. When used properly, the platform also supports improved IT efficiency, quicker response to business requirements, and IT risk reduction. In today's globally-connected markets, where shocks and innovations happen at an unprecedented rate, antiquated approaches to Strategic IT Planning and Enterprise Architecture have become a major obstruction. The inability of IT to plan effectively has driven business leaders to seek solutions available outside the enterprise. Using SaaS or Application Service Providers (ASPs) to meet urgent business objectives can be an effective means to meet short-term goals. However, to be complete, even these solutions usually require integration with internal systems. IT finds itself dealing with unspecified service-level requirements, developing integration architectures, and cleaning up after poorly planned activities by business leaders who don't understand what capabilities exist within the software running inside the company. A global leader in Strategic IT Planning and Enterprise Architecture software, Troux has created an Enterprise Architecture repository that IT can use to put itself at the center of strategic planning. 
Troux has been successful in implementing its repository at a number of companies. A partial list of Troux's customers can be found on the website. There are other enterprise-level repository vendors on the market. However, leading analysts, such as The Gartner Group and Forrester Research, have published recent studies ranking Troux as a leader in the IT strategy planning tools space. Troux Transformation Platform Troux's sophisticated integration and collaboration capabilities support multiple business initiatives such as handling mergers, aligning business and IT plans, and consolidating IT assets. The business-driven platform provides new levels of visibility into the complex web of IT resources, programs, and business strategy so the business can see instantly where IT spending and programs are redundant or out-of-step with business strategy. The business suite of applications helps IT to plan and execute faster with data assimilated from various trusted sources within the company. The platform provides information necessary to relevant stakeholders such as Business Analysts, Enterprise Architects, The Program Management Office, Solutions Architects, and executives within the business and IT. The transformation platform is not only designed to address today's urgent cost-restructuring agendas, but it also introduces an ongoing IT management discipline, allowing EA and business users to drive strategic growth initiatives. The integration platform provides visibility and control to: Uncover and fix business/IT disconnects: This shows how IT directly supports business strategies and capabilities, and ensures that mismatched spending can be eliminated. Troux Alignment helps IT think like a CFO and demonstrate control and business purpose for the billions that are spent on IT assets, by ensuring that all stakeholders have valid and relevant IT information. Identify and eliminate redundant IT spending: This uncovers the many untapped opportunities with Troux Optimization to free up needless spend, and apply it either to the bottom line or to support new business initiatives. Speed business response and simplify IT: This speeds the creation and deployment of a set of standard, reusable building blocks that are proven to work in agile business cycles. Troux Standards enables the use of IT standards in real time, thereby streamlining the process of IT governance. Accelerate business transformation for government agencies: This helps federal agencies create an actionable Enterprise Architecture and comply with constantly changing mandates. Troux eaGov automatically identifies opportunities to reduce costs to business and IT risks, while fostering effective initiative planning and execution within or across agencies. Support EA methodology: Companies adopting The Open Group Architecture Framework (TOGAF™) can use the Troux for TOGAF solution to streamline their efforts. Unlock the full potential of IT portfolio investment: Unifies Strategic IT Planning, EA, and portfolio project management through a common IT governance process. The Troux CA Clarity Connection enables the first bi-directional integration in the market between CA Clarity Project Portfolio Management (PPM) and the Troux EA repository for enhanced IT investment portfolio planning, analysis, and control. Understand your deployed IT assets: Using the out-of-the-box connection to HP's Universal Configuration Management Database (uCMDB), link software and hardware with the applications they support. 
All of these capabilities are enabled through an open-architected platform that provides uncomplicated data integration tools. The platform provides Architecture-modeling capabilities for IT Architects, an extensible database schema (or meta-model), and integration interfaces that are simple to automate and bring online with minimal programming efforts. Enterprise Architecture repository The Troux Transformation Platform acts as the consolidation point across all the various IT management databases and even some management systems outside the control of IT. By collecting data from across various areas, new insights are possible, leading to reductions in operating costs and improvements in service levels to the business. While it is possible to combine these using other products on the market or even develop a home-grown EA repository, Troux has created a very easy-to-use API for data collection purposes. In addition, Troux provides a database meta-model for the repository that is extensible. Meta-model extensibility makes the product adaptable to the other management systems across the company. Troux also supports a configurable user interface allowing for a customized view into the repository. This capability makes the catalog appear as if it were a part of the other control systems already in place at the company. Additionally, Troux provides an optional set of applications that support a variety of roles, out of the box, with no meta-model extensions or user interface configurations required. These include: Troux Standards: This application supports the IT technology standards and lifecycle governance process usually conducted by the Enterprise Architecture department. Troux Optimization: This application supports the Application portfolio lifecycle management process conducted by the Enterprise Program Management Office (EPMO) and/or Enterprise Architecture. Troux Alignment: This application supports the business and IT assets and application-planning processes conducted by IT Engineering, Corporate Finance, and Enterprise Architecture. Even these three applications that are available out-of-the-box from Troux can be customized by extending their underlying meta-models and customizing the user interface. The EA repository provides output that is viewable online. Standard reports are provided or custom reports can be developed as per the specific needs of the user community. Departments within or even outside of IT can use the customized views, standard reports, and custom reports to perform analyses. For example, the Enterprise Program Management Office (EPMO) can produce reports that link projects with business goals. The EPMO can review the project portfolio of the company to identify projects that do not support company goals. Decisions can be made about these projects, thereby stopping them, slowing them down, or completing them faster. Resources can be moved from the stopped or completed low-value projects to the higher-value projects, leading to increased revenue or reduced costs for the company. In a similar fashion, the Internal Audit department can check on the level of compliance to company IT standards or use the list of applications stored within the catalog to determine the best audit schedule to follow. Less time can be spent auditing applications with minimal impact on company operations or on applications and projects targeted as low value. 
Application development can use data from the catalog to understand the current capabilities of the existing applications of the company. As staff changes or "off-shore" resources are applied to projects, knowing what existing systems do in advance of a new project can save many hours of work. Information can be extracted from the EA repository directly into requirements documentation, which is always the starting point for new applications, as well as maintenance projects on existing applications. One study performed at a major financial services company showed that over 40% of project development time was spent in the upfront work of documenting and explaining current application capabilities to business sponsors of projects. By supplying development teams with lists of application capabilities early in the project life cycle, time to gather and document requirements can be reduced significantly. Of course, one of the biggest benefactors of the repository is the EA group. In most companies, EA's main charter is to be the steward of information about applications, databases, hardware, software, and network architecture. EA can perform analyses using the data from the repository leading to recommendations for changes by middle and upper management. In addition, EA is responsible for collecting, setting, and managing the IT standards for the company. The repository supports a single source for IT standards, whether they are internal or external standards. The standards portion of the repository can be used as the centerpiece of IT governance. The function of the Architecture Review Board (ARB) is fully supported by Troux Standards. Capacity Planning and IT Engineering functions will also gain substantially through the use of an EA repository. The useful life of IT assets can be analyzed to create a master plan for technical refresh or reengineering efforts. The annual spend on IT expenses can be reduced dramatically through increased levels of virtualization of IT assets, consolidation of platforms, and even consolidation of whole data centers. IT Engineering can review what is currently running across the company and recommend changes to reduce software maintenance costs, eliminate underutilized hardware, and consolidate federated databases. Lastly, IT Operations can benefit from a consolidated view into the technical footprint running at any point in time. Even when system availability service levels call for near-real-time error correction, it may take hours for IT Operations personnel to diagnose problems. They tend not to have a full understanding of what applications run on what servers, which firewalls support which networks, and which databases support which applications. Problem determination time can be reduced by providing accurate technical architecture information to those focused on keeping systems running and meeting business service-level requirements. Summary This article identified the problem IT has with understanding what technologies it has under management. While many solutions are in place in many companies to gain a better view into the IT portfolio, none are designed to show the impact of IT assets in the aggregate. Without the capabilities provided by an EA repository, IT management has a difficult time answering tough questions asked by business leaders. Troux Technologies offers a solution to this problem using the Troux Transformation Platform. 
The platform acts as a master metadata repository and becomes the focus of many efforts that IT may run to reduce significant costs and improve business service levels. Further resources on this subject: Troux Enterprise Architecture: Managing the EA function [article]

Introduction to Logging in Tomcat 7

Packt
21 Mar 2012
9 min read
(For more resources on Apache, see here.)
JULI
Previous versions of Tomcat (up to 5.x) used the Apache Commons Logging services for generating logs. A major disadvantage of this logging mechanism is that it can handle only a single JVM configuration, which makes it difficult to configure separate logging for each class loader or independent application. In order to resolve this issue, the Tomcat developers introduced a separate API in Tomcat 6 that captures each class loader's activity in the Tomcat logs. It is based on the java.util.logging framework. By default, Tomcat 7 uses its own Java logging API to implement logging services. This API is called JULI and can be found in TOMCAT_HOME/bin of the Tomcat 7 directory structure (tomcat-juli.jar). The following screenshot shows the directory structure of the bin directory where tomcat-juli.jar is placed. JULI also provides custom logging for each web application, and it supports private per-application logging configurations. With the enhanced feature of separate class loader logging, it also helps in detecting memory issues while unloading classes at runtime. For more information on JULI and the class loading issue, please refer to http://tomcat.apache.org/tomcat-7.0-doc/logging.html and http://tomcat.apache.org/tomcat-7.0-doc/class-loader-howto.html respectively.
Loggers, appenders, and layouts
There are some important components of logging that we use when implementing the logging mechanism for applications. Each term has its own importance in tracking the events of the application. Let's discuss each term individually to find out its usage:
Loggers: A logger can be defined as the logical name for the log file. This logical name is written in the application code. We can configure an independent logger for each application.
Appenders: The generation of logs is handled by appenders. There are many types of appenders, such as FileAppender, ConsoleAppender, SocketAppender, and so on, which are available in log4j. The following are some examples of appenders for log4j:
log4j.appender.CATALINA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.CATALINA.File=${catalina.base}/logs/catalina.out
log4j.appender.CATALINA.Append=true
log4j.appender.CATALINA.Encoding=UTF-8
The previous four lines define a DailyRollingFileAppender in log4j, where the filename is catalina.out. These logs will have UTF-8 encoding enabled. If log4j.appender.CATALINA.Append=false, then logs will not get updated in the log files.
# Roll-over the log once per day
log4j.appender.CATALINA.DatePattern='.'dd-MM-yyyy'.log'
log4j.appender.CATALINA.layout = org.apache.log4j.PatternLayout
log4j.appender.CATALINA.layout.ConversionPattern = %d [%t] %-5p %c- %m%n
The previous three lines of code configure the roll-over of the log once per day.
Layouts: A layout is the format of the logs displayed in the log file. The appender uses a layout to format the log files (layouts are also called patterns). The following code shows the pattern for access logs:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log." suffix=".txt"
pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/>
Loggers, appenders, and layouts together help the developer capture the log messages for application events.
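To see how a logger, an appender, and a layout come together from the application side, here is a minimal sketch of application code logging through log4j 1.x. It assumes the log4j JAR is on the web application's classpath and that a log4j.properties file routes this logger to an appender such as the CATALINA one shown above; the OrderService class and its method are hypothetical names used only for illustration.
import org.apache.log4j.Logger;

public class OrderService {

    // The logger name (here, the class name) is the logical name that
    // log4j.properties binds to one or more appenders.
    private static final Logger LOG = Logger.getLogger(OrderService.class);

    public void placeOrder(String orderId) {
        LOG.debug("Validating order " + orderId);   // written only if the level allows DEBUG
        LOG.info("Order " + orderId + " accepted");
        try {
            // ... business logic would go here ...
        } catch (RuntimeException e) {
            // The layout's conversion pattern (%d [%t] %-5p %c- %m%n) decides
            // how this message and stack trace appear in the log file.
            LOG.error("Order " + orderId + " failed", e);
        }
    }
}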
Types of logging in Tomcat 7
We can enable logging in Tomcat 7 in different ways, based on the requirement. There are a total of five types of logging that we can configure in Tomcat, such as application, server, console, and so on. The following figure shows the different types of logging for Tomcat 7. These methods are used in combination with each other based on environment needs. For example, if you have issues where the Tomcat services are not coming up, then console logs are very helpful in identifying the issue, as we can verify the real-time boot sequence. Let's discuss each logging method briefly.
Application log
These logs are used to capture application events while running application transactions. They are very useful for identifying application-level issues. For example, if your application's performance is slow on a particular transaction, the details of that transaction can only be traced in the application log. The biggest advantage of application logs is that we can configure separate log levels and log files for each application, making it very easy for administrators to troubleshoot the application. Log4j is used in 90 percent of cases for application log generation.
Server log
Server logs are identical to console logs. The only advantage of server logs is that they can be retrieved at any time, whereas console logs are not available after we log out from the console.
Console log
This log gives you the complete information on the Tomcat 7 startup and loader sequence. The log file is named catalina.out and is found in TOMCAT_HOME/logs. This log file is very useful for checking application deployment and server startup testing in any environment. This log is configured in the Tomcat file catalina.sh, which can be found in TOMCAT_HOME/bin. The previous screenshot shows the definition for Tomcat logging. By default, the console logs are configured in INFO mode. There are different levels of logging in Tomcat, such as WARNING, INFO, CONFIG, and FINE. The previous screenshot shows the Tomcat log file location after the start of the Tomcat services. The previous screenshot shows the output of the catalina.out file, where the Tomcat services are started in 1903 ms.
Access log
Access logs are customized logs that give information about the following:
- Who has accessed the application
- What components of the application were accessed
- The source IP, and so on
These logs play a vital role in the traffic analysis of many applications, helping to analyze bandwidth requirements, and they also help in troubleshooting the application under heavy load. These logs are configured in server.xml in TOMCAT_HOME/conf. The following screenshot shows the definition of access logs. You can customize them according to the environment and your auditing requirements. Let's discuss the pattern format of the access logs and understand how we can customize the logging format:
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
prefix="localhost_access_log." suffix=".txt"
pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/>
- Class Name: This parameter defines the class name used for the generation of logs. By default, Apache Tomcat 7 uses the org.apache.catalina.valves.AccessLogValve class for access logs.
- Directory: This parameter defines the directory location for the log file. All the log files are generated in the log directory—TOMCAT_HOME/logs—but we can customize the log location based on our environment setup and then update the directory path in the definition of the access logs.
- Prefix: This parameter defines the prefix of the access log filename; by default, access log files are generated with the name localhost_access_log.yy-mm-dd.txt.
- Suffix: This parameter defines the file extension of the log file. Currently it is in .txt format.
- Pattern: This parameter defines the format of the log file. The pattern is a combination of values defined by the administrator, for example, %h = remote host address. The following screenshot shows the default log format for Tomcat 7. Access logs show the remote host address, the date/time of the request, the method used for the response, the URI mapping, and the HTTP status code. If you have installed a web traffic analysis tool for the application, you may have to change the access logs to a different format.
Host manager
These logs record the activity performed using Tomcat Manager, such as the various tasks performed, the status of applications, the deployment of applications, and the lifecycle of Tomcat. These configurations are done in logging.properties, which can be found in TOMCAT_HOME/conf. The previous screenshot shows the definition of the host, manager, and host-manager details. If you look at the definitions, they specify the log location, log level, and prefix of the filename. In logging.properties, we define file handlers and appenders using JULI. The log file for the manager looks similar to the following:
28 Jun, 2011 3:36:23 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
28 Jun, 2011 3:37:13 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
28 Jun, 2011 3:37:42 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: undeploy: Undeploying web application at '/sample'
28 Jun, 2011 3:37:43 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
28 Jun, 2011 3:42:59 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
28 Jun, 2011 3:43:01 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
28 Jun, 2011 3:53:44 AM org.apache.catalina.core.ApplicationContext log
INFO: HTMLManager: list: Listing contexts for virtual host 'localhost'
Types of log levels in Tomcat 7
There are seven levels defined for the Tomcat logging services (JULI). They can be set based on the application's requirements. The following figure shows the sequence of log levels for JULI. Every log level in JULI has its own functionality. The following table shows the functionality of each log level in JULI:
Log level | Description
SEVERE (highest) | Captures exceptions and errors
WARNING | Warning messages
INFO | Informational messages related to server activity
CONFIG | Configuration messages
FINE | Detailed activity of server transactions (similar to debug)
FINER | More detailed logs than FINE
FINEST (least) | The entire flow of events (similar to trace)
For example, let's take an appender from logging.properties and find out the log level used; the first log appender for localhost uses FINE as the log level, as shown in the following code snippet:
localhost.org.apache.juli.FileHandler.level = FINE
localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
localhost.org.apache.juli.FileHandler.prefix = localhost.
The following code shows the default file handler configuration for logging in Tomcat 7 using JULI.
The properties are listed here, with the log levels shown:
############################################################
# Facility specific properties.
# Provides extra control for each logger.
############################################################
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers = 3manager.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host-manager].handlers = 4host-manager.org.apache.juli.FileHandler
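Since JULI builds on java.util.logging, application or servlet code typically obtains its logger through the standard JDK API and lets the handlers and levels in logging.properties decide what actually gets written. The following is a minimal sketch of that usage; the InventoryServletHelper class, its method, and the log messages are hypothetical and are not part of Tomcat itself.
import java.util.logging.Level;
import java.util.logging.Logger;

public class InventoryServletHelper {

    // Under JULI, the logger name is matched against the facility-specific
    // properties in logging.properties to pick a handler and a level.
    private static final Logger LOGGER =
            Logger.getLogger(InventoryServletHelper.class.getName());

    public void reloadCatalog() {
        LOGGER.info("Reloading product catalog");        // visible at INFO and above
        LOGGER.fine("Cache warmed with 1200 entries");    // needs a level of FINE or lower to appear
        try {
            // ... reload work would go here ...
        } catch (RuntimeException e) {
            LOGGER.log(Level.SEVERE, "Catalog reload failed", e);
        }
    }
}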

Common performance issues

Packt
19 Jun 2014
16 min read
(For more resources related to this topic, see here.)
Threading performance issues
Threading performance issues are the issues related to concurrency, as follows:
- Lack of threading or excessive threading
- Threads blocking up to starvation (usually from competing on shared resources)
- Deadlock until the complete application hangs (threads waiting for each other)
Memory performance issues
Memory performance issues are the issues related to application memory management, as follows:
- Memory leakage: This issue is an explicit leakage or an implicit leakage, as seen in improper hashing
- Improper caching: This issue is due to over caching, an inadequate object size, or missing essential caching
- Insufficient memory allocation: This issue is due to missing JVM memory tuning
Algorithmic performance issues
Implementing the application logic requires two important parameters that are related to each other: correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows:
- Costly algorithmic logic
- Unnecessary logic
Work as designed performance issues
The work-as-designed performance issue is a group of issues related to the application design. The application behaves exactly as designed, but if the design has issues, it will lead to performance issues. Some examples of these performance issues are as follows:
- Using synchronous calls when asynchronous calls should be used
- Neglecting remoteness, that is, using remote calls as if they were local calls
- Improper loading technique, that is, eager versus lazy loading techniques
- Selection of the size of the object
- Excessive serialization layers
- Web services granularity
- Too much synchronization
- Non-scalable architecture, especially in the integration layer or middleware
- Saturated hardware on a shared infrastructure
Interfacing performance issues
Whenever the application is dealing with resources, we may face the following interfacing issues that could impact application performance:
- Using an old driver/library
- Missing frequent database housekeeping
- Database issues, such as missing database indexes
- Low-performing JMS or integration service bus
- Logging issues (excessive logging or not following the best practices while logging)
- Network component issues, that is, load balancer, proxy, firewall, and so on
Miscellaneous performance issues
Miscellaneous performance issues include different performance issues, as follows:
- Inconsistent performance of application components, for example, slow components can cause the whole application to slow down
- Introduced performance issues that delay the processing speed
- Improper configuration tuning of different components, for example, JVM, application server, and so on
- Application-specific performance issues, such as excessive validations, applying many business rules, and so on
Fake performance issues
Fake performance issues could be temporary issues or not even issues at all. Famous examples are as follows:
- Temporary networking issues
- Scheduled running jobs (detected from the associated pattern)
- Software automatic updates (these must be disabled in production)
- Non-reproducible issues
In the following sections, we will go through some of the listed issues.
Threading performance issues
Multithreading has the advantage of maximizing hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application.
For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources, or they will be blocked waiting for each other. We will discuss some of the common threading issues in the next section.
Blocking threads
A common issue is threads being blocked while waiting to obtain the monitor(s) of certain shared resources (objects) that are held by other threads. If most of the application server threads are consumed in such a blocked status, the application gradually becomes unresponsive to user requests. In the WebLogic application server, if a thread keeps executing for more than a configurable period of time (that is, it is not idle), it gets promoted to a stuck thread. The more threads there are in the stuck status, the more critical the server status becomes. Configuring the stuck thread parameters is part of WebLogic performance tuning.
Performance symptoms
The following symptoms usually appear in cases of thread blocking:
- Slow application response (increased single-request latency and pending user requests)
- Application server logs might show some stuck threads
- The server's health status becomes critical on monitoring tools (the application server console or different monitoring tools)
- Frequent application server restarts, either manual or automatic
- A thread dump shows a lot of threads in the blocked status waiting for different resources
- Application profiling shows a lot of thread blocking
An example of thread blocking
To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes to execute by adding the following additional lines:
long start = new Date().getTime();
..
..
long duration = new Date().getTime() - start;
System.err.println("total time = " + duration);
Now, try to execute the code with different thread pool sizes. We can try a thread pool size of 50 and of 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than with 50 threads! Let's now compare the NetBeans profiling results of both executions to understand the reason behind this unexpected difference. The following screenshot shows the profiling of 50 threads; we can see a lot of blocking in the Monitor column, with the percentage of time spent waiting for monitors at around 75 percent.
To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor, check all the available options, and then click on Run. The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, fewer threads compete on these resources.
Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results. Another factor that also affects the selection of the pool size, especially when the thread execution takes a long time, is the context switching overhead. This overhead requires the selection of an optimal pool size, usually related to the number of processors available to our application. Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters).
The context switching includes suspension of the current executing process, storing its current data, picking up the next process for execution according to its priority, and restoring its data. Although it's supported on the hardware level and is faster, most operating systems do this on the level of software context switching to improve the performance. The main reason behind this is the ability of the software context switching to selectively choose the required registers to save. Thread deadlock When many threads hold the monitor for objects that they need, this will result in a deadlock unless the implementation uses the new explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held. The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot: Usually, this happens if the order of obtaining the locks is not planned. For example, if we need to have a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest or the largest bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need to have a much more mature way to handle this by using the Lock interface or some other technique. Memory performance issues In spite of all this great effort put into the allocated and free memory in an optimized way, we still see memory issues in Java Enterprise applications mainly due to the way people are dealing with memory in these applications. We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching. Memory leakage Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly the design/coding issues where the object is no longer required but it remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to object size and involved scenarios), it may lead to an out of memory error. The most common example of memory leakage is adding objects to the static collections (or an instance collection of long living objects, such as a servlet) and forgetting to clean collections totally or partially. Performance symptoms The following symptoms are some of the expected performance symptoms during a memory leakage in our application: The application uses heap memory increased by time The response slows down gradually due to memory congestion OutOfMemoryError occurs frequently in the logs and sometimes an application server restart is required Aggressive execution of garbage collection activities Heap dump shows a lot of objects retained (from the leakage types) A sudden increase of memory paging as reported by the operating system monitoring tools An example of memory leakage We have a sample application ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage. 
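Before walking through ExampleTwo itself, here is a minimal, self-contained sketch of the servlet-instance style of leak described above. The servlet, field, and method names are hypothetical and are not taken from the sample project; the sketch assumes only the standard Servlet API on the classpath.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CatalogServlet extends HttpServlet {

    // A single servlet instance serves all requests, so this map lives for the
    // lifetime of the application. Nothing ever removes entries from it, which
    // is the classic leak pattern (a plain HashMap is also not thread-safe,
    // which compounds the problem under concurrent requests).
    private final Map<String, List<String>> perUserCache = new HashMap<>();

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        String sessionId = req.getSession().getId();
        // An entry accumulates for every session ever seen; when the session
        // expires, its entry stays referenced here and cannot be collected.
        perUserCache.computeIfAbsent(sessionId, id -> loadCatalogFor(id));
        // ... render the catalog page ...
    }

    private List<String> loadCatalogFor(String sessionId) {
        // Placeholder for an expensive lookup that the author meant to cache.
        return Arrays.asList("product-1", "product-2");
    }
}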
The following screenshot shows the product catalog browser page: One of the bad issues is the usage of the servlet instance (or static members), as it causes a lot of issues in multiple threads and has a common location for unnoticed memory leakages. We have added the following instance variable as a leakage location: private final HashMap<String, HashMap> cachingAllUsersCollection = new HashMap(); We will add some collections to the preceding code to cause memory leakage. We also used the caching in the session scope, which causes implicit leakage. The session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch. Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should only be restricted to the user-specific data. Also, forgetting to remove data that is not required from a session makes the memory utilization worse. Refer to the following code: @Stateful public class CacheSessionBean Instead of using a singleton class here or stateless bean with a static member, we used the Stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using View Objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on. It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti. We can use our knowledge in Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and details of adding some products to the basket: Executing the test plan using many concurrent users over many iterations will show the bad behavior of our application, where the used memory is increased by time. There is no justification as the catalog is the same for all users and there's no specific user data, except for the IDs of the selected products. Actually, it needs to be saved inside the user session, which won't take any remarkable memory space. In our example, we intend to save a lot of objects in the session, implement a wrong session level, cache, and implement meaningless servlet level caching. All this will contribute to memory leakage. This gradual increase in the memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!): Improper data caching Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues. The main concerns in the cache implementation are caching refresh rate, caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain the cached object's lock, and so on. Improper caching issue types The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections. No caching (disabled caching) Disabled caching will definitely cause a big load over the interfacing resources (for example, database) by hitting it in with almost every interaction. 
This should be avoided while designing an enterprise application; otherwise; the application won't be usable. Fortunately, this has less impact than using wrong caching implementation! Most of the application components such as database, JPA, and application servers already have an out-of-the-box caching support. Too small caching size Too small caching size is a common performance issue, where the cache size is initially determined but doesn't get reviewed with the increase of the application data. The cache sizing is affected by many factors such as the memory size. If it allows more caching and the type of the data, lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required under a very strict locking mechanism. Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on. As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. Also, the proper replacement policy contributes here; if we are working—as in our example—on an online product catalog, we may use the least recently used policy so all the old products will be removed, which makes sense as the users usually look for the new products. Monitoring of the caching utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the missed cache ratio is high, a cache resizing is required. Missed cache hits are very costive as they hit the cache first and then the resource itself (for example, database) to get the required object, and then add this loaded object into the cache again by releasing another object (if the cache is 100 percent), according to the used cache replacement policy. Too big caching size Too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some memory. This will increase the garbage collection pause time and decrease the cache throughput. If the cache throughput is decreased, the latency to get objects from the cache will increase causing the cache retrieval cost to be high to the level it might be slower than hitting the actual resources (for example, database). Using the wrong caching policy Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it. 
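The periodic monitoring suggested above only works if the cache records its own hits and misses. Here is a minimal sketch of that bookkeeping; the class name, counters, and the 50 percent threshold are illustrative assumptions, not values taken from any particular cache library.
import java.util.concurrent.atomic.AtomicLong;

public class CacheStats {

    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public void recordHit()  { hits.incrementAndGet(); }
    public void recordMiss() { misses.incrementAndGet(); }

    // Hit rate = hits / (hits + misses); a persistently low value suggests the
    // cache is undersized or the replacement policy does not match the access pattern.
    public double hitRate() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }

    public boolean needsResizing() {
        // Illustrative threshold only: flag the cache when fewer than half
        // of the lookups are served from memory.
        return hitRate() < 0.5;
    }
}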
Performance symptoms According to the cache issue type and different cache configurations, we will see the following symptoms: Decreased cache hit rate (and increased cache missed ratio) Increased cache loading because of the improper size Increased cache latency with a huge caching size Spiky pattern in the performance testing response time, in case the cache size is not correct, causes continuous invalidation and reloading of the cached objects An example of improper caching techniques In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, global cache is wrong, local cache is improper, and no cache invalidation is implemented. So, we can have stale objects inside the cache. Cache invalidation is the process of refreshing or updating the existing object inside the cache or simply removing it from the cache. So in the next load, it reflects its recent values. This is to keep the cached objects always updated. Cache hit rate is the rate or ratio in which cache hits match (finds) the required cached object. It is the main measure for cache effectiveness together with the retrieval cost. Cache miss rate is the rate or ratio at which the cache hits the required object that is not found in the cache. Last access time is the timestamp of the last access (successful hit) to the cached objects. Caching replacement policies or algorithms are algorithms implemented by a cache to replace the existing cached objects with other new objects when there are no rooms available for any additional objects. This follows missed cache hits for these objects. Some examples of these policies are as follows: First-in-first-out (FIFO): In this policy, the cached object is aged and the oldest object is removed in favor of the new added ones. Least frequently used (LFU): In this policy, the cache picks the less frequently used object to free the memory, which means the cache will record statistics against each cached object. Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information like the last access time of all cached objects. Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits the application where items are no longer needed after the access, such as used exam vouchers. Aging policy: Every object in the cache will have an age limit, and once it exceeds this limit, it will be removed from the cache in the simple type. In the advanced type, it will also consider the invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on. It is important for us to understand that caching is not our magic bullet and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to real application needs.
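As a concrete illustration of the least recently used policy suggested for the product catalog, plain Java can express an LRU cache with LinkedHashMap. This is only a sketch under stated assumptions: the capacity, key, and value types are arbitrary, and the class would need to be wrapped (for example, with Collections.synchronizedMap) before being shared between threads.
import java.util.LinkedHashMap;
import java.util.Map;

public class LruProductCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruProductCache(int maxEntries) {
        // accessOrder = true makes iteration order follow the last access,
        // which is exactly what an LRU replacement policy needs.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the configured size is exceeded.
        return size() > maxEntries;
    }
}

// Usage sketch (Product is a hypothetical domain class):
// LruProductCache<String, Product> cache = new LruProductCache<>(1000);
// cache.put(product.getId(), product);   // old, rarely read products fall out first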

Running Your Applications with AWS - Part 2

Cheryl Adams
19 Aug 2016
6 min read
An active account with AWS means you are on your way to building in the cloud. Before you start building, you need to tackle Billing and Cost Management, under Account. It is likely that you are starting with the Free Tier, so it is important to know that you still have the option of paying for additional services. Also, if you decide to continue with AWS, you should get familiar with this page. This is not your average bill or invoice page—it is much more than that. The Billing & Cost Management Dashboard is a bird's-eye view of all of your account activity. Once you start accumulating pay-as-you-go services, this page will give you a quick review of your monthly spending based on services. Part of managing your cloud services includes billing, so it is a good idea to become familiar with this from the start. Amazon also gives you the option of setting up cost-based alerts for your system, which is essential if you want to be alerted to any excessive cost related to your cloud services. Budgets allow you to receive e-mailed notifications or alerts if spending exceeds the budget that you have created. If you want to dig in even deeper, try turning on Cost Explorer for an analysis of your spending. The Billing and Cost Management section of your account is much more than just invoices. It is AWS's complete cost-management system for your cloud. Being familiar with all aspects of the cost-management system will help you to monitor your cloud services and, hopefully, avoid any expenses that may exceed your budget. In our previous discussion, we considered all AWS services. Let's take another look at the details of the services.
Amazon Web Services
Based on this illustration, you can see that the build options are grouped under headings such as Compute, Storage & Content Delivery, and Databases. Each of these objects or services lists a step-by-step routine that is easy to follow. Within the AWS site, there are numerous tutorials with detailed build instructions. If you are still exploring in the Free Tier, AWS also has an active online community of users who try to answer most questions. Let's look at the build process for Amazon's EC2 virtual server. The first thing that you will notice is that Amazon provides 22 different Amazon Machine Images (AMIs) to choose from (at the time this post was written). At the top of the screen is a step process that will guide you through the build. It should be noted that some of the images available are not part of the free-tier plan. The remaining images that do fit into the plan should fit almost any project need. For this walkthrough, let's select SUSE Linux (free tier eligible). It is important to note that just because the image itself is free, that does not mean all the options available within that image are free. Notice on this screen that Amazon has pre-selected the only free-tier option available for this image. From this screen you are given two options: Review and Launch, or Next: Configure Instance Details. Let's try Review and Launch to see what occurs. Notice that our step process advanced to Step 7. Amazon gives you a soft warning regarding the state of the build and the potential risks. If you are okay with these risks, you can proceed and launch your server. It is important to note that the Amazon build process is user driven. It will allow you to build a server with these potential risks in your cloud. It is recommended that you carefully consider each screen before proceeding.
In this instance, select Previous and not Cancel to return to Step 3. Selecting Cancel will stop the build process and return you to the AWS main services page. Until you actually launch your server, nothing is built or saved. There are information bubbles for each line in Step 3: Configure Instance Details. Review the content of each bubble, make any changes if needed, and then proceed to the next step. Select the storage size; then select Next: Tag Instance. Enter values and continue, or select Learn More for further information. Select the Next: Configure Security Group button. Security is an extremely important part of setting up your virtual server. It is recommended that you speak to your security administrator to determine the best option. For the source, it is recommended that you avoid using the Anywhere option. This selection will put your build at risk. Select My IP or a custom IP as shown. If you are following a self-study plan, you can select the Learn More link to determine the best option. Next: Review and Launch. The full details of this screen can be expanded, reviewed, or edited. If everything appears to be okay, proceed to Launch. One more screen will appear for adding private and/or public keys to access your new server. Make the appropriate selection and proceed to Launch Instances to see the build process. You can access your new server from the EC2 Dashboard. This example gives you a window into how the AWS build process works. The other objects and services have a similar step-through process. Once you have launched your server, you should be able to access it and proceed with your development. Additional details for development are also available through the site. Amazon's Web Services platform is an all-in-one solution for your graduation to the cloud. Not only can you manage your technical environment, but it also has features that allow you to manage your budget. By setting up your virtual appliances and servers appropriately, you can maximize the value of the first 12 months of your Free Tier. Carefully monitoring activities through alerts and notifications will help you to avoid any billing surprises. Going through the tutorials and visiting the online community will only increase your knowledge of AWS. AWS is inviting everyone to test their services on this exciting platform, so I would definitely recommend taking advantage of it. Have fun!
About the author
Cheryl Adams is a senior cloud data and infrastructure architect in the healthcare data realm. She is also the co-author of Professional Hadoop by Wrox.

Using Transactions with Asynchronous Tasks in JavaEE [Tutorial]

Aaron Lazar
31 Jul 2018
5 min read
Threading is a common issue in most software projects, no matter which language or other technology is involved. When talking about enterprise applications, things become even more important and sometimes harder. Using asynchronous tasks could be a challenge: what if you need to add some spice and add a transaction to it? Thankfully, the Java EE environment has some great features for dealing with this challenge, and this article will show you how. This article is an extract from the book Java EE 8 Cookbook, authored by Elder Moraes.
Usually, a transaction means something like code blocking. Isn't it awkward to combine two opposing concepts? Well, it's not! They can work together nicely, as shown here.
Adding the Java EE 8 dependency
Let's first add our Java EE 8 dependency:
<dependency>
    <groupId>javax</groupId>
    <artifactId>javaee-api</artifactId>
    <version>8.0</version>
    <scope>provided</scope>
</dependency>
Let's first create a User POJO:
public class User {

    private Long id;
    private String name;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public User(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public String toString() {
        return "User{" + "id=" + id + ", name=" + name + '}';
    }
}
And here is a slow bean that will return a User:
@Stateless
public class UserBean {

    public User getUser() {
        try {
            TimeUnit.SECONDS.sleep(5);
            long id = new Date().getTime();
            return new User(id, "User " + id);
        } catch (InterruptedException ex) {
            System.err.println(ex.getMessage());
            long id = new Date().getTime();
            return new User(id, "Error " + id);
        }
    }
}
Now we create a task to be executed that will return a User using some transaction features:
public class AsyncTask implements Callable<User> {

    private UserTransaction userTransaction;
    private UserBean userBean;

    @Override
    public User call() throws Exception {
        performLookups();
        try {
            userTransaction.begin();
            User user = userBean.getUser();
            userTransaction.commit();
            return user;
        } catch (IllegalStateException | SecurityException |
                HeuristicMixedException | HeuristicRollbackException |
                NotSupportedException | RollbackException |
                SystemException e) {
            userTransaction.rollback();
            return null;
        }
    }

    private void performLookups() throws NamingException {
        userBean = CDI.current().select(UserBean.class).get();
        userTransaction = CDI.current().select(UserTransaction.class).get();
    }
}
And finally, here is the service endpoint that will use the task to write the result to a response:
@Path("asyncService")
@RequestScoped
public class AsyncService {

    private AsyncTask asyncTask;

    @Resource(name = "LocalManagedExecutorService")
    private ManagedExecutorService executor;

    @PostConstruct
    public void init() {
        asyncTask = new AsyncTask();
    }

    @GET
    public void asyncService(@Suspended AsyncResponse response) {
        Future<User> result = executor.submit(asyncTask);

        while (!result.isDone()) {
            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException ex) {
                System.err.println(ex.getMessage());
            }
        }

        try {
            response.resume(Response.ok(result.get()).build());
        } catch (InterruptedException | ExecutionException ex) {
            System.err.println(ex.getMessage());
            response.resume(Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                    .entity(ex.getMessage()).build());
        }
    }
}
To try this code, just deploy it to GlassFish 5 and open this URL: http://localhost:8080/ch09-async-transaction/asyncService
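If you would rather exercise the endpoint from code than from a browser, a small client along the following lines should work. It is only a sketch: it assumes Java 11+ for the java.net.http package and the same local deployment URL used above.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AsyncServiceClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Same URL as the deployed ch09-async-transaction application above.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/ch09-async-transaction/asyncService"))
                .GET()
                .build();

        // The call blocks until the server resumes the suspended AsyncResponse,
        // so expect roughly the five seconds that UserBean.getUser() sleeps.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
        System.out.println("Body:   " + response.body());
    }
}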
How the Asynchronous execution works

The magic happens in the AsyncTask class, so we will first take a look at its performLookups() method. The recipe above obtains its collaborators through CDI; an equivalent JNDI-based version looks like this:

private void performLookups() throws NamingException {
    Context ctx = new InitialContext();
    userTransaction = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
    userBean = (UserBean) ctx.lookup("java:global/ch09-async-transaction/UserBean");
}

Either way, it gives you instances of both UserTransaction and UserBean from the application server, so you can relax and rely on objects that the container has already instantiated for you.

As our task implements the Callable<User> interface, it needs to implement the call() method:

@Override
public User call() throws Exception {
    performLookups();
    try {
        userTransaction.begin();
        User user = userBean.getUser();
        userTransaction.commit();
        return user;
    } catch (IllegalStateException | SecurityException |
             HeuristicMixedException | HeuristicRollbackException |
             NotSupportedException | RollbackException |
             SystemException e) {
        userTransaction.rollback();
        return null;
    }
}

You can think of Callable as a Runnable interface that returns a result.

Our transaction code lives here:

userTransaction.begin();
User user = userBean.getUser();
userTransaction.commit();

And if anything goes wrong, we roll the transaction back:

} catch (IllegalStateException | SecurityException |
         HeuristicMixedException | HeuristicRollbackException |
         NotSupportedException | RollbackException |
         SystemException e) {
    userTransaction.rollback();
    return null;
}

Now let's look at AsyncService. First, we have some declarations:

private AsyncTask asyncTask;

@Resource(name = "LocalManagedExecutorService")
private ManagedExecutorService executor;

@PostConstruct
public void init() {
    asyncTask = new AsyncTask();
}

We are asking the container to give us an instance of ManagedExecutorService, which is responsible for executing the task in the enterprise context. The init() method is annotated with @PostConstruct, so it runs once the bean has been constructed and instantiates the task.

Now we have our task execution:

@GET
public void asyncService(@Suspended AsyncResponse response) {
    Future<User> result = executor.submit(asyncTask);

    while (!result.isDone()) {
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException ex) {
            System.err.println(ex.getMessage());
        }
    }

    try {
        response.resume(Response.ok(result.get()).build());
    } catch (InterruptedException | ExecutionException ex) {
        System.err.println(ex.getMessage());
        response.resume(Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                .entity(ex.getMessage()).build());
    }
}

Note that the executor returns a Future<User>:

Future<User> result = executor.submit(asyncTask);

This means the task is executed asynchronously. We then check its execution status until it's done:

while (!result.isDone()) {
    try {
        TimeUnit.SECONDS.sleep(1);
    } catch (InterruptedException ex) {
        System.err.println(ex.getMessage());
    }
}

And once it's done, we write the result to the asynchronous response:

response.resume(Response.ok(result.get()).build());

The full source code of this recipe is available on GitHub. So now, using transactions with asynchronous tasks in Java EE isn't such a daunting task, is it? If you found this tutorial helpful and would like to learn more, head over to the book Java EE 8 Cookbook.
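One closing design note: the while loop in asyncService keeps a request thread busy polling the Future, so the task runs asynchronously but the calling thread still blocks until it finishes. Because ManagedExecutorService is a regular java.util.concurrent executor, a non-blocking variant can resume the suspended response from a callback instead. The sketch below is an illustration rather than part of the original recipe: the class name is made up, and it relies on the stateless bean's default container-managed transaction instead of the explicit UserTransaction handling shown above.

import java.util.concurrent.CompletableFuture;

import javax.annotation.Resource;
import javax.enterprise.concurrent.ManagedExecutorService;
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.Suspended;
import javax.ws.rs.core.Response;

// Hypothetical non-blocking variant of the endpoint (not from the book).
@Path("asyncServiceNonBlocking")
@RequestScoped
public class AsyncServiceNonBlocking {

    @Inject
    private UserBean userBean;

    @Resource(name = "LocalManagedExecutorService")
    private ManagedExecutorService executor;

    @GET
    public void asyncService(@Suspended AsyncResponse response) {
        // Run the slow call on the managed executor and resume the suspended
        // response when it completes, instead of polling the Future in a loop.
        CompletableFuture
                .supplyAsync(userBean::getUser, executor)
                .thenAccept(user -> response.resume(Response.ok(user).build()))
                .exceptionally(ex -> {
                    response.resume(Response
                            .status(Response.Status.INTERNAL_SERVER_ERROR)
                            .entity(ex.getMessage())
                            .build());
                    return null;
                });
    }
}

With this shape, the GET method returns immediately and no thread sleeps while UserBean.getUser() does its five seconds of work.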

Scribus: Creating a Layout

Packt
07 Jan 2011
9 min read
Scribus 1.3.5: Beginner's Guide
Create optimum page layouts for your documents using the productive tools of Scribus.

- Master desktop publishing with Scribus
- Create professional-looking documents with ease
- Enhance the readability of your documents using the powerful layout tools of Scribus

Packed with interesting examples and screenshots that show you the most important Scribus tools to create and publish your documents.

Creating a new layout

Creating a layout in Scribus means dealing with the New Document window. It is not a complex window, but be aware that many of the things you set here should be considered definitive. Even if these settings look simple or obvious, treat them all as important. Some of them, like the page size, imply that you already have an idea of the final document, or at least that you've already made some choices that won't change after it is created. Of course, Scribus will let you change them later if you change your mind, but many things you will have done in the meantime will simply have to be done again.

Time for action – setting page size, paper size, and margins

This window is the first that opens when you launch Scribus or when you go to the File | New menu. It contains several options that need to be set. First among them will certainly be the page size. In our case, a business card, 54x85mm is the usual size (in the USA: 51×89mm). When you type the measurements in the Width and Height fields, the Size option, which contains the common size presets, is automatically switched to Custom. If you want to use a different unit, just change the Default Unit value placed below. We usually prefer Millimeters (mm), which is precise enough without needing too many decimals.

Then you can set the margins for your document. Professional printers are very different from desktop printers in that they can print without margins; in fact, consider margins as helpers for placing objects. For a small document like a business card, small 4mm margins will be fine.

What just happened?

Some common page sizes are: the ISO A series (the biggest being A0 at 841x1189mm, that is 1m², with each following size being half of the previous one), and the US formats, especially Letter (216x279mm), Legal (216x356mm), and Tabloid (approximately 279x432mm, or 11x17in), which is commonly used in the UK for newspapers.

The best business card size
When choosing the size for a business card, consider the sizes that are already in common use. Is the ISO 54x85.6mm better than the US 2x3.5in, the European 55x85mm, or the Australian 55x90mm, when only a few millimeters separate them? The best choice is certainly to match the size most commonly used in your country. Remember one thing: a business card has to be easy to store and sort. Picking an uncommon format may simply mean that no one can fit your card in their wallet.

Presets will be useful if you want to print locally, but don't forget that your print company crops the paper to the size you want, so don't hesitate to be creative and do some testing. For example, you might print your final document on A3 paper, or on A3+ as the real printing size so that you'll be able to use bleeds, as we'll explain in the following sections. Here we're talking about the page size and not the paper size, which can be double if the Document Layout is set to any option other than Single Page. For all folded documents the page size differs from the paper size, so keep that in mind.
For now, choose 54x85.6mm in landscape: just set 54 as the height, or change the orientation button if you haven't already. The other setting that might interest you is the margins. In Scribus, consider margins as helpers; in fact, nobody in the professional print process needs them. They are useful for desktop printers, which can't print right up to the sheet border. As our example is much smaller than the usual paper size, we won't have any trouble with it.

Scribus has some margin presets, but they are available only when a layout other than Single Page is selected. For our model, 4mm on each side would be fine; if you want to set all the fields at once, just click on the chain button at the right-hand side of the margin fields. But actually, since we won't have much to write, it would be nice if our margins could help position the text. So let's define the margins as follows:

- Left: 10mm
- Right: 40mm
- Top: 30mm
- Bottom: 2mm

Choosing a layout

We've already talked about this option several times, but here we are again. What kind of layout should you choose?

- Single Page simulates what you might have in a word processor. You can have as many pages as you want, but they will be printed one page after another. This is the result you'll get when printing with your desktop printer.
- Double-sided is the option you'll use when you need a folded document. It is useful for magazines, newsletters, books, and similar documents. In this layout, the reader sees two pages side by side at once, and you can easily manage elements that overlap both pages. The fold will be in the exact middle. Unless you have a small document size like A5 or smaller, this layout is usually intended to be printed by a professional.
- 3-Fold and 4-Fold are more for small commercial brochures. Usually, you won't use them in Scribus and will prefer a Single Page layout that you divide later into three or four parts. Why? Because with the fold layouts, Scribus considers each "fold" as a page and prints each of them on a separate sheet, which is a bit tricky.

You can see that for a business card, where no fold is needed, the Single Page layout is our choice.

For the moment we won't need the other options, so you can click on OK. You'll get a white rectangle on a greyish workspace. The red outline is the selection indicator for the selected page and shows the borders of the page. The blue rectangle shows where the margins are placed.

Save the document as often as possible

"Save the document as often as possible": this is the first commandment of any software user, but in Scribus it is even more important, for several reasons:

- First of all, apologies: Scribus is a very nice piece of software but still not perfect (but which one is?). It can crash sometimes, slightly more often than you'd wish, and never at a time you would expect or appreciate. Saving often will spare you from redoing work you've already done during the day.
- The Scribus undo system covers layout actions but not text manipulations. Saving often can be helpful if you make mistakes that you can't undo.

In Scribus, we use File | Save As (or Ctrl + Shift + S) to set the document name and format. It's very simple because you have no choice other than Scribus Documents (*.sla). In the list you will also see sla.gz, which is used when the Compress File checkbox is selected. Usually, a Scribus file is not that large and there is no real need to compress it.
Of course, if the file already exists, Scribus asks whether you want to overwrite the previous one.

Scribus file version
Each Scribus release has enhanced the file format so that it can store the new features. When saving, however, you cannot choose a version: Scribus always uses the current one. Every document can be opened in future Scribus releases but not in older ones, so be careful when you need to send a file to someone or when you work on several computers.

Once you've used Save As, you just have to save (File | Save), or more simply press Ctrl + S, and the modifications will automatically be added to the saved document.

The extra Save as Template menu stores the current file in a special Scribus folder. When you want to create a new document with the same overall look, you can go to the New from Template menu and pick it from the list. Some default templates are available there, but yours might be better. Saving as a template is not part of the everyday saving routine; it is done once per template, at the end, when the basics of your layout are in place. So we'll use it at the end of our tutorial.

Basic frames for text and images

The biggest part of a design job is adding frames, setting their visual aspect, and importing content into them. On our business card we'll need a logo, a name, and some other information. You may also add a photo.

Time for action – adding the logo

There are several types of graphic elements in a layout, and the logo is of course one of the most important. Generally, we prefer using vector logos in SVG or EPS. Let's import one:

1. In the File menu, choose File | Import | Get Vector File.
2. The cursor changes, and you can click on the page where you want to place the logo. Try to click at the upper-left corner of the margins. It will certainly not be placed exactly right, and the logo may be too big; we'll soon see how to change that.
3. A warning will appear, informing you that some SVG features are not supported. There is no option other than clicking on OK, and everything should be fine.

What just happened?

The logo is the centerpiece of the card: it helps identify where the contact comes from and is, in some ways, a company's most important mark of recognition. Usually, the logo is the only graphical element on the card. It can be placed anywhere you want, but the upper left-hand corner is generally the place of choice.