Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials - Programming

1081 Articles
article-image-welcome-spring-framework
Packt
30 Apr 2015
17 min read
Save for later

Welcome to the Spring Framework

Packt
30 Apr 2015
17 min read
In this article by Ravi Kant Soni, author of the book Learning Spring Application Development, you will be closely acquainted with the Spring Framework. Spring is an open source framework created by Rod Johnson to address the complexity of enterprise application development. Spring is now a long time de facto standard for Java enterprise software development. The framework was designed with developer productivity in mind and this makes it easier to work with the existing Java and JEE APIs. Using Spring, we can develop standalone applications, desktop applications, two tier applications, web applications, distributed applications, enterprise applications, and so on. (For more resources related to this topic, see here.) Features of the Spring Framework Lightweight: Spring is described as a lightweight framework when it comes to size and transparency. Lightweight frameworks reduce complexity in application code and also avoid unnecessary complexity in their own functioning. Non intrusive: Non intrusive means that your domain logic code has no dependencies on the framework itself. Spring is designed to be non intrusive. Container: Spring's container is a lightweight container, which contains and manages the life cycle and configuration of application objects. Inversion of control (IoC): Inversion of Control is an architectural pattern. This describes the Dependency Injection that needs to be performed by external entities instead of creating dependencies by the component itself. Aspect-oriented programming (AOP): Aspect-oriented programming refers to the programming paradigm that isolates supporting functions from the main program's business logic. It allows developers to build the core functionality of a system without making it aware of the secondary requirements of this system. JDBC exception handling: The JDBC abstraction layer of the Spring Framework offers a exceptional hierarchy that simplifies the error handling strategy. Spring MVC Framework: Spring comes with an MVC web application framework to build robust and maintainable web applications. Spring Security: Spring Security offers a declarative security mechanism for Spring-based applications, which is a critical aspect of many applications. ApplicationContext ApplicationContext is defined by the org.springframework.context.ApplicationContext interface. BeanFactory provides a basic functionality, while ApplicationContext provides advance features to our spring applications, which make them enterprise-level applications. Create ApplicationContext by using the ClassPathXmlApplicationContext framework API. This API loads the beans configuration file and it takes care of creating and initializing all the beans mentioned in the configuration file: import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext;   public class MainApp {   public static void main(String[] args) {      ApplicationContext context =    new ClassPathXmlApplicationContext("beans.xml");      HelloWorld helloWorld =    (HelloWorld) context.getBean("helloworld");      helloWorld.getMessage(); } } Autowiring modes There are five modes of autowiring that can be used to instruct Spring Container to use autowiring for Dependency Injection. You use the autowire attribute of the <bean/> element to specify the autowire mode for a bean definition. The following table explains the different modes of autowire: Mode Description no By default, the Spring bean autowiring is turned off, meaning no autowiring is to be performed. You should use the explicit bean reference called ref for wiring purposes. byName This autowires by the property name. If the bean property is the same as the other bean name, autowire it. The setter method is used for this type of autowiring to inject dependency. byType Data type is used for this type of autowiring. If the data type bean property is compatible with the data type of the other bean, autowire it. Only one bean should be configured for this type in the configuration file; otherwise, a fatal exception will be thrown. constructor This is similar to the byType autowire, but here a constructor is used to inject dependencies. autodetect Spring first tries to autowire by constructor; if this does not work, then it tries to autowire by byType. This option is deprecated. Stereotype annotation Generally, @Component, a parent stereotype annotation, can define all beans. The following table explains the different stereotype annotations: Annotation Use Description @Component Type This is a generic stereotype annotation for any Spring-managed component. @Service Type This stereotypes a component as a service and is used when defining a class that handles the business logic. @Controller Type This stereotypes a component as a Spring MVC controller. It is used when defining a controller class, which composes of a presentation layer and is available only on Spring MVC. @Repository Type This stereotypes a component as a repository and is used when defining a class that handles the data access logic and provide translations on the exception occurred at the persistence layer. Annotation-based container configuration For a Spring IoC container to recognize annotation, the following definition must be added to the configuration file: <?xml version="1.0" encoding="UTF-8"?> <beans xsi_schemaLocation="http://www.springframework.org/schema/beans    http://www.springframework.org/schema/beans/spring-beans.xsd    http://www.springframework.org/schema/context    http://www.springframework.org/schema/context/spring-context-    3.2.xsd">   <context:annotation-config />                             </beans> Aspect-oriented programming (AOP) supports in Spring AOP is used in Spring to provide declarative enterprise services, especially as a replacement for EJB declarative services. Application objects do what they're supposed to do—perform business logic—and nothing more. They are not responsible for (or even aware of) other system concerns, such as logging, security, auditing, locking, and event handling. AOP is a methodology of applying middleware services, such as security services, transaction management services, and so on on the Spring application. Declaring an aspect An aspect can be declared by annotating the POJO class with the @Aspect annotation. This aspect is required to import the org.aspectj.lang.annotation.aspect package. The following code snippet represents the aspect declaration in the @AspectJ form: import org.aspectj.lang.annotation.Aspect; import org.springframework.stereotype.Component;   @Aspect @Component ("myAspect") public class AspectModule { // ... } JDBC with the Spring Framework The DriverManagerDataSource class is used to configure the DataSource for application, which is defined in the Spring.xml configuration file. The central class of Spring JDBC's abstraction framework is the JdbcTemplate class that includes the most common logic in using the JDBC API to access data (such as handling the creation of connection, creation of statement, execution of statement, and release of resources). The JdbcTemplate class resides in the org.springframework.jdbc.core package. JdbcTemplate can be used to execute different types of SQL statements. DML is an abbreviation of data manipulation language and is used to retrieve, modify, insert, update, and delete data in a database. Examples of DML are SELECT, INSERT, or UPDATE statements. DDL is an abbreviation of data definition language and is used to create or modify the structure of database objects in a database. Examples of DDL are CREATE, ALTER, and DROP statements. The JDBC batch operation in Spring The JDBC batch operation allows you to submit multiple SQL DataSource to process at once. Submitting multiple SQL DataSource together instead of separately improves the performance: JDBC with batch processing Hibernate with the Spring Framework Data persistence is an ability of an object to save its state so that it can regain the same state. Hibernate is one of the ORM libraries that is available to the open source community. Hibernate is the main component available for a Java developer with features such as POJO-based approach and supports relationship definitions. The object query language used by Hibernate is called as Hibernate Query Language (HQL). HQL is an SQL-like textual query language working at a class level or a field level. Let's start learning the architecture of Hibernate. Hibernate annotations is the powerful way to provide the metadata for the object and relational table mapping. Hibernate provides an implementation of the Java Persistence API so that we can use JPA annotations with model beans. Hibernate will take care of configuring it to be used in CRUD operations. The following table explains JPA annotations: JPA annotation Description @Entity The javax.persistence.Entity annotation is used to mark a class as an entity bean that can be persisted by Hibernate, as Hibernate provides the JPA implementation. @Table The javax.persistence.Table annotation is used to define table mapping and unique constraints for various columns. The @Table annotation provides four attributes, which allows you to override the name of the table, its catalogue, and its schema. This annotation also allows you to enforce unique constraints on columns in the table. For now, we will just use the table name as Employee. @Id Each entity bean will have a primary key, which you annotate on the class with the @Id annotation. The javax.persistence.Id annotation is used to define the primary key for the table. By default, the @Id annotation will automatically determine the most appropriate primary key generation strategy to be used. @GeneratedValue javax.persistence.GeneratedValue is used to define the field that will be autogenerated. It takes two parameters, that is, strategy and generator. The GenerationType.IDENTITY strategy is used so that the generated id value is mapped to the bean and can be retrieved in the Java program. @Column javax.persistence.Column is used to map the field with the table column. We can also specify the length, nullable, and uniqueness for the bean properties. Object-relational mapping (ORM, O/RM, and O/R mapping) ORM stands for Object-relational Mapping. ORM is the process of persisting objects in a relational database such as RDBMS. ORM bridges the gap between object and relational schemas, allowing object-oriented application to persist objects directly without having the need to convert object to and from a relational format: Hibernate Query Language (HQL) Hibernate Query Language (HQL) is an object-oriented query language that works on persistence object and their properties instead of operating on tables and columns. To use HQL, we need to use a query object. Query interface is an object-oriented representation of HQL. The query interface provides many methods; let's take a look at a few of them: Method Description public int executeUpdate() This is used to execute the update or delete query public List list() This returns the result of the relation as a list public Query setFirstResult(int rowno) This specifies the row number from where a record will be retrieved public Query setMaxResult(int rowno) This specifies the number of records to be retrieved from the relation (table) public Query setParameter(int position, Object value) This sets the value to the JDBC style query parameter public Query setParameter(String name, Object value) This sets the value to a named query parameter The Spring Web MVC Framework Spring Framework supports web application development by providing comprehensive and intensive support. The Spring MVC framework is a robust, flexible, and well-designed framework used to develop web applications. It's designed in such a way that development of a web application is highly configurable to Model, View, and Controller. In an MVC design pattern, Model represents the data of a web application, View represents the UI, that is, user interface components, such as checkbox, textbox, and so on, that are used to display web pages, and Controller processes the user request. Spring MVC framework supports the integration of other frameworks, such as Struts and WebWork, in a Spring application. This framework also helps in integrating other view technologies, such as Java Server Pages (JSP), velocity, tiles, and FreeMarker in a Spring application. The Spring MVC Framework is designed around a DispatcherServlet. The DispatcherServlet dispatches the http request to handler, which is a very simple controller interface. The Spring MVC Framework provides a set of the following web support features: Powerful configuration of framework and application classes: The Spring MVC Framework provides a powerful and straightforward configuration of framework and application classes (such as JavaBeans). Easier testing: Most of the Spring classes are designed as JavaBeans, which enable you to inject the test data using the setter method of these JavaBeans classes. The Spring MVC framework also provides classes to handle the Hyper Text Transfer Protocol (HTTP) requests (HttpServletRequest), which makes the unit testing of the web application much simpler. Separation of roles: Each component of a Spring MVC Framework performs a different role during request handling. A request is handled by components (such as controller, validator, model object, view resolver, and the HandlerMapping interface). The whole task is dependent on these components and provides a clear separation of roles. No need of the duplication of code: In the Spring MVC Framework, we can use the existing business code in any component of the Spring MVC application. Therefore, no duplicity of code arises in a Spring MVC application. Specific validation and binding: Validation errors are displayed when any mismatched data is entered in a form. DispatcherServlet in Spring MVC The DispatcherServlet of the Spring MVC Framework is an implementation of front controller and is a Java Servlet component for Spring MVC applications. DispatcherServlet is a front controller class that receives all incoming HTTP client request for the Spring MVC application. DispatcherServlet is also responsible for initializing the framework components that will be used to process the request at various stages. The following code snippet declares the DispatcherServlet in the web.xml deployment descriptor: <servlet> <servlet-name>SpringDispatcher</servlet-name> <servlet-class>    org.springframework.web.DispatcherServlet </servlet-class> <load-on-startup>1</load-on-startup> </servlet>   <servlet-mapping> <servlet-name>SpringDispatcher</servlet-name> <url-pattern>/</url-pattern> </servlet-mapping> In the preceding code snippet, the user-defined name of the DispatcherServlet class is SpringDispatcher, which is enclosed with the <servlet-name> element. When our newly created SpringDispatcher class is loaded in a web application, it loads an application context from an XML file. DispatcherServlet will try to load the application context from a file named SpringDispatcher-servlet.xml, which will be located in the application's WEB-INF directory: <beans xsi_schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context- 3.0.xsd http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-3.0.xsd">   <mvc:annotation-driven />   <context:component-scan base- package="org.packt.Spring.chapter7.springmvc" />   <beanclass="org.springframework.web.servlet.view. InternalResourceViewResolver">    <property name="prefix" value="/WEB-INF/views/" />    <property name="suffix" value=".jsp" /> </bean>   </beans> Spring Security The Spring Security framework is the de facto standard to secure Spring-based applications. The Spring Security framework provides security services for enterprise Java software applications by handling authentication and authorization. The Spring Security framework handles authentication and authorization at the web request and the method invocation level. The two major operations provided by Spring Security are as follows: Authentication: Authentication is the process of assuring that a user is the one who he/she claims to be. It's a combination of identification and verification. The identification process can be performed in a number of different ways, that is, username and password that can be stored in a database, LDAP, or CAS (single sign-out protocol), and so on. Spring Security provides a password encoder interface to make sure that the user's password is hashed. Authorization: Authorization provides access control to an authenticated user. It's the process of assurance that the authenticated user is allowed to access only those resources that he/she is authorized for use. Let's take a look at an example of the HR payroll application, where some parts of the application have access to HR and to some other parts, all the employees have access. The access rights given to user of the system will determine the access rules. In a web-based application, this is often done by URL-based security and is implemented using filters that play an primary role in securing the Spring web application. Sometimes, URL-based security is not enough in web application because URLs can be manipulated and can have relative pass. So, Spring Security also provides method level security. An authorized user will only able to invoke those methods that he is granted access for. Securing web application's URL access HttpServletRequest is the starting point of Java's web application. To configure web security, it's required to set up a filter that provides various security features. In order to enable Spring Security, add filter and their mapping in the web.xml file: <!—Spring Security --> <filter> <filter-name>springSecurityFilterChain</filter-name> <filter-class>org.springframework.web.filter. DelegatingFilterProxy</filter-class> </filter>   <filter-mapping> <filter-name>springSecurityFilterChain</filter-name> <url-pattern>/*</url-pattern> </filter-mapping> Logging in to a web application There are multiple ways supported by Spring security for users to log in to a web application: HTTP basic authentication: This is supported by Spring Security by processing the basic credentials presented in the header of the HTTP request. It's generally used with stateless clients, who on each request pass their credential. Form-based login service: Spring Security supports the form-based login service by providing a default login form page for users to log in to the web application. Logout service: Spring Security supports logout services that allow users to log out of this application. Anonymous login: This service is provided by Spring Security that grants authority to an anonymous user, such as a normal user. Remember-me support: This is also supported by Spring Security and remembers the identity of a user across multiple browser sessions. Encrypting passwords Spring Security supports some hashing algorithms such as MD5 (Md5PasswordEncoder), SHA (ShaPasswordEncoder), and BCrypt (BCryptPasswordEncoder) for password encryption. To enable the password encoder, use the <password-encoder/> element and set the hash attribute, as shown in the following code snippet: <authentication-manager> <authentication-provider>    <password-encoder hash="md5" />    <jdbc-user-service data-source-    ref="dataSource"    . . .   </authentication-provider> </authentication-manager> Mail support in the Spring Framework The Spring Framework provides a simplified API and plug-in for full e-mail support, which minimizes the effect of the underlying e-mailing system specifications. The Sprig e-mail supports provide an abstract, easy, and implementation independent API to send e-mails. The Spring Framework provides an API to simplify the use of the JavaMail API. The classes handle the initialization, cleanup operations, and exceptions. The packages for the JavaMail API provided by the Spring Framework are listed as follows: Package Description org.springframework.mail This defines the basic set of classes and interfaces to send e-mails. org.springframework.mail.java This defines JavaMail API-specific classes and interfaces to send e-mails. Spring's Java Messaging Service (JMS) Java Message Service is a Java Message-oriented middleware (MOM) API responsible for sending messages between two or more clients. JMS is a part of the Java enterprise edition. JMS is a broker similar to a postman who acts like a middleware between the message sender and the receiver. Message is nothing, but just bytes of data or information exchanged between two parties. By taking different specifications, a message can be described in various ways. However, it's nothing, but an entity of communication. A message can be used to transfer a piece of information from one application to another, which may or may not run on the same platform. The JMS application Let's look at the sample JMS application pictorial, as shown in the following diagram: We have a Sender and a Receiver. The Sender is responsible for sending a message and the Receiver is responsible for receiving a message. We need a broker or MOM between the Sender and Receiver, who takes the sender's message and passes it from the network to the receiver. Message oriented middleware (MOM) is basically an MQ application such as ActiveMQ or IBM-MQ, which are two different message providers. The sender promises loose coupling and it can be .NET or mainframe-based application. The receiver can be Java or Spring-based application and it sends back the message to the sender as well. This is a two-way communication, which is loosely coupled. Summary This article covered the architecture of Spring Framework and how to set up the key components of the Spring application development environment. Resources for Article: Further resources on this subject: Creating an Extension in Yii 2 [article] Serving and processing forms [article] Time Travelling with Spring [article]
Read more
  • 0
  • 0
  • 9351

article-image-containerizing-web-application-docker-part-1
Darwin Corn
10 Jun 2016
4 min read
Save for later

Containerizing a Web Application with Docker Part 1

Darwin Corn
10 Jun 2016
4 min read
Congratulations, you’ve written a web application! Now what? Part one of this post deals with steps to take after development, more specifically the creation of a Docker image that contains the application. In part two, I’ll lay out deploying that image to the Google Cloud Platform as well as some further reading that'll help you descend into the rabbit hole that is DevOps. For demonstration purposes, let’s say that you’re me and you want to share your adventures in TrapRap and Death Metal (not simultaneously, thankfully!) with the world. I’ve written a simple Ember frontend for this purpose, and through the course of this post, I will explain to you how I go about containerizing it. Of course, the beauty of this procedure is that it will work with any frontend application, and you are certainly welcome to Bring Your Own Code. Everything I use is publically available on GitHub, however, and you’re certainly welcome to work through this post with the material presented as well. So, I’ve got this web app. You can get it here, or you can run: $ git clone https://github.com/ndarwincorn/docker-demo.git Do this for wherever it is you keep your source code. You’ll need ember-cli and some familiarity with Ember to customize it yourself, or you can just cut to the chase and build the Docker image, which is what I’m going to do in this post. I’m using Docker 1.10, but there’s no reason this wouldn’t work on a Mac running Docker Toolbox (or even Boot2Docker, but don’t quote me on that) or a less bleeding edge Linux distro. Since installing Docker is well documented, I won’t get into that here and will continue with the assumption that you have a working, up-to-date Docker installed on your machine, and that the Docker daemon is running. If you’re working with your own app, feel free to skip below to my explanation of the process and then come back here when you’ve got a Dockerfile in the root of your application. In the root of the application, run the following (make sure you don’t have any locally-installed web servers listening on port 80 already): # docker build -t docker-demo . # docker run -d -p 80:80 --name demo docker-demo Once the command finishes by printing a container ID, launch a web browser and navigate to http://localhost. Hey! Now you can listen to my music served from a LXC container running on your very own computer. How did we accomplish this? Let’s take it piece-by-piece (here’s where to start reading again if you’ve approached this article with your own app): I created a simple Dockerfile using the official Nginx image because I have a deep-seated mistrust of Canonical and don’t want to use the Dockerfile here. Here’s what it looks like in my project: docker-demo/Dockerfile FROM nginx COPY dist/usr/share/nginx/html Running the docker build command reads the Dockerfile and uses it to configure a docker image based on the nginx image. During image configuration, it copies the contents of the dist folder in my project to /srv/http/docker-demo in the container, which the nginx configuration that was mentioned is pointed to. The -t flag tells Docker to ‘tag’ (name) the image we’ve just created as ‘docker-demo’. The docker run command takes that image and builds a container from it. The -d flag is short for ‘detach’, or run the /usr/bin/nginx command built into the image from our Dockerfile and leave the container running. The -p flag maps a port on the host to a port in the container, and --name names the container for later reference. The command should return a container ID that can be used to manipulate it later. In part two, I’ll show you how to push the image we created to the Google Cloud Platform and then launch it as a container in a specially-purposed VM on their Compute Engine. About the Author Darwin Corn is a Systems Analyst for the Consumer Direct Care Network. He is a mid-level professional with diverse experience in the Information Technology world.
Read more
  • 0
  • 0
  • 9350

article-image-overview-process-management-microsoft-visio-2013
Packt
20 Nov 2013
6 min read
Save for later

Overview of Process Management in Microsoft Visio 2013

Packt
20 Nov 2013
6 min read
(For more resources related to this topic, see here.) When Visio was first conceived of over 20 years ago, its first stated marketing aim was to outsell ABC Flowcharter, the best-selling process diagramming tool at the time. Therefore, Visio had to have all of the features from the start that are core in the creation of flowcharts, namely the ability to connect one shape to another and to have the lines route themselves around shapes. Visio soon achieved its aim, and looked for other targets to reach. So, process flow diagrams have long been a cornerstone of Visio's popularity and appeal and, although there have been some usability improvements over the years, there have been few enhancements to turn the diagrams into models that can be managed efficiently. Microsoft Visio 2010 saw the introduction of two features, structured diagrams and validation rules, that make process management achievable and customizable, and Microsoft Visio 2013 sees these features enhanced. In this article, you will be introduced to the new features that have been added to Microsoft Visio to support structured diagrams and validation. You will see where Visio fits in the Process Management stack, and explore the relevant out of the box content. Exploring the new process management features in Visio 2013 Firstly, Microsoft Visio 2010 introduced a new Validation API for structured diagrams and provided several examples of this in use, for example with the BPMN (Business Process Modeling Notation) Diagram and Microsoft SharePoint Workflow templates and the improvements to the Basic Flowchart and Cross-Functional Flowchart templates, all of which are found in the Flowchart category. Microsoft Visio 2013 has updated the version of BPMN from 1.1 to 2.0, and has introduced a new SharePoint 2013 Workflow template, in addition to the 2010 one. Templates in Visio consist of a predefined Visio document that has one or more pages, and may have a series of docked stencils (usually positioned on the left-hand side of workspace area). The template document may have an associated list of add-ons that are active while it is in use, and, with Visio 2013 Professional edition, an associated list of structured diagram validation rulesets as well. Most of the templates that contain validation rules in Visio 2013 are in the Flowchart category, as seen in the following screenshot, with the exception being the Six Sigma template in the Business category. Secondly, the concept of a Subprocess was introduced in Visio 2010. This enables processes to hyperlink to other pages describing the subprocesses in the same document, or even across documents. This latter point is necessary if subprocesses are stored in a document library, such as Microsoft SharePoint. The following screenshot illustrates how an existing subprocess can be associated with a shape in a larger process, selecting an existing shape in the diagram, before selecting the existing page that it links to from the drop-down menu on the Link to Existing button. In addition, a subprocess page can be created from an existing shape, or a selection of shapes, in which case they will be moved to the newly-created page. There were also a number of ease-of-use features introduced in Microsoft Visio 2010 to assist in the creation and revision of process flow diagrams. These include: Easy auto-connection of shapes Aligning and spacing of shapes Insertion and deletion of connected shapes Improved cross-functional flowcharts Subprocesses An infinite page option, so you need not go over the edge of the paper ever again Microsoft Visio 2013 has added two more notable features: Commenting (a replacement for the old reviewer's comments) Co-authoring However, this book is not about teaching the user how to use these features, since there will be many other authors willing to show you how to perform tasks that only need to be explained once. This book is about understanding the Validation API in particular, so that you can create, or amend, the rules to match the business logic that your business requires. Reviewing Visio Process Management capabilities Microsoft Visio now sits at the top of the Microsoft Process Management Product Stack, providing a Business Process Analysis (BPA) or Business Process Modeling (BPM) tool for business analysts, process owners/participants, and line of business software architects/developers. Understanding the Visio BMP Maturity Model If we look at the Visio BPM Maturity Model that Microsoft has previously presented to its partners, then we can see that Visio 2013 has filled some of the gaps that were still there after Visio 2010. However, we can also see that there are plenty of opportunities for partners to provide solutions on top of the Visio platform. The maturity model shows how Visio initially provided the means to capture paper-drawn business processes into electronic format, and included the ability to encapsulate data into each shape and infer the relationship and order between elements through connectors. Visio 2007 Professional added the ability to easily link shapes, which represent processes, tasks, decisions, gateways, and so on with a data source. Along with that, data graphics were provided to enable shape data to be displayed simply as icons, data bars, text, or to be colored by value. This enriched the user experience and provided quicker visual representation of data, thus increasing the comprehension of the data in the diagrams. Generic templates for specific types of business modeling were provided. Visio had a built-in report writer for many versions, which provided the ability to export to Excel or XML, but Visio 2010 Premium introduced the concept of validation and structured diagrams, which meant that the information could be verified before exporting. Some templates for specific types of business modeling were provided. Visio 2010 Premium also saw the introduction of Visio Services on SharePoint that provided the automatic (without involving the Visio client) refreshing of data graphics that were linked to specific types of data sources. Throughout this book we will be going into detail about Level 5 (Validation) in Visio 2013, because it is important to understand the core capabilities provided in Visio 2013. We will then be able to take the opportunity to provide custom Business Rule Modeling and Visualization. Reviewing the foundations of structured diagramming A structured diagram is a set of logical relationships between items, where these relationships provide visual organization or describe special interaction behaviors between them. The Microsoft Visio team analyzed the requirements for adding structure to diagrams and came up with a number of features that needed to be added to the Visio product to achieve this: Container Management: The ability to add labeled boxes around shapes to visually organize them Callout Management: The ability to associate callouts with shapes to display notes List Management: To provide order to shapes within a container Validation API: The ability to test the business logic of a diagram Connectivity API: The ability to create, remove, or traverse connections easily The following diagram demonstrates the use of Containers and Callouts in the construction of a basic flowchart, that has been validated using the Validation API, which in turn uses the Connectivity API.
Read more
  • 0
  • 0
  • 9330

article-image-working-geo-spatial-data-python
Packt
30 Dec 2010
7 min read
Save for later

Working with Geo-Spatial Data in Python

Packt
30 Dec 2010
7 min read
Python Geospatial Development If you want to follow through the examples in this article, make sure you have the following Python libraries installed on your computer: GDAL/OGR version 1.7 or later (http://gdal.org) pyproj version 1.8.6 or later (http://code.google.com/p/pyproj) Shapely version 1.2 or later (http://trac.gispython.org/lab/wiki/Shapely) Reading and writing geo-spatial data In this section, we will look at some examples of tasks you might want to perform that involve reading and writing geo-spatial data in both vector and raster format. Task: Calculate the bounding box for each country in the world In this slightly contrived example, we will make use of a Shapefile to calculate the minimum and maximum latitude/longitude values for each country in the world. This "bounding box" can be used, among other things, to generate a map of a particular country. For example, the bounding box for Turkey would look like this: Start by downloading the World Borders Dataset from:   http://thematicmapping.org/downloads/world_borders.php Decompress the .zip archive and place the various files that make up the Shapefile (the .dbf, .prj, .shp, and .shx files) together in a suitable directory. We next need to create a Python program that can read the borders of each country. Fortunately, using OGR to read through the contents of a Shapefile is trivial: import osgeo.ogr shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp") layer = shapefile.GetLayer(0) for i in range(layer.GetFeatureCount()): feature = layer.GetFeature(i) The feature consists of a geometry and a set of fields. For this data, the geometry is a polygon that defines the outline of the country, while the fields contain various pieces of information about the country. According to the Readme.txt file, the fields in this Shapefile include the ISO-3166 three-letter code for the country (in a field named ISO3) as well as the name for the country (in a field named NAME). This allows us to obtain the country code and name like this: countryCode = feature.GetField("ISO3") countryName = feature.GetField("NAME") We can also obtain the country's border polygon using: geometry = feature.GetGeometryRef() There are all sorts of things we can do with this geometry, but in this case we want to obtain the bounding box or envelope for the polygon: minLong,maxLong,minLat,maxLat = geometry.GetEnvelope() Let's put all this together into a complete working program: # calcBoundingBoxes.py import osgeo.ogr shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp") layer = shapefile.GetLayer(0) countries = [] # List of (code,name,minLat,maxLat, # minLong,maxLong) tuples. for i in range(layer.GetFeatureCount()): feature = layer.GetFeature(i) countryCode = feature.GetField("ISO3") countryName = feature.GetField("NAME") geometry = feature.GetGeometryRef() minLong,maxLong,minLat,maxLat = geometry.GetEnvelope() countries.append((countryName, countryCode, minLat, maxLat, minLong, maxLong)) countries.sort() for name,code,minLat,maxLat,minLong,maxLong in countries: print "%s (%s) lat=%0.4f..%0.4f, long=%0.4f..%0.4f" % (name, code,minLat, maxLat,minLong, maxLong) Running this program produces the following output: % python calcBoundingBoxes.py Afghanistan (AFG) lat=29.4061..38.4721, long=60.5042..74.9157 Albania (ALB) lat=39.6447..42.6619, long=19.2825..21.0542 Algeria (DZA) lat=18.9764..37.0914, long=-8.6672..11.9865 ... Task: Save the country bounding boxes into a Shapefile While the previous example simply printed out the latitude and longitude values, it might be more useful to draw the bounding boxes onto a map. To do this, we have to convert the bounding boxes into polygons, and save these polygons into a Shapefile. Creating a Shapefile involves the following steps: Define the spatial reference used by the Shapefile's data. In this case, we'll use the WGS84 datum and unprojected geographic coordinates (that is, latitude and longitude values). This is how you would define this spatial reference using OGR: import osgeo.osr spatialReference = osgeo.osr.SpatialReference() spatialReference.SetWellKnownGeogCS('WGS84') We can now create the Shapefile itself using this spatial reference: import osgeo.ogr driver = osgeo.ogr.GetDriverByName("ESRI Shapefile") dstFile = driver.CreateDataSource("boundingBoxes.shp")) dstLayer = dstFile.CreateLayer("layer", spatialReference) After creating the Shapefile, you next define the various fields that will hold the metadata for each feature. In this case, let's add two fields to store the country name and its ISO-3166 code: fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString) fieldDef.SetWidth(50) dstLayer.CreateField(fieldDef) fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString) fieldDef.SetWidth(3) dstLayer.CreateField(fieldDef) We now need to create the geometry for each feature—in this case, a polygon defining the country's bounding box. A polygon consists of one or more linear rings; the first linear ring defines the exterior of the polygon, while additional rings define "holes" inside the polygon. In this case, we want a simple polygon with a square exterior and no holes: linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing) linearRing.AddPoint(minLong, minLat) linearRing.AddPoint(maxLong, minLat) linearRing.AddPoint(maxLong, maxLat) linearRing.AddPoint(minLong, maxLat) linearRing.AddPoint(minLong, minLat) polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon) polygon.AddGeometry(linearRing) You may have noticed that the coordinate (minLong, minLat)was added to the linear ring twice. This is because we are defining line segments rather than just points—the first call to AddPoint()defines the starting point, and each subsequent call to AddPoint()adds a new line segment to the linear ring. In this case, we start in the lower-left corner and move counter-clockwise around the bounding box until we reach the lower-left corner again:   Once we have the polygon, we can use it to create a feature: feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn()) feature.SetGeometry(polygon) feature.SetField("COUNTRY", countryName) feature.SetField("CODE", countryCode) dstLayer.CreateFeature(feature) feature.Destroy() Notice how we use the setField() method to store the feature's metadata. We also have to call the Destroy() method to close the feature once we have finished with it; this ensures that the feature is saved into the Shapefile. Finally, we call the Destroy() method to close the output Shapefile: dstFile.Destroy() Putting all this together, and combining it with the code from the previous recipe to calculate the bounding boxes for each country in the World Borders Dataset Shapefile, we end up with the following complete program: # boundingBoxesToShapefile.py import os, os.path, shutil import osgeo.ogr import osgeo.osr # Open the source shapefile. srcFile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp") srcLayer = srcFile.GetLayer(0) # Open the output shapefile. if os.path.exists("bounding-boxes"): shutil.rmtree("bounding-boxes") os.mkdir("bounding-boxes") spatialReference = osgeo.osr.SpatialReference() spatialReference.SetWellKnownGeogCS('WGS84') driver = osgeo.ogr.GetDriverByName("ESRI Shapefile") dstPath = os.path.join("bounding-boxes", "boundingBoxes.shp") dstFile = driver.CreateDataSource(dstPath) dstLayer = dstFile.CreateLayer("layer", spatialReference) fieldDef = osgeo.ogr.FieldDefn("COUNTRY", osgeo.ogr.OFTString) fieldDef.SetWidth(50) dstLayer.CreateField(fieldDef) fieldDef = osgeo.ogr.FieldDefn("CODE", osgeo.ogr.OFTString) fieldDef.SetWidth(3) dstLayer.CreateField(fieldDef) # Read the country features from the source shapefile. for i in range(srcLayer.GetFeatureCount()): feature = srcLayer.GetFeature(i) countryCode = feature.GetField("ISO3") countryName = feature.GetField("NAME") geometry = feature.GetGeometryRef() minLong,maxLong,minLat,maxLat = geometry.GetEnvelope() # Save the bounding box as a feature in the output # shapefile. linearRing = osgeo.ogr.Geometry(osgeo.ogr.wkbLinearRing) linearRing.AddPoint(minLong, minLat) linearRing.AddPoint(maxLong, minLat) linearRing.AddPoint(maxLong, maxLat) linearRing.AddPoint(minLong, maxLat) linearRing.AddPoint(minLong, minLat) polygon = osgeo.ogr.Geometry(osgeo.ogr.wkbPolygon) polygon.AddGeometry(linearRing) feature = osgeo.ogr.Feature(dstLayer.GetLayerDefn()) feature.SetGeometry(polygon) feature.SetField("COUNTRY", countryName) feature.SetField("CODE", countryCode) dstLayer.CreateFeature(feature) feature.Destroy() # All done. srcFile.Destroy() dstFile.Destroy() The only unexpected twist in this program is the use of a sub-directory called bounding-boxes to store the output Shapefile. Because a Shapefile is actually made up of multiple files on disk (a .dbf file, a .prj file, a .shp file, and a .shx file), it is easier to place these together in a sub-directory. We use the Python Standard Library module shutil to delete the previous contents of this directory, and then os.mkdir() to create it again. If you aren't storing the TM_WORLD_BORDERS-0.3.shp Shapefile in the same directory as the script itself, you will need to add the directory where the Shapefile is stored to your osgeo.ogr.Open() call. You can also store the boundingBoxes.shp Shapefile in a different directory if you prefer, by changing the path where this Shapefile is created. Running this program creates the bounding box Shapefile, which we can then draw onto a map. For example, here is the outline of Thailand along with a bounding box taken from the boundingBoxes.shp Shapefile:  
Read more
  • 0
  • 0
  • 9286

article-image-how-python-code-organized
Packt
19 Feb 2016
8 min read
Save for later

How is Python code organized

Packt
19 Feb 2016
8 min read
Python is an easy to learn yet a powerful programming language. It has efficient high-level data structures and effective approach to object-oriented programming. Let's talk a little bit about how Python code is organized. In this paragraph, we'll start going down the rabbit hole a little bit more and introduce a bit more technical names and concepts. Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module. If you're on Windows or Mac, which typically hide file extensions to the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion. It would be impractical to save all the code that it is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are quite shorter than that). A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3 __init__.py is not strictly required any more). As always, an example will make all of this much clearer. I have created an example structure in my project, and when I type in my Linux console: $ tree -v example Here's how a structure of a real simple application could look like: example/ ├── core.py ├── run.py └── util ├── __init__.py ├── db.py ├── math.py └── network.py You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks. As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder. Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module only example under the ch1/files_only folder, see it for yourself: $ tree -v files_only This shows us a completely different picture: files_only/ ├── core.py ├── db.py ├── math.py ├── network.py └── run.py It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules. How do we use modules and packages When a developer is writing an application, it is very likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not. Regardless of how the logic for this kind of validation is written, it's very likely that it will be needed in more than one place. For example in a poll application, where the user is asked many question, it's likely that several of them will require a numeric answer. For example: What is your age How many pets do you own How many children do you have How many times have you been married It would be very bad practice to copy paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the DRY (Don't Repeat Yourself) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (got the irony?). There are several reasons why repeating the same piece of logic can be very bad, the most important ones being: There could be a bug in the logic, and therefore, you would have to correct it in every place that logic is applied. You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied. You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application. Your code would be longer than needed, for no good reason. Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called function. I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code which is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. Functions are the building blocks of modularity in your application, and they are almost indispensable (unless you're writing a super simple script, you'll use functions all the time). Python comes with a very extensive library, as I already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language. For example, within Python's math library we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number. In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as: 5! = 5 * 4 * 3 * 2 * 1 = 120 The factorial of 0 is 0! = 1, to respect the convention for an empty product. So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling is not very clear for now, please just concentrate on the import part. We use a library by importing what we need from it, and then we use it. In Python, to calculate the factorial of number 5, we just need the following code: >>> from math import factorial >>> factorial(5) 120 Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120). So, let's go back to our example, the one with core.py, run.py, util, and so on. In our example, the package util is our utility library. Our custom utility belt that holds all those reusable tools (that is, functions), which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and therefore, we had to code them for ourselves. Summary In this article, we started to explore the world of programming and that of Python. We saw how Python code can be organized using modules and packages. For more information on Python, refer the following books recomended by Packt Publishing: Learning Python (https://www.packtpub.com/application-development/learning-python) Python 3 Object-oriented Programming - Second Edition (https://www.packtpub.com/application-development/python-3-object-oriented-programming-second-edition) Python Essentials (https://www.packtpub.com/application-development/python-essentials) Resources for Article: Further resources on this subject: Test all the things with Python [article] Putting the Fun in Functional Python [article] Scraping the Web with Python - Quick Start [article]
Read more
  • 0
  • 0
  • 9268

article-image-querying-and-selecting-data
Packt
17 Apr 2013
13 min read
Save for later

Querying and Selecting Data

Packt
17 Apr 2013
13 min read
(For more resources related to this topic, see here.) Constructing proper attribute query syntax The construction of property attribute queries is critical to your success in creating geoprocessing scripts that query data from feature classes and tables. All attribute queries that you execute against feature classes and tables will need to have the correct SQL syntax and also follow various rules depending upon the datatype that you execute the queries against. Getting ready Creating the syntax for attribute queries is one of the most difficult and time-consuming tasks that you'll need to master when creating Python scripts that incorporate the use of the Select by Attributes tool. These queries are basically SQL statements along with a few idiosyncrasies that you'll need to master. If you already have a good understanding of creating queries in ArcMap or perhaps an experience with creating SQL statements in other programming languages, then this will be a little easier for you. In addition to creating valid SQL statements, you also need to be aware of some specific Python syntax requirements and some datatype differences that will result in a slightly altered formatting of your statements for some datatypes. In this recipe, you'll learn how to construct valid query syntax and understand the nuances of how different datatypes alter the syntax as well as some Python-specific constructs. How to do it… Initially, we're going to take a look at how queries are constructed in ArcMap, so that you can get a feel of how they are structured. In ArcMap, open C:ArcpyBookCh8Crime_Ch8.mxd. Right-click on the Burglaries in 2009 layer and select Open Attribute Table. You should see an attribute table similar to the following screenshot. We're going to be querying the SVCAREA field: With the attribute table open, select the Table Options button and then Select by Attributes to display a dialog box that will allow you to construct an attribute query. Notice the Select * FROM Burglary WHERE: statement on the query dialog box (shown in the following screenshot). This is a basic SQL statement that will return all the columns from the attribute table for Burglary that meet the condition that we define through the query builder. The asterisk (*) simply indicates that all fields will be returned: Make sure that Create a new selection is the selected item in the Method dropdown list. This will create a new selection set. Double-click on SVCAREA from the list of fields to add the field to the SQL statement builder, as follows: Click on the = button. Click on the Get Unique Values button. From the list of values generated, double-click on 'North' to complete the SQL statement, as shown in the following screenshot: Click on the Apply button to execute the query. This should select 7520 records. Many people mistakenly assume that you can simply take a query that has been generated in this fashion and paste it into a Python script. That is not the case. There are some important differences that we'll cover next. Close the Select by Attributes window and the Burglaries in 2009 table. Clear the selected feature set by clicking on Selection | Clear Selected Features. Open the Python window and add the code to import arcpy. import arcpy Create a new variable to hold the query and add the exact same statement that you created earlier: qry = "SVCAREA" = 'North' Press Enter on your keyboard and you should see an error message similar to the following: Runtime error SyntaxError: can't assign to literal (<string>, line1) Python interprets SVCAREA and North as strings but the equal to sign between the two is not part of the string used to set the qry variable. There are several things we need to do to generate a syntactically correct statement for the Python interpreter. One important thing has already been taken care of though. Each field name used in a query needs to be surrounded by double quotes. In this case, SVCAREA is the only field used in the query and it has already been enclosed by double quotes. This will always be the case when you're working with shapefiles, file geodatabases, or ArcSDE geodatabases. Here is where it gets a little confusing though. If you're working with data from a personal geodatabase, the field names will need to be enclosed by square brackets instead of double quotes as shown in the following code example. This can certainly leads to confusion for script developers. qry = [SVCAREA] = 'North' Now, we need to deal with the single quotes surrounding 'North'. When querying data from fields that have a text datatype, the string being evaluated must be enclosed by quotes. If you examine the original query, you'll notice that we have in fact already enclosed the word North with quotes, so everything should be fine right? Unfortunately, it's not that simple with Python. Quotes, along with a number of other characters, must be escaped with a forward slash followed by the character being escaped. In this case, the escape sequence would be '. Alter your query syntax to incorporate the escape sequence: qry = "SVCAREA" = 'North' Finally, the entire query statement should be enclosed with quotes: qry = '"SVCAREA" = 'North'' In addition to the = sign, which tests for equality, there are a number of additional operators that you can use with strings and numeric data, including not equal (> <), greater than (<), greater than or equal to (<=), less than (>), and less than or equal to (>=). Wildcard characters including % and _ can also be used for shapefiles, file geodatabases, and ArcSDE geodatabases. These include % for representing any number of characters. The LIKE operator is often used with wildcard characters to perform partial string matching. For example, the following query would find all records with a service area that begins with N and has any number of characters after. qry = '"SVCAREA" LIKE 'N%'' The underscore character (_) can be used to represent a single character. For personal geodatabases the asterisk (*) is used to represent a wildcard character for any number of characters, while (?) represents a single character. You can also query for the absence of data, also known as NULL values. A NULL value is often mistaken for a value of zero, but that is not the case. NULL values indicate the absence of data, which is different from a value of zero. Null operators include IS NULL and IS NOT NULL. The following code example will find all records where the SVCAREA field contains no data: qry = '"SVCAREA" IS NULL' The final topic that we'll cover in this section are operators used for combining expressions where multiple query conditions need to be met. The AND operator requires that both query conditions be met for the query result to be true, resulting in selected records. The OR operator requires that at least one of the conditions be met. How it works… The creation of syntactically correct queries is one of the most challenging aspects of programming ArcGIS with Python. However, once you understand some basic rules, it gets a little easier. In this section, we'll summarize these rules. One of the more important things to keep in mind is that field names must be enclosed with double quotes for all datasets, with the exception of personal geodatabases, which require braces surrounding field names. There is also an AddFieldDelimiters() function that you can use to add the correct delimiter to a field based on the datasource supplied as a parameter to the function. The syntax for this function is as follows: AddFieldDelimiters(dataSource,field) Additionally, most people, especially those new to programming with Python, struggle with the issue of adding single quotes to string values being evaluated by the query. In Python, quotes have to be escaped with a single forward slash followed by the quote. Using this escape sequence will ensure that Python does in fact see that as a quote rather than the end of the string. Finally, take some time to familiarize yourself with the wildcard characters. For datasets other than personal geodatabases, you'll use the (%) character for multiple characters and an underscore (_) character for a single character. If you're using a personal geodatabase, the (*) character is used to match multiple characters and the (?) character is used to match a single character. Obviously, the syntax differences between personal geodatabases and all other types of datasets can lead to some confusion. Creating feature layers and table views Feature layers and table views serve as intermediate datasets held in memory for use specifically with tools such as Select by Location and Select Attributes. Although these temporary datasets can be saved, they are not needed in most cases. Getting ready Feature classes are physical representations of geographic data and are stored as files (shapefiles, personal geodatabases, and file geodatabases) or within a geodatabase. ESRI defines a feature class as "a collection of features that shares a common geometry (point, line, or polygon), attribute table, and spatial reference." Feature classes can contain default and user-defined fields. Default fields include the SHAPE and OBJECTID fields. These fields are maintained and updated automatically by ArcGIS. The SHAPE field holds the geometric representation of a geographic feature, while the OBJECTID field holds a unique identifier for each feature. Additional default fields will also exist depending on the type of feature class. A line feature class will have a SHAPE_LENGTH field. A polygon feature class will have both, a SHAPE_LENGTH and a SHAPE_AREA field. Optional fields are created by end users of ArcGIS and are not automatically updated by GIS. These contain attribute information about the features. These fields can also be updated by your scripts. Tables are physically represented as standalone DBF tables or within a geodatabase. Both, tables and feature classes, contain attribute information. However, a table contains only attribute information. There isn't a SHAPE field associated with a table, and they may or may not contain an OBJECTID field. Standalone Python scripts that use the Select by Attributes or Select by Location tool require that you create an intermediate dataset rather than using feature classes or tables. These intermediate datasets are temporary in nature and are called Feature Layers or Table Views. Unlike feature classes and tables, these temporary datasets do not represent actual files on disk or within a geodatabase. Instead, they are "in memory" representations of feature classes and tables. These datasets are active only while a Python script is running. They are removed from memory after the tool has executed. However, if the script is run from within ArcGIS as a script tool, then the temporary layer can be saved either by right-clicking on the layer in the table of contents and selecting Save As Layer File or simply by saving the map document file. Feature layers and table views must be created as a separate step in your Python scripts, before you can call the Select by Attributes or Select by Location tools. The Make Feature Layer tool generates the "in-memory" representation of a feature class, which can then be used to create queries and selection sets, as well as to join tables. After this step has been completed, you can use the Select by Attributes or Select by Location tool. Similarly, the Make Table View tool is used to create an "in-memory" representation of a table. The function of this tool is the same as Make Feature Layer. Both the Make Feature Layer and Make Table View tools require an input dataset, an output layer name, and an optional query expression, which can be used to limit the features or rows that are a part of the output layer. In addition, both tools can be found in the Data Management Tools toolbox. The syntax for using the Make Feature Layer tool is as follows: arcpy.MakeFeatureLayer_management(<input feature layer>, <output layer name>,{where clause}) The syntax for using the Make Table View tool is as follows: Arcpy.MakeTableView_management(<input table>, <output table name>, {where clause}) In this recipe, you will learn how to use the Make Feature Layer and Make Table View tools. These tasks will be done inside ArcGIS, so that you can see the in-memory copy of the layer that is created. How to do it… Follow these steps to learn how to use the Make Feature Layer and Make Table View tools: Open c:ArcpyBookCh8Crime_Ch8.mxd in ArcMap. Open the Python window. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/data/CityOfSanAntonio.gdb" Start a try block: try: Make an in-memory copy of the Burglary feature class using the Make Feature Layer tool. Make sure you indent this line of code: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") Add an except block and a line of code to print an error message in the event of a problem: except: print "An error occurred during creation" The entire script should appear as follows: import arcpy arcpy.env.workspace = "c:/ArcpyBook/data/CityOfSanAntonio.gdb" try: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") except: print "An error occurred during creation" Save the script to c:ArcpyBookCh8CreateFeatureLayer.py. Run the script. The new Burglary_Layer file will be added to the ArcMap table of contents: The Make Table View tool functionality is equivalent to the Make Feature Layer tool. The difference is that it works against standalone tables instead of feature classes. Remove the following line of code: flayer = arcpy.MakeFeatureLayer_management("Burglary","Burglary_ Layer") Add the following line of code in its place: tView = arcpy.MakeTableView_management("Crime2009Table", "Crime2009TView") Run the script to see the table view added to the ArcMap table of contents. How it works... The Make Feature Layer and Make Table View tools create in-memory representations of feature classes and tables respectively. Both the Select by Attributes and Select by Location tools require that these temporary, in-memory structures be passed in as parameters when called from a Python script. Both tools also require that you pass in a name for the temporary structures. There's more... You can also apply a query to either the Make Feature Layer or Make Table View tools to restrict the records returned in the feature layer or table view. This is done through the addition of a where clause when calling either of the tools from your script. This query is much the same as if you'd set a definition query on the layer through Layer Properties | Definition Query. The syntax for adding a query is as follows: MakeFeatureLayer(in_features, out_layer, where_clause) MakeTableView(in_table, out_view, where_clause)
Read more
  • 0
  • 0
  • 9198
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-asynchronous-programming-python
Packt
26 Aug 2015
20 min read
Save for later

Asynchronous Programming with Python

Packt
26 Aug 2015
20 min read
 In this article by Giancarlo Zaccone, the author of the book Python Parallel Programming Cookbook, we will cover the following topics: Introducing Asyncio GPU programming with Python Introducing PyCUDA Introducing PyOpenCL (For more resources related to this topic, see here.) An asynchronous model is of fundamental importance along with the concept of event programming. The execution model of asynchronous activities can be implemented using a single stream of main control, both in uniprocessor systems and multiprocessor systems. In the asynchronous model of a concurrent execution, various tasks intersect with each other along the timeline, and all of this happens under the action of a single flow of control (single-threaded). The execution of a task can be suspended and then resumed alternating in time with any other task. The asynchronous programming model As you can see in the preceding figure, the tasks (each with a different color) are interleaved with one another, but they are in a single thread of control. This implies that when one task is in execution, the other tasks are not. A key difference between a multithreaded programming model and single-threaded asynchronous concurrent model is that in the first case, the operating system decides on the timeline whether to suspend the activity of a thread and start another, while in the second case, the programmer must assume that a thread may be suspended and replaced with another at almost any time. Introducing Asyncio The Python module Asyncio provides facilities to manage events, coroutines, tasks and threads, and synchronization primitives to write concurrent code. When a program becomes very long and complex, it is convenient to divide it into subroutines, each of which realizes a specific task, for which the program implements a suitable algorithm. The subroutine cannot be executed independently but only at the request of the main program, which is then responsible for coordinating the use of subroutines. Coroutines are a generalization of the subroutine. Like a subroutine, the coroutine computes a single computational step, but unlike subroutines, there is no main program that is used to coordinate the results. This is because the coroutines link themselves together to form a pipeline without any supervising function responsible for calling them in a particular order. In a coroutine, the execution point can be suspended and resumed later, having kept track of its local state in the intervening time. In this example, we see how to use the coroutine mechanism of Asyncio to simulate a finite state machine of five states. A Finite-state automaton (FSA) is a mathematical model that is not only widely used in engineering disciplines but also in sciences, such as mathematics and computer science. The automata we want to simulate the behavior is as follows: Finite State Machine We have indicated with S0, S1, S2, S3, and S4 the states of the system, with 0 and 1 as the values for which the automata can pass from one state to the next state (this operation is called a transition). So for example, the state S0 can be passed to the state S1 only for the value 1 and S0 can pass the state S2 only to the value 0. The Python code that follows simulates a transition of the automaton from the state S0, the so-called Start State, up to the state S4, the End State: #Asyncio Finite State Machine import asyncio import time from random import randint @asyncio.coroutine def StartState(): print ("Start State called n") input_value = randint(0,1) time.sleep(1) if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) print("Resume of the Transition : nStart State calling " + result) @asyncio.coroutine def State1(transition_value): outputValue = str(("State 1 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State3(input_value) else : result = yield from State2(input_value) result = "State 1 calling " + result return (outputValue + str(result)) @asyncio.coroutine def State2(transition_value): outputValue = str(("State 2 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State1(input_value) else : result = yield from State3(input_value) result = "State 2 calling " + result return (outputValue + str(result)) @asyncio.coroutine def State3(transition_value): outputValue = str(("State 3 with transition value = %s n" %(transition_value))) input_value = randint(0,1) time.sleep(1) print("...Evaluating...") if (input_value == 0): result = yield from State1(input_value) else : result = yield from EndState(input_value) result = "State 3 calling " + result return (outputValue + str(result)) @asyncio.coroutine def EndState(transition_value): outputValue = str(("End State with transition value = %s n" %(transition_value))) print("...Stop Computation...") return (outputValue ) if __name__ == "__main__": print("Finite State Machine simulation with Asyncio Coroutine") loop = asyncio.get_event_loop() loop.run_until_complete(StartState()) After running the code, we have an output similar to this: C:Python CookBookChapter 4- Asynchronous Programmingcodes - Chapter 4>python asyncio_state_machine.py Finite State Machine simulation with Asyncio Coroutine Start State called ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Evaluating... ...Stop Computation... Resume of the Transition : Start State calling State 1 with transition value = 1 State 1 calling State 3 with transition value = 0 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 3 with transition value = 1 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 1 with transition value = 0 State 1 calling State 3 with transition value = 0 State 3 calling State 1 with transition value = 0 State 1 calling State 2 with transition value = 1 State 2 calling State 3 with transition value = 1 State 3 calling End State with transition value = 1 Each state of the automata has been defined with the annotation @asyncio.coroutine. For example, the state S0 is: @asyncio.coroutine def StartState(): print ("Start State called n") input_value = randint(0,1) time.sleep(1) if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) The transition to the next state is determined by input_value, which is defined by the randint(0,1) function of Python's module random. This function randomly provides the value 0 or 1, where it randomly determines to which state the finite-state machine will be passed: input_value = randint(0,1) After determining the value at which state the finite state machine will be passed, the coroutine calls the next coroutine using the command yield from: if (input_value == 0): result = yield from State2(input_value) else : result = yield from State1(input_value) The variable result is the value that each coroutine returns. It is a string, and at the end of the computation, we can reconstruct [NV1] the transition from the initial state of the automaton, the Start State, up to the final state, the End State. The main program starts the evaluation inside the event loop: if __name__ == "__main__": print("Finite State Machine simulation with Asyncio Coroutine") loop = asyncio.get_event_loop() loop.run_until_complete(StartState()) GPU programming with Python A graphics processing unit (GPU) is an electronic circuit that specializes in processing data to render images from polygonal primitives. Although they were designed to carry out rendering images, GPUs have continued to evolve, becoming more complex and efficient in serving both real-time and offline rendering community. GPUs have continued to evolve, becoming more complex and efficient in performing any scientific computation. Each GPU is indeed composed of several processing units called streaming multiprocessor (SM), representing the first logic level of parallelism; each SM in fact, works simultaneously and independently from the others. The GPU architecture Each SM is in turn divided into a group of Stream Processors (SP), each of which has a core of real execution and can run a thread sequentially. SP represents the smallest unit of execution logic and the level of finer parallelism. The division in SM and SP is structural in nature, but it is possible to outline a further logical organization of the SP of a GPU, which are grouped together in logical blocks characterized by a particular mode of execution—all cores that make up a group run at the same time with the same instructions. This is just the SIMD (Single Instruction, Multiple Data) model. The programming paradigm that characterizes GPU computing is also called stream processing because the data can be viewed as a homogeneous flow of values that are applied synchronously to the same operations. Currently, the most efficient solutions to exploit the computing power provided by the GPU cards are the software libraries CUDA and OpenCL. Introducing PyCUDA PyCUDA is a Python wrapper for CUDA (Compute Unified Device Architecture), the software library developed by NVIDIA for GPU programming. The PyCuda programming model is designed for the common execution of a program on the CPU and GPU so as to allow you to perform the sequential parts on the CPU and the numeric parts that are more intensive on the GPU. The phases to be performed in the sequential mode are implemented and executed on the CPU (host), while the steps to be performed in parallel are implemented and executed on the GPU (device). The functions to be performed in parallel on the device are called kernels. The skeleton general for the execution of a generic function kernel on the device is as follows: Allocation of memory on the device. Transfer of data from the host memory to that allocated on the device. Running the device: Running the configuration. Invocation of the kernel function. Transfer of the results from the memory on the device to the host memory. Release of the memory allocated on the device. The PyCUDA programming model To show the PyCuda workflow, let's consider a 5 × 5 random array and the following procedure: Create the array 5×5 on the CPU. Transfer the array to the GPU. Perform a Task[NV2]  on the array in the GPU (double all the items in the array). Transfer the array from the GPU to the CPU. Print the results. The code for this is as follows: import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule import numpy a = numpy.random.randn(5,5) a = a.astype(numpy.float32) a_gpu = cuda.mem_alloc(a.nbytes) cuda.memcpy_htod(a_gpu, a) mod = SourceModule(""" __global__ void doubleMatrix(float *a) { int idx = threadIdx.x + threadIdx.y*4; a[idx] *= 2; } """) func = mod.get_function("doubleMatrix") func(a_gpu, block=(5,5,1)) a_doubled = numpy.empty_like(a) cuda.memcpy_dtoh(a_doubled, a_gpu) print ("ORIGINAL MATRIX") print a print ("DOUBLED MATRIX AFTER PyCUDA EXECUTION") print a_doubled The example output should be like this : C:Python CookBookChapter 6 - GPU Programming with Python >python PyCudaWorkflow.py ORIGINAL MATRIX [[-0.59975582 1.93627465 0.65337795 0.13205571 -0.46468592] [ 0.01441949 1.40946579 0.5343408 -0.46614054 -0.31727529] [-0.06868593 1.21149373 -0.6035406 -1.29117763 0.47762445] [ 0.36176383 -1.443097 1.21592784 -1.04906416 -1.18935871] [-0.06960868 -1.44647694 -1.22041082 1.17092752 0.3686313 ]] DOUBLED MATRIX AFTER PyCUDA EXECUTION [[-1.19951165 3.8725493 1.3067559 0.26411143 -0.92937183] [ 0.02883899 2.81893158 1.0686816 -0.93228108 -0.63455057] [-0.13737187 2.42298746 -1.2070812 -2.58235526 0.95524889] [ 0.72352767 -1.443097 1.21592784 -1.04906416 -1.18935871] [-0.06960868 -1.44647694 -1.22041082 1.17092752 0.3686313 ]] The code starts with the following imports: import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule The pycuda.autoinit import automatically picks a GPU to run on based on the availability and number. It also creates a GPU context for subsequent code to run in. Both the chosen device and the created context are available from pycuda.autoinit as importable symbols if needed. While the SourceModule component is the object where a C-like code for the GPU must be written. The first step is to generate the input 5 × 5 matrix. Since most GPU computations involve large arrays of data, the NumPy module must be imported: import numpy a = numpy.random.randn(5,5) Then, the items in the matrix are converted in a single precision mode, many NVIDIA cards support only single precision: a = a.astype(numpy.float32) The first operation to be done in order to implement a GPU loads the input array from the host memory (CPU) to the device (GPU). This is done at the beginning of the operation and consists two steps that are performed by invoking two functions provided PyCuda[NV3] . Memory allocation on the device is done via the cuda.mem_alloc function. The device and host memory may not ever communicate while performing a function kernel. This means that to run a function in parallel on the device, the data relating to it must be present in the memory of the device itself. Before you copy data from the host memory to the device memory, you must allocate the memory required on the device: a_gpu = cuda.mem_alloc(a.nbytes). Copy of the matrix from the host memory to that of the device with the function: call cuda.memcpy_htod : cuda.memcpy_htod(a_gpu, a). We also note that a_gpu is one dimensional, and on the device, we need to handle it as such. All these operations do not require the invocation of a kernel and are made directly by the main processor. The SourceModule entity serves to define the (C-like) kernel function doubleMatrix that multiplies each array entry by 2: mod = SourceModule(""" __global__ void doubleMatrix(float *a) { int idx = threadIdx.x + threadIdx.y*4; a[idx] *= 2; } """) The __global__ qualifier is a directive that indicates that the doubleMatrix function will be processed on the device. It will be just the compiler Cuda nvcc that will be used to perform this task. Let's look at the function's body, which is as follows: int idx = threadIdx.x + threadIdx.y*4; The idx parameter is the matrix index that is identified by the thread coordinates threadIdx.x and threadIdx.y. Then, the element matrix with the index idx is multiplied by 2: a[idx] *= 2; We also note that this kernel function will be executed once in 16 different threads. Both the variables threadIdx.x and threadIdx.y contain indices between 0 and 3 , and the pair[NV4]  is different for each thread. Threads scheduling is directly linked to the GPU architecture and its intrinsic parallelism. A block of threads is assigned to a single SM. Here, threads are further divided into groups called warps. The size of a warp depends on the architecture under consideration. The threads of the same warp are managed by the control unit called the warp scheduler. To take full advantage of the inherent parallelism of the SM, the threads of the same warp must execute the same instruction. If this condition does not occur, we speak of divergence of threads. If the same warp threads execute different instructions, the control unit cannot handle all the warps. It must follow the sequences of instructions for every single thread (or for homogeneous subsets of threads) in a serial mode. Let's observe how the thread block is divided in various warps—threads are divided by the value of threadIdx. The threadIdx structure consists of 3 fields: threadIdx.x, threadIdx.y, and threadIdx.z. Thread blocks subdivision: T(x,y), where x = threadIdx.x and y = threadIdx.y We can see again that the code in the kernel function will be automatically compiled by the nvcc cuda compiler. If there are no errors, a pointer to this compiled function will be created. In fact, the mod.get_function[NV5] ("doubleMatrix") returns an identifier to the function created func: func = mod.get_function("doubleMatrix ") To perform a function on the device, you must first configure the execution appropriately. This means that we need to determine the size of the coordinates to identify and distinguish the thread belonging to different blocks. This will be done using the block parameter inside the func call: func(a_gpu, block = (5, 5, 1)) The block = (5, 5, 1) tells us that we are calling a kernel function with a_gpu linearized input matrix and a single thread block of size, 5 threads in the x direction, 5 threads in the y direction, and 1 thread in the z direction, 16 threads in total. This structure is designed with parallel implementation of the algorithm of interest. The division of the workload results is an early form of parallelism that is sufficient and necessary to make use of the computing resources provided by the GPU. Once you've configured the kernel's invocation, you can invoke the kernel function that executes instructions in parallel on the device. Each thread executes the same code kernel. After the computation in the gpu device, we use an array to store the results: a_doubled = numpy.empty_like(a) cuda.memcpy_dtoh(a_doubled, a_gpu) Introducing PyOpenCL As for programming with PyCuda, the first step to build a program for PyOpenCL is the encoding of the host application. In fact, this is performed on the host computer (typically, the user's PC) and then, it dispatches the kernel application on the connected devices (GPU cards). The host application must contain five data structures, which are as follows: Device: This identifies the hardware where the kernel code must be executed. A PyOpenCL application can be executed not only on CPU and GPU cards but also on embedded devices such as FPGA (Field Programmable Gate Array). Program: This is a group of kernels. A program selects which kernel must be executed on the device. Kernel: This is the code to be executed on the device. A kernel is essentially a (C-like) function that enables it to be compiled for an execution on any device that supports OpenCL drivers. The kernel is the only way the host can call a function that will run on a device. When the host invokes a kernel, many work items start running on the device. Each work item runs the code of the kernel but works on a different part of the dataset. Command queue: Each device receives kernels through this data structure. A command queue orders the execution of kernels on the device. Context: This is a group of devices. A context allows devices to receive kernels and transfer data. PyOpenCL programming The preceding figure shows how these data structures can work in a host application. Let's remember again that a program can contain multiple functions that are to be executed on the device and each kernel encapsulates only a single function from the program. In this example, we show you the basic steps to build a PyOpenCL program. The task to be executed is the parallel sum of two vectors. In order to maintain a readable output, let's consider two vectors, each of one out of 100 elements. The resulting vector will be for each element's i[NV6] th, which is the sum of the ith element vector_a plus the ith element vector_b. Of course, to be able to appreciate the parallel execution of this code, you can also increase some orders of magnitude the size of the input vector_dimension:[NV7]  import numpy as np import pyopencl as cl import numpy.linalg as la vector_dimension = 100 vector_a = np.random.randint(vector_dimension, size=vector_dimension) vector_b = np.random.randint(vector_dimension, size=vector_dimension) platform = cl.get_platforms()[0] device = platform.get_devices()[0] context = cl.Context([device]) queue = cl.CommandQueue(context) mf = cl.mem_flags a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a) b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b) program = cl.Program(context, """ __kernel void vectorSum(__global const int *a_g, __global const int *b_g, __global int *res_g) { int gid = get_global_id(0); res_g[gid] = a_g[gid] + b_g[gid]; } """).build() res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes) program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g) res_np = np.empty_like(vector_a) cl.enqueue_copy(queue, res_np, res_g) print ("PyOPENCL SUM OF TWO VECTORS") print ("Platform Selected = %s" %platform.name ) print ("Device Selected = %s" %device.name) print ("VECTOR LENGTH = %s" %vector_dimension) print ("INPUT VECTOR A") print vector_a print ("INPUT VECTOR B") print vector_b print ("OUTPUT VECTOR RESULT A + B ") print res_np assert(la.norm(res_np - (vector_a + vector_b))) < 1e-5 The output from Command Prompt should be like this: C:Python CookBook Chapter 6 - GPU Programming with PythonChapter 6 - codes>python PyOpenCLParallellSum.py Platform Selected = NVIDIA CUDA Device Selected = GeForce GT 240 VECTOR LENGTH = 100 INPUT VECTOR A [ 0 29 88 46 68 93 81 3 58 44 95 20 81 69 85 25 89 39 47 29 47 48 20 86 59 99 3 26 68 62 16 13 63 28 77 57 59 45 52 89 16 6 18 95 30 66 19 29 31 18 42 34 70 21 28 0 42 96 23 86 64 88 20 26 96 45 28 53 75 53 39 83 85 99 49 93 23 39 1 89 39 87 62 29 51 66 5 66 48 53 66 8 51 3 29 96 67 38 22 88] INPUT VECTOR B [98 43 16 28 63 1 83 18 6 58 47 86 59 29 60 68 19 51 37 46 99 27 4 94 5 22 3 96 18 84 29 34 27 31 37 94 13 89 3 90 57 85 66 63 8 74 21 18 34 93 17 26 9 88 38 28 14 68 88 90 18 6 40 30 70 93 75 0 45 86 15 10 29 84 47 74 22 72 69 33 81 31 45 62 81 66 69 14 71 96 91 51 35 4 63 36 28 65 10 41] OUTPUT VECTOR RESULT A + B [ 98 72 104 74 131 94 164 21 64 102 142 106 140 98 145 93 108 90 84 75 146 75 24 180 64 121 6 122 86 146 45 47 90 59 114 151 72 134 55 179 73 91 84 158 38 140 40 47 65 111 59 60 79 109 66 28 56 164 111 176 82 94 60 56 166 138 103 53 120 139 54 93 114 183 96 167 45 111 70 122 120 118 107 91 132 132 74 80 119 149 157 59 86 7 92 132 95 103 32 129] In the first line of the code after the required module import, we defined the input vectors: vector_dimension = 100 vector_a = np.random.randint(vector_dimension, size= vector_dimension) vector_b = np.random.randint(vector_dimension, size= vector_dimension) Each vector contain 100 integers items that are randomly selected thought the NumPy function: np.random.randint(max integer , size of the vector) Then, we must select the device to run the kernel code. To do this, we must first select the platform using the get_platform() PyOpenCL statement: platform = cl.get_platforms()[0] This platform, as you can see from the output, corresponds to the NVIDIA CUDA platform. Then, we must select the device using the get_device() platform's method: device = platform.get_devices()[0] In the following steps, the context and the queue are defined, PyOpenCL provides the method context (device selected) and queue (context selected): context = cl.Context([device]) queue = cl.CommandQueue(context) To perform the computation in the device, the input vector must be transferred to the device's memory. So, two input buffers in the device memory must be created: mf = cl.mem_flags a_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_a) b_g = cl.Buffer(context, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vector_b) Also, we prepare the buffer for the resulting vector: res_g = cl.Buffer(context, mf.WRITE_ONLY, vector_a.nbytes) Finally, the core of the script, the kernel code is defined inside a program as follows: program = cl.Program(context, """ __kernel void vectorSum(__global const int *a_g, __global const int *b_g, __global int *res_g) { int gid = get_global_id(0); res_g[gid] = a_g[gid] + b_g[gid]; } """).build() The kernel's name is vectorSum. The parameter list defines the data types of the input arguments (vectors of integers) and output data type (a vector of integer again). Inside the kernel, the sum of the two vectors is simply defined as: Initialize the vector index: int gid = get_global_id(0) Sum the vector's components: res_g[gid] = a_g[gid] + b_g[gid]; In OpenCL and PyOpenCL, buffers are attached to a context[NV8]  and are only moved to a device once the buffer is used on that device. Finally, we execute vectorSum in the device: program.vectorSum(queue, vector_a.shape, None, a_g, b_g, res_g) To visualize the results, an empty vector is built: res_np = np.empty_like(vector_a) Then, the result is copied into this vector: cl.enqueue_copy(queue, res_np, res_g) Finally, the results are displayed: print ("VECTOR LENGTH = %s" %vector_dimension) print ("INPUT VECTOR A") print vector_a print ("INPUT VECTOR B") print vector_b print ("OUTPUT VECTOR RESULT A + B ") print res_np To check the result, we use the assert statement. It tests the result and triggers an error if the condition is false: assert(la.norm(res_np - (vector_a + vector_b))) < 1e-5 Summary In this article we discussed about Asyncio, GPU programming with Python, PyCUDA, and PyOpenCL. Resources for Article: Further resources on this subject: Bizarre Python[article] Scientific Computing APIs for Python[article] Optimization in Python [article]
Read more
  • 0
  • 0
  • 9146

article-image-managing-it-portfolio-using-troux-enterprise-architecture
Packt
12 Aug 2010
16 min read
Save for later

Managing the IT Portfolio using Troux Enterprise Architecture

Packt
12 Aug 2010
16 min read
(For more resources on Troux, see here.) Managing the IT Portfolio using Troux Enterprise Architecture Almost every company today is totally dependent on IT for day-to-day operations. Large companies literally spend billions on IT-related personnel, software, equipment, and facilities. However, do business leaders really know what they get in return for these investments? Upper management knows that a successful business model depends on information technology. Whether the company is focused on delivery of services or development of products, management depends on its IT team to deliver solutions that meet or exceed customer expectations. However, even though companies continue to invest heavily in various technologies, for most companies, knowing the return-on-investment in technology is difficult or impossible. When upper management asks where the revenues are for the huge investments in software, servers, networks, and databases, few IT professionals are able to answer. There are questions that are almost impossible to answer without guessing, such as: Which IT projects in the portfolio of projects will actually generate revenue? What are we getting for spending millions on vendor software? When will our data center run out of capacity? This article will explore how IT professionals can be prepared when management asks the difficult questions. By being prepared, IT professionals can turn conversations with management about IT costs into discussions about the value IT provides. Using consolidated information about the majority of the IT portfolio, IT professionals can work with business leaders to select revenue-generating projects, decrease IT expenses, and develop realistic IT plans. The following sections will describe what IT professionals can do to be ready with accurate information in response to the most challenging questions business leaders might ask. Management repositories IT has done a fine job of delivering solutions for years. However, pressure to deliver business projects quickly has created a mentality in most IT organizations of "just put it in and we will go back and do the clean up later." This has led to a layering effect where older "legacy" technology remains in place, while new technology is adopted. With this complex mix of legacy solutions and emerging technology, business leaders have a hard time understanding how everything fits together and what value is provided from IT investments. Gone are the days when the Chief Information Officer (CIO) could say "just trust me" when business people asked questions about IT spending. In addition, new requirements for corporate compliance combined with the expanding use of web-based solutions makes managing technology more difficult than ever. With the advent of Software-as-a-Service (SaaS) or cloud computing, the technical footprint, or ecosystem, of IT has extended beyond the enterprise itself. Virtualization of platforms and service-orientation adds to the mind-numbing mix of technologies available to IT. However, there are many systems available to help companies manage their technological portfolio. Unfortunately, multiple teams within the business and within IT see the problem of managing the IT portfolio differently. In many companies, there is no centralized effort to gather and store IT portfolio information. Teams with a need for IT asset information tend to purchase or build a repository specific to their area of responsibility. Some examples of these include: Business goals repository Change management database Configuration management database Business process management database Fixed assets database Metadata repository Project portfolio management database Service catalog Service registry While each of these repositories provides valuable information about IT portfolios, they are each optimized to meet a specific set of requirements. The following table shows the main types of information stored in each of these repositories along with a brief statement about its functional purpose: Repository Main content Main purpose Business goals Goal statements and assignments Documents business goals and who is responsible Change management database Change request tickets, application owners Captures change requests and who can authorize change Configuration management database Identifies actual hardware and software in use across the enterprise Supports Information Technology Infrastructure Library (ITIL) processes Business process management database Business processes, information flows, and process owners Used to develop applications and document business processes Fixed assets database Asset identifiers for hardware and software, asset life, purchase cost, and depreciation amounts Documents cost and depreciable life of IT assets Metadata repository Data about the company databases and files Documents the names, definitions, data types, and locations of the company data Project portfolio management database Project names, classifications, assignments, business value and scope Used to manage IT workload and assess value of IT projects to the business Service catalog Defines hardware and compatible software available for project use Used to manage hardware and software implementations assigned to the IT department Service registry Names and details of reusable software services Used to manage, control, and report on reusable software It is easy to see that while each of these repositories serves a specific purpose, none supports an overarching view across the others. For example, one might ask: How many SQL Server databases do we have installed and what hardware do they run on? To answer this question, IT managers would have to extract data from the metadata repository and combine it with data from the Configuration Management Database (CMDB). The question could be extended: How much will it cost in early expense write-offs if we retire the SQL Server DB servers into a new virtual grid of servers? To answer this question, IT managers need to determine not only how many servers host SQL Server, but how old they are, what they cost at purchase time, and how much depreciation is left on them. Now the query must span at least three systems (CMDB, fixed assets, and metadata repository). The accuracy of the answer will also depend on the relative validity of the data in each repository. There could be overlapping data in some, and outright errors in others. Changing the conversation When upper management asks difficult questions, they are usually interested in cost, risk management, or IT agility. Not knowing a great deal about IT, they are curious about why they need to spend millions on technology and what they get for their investments. The conversation ends up being primarily about cost and how to reduce expenses. This is not a good position to be in if you are running a support function like Enterprise Architecture. How can you explain IT investments in a way that management can understand? If you are not prepared with facts, management has no choice but to assume that costs are out of control and they can be reduced, usually by dramatic amounts. As a good corporate citizen, it is your job to help reduce costs. Like everyone in management, getting the most out of the company's assets is your responsibility. However, as we in IT know, it's just as important to be ready for changes in technology and to be on top of technology trends. As technology leaders, it is our job to help the company stay current through investments that may pay off in the future rather than show an immediate return. The following diagram shows various management functions and technologies that are used to manage the business of IT: The dimensions of these tools and processes span systems that run the business to change the business and from the ones using operational information to using strategic information. Various technologies that support data about IT assets are shown. These include: Business process analytics and management information Service-oriented architecture governance Asset-liability management Information technology systems management Financial management information Project portfolio and management information The key to changing the conversation about IT is having the ability to bring the information of these disciplines into a single view. The single view provides the ability to actually discuss IT in a strategic way. Gathering data and reporting on the actual metrics of IT, in a way business leaders can understand, supports strategic planning. The strategic planning process combined with fact-based metrics establishes credibility with upper management and promotes improved decision making on a daily basis. Troux Technologies Solving the IT-business communication problem has been difficult until recently. Troux Technologies (www.troux.com) developed a new open-architected repository and software solution, called the Troux Transformation Platform, to help IT manage the vast array of technology deployed within the company. Troux customers use the suite of applications and advanced integration platform within the product architecture to deliver bottom-line results. By locating where IT expenses are redundant, or out-of-step with business strategy, Troux customers experience significant cost savings. When used properly, the platform also supports improved IT efficiency, quicker response to business requirements, and IT risk reduction. In today's globally-connected markets, where shocks and innovations happen at an unprecedented rate, antiquated approaches to Strategic IT Planning and Enterprise Architecture have become a major obstruction. The inability of IT to plan effectively has driven business leaders to seek solutions available outside the enterprise. Using SaaS or Application Service Providers (ASPs) to meet urgent business objectives can be an effective means to meet short-term goals. However, to be complete, even these solutions usually require integration with internal systems. IT finds itself dealing with unspecified service-level requirements, developing integration architectures, and cleaning up after poorly planned activities by business leaders who don't understand what capabilities exist within the software running inside the company. A global leader in Strategic IT Planning and Enterprise Architecture software, Troux has created an Enterprise Architecture repository that IT can use to put itself at the center of strategic planning. Troux has been successful in implementing its repository at a number of companies. A partial list of Troux's customers can be found on the website. There are other enterprise-level repository vendors on the market. However, leading analysts, such as The Gartner Group and Forrester Research, have published recent studies ranking Troux as a leader in the IT strategy planning tools space. Troux Transformation Platform Troux's sophisticated integration and collaboration capabilities support multiple business initiatives such as handling mergers, aligning business and IT plans, and consolidating IT assets. The business-driven platform provides new levels of visibility into the complex web of IT resources, programs, and business strategy so the business can see instantly where IT spending and programs are redundant or out-of-step with business strategy. The business suite of applications helps IT to plan and execute faster with data assimilated from various trusted sources within the company. The platform provides information necessary to relevant stakeholders such as Business Analysts, Enterprise Architects, The Program Management Office, Solutions Architects, and executives within the business and IT. The transformation platform is not only designed to address today's urgent cost-restructuring agendas, but it also introduces an ongoing IT management discipline, allowing EA and business users to drive strategic growth initiatives. The integration platform provides visibility and control to: Uncover and fix business/IT disconnects: This shows how IT directly supports business strategies and capabilities, and ensures that mismatched spending can be eliminated. Troux Alignment helps IT think like a CFO and demonstrate control and business purpose for the billions that are spent on IT assets, by ensuring that all stakeholders have valid and relevant IT information. Identify and eliminate redundant IT spending: This uncovers the many untapped opportunities with Troux Optimization to free up needless spend, and apply it either to the bottom line or to support new business initiatives. Speed business response and simplify IT: This speeds the creation and deployment of a set of standard, reusable building blocks that are proven to work in agile business cycles. Troux Standards enables the use of IT standards in real time, thereby streamlining the process of IT governance. Accelerate business transformation for government agencies: This helps federal agencies create an actionable Enterprise Architecture and comply with constantly changing mandates. Troux eaGov automatically identifies opportunities to reduce costs to business and IT risks, while fostering effective initiative planning and execution within or across agencies. Support EA methodology: Companies adopting The Open Group Architecture Framework (TOGAF™) can use the Troux for TOGAF solution to streamline their efforts. Unlock the full potential of IT portfolio investment: Unifies Strategic IT Planning, EA, and portfolio project management through a common IT governance process. The Troux CA Clarity Connection enables the first bi-directional integration in the market between CA Clarity Project Portfolio Management (PPM) and the Troux EA repository for enhanced IT investment portfolio planning, analysis, and control. Understand your deployed IT assets: Using the out-of-the-box connection to HP's Universal Configuration Management Database (uCMDB), link software and hardware with the applications they support. All of these capabilities are enabled through an open-architected platform that provides uncomplicated data integration tools. The platform provides Architecture-modeling capabilities for IT Architects, an extensible database schema (or meta-model), and integration interfaces that are simple to automate and bring online with minimal programming efforts. Enterprise Architecture repository The Troux Transformation Platform acts as the consolidation point across all the various IT management databases and even some management systems outside the control of IT. By collecting data from across various areas, new insights are possible, leading to reductions in operating costs and improvements in service levels to the business. While it is possible to combine these using other products on the market or even develop a home-grown EA repository, Troux has created a very easy-to-use API for data collection purposes. In addition, Troux provides a database meta-model for the repository that is extensible. Meta-model extensibility makes the product adaptable to the other management systems across the company. Troux also supports a configurable user interface allowing for a customized view into the repository. This capability makes the catalog appear as if it were a part of the other control systems already in place at the company. Additionally, Troux provides an optional set of applications that support a variety of roles, out of the box, with no meta-model extensions or user interface configurations required. These include: Troux Standards: This application supports the IT technology standards and lifecycle governance process usually conducted by the Enterprise Architecture department. Troux Optimization: This application supports the Application portfolio lifecycle management process conducted by the Enterprise Program Management Office (EPMO) and/or Enterprise Architecture. Troux Alignment: This application supports the business and IT assets and application-planning processes conducted by IT Engineering, Corporate Finance, and Enterprise Architecture. Even these three applications that are available out-of-the-box from Troux can be customized by extending their underlying meta-models and customizing the user interface. The EA repository provides output that is viewable online. Standard reports are provided or custom reports can be developed as per the specific needs of the user community. Departments within or even outside of IT can use the customized views, standard reports, and custom reports to perform analyses. For example, the Enterprise Program Management Office (EPMO) can produce reports that link projects with business goals. The EPMO can review the project portfolio of the company to identify projects that do not support company goals. Decisions can be made about these projects, thereby stopping them, slowing them down, or completing them faster. Resources can be moved from the stopped or completed low-value projects to the higher-value projects, leading to increased revenue or reduced costs for the company. In a similar fashion, the Internal Audit department can check on the level of compliance to company IT standards or use the list of applications stored within the catalog to determine the best audit schedule to follow. Less time can be spent auditing applications with minimal impact on company operations or on applications and projects targeted as low value. Application development can use data from the catalog to understand the current capabilities of the existing applications of the company. As staff changes or "off-shore" resources are applied to projects, knowing what existing systems do in advance of a new project can save many hours of work. Information can be extracted from the EA repository directly into requirements documentation, which is always the starting point for new applications, as well as maintenance projects on existing applications. One study performed at a major financial services company showed that over 40% of project development time was spent in the upfront work of documenting and explaining current application capabilities to business sponsors of projects. By supplying development teams with lists of application capabilities early in the project life cycle, time to gather and document requirements can be reduced significantly. Of course, one of the biggest benefactors of the repository is the EA group. In most companies, EA's main charter is to be the steward of information about applications, databases, hardware, software, and network architecture. EA can perform analyses using the data from the repository leading to recommendations for changes by middle and upper management. In addition, EA is responsible for collecting, setting, and managing the IT standards for the company. The repository supports a single source for IT standards, whether they are internal or external standards. The standards portion of the repository can be used as the centerpiece of IT governance. The function of the Architecture Review Board (ARB) is fully supported by Troux Standards. Capacity Planning and IT Engineering functions will also gain substantially through the use of an EA repository. The useful life of IT assets can be analyzed to create a master plan for technical refresh or reengineering efforts. The annual spend on IT expenses can be reduced dramatically through increased levels of virtualization of IT assets, consolidation of platforms, and even consolidation of whole data centers. IT Engineering can review what is currently running across the company and recommend changes to reduce software maintenance costs, eliminate underutilized hardware, and consolidate federated databases. Lastly, IT Operations can benefit from a consolidated view into the technical footprint running at any point in time. Even when system availability service levels call for near-real-time error correction, it may take hours for IT Operations personnel to diagnose problems. They tend not to have a full understanding of what applications run on what servers, which firewalls support which networks, and which databases support which applications. Problem determination time can be reduced by providing accurate technical architecture information to those focused on keeping systems running and meeting business service-level requirements. Summary This article identified the problem IT has with understanding what technologies it has under management. While many solutions are in place in many companies to gain a better view into the IT portfolio, none are designed to show the impact of IT assets in the aggregate. Without the capabilities provided by an EA repository, IT management has a difficult time answering tough questions asked by business leaders. Troux Technologies offers a solution to this problem using the Troux Transformation Platform. The platform acts as a master metadata repository and becomes the focus of many efforts that IT may run to reduce significant costs and improve business service levels. Further resources on this subject: Troux Enterprise Architecture: Managing the EA function [article]
Read more
  • 0
  • 0
  • 9108

article-image-introduction-logging-tomcat-7
Packt
21 Mar 2012
9 min read
Save for later

Introduction to Logging in Tomcat 7

Packt
21 Mar 2012
9 min read
(For more resources on Apache, see here.) JULI Previous versions of Tomcat (till 5.x) use Apache common logging services for generating logs. A major disadvantage with this logging mechanism is that it can handle only single JVM configuration and makes it difficult to configure separate logging for each class loader for independent application. In order to resolve this issue, Tomcat developers have introduced a separate API for Tomcat 6 version, that comes with the capability of capturing each class loader activity in the Tomcat logs. It is based on java.util.logging framework. By default, Tomcat 7 uses its own Java logging API to implement logging services. This is also called as JULI. This API can be found in TOMCAT_HOME/bin of the Tomcat 7 directory structures (tomcat-juli.jar). The following screenshot shows the directory structure of the bin directory where tomcat-juli.jar is placed. JULI also provides the feature for custom logging for each web application, and it also supports private per-application logging configurations. With the enhanced feature of separate class loader logging, it also helps in detecting memory issues while unloading the classes at runtime. For more information on JULI and the class loading issue, please refer to http://tomcat.apache.org/tomcat-7.0-doc/logging.html and http://tomcat.apache.org/tomcat-7.0-doc/class-loader-howto.html respectively. Loggers, appenders, and layouts There are some important components of logging which we use at the time of implementing the logging mechanism for applications. Each term has its individual importance in tracking the events of the application. Let's discuss each term individually to find out their usage: Loggers:It can be defined as the logical name for the log file. This logical name is written in the application code. We can configure an independent logger for each application. Appenders: The process of generation of logs are handled by appenders. There are many types of appenders, such as FileAppender, ConsoleAppender, SocketAppender, and so on, which are available in log4j. The following are some examples of appenders for log4j: log4j.appender.CATALINA=org.apache.log4j.DailyRollingFileAppender log4j.appender.CATALINA.File=${catalina.base}/logs/catalina.out log4j.appender.CATALINA.Append=true log4j.appender.CATALINA.Encoding=UTF-8 The previous four lines of appenders define the DailyRollingFileAppender in log4j, where the filename is catalina.out . These logs will have UTF-8 encoding enabled. If log4j.appender.CATALINA.append=false, then logs will not get updated in the log files. # Roll-over the log once per day log4j.appender.CATALINA.DatePattern='.'dd-MM-yyyy'.log' log4j.appender.CATALINA.layout = org.apache.log4j.PatternLayout log4j.appender.CATALINA.layout.ConversionPattern = %d [%t] %-5p %c- %m%n The previous three lines of code show the roll-over of log once per day. Layout: It is defined as the format of logs displayed in the log file. The appender uses layout to format the log files (also called as patterns). The highlighted code shows the pattern for access logs: <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt" pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/> Loggers, appenders, and layouts together help the developer to capture the log message for the application event. Types of logging in Tomcat 7 We can enable logging in Tomcat 7 in different ways based on the requirement. There are a total of five types of logging that we can configure in Tomcat, such as application, server, console, and so on. The following figure shows the different types of logging for Tomcat 7. These methods are used in combination with each other based on environment needs. For example, if you have issues where Tomcat services are not displayed, then console logs are very helpful to identify the issue, as we can verify the real-time boot sequence. Let's discuss each logging method briefly. Application log These logs are used to capture the application event while running the application transaction. These logs are very useful in order to identify the application level issues. For example, suppose your application performance is slow on a particular transition, then the details of that transition can only be traced in application log. The biggest advantage of application logs is we can configure separate log levels and log files for each application, making it very easy for the administrators to troubleshoot the application. Log4j is used in 90 percent of the cases for application log generation. Server log Server logs are identical to console logs. The only advantage of server logs is that they can be retrieved anytime but console logs are not available after we log out from the console. Console log This log gives you the complete information of Tomcat 7 startup and loader sequence. The log file is named as catalina.out and is found in TOMCAT_HOME/logs. This log file is very useful in checking the application deployment and server startup testing for any environment. This log is configured in the Tomcat file catalina.sh, which can be found in TOMCAT_HOME/bin. The previous screenshot shows the definition for Tomcat logging. By default, the console logs are configured as INFO mode. There are different levels of logging in Tomcat such as WARNING, INFORMATION, CONFIG, and FINE. The previous screenshot shows the Tomcat log file location, after the start of Tomcat services. The previous screenshot shows the output of the catalina.out file, where Tomcat services are started in 1903 ms. Access log Access logs are customized logs, which give information about the following: Who has accessed the application What components of the application are accessed Source IP and so on These logs play a vital role in traffic analysis of many applications to analyze the bandwidth requirement and also helps in troubleshooting the application under heavy load. These logs are configured in server.xml in TOMCAT_HOME/conf. The following screenshot shows the definition of access logs. You can customize them according to the environment and your auditing requirement. Let's discuss the pattern format of the access logs and understand how we can customize the logging format: <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt" pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/> Class Name: This parameter defines the class name used for generation of logs. By default, Apache Tomcat 7 uses the org.apache.catalina.valves.AccessLogValve class for access logs. Directory: This parameter defines the directory location for the log file. All the log files are generated in the log directory—TOMCAT_HOME/logs—but we can customize the log location based on our environment setup and then update the directory path in the definition of access logs. Prefix: This parameter defines the prefix of the access log filename, that is, by default, access log files are generated by the name localhost_access_log.yy-mm-dd.txt. Suffix: This parameter defines the file extension of the log file. Currently it is in .txt format. Pattern: This parameter defines the format of the log file. The pattern is a combination of values defined by the administrator, for example, %h = remote host address. The following screenshot shows the default log format for Tomcat 7. Access logs show the remote host address, date/time of request, method used for response, URI mapping, and HTTP status code. In case you have installed the web traffic analysis tool for application, then you have to change the access logs to a different format. Host manager These logs define the activity performed using Tomcat Manager, such as various tasks performed, status of application, deployment of application, and lifecycle of Tomcat. These configurations are done on the logging.properties, which can be found in TOMCAT_HOME/conf. The previous screenshot shows the definition of host, manager, and host-manager details. If you see the definitions, it defines the log location, log level, and prefix of the filename. In logging.properties, we are defining file handlers and appenders using JULI. The log file for manager looks similar to the following: I28 Jun, 2011 3:36:23 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:37:13 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:37:42 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: undeploy: Undeploying web application at '/sample' 28 Jun, 2011 3:37:43 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:42:59 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:43:01 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:53:44 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' Types of log levels in Tomcat 7 There are seven levels defined for Tomcat logging services (JULI). They can be set based on the application requirement. The following figure shows the sequence of log levels for JULI: Every log level in JULI had its own functionality. The following table shows the functionality of each log level in JULI: Log level Description SEVERE(highest) Captures exception and Error WARNING Warning messages INFO Informational message, related to server activity CONFIG Configuration message FINE Detailed activity of server transaction (similar to debug) FINER More detailed logs than FINE FINEST(least) Entire flow of events (similar to trace) For example, let's take an appender from logging.properties and find out the log level used; the first log appender for localhost is using FINE as the log level, as shown in the following code snippet: localhost.org.apache.juli.FileHandler.level = FINE localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs localhost.org.apache.juli.FileHandler.prefix = localhost. The following code shows the default file handler configuration for logging in Tomcat 7 using JULI. The properties are mentioned and log levels are highlighted: ############################################################ # Facility specific properties. # Provides extra control for each logger. ############################################################ org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager] .level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager] .handlers = 3manager.org.apache.juli.FileHandler org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host- manager].level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host- manager].handlers = 4host-manager.org.apache.juli.FileHandler
Read more
  • 0
  • 0
  • 8999

article-image-common-performance-issues
Packt
19 Jun 2014
16 min read
Save for later

Common performance issues

Packt
19 Jun 2014
16 min read
(For more resources related to this topic, see here.) Threading performance issues Threading performance issues are the issues related to concurrency, as follows: Lack of threading or excessive threading Threads blocking up to starvation (usually from competing on shared resources) Deadlock until the complete application hangs (threads waiting for each other) Memory performance issues Memory performance issues are the issues that are related to application memory management, as follows: Memory leakage: This issue is an explicit leakage or implicit leakage as seen in improper hashing Improper caching: This issue is due to over caching, inadequate size of the object, or missing essential caching Insufficient memory allocation: This issue is due to missing JVM memory tuning Algorithmic performance issues Implementing the application logic requires two important parameters that are related to each other; correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows: Costive algorithmic logic Unnecessary logic Work as designed performance issues The work as designed performance issue is a group of issues related to the application design. The application behaves exactly as designed but if the design has issues, it will lead to performance issues. Some examples of performance issues are as follows: Using synchronous when asynchronous should be used Neglecting remoteness, that is, using remote calls as if they are local calls Improper loading technique, that is, eager versus lazy loading techniques Selection of the size of the object Excessive serialization layers Web services granularity Too much synchronization Non-scalable architecture, especially in the integration layer or middleware Saturated hardware on a shared infrastructure Interfacing performance issues Whenever the application is dealing with resources, we may face the following interfacing issues that could impact our application performance: Using an old driver/library Missing frequent database housekeeping Database issues, such as, missing database indexes Low performing JMS or integration service bus Logging issues (excessive logging or not following the best practices while logging) Network component issues, that is, load balancer, proxy, firewall, and so on Miscellaneous performance issues Miscellaneous performance issues include different performance issues, as follows: Inconsistent performance of application components, for example, having slow components can cause the whole application to slow down Introduced performance issues to delay the processing speed Improper configuration tuning of different components, for example, JVM, application server, and so on Application-specific performance issues, such as excessive validations, apply many business rules, and so on Fake performance issues Fake performance issues could be a temporary issue or not even an issue. The famous examples are as follows: Networking temporary issues Scheduled running jobs (detected from the associated pattern) Software automatic updates (it must be disabled in production) Non-reproducible issues In the following sections, we will go through some of the listed issues. Threading performance issues Multithreading has the advantage of maximizing the hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application. For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources or they will be blocked, waiting for each other. We will discuss some of the common threading issues in the next section. Blocking threads A common issue where threads are blocked is waiting to obtain the monitor(s) of certain shared resources (objects), that is, holding by other threads. If most of the application server threads are consumed in a certain blocked status, the application becomes gradually unresponsive to user requests. In the Weblogic application server, if a thread keeps executing for more than a configurable period of time (not idle), it gets promoted to the Stuck thread. The more the threads are in the stuck status, the more the server status becomes critical. Configuring the stuck thread parameters is part of the Weblogic performance tuning. Performance symptoms The following symptoms are the performance symptoms that usually appear in cases of thread blocking: Slow application response (increased single request latency and pending user requests) Application server logs might show some stuck threads. The server's healthy status becomes critical on monitoring tools (application server console or different monitoring tools) Frequent application server restarts either manually or automatically Thread dump shows a lot of threads in the blocked status waiting for different resources Application profiling shows a lot of thread blocking An example of thread blocking To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes for execution by adding the following additional lines: long start= new Date().getTime(); .. .. long duration= new Date().getTime()-start; System.err.println("total time = "+duration); Now, try to execute the code with a different number of the thread pool size. We can try using the thread pool size as 50 and 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than 50 threads! Let's now compare the NetBeans profiling results of both the executions to understand the reason behind this unexpected difference. The following screenshot shows the profiling of 50 threads; we can see a lot of blocking for the monitor in the column and the percentage of Monitor to the left waiting around at 75 percent: To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor and check all the available options, and then click on Run. The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, less threads compete on these resources: Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results. Another factor that also affects the selection of the pool size, especially when the thread execution takes long time, is the context switching overhead. This overhead requires the selection of the optimal pool size, usually related to the number of available processors for our application. Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters). The context switching includes suspension of the current executing process, storing its current data, picking up the next process for execution according to its priority, and restoring its data. Although it's supported on the hardware level and is faster, most operating systems do this on the level of software context switching to improve the performance. The main reason behind this is the ability of the software context switching to selectively choose the required registers to save. Thread deadlock When many threads hold the monitor for objects that they need, this will result in a deadlock unless the implementation uses the new explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held. The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot: Usually, this happens if the order of obtaining the locks is not planned. For example, if we need to have a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest or the largest bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need to have a much more mature way to handle this by using the Lock interface or some other technique. Memory performance issues In spite of all this great effort put into the allocated and free memory in an optimized way, we still see memory issues in Java Enterprise applications mainly due to the way people are dealing with memory in these applications. We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching. Memory leakage Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly the design/coding issues where the object is no longer required but it remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to object size and involved scenarios), it may lead to an out of memory error. The most common example of memory leakage is adding objects to the static collections (or an instance collection of long living objects, such as a servlet) and forgetting to clean collections totally or partially. Performance symptoms The following symptoms are some of the expected performance symptoms during a memory leakage in our application: The application uses heap memory increased by time The response slows down gradually due to memory congestion OutOfMemoryError occurs frequently in the logs and sometimes an application server restart is required Aggressive execution of garbage collection activities Heap dump shows a lot of objects retained (from the leakage types) A sudden increase of memory paging as reported by the operating system monitoring tools An example of memory leakage We have a sample application ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage. The following screenshot shows the product catalog browser page: One of the bad issues is the usage of the servlet instance (or static members), as it causes a lot of issues in multiple threads and has a common location for unnoticed memory leakages. We have added the following instance variable as a leakage location: private final HashMap<String, HashMap> cachingAllUsersCollection = new HashMap(); We will add some collections to the preceding code to cause memory leakage. We also used the caching in the session scope, which causes implicit leakage. The session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch. Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should only be restricted to the user-specific data. Also, forgetting to remove data that is not required from a session makes the memory utilization worse. Refer to the following code: @Stateful public class CacheSessionBean Instead of using a singleton class here or stateless bean with a static member, we used the Stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using View Objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on. It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti. We can use our knowledge in Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and details of adding some products to the basket: Executing the test plan using many concurrent users over many iterations will show the bad behavior of our application, where the used memory is increased by time. There is no justification as the catalog is the same for all users and there's no specific user data, except for the IDs of the selected products. Actually, it needs to be saved inside the user session, which won't take any remarkable memory space. In our example, we intend to save a lot of objects in the session, implement a wrong session level, cache, and implement meaningless servlet level caching. All this will contribute to memory leakage. This gradual increase in the memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!): Improper data caching Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues. The main concerns in the cache implementation are caching refresh rate, caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain the cached object's lock, and so on. Improper caching issue types The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections. No caching (disabled caching) Disabled caching will definitely cause a big load over the interfacing resources (for example, database) by hitting it in with almost every interaction. This should be avoided while designing an enterprise application; otherwise; the application won't be usable. Fortunately, this has less impact than using wrong caching implementation! Most of the application components such as database, JPA, and application servers already have an out-of-the-box caching support. Too small caching size Too small caching size is a common performance issue, where the cache size is initially determined but doesn't get reviewed with the increase of the application data. The cache sizing is affected by many factors such as the memory size. If it allows more caching and the type of the data, lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required under a very strict locking mechanism. Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on. As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. Also, the proper replacement policy contributes here; if we are working—as in our example—on an online product catalog, we may use the least recently used policy so all the old products will be removed, which makes sense as the users usually look for the new products. Monitoring of the caching utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the missed cache ratio is high, a cache resizing is required. Missed cache hits are very costive as they hit the cache first and then the resource itself (for example, database) to get the required object, and then add this loaded object into the cache again by releasing another object (if the cache is 100 percent), according to the used cache replacement policy. Too big caching size Too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some memory. This will increase the garbage collection pause time and decrease the cache throughput. If the cache throughput is decreased, the latency to get objects from the cache will increase causing the cache retrieval cost to be high to the level it might be slower than hitting the actual resources (for example, database). Using the wrong caching policy Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it. Performance symptoms According to the cache issue type and different cache configurations, we will see the following symptoms: Decreased cache hit rate (and increased cache missed ratio) Increased cache loading because of the improper size Increased cache latency with a huge caching size Spiky pattern in the performance testing response time, in case the cache size is not correct, causes continuous invalidation and reloading of the cached objects An example of improper caching techniques In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, global cache is wrong, local cache is improper, and no cache invalidation is implemented. So, we can have stale objects inside the cache. Cache invalidation is the process of refreshing or updating the existing object inside the cache or simply removing it from the cache. So in the next load, it reflects its recent values. This is to keep the cached objects always updated. Cache hit rate is the rate or ratio in which cache hits match (finds) the required cached object. It is the main measure for cache effectiveness together with the retrieval cost. Cache miss rate is the rate or ratio at which the cache hits the required object that is not found in the cache. Last access time is the timestamp of the last access (successful hit) to the cached objects. Caching replacement policies or algorithms are algorithms implemented by a cache to replace the existing cached objects with other new objects when there are no rooms available for any additional objects. This follows missed cache hits for these objects. Some examples of these policies are as follows: First-in-first-out (FIFO): In this policy, the cached object is aged and the oldest object is removed in favor of the new added ones. Least frequently used (LFU): In this policy, the cache picks the less frequently used object to free the memory, which means the cache will record statistics against each cached object. Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information like the last access time of all cached objects. Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits the application where items are no longer needed after the access, such as used exam vouchers. Aging policy: Every object in the cache will have an age limit, and once it exceeds this limit, it will be removed from the cache in the simple type. In the advanced type, it will also consider the invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on. It is important for us to understand that caching is not our magic bullet and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to real application needs.
Read more
  • 0
  • 0
  • 8988
article-image-running-your-applications-aws-part-2
Cheryl Adams
19 Aug 2016
6 min read
Save for later

Running Your Applications with AWS - Part 2

Cheryl Adams
19 Aug 2016
6 min read
An active account with AWS means you are on your way with building in the cloud.  Before you start building, you need to tackle the Billing and Cost Management, under Account. It is likely that you are starting with a Free-Tier, so it is important to know that you still have the option of paying for additional services. Also, if you decide to continue with AWS,you should get familiar with this page.  This is not your average bill or invoice page—it is much more than that. The Billing & Cost Management Dashboard is a bird’s-eye view of all of your account activity. Once you start accumulating pay-as-you-go services, this page will give you a quick review of your monthly spending based on services. Part of managing your cloud services includes billing, so it is a good idea to become familiar with this from the start. Amazon also gives you the option of setting up cost-based alerts for your system, which is essential if youwant to be alerted by any excessive cost related to your cloud services. Budgets allow you to receive e-mailed notifications or alerts if spending exceeds the budget that you have created.    If you want to dig in even deeper, try turning on the Cost Explorer for an analysis of your spending. The Billing and Cost Management section of your account is much more than just invoices. It is the AWS complete cost management system for your cloud. Being familiar with all aspects of the cost management system will help you to monitor your cloud services, and hopefully avoid any expenses that may exceed your budget. In our previous discussion, we considered all AWSservices.  Let’s take another look at the details of the services. Amazon Web Services Based on this illustration, you can see that the build options are grouped by words such asCompute, Storage & Content Delivery and  Databases.  Each of these objects or services lists a step-by-step routine that is easy to follow. Within the AWS site, there are numerous tutorials with detailed build instructions. If you are still exploring in the free-tier, AWS also has an active online community of users whotry to answer most questions. Let’s look at the build process for Amazon’s EC2 Virtual Server. The first thing that you will notice is that Amazon provides 22 different Amazon Machine Images (AMIs) to choose from (at the time this post was written).At the top of the screen is a Step process that will guide you through the build. It should be noted that some of the images available are not defined as a part of the free-tier plan. The remaining images that do fit into the plan should fit almost any project need. For this walkthrough, let’s select SUSE Linux (free eligible). It is important to note that just because the image itself is free, that does not mean all the options available within that image are free. Notice on this screen that Amazon has pre-selected the only free-tier option available for this image. From this screen you are given two options: (Review and Launch) or (Next Configure Instance Details).  Let’s try Review and Launch to see what occurs. Notice that our Step process advanced to Step 7. Amazon gives you a soft warning regarding the state of the build and potential risk. If you are okay with these risks, you can proceed and launch your server. It is important to note that the Amazon build process is user driven. It will allow you to build a server with these potential risks in your cloud. It is recommended that you carefully consider each screen before proceeding. In this instance,select Previous and not Cancel to return to Step 3. Selecting Cancelwill stop the build process and return you to the AWS main services page. Until you actually launch your server, nothing is built or saved. There are information bubbles for each line in Step 3: Configure Instance Details. Review the content of each bubble, make any changes if needed, and then proceed to the next step. Select the storage size; then select Next Tag Instance. Enter Values and Continue or Learn More for further information. Select the Next: Configure Security Group button. Security is an extremely important part of setting up your virtual server. It is recommended that you speak to your security administrator to determine the best option. For source, it is recommended that you avoid using the Anywhereoption. This selection will put your build at risk. Select my IP or custom IP as shown. If you are involved in a self-study plan, you can select the Learn More link to determine the best option. Next: Review and Launch The full details of this screen be expanded, reviewed or edited. If everything appears to be okay,proceed to Launch. One additional screen will appear for adding Private and/or Public Keys to access your new server. Make the appropriate selection and proceed to the Launch Instances. One more screen will appear for adding Private and/or Public Keys to access your new server. Make the appropriate selection and proceed to Launch Instances to see the build process. You can access your new server from the EC2 Dashboard. This example of a build process gives you a window into how the  AWS build process works. The other objects and services have a similar step-through process. Once you have launched your server, you should be able to access it and proceed with your development. Additional details for development are also available through the site. Amazon’s Web Services Platform is an all-in-one solution for your graduation to the cloud. Not only can you manage your technical environment, but also it has features that allow you to manage your budget. By setting up your virtual applicances and servers appropriately, you can maximize the value of the first  12 months of your free-tier. Carefully monitoring activities through alerts and notification will help you to avoid having any billing surprises. Going through the tutorials and visting the online community will only aid to increase your knowledge base of AWS. AWS is inviting everyone to test their services on this exciting platform, so I would definitely recommend taking advantage of it. Have fun! About the author Cheryl Adams is a senior cloud data andinfrastructure architect in the healthcare data realm. She is also the co-author of Professional Hadoop by Wrox.
Read more
  • 0
  • 0
  • 8857

article-image-transactions-for-async-programming-in-javaee
Aaron Lazar
31 Jul 2018
5 min read
Save for later

Using Transactions with Asynchronous Tasks in JavaEE [Tutorial]

Aaron Lazar
31 Jul 2018
5 min read
Threading is a common issue in most software projects, no matter which language or other technology is involved. When talking about enterprise applications, things become even more important and sometimes harder. Using asynchronous tasks could be a challenge: what if you need to add some spice and add a transaction to it? Thankfully, the Java EE environment has some great features for dealing with this challenge, and this article will show you how. This article is an extract from the book Java EE 8 Cookbook, authored by Elder Moraes. Usually, a transaction means something like code blocking. Isn't it awkward to combine two opposing concepts? Well, it's not! They can work together nicely, as shown here. Adding Java EE 8 dependency Let's first add our Java EE 8 dependency: <dependency> <groupId>javax</groupId> <artifactId>javaee-api</artifactId> <version>8.0</version> <scope>provided</scope> </dependency> Let's first create a User POJO: public class User { private Long id; private String name; public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public User(Long id, String name) { this.id = id; this.name = name; } @Override public String toString() { return "User{" + "id=" + id + ", name=" + name + '}'; } } And here is a slow bean that will return User: @Stateless public class UserBean { public User getUser(){ try { TimeUnit.SECONDS.sleep(5); long id = new Date().getTime(); return new User(id, "User " + id); } catch (InterruptedException ex) { System.err.println(ex.getMessage()); long id = new Date().getTime(); return new User(id, "Error " + id); } } } Now we create a task to be executed that will return User using some transaction stuff: public class AsyncTask implements Callable<User> { private UserTransaction userTransaction; private UserBean userBean; @Override public User call() throws Exception { performLookups(); try { userTransaction.begin(); User user = userBean.getUser(); userTransaction.commit(); return user; } catch (IllegalStateException | SecurityException | HeuristicMixedException | HeuristicRollbackException | NotSupportedException | RollbackException | SystemException e) { userTransaction.rollback(); return null; } } private void performLookups() throws NamingException{ userBean = CDI.current().select(UserBean.class).get(); userTransaction = CDI.current() .select(UserTransaction.class).get(); } } And finally, here is the service endpoint that will use the task to write the result to a response: @Path("asyncService") @RequestScoped public class AsyncService { private AsyncTask asyncTask; @Resource(name = "LocalManagedExecutorService") private ManagedExecutorService executor; @PostConstruct public void init(){ asyncTask = new AsyncTask(); } @GET public void asyncService(@Suspended AsyncResponse response){ Future<User> result = executor.submit(asyncTask); while(!result.isDone()){ try { TimeUnit.SECONDS.sleep(1); } catch (InterruptedException ex) { System.err.println(ex.getMessage()); } } try { response.resume(Response.ok(result.get()).build()); } catch (InterruptedException | ExecutionException ex) { System.err.println(ex.getMessage()); response.resume(Response.status(Response .Status.INTERNAL_SERVER_ERROR) .entity(ex.getMessage()).build()); } } } To try this code, just deploy it to GlassFish 5 and open this URL: http://localhost:8080/ch09-async-transaction/asyncService How the Asynchronous execution works The magic happens in the AsyncTask class, where we will first take a look at the performLookups method: private void performLookups() throws NamingException{ Context ctx = new InitialContext(); userTransaction = (UserTransaction) ctx.lookup("java:comp/UserTransaction"); userBean = (UserBean) ctx.lookup("java:global/ ch09-async-transaction/UserBean"); } It will give you the instances of both UserTransaction and UserBean from the application server. Then you can relax and rely on the things already instantiated for you. As our task implements a Callabe<V> object that it needs to implement the call() method: @Override public User call() throws Exception { performLookups(); try { userTransaction.begin(); User user = userBean.getUser(); userTransaction.commit(); return user; } catch (IllegalStateException | SecurityException | HeuristicMixedException | HeuristicRollbackException | NotSupportedException | RollbackException | SystemException e) { userTransaction.rollback(); return null; } } You can see Callable as a Runnable interface that returns a result. Our transaction code lives here: userTransaction.begin(); User user = userBean.getUser(); userTransaction.commit(); And if anything goes wrong, we have the following: } catch (IllegalStateException | SecurityException | HeuristicMixedException | HeuristicRollbackException | NotSupportedException | RollbackException | SystemException e) { userTransaction.rollback(); return null; } Now we will look at AsyncService. First, we have some declarations: private AsyncTask asyncTask; @Resource(name = "LocalManagedExecutorService") private ManagedExecutorService executor; @PostConstruct public void init(){ asyncTask = new AsyncTask(); } We are asking the container to give us an instance from ManagedExecutorService, which It is responsible for executing the task in the enterprise context. Then we call an init() method, and the bean is constructed (@PostConstruct). This instantiates the task. Now we have our task execution: @GET public void asyncService(@Suspended AsyncResponse response){ Future<User> result = executor.submit(asyncTask); while(!result.isDone()){ try { TimeUnit.SECONDS.sleep(1); } catch (InterruptedException ex) { System.err.println(ex.getMessage()); } } try { response.resume(Response.ok(result.get()).build()); } catch (InterruptedException | ExecutionException ex) { System.err.println(ex.getMessage()); response.resume(Response.status(Response. Status.INTERNAL_SERVER_ERROR) .entity(ex.getMessage()).build()); } } Note that the executor returns Future<User>: Future<User> result = executor.submit(asyncTask); This means this task will be executed asynchronously. Then we check its execution status until it's done: while(!result.isDone()){ try { TimeUnit.SECONDS.sleep(1); } catch (InterruptedException ex) { System.err.println(ex.getMessage()); } } And once it's done, we write it down to the asynchronous response: response.resume(Response.ok(result.get()).build()); The full source code of this recipe is at Github. So now, using Transactions with Asynchronous Tasks in JavaEE isn't such a daunting task, is it? If you found this tutorial helpful and would like to learn more, head on to this book Java EE 8 Cookbook. Oracle announces a new pricing structure for Java Design a RESTful web API with Java [Tutorial] How to convert Java code into Kotlin
Read more
  • 0
  • 0
  • 8836

article-image-scribus-creating-layout
Packt
07 Jan 2011
9 min read
Save for later

Scribus: Creating a Layout

Packt
07 Jan 2011
9 min read
  Scribus 1.3.5: Beginner's Guide Create optimum page layouts for your documents using productive tools of Scribus. Master desktop publishing with Scribus Create professional-looking documents with ease Enhance the readability of your documents using powerful layout tools of Scribus Packed with interesting examples and screenshots that show you the most important Scribus tools to create and publish your documents. Creating a new layout Creating a layout in Scribus means dealing with the New Document window. It is not a complex window but be aware that many things you'll set here will be considered definitive. If these settings look simple or evident, you should consider all these settings as important. Some of them like the page size mean that you already have an idea of the final document, or atleast that you've already made some choices that won't change after it is created. Of course, Scribus will let you change this later if you change your mind, but many things you will have done in the meantime will simply have to be done again. Time for action – setting page size and paper size and margins This window is the first that opens when you launch Scribus or when you go to the File | New menu. It contains several options that need to be set. First among these options will certainly be the page size. In our case, people usually use 54x85mm (USA: 51×89mm). When you type the measurements in the Width and Height fields, the Size option, which contains the common size presets, is automatically switched to Custom. If you want to use a different system unit, just change the Default Unit value placed below. Usually, we prefer Millimeters (mm), which is quite precise without having too many significant decimals. Then, you can set the margin for your document. Professional printers are very different from desktop printers as they can print without margins. In fact, consider margins as helpers to place objects. For a small document like a business card, having small 4mm margins will be good. What just happened? Some common page sizes are: the series (the ISO standards biggest starting with A0 841x1189, that is 1m², and halving at each half step), the US formats, especially letter (216x279mm), legal (216x356mm), and tabloid (approximately 279x432mm, 11x17in), commonly used in the UK for newspapers. The best business card size When choosing the size for the business card, you'll consider the existing size often used. Is ISO 54x85.6mm better than the US 2x3.5in, or the European 55x85mm, or the Australian 55x90mm, when only a few millimeters divide them? Best is certainly to match the most commonly-used size in your country. Remember one thing: a business card must have to be easily stored and sorted. Grabbing an uncommon format can just lead to the fact that no one will be able to put your card in their wallet. Presets will be useful if you want to print locally, but don't forget that your print company crops the paper to the size you want. So don't mind being creative and do some testing. For example, you might print on an A3 size paper for your final document or in an A3+ real printing size so that you'll be able to use bleeds, as we'll explain in the following sections. Here we're talking about the page size and not the paper size, which can be double if the Document Layout is set to any option but Single Page. For all the folded documents, the page size differs from the paper size—keep that in mind. For now choose 54x85.6 in landscape: just set 54 as the height or change the orientation button if you haven't. The other setting that might interest you is the margin . In Scribus, consider the margin as a helper. In fact nobody in the professional print process will need margins. It is useful for desktop printers, which can't print up to the sheet border. As our example is much smaller than the usual paper size, we won't have any trouble with it. Scribus has some presets for margins that are available only when a layout other than Single Page is selected. For our model, 4mm to each side will be fine. If you want to set all the fields at once, just click on the chain button at the right-hand side of the margin fields. But actually, we can consider that we won't have much to write and that it would be nice if our margins could help position the text. So let's define the margins as follows: Left: 10mm Right: 40mm Top: 30mm Bottom: 2mm Choosing a layout We've already talked about this option several times but here we are again. What kind of layout would you choose? Single page will simulate what you might have in a text processor. You can have as many pages as you want but it will be printed page after page. You'll get its result when printing with your desktop printer: Double-sided will be the option you'll use when you'll need a folded document. This is useful for magazines, newsletters, books, or such documents. In this layout, the reader will see two pages side by side at once, and you can easily manage elements that will overlap both pages. The fold will be in the exact middle. Usually, unless you have a small document size like A5 or smaller, this layout is intended to be printed by a professional. 3-Fold and 4-Fold are more for commercial little brochures. Usually, you won't use it in Scribus and will prefer a Single Page layout that you'll divide later into three or four parts. Why? Because with the folded out, Scribus will consider each "fold" as a page and will print each of them on a separate sheet—a bit tricky. You can see that for a business card, where no fold is needed, the Single Page layout will be our choice. (Move the mouse over the image to enlarge.) For the moment we won't need other options, so you can click on OK. You'll get a white rectangle on a greyish workspace. The red outline is the selection indicator for the selected page. It shows the borders of the page. The blue rectangle shows where the margins are placed. Save the document as often as possible "Save the document as often as possible"—this is the first commandment of a software user, but in Scribus this is much more important for several reasons: First of all, apologies, Scribus is a very nice piece of software but still not perfect (but which one is?). It can crash sometimes, slightly more than you'd wish, and never at a time you would expect or appreciate. Saving often will help you save a lot of time doing again what you've already achieved during the day. The Scribus undo system acts on layout options but not on text manipulations. Saving often can be helpful if you make mistakes that you can't undo. In Scribus, we will use File | Save As (or Ctrl + Shift + S) to set the document name and format. It's very simple because you have no other choice than Scribus Documents *.sla. In the list, you will see sla.gz that will be used when the Compress File checkbox will be selected. Usually, a Scribus file is not that large in size and there is no real need to compress it. Of course, if the file already exists, Scribus asks whether you want to overwrite the previous one. Scribus file version Each Scribus release has enhanced the file format to be able to store the new possibilities in the file. But when saving, you cannot choose a version: Scribus will always use the current one. Every document can be opened in future Scribus releases but not in the older ones. So be careful when you need to send the file to someone or else when you're working on several computers. Once you've used Saved As, you'll just have to simply save (File | Save) or more magically use Ctrl + S, and the modifications will automatically be added to the saved document. The extra Save as Template menu will store the actual file in a special Scribus folder. When you want to create a new document with the same global aspect, you can go to the New from Template menu and grab it from the list. There are some default templates available here, but yours might be better. Saving as a template might not be the usual saving process; this is done at the end when the basics of your layout have been made. Saving as template must happen only once for a template. So we'll use it at the end of our tutorial. Basic frames for text and images The biggest part of a design job is adding frames, setting their visual aspect, and importing content into them. In our business card we'll need a logo, name, and other information. You may add a photo. Time for action – adding the logo They are several types of graphic elements in a layout. The logo is of course one of the most important. Generally, we prefer using vector logos in SVG or EPS. Let's import a logo. In the File menu choose File | Import | Get Vector File. The cursor has now been changed, and you can click on the page where you want to place the logo. Try to click at the upper-left corner of the margins. It will certainly not be correctly placed and the logo may be too big. We'll soon see how to change it. A warning will appear and inform you that some SVG features will not be supported. There is no option other than clicking on OK, and everything should be good. What just happened? The logo is the master piece of the card. It helps recognize the origin of the contact. In some ways, it is the most important recognition for a company. Usually, a logo is the only graphical element on the card. It can be put anywhere you want, but generally the upper left-hand side corner is the place of choice.  
Read more
  • 0
  • 0
  • 8787
Packt
22 May 2015
27 min read
Save for later

Financial Derivative – Options

Packt
22 May 2015
27 min read
In this article by Michael Heydt, author of Mastering pandas for Finance, we will examine working with options data provided by Yahoo! Finance using pandas. Options are a type of financial derivative and can be very complicated to price and use in investment portfolios. Because of their level of complexity, there have been many books written that are very heavy on the mathematics of options. Our goal will not be to cover the mathematics in detail but to focus on understanding several core concepts in options, retrieving options data from the Internet, manipulating it using pandas, including determining their value, and being able to check the validity of the prices offered in the market. (For more resources related to this topic, see here.) Introducing options An option is a contract that gives the buyer the right, but not the obligation, to buy or sell an underlying security at a specific price on or before a certain date. Options are considered derivatives as their price is derived from one or more underlying securities. Options involve two parties: the buyer and the seller. The parties buy and sell the option, not the underlying security. There are two general types of options: the call and the put. Let's look at them in detail: Call: This gives the holder of the option the right to buy an underlying security at a certain price within a specific period of time. They are similar to having a long position on a stock. The buyer of a call is hoping that the value of the underlying security will increase substantially before the expiration of the option and, therefore, they can buy the security at a discount from the future value. Put: This gives the option holder the right to sell an underlying security at a certain price within a specific period of time. A put is similar to having a short position on a stock. The buyer of a put is betting that the price of the underlying security will fall before the expiration of the option and they will, thereby, be able to gain a profit by benefitting from receiving the payment in excess of the future market value. The basic idea is that one side of the party believes that the underlying security will increase in value and the other believes it will decrease. They will agree upon a price known as the strike price, where they place their bet on whether the price of the underlying security finishes above or below this strike price on the expiration date of the option. Through the contract of the option, the option seller agrees to give the buyer the underlying security on the expiry of the option if the price is above the strike price (for a call). The price of the option is referred to as the premium. This is the amount the buyer will pay to the seller to receive the option. This price of an option depends upon many factors, of which the following are the primary factors: The current price of the underlying security How long the option needs to be held before it expires (the expiry date) The strike price on the expiry date of the option The interest rate of capital in the market The volatility of the underlying security There being an adequate interest between buyer and seller around the given option The premium is often established so that the buyer can speculate on the future value of the underlying security and be able to gain rights to the underlying security in the future at a discount in the present. The holder of the option, known as the buyer, is not obliged to exercise the option on its expiration date, but the writer, also referred to as the seller, however, is obliged to buy or sell the instrument if the option is exercised. Options can provide a variety of benefits such as the ability to limit risk and the advantage of providing leverage. They are often used to diversify an investment portfolio to lower risk during times of rising or falling markets. There are four types of participants in an options market: Buyers of calls Sellers of calls Buyers of puts Sellers of puts Buyers of calls believe that the underlying security will exceed a certain level and are not only willing to pay a certain amount to see whether that happens, but also lose their entire premium if it does not. Their goal is that the resulting payout of the option exceeds their initial premium and they, therefore, make a profit. However, they are willing to forgo their premium in its entirety if it does not clear the strike price. This then becomes a game of managing the risk of the profit versus the fixed potential loss. Sellers of calls are on the other side of buyers. They believe the price will drop and that the amount they receive in payment for the premium will exceed any loss in the price. Normally, the seller of a call would already own the stock. They do not believe the price will exceed the strike price and that they will be able to keep the underlying security and profit if the underlying security stays below the strike by an amount that does not exceed the received premium. Loss is potentially unbounded as the stock increases in price above the strike price, but that is the risk for an upfront receipt of cash and potential gains on loss of price in the underlying instrument. A buyer of a put is betting that the price of the stock will drop beyond a certain level. By buying a put they gain the option to force someone to buy the underlying instrument at a fixed price. By doing this, they are betting that they can force the sale of the underlying instrument at a strike price that is higher than the market price and in excess of the premium that they pay to the seller of the put option. On the other hand, the seller of the put is betting that they can make an offer on an instrument that is perceived to lose value in the future. They will offer the option for a price that gives them cash upfront, and they plan that at maturity of the option, they will not be forced to purchase the underlying instrument. Therefore, it keeps the premium as pure profit. Or, the price of the underlying instruments drops only a small amount so that the price of buying the underlying instrument relative to its market price does not exceed the premium that they received. Notebook setup The examples in this article will be based on the following configuration in IPython: In [1]:    import pandas as pd    import numpy as np    import pandas.io.data as web    from datetime import datetime      import matplotlib.pyplot as plt    %matplotlib inline      pd.set_option('display.notebook_repr_html', False)    pd.set_option('display.max_columns', 7)    pd.set_option('display.max_rows', 15)    pd.set_option('display.width', 82)    pd.set_option('precision', 3) Options data from Yahoo! Finance Options data can be obtained from several sources. Publicly listed options are exchanged on the Chicago Board Options Exchange (CBOE) and can be obtained from their website. Through the DataReader class, pandas also provides built-in (although in the documentation referred to as experimental) access to options data. The following command reads all currently available options data for AAPL: In [2]:    aapl_options = web.Options('AAPL', 'yahoo') aapl_options = aapl_options.get_all_data().reset_index() This operation can take a while as it downloads quite a bit of data. Fortunately, it is cached so that subsequent calls will be quicker, and there are other calls to limit the types of data downloaded (such as getting just puts). For convenience, the following command will save this data to a file for quick reload at a later time. Also, it helps with repeatability of the examples. The data retrieved changes very frequently, so the actual examples in the book will use the data in the file provided with the book. It saves the data for later use (it's commented out for now so as not to overwrite the existing file). Here's the command we are talking about: In [3]:    #aapl_options.to_csv('aapl_options.csv') This data file can be reloaded with the following command: In [4]:    aapl_options = pd.read_csv('aapl_options.csv',                              parse_dates=['Expiry']) Whether from the Web or the file, the following command restructures and tidies the data into a format best used in the examples to follow: In [5]:    aos = aapl_options.sort(['Expiry', 'Strike'])[      ['Expiry', 'Strike', 'Type', 'IV', 'Bid',          'Ask', 'Underlying_Price']]    aos['IV'] = aos['IV'].apply(lambda x: float(x.strip('%'))) Now, we can take a look at the data retrieved: In [6]:    aos   Out[6]:            Expiry Strike Type     IV   Bid   Ask Underlying_Price    158 2015-02-27     75 call 271.88 53.60 53.85           128.79    159 2015-02-27     75 put 193.75 0.00 0.01           128.79    190 2015-02-27     80 call 225.78 48.65 48.80           128.79    191 2015-02-27     80 put 171.88 0.00 0.01           128.79    226 2015-02-27     85 call 199.22 43.65 43.80           128.79 There are 1,103 rows of options data available. The data is sorted by Expiry and then Strike price to help demonstrate examples. Expiry is the data at which the particular option will expire and potentially be exercised. We have the following expiry dates that were retrieved. Options typically are offered by an exchange on a monthly basis and within a short overall duration from several days to perhaps two years. In this dataset, we have the following expiry dates: In [7]:    aos['Expiry'].unique()   Out[7]:    array(['2015-02-26T17:00:00.000000000-0700',          '2015-03-05T17:00:00.000000000-0700',          '2015-03-12T18:00:00.000000000-0600',          '2015-03-19T18:00:00.000000000-0600',          '2015-03-26T18:00:00.000000000-0600',          '2015-04-01T18:00:00.000000000-0600',          '2015-04-16T18:00:00.000000000-0600',          '2015-05-14T18:00:00.000000000-0600',          '2015-07-16T18:00:00.000000000-0600',          '2015-10-15T18:00:00.000000000-0600',          '2016-01-14T17:00:00.000000000-0700',          '2017-01-19T17:00:00.000000000-0700'], dtype='datetime64[ns]') For each option's expiration date, there are multiple options available, split between puts and calls, and with different strike values, prices, and associated risk values. As an example, the option with the index 158 that expires on 2015-02-27 is for buying a call on AAPL with a strike price of $75. The price we would pay for each share of AAPL would be the bid price of $53.60. Options typically sell 100 units of the underlying security, and, therefore, this would mean that this option would cost of 100 x $53.60 or $5,360 upfront: In [8]:    aos.loc[158]   Out[8]:    Expiry             2015-02-27 00:00:00    Strike                               75    Type                              call    IV                                 272    Bid                               53.6    Ask                               53.9    Underlying_Price                   129    Name: 158, dtype: object This $5,360 does not buy us the 100 shares of AAPL. It gives us the right to buy 100 shares of AAPL on 2015-02-27 at $75 per share. We should only buy if the price of AAPL is above $75 on 2015-02-27. If not, we will have lost our premium of $5360 and purchasing below will only increase our loss. Also, note that these quotes were retrieved on 2015-02-25. This specific option has only two days until it expires. That has a huge effect on the pricing: We have paid $5,360 for the option to buy 100 shares of AAPL on 2015-02-27 if the price of AAPL is above $75 on that date. The price of AAPL when the option was priced was $128.79 per share. If we were to buy 100 shares of AAPL now, we would have paid $12,879 now. If AAPL is above $75 on 2015-02-27, we can buy 100 shares for $7500. There is not a lot of time between the quote and Expiry of this option. With AAPL being at $128.79, it is very likely that the price will be above $75 in two days. Therefore, in two days: We can walk away if the price is $75 or above. Since we paid $5360, we probably wouldn't want to do that. At $75 or above, we can force execution of the option, where we give the seller $7,500 and receive 100 shares of AAPL. If the price of AAPL is still $128.79 per share, then we will have bought $12,879 of AAPL for $7,500+$5,360, or $12,860 in total. In technicality, we will have saved $19 over two days! But only if the price didn't drop. If for some reason, AAPL dropped below $75 in two days, we kept our loss to our premium of $5,360. This is not great, but if we had bought $12,879 of AAPL on 2015-02-5 and it dropped to $74.99 on 2015-02-27, we would have lost $12,879 – $7,499, or $5,380. So, we actually would have saved $20 in loss by buying the call option. It is interesting how this math works out. Excluding transaction fees, options are a zero-loss game. It just comes down to how much risk is involved in the option versus your upfront premium and how the market moves. If you feel you know something, it can be quite profitable. Of course, it can also be devastatingly unprofitable. We will not examine the put side of this example. It would suffice to say it works out similarly from the side of the seller. Implied volatility There is one more field in our dataset that we didn't look at—implied volatility (IV). We won't get into the details of the mathematics of how this is calculated, but this reflects the amount of volatility that the market has factored into the option. This is different than historical volatility (typically the standard deviation of the previous year of returns). In general, it is informative to examine the IV relative to the strike price on a particular Expiry date. The following command shows this in tabular form for calls on 2015-02-27: In [9]:    calls1 = aos[(aos.Expiry=='2015-02-27') & (aos.Type=='call')]    calls1[:5]   Out[9]:            Expiry Strike Type     IV   Bid   Ask Underlying_Price    158 2015-02-27     75 call 271.88 53.60 53.85           128.79    159 2015-02-27     75   put 193.75 0.00   0.01           128.79    190 2015-02-27     80 call 225.78 48.65 48.80           128.79    191 2015-02-27     80   put 171.88 0.00   0.01           128.79    226 2015-02-27     85 call 199.22 43.65 43.80           128.79 It appears that as the strike price approaches the underlying price, the implied volatility decreases. Plotting this shows it even more clearly: In [10]:    ax = aos[(aos.Expiry=='2015-02-27') & (aos.Type=='call')] \            .set_index('Strike')[['IV']].plot(figsize=(12,8))    ax.axvline(calls1.Underlying_Price.iloc[0], color='g'); The shape of this curve is important as it defines points where options are considered to be either in or out of the money. A call option is referred to as in the money when the options strike price is below the market price of the underlying instrument. A put option is in the money when the strike price is above the market price of the underlying instrument. Being in the money does not mean that you will profit; it simply means that the option is worth exercising. Where and when an option is in our out of the money can be visualized by examining the shape of its implied volatility curve. Because of this curved shape, it is generally referred to as a volatility smile as both ends tend to turn upwards on both ends, particularly, if the curve has a uniform shape around its lowest point. This is demonstrated in the following graph, which shows the nature of in/out of the money for both puts and calls: A skew on the smile demonstrates a relative demand that is greater toward the option being in or out of the money. When this occurs, the skew is often referred to as a smirk. Volatility smirks Smirks can either be reverse or forward. The following graph demonstrates a reverse skew, similar to what we have seen with our AAPL 2015-02-27 call: In a reverse-skew smirk, the volatility for options at lower strikes is higher than at higher strikes. This is the case with our AAPL options expiring on 2015-02-27. This means that the in-the-money calls and out-of-the-money puts are more expensive than out-of-the-money calls and in-the-money puts. A popular explanation for the manifestation of the reverse volatility skew is that investors are generally worried about market crashes and buy puts for protection. One piece of evidence supporting this argument is the fact that the reverse skew did not show up for equity options until after the crash of 1987. Another possible explanation is that in-the-money calls have become popular alternatives to outright stock purchases as they offer leverage and, hence, increased ROI. This leads to greater demand for in-the-money calls and, therefore, increased IV at the lower strikes. The other variant of the volatility smirk is the forward skew. In the forward-skew pattern, the IV for options at the lower strikes is lower than the IV at higher strikes. This suggests that out-of-the-money calls and in-the-money puts are in greater demand compared to in-the-money calls and out-of-the-money puts: The forward-skew pattern is common for options in the commodities market. When supply is tight, businesses would rather pay more to secure supply than to risk supply disruption. For example, if weather reports indicate a heightened possibility of an impending frost, fear of supply disruption will cause businesses to drive up demand for out-of-the-money calls for the affected crops. Calculating payoff on options The payoff of an option is a relatively straightforward calculation based upon the type of the option and is derived from the price of the underlying security on expiry relative to the strike price. The formula for the call option payoff is as follows: The formula for the put option payoff is as follows: We will model both of these functions and visualize their payouts. The call option payoff calculation An option gives the buyer of the option the right to buy (a call option) or sell (a put option) an underlying security at a point in the future and at a predetermined price. A call option is basically a bet on whether or not the price of the underlying instrument will exceed the strike price. Your bet is the price of the option (the premium). On the expiry date of a call, the value of the option is 0 if the strike price has not been exceeded. If it has been exceeded, its value is the market value of the underlying security. The general value of a call option can be calculated with the following function: In [11]:    def call_payoff(price_at_maturity, strike_price):        return max(0, price_at_maturity - strike_price) When the price of the underlying instrument is below the strike price, the value is 0 (out of the money). This can be seen here: In [12]:    call_payoff(25, 30)   Out[12]:    0 When it is above the strike price (in the money), it will be the difference of the price and the strike price: In [13]:    call_payoff(35, 30)   Out[13]:    5 The following function returns a DataFrame object that calculates the return for an option over a range of maturity prices. It uses np.vectorize() to efficiently apply the call_payoff() function to each item in the specific column of the DataFrame: In [14]:    def call_payoffs(min_maturity_price, max_maturity_price,                    strike_price, step=1):        maturities = np.arange(min_maturity_price,                              max_maturity_price + step, step)        payoffs = np.vectorize(call_payoff)(maturities, strike_price)        df = pd.DataFrame({'Strike': strike_price, 'Payoff': payoffs},                          index=maturities)        df.index.name = 'Maturity Price'    return df The following command demonstrates the use of this function to calculate payoff of an underlying security at finishing prices ranging from 10 to 25 and with a strike price of 15: In [15]:    call_payoffs(10, 25, 15)   Out[15]:                    Payoff Strike    Maturity Price                  10                   0     15    11                   0     15    12                   0     15    13                   0     15    14                   0     15    ...               ...     ...    21                   6     15    22                  7     15    23                   8     15    24                   9     15    25                 10     15      [16 rows x 2 columns] Using this result, we can visualize the payoffs using the following function: In [16]:    def plot_call_payoffs(min_maturity_price, max_maturity_price,                          strike_price, step=1):        payoffs = call_payoffs(min_maturity_price, max_maturity_price,                              strike_price, step)        plt.ylim(payoffs.Payoff.min() - 10, payoffs.Payoff.max() + 10)        plt.ylabel("Payoff")        plt.xlabel("Maturity Price")        plt.title('Payoff of call option, Strike={0}'                  .format(strike_price))        plt.xlim(min_maturity_price, max_maturity_price)        plt.plot(payoffs.index, payoffs.Payoff.values); The payoffs are visualized as follows: In [17]:    plot_call_payoffs(10, 25, 15) The put option payoff calculation The value of a put option can be calculated with the following function: In [18]:    def put_payoff(price_at_maturity, strike_price):        return max(0, strike_price - price_at_maturity) While the price of the underlying is below the strike price, the value is 0: In [19]:    put_payoff(25, 20)   Out[19]:    0 When the price is below the strike price, the value of the option is the difference between the strike price and the price: In [20]:    put_payoff(15, 20)   Out [20]:    5 This payoff for a series of prices can be calculated with the following function: In [21]:    def put_payoffs(min_maturity_price, max_maturity_price,                    strike_price, step=1):        maturities = np.arange(min_maturity_price,                              max_maturity_price + step, step)        payoffs = np.vectorize(put_payoff)(maturities, strike_price)       df = pd.DataFrame({'Payoff': payoffs, 'Strike': strike_price},                          index=maturities)        df.index.name = 'Maturity Price'        return df The following command demonstrates the values of the put payoffs for prices of 10 through 25 with a strike price of 25: In [22]:    put_payoffs(10, 25, 15)   Out [22]:                    Payoff Strike    Maturity Price                  10                   5     15    11                   4     15    12                   3     15    13                  2     15    14                   1     15    ...               ...     ...    21                   0     15    22                   0     15    23                   0     15    24                   0     15    25                   0      15      [16 rows x 2 columns] The following function will generate a graph of payoffs: In [23]:    def plot_put_payoffs(min_maturity_price,                        max_maturity_price,                        strike_price,                        step=1):        payoffs = put_payoffs(min_maturity_price,                              max_maturity_price,                              strike_price, step)        plt.ylim(payoffs.Payoff.min() - 10, payoffs.Payoff.max() + 10)        plt.ylabel("Payoff")      plt.xlabel("Maturity Price")        plt.title('Payoff of put option, Strike={0}'                  .format(strike_price))        plt.xlim(min_maturity_price, max_maturity_price)        plt.plot(payoffs.index, payoffs.Payoff.values); The following command demonstrates the payoffs for prices between 10 and 25 with a strike price of 15: In [24]:    plot_put_payoffs(10, 25, 15) Summary In this article, we examined several techniques for using pandas to calculate the prices of options, their payoffs, and profit and loss for the various combinations of calls and puts for both buyers and sellers. Resources for Article: Further resources on this subject: Why Big Data in the Financial Sector? [article] Building Financial Functions into Excel 2010 [article] Using indexes to manipulate pandas objects [article]
Read more
  • 0
  • 0
  • 8756

article-image-creating-nhibernate-session-access-database-within-aspnet
Packt
14 May 2010
7 min read
Save for later

Creating a NHibernate session to access database within ASP.NET

Packt
14 May 2010
7 min read
NHibernate is an open source object-relational mapper, or simply put, a way to rapidly retrieve data from your database into standard .NET objects. This article teaches you how to create NHibernate sessions, which use database sessions to retrieve and store data into the database. In this article by Aaron B. Cure, author of Nhibernate 2 Beginner's Guide we'll talk about: What is an NHibernate session? How does it differ from a regular database session? Retrieving and committing data Session strategies for ASP.NET (Read more interesting articles on Nhibernate 2 Beginner's Guide here.) What is an NHibernate session? Think of an NHibernate session as an abstract or virtual conduit to the database. Gone are the days when you have to create a Connection, open the Connection, pass the Connection to a Command object, create a DataReader from the Command object, and so on. With NHibernate, we ask the SessionFactory for a Session object, and that's it. NHibernate handles all of the "real" sessions to the database, connections, pooling, and so on. We reap all the benefits without having to know the underlying intricacies of all of the database backends we are trying to connect to. Time for action – getting ready Before we actually connect to the database, we need to do a little "housekeeping". Just a note, if you run into trouble (that is, your code doesn't work like the walkthrough), then don't panic. See the troubleshooting section at the end of this Time for action section. Before we get started, make sure that you have all of the Mapping and Common files and that your Mapping files are included as "Embedded Resources". Your project should look as shown in the following screenshot: The first thing we need to do is create a new project to use to create our sessions. Right-click on the Solution 'Ordering' and click on Add | New Project. For our tests, we will use a Console Application and name it Ordering.Console. Use the same location as your previous project. Next, we need to add a few references. Right-click on the References folder and click on Add Reference. In VB.NET, you need to right-click on the Ordering.Console project, and click on Add Reference. Select the Browse tab, and navigate to the folder that contains your NHibernate dlls. You should have six files in this folder. Select the NHibernate.dll, Castle.Core.dll, Castle.DynamicProxy2.dll, Iesi.Collections.dll, log4net.dll, and NHibernate.ByteCode.Castle.dll files, and click on OK to add them as references to the project. Right-click on the References folder (or the project folder in VB.NET), and click on Add Reference again. Select the Projects tab, select the Ordering.Data project, and click on OK to add the data tier as a reference to our console application. The last thing we need to do is create a configuration object. We will discuss configuration in a later chapter, so for now, it would suffice to say that this will give us everything we need to connect to the database. Your current Program.cs file in the Ordering.Console application should look as follows: using System;using System.Collections.Generic;using System.Text;namespace Ordering.Console{ class Program { static void Main(string[] args) { } }} Or, if you are using VB.NET, your Module1.vb file will look as follows: Module Module1 Sub Main() End SubEnd Module At the top of the file, we need to import a few references to make our project compile. Right above the namespace or Module declarations, add the using/Imports statements for NHibernate, NHibernate.Cfg, and Ordering.Data: using NHibernate;using NHibernate.Cfg;using Ordering.Data; In VB.NET you need to use the Imports keyword as follows: Imports NHibernateImports NHibernate.CfgImports Ordering.Data Inside the Main() block, we want to create the configuration object that will tell NHibernate how to connect to the database. Inside your Main() block, add the following code: Configuration cfg = new Configuration();cfg.Properties.Add(NHibernate.Cfg.Environment.ConnectionProvider, typeof(NHibernate.Connection.DriverConnectionProvider) .AssemblyQualifiedName); cfg.Properties.Add(NHibernate.Cfg.Environment.Dialect, typeof(NHibernate.Dialect.MsSql2008Dialect) .AssemblyQualifiedName); cfg.Properties.Add(NHibernate.Cfg.Environment.ConnectionDriver, typeof(NHibernate.Driver.SqlClientDriver) .AssemblyQualifiedName); cfg.Properties.Add(NHibernate.Cfg.Environment.ConnectionString, "Server= (local)SQLExpress;Database= Ordering;Trusted_Connection=true;"); cfg.Properties.Add(NHibernate.Cfg.Environment. ProxyFactoryFactoryClass, typeof (NHibernate.ByteCode.LinFu.ProxyFactoryFactory) .AssemblyQualifiedName); cfg.AddAssembly(typeof(Address).AssemblyQualifiedName); For a VB.NET project, add the following code: Dim cfg As New Configuration()cfg.Properties.Add(NHibernate.Cfg.Environment. _ ConnectionProvider, GetType(NHibernate.Connection. _ DriverConnectionProvider).AssemblyQualifiedName) cfg.Properties.Add(NHibernate.Cfg.Environment.Dialect, _ GetType(NHibernate.Dialect.MsSql2008Dialect). _ AssemblyQualifiedName) cfg.Properties.Add(NHibernate.Cfg.Environment.ConnectionDriver, _ GetType(NHibernate.Driver.SqlClientDriver). _ AssemblyQualifiedName) cfg.Properties.Add(NHibernate.Cfg.Environment.ConnectionString, _ "Server= (local)SQLExpress;Database=Ordering; _ Trusted_Connection=true;") cfg.Properties.Add(NHibernate.Cfg.Environment. _ ProxyFactoryFactoryClass, GetType _ (NHibernate.ByteCode.LinFu.ProxyFactoryFactory). _ AssemblyQualifiedName) cfg.AddAssembly(GetType(Address).AssemblyQualifiedName) Lastly, right-click on the Ordering.Console project, and select Set as Startup Project, as shown in the following screenshot: Press F5 or Debug | Start Debugging and test your project. If everything goes well, you should see a command prompt window pop up and then go away. Congratulations! You are done! However, it is more than likely you will get an error on the line that says cfg.AddAssembly(). This line instructs NHibernate to "take all of my HBM.xml files and compile them". This is where we will find out how well we handcoded our HBM.xml files. The most common error that will show up is MappingException was unhandled. If you get a mapping exception, then see the next step for troubleshooting tips. Troubleshooting: NHibernate will tell us where the errors are and why they are an issue. The first step to debug these issues is to click on the View Detail link under Actions on the error pop up. This will bring up the View Detail dialog, as shown in the following screenshot: If you look at the message, NHibernate says that it Could not compile the mapping document: Ordering.Data.Mapping.Address.hbm.xml. So now we know that the issue is in our Address.hbm.xml file, but this is not very helpful. If we look at the InnerException, it says "Problem trying to set property type by reflection". Still not a specific issue, but if we click on the + next to the InnerException, I can see that there is an InnerException on this exception. The second InnerException says "class Ordering.Data.Address, Ordering.Data, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null not found while looking for property: Id". Now we are getting closer. It has something to do with the ID property. But wait, there is another InnerException. This InnerException says "Could not find a getter for property 'Id' in class 'Ordering.Data.Address'". How could that be? Looking at my Address.cs class, I see: using System;using System.Collections.Generic;using System.Text;namespace Ordering.Data{ public class Address { }} Oops! Apparently I stubbed out the class, but forgot to add the actual properties. I need to put the rest of the properties into the file, which looks as follows: using System;using System.Collections.Generic;using System.Text; namespace Ordering.Data{ public class Address { #region Constructors public Address() { } public Address(string Address1, string Address2, string City, string State, string Zip) : this() { this.Address1 = Address1; this.Address2 = Address2; this.City = City; this.State = State; this.Zip = Zip; } #endregion #region Properties private int _id; public virtual int Id { get { return _id; } set { _id = value; } } private string _address1; public virtual string Address1 { get { return _address1; } set { _address1 = value; } } private string _address2; public virtual string Address2 { get { return _address2; } set { _address2 = value; } } private string _city; public virtual string City { get { return _city; } set { _city = value; } } private string _state; public virtual string State { get { return _state; } set { _state = value; } } private string _zip; public virtual string Zip { get { return _zip; } set { _zip = value; } } private Contact _contact; public virtual Contact Contact { get { return _contact; } set { _contact = value; } } #endregion }} By continuing to work my way through the errors that are presented in the configuration and starting the project in Debug mode, I can handle each exception until there are no more errors. What just happened? We have successfully created a project to test out our database connectivity, and an NHibernate Configuration object which will allow us to create sessions, session factories, and a whole litany of NHibernate goodness!
Read more
  • 0
  • 0
  • 8751
Modal Close icon
Modal Close icon