Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-query-performance-tuning-microsoft-analysis-services-part-2
Packt
20 Oct 2009
21 min read
Save for later

Query Performance Tuning in Microsoft Analysis Services: Part 2

Packt
20 Oct 2009
21 min read
MDX calculation performance Optimizing the performance of the Storage Engine is relatively straightforward: you can diagnose performance problems easily and you only have two options—partitioning and aggregation—for solving them. Optimizing the performance of the Formula Engine is much more complicated because it requires knowledge of MDX, diagnosing performance problems is difficult because the internal workings of the Formula Engine are hard to follow, and solving the problem is reliant on knowing tips and tricks that may change from service pack to service pack. Diagnosing Formula Engine performance problems If you have a poorly-performing query, and if you can rule out the Storage Engine as the cause of the problem, then the issue is with the Formula Engine. We've already seen how we can use Profiler to check the performance of Query Subcube events, to see which partitions are being hit and to check whether aggregations are being used; if you subtract the sum of the durations of all the Query Subcube events from the duration of the query as a whole, you'll get the amount of time spent in the Formula Engine. You can use MDX Studio's Profile functionality to do the same thing much more easily—here's a screenshot of what it outputs when a calculation-heavy query is run: The following blog entry describes this functionality in detail: http://tinyurl.com/mdxtrace; but what this screenshot displays is essentially the same thing that we'd see if we ran a Profiler trace when running the same query on a cold and warm cache, but in a much more easy-to-read format. The column to look at here is the Ratio to Total, which shows the ratio of the duration of each event to the total duration of the query. We can see that on both a cold cache and a warm cache the query took almost ten seconds to run but none of the events recorded took anywhere near that amount of time: the highest ratio to parent is 0.09%. This is typical of what you'd see with a Formula Engine-bound query. Another hallmark of a query that spends most of its time in the Formula Engine is that it will only use one CPU, even on a multiple-CPU server. This is because the Formula Engine, unlike the Storage Engine, is single-threaded. As a result if you watch CPU usage in Task Manager while you run a query you can get a good idea of what's happening internally: high usage of multiple CPUs indicates work is taking place in the Storage Engine, while high usage of one CPU indicates work is taking place in the Formula Engine. Calculation performance tuning Having worked out that the Formula Engine is the cause of a query's poor performance then the next step is, obviously, to try to tune the query. In some cases you can achieve impressive performance gains (sometimes of several hundred percent) simply by rewriting a query and the calculations it depends on; the problem is knowing how to rewrite the MDX and working out which calculations contribute most to the overall query duration. Unfortunately Analysis Services doesn't give you much information to use to solve this problem and there are very few tools out there which can help either, so doing this is something of a black art. There are three main ways you can improve the performance of the Formula Engine: tune the structure of the cube it's running on, tune the algorithms you're using in your MDX, and tune the implementation of those algorithms so they use functions and expressions that Analysis Services can run efficiently. We've already talked in depth about how the overall cube structure is important for the performance of the Storage Engine and the same goes for the Formula Engine; the only thing to repeat here is the recommendation that if you can avoid doing a calculation in MDX by doing it at an earlier stage, for example in your ETL or in your relational source, and do so without compromising functionality, you should do so. We'll now go into more detail about tuning algorithms and implementations. Mosha Pasumansky's blog, http://tinyurl.com/moshablog, is a goldmine of information on this subject. If you're serious about learning MDX we recommend that you subscribe to it and read everything he's ever written. Tuning algorithms used in MDX Tuning an algorithm in MDX is much the same as tuning an algorithm in any other kind of programming language—it's more a matter of understanding your problem and working out the logic that provides the most efficient solution than anything else. That said, there are some general techniques that can be used often in MDX and which we will walk through here. Using named sets to avoid recalculating set expressions Many MDX calculations involve expensive set operations, a good example being rank calculations where the position of a tuple within an ordered set needs to be determined. The following query includes a calculated member that displays Dates on the Rows axis of a query, and on columns shows a calculated measure that returns the rank of that date within the set of all dates based on the value of the Internet Sales Amount measure: WITH MEMBER MEASURES.MYRANK AS Rank ( [Date].[Date].CurrentMember ,Order ( [Date].[Date].[Date].MEMBERS ,[Measures].[Internet Sales Amount] ,BDESC ) )SELECT MEASURES.MYRANK ON 0 ,[Date].[Date].[Date].MEMBERS ON 1 FROM [Adventure Works] It runs very slowly, and the problem is that every time the calculation is evaluated it has to evaluate the Order function to return the set of ordered dates. In this particular situation, though, you can probably see that the set returned will be the same every time the calculation is called, so it makes no sense to do the ordering more than once. Instead, we can create a named set hold the ordered set and refer to that named set from within the calculated measure, so: WITH SET ORDEREDDATES AS Order ( [Date].[Date].[Date].MEMBERS ,[Measures].[Internet Sales Amount] ,BDESC ) MEMBER MEASURES.MYRANK AS Rank ( [Date].[Date].CurrentMember ,ORDEREDDATES ) SELECT MEASURES.MYRANK ON 0 ,[Date].[Date].[Date].MEMBERS ON 1 FROM [Adventure Works] This version of the query is many times faster, simply as a result of improving the algorithm used; the problem is explored in more depth in this blog entry: http://tinyurl.com/mosharank Since normal named sets are only evaluated once they can be used to cache set expressions in some circumstances; however, the fact that they are static means they can be too inflexible to be useful most of the time. Note that normal named sets defined in the MDX Script are only evaluated once, when the MDX script executes and not in the context of any particular query, so it wouldn't be possible to change the example above so that the set and calculated measure were defined on the server. Even named sets defined in the WITH clause are evaluated only once, in the context of the WHERE clause, so it wouldn't be possible to crossjoin another hierarchy on columns and use this approach, because for it to work the set would have to be reordered once for each column. The introduction of dynamic named sets in Analysis Services 2008 improves the situation a little, and other more advanced techniques can be used to work around these issues, but in general named sets are less useful than you might hope. For further reading on this subject see the following blog posts: http://tinyurl.com/chrisrankhttp://tinyurl.com/moshadsetshttp://tinyurl.com/chrisdsets Using calculated members to cache numeric values In the same way that you can avoid unnecessary re-evaluations of set expressions by using named sets, you can also rely on the fact that the Formula Engine can (usually) cache the result of a calculated member to avoid recalculating expressions which return numeric values. What this means in practice is that anywhere in your code you see an MDX expression that returns a numeric value repeated across multiple calculations, you should consider abstracting it to its own calculated member; not only will this help performance, but it will improve the readability of your code. For example, take the following slow query which includes two calculated measures: WITH MEMBER [Measures].TEST1 AS [Measures].[Internet Sales Amount] / Count ( TopPercent ( { [Scenario].[Scenario].&[1] ,[Scenario].[Scenario].&[2] }* [Account].[Account].[Account].MEMBERS* [Date].[Date].[Date].MEMBERS ,10 ,[Measures].[Amount] ) )MEMBER [Measures].TEST2 AS [Measures].[Internet Tax Amount] / Count ( TopPercent ( { [Scenario].[Scenario].&[1] ,[Scenario].[Scenario].&[2] }* [Account].[Account].[Account].MEMBERS* [Date].[Date].[Date].MEMBERS* [Department].[Departments].[Department Level 02].MEMBERS ,10 ,[Measures].[Amount] ) )SELECT { [Measures].TEST1 ,[Measures].TEST2 } ON 0 ,[Customer].[Gender].[Gender].MEMBERS ON 1FROM [Adventure Works] A quick glance over the code shows that a large section of it occurs twice in both calculations—everything inside the Count function. If we remove that code to its own calculated member as follows: WITH MEMBER [Measures].Denominator AS Count ( TopPercent ( { [Scenario].[Scenario].&[1] ,[Scenario].[Scenario].&[2] }* [Account].[Account].[Account].MEMBERS* [Date].[Date].[Date].MEMBERS ,10 ,[Measures].[Amount] ) )MEMBER [Measures].TEST1 AS [Measures].[Internet Sales Amount] / [Measures].DenominatorMEMBER [Measures].TEST2 AS [Measures].[Internet Tax Amount] / [Measures].DenominatorSELECT { [Measures].TEST1 ,[Measures].TEST2 } ON 0 ,[Customer].[Gender].[Gender].MEMBERS ON 1FROM [Adventure Works] The query runs much faster, simply because instead of evaluating the count twice for each of the two visible calculated measures, we evaluate it once, cache the result in the calculated measure Denominator and then reference this in the other calculated measures. It's also possible to find situations where you can rewrite code to avoid evaluating a calculation that always returns the same result over different cells in the multidimensional space of the cube. This is much more difficult to do effectively though; the following blog entry describes how to do it in detail: http://tinyurl.com/fecache Tuning the implementation of MDX Like just about any other software product, Analysis Services is able to do some things more efficiently than others. It's possible to write the same query or calculation using the same algorithm but using different MDX functions and see a big difference in performance; as a result, we need to know which are the functions we should use and which ones we should avoid. Which ones are these though? Luckily MDX Studio includes functionality to analyse MDX code and flag up such problems—to do this you just need to click the Analyze button—and there's even an online version of MDX Studio that allows you to do this too, available at: http://mdx.mosha.com/. We recommend that you run any MDX code you write through this functionality and take its suggestions on board. Mosha walks through an example of using MDX Studio to optimise a calculation on his blog here: http://tinyurl.com/moshaprodvol Block computation versus cell-by-cellWhen the Formula Engine has to evaluate an MDX expression for a query it can basically do so in one of two ways. It can evaluate the expression for each cell returned by the query, one at a time, an evaluation mode known as "cell-by-cell"; or it can try to analyse the calculations required for the whole query and find situations where the same expression would need to be calculated for multiple cells and instead do it only once, an evaluation mode known variously as "block computation" or "bulk evaluation". Block computation is only possible in some situations, depending on how the code is written, but is often many times more efficient than cell-by-cell mode. As a result, we want to write MDX code in such a way that the Formula Engine can use block computation as much as possible, and when we talk about using efficient MDX functions or constructs then this is what we in fact mean. Given that different calculations in the same query, and different expressions within the same calculation, can be evaluated using block computation and cell-by-cell mode, it’s very difficult to know which mode is used when. Indeed in some cases Analysis Services can’t use block mode anyway, so it’s hard know whether we have written our MDX in the most efficient way possible. One of the few indicators we have is the Perfmon counter MDXTotal Cells Calculated, which basically returns the number of cells in a query that were calculated in cell-by-cell mode; if a change to your MDX increments this value by a smaller amount than before, and the query runs faster, you're doing something right. The list of rules that MDX Studio applies is too long to list here, and in any case it is liable to change in future service packs or versions; another good guide for Analysis Services 2008 best practices exists in the Books Online topic Performance Improvements for MDX in SQL Server 2008 Analysis Services, available online here: http://tinyurl.com/mdximp. However, there are a few general rules that are worth highlighting: Don't use the Non_Empty_Behavior calculation property in Analysis Services 2008, unless you really know how to set it and are sure that it will provide a performance benefit. It was widely misused with Analysis Services 2005 and most of the work that went into the Formula Engine for Analysis Services 2008 was to ensure that it wouldn't need to be set for most calculations. This is something that needs to be checked if you're migrating an Analysis Services 2005 cube to 2008. Never use late binding functions such as LookupCube, or StrToMember or StrToSet without the Constrained flag, inside calculations since they have a serious negative impact on performance. It's almost always possible to rewrite calculations so they don't need to be used; in fact, the only valid use for StrToMember or StrToSet in production code is when using MDX parameters. The LinkMember function suffers from a similar problem but is less easy to avoid using it. Use the NonEmpty function wherever possible; it can be much more efficient than using the Filter function or other methods. Never use NonEmptyCrossjoin either: it's deprecated, and everything you can do with it you can do more easily and reliably with NonEmpty. Lastly, don't assume that whatever worked best for Analysis Services 2000 or 2005 is still best practice for Analysis Services 2008. In general, you should always try to write the simplest MDX code possible initially, and then only change it when you find performance is unacceptable. Many of the tricks that existed to optimise common calculations for earlier versions now perform worse on Analysis Services 2008 than the straightforward approaches they were designed to replace. Caching We've already seen how Analysis Services can cache the values returned in the cells of a query, and how this can have a significant impact on the performance of a query. Both the Formula Engine and the Storage Engine can cache data, but may not be able to do so in all circumstances; similarly, although Analysis Services can share the contents of the cache between users there are several situations where it is unable to do so. Given that in most cubes there will be a lot of overlap in the data that users are querying, caching is a very important factor in the overall performance of the cube and as a result ensuring that as much caching as possible is taking place is a good idea. Formula cache scopes There are three different cache contexts within the Formula Engine, which relate to how long data can be stored within the cache and how that data can be shared between users: Query Context, which means that the results of calculations can only be cached for the lifetime of a single query and so cannot be reused by subsequent queries or by other users. Session Context, which means the results of calculations are cached for the lifetime of a session and can be reused by subsequent queries in the same session by the same user. Global Context, which means the results of calculations are cached until the cache has to be dropped because data in the cube has changed (usually when some form of processing takes place on the server). These cached values can be reused by subsequent queries run by other users as well as the user who ran the original query. Clearly the Global Context is the best from a performance point of view, followed by the Session Context and then the Query Context; Analysis Services will always try to use the Global Context wherever possible, but it is all too easy to accidentally write queries or calculations that force the use of the Session Context or the Query Context. Here's a list of the most important situations when that can happen: If you define any calculations (not including named sets) in the WITH clause of a query, even if you do not use them, then Analysis Services can only use the Query Context (see http://tinyurl.com/chrisfecache for more details). If you define session-scoped calculations but do not define calculations in the WITH clause, then the Session Context must be used. Using a subselect in a query will force the use of the Query Context (see http://tinyurl.com/chrissubfe). Use of the CREATE SUBCUBE statement will force the use of the Session Context. When a user connects to a cube using a role that uses cell security, then the Query Context will be used. When calculations are used that contain non-deterministic functions (functions which could return different results each time they are called), for example the Now() function that returns the system date and time, the Username() function or any Analysis Services stored procedure, then this forces the use of the Query Context. Other scenarios that restrict caching Apart from the restrictions imposed by cache context, there are other scenarios where caching is either turned off or restricted. When arbitrary-shaped sets are used in the WHERE clause of a query, no caching at all can take place in either the Storage Engine or the Formula Engine. An arbitrary-shaped set is a set of tuples that cannot be created by a crossjoin, for example: ({([Customer].[Country].&[Australia], [Product].[Category].&[1]),([Customer].[Country].&[Canada], [Product].[Category].&[3])}) If your users frequently run queries that use arbitrary-shaped sets then this can represent a very serious problem, and you should consider redesigning your cube to avoid it. The following blog entries discuss this problem in more detail: http://tinyurl.com/tkarbsethttp://tinyurl.com/chrisarbset Even within the Global Context, the presence of security can affect the extent to which cache can be shared between users. When dimension security is used the contents of the Formula Engine cache can only be shared between users who are members of roles which have the same permissions. Worse, the contents of the Formula Engine cache cannot be shared between users who are members of roles which use dynamic security at all, even if those users do in fact share the same permissions. Cache warming Since we can expect many of our queries to run instantaneously on a warm cache, and the majority at least to run faster on a warm cache than on a cold cache, it makes sense to preload the cache with data so that when users come to run their queries they will get warm-cache performance. There are two basic ways of doing this, running CREATE CACHE statements and automatically running batches of queries. Create Cache statement The CREATE CACHE statement allows you to load a specified subcube of data into the Storage Engine cache. Here's an example of what it looks like: CREATE CACHE FOR [Adventure Works] AS({[Measures].[Internet Sales Amount]}, [Customer].[Country].[Country].MEMBERS,[Date].[Calendar Year].[Calendar Year].MEMBERS) More detail on this statement can be found here: http://tinyurl.com/createcache CREATE CACHE statements can be added to the MDX Script of the cube so they execute every time the MDX Script is executed, although if the statements take a long time to execute (as they often do) then this might not be a good idea; they can also be run after processing has finished from an Integration Services package using an Execute SQL task or through ASCMD, and this is a much better option because it means you have much more control over when the statements actually execute—you wouldn't want them running every time you cleared the cache, for instance. Running batches of queries The main drawback of the CREATE CACHE statement is that it can only be used to populate the Storage Engine cache, and in many cases it's warming the Formula Engine cache that makes the biggest difference to query performance. The only way to do this is to find a way to automate the execution of large batches of MDX queries (potentially captured by running a Profiler trace while users go about their work) that return the results of calculations and so which will warm the Formula Engine cache. This automation can be done in a number of ways, for example by using the ASCMD command line utility which is part of the sample code for Analysis Services that Microsoft provides (available for download here: http://tinyurl.com/sqlprodsamples); another common option is to use an Integration Services package to run the queries, as described in the following blog entries— http://tinyurl.com/chriscachewarm and http://tinyurl.com/allancachewarm This approach is not without its own problems, though: it can be very difficult to make sure that the queries you're running return all the data you want to load into cache, and even when you have done that, user query patterns change over time so ongoing maintenance of the set of queries is important. Scale-up and scale-out Buying better or more hardware should be your last resort when trying to solve query performance problems: it's expensive and you need to be completely sure that it will indeed improve matters. Adding more memory will increase the space available for caching but nothing else; adding more or faster CPUs will lead to faster queries but you might be better off investing time in building more aggregations or tuning your MDX. Scaling up as much as your hardware budget allows is a good idea, but may have little impact on the performance of individual problem queries unless you badly under-specified your Analysis Services server in the first place. If your query performance degenerates as the number of concurrent users running queries increases, consider scaling-out by implementing what's known as an OLAP farm. This architecture is widely used in large implementations and involves multiple Analysis Services instances on different servers, and using network load balancing to distribute user queries between these servers. Each of these instances needs to have the same database on it and each of these databases must contain exactly the same data in it for queries to be answered consistently. This means that, as the number of concurrent users increases, you can easily add new servers to handle the increased query load. It also has the added advantage of removing a single point of failure, so if one Analysis Services server fails then the others take on its load automatically. Making sure that data is the same across all servers is a complex operation and you have a number of different options for doing this: you can either use the Analysis Services database synchronisation functionality, copy and paste the data from one location to another using a tool like Robocopy, or use the new Analysis Services 2008 shared scalable database functionality. The following white paper from the SQLCat team describes how the first two options can be used to implement a network load-balanced solution for Analysis Services 2005: http://tinyurl.com/ssasnlb. Shared scalable databases have a significant advantage over synchronisation and file-copying in that they don't need to involve any moving of files at all. They can be implemented using the same approach described in the white paper above, but instead of copying the databases between instances you process a database (attached in ReadWrite mode) on one server, detach it from there, and then attach it in ReadOnly mode to one or more user-facing servers for querying while the files themselves stay in one place. You do, however, have to ensure that your disk subsystem does not become a bottleneck as a result. Summary In this article we covered MDX calculation performance and caching, and also how to write MDX to ensure that the Formula Engine works as efficiently as possible. We've also seen how important caching is to overall query performance and what we need to do to ensure that we can cache data as often as possible, and we've discussed how to scale-out Analysis Services using network load balancing to handle large numbers of concurrent users.
Read more
  • 0
  • 0
  • 7991

article-image-form-customizations
Packt
08 Aug 2013
26 min read
Save for later

Form customizations

Packt
08 Aug 2013
26 min read
(For more resources related to this topic, see here.) Forms are probably the most important visual element of the Dynamics CRM 2011 interface. To find the underlying data in every entity record, the user has to open the form. Dynamics CRM 2011 supports two types of forms: The main form : Dynamics CRM 2011 uses this form to allow the user to enter and view data within the Dynamics CRM 2011 web user interface as well as the Dynamics CRM 2011 within Microsoft Outlook interface. One main form per entity exists by default. However, multiple main forms can be created for an entity. Dynamics CRM 2011 supports role-based forms, which means separate forms can be visible depending on the security roles of the current user. Usually, multiple main forms are created when role-based forms have to be supported. The mobile form : Dynamics CRM 2011 uses this form when a user is accessing CRM from a mobile device that is compatible with HTML 4.0 using a URL such as <CRM_server> /m, where <CRM_server> is the path of Microsoft Dynamics CRM 2011 Server. A separate form for mobile devices is useful considering the limited space usually available on a mobile screen. A mobile form does not store data on a mobile device. If users try to access Dynamics CRM 2011 from an unsupported browser, they will be redirected to the mobile form. The following table outlines the browsers supported by Microsoft Dynamics CRM 2011: Browser Version / other requirements Internet Explorer IE7 (only for the on-premises version) IE 8, IE9 IE10 (desktop mode only) Mozilla Firefox Latest publicly released version running on Windows 8, Windows 7, Windows Vista, or Windows XP Google Chrome Latest publicly released version running on Windows 8, Windows 7, Windows Vista, or Windows XP Apple Safari Latest publicly released version running on Mac OS X 10.7 (Lion) or 10.8 (Mountain Lion) Detailed information about supported browsers can be found at http://technet.microsoft.com/en-us/library/hh699710.aspx. Dynamics CRM 2011 also supports special variants of the main form, as follows: The read-optimized form : Dynamics CRM 2011 has another type of form called the read-optimized form. Introduced in Update Rollup 7, this form is designed for the fast display of a record by disabling the ribbon and form scripts. This form displays the record in the read-only mode. Read-optimized forms are disabled by default and can be enabled by going to System | Administration | System Settings | Customization | Form Mode . Update Rollup 12 has introduced the following changes in read-optimized forms: The navigation pane for read-optimized forms is now enabled and the navigation pane can be expanded or collapsed. Support for web resources has been added. A new setting in the web resource properties, called Show this Web Resources in Read Optimized form , has been added. This setting must be enabled for the web resources to display in the read-optimized form. If the web resource depends on form resources, which are not available in a read-optimized form, we should not display it. Read-optimized forms honor all field-level security and role-based form definitions. If an entity has more than one form enabled, the read-optimized form uses the form that the user last used. The process-driven form : The December 2012 Service Update (Polaris update) of Dynamics CRM 2011 has introduced an enhanced read-optimized form, commonly known as the process-driven form for the Account, Contact, Lead, Opportunity, and Case entities. This new type of form is very useful, especially for touch devices, as the new form is designed to contain everything in one form; there is no need to open multiple pop ups. However, this new form type cannot be used for any entity other than the entities listed above. For the Account, Contact, Lead, Opportunity, and Case entities, in addition to the information form, there will be a new form with the same name as that of the entity. The <entity name> form will always display using the updated presentation, regardless of the settings for read-optimized forms. However, if read-optimized forms are enabled for the organization, the information form will also display using the updated presentation. These new forms are not available in an on-premises deployment of Microsoft Dynamics CRM 2011. Form editor We need to use a form editor to customize a form within Dynamics CRM 2011. The form layout definition is actually stored as an XML file called Form Xml in the SystemForm entity. The customization.xml file exported with an unmanaged solution contains the definition of the entity forms. Creating and customizing an entity main form Almost all the business entities have a customizable main form. The Activity entity does not have any form and some entity forms such as the Case Resolution entity form are not customizable. When a custom entity is created, one main and one mobile form are added automatically. In this recipe, we will focus our discussion on how to customize a main form. Getting ready Dynamics CRM 2011 introduced a flexible layout for form design. The following diagram outlines the typical main form layout within the Dynamics CRM 2011 system: The major visible components of a standard main form are as follows: Ribbon : This is the top area of the form. We cannot customize this using the form editor. Entity icon : This displays the Icon for Entity Form icon of the entity. It is a 32 x 32 pixel image and can be updated for an entity.  Header and footer : The header and footer are two read-only areas of the form layout. These two sections remain static when a user scrolls through the form data displayed by the various tabs and sections. So any data that is required to be available to the user irrespective of any scrolling, can be included in these sections. Form selector : When an entity has multiple forms and the current user's security role has access to more than one form, the form selector is displayed. The user can use the form selector to choose a form from multiple forms available to them. Navigation : This section allows users to navigate to related records of the current record. We can add, modify, delete, or reorganize the link to the related entity records using the form editor. We can also include links to URLs or web resources by adding navigation links using the form editor. Form assistant : It helps when we set values for lookup fields. Dynamics CRM 2011 has introduced improved capabilities to filter data returned in the lookup dialog. Hence, the form assistant is no longer useful; the form assistant has been turned off for all except the following three entity forms: Case Product Service activity Tabs and sections : Tabs and sections allow grouping and laying out of controls in a form. A tab can contain multiple sections. Each form can have a maximum of 100 tabs. Tabs have a vertical collapse/expand feature. We will now take a look at the various form-body elements that can be added or associated with an entity form: Field : Each field represents an attribute of the entity. A field can be added to a form using the form editor and the form editor allows us to add the same field multiple times in a form. Each instance of a field in a form is known as a control . The appearance and behavior of a control is driven by the type and formatting options of the attribute as well as display and formatting properties set on the control, using the form editor. Tab and section : As previously discussed, tabs and sections are used for grouping the controls in the form. A tab can contain multiple sections within it. Each tab or section can be assigned a name. We can choose to display the name of the tab or section on the form or include a separator line at the top of the tab or section, underneath the name. A tab can have one column or two columns; when two columns are specified, the width of each column is a percentage of the width of the tab. A section, on the other hand, may have up to four columns and we can control the width available for control labels to be displayed in the section as well as how labels for controls in the section should be aligned. Spacer : The Spacer element provides extra space between fields and controls in the form. This is used to improve the control layout in a section. Sub-Grid : Sub-Grid allows us to display a list of records, charts, or both. The first four subgrids can be populated with data in a form when it loads. If more than four subgrids exist on a form, the remaining subgrids require some user or form script action to retrieve data. This is for performance optimization. IFRAME : This control provides the HTML iFrame element in the form. Using the control, we can host another web page within the Dynamics CRM 2011 entity form. The form editor provides the ability to set regular iFrame properties along with properties specific to Dynamics CRM 2011. Web Resource : This control displays a form-enabled web resource to be displayed on the page. A form-enabled web resource includes a web page (HTML), image (JPG, PNG, GIF, ICO), or Silverlight (XAP) resource. The web resource contents are hosted within Dynamics CRM 2011. Notes : If the entity uses notes and attachments, we can add the Notes control into the form. This control can only be added if the entity has Notes enabled in the entity definition. Navigation Link : This control is available only within the Navigation section of the form. This control allows us to add a link to an external URL or web resource. How to do it… In this recipe, we will first discuss how to create a new main form and then discuss the form-customization options. The customization steps can be carried out on any main form. The entity main form can be customized by carrying out the following tasks: Editing tabs Editing sections Editing fields Editing header and footer Adding subgrids Adding iFrames Adding web resources Editing the Navigation area Editing form properties Making the form non-customizable In this recipe, we will discuss all the previously stated tasks one after the other. Please follow these steps to customize the main form for an entity: Log in to the Dynamics CRM 2011 system as a system administrator or with a relevant security role. Navigate to Settings | Customizations | Solutions and change the view to Unmanaged Solutions , if not already selected. Then double-click on the unmanaged solution to open it. On the expanded Solution page, navigate to Components | Entities | <Entity> | Forms . The next step is to create a new main form; this can be done in two ways. We will discuss both of these here: Creating an entirely new main form : Go to New | Main Form in the actions toolbar. This will create a new form by copying the existing main form. When the new form pops up, click on the save button to save the form. Creating a new form from an existing form : Open the existing form by double-clicking on it. When the form launches, click on Save As in the top ribbon. When the Save As -- Webpage Dialog window pops up, provide data for the Name and Description fields of the new form. Finally, click on the OK button to save the new form as shown in the following screenshot: Any newly created main form will be assigned only to the system administrator and system customizer security roles by default. To customize a main form, open the form by double-clicking on it in the forms list. The next step is to discuss the editing of tabs in the form. Tabs are collapsible controls that can contain section controls. The following two points will demonstrate adding a new tab and editing tab properties: Adding a new tab in the form : Click on Body in the form ribbon and then click on the Insert tab in the form. In the Insert tab, under the Tab group, select One Column to create a one-column tab, or Two Columns to create a two-column tab: If we add a tab, Dynamics CRM 2011 will automatically add a section for each column. To remove any control in an entity form, use the Delete key on the keyboard. Alternatively, the Remove button in the ribbon can also be used. Editing tab properties : Select the tab control and then click on the Change Properties button in the form ribbon. The Tab Properties page will open with the following properties being modifiable: Tab property Description Under the Display tab Name The unique name of the tab. Label The display label for this tab. This text will appear on the form. Show the label of this tab on the Form This determines whether the label defined for this tab will be displayed on the form. Select this option to enable the display of the tab's label on the form. Expand this tab by default If selected, the tab control will be displayed in expanded mode by default. Visible by default If selected, the tab control will be visible by default in the form. Under the Formatting tab Select tab layout Choose between One Column and Two Columns  to define the layout of the tab. Column 1 width If the Two Columns option is selected in the tab layout, we can specify the width of column 1 as a percentage. Column 2 width If the Two Columns option is selected in the tab layout, we can specify the width of column 2 as a percentage. The Events properties   Scripts libraries can be linked to the tab. The scripts functions will be called on the TabStateChange event. Next we will see the editing of a section in a tab. A section contains fields in the form. The following two sections will demonstrate adding a section in a form and editing the section's properties: Adding a section in the form : Select the tab control where the new section is to be added and then click on the Insert tab in the form ribbon. Thereafter, click on One Column , Two Columns , Three Columns , or Four Columns under the Section group depending on whether a section with one, two, three, or four columns is to be added. Editing section properties : Select the section control and then click on the Change Properties button in the form ribbon. The Section Properties page will open and the following properties will be modifiable: Section property Description Under the Display tab Name The unique name of the tab. Label The display label for this tab. This text will appear on the form. Show the label of this section on the Form This determines whether the label defined for this section will be displayed on the form. Select this option to enable the display of the section's label on the form. Show a line at top of the section If selected, a divider line will be displayed underneath the name of the section. Width Specify the width of the label area of the fields in this field. The width must be set between 50 and 250 pixels. Visible by default If selected, the section control will be visible by default on the form. Lock the section of the Form If selected, the section would be locked in the form. Under the Formatting tab Layout Choose from among One Column, Two Columns, Three Columns, and Four Columns to define the layout of the section control. Field label alignment Select between the Left and Right alignments for the field labels in the section control. Next we will take a look at editing a field in the section: Adding a field in a section : Select the section where the field has to be added. Thereafter, find the field in the right-hand side Field Explorer pane. By default, the Field Explorer pane displays all unused fields in the form. If we want to add a field that is already used in the form, uncheck the Only show unused fields checkbox as shown in the following screenshot: After selecting the field in Field Explorer , move the field by pressing the left mouse button and drop the field in the intended column of the section. The red line on top of the column indicates that the column has been selected. Now drop the field on the selected column. Editing field properties : To edit the form-level properties of the field, select the field and then click on the Change Properties button in the form ribbon. Then the Field Properties pop up will open and the following properties can be modified: Field property Description Under the Display tab Label Here you can edit the display name of the field on the form. By default, the display name of the field will be displayed there, which can be edited to provide a new display name for the field on the form. Display Label on the form This determines whether the display name of the field is to be displayed in the form. Field is read-only This determines whether a field is to be read-only for the users in the form. Lock the field on the form This determines whether the field is to be locked on the form. Visible by default This determines the default visibility of the control in the form. Under the Formatting tab Layout This determines the width of this field on the form. The width of a field depends on the layout settings of the section it is in. The Details properties   This tab displays the details of the field definition. Click on the Edit button to modify those properties of the field definition that can be modified. The Event properties   Script libraries can be linked to the tab. The scripts' functions will be called on the OnChange event. If the field is of type Lookup (N:1 relationship with another entity), then there exists an additional set of properties in the Field Properties list. These properties can be set to save the user's time, find the appropriate parent record, or to restrict the user to select among a subset of records in the parent entity. The following form-level properties of the lookup field can be edited: Property name Description Turn off automatic resolutions in the field If this setting is disabled (not selected) and if a user enters a partial value for the lookup field and tabs away, Dynamics CRM 2011 will try to autopopulate the lookup field. Disable most recently used items for this field If this setting is disabled (not selected), Dynamics CRM 2011 will automatically provide a list of recently selected values for the user to choose from. This property is not supported for process-driven forms of Microsoft Dynamics CRM 2011 Online. Related Record Filtering This setting provides a way to limit the list of records that the user can choose from. The list under the Only show records where heading displays all the potential relationships that can be used to filter this lookup. Once a record is selected, the list under the Contains  heading will display all relationships that connect the related entity (selected in the first list) to the target entity. Select the Allow users to turn off filter checkbox to provide users with the option to turn off the filter defined here. This makes it possible for them to view a wider range of records. Additional properties This setting controls how much search flexibility the user will have in terms of changing among various views and searching the record with a search box. Select the Display Search Box in lookup dialog checkbox if you want a search box to be available in the lookup. In the Default View list, select the default view for which results will be displayed in the lookup. Finally, choose the views we want users to have access to in the lookup, using the View Selector list. Adding a new entity field and then adding it to the form : A new field can also be created and then added to the entity from the form. To create a new field, click on the New Field button at the bottom of the Field Explorer pane. This will launch the new field pop up.  Next we will delve into editing headers and footers. To edit the header or footer of the form, click on the Header or Footer button in the form ribbon and the section will be focused automatically. Then click on Change Properties in the ribbon. The Header Properties or Footer Properties page will pop up and we can edit the following settings: Header/footer property Description Under the Display tab Width Specify the width field label area here. The width must be set between 50 and 250 pixels. Lock the section of the Form This setting is selected by default and cannot be modified. This setting determines whether the section would be locked in the form or not. Under the Formatting tab Layout Here you can choose from among One Column,  Two Columns, Three Columns, and Four Columns to define the layout of the header/footer control. Field Label Alignment Select from the Left (default), Right, or Center alignment for the field labels in the header/footer control. Field Label Position Select between Side (default) and Top to specify whether the field label in this section will be on the left-hand side or above the field. Fields can be added to the header or footer controls in the same way they are added in any section control in the form. Next we will look at how to add subgrids. The Sub-Grid control displays related entity records in the form body, using the following steps: Select the section control where the subgrid is to be added in the form. Then click on the Sub-Grid button under the Insert tab in the form ribbon. This will bring up the List or Chart Properties page, where we can specify the following properties of a subgrid: Subgrid property Description Under the Display tab Name The unique name of the subgrid control. Label The display text of the subgrid. This text will be displayed on the form. Display label on the Form Select to confirm that the Label text will be displayed on the form. Data Source This specifies the primary data source of the subgrid. The Records list allows us to select between Only Related Records (to set only entities having a relationship to the current entity) and All Record Types (to set all available entities). We can choose the related entity from the Entity list. This list content will vary based on the earlier list's selection. The Default View list allows us to choose which view is to be displayed in the subgrid. Display Search Box Select this setting to display the search box in the subgrid. Display Index Select this setting to display the alphabetic index record selector in the subgrid. This property is not supported for process-driven forms of Microsoft Dynamics CRM 2011 Online. View Selector Select this setting to display the view selector in the subgrid. This property is not supported for process-driven forms of Microsoft Dynamics CRM 2011 Online. Chart Options Select whether to display a chart selector along with a default chart or show only a specified chart in place of the subgrid. This property is not supported for process-driven forms of Microsoft Dynamics CRM 2011 Online. Under the Formatting tab Layout Choose from among One Column, Two Columns, Three Columns, and Four Columns to define the layout of the subgrid control. Number of Rows Select the maximum number of rows to be displayed in the subgrid control. The number of rows has to be between 2 and 250. Automatically expand to use available space Select this setting to enable automatic expansion of the subgrid to use available space in the form. iFrames or Inline Frames are HTML documents embedded inside the Dynamics CRM entity form. The following steps will guide you through adding an iFrame in the form: Select the section control where the iFrame is to be added in the form. Then click on the IFRAME button under the Insert tab in the form ribbon. This will bring up the Add an IFRAME page, where we can specify the following properties of an iFrame: iFrame property Description Under the General tab Name The unique name of the iFrame control. URL The URL of the HTML document to be displayed in the iFrame control. Pass record object-type code and unique identifier as parameters Select this option to pass contextual information entity object-type code and the record's unique identifier to the iFrame. Read more about this in the How it works... section of this recipe. Label Here, specify the display text for the iFrame. Display label on the Form Select this setting to display the label on the form. Restrict cross-frame scripting, where supported This checkbox is selected by default. We can remove this restriction only if we are certain that the HTML document/site we are using as the target of the iFrame can be trusted. Visible by default Select this setting to make the iFrame visible by default on the form. Under the Formatting tab Layout Choose from among One Column, Two Columns, Three Columns, and Four Columns to define the layout of the iFrame control. Number of Rows Select the maximum number of rows the iFrame control occupies on the form. The number of rows has to be between 1 and 40. Automatically expand to use available space Select this setting to enable automatic expansion of the iFrame control to use the available space in the form. Scrolling Select the scrolling option for the iFrame content display. Display Border Specify whether a border for the iFrame control is to be displayed. Web resources represent files that can be used to extend the Microsoft Dynamics CRM 2011 web application, such as HTML files, Image files, JScript library, and Silverlight applications. The following steps can be used to add a web resource in the form: Select the section control where the web resource is to be added in the form. Then click on the Web Resource button under the Insert tab in the form ribbon. This will bring up the Add Web Resource page, where we can specify the following properties of a web resource: Web resource property Description Under the General tab Web Resource Lookup to find a form-enabled web resource. Name The unique name for the web resource. Label Specify the display text for the web resource here. Display label on the Form Select this setting to display the label on the form. Visibility by default Select this setting to make the web resource visible by default on the form. Show this web resource in Read-Optimized Form Select this setting if the web resource is to be displayed in the read-optimized form. Under the Formatting tab Layout Choose from among One Column, Two Columns, Three Columns, and Four Columns to define the layout of the web resource control. Number of Rows Select the maximum number of rows the web resource control occupies on the form. The number of rows has to be between 1 and 40. Automatically expand to use available space Select this setting to enable automatic expansion of the web resource control to use the available space in the form. Scrolling Select the scrolling option for the web resource content display. Display Border Specify here whether a border for the web resource control is to be displayed. The Dependencies properties   Select the fields from the Available fields list that are required by the web resource, and then click on the (add selected records) button to move the selected fields to the Dependent fields list. The navigation area displays entities that are related to the current entity. Each relationship has a Label property and in this navigation section this Label property is displayed by default. However, the display name for the related entity can be changed. This display name does not update the Label property of the relationship. In order to edit the navigation area, perform the following steps: Select the Navigation button in the form ribbon. The navigation section will be enabled. Then click on any relationship label and select Change Properties to edit the display text. This will bring up the Relationship Properties page. Modify the Label field here. Next we will edit the form properties; in order to do this, click on the Form Properties button in the form ribbon and the Form Properties page will pop up. The following properties can be edited there: Form property Description The Event properties   Add or remove the JScript libraries that will be available for the form or field events. Under the Display tab Form Name The display name for the form. Modify this to rename the form. Description Specify a description for this form here. Show navigation items Select this setting to display the page navigation in the form. The Parameters properties   Add query string parameters to be passed to the form. Click on the green plus sign to add a query string. We have to provide a Name value and select a Type value of the query string parameter. The Non- Event Dependencies properties   Select the fields from the Available fields list that are required by any external, non-event scripts, and then click on the (add selected records) button to move the selected fields to the Dependent fields list. These fields will not be removable from the form. Lastly, making a form non-customizable restricts any future customization of the form. Therefore, to make a form non-customizable, perform the following steps: Select the Managed Properties button in the form ribbon. The Managed Properties of System Form: Form web page dialog will pop up. In this page, mark Customizable as False . After making any changes to an entity form, the form has to be saved and published. Use the Publish button in the form ribbon to publish the changes. How it works… Web resources and iFrames are not displayed using the Microsoft Dynamics CRM 2011 for Outlook reading pane, but iFrames are displayed in read-optimized forms. When the Pass record object-type code and unique identifier as parameters setting is enabled, iFrames allow the form to pass the following contextual parameters to itself: Parameter name Description typename The name of the entity. type This takes in the entity type code, which is an integer value to uniquely identify an entity in a specific organization. Id A GUID that represents a record. orgname The organization's name. userlcid The user's language code. orglcid The organization's language code. The list of entity type codes can be found at http://msdn.microsoft.com/en-us/library/gg328086.aspx. The key points about entity type codes are as follows: Type codes below 10,000 are reserved for out-of-the-box entities. Custom entities will have a type code greater than or equal to 10,000. Custom entities' type codes might change during solution import. Hence the type codes of a custom entity might be different in the development and test environments. The entity codes are stored in the Dynamics CRM database and can be retrieved from the EntityView table of the <OrganizationName>_MSCRM database.
Read more
  • 0
  • 0
  • 7989

Packt
20 Aug 2010
11 min read
Save for later

Adding Features to your Joomla! Form using ChronoForms

Packt
20 Aug 2010
11 min read
(For more resources on ChronoForms, see here.) Introduction We have so far mostly worked with fairly standard forms where the user is shown some inputs, enters some data, and the results are e-mailed and/or saved to a database table. Many forms are just like this, and some have other features added. These features can be of many different kinds and the recipes in this article are correspondingly a mixture. Some, like Adding a validated checkbox, change the way the form works. Others, like Signing up to a newsletter service change what happens after the form is submitted. While you can use these recipes as they are presented, they are just as useful as suggestions for ways to use ChronoForms to solve a wide range of user interactions on your site. Adding a validated checkbox Checkboxes are less often used on forms than most of the other elements and they have some slightly unusual behavior that we need to manage. ChronoForms will do a little to help us, but not everything that we need. In this recipe, we'll look at one of the most common applications—a stand alone checkbox that the user is asked to click to ensure that they've accepted some terms and conditions. We want to make sure that the form is not submitted unless the box is checked. Getting ready We'll just add one more element to our basic newsletter form. It's probably going to be best to recreate a new version of the form using the Form Wizard to make sure that we have a clean starting point. How to do it... In the Form Wizard, create a new form with two TextBox elements. In the Properties box, add the Labels "Name" and "Email" and the Field Names "name" and "email" respectively. Now drag in a CheckBox element. You'll see that ChronoForms inserts the element with three checkboxes and we only need one. In the Properties box remove the default values and type in "I agree". While you are there change the label to "Terms and Conditions". Lastly, we want to make sure that this box is checked so check the Validation | One Required checkbox and add "please confirm your agreement" in the Validation Message box. Apply the changes to the Properties. To complete the form add the Button element, then save your form, publish it, and view it in your browser. To test, click the Submit button without entering anything. You should find that the form does not submit and an error message is displayed. How it works... The only special thing to notice about this is that the validation we used was validate-one- required and not the more familiar required. Checkbox arrays, radio button groups, and select drop-downs will not work with the required option as they always have a value set, at least from the perspective of the JavaScript that is running the validation. There's more... Validating the checkbox server-side If the checkbox is really important to us, then we may want to confirm that it has been checked using the server-side validation box. We want to check and, if our box isn't checked, then create the error message. However, there is a little problem—an unchecked checkbox doesn't return anything at all, there is just no entry in the form results array. Joomla! has some functionality that will help us out though; the JRequest::getVar() function that we use to get the form results allows us to set a default value. If nothing is found in the form results, then the default value will be used instead. So we can add this code block to the server-side validation box: <?php $agree = JRequest::getString('check0[]', 'empty', 'post'); if ( $agree == 'empty' ) { return 'Please check the box to confirm your agreement'; } ?> Note: To test this, we need to remove the validate-one-required class from the input in the Form HTML. Now when we submit the empty form, we see the ChronoForms error message. Notice that the input name in the code snippet is check0[]. ChronoForms doesn't give you the option of setting the name of a checkbox element in the Form Wizard | Properties box. It assigns a check0, check1, and so on value for you. (You can edit this in the Form Editor if you like.) And because checkboxes often come in arrays of several linked boxes with the same name, ChronoForms also adds the [] to create an array name. If this isn't done then only the value of the last checked box will be returned. Locking the Submit button until the box is checked If we want to make the point about terms and conditions even more strongly then we can add some JavaScript to the form to disable the Submit button until the box is checked. We need to make one change to the Form HTML to make this task a little easier. ChronoForms does not add ID attributes to the Submit button input; so open the form in the Form Editor, find the line near the end of the Form HTML and alter it to read: <input value="Submit" name="submit" id='submit' type="submit" /> Now add the following snippet into the Form JavaScript box: // stop the code executing // until the page is loaded in the browser window.addEvent('load', function() { // function to enable and disable the submit button function agree() { if ( $('check00').checked == true ) { $('submit').disabled = false; } else { $('submit').disabled = true; } }; // disable the submit button on load $('submit').disabled = true; //execute the function when the checkbox is clicked $('check00').addEvent('click', agree); }); Apply or save the form and view it in your browser. Now as you tick or untick the checkbox, the submit button will be enabled and disabled. This is a simple example of adding a custom script to a form to add a useful feature. If you are reasonably competent in JavaScript, you will find that there is quite a lot more that you can do. There are different styles of laying out both JavaScript and PHP and sometimes fierce debates about where line breaks and spaces should go. We've adopted a style here that is hopefully fairly clear, reasonably compact, and more or less the same for both JavaScript and PHP. If it's not the style you are accustomed to, then we're sorry. Adding an "other" box to a drop-down Drop-downs are a valuable way of offering a list of choices to your user to select from. And sometimes it just isn't possible to make the list complete, there's always another option that someone will want to add. So we add an "other" option to the drop-down. But that tells us nothing, so we need to add an input to tell us what "other" means here. Getting ready We'll just add one more element to our basic newsletter form. We haven't used a drop-down before but it is very similar to the check-box element from the previous recipe. How to do it... Use the Form Wizard to create a form with two TextBox elements, a DropDown element, and a Button element. The changes to make in the element are: Add "I heard from" in the Label Change the Field Name to "hearabout" Add some options to the Options box—"Google", "Newspaper", "Friend", and "Other" Leave the Add Choose Option box checked and leave Choose Option in the Choose Option Text box. Apply the Properties box. Make any other changes you need to the form elements; then save the form, publish it, and view it in your browser. Notice that as well as the four options we added the Choose Option entry is at the top of the list. That comes from the checkbox and text field that we left with their default values. It's important to have a "null" option like this in a drop-down for two reasons. First, so that it is obvious to a user that no choice has been made. Otherwise it's very easy for them to leave the first option showing and this value—Google in this case—will be returned by default. Second, so that we can validate select-one-required if necessary. The "null" option has no value set and so can be detected by validation script. Now we just need one more text box to collect details if Other is selected. Open the form in the Wizard Edit; add one more TextBox element after the DropDown element. Give it the Label please add details and the name "other". Even though we set the name to "other", ChronoForms will have left the input ID attribute as text_4 or something similar. Open the Form in the Form Editor and change the ID to "other" as well. The same is true of the drop-down. The ID there is select_2, change that to hearabout. Now we need a script snippet to enable and disable the "other" text box if the Other option is selected in the drop-down. Here's the code to put in the Form JavaScript box: window.addEvent('domready', function() { $('hearabout').addEvent('change', function() { if ($('hearabout').value == 'Other' ) { $('other').disabled = false; } else { $('other').disabled = true; } }); $('other').disabled = true; }); This is very similar to the code in the last recipe except that it's been condensed a little more by merging the function directly into the addEvent(). When you view the form you will see that the text box for please add details is grayed out and blocked until you select Other in the drop-down. Make sure that you don't make the please add details input required. It's an easy mistake to make but it stops the form working correctly as you have to select Other in the drop-down to be able to submit it. How it works Once again, this is a little JavaScript that is checking for changes in one part of the form in order to alter the display of another part of the form. There's more... Hiding the whole input It looks a little untidy to have the disabled box showing on the form when it is not required. Let's change the script a little to hide and unhide the input instead of disabling and enabling it. To make this work we need a way of recognizing the input together with its label. We could deal with both separately, but let's make our lives simpler. In the Form Editor, open the Form HTML box and look near the end for the other input block: <div class="form_item"> <div class="form_element cf_textbox"> <label class="cf_label" style="width: 150px;">please add details</label> <input class="cf_inputbox" maxlength="150" size="30" title="" id="other" name="other" type="text" /> </div> <div class="cfclear">&nbsp;</div> </div> That <div class="form_element cf_textbox"> looks like it is just what we need so let's add an ID attribute to make it visible to the JavaScript:   <div class="form_element cf_textbox" id="other_input"> Now we'll modify our script snippet to use this: window.addEvent('domready', function() { $('hearabout').addEvent('change', function() { if ($('hearabout').value == 'Other' ) { $('other_input').setStyle('display', 'block'); } else { $('other_input').setStyle('display', 'none'); } }); // initialise the display if ($('hearabout').value == 'Other' ) { $('other_input').setStyle('display', 'block'); } else { $('other_input').setStyle('display', 'none'); } }); Apply or save the form and view it in your browser. Now the input is invisible see the following screenshot labeled 1 until you select Other from the drop-down see the following screenshot labeled 2. The disadvantage of this approach is that the form can appear to "jump around" as extra fields appear. You can overcome this with a little thought, for example by leaving an empty space. See also In some of the script here we are using shortcuts from the MooTools JavaScript framework. Version 1.1 of MooTools is installed with Joomla! 1.5 and is usually loaded by ChronoForms. You can find the documentation for MooTools v1.1 at http://docs111.mootools.net/ Version 1.1 is not the latest version of MooTools and many of the more recent MooTools script will not run with the earlier version. Joomla 1.6 is expected to use the latest release.
Read more
  • 0
  • 0
  • 7986

article-image-introduction-vsphere-distributed-switches
Packt
11 Feb 2014
10 min read
Save for later

Introduction to vSphere Distributed switches

Packt
11 Feb 2014
10 min read
(For more resources related to this topic, see here.) Using vSphere Distributed Switches The vSphere Distributed Switches are virtual switches that span across multiple hosts. This makes it easier to configure hosts that need similar network configurations. It also ensures that virtual machines will get the same network configuration when they migrate to another host. You need a vSphere Enterprise Plus license and a vCenter Server to be able to use vSphere Distributed Switches. In vSphere PowerCLI, there are separate sets of cmdlets for working with vSphere Standard Switches and vSphere Distributed Switches. In the past, the VMware PowerCLI team tried to integrate both types of switches into one set of cmdlets. This is why, for example, the Get-VirtualSwitch cmdlet has a –Distributed parameter. However, this parameter is now obsolete, and VMware made a new set of cmdlets specific for vSphere Distributed Switches. The vSphere PowerCLI VDS snap-in that provides support for managing Distributed Switches and port groups was introduced in PowerCLI 5.1 Release 2. In this section, you will learn more about these new cmdlets. In the following diagram, you see a vSphere Distributed Switch that spans two hosts: Creating vSphere Distributed Switches The New-VDSwitch cmdlet creates new vSphere Distributed Switches. This cmdlet has the following syntax: New-VDSwitch [-ContactDetails <String>] [-ContactName <String>] [-LinkDiscoveryProtocol <LinkDiscoveryProtocol>] [-LinkDiscoveryProtocolOperation <LinkDiscoveryOperation>] [-MaxPorts <Int32>] [-Mtu <Int32>] [-Notes <String>] [-NumUplinkPorts <Int32>] [-Version <String>] -Name <String> -Location <VIContainer> [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] New-VDSwitch -ReferenceVDSwitch <VDSwitch> -Name <String> -Location <VIContainer> [-WithoutPortGroups] [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] New-VDSwitch -BackupPath <String> [-KeepIdentifiers] [-Name <String>] -Location <VIContainer> [-WithoutPortGroups] [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] If you compare the preceding syntax with the syntax of the New-VirtualSwitch cmdlet, then you will see that the New-VDSwitch cmdlet has many more possibilities to configure the virtual switch during creation. In vSphere 5.1 and later versions, you can even specify a reference vSphere Distributed Switch or a backup profile from where a new switch can be created. Creating a new vSphere Distributed Switch from scratch. To create a new vSphere Distributed Switch from scratch, you have to specify at least the -Name and the –Location parameters. The location can be a datacenter or a folder. In the next example. we will create a new vSphere Distributed Switch named VDSwitch1 in datacenter New York: PowerCLI C:> $Datacenter = Get-Datacenter -Name "New York" PowerCLI C:> New-VDSwitch -Name "VDSwitch1" -Location $Datacenter Name NumPorts Mtu Version Vendor ------------------------------------------------------------------------ VDSwitch2 0 1500 5.5.0 VMware Cloning a vSphere Distributed Switch We can now clone the VDSwitch1 to a new switch VDSwitch2. Of course, this is not very useful because VDSwitch1 is not configured yet. We will clone the switch just to show you how to do it: PowerCLI C:> New-VDSwitch -Name "VDSwitch2" `>> -ReferenceVDSwitch "VDSwitch1" -Location $Datacenter>>Name NumPorts Mtu Version Vendor---- -------- --- ------- ------VDSwitch2 0 1500 5.5.0 VMware If you want to clone a vSpheredistributed switch without cloning the port groups, then you can use the –WithoutPortGroups parameter. Cloning a vSphere Distributed Switch and the New-VDSwitch-ReferenceVDSwitch parameter are only supported in VMware vSphere 5.1 and later versions. Creating a vSphere Distributed Switch from an export Later in this article, we will see how to create an export of the configuration of a vSphere Distributed Switch. If you have an export, then you can use it to recreate a switch or to create a new one. The next example will create a new vSphere Distributed Switch named VDSwitch3 in the datacenter New York from an export of the configuration of VDSwitch1: PowerCLI C:> New-VDSwitch -BackupPath C:VDSwitch1Config.zip ` >> -Name VDSwitch3 -Location (Get-Datacenter –Name "New York") Importing vSphere distributed switch - Switch name: 'VDSwitch1', Switch version: 5.5.0, Number of port groups: 1, Number of resource pools: 8, Number of uplinks: 1, Notes: VDSwitch1 Configuration Name NumPorts Mtu Version Vendor ------------------------------------------------------------------------ VDSwitch3 0 1500 5.5.0 VMware Also, for importing an export you can use the –WithoutPortGroups parameter to prevent importing the port groups. Retrieving vSphere Distributed Switches You can retrieve vSphere Distributed Switches with the Get-VDSwitch cmdlet. The syntax of this cmdlet is as follows: Get-VDSwitch [[-Name] <String[]>] [-Location <FolderContainer[]>] [-VMHost <VMHost[]>] [-VM <VirtualMachine[]>] [-Server <VIServer[]>] [<CommonParameters>] Get-VDSwitch -Id <String[]> [-Server <VIServer[]>] [<CommonParameters>] Get-VDSwitch -RelatedObject <VDSwitchRelatedObjectBase[]> [<CommonParameters>] If you use the Get-VDSwitch cmdlet without parameters, then it will return all of the vSphere Distributed Switches in your environment. You can use the -Location parameter to specify a datacenter or folder in which you want to search for vSphere Distributed Switches. The next example will retrieve all of the vSphere Distributed Switches in the datacenter New York: PowerCLI C:> Get-Datacenter -Name "New York" | Get-VDSwitch Name NumPorts Mtu Version Vendor -------------------------------------------------------------- VDSwitch1 0 1500 5.5.0 VMware VDSwitch2 0 1500 5.5.0 VMware VDSwitch3 0 1500 5.5.0 VMware You can specify a value for the –Name parameter to retrieve a specific vSphere Distributed Switch. In the following example, we will retrieve the vSphere Distributed Switch VDSwitch1: PowerCLI C:> Get-VDSwitch -Name VDSwitch1 Name NumPorts Mtu Version Vendor ------------------------------------------------------------------------ VDSwitch1 0 1500 5.5.0 VMware Configuring vSphere Distributed Switches The selection you make using Get-VDSwitch can be used as an input of the SetVDSwitch cmdlet to modify the configuration or version of one or more vSphere Distributed Switches, to roll back the configuration to its previous state, or to import the configuration from a backup profile. Rolling back the configuration of a vSphere Distributed Switch and importing the configuration of a vSphere Distributed Switch from a backup are available only on vSphere 5.1 and later versions. The Set-VDSwitchcmdlet has the following syntax: Set-VDSwitch [-Name <String>] [-ContactDetails <String>] [-ContactName <String>] [-LinkDiscoveryProtocol <LinkDiscoveryProtocol>] [-LinkDiscoveryProtocolOperation <LinkDiscoveryOperation>] [-MaxPorts <Int32>] [-Mtu <Int32>] [-Notes <String>] [-NumUplinkPorts <Int32>] [-Version <String>] [-VDSwitch] <VDSwitch[]> [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] Set-VDSwitch -BackupPath <String> [-WithoutPortGroups] [-VDSwitch] <VDSwitch[]> [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] Set-VDSwitch -RollBackConfiguration [-VDSwitch] <VDSwitch[]> [-RunAsync] [-Server <VIServer[]>] [-WhatIf] [-Confirm] [<CommonParameters>] In the next example, we will set almost every property for VDSwitch1 that you can set using PowerShell. We will use a PowerShell technique called splatting to create a parameter list for the Set-VDSwitch cmdlet. The parameter list will be a hash table called $Parameters. By using the splat operator @,you can add the parameters in the $Parameters hash table to the Set-VDSwitchcmdlet call. When you use splatting, you have to use the name of the variable containing the parameter's hash table without the dollar sign, that is, @Parameters and not @$Parameters. PowerCLI C:> $VDSwitch = Get-VDSwitch -Name vDSWitch1 PowerCLI C:> $Parameters = @{ >> NumUplinkPorts = 2 >> MaxPorts = 1024 >> LinkDiscoveryProtocol = 'LLDP' >> LinkDiscoveryProtocolOperation = 'Both' >> ContactName = 'vsphereadmin@blackmilktea.com' >> ContactDetails = 'New York office' >> Notes = 'VDSwitch for New York datacenter' >> } >> $VDSwitch | Set-VDSwitch @Parameters >> Get-VDSwitch -Name vDSWitch1 | Format-List -Property * >> Name NumPorts Mtu Version Vendor ------------------------------------------------------------- VDSwitch1 0 1500 5.5.0 VMware LinkDiscoveryProtocol : LLDP LinkDiscoveryProtocolOperation : Both VlanConfiguration : Name : VDSwitch1 ExtensionData : VMware.Vim. VmwareDistributedVirtualSwitch NumPorts : 0 Key : 6c 48 12 50 ca 61 ec be-4d a8 bb 95 d0 0f bd a7 Mtu : 1500 Notes : VDSwitch for New York datacenter Datacenter : New York NumUplinkPorts : 4 ContactName : vsphereadmin@blackmilktea.com ContactDetails : New York office Version : 5.5.0 Vendor : VMware Folder : network MaxPorts : 1024 Id : VmwareDistributedVirtualSwitch-dvs-182 Uid : /VIServer=root@192.168.0.132:443/Distrib utedSwitch=VmwareDistributedVirtualSwitch-dvs-49/ Client : VMware.VimAutomation.Vds.Impl. V1.VDClientImpl The following screenshot of the vSphere Web Client shows the settings of the Distributed Switch VDSwitch1 after executing the previous commands: Rolling back the configuration of a vSphere Distributed Switch If you made a mistake in the configuration of a vSphere Distributed Switch, then you can roll back the configuration to its previous state using the Set-VDSwitch –RollbackConfiguration parameter In the following example, we will modify the MTU size of the switch VDSwitch1to 9,000, and then use the –RollbackConfigurationparameter to revert to the previous value of 1,500: PowerCLI C:> Get-VDSwitch -Name VDSwitch1 | >> Set-VDSwitch -MTU 9000 >> Name NumPorts Mtu Version Vendor -------------------------------------------------------------- VDSwitch1 0 9000 5.5.0 VMware PowerCLI C:> Get-VDSwitch -Name VDSwitch1 | >> Set-VDSwitch –RollbackConfiguration >> Name NumPorts Mtu Version Vendor ------------------------------------------------------------ VDSwitch1 0 1500 5.5.0 Importing the configuration of a vSphere Distributed Switch from a backup You can import the configuration of a vSphere Distributed Switch from a backup created with the Export-VDSwitch cmdlet. Later in this article, you will learn how to create a backup. In this section, we will show you how to restore the configuration from a backup. You have to use the Set-VDSwitch -BackupPath parameter to specify the path to the configuration .zip file. You can also use the –WithoutPortGroups parameter if you don't want to restore the port groups. In the next example, we will import the configuration of VDSwitch1 from a backup file called 'c:myVDSwitch1Config.zip': PowerCLI C:> Get-VDSwitch -Name 'VDSwitch1' | >> Set-VDSwitch -BackupPath 'c:myVDSwitch1Config.zip' >> Restoring vSphere distributed switch configuration - Switch name: 'VDSwitch1', Switch version: 5.5.0, Number of port groups: 0, Number of resource pools: 8, Number of uplinks: 1, Notes: My VDSwitch1 configuration Name NumPorts Mtu Version Vendor ------------------------------------------------------------- VDSwitch1 0 1500 5.5.0 Upgrading a vSphere Distributed Switch In every new VMware vSphere version, VMware introduces new features to the vSphere Distributed Switch. To be able to use these new features, you have to upgrade your switches to the new version. In the next example, we will upgrade the version of the VDSwitch4 switch to 5.5.0: PS C:> $VDSwitch = Get-VDSwitch -Name VDSwitch4 PS C:> $VDSwitch Name NumPorts Mtu Version Vendor ---------------------------------------------------------------- VDSwitch4 0 1500 4.1.0 VMware PS C:> $VDSwitch | Set-VDSwitch -Version 5.5.0 Name NumPorts Mtu Version Vendor --------------------------------------------------------------- VDSwitch4 0 1500 5.5.0 VMware Adding hosts to vSphere Distributed Switches After creating a vSphere Distributed Switch, you have to add hosts to the switch. Normally, you add at least all of the hosts that are in the same cluster. Because a vSphere Distributed Switch is created in a datacenter or a folder, a vSphere Distributed Switch is not limited to one cluster. You can add hosts of multiple clusters to the same switch. The Add-VDSwitchVMHost cmdlet will add hosts to a vSphere Distributed Switch. The syntax of this cmdlet is as follows: Add-VDSwitchVMHost -VDSwitch <VDSwitch> -VMHost <VMHost[]> [-Server <VIServer[]>] [-RunAsync] [-WhatIf] [-Confirm] [<CommonParameters>] In the next example, we will add all of the hosts of the cluster "Cluster01" to the vSphere Distributed Switch "VDSwitch1": PowerCLI C:> $VMHost = Get-Cluster Cluster01 | Get-VMHost PowerCLI C:> Add-VDSwitchVMHost -VDSwitch VDSwitch1 -VMHost $VMHost
Read more
  • 0
  • 0
  • 7980

article-image-linux-shell-script-tips-and-tricks
Packt
15 Apr 2011
7 min read
Save for later

Linux Shell Script: Tips and Tricks

Packt
15 Apr 2011
7 min read
  Linux Shell Scripting Cookbook Solve real-world shell scripting problems with over 110 simple but incredibly effective recipes  In this article, we took a look at some tips and tricks on working with Linux shell script.    Successful and unsuccessful command Tip: When a command returns after error, it returns a non-zero exit status. The command returns zero when it terminates after successful completion. Return status can be read from special variable $? (run echo $? immediately after the command execution statement to print the exit status). Fork bomb :(){ :|:& };: Tip: This recursive function is a function that calls itself. It infinitely spawns processes and ends up in a denial of service attack. & is postfixed with the function call to bring the subprocess into the background. This is a dangerous code as it forks processes and, therefore, it is called a fork bomb. You may find it difficult to interpret the above code. See Wikipedia page http://en.wikipedia.org/wiki/Fork_bomb for more details and interpretation of the fork bomb. It can be prevented by restricting the maximum number of processes that can be spawned from the config file /etc/security/limits.conf. Specify -maxdepth and –mindepth as the third argument Tip: -maxdepth and –mindepth should be specified as the third argument to the find. If they are specified as the fourth or further arguments, it may affect the efficiency of the find as it has to do unnecessary checks (for example, if –maxdepth is specified as the fourth argument and –type as the third argument, the find command first finds out all the files having the specified –type and then finds all of the matched files having the specified depth. However, if the depth were specified as the third argument and –type as the fourth, find could collect all the files having at most the specified depth and then check for the file type, which is the most efficient way of searching. -exec with multiple commands Tip: We cannot use multiple commands along with the –exec parameter. It accepts only a single command, but we can use a trick. Write multiple commands in a shell script (for example, commands.sh) and use it with –exec as follows: -exec ./commands.sh {} ; -n option for numeric sort Tip: Always be careful about the -n option for numeric sort. The sort command treats alphabetical sort and numeric sort differently. Hence, in order to specify numeric sort the –n option should be provided. The ## operator Tip: The ## operator is more preferred over the # operator to extract an extension from a filename since the filename may contain multiple '.' characters. Since ## makes greedy match, it always extract extensions only. Recursively search many files Tip: To recursively search for a text over many directories of descendants use: $ grep "text" . -R -n This is one of the most frequently used commands by developers. It is used to find the file of source code in which a certain text exists. Placing variable assignments Tip: Usually, we place initial variable assignments, such as var=0; and statements to print the file header in the BEGIN block. In the END{} block, we place statements such as printing results and so on. -d argument Tip: The -d argument should always be given in quotes. If quotes are not used, & is interpreted by the shell to indicate this should be a background process. Excluding a set of files from archiving Tip: It is possible to exclude a set of files from archiving by specifying patterns. Use --exclude [PATTERN] for excluding files matched by wildcard patterns. For example, to exclude all .txt files from archiving use: $ tar -cf arch.tar * --exclude "*.txt" Note that the pattern should be enclosed in double quotes. Using cpio command for absolute paths Tip: By using cpio, we can also archive using files as absolute paths. /usr/somedir is an absolute path as it contains the full path starting from root (/). A relative path will not start with / but it starts the path from the current directory. For example, test/file means that there is a directory test and the file is inside the test directory. While extracting, cpio extracts to the absolute path itself. But incase of tar it removes the / in the absolute path and converts it as relative path. PATH format Tip: For the PATH format, if we use / at the end of the source, rsync will copy contents of that end directory specified in the source_path to the destination. If / not at the end of the source, rsync will copy that end directory itself to the destination. / at the end of destination_path Tip: If / is at the end of destination_path, rsync will copy the source to the destination directory. If / is not used at the end of the destination path, rsync will create a folder, named similar to the source directory, at the end of the destination path and copy the source into that directory. wait command Tip: The wait command enables a script to be terminated only after all its child process or background processes terminate or complete. First argument of the sftp command Tip: -oPort should be the first argument of the sftp command. Running du DIRECTORY Tip: Running du DIRECTORY will output a similar result, but it will show only the size consumed by subdirectories. However, they do not show the disk usage for each of the files. For printing the disk usage by files, -a is mandatory. du DIRECTORY commands traversal Tip: du can be restricted to traverse only a single file system by using the –x argument. Suppose du DIRECTORY is run, it will traverse through every possible subdirectory of DIRECTORY recursively. A subdirectory in the directory hierarchy may be a mount point (for example, /mnt/sda1 is a subdirectory of /mnt and it is a mount point for the device /dev/sda1). du will traverse that mount point and calculate the sum of disk usage for that device filesystem also. In order to prevent du from traversing and to calculate from other mount points or filesystems, use the -x flag along with other du options. du –x / will exclude all mount points in /mnt/ for disk usage calculation. Use an absolute path for the executable Tip: An executable binary of the time command is available at /usr/bin/time as well as a shell built-in named time exists. When we run time, it calls the shell built-in by default. The shell built-in time has limited options. Hence, we should use an absolute path for the executable (/usr/bin/time) for performing additional functionalities. -x argument Tip: The -x argument along with -a specifies to remove the TTY restriction imparted, by default, by ps. Usually, using ps without arguments prints processes that are attached to terminal only. Parameters for -o Tip: Parameters for -o are delimited by using the comma (,) operator. It should be noted that there is no space in between the comma operator and next parameter. Mostly, the -o option is combined with the -e (every) option (-oe) since it should list every process running in the system. However, when certain filters are used along with –o, such as those used for listing the processes owned by specified users, -e is not used along with –o. Usage of -e with a filter will nullify the filter and it will show all process entries. pgrep command Tip: pgrep requires only a portion of the command name as its input argument to extract a Bash command, for example, pgrep ash or pgrep bas will also work. But ps requires you to type the exact command. apropos Tip: Sometimes we need to search if some command related to a word exists. Then we can search the manpages for strings in the command. For this we can use: apropos COMMAND   In this article, we took a look at some tips and tricks on working with Linux shell script.Further resources on this subject: Linux Email [Book] Installing VirtualBox on Linux [Article] Linux Shell Script: Logging Tasks [Article] Linux Shell Script: Monitoring Activities [Article] Compression Formats in Linux Shell Script [Article] Making a Complete yet Small Linux Distribution [Article]
Read more
  • 0
  • 0
  • 7978

article-image-home-security-beaglebone
Packt
13 Dec 2013
7 min read
Save for later

Home Security by BeagleBone

Packt
13 Dec 2013
7 min read
(For more resources related to this topic, see here.) One of the best kept secrets of the security and access control industry is just how simple the monitoring hardware actually is. It is the software that runs on the monitoring hardware that makes it seem cool. The original BeagleBone or the new BeagleBone Black, have all the computing power you need to build yourself an extremely sophisticated access control, alarm panel, home automation, and network intrusion detection system. All for less than a year's worth of monitoring charges from your local alarm company! Don't get me wrong, monitored alarm systems have their place. Your elderly mother, for example, or your convenience store in a bad part of town. There is no substitute for a live human on the other end of the line. That said, if you are reading this, you are probably a builder or a hobbyist with all the skills required to do it yourself. BeagleBone is used as the development platform. The modular design of the alarm system allows the hardware to be used with any of the popular single board computers available in the market today. Any single board computer with at least eight accessible input/output pins will work. For example, the Arduino series of boards, the Gumstix line of hardware, and many others. The block diagram of the alarm system is shown in the following diagram: Block Diagram The adapter board is what is used to connect the single board computer to the alarm system. The adapter board comes with connectors for adding two more zones and four more outputs. Instructions are provided for adding zone inputs and panel outputs to the software. An alarm zone can be thought of as having two properties. The first is the actual hardware sensors connected to the panel. The second is the physical area being protected by the sensors. There are three or four types of sensors found in home and small business alarm systems. The first and most common is the magnetic door or window contact. The magnet is attached to the moving part (the window or the door) and the contacts are attached to the frame of the door or window. When the door or window is opened past a certain point the magnet can no longer hold the contacts closed, and they open to signal an alarm. The second most common sensor is the active sensor. The PIR or passive infrared motion sensor is installed in the corner of a room in order to detect the motion of a body which is warmer than the ambient temperature. Two other common sensors are temperature rise and CO detectors. These can both be thought of as life saving detectors. They are normally on a separate zone so that they are not disabled when the alarm system is not armed. The temperature rise detector senses a sudden rise in the ambient temperature and is intended to replace the old ionization type smoke detectors. No more burnt toast false alarms! The CO detector is used to detect the presence of Carbon Monoxide, which is a byproduct of combustion. Basically, faulty oil or gas furnaces and wood or coal burning stoves are the main culprit. Temperature Rise or CO Detector Physical zones are the actual physical location that the sensors are protecting. For example "ground floor windows" could be a zone. Other typical zones defended by a PIR could be garage or rear patio. In the latter case, outdoor PIR motion sensors are available at about twice the price of an indoor model. Depending on your climate, you may be able to install an indoor sensor outside, provided that it is sheltered from rain. The basic alarm system comes with four zone inputs and four alarm outputs. The outputs are just optically isolated phototransistors. So you can use them for anything you like. The first output is reserved in software for the siren, but you can do whatever you like with the other outputs. All four outputs are accessible from the alarm system web page, so you can remotely turn on or off any number of things. For example, you can use the left over three outputs to turn on and off lawn sprinklers, outdoor lighting or fountains and pool pumps. That's right. The alarm system has its own built in web server which provides you with access to the alarm system from anywhere with an internet connection. You could be on the other side of the world and if anything goes wrong, the alarm system will send you an e-mail telling you that something is wrong. Also, if you leave for the airport and forget to turn on or off the lights or lawn sprinkler, simply connect to the alarm system and correct the problem. You can also connect to the system via SSH or secure shell. This allows you to remotely run terminal applications on your BeagleBone. The alarm system, actually has very little to do so long as no alarms occur. The alarm system hardware generates an interrupt which is detected by the BeagleBone, so the BeagleBone spends most of its time idle. This is a waste of computing resources, so the system can also run network intrusion detection software. Not only can this alarm system protect you physical property, it can also keep your network safe as well. Can any local alarm system company claim that? Iptraf Iptraf is short for IP Traffic Monitor. This is a terminal-based program which monitors traffic on any of the interfaces connected to your network or the BeagleBone. My TraceRoute (mtr-0.85) Anyone who has ever used trace route on either Linux or Windows will know that it is used to find the path to a given IP address. MTR is a combination of trace route and ping in one single tool. Wavemon Wavemon is a simple ASCII text-based program that you can use to monitor your WiFi connections to the BeagleBone. Unlike the first two programs, Wavemon requires an Angstrom compatible WiFi adapter. In this case I used an AWUS036H wireless adapter. hcitool Bluetooth monitoring can be done in much the same way as WiFi monitoring; with hcitool. For example: hcitool scan will scan any visible Bluetooth devices in range. As with Wavemon, an external Bluetooth adapter is required. Your personal security system These are just some of the features of the security system you can build and customize for yourself. With advanced programming skills, you can create a security system with fingerprint ID access, that not only monitors and controls its physical surroundings but also the network that it is connected to. It can also provide asset tracking via RFID, barcode, or both; all for much less than the price of a commercial system. Not only that but you designed built and installed it. So tech support is free and should be very knowledgeable! Summary A block diagram of the alarm system is explained. The adapter board is what is used to connect the single board computer to the alarm system. The adapter board comes with connectors for adding two more zones and four more outputs. Instructions are provided for adding zone inputs and panel outputs to the software. Resources for Article: Further resources on this subject: Building a News Aggregating Site in Joomla! [Article] Breaching Wireless Security [Article] Building HTML5 Pages from Scratch [Article]
Read more
  • 0
  • 0
  • 7973
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime
article-image-building-wpf-net-client
Packt
16 Sep 2015
8 min read
Save for later

Building a WPF .NET Client

Packt
16 Sep 2015
8 min read
In this article by Einar Ingebrigtsen, author of the book SignalR: Real-time Application Development - Second Edition we will bring the full feature set of what we've built so far for the web onto the desktop through a WPF .NET client. There are quite a few ways of developing Windows client solutions, and WPF was introduced back in 2005 and has become one of the most popular ways of developing software for Windows. In WPF, we have something called XAML, which is what Windows Phone development supports and is also the latest programming model in Windows 10. In this chapter, the following topics will be covered: MVVM Brief introduction to the SOLID principles XAML WPF (For more resources related to this topic, see here.) Decoupling it all So you might be asking yourself, what is MVVM? It stands for Model View ViewModel: a pattern for client development that became very popular in the XAML stack, enabled by Microsoft based on Martin Fowlers presentation model (http://martinfowler.com/eaaDev/PresentationModel.html). Its principle is that you have a ViewModel that holds the state and exposes behavior that can be utilized from a view. The view observes any changes of the state the ViewModel exposes, making the ViewModel totally unaware that there is a View. The ViewModel is decoupled and can be put in isolation and is perfect for automated testing. As part of the state that the ViewModel typically holds is the model part, which is something it usually gets from the server, and a SignalR hub is the perfect transport to get this. It boils down to recognizing the different concerns that make up the frontend and separating it all. This gives us the following diagram: Decoupling – the next level In this chapter, one of the things we will brush up is the usage of the Dependency Inversion Principle, the D of SOLID. Let's start with the first principle: the S in SOLID of Single Responsibility Principle, which states that a method or a class should only have one reason to change and only have one responsibility. With this, we can't have our units take on more than one responsibility and need help from collaborators to do the entire job. These collaborators are things we now depend on and we should represent these dependencies clearly to our units so that anyone or anything instantiating it knows what we are depending on. We have now flipped around the way in which we get dependencies. Instead of the unit trying to instantiate everything itself, we now clearly state what we need as collaborators, opening up for the calling code to decide what implementations of these dependencies you want to pass on. Also, this is an important aspect; typically, you'd want the dependencies expressed in the form of interfaces, yielding flexibility for the calling code. Basically, what this all means is that instead of a unit or system instantiating and managing its dependencies, we decouple and let something called as the Inversion of Control container deal with this. In the sample, we will use an IoC (Inversion of Control) container called Ninject that will deal with this for us. What it basically does is manage what implementations to give to the dependency specified on the constructor. Often, you'll find that the dependencies are interfaces in C#. This means one is not coupled to a specific implementation and has the flexibility of changing things at runtime based on configuration. Another role of the IOC container is to govern the life cycle of the dependencies. It is responsible for knowing when to create new instances and when to reuse an instance. For instance, in a web application, there are some systems that you want to have a life cycle of per request, meaning that we will get the same instance for the lifetime of a web request. The life cycle is configurable in what is known as a binding. When you explicitly set up the relationship between a contract (interface) and its implementation, you can choose to set up the life cycle behavior as well. Building for the desktop The first thing we will need is a separate project in our solution: Let's add it by right-clicking on the solution in Solution Explorer and navigating to Add | New Project: In the Add New Project dialog box, we want to make sure the .NET Framework 4.5.1 is selected. We could have gone with 4.5, but some of the dependencies that we're going to use have switched to 4.5.1. This is the latest version of the .NET Framework at the time of writing, so if you can, use it. Make sure to select Windows Desktop and then select WPF Application. Give the project the name SignalRChat.WPF and then click on the OK button: Setting up the packages We will need some packages to get started properly. This process is described in detail in Chapter 1, The Primer. Let's start off by adding SignalR, which is our primary framework that we will be working with to move on. We will be pulling this using NuGet, as described in Chapter 1, The Primer: Right-click on the References in Solution Explorer and select Manage NuGet Packages, and type Microsoft.AspNet.SignalR.Client in the Search dialog box. Select it and click on Install. Next, we're going to pull down something called as Bifrost. Bifrost is a library that helps us build MVVM-based solutions on WPF; there are a few other solutions out there, but we'll focus on Bifrost. Add a package called Bifrost.Client. Then, we need the package that gives us the IOC container called Ninject, working together with Bifrost. Add a package called Bifrost.Ninject. Observables One of the things that is part of WPF and all other XAML-based platforms is the notion of observables; be it in properties or collections that will notify when they change. The notification is done through well-known interfaces for this, such as INotifyPropertyChanged or INotifyCollectionChanged. Implementing these interfaces quickly becomes tedious all over the place where you want to notify everything when there are changes. Luckily, there are ways to make this pretty much go away. We can generate the code for this instead, either at runtime or at build time. For our project, we will go for a build-time solution. To accomplish this, we will use something called as Fody and a plugin for it called PropertyChanged. Add another NuGet package called PropertyChanged.Fody. If you happen to get problems during compiling, it could be the result of the dependency to a package called Fody not being installed. This happens for some versions of the package in combination with the latest Roslyn compiler. To fix this, install the NuGet package called Fody explicitly. Now that we have all the packages, we will need some configuration in code: Open the App.xam.cs file and add the following statement: using Bifrost.Configuration; The next thing we will need is a constructor for the App class: public App() { Configure.DiscoverAndConfigure(); } This will tell Bifrost to discover the implementations of the well-known interfaces to do the configuration. Bifrost uses the IoC container internally all the time, so the next thing we will need to do is give it an implementation. Add a class called ContainerCreator at the root of the project. Make it look as follows: using Bifrost.Configuration; using Bifrost.Execution; using Bifrost.Ninject; using Ninject; namespace SignalRChat.WPF { public class ContainerCreator : ICanCreateContainer { public IContainer CreateContainer() { var kernel = new StandardKernel(); var container = new Container(kernel); return container; } } } We've chosen Ninject among others that Bifrost supports, mainly because of familiarity and habit. If you happen to have another favorite, Bifrost supports a few. It's also fairly easy to implement your own support; just go to the source at http://github.com/dolittle/bifrost to find reference implementations. In order for Bifrost to be targeting the desktop, we need to tell it through configuration. Add a class called Configurator at the root of the project. Make it look as follows: using Bifrost.Configuration; namespace SignalRChat.WPF { public class Configurator : ICanConfigure { public void Configure(IConfigure configure) { configure.Frontend.Desktop(); } } } Summary Although there are differences between creating a web solution and a desktop client, the differences have faded over time. We can apply the same principles across the different environments; it's just different programming languages. The SignalR API adds the same type of consistency in thinking, although not as matured as the JavaScript API with proxy generation and so on; still the same ideas and concepts are found in the underlying API. Resources for Article: Further resources on this subject: The Importance of Securing Web Services [article] Working with WebStart and the Browser Plugin [article] Microsoft Azure – Developing Web API for Mobile Apps [article]
Read more
  • 0
  • 0
  • 7972

article-image-middleware
Packt
30 Dec 2014
13 min read
Save for later

Middleware

Packt
30 Dec 2014
13 min read
In this article by Mario Casciaro, the author of the book, "Node.js Design Patterns", has described the importance of using a middleware pattern. One of the most distinctive patterns in Node.js is definitely middleware. Unfortunately it's also one of the most confusing for the inexperienced, especially for developers coming from the enterprise programming world. The reason for the disorientation is probably connected with the meaning of the term middleware, which in the enterprise architecture's jargon represents the various software suites that help to abstract lower level mechanisms such as OS APIs, network communications, memory management, and so on, allowing the developer to focus only on the business case of the application. In this context, the term middleware recalls topics such as CORBA, Enterprise Service Bus, Spring, JBoss, but in its more generic meaning it can also define any kind of software layer that acts like a glue between lower level services and the application (literally the software in the middle). (For more resources related to this topic, see here.) Middleware in Express Express (http://expressjs.com) popularized the term middleware in theNode.js world, binding it to a very specific design pattern. In express, in fact, a middleware represents a set of services, typically functions, that are organized in a pipeline and are responsible for processing incoming HTTP requests and relative responses. An express middleware has the following signature: function(req, res, next) { ... } Where req is the incoming HTTP request, res is the response, and next is the callback to be invoked when the current middleware has completed its tasks and that in turn triggers the next middleware in the pipeline. Examples of the tasks carried out by an express middleware are as the following: Parsing the body of the request Compressing/decompressing requests and responses Producing access logs Managing sessions Providing Cross-site Request Forgery (CSRF) protection If we think about it, these are all tasks that are not strictly related to the main functionality of an application, rather, they are accessories, components providing support to the rest of the application and allowing the actual request handlers to focus only on their main business logic. Essentially, those tasks are software in the middle. Middleware as a pattern The technique used to implement middleware in express is not new; in fact, it can be considered the Node.js incarnation of the Intercepting Filter pattern and the Chain of Responsibility pattern. In more generic terms, it also represents a processing pipeline,which reminds us about streams. Today, in Node.js, the word middleware is used well beyond the boundaries of the express framework, and indicates a particular pattern whereby a set of processing units, filters, and handlers, under the form of functions are connected to form an asynchronous sequence in order to perform preprocessing and postprocessing of any kind of data. The main advantage of this pattern is flexibility; in fact, this pattern allows us to obtain a plugin infrastructure with incredibly little effort, providing an unobtrusive way for extending a system with new filters and handlers. If you want to know more about the Intercepting Filter pattern, the following article is a good starting point: http://www.oracle.com/technetwork/java/interceptingfilter-142169.html. A nice overview of the Chain of Responsibility pattern is available at this URL: http://java.dzone.com/articles/design-patterns-uncovered-chain-of-responsibility. The following diagram shows the components of the middleware pattern: The essential component of the pattern is the Middleware Manager, which is responsible for organizing and executing the middleware functions. The most important implementation details of the pattern are as follows: New middleware can be registered by invoking the use() function (the name of this function is a common convention in many implementations of this pattern, but we can choose any name). Usually, new middleware can only be appended at the end of the pipeline, but this is not a strict rule. When new data to process is received, the registered middleware is invoked in an asynchronous sequential execution flow. Each unit in the pipeline receives in input the result of the execution of the previous unit. Each middleware can decide to stop further processing of the data by simply not invoking its callback or by passing an error to the callback. An error situation usually triggers the execution of another sequence of middleware that is specifically dedicated to handling errors. There is no strict rule on how the data is processed and propagated in the pipeline. The strategies include: Augmenting the data with additional properties or functions Replacing the data with the result of some kind of processing Maintaining the immutability of the data and always returning fresh copies as result of the processing The right approach that we need to take depends on the way the Middleware Manager is implemented and on the type of processing carried out by the middleware itself. Creating a middleware framework for ØMQ Let's now demonstrate the pattern by building a middleware framework around the ØMQ (http://zeromq.org) messaging library. ØMQ (also known as ZMQ, or ZeroMQ) provides a simple interface for exchanging atomic messages across the network using a variety of protocols; it shines for its performances, and its basic set of abstractions are specifically built to facilitate the implementation of custom messaging architectures. For this reason, ØMQ is often chosen to build complex distributed systems. The interface of ØMQ is pretty low-level, it only allows us to use strings and binary buffers for messages, so any encoding or custom formatting of data has to be implemented by the users of the library. In the next example, we are going to build a middleware infrastructure to abstract the preprocessing and postprocessing of the data passing through a ØMQ socket, so that we can transparently work with JSON objects but also seamlessly compress the messages traveling over the wire. Before continuing with the example, please make sure to install the ØMQ native libraries following the instructions at this URL: http://zeromq.org/intro:get-the-software. Any version in the 4.0 branch should be enough for working on this example. The Middleware Manager The first step to build a middleware infrastructure around ØMQ is to create a component that is responsible for executing the middleware pipeline when a new message is received or sent. For the purpose, let's create a new module called zmqMiddlewareManager.js and let's start defining it: function ZmqMiddlewareManager(socket) { this.socket = socket; this.inboundMiddleware = []; //[1] this.outboundMiddleware = []; var self = this; socket.on('message', function(message) { //[2] self.executeMiddleware(self.inboundMiddleware, { data: message }); }); } module.exports = ZmqMiddlewareManager; This first code fragment defines a new constructor for our new component. It accepts a ØMQ socket as an argument and: Creates two empty lists that will contain our middleware functions, one for the inbound messages and another one for the outbound messages. Immediately, it starts listening for the new messages coming from the socket by attaching a new listener to the message event. In the listener, we process the inbound message by executing the inboundMiddleware pipeline. The next method of the ZmqMiddlewareManager prototype is responsible for executing the middleware when a new message is sent through the socket: ZmqMiddlewareManager.prototype.send = function(data) { var self = this; var message = { data: data}; self.executeMiddleware(self.outboundMiddleware, message,    function() {    self.socket.send(message.data);    } ); } This time the message is processed using the filters in the outboundMiddleware list and then passed to socket.send() for the actual network transmission. Now, we need a small method to append new middleware functions to our pipelines; we already mentioned that such a method is conventionally called use(): ZmqMiddlewareManager.prototype.use = function(middleware) { if(middleware.inbound) {    this.inboundMiddleware.push(middleware.inbound); }if(middleware.outbound) {    this.outboundMiddleware.unshift(middleware.outbound); } } Each middleware comes in pairs; in our implementation it's an object that contains two properties, inbound and outbound, that contain the middleware functions to be added to the respective list. It's important to observe here that the inbound middleware is pushed to the end of the inboundMiddleware list, while the outbound middleware is inserted at the beginning of the outboundMiddleware list. This is because complementary inbound/outbound middleware functions usually need to be executed in an inverted order. For example, if we want to decompress and then deserialize an inbound message using JSON, it means that for the outbound, we should instead first serialize and then compress. It's important to understand that this convention for organizing the middleware in pairs is not strictly part of the general pattern, but only an implementation detail of our specific example. Now, it's time to define the core of our component, the function that is responsible for executing the middleware: ZmqMiddlewareManager.prototype.executeMiddleware = function(middleware, arg, finish) {var self = this;(    function iterator(index) {      if(index === middleware.length) {        return finish && finish();      }      middleware[index].call(self, arg, function(err) { if(err) {        console.log('There was an error: ' + err.message);      }      iterator(++index);    }); })(0); } The preceding code should look very familiar; in fact, it is a simple implementation of the asynchronous sequential iteration pattern. Each function in the middleware array received in input is executed one after the other, and the same arg object is provided as an argument to each middleware function; this is the trickthat makes it possible to propagate the data from one middleware to the next. At the end of the iteration, the finish() callback is invoked. Please note that for brevity we are not supporting an error middleware pipeline. Normally, when a middleware function propagates an error, another set of middleware specifically dedicated to handling errors is executed. This can be easily implemented using the same technique that we are demonstrating here. A middleware to support JSON messages Now that we have implemented our Middleware Manager, we can create a pair of middleware functions to demonstrate how to process inbound and outbound messages. As we said, one of the goals of our middleware infrastructure is having a filter that serializes and deserializes JSON messages, so let's create a new middleware to take care of this. In a new module called middleware.js; let's include the following code: module.exports.json = function() { return {    inbound: function(message, next) {      message.data = JSON.parse(message.data.toString());      next();    },    outbound: function(message, next) {      message.data = new Buffer(JSON.stringify(message.data));      next();    } } } The json middleware that we just created is very simple: The inbound middleware deserializes the message received as an input and assigns the result back to the data property of message, so that it can be further processed along the pipeline The outbound middleware serializes any data found into message.data Design Patterns Please note how the middleware supported by our framework is quite different from the one used in express; this is totally normal and a perfect demonstration of how we can adapt this pattern to fit our specific need. Using the ØMQ middleware framework We are now ready to use the middleware infrastructure that we just created. To do that, we are going to build a very simple application, with a client sending a ping to a server at regular intervals and the server echoing back the message received. From an implementation perspective, we are going to rely on a request/reply messaging pattern using the req/rep socket pair provided by ØMQ (http://zguide. zeromq.org/page:all#Ask-and-Ye-Shall-Receive). We will then wrap the socketswith our zmqMiddlewareManager to get all the advantages from the middleware infrastructure that we built, including the middleware for serializing/deserializing JSON messages. The server Let's start by creating the server side (server.js). In the first part of the module we initialize our components: var zmq = require('zmq'); var ZmqMiddlewareManager = require('./zmqMiddlewareManager'); var middleware = require('./middleware'); var reply = zmq.socket('rep'); reply.bind('tcp://127.0.0.1:5000'); In the preceding code, we loaded the required dependencies and bind a ØMQ 'rep' (reply) socket to a local port. Next, we initialize our middleware: var zmqm = new ZmqMiddlewareManager(reply); zmqm.use(middleware.zlib()); zmqm.use(middleware.json()); We created a new ZmqMiddlewareManager object and then added two middlewares, one for compressing/decompressing the messages and another one for parsing/ serializing JSON messages. For brevity, we did not show the implementation of the zlib middleware. Now we are ready to handle a request coming from the client, we will do this by simply adding another middleware, this time using it as a request handler: zmqm.use({ inbound: function(message, next) { console.log('Received: ',    message.data); if(message.data.action === 'ping') {     this.send({action: 'pong', echo: message.data.echo});  }    next(); } }); Since this last middleware is defined after the zlib and json middlewares, we can transparently use the decompressed and deserialized message that is available in the message.data variable. On the other hand, any data passed to send() will be processed by the outbound middleware, which in our case will serialize then compress the data. The client On the client side of our little application, client.js, we will first have to initiate a new ØMQ req (request) socket connected to the port 5000, the one used by our server: var zmq = require('zmq'); var ZmqMiddlewareManager = require('./zmqMiddlewareManager'); var middleware = require('./middleware'); var request = zmq.socket('req'); request.connect('tcp://127.0.0.1:5000'); Then, we need to set up our middleware framework in the same way that we did for the server: var zmqm = new ZmqMiddlewareManager(request); zmqm.use(middleware.zlib()); zmqm.use(middleware.json()); Next, we create an inbound middleware to handle the responses coming from the server: zmqm.use({ inbound: function(message, next) {    console.log('Echoed back: ', message.data);    next(); } }); In the preceding code, we simply intercept any inbound response and print it to the console. Finally, we set up a timer to send some ping requests at regular intervals, always using the zmqMiddlewareManager to get all the advantages of our middleware: setInterval(function() { zmqm.send({action: 'ping', echo: Date.now()}); }, 1000); We can now try our application by first starting the server: node server We can then start the client with the following command: node client At this point, we should see the client sending messages and the server echoing them back. Our middleware framework did its job; it allowed us to decompress/compress and deserialize/serialize our messages transparently, leaving the handlers free to focus on their business logic! Summary In this article, we learned about the middleware pattern and the various facets of the pattern, and we also saw how to create a middleware framework and how to use. Resources for Article:  Further resources on this subject: Selecting and initializing the database [article] Exploring streams [article] So, what is Node.js? [article]
Read more
  • 0
  • 0
  • 7953

Packt
22 Sep 2015
16 min read
Save for later

R ─ Classification and Regression Trees

Packt
22 Sep 2015
16 min read
"The classifiers most likely to be the best are the random forest (RF) versions, the best of which (implemented in R and accessed via caret), achieves 94.1 percent of the maximum accuracy overcoming 90 percent in the 84.3 percent of the data sets."                                                                          – Fernández-Delgado et al (2014) "You can't see the forest for the trees!"                                                                                                     – An old saying (For more resources related to this topic, see here.) In this article by Cory Lesmeister, the author of Mastering Machine Learning with R, the first item of discussion is the basic decision tree, which is both simple to build and understand. However, the single decision tree method does not perform as well as the other methods such as support vector machines or neural networks. Therefore, we will discuss the creation of multiple, sometimes hundreds of, different trees with their individual results combined, leading to a single overall prediction. The first quote written above is from Fernández-Delgado et al in the Journal of Machine Learning Research and is meant to set the stage that the techniques in this article are quite powerful, particularly when used for the classification problems. Certainly, they are not always the best solution, but they do provide a good starting point. Regression trees For an understanding of the tree-based methods, it is probably easier to start with a quantitative outcome and then move on to how it works on a classification problem. The essence of a tree is that the features are partitioned, starting with the first split that improves the residual sum of squares the most. These binary splits continue until the termination of the tree. Each subsequent split/partition is not done on the entire dataset but only on the portion of the prior split that it falls under. This top-down process is referred as recursive partitioning. It is also a process that is greedy, a term you may stumble on in reading about the machine learning methods. Greedy means that in each split in the process, the algorithm looks for the greatest reduction in the residual sum of squares without a regard to how well it will perform in the later partitions. The result is that you may end up with a full tree of unnecessary branches, leading to a low bias but high variance. To control this effect, you need to appropriately prune the tree to an optimal size after building a full tree. The following figure provides a visual of the technique in action. The data is hypothetical with 30 observations, a response ranging from 1 to 10, and two predictor features, both ranging in value from 0 to 10 named X1 and X2. The tree has three splits that lead to four terminal nodes. Each split is basically an if or then statement or uses an R syntax, ifelse(). In the first split, if X1 < 3.5, then the response is split into 4 observations with an average value of 2.4 and the remaining 26 observations. This left branch of 4 observations is a terminal node as any further splits would not substantially improve the residual sum of squares. The predicted value for the 4 observations in that partition of the tree becomes the average. The next split is at X2 < 4 and finally X1 < 7.5. An advantage of this method is that it can handle the highly nonlinear relationships; but can you see a couple of potential problems? The first issue is that an observation is given the average of the terminal node that it falls under. This can hurt the overall predictive performance (high bias). Conversely, if you keep partitioning the data further and further to achieve a low bias, high variance can become an issue. As with the other methods, you can use cross-validation to select the appropriate tree size. Regression Tree with 3 splits and 4 terminal nodes and the corresponding node average and number of observations. Classification trees Classification trees operate under the same principal as regression trees except that the splits are not determined by the residual sum of squares but an error rate. The error rate used is not what you would expect, where the calculation is simply misclassified observations divided by the total observations. As it turns out, when it comes to tree splitting, a misclassification rate by itself may lead to a situation where you can gain information with a further split but not improve the misclassification rate. Let's look at an example. Suppose we have a node—let's call it N0 where you have 7 observations labeled No and 3 observations labeled Yes, and we say that the misclassified rate is 30 percent. With this in mind, let's calculate a common alternative error measure called Gini index. The formula for a single node Gini index is as follows: Gini = 1 – (probability of Class 1)2 – (probability of Class 2)2. For N0, the Gini is 1 - (.7)2 - (.3)2, which is equal to 0.42, versus the misclassification rate of 30 percent. Taking this example further, we will now create an N1 node with 3 of Class 1 and none of Class 2 along with N2, which has 4 observations from Class 1 and 3 from Class 2. Now, the overall misclassification rate for this branch of the tree is still 30 percent, but look at the following to see how the overall Gini index has improved: Gini(N1) = 1 – (3/3)2 – (0/3)2 = 0. Gini(N2) = 1 – (4/7)2 – (3/7)2 = 0.49. The new Gini index = (proportion of N1 x Gini(N1)) + (proportion of N2 x Gini(N2)) which is equal to (.3 x 0) + (.7 x 0.49) or 0.343. By doing a split on a surrogate error rate, we actually improved our model impurity by reducing it from 0.42 to 0.343, whereas the misclassification rate did not change. This is the methodology used by the rpart() package. Random forest To greatly improve our model's predictive ability, we can produce numerous trees and combine the results. The random forest technique does this by applying two different tricks in the model development. The first is the use of bootstrap aggregation or bagging as it is called. In bagging, an individual tree is built on a sample of dataset, roughly two-thirds of the total observations. It is important to note that the remaining one-third is referred to as Out of Bag(OOB). This is repeated for dozens or hundreds of times and the results are averaged. Each of these trees is grown and not pruned based on any error measure and this means that the variance of each of these individual trees is high. However, by averaging the results, you can reduce the variance without increasing the bias. The next thing that the random forest brings to the table is that concurrently with the random sample of the data, it also takes a random sample of the input features at each split. In the randomForest package, we will use the default random number of the sampled predictors, which is the square root of the total predictors for classification problems and total predictors divided by 3 for regression. The number of predictors that the algorithm randomly chooses at each split can be changed via the model tuning process. By doing this random sampling of the features at each split and incorporating it into the methodology, you mitigate the effect of a highly correlated predictor in becoming the main driver in all of your bootstrapped trees and preventing you from reducing the variance that you hoped to achieve with bagging. The subsequent averaging of the trees that are less correlated to each other than if you only performed bagging, is more generalizable and more robust to outliers. Gradient boosting The boosting methods can become extremely complicated for you to learn and understand, but you should keep in mind about what is fundamentally happening behind the curtain. The main idea is to build an initial model of some kind (linear, spline, tree, and so on.) called the base-learner, examine the residuals, and fit a model based on these residuals around the so-called loss function. A loss function is merely the function that measures the discrepancy between the model and desired prediction, for example, a squared error for the regression or the logistic function for the classification. The process continues until it reaches some specified stopping criterion. This is like the student who takes a practice exam and gets 30 out of 100 questions wrong and as a result, studies only those 30 questions that were missed. The next practice exam they get 10 out of these 30 wrong and so only focus on these 10 questions and so on. If you would like to explore the theory behind this further, a great resource for you is available in Frontiers in Neurorobotics, Gradient boosting machines, a tutorial, Natekin A., Knoll A., (2013), at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3885826/. As previously mentioned, boosting can be applied to many different base learners, but here we will only focus on the specifics of tree-based learning. Each tree iteration is small and we will determine how small it is with one of the tuning parameters referred to as interaction depth. In fact, it may be as small as one split, which is referred to as a stump. Trees are sequentially fit to the residuals according to the loss function up to the number of trees that we specified (our stopping criterion). There is another tuning parameter that we will need to identify and that is shrinkage. You can think of shrinkage as the rate at which your model is learning generally and specifically, as the contribution of each tree or stump to the model. This learning rate acts as a regularization parameter. The other thing about our boosting algorithm is that it is stochastic, meaning that it adds randomness by taking a random sample of our data at each tree. Introducing some randomness to a boosted model usually improves the accuracy and speed and reduces overfitting (Friedman 2002). As you may have guessed, tuning these parameters can be quite a challenge. These parameters can interact with each other and if you just tinker with one without considering the other, your model may actually perform worse. The caret package will help us in this endeavor. Business case The overall business objective in this situation is to see if we can improve the predictive ability for some of the cases. For regression, we will visit the prostate cancer data. For classification purposes, we will utilize both the breast cancer biopsy data and Pima Indian Diabetes data. Both random forests and boosting will be applied to all the three datasets. The simple tree method will be used only on the breast and prostate cancer sets. Regression tree We will jump right into the prostate data set, but first let's load the necessary R package, as follows: > library(rpart) #classification and regression trees > library(partykit) #treeplots > library(MASS) #breast and pima indian data > library(ElemStatLearn) #prostate data > library(randomForest) #random forests > library(gbm) #gradient boosting > library(caret) #tune hyper-parameter First, we will do regression on the prostate data. This involves calling the dataset, coding the gleason score as an indicator variable using the ifelse() function, and creating a test and training set. The training set will be pros.train and the test set will be pros.test, as follows: > data(prostate) > prostate$gleason = ifelse(prostate$gleason == 6, 0, 1) > pros.train = subset(prostate, train==TRUE)[,1:9] > pros.test = subset(prostate, train==FALSE)[,1:9] To build a regression tree on the training data, we will use the following rpart() function from R's party package. The syntax is quite similar to what we used in the other modeling techniques: > tree.pros <- rpart(lpsa~., data=pros.train) We can call this object using the print() function and cptable and then examine the error per split to determine the optimal number of splits in the tree, as follows: > print(tree.pros$cptable) CP nsplit rel error xerror xstd 1 0.35852251 0 1.0000000 1.0364016 0.1822698 2 0.12295687 1 0.6414775 0.8395071 0.1214181 3 0.11639953 2 0.5185206 0.7255295 0.1015424 4 0.05350873 3 0.4021211 0.7608289 0.1109777 5 0.01032838 4 0.3486124 0.6911426 0.1061507 6 0.01000000 5 0.3382840 0.7102030 0.1093327 This is a very important table to analyze. The first column labeled CP is the cost complexity parameter, which states that the second column, nsplit, is the number of splits in the tree. The rel error column stands for relative errors and is the residual sum of squares for that number of splits divided by the residual sum of squares for no splits (RSS(k)/RSS(0). Both xerror and xstd are based on a ten-fold cross-validation with xerror being the average error and xstd being the standard deviation of the cross-validation process. We can see that four splits produced slightly less errors using cross-validation while five splits produced the lowest error on the full dataset. You can examine this using the plotcp() function, as follows: > plotcp(tree.pros) The following is the output of the preceding command: The plot shows us the relative error by the tree size with the corresponding error bars. The horizontal line on the plot is the upper limit of the lowest standard error. Selecting the tree size 5, which is four splits, we can build a new tree object where xerror is minimized by pruning our tree accordingly—first creating an object for cp associated with the pruned tree from the table. Then the prune() function handles the rest as follows: > cp = min(tree.pros$cptable[5,]) > prune.tree.pros <- prune(tree.pros, cp = cp) With this done, you can plot and compare the full and pruned trees. The tree plots produced by the partykit package are much better than those produced by the party package. You can simply use the as.party() function as a wrapper in the plot() function: > plot(as.party(tree.pros)) The output of the preceding command is as follows: > plot(as.party(prune.tree.pros)) The following is the output of the preceding command: Note that the splits are exactly the same in the two trees with the exception of the last split, which includes the age variable for the full tree. Interestingly, both the first and second splits in the tree are related to the log of cancer volume lcavol. These plots are quite informative as they show the splits, nodes, observations per node, and box plots of the outcome that we are trying to predict. Let's see how well the pruned tree performs on the test data. What we will do is create an object of the predicted values using the predict() function by incorporating the test data. Then, we will calculate the errors as the predicted values minus the actual values and finally the mean of the squared errors, as follows: > party.pros.test <- predict(prune.tree.pros, newdata=pros.test) > rpart.resid = party.pros.test - pros.test$lpsa #calculate residuals > mean(rpart.resid^2) #caluclate MSE [1] 0.5267748 One can look at the tree plots that we produced and easily explain what are the primary drivers behind the response. As mentioned in the introduction, the trees are easy to interpret and explain, which, in many cases, may be more important than the accuracy. Classification tree For the classification problem, we will prepare the breast cancer data. After loading the data, you will delete the patient ID, rename the features, eliminate the few missing values, and then create the train/test datasets, as follows: > data(biopsy) > biopsy <- biopsy[,-1] #delete ID > names(biopsy) = c("thick", "u.size", "u.shape", "adhsn", "s.size", "nucl", "chrom", "n.nuc", "mit", "class") #change the feature names > biopsy.v2 = na.omit(biopsy) #delete the observations with missing values > set.seed(123) #random number generator > ind = sample(2, nrow(biopsy.v2), replace=TRUE, prob=c(0.7, 0.3)) > biop.train = biopsy.v2[ind==1,] #the training data set > biop.test = biopsy.v2[ind==2,] #the test data set With the data set up appropriately, we will use the same syntax style for a classification problem as we did previously for a regression problem, but before creating a classification tree, we will need to ensure that the outcome is a factor, which can be done using the str() function. as follows: > str(biop.test[,10]) Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 2 1 1 … First, we will create the tree: > set.seed(123) > tree.biop <- rpart(class~., data=biop.train) Then, examine the table for the optimal number of splits: > print(tree.biop$cptable) CP nsplit rel error xerror xstd 1 0.79651163 0 1.0000000 1.0000000 0.06086254 2 0.07558140 1 0.2034884 0.2674419 0.03746996 3 0.01162791 2 0.1279070 0.1453488 0.02829278 4 0.01000000 3 0.1162791 0.1744186 0.03082013 The cross-validation error is at a minimum with only two splits (row 3). We can now prune the tree, plot the full and pruned tree, and see how it performs on the test set, as follows: > cp = min(tree.biop$cptable[3,]) > prune.tree.biop <- prune(tree.biop, cp = cp) > plot(as.party(tree.biop)) > plot(as.party(prune.tree.biop)) An examination of the tree plots shows that the uniformity of the cell size is the first split, then bare nuclei. The full tree had an additional split at the cell thickness. We can predict the test observations using type="class" in the predict() function: > rparty.test <- predict(prune.tree.biop, newdata=biop.test, type="class") > table(rparty.test, biop.test$class) rparty.test benign malignant benign 136 3 malignant 6 64 > (136+64)/209 [1] 0.9569378 The basic tree with just two splits gets us almost 96 percent accuracy. This still falls short but should encourage us to believe that we can improve on it with the upcoming methods, starting with random forests. Summary In this article we learned both the power and limitations of tree-based learning methods for both classification and regression problems. To improve on predictive ability, we have the tools of the random forest and gradient boosted trees at our disposal. Resources for Article: Further resources on this subject: Big Data Analysis (R and Hadoop) [article] Using R for Statistics, Research, and Graphics [article] First steps with R [article]
Read more
  • 0
  • 0
  • 7948

article-image-build-intelligent-interfaces-with-coreml-using-a-cnn-tutorial
Savia Lobo
03 Sep 2018
19 min read
Save for later

Build intelligent interfaces with CoreML using a CNN [Tutorial]

Savia Lobo
03 Sep 2018
19 min read
Core ML gives the potential for devices to better serve us rather than us serving them. This adheres to a rule stated by developer Eric Raymond that a computer should never ask the user for any information that it can auto-detect, copy, or deduce. This article is an excerpt taken from Machine Learning with Core ML written by Joshua Newnham. In today's post, we will implement an application that will attempt to guess what the user is trying to draw and provide pre-drawn drawings that the user can substitute with (image search).  We will be exploring two techniques. The first is using a convolutional neural network (CNN), which we are becoming familiar with, to make the prediction, and then look at how we can apply a context-based similarity sorting strategy to better align the suggestions with what the user is trying to sketch. Reviewing the training data and model We will be using a slightly smaller set, with 205 out of the 250 categories; the exact categories can be found in the CSV file /Chapter7/Training/sketch_classes.csv, along with the Jupyter Notebooks used to prepare the data and train the model. The original sketches are available in SVG and PNG formats. Because we're using a CNN, rasterized images (PNG) were used but rescaled from 1111 x 1111 to 256 x 256; this is the expected input of our model. The data was then split into a training and a validation set, using 80% (64 samples from each category) for training and 20% (17 samples from each category) for validation. After 68 iterations (epochs), the model was able to achieve an accuracy of approximately 65% on the validation data. Not exceptional, but if we consider the top two or three predictions, then this accuracy increases to nearly 90%. The following diagram shows the plots comparing training and validation accuracy, and loss during training: With our model trained, our next step is to export it using the Core ML Tools made available by Apple (as discussed in previous chapters) and imported into our project. Classifying sketches Here we will walk through importing the Core ML model into our project and hooking it up, including using the model to perform inference on the user's sketch and also searching and suggesting substitute images for the user to swap their sketch with. Let's get started with importing the Core ML model into our project. Locate the model in the project repositories folder /CoreMLModels/Chapter7/cnnsketchclassifier.mlmodel; with the model selected, drag it into your Xcode project, leaving the defaults for the Import options. Once imported, select the model to inspect the details, which should look similar to the following screenshot: As with all our models, we verify that the model is included in the target by verifying that the appropriate Target Membership is checked, and then we turn our attention to the inputs and outputs, which should be familiar by now. We can see that our model is expecting a single-channel (grayscale) 256 x 256 image and it returns the dominate class via the classLabel property of the output, along with a dictionary of probabilities of all classes via the classLabelProbs property. With our model now imported, let's discuss the details of how we will be integrating it into our project. Recall that our SketchView emits the events UIControlEvents.editingDidStart, UIControlEvents.editingChanged, and UIControlEvents.editingDidEnd as the user draws. If you inspect the SketchViewController, you will see that we have already registered to listen for the UIControlEvents.editingDidEnd event, as shown in the following code snippet: override func viewDidLoad() { super.viewDidLoad() ... ... self.sketchView.addTarget(self, action: #selector(SketchViewController.onSketchViewEditingDidEnd), for: .editingDidEnd) queryFacade.delegate = self } Each time the user ends a stroke, we will start the process of trying to guess what the user is sketching and search for suitable substitutes. This functionality is triggered via the .editingDidEnd action method onSketchViewEditingDidEnd, but will be delegated to the class QueryFacade, which will be responsible for implementing this functionality. This is where we will spend the majority of our time in this section and the next section. It's also probably worth highlighting the statement queryFacade.delegate = self in the previous code snippet. QueryFacade will be performing most of its work off the main thread and will notify this delegate of the status and results once finished, which we will get to in a short while. Let's start by implementing the functionality of the onSketchViewEditingDidEnd method, before turning our attention to the QueryFacade class. Within the SketchViewController class, navigate to the onSketchViewEditingDidEnd method and append the following code: guard self.sketchView.currentSketch != nil, let sketch = self.sketchView.currentSketch as? StrokeSketch else{ return } queryFacade.asyncQuery(sketch: sketch) Here, we are getting the current sketch, and returning it if no sketch is available or if it's not a StrokeSketch; we hand it over to our queryFacade (an instance of the QueryFacade class). Let's now turn our attention to the QueryFacade class; select the QueryFacade.swift file from the left-hand panel within Xcode to bring it up in the editor area. A lot of plumbing has already been implemented to allow us to focus our attention on the core functionality of predicting, searching, and sorting. Let's quickly discuss some of the details, starting with the properties: let context = CIContext() let queryQueue = DispatchQueue(label: "query_queue") var targetSize = CGSize(width: 256, height: 256) weak var delegate : QueryDelegate? var currentSketch : Sketch?{ didSet{ self.newQueryWaiting = true self.queryCanceled = false } } fileprivate var queryCanceled : Bool = false fileprivate var newQueryWaiting : Bool = false fileprivate var processingQuery : Bool = false var isProcessingQuery : Bool{ get{ return self.processingQuery } } var isInterrupted : Bool{ get{ return self.queryCanceled || self.newQueryWaiting } } QueryFacade is only concerned with the most current sketch. Therefore, each time a new sketch is assigned using the currentSketch property, queryCanceled is set to true. During each task (such as performing prediction, search, and downloading), we check the isInterrupted property, and if true, we will exit early and proceed to process the latest sketch. When you pass the sketch to the asyncQuery method, the sketch is assigned to the currentSketch property and then proceeds to call queryCurrentSketch to do the bulk of the work, unless there is one currently being processed: func asyncQuery(sketch:Sketch){ self.currentSketch = sketch if !self.processingQuery{ self.queryCurrentSketch() } } fileprivate func processNextQuery(){ self.queryCanceled = false if self.newQueryWaiting && !self.processingQuery{ self.queryCurrentSketch() } } fileprivate func queryCurrentSketch(){ guard let sketch = self.currentSketch else{ self.processingQuery = false self.newQueryWaiting = false return } self.processingQuery = true self.newQueryWaiting = false queryQueue.async { DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:self.isInterrupted ? -1 : -1, result:nil) self.processNextQuery() } } } Let's work bottom-up by implementing all the supporting methods before we tie everything together within the queryCurrentSketch method. Let's start by declaring an instance of our model; add the following variable within the QueryFacade class near the top: let sketchClassifier = cnnsketchclassifier() Now, with our model instantiated and ready, we will navigate to the classifySketch method of the QueryFacade class; it is here that we will make use of our imported model to perform inference, but let's first review what already exists: func classifySketch(sketch:Sketch) -> [(key:String,value:Double)]?{ if let img = sketch.exportSketch(size: nil)? .resize(size: self.targetSize).rescalePixels(){ return self.classifySketch(image: img) } return nil } func classifySketch(image:CIImage) -> [(key:String,value:Double)]?{ return nil } Here, we see that the classifySketch is overloaded, with one method accepting a Sketch and the other a CIImage. The former, when called, will obtain the rasterize version of the sketch using the exportSketch method. If successful, it will resize the rasterized image using the targetSize property. Then, it will rescale the pixels before passing the prepared CIImage along to the alternative classifySketch method. Pixel values are in the range of 0-255 (per channel; in this case, it's just a single channel). Typically, you try to avoid having large numbers in your network. The reason is that they make it more difficult for your model to learn (converge)—somewhat analogous to trying to drive a car whose steering wheel can only be turned hard left or hard right. These extremes would cause a lot of over-steering and make navigating anywhere extremely difficult. The second classifySketch method will be responsible for performing the actual inference. Add the following code within the classifySketch(image:CIImage) method: if let pixelBuffer = image.toPixelBuffer(context: self.context, gray: true){ let prediction = try? self.sketchClassifier.prediction(image: pixelBuffer) if let classPredictions = prediction?.classLabelProbs{ let sortedClassPredictions = classPredictions.sorted(by: { (kvp1, kvp2) -> Bool in kvp1.value > kvp2.value }) return sortedClassPredictions } } return nil Here, we use the images, toPixelBuffer method, an extension we added to the CIImage class, to obtain a grayscale CVPixelBuffer representation of itself. Now, with reference to its buffer, we pass it onto the prediction method of our model instance, sketchClassifier, to obtain the probabilities for each label. We finally sort these probabilities from the most likely to the least likely before returning the sorted results to the caller. Now, with some inkling as to what the user is trying to sketch, we will proceed to search and download the ones we are most confident about. The task of searching and downloading will be the responsibility of the downloadImages method within the QueryFacade class. This method will make use of an existing BingService that exposes methods for searching and downloading images. Let's hook this up now; jump into the downloadImages method and append the following highlighted code to its body: func downloadImages(searchTerms:[String], searchTermsCount:Int=4, searchResultsCount:Int=2) -> [CIImage]?{ var bingResults = [BingServiceResult]() for i in 0..<min(searchTermsCount, searchTerms.count){ let results = BingService.sharedInstance.syncSearch( searchTerm: searchTerms[i], count:searchResultsCount) for bingResult in results{ bingResults.append(bingResult) } if self.isInterrupted{ return nil } } } The downloadImages method takes the arguments searchTerms, searchTermsCount, and searchResultsCount. The searchTerms is a sorted list of labels returned by our classifySketch method, from which the searchTermsCount determines how many of these search terms we use (defaulting to 4). Finally, searchResultsCount limits the results returned for each search term. The preceding code performs a sequential search using the search terms passed into the method. And as mentioned previously, here we are using Microsoft's Bing Image Search API, which requires registration, something we will return to shortly. After each search, we check the property isInterrupted to see whether we need to exit early; otherwise, we continue on to the next search. The result returned by the search includes a URL referencing an image; we will use this next to download the image with each of the results, before returning an array of CIImage to the caller. Let's add this now. Append the following code to the downloadImages method: var images = [CIImage]() for bingResult in bingResults{ if let image = BingService.sharedInstance.syncDownloadImage( bingResult: bingResult){ images.append(image) } if self.isInterrupted{ return nil } } return images As before, the process is synchronous and after each download, we check the isInterrupted property to see if we need to exit early, otherwise returning the list of downloaded images to the caller. So far, we have implemented the functionality to support prediction, searching, and downloading; our next task is to hook all of this up. Head back to the queryCurrentSketch method and add the following code within the queryQueue.async block. Ensure that you replace the DispatchQueue.main.async block: queryQueue.async { guard let predictions = self.classifySketch( sketch: sketch) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } let searchTerms = predictions.map({ (key, value) -> String in return key }) guard let images = self.downloadImages( searchTerms: searchTerms, searchTermsCount: 4) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } guard let sortedImage = self.sortByVisualSimilarity( images: images, sketch: sketch) else{ DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:-1, result:nil) self.processNextQuery() } return } DispatchQueue.main.async{ self.processingQuery = false self.delegate?.onQueryCompleted( status:self.isInterrupted ? -1 : 1, result:QueryResult( predictions: predictions, images: sortedImage)) self.processNextQuery() } } It's a large block of code but nothing complicated; let's quickly walk our way through it. We start by calling the classifySketch method we just implemented. As you may recall, this method returns a sorted list of label and probability peers unless interrupted, in which case nil will be returned. We should handle this by notifying the delegate before exiting the method early (a check we apply to all of our tasks). Once we've obtained the list of sorted labels, we pass them to the downloadImages method to receive the associated images, which we then pass to the sortByVisualSimilarity method. This method currently returns just the list of images, but it's something we will get back to in the next section. Finally, the method passes the status and sorted images wrapped in a QueryResult instance to the delegate via the main thread, before checking whether it needs to process a new sketch (by calling the processNextQuery method). At this stage, we have implemented all the functionality required to download our substitute images based on our guess as to what the user is currently sketching. Now, we just need to jump into the SketchViewController class to hook this up, but before doing so, we need to obtain a subscription key to use Bing's Image Search. Within your browser, head to https://azure.microsoft.com/en-gb/services/cognitive-services/bing-image-search-api/ and click on the Try Bing Image Search API, as shown in the following screenshot: After clicking on Try Bing Image Search API, you will be presented with a series of dialogs; read, and once (if) agreed, sign in or register. Continue following the screens until you reach a page informing you that the Bing Search API has been successfully added to your subscription, as shown in the following screenshot: On this page, scroll down until you come across the entry Bing Search APIs v7. If you inspect this block, you should see a list of Endpoints and Keys. Copy and paste one of these keys within the BingService.swift file, replacing the value of the constant subscriptionKey; the following screenshot shows the web page containing the service key: Return to the SketchViewController by selecting the SketchViewController.swift file from the left-hand panel, and locate the method onQueryCompleted: func onQueryCompleted(status: Int, result:QueryResult?){ } Recall that this is a method signature defined in the QueryDelegate protocol, which the QueryFacade uses to notify the delegate if the query fails or completes. It is here that we will present the matching images we have found through the process we just implemented. We do this by first checking the status. If deemed successful (greater than zero), we remove every item that is referenced in the queryImages array, which is the data source for our UICollectionView used to present the suggested images to the user. Once emptied, we iterate through all the images referenced within the QueryResult instance, adding them to the queryImages array before requesting the UICollectionView to reload the data. Add the following code to the body of the onQueryCompleted method: guard status > 0 else{ return } queryImages.removeAll() if let result = result{ for cimage in result.images{ if let cgImage = self.ciContext.createCGImage(cimage, from:cimage.extent){ queryImages.append(UIImage(cgImage:cgImage)) } } } toolBarLabel.isHidden = queryImages.count == 0 collectionView.reloadData() There we have it; everything is in place to handle guessing of what the user draws and present possible suggestions. Now is a good time to build and run the application on either the simulator or the device to check whether everything is working correctly. If so, then you should see something similar to the following: There is one more thing left to do before finishing off this section. Remembering that our goal is to assist the user to quickly sketch out a scene or something similar, our hypothesis is that guessing what the user is drawing and suggesting ready-drawn images will help them achieve their task. So far, we have performed prediction and provided suggestions to the user, but currently the user is unable to replace their sketch with any of the presented suggestions. Let's address this now. Our SketchView currently only renders StrokeSketch (which encapsulates the metadata of the user's drawing). Because our suggestions are rasterized images, our choice is to either extend this class (to render strokes and rasterized images) or create a new concrete implementation of the Sketch protocol. In this example, we will opt for the latter and implement a new type of Sketch capable of rendering a rasterized image. Select the Sketch.swift file to bring it to focus in the editor area of Xcode, scroll to the bottom, and add the following code: class ImageSketch : Sketch{ var image : UIImage! var size : CGSize! var origin : CGPoint! var label : String! init(image:UIImage, origin:CGPoint, size:CGSize, label: String) { self.image = image self.size = size self.label = label self.origin = origin } } We have defined a simple class that is referencing an image, origin, size, and label. The origin determines the top-left position where the image should be rendered, while the size determines its, well, size! To satisfy the Sketch protocol, we must implement the properties center and boundingBox along with the methods draw and exportSketch. Let's implement each of these in turn, starting with boundingBox. The boundingBox property is a computed property derived from the properties origin and size. Add the following code to your ImageSketch class: var boundingBox : CGRect{ get{ return CGRect(origin: self.origin, size: self.size) } } Similarly, center will be another computed property derived from the origin and size properties, simply translating the origin with respect to the size. Add the following code to your ImageSketch class: var center : CGPoint{ get{ let bbox = self.boundingBox return CGPoint(x:bbox.origin.x + bbox.size.width/2, y:bbox.origin.y + bbox.size.height/2) } set{ self.origin = CGPoint(x:newValue.x - self.size.width/2, y:newValue.y - self.size.height/2) } } The draw method will simply use the passed-in context to render the assigned image within the boundingBox; append the following code to your ImageSketch class: func draw(context:CGContext){ self.image.draw(in: self.boundingBox) }   Our last method, exportSketch, is also fairly straightforward. Here, we create an instance of CIImage, passing in the image (of type UIImage). Then, we resize it using the extension method we implemented back in Chapter 3, Recognizing Objects in the World. Add the following code to finish off the ImageSketch class: func exportSketch(size:CGSize?) -> CIImage?{ guard let ciImage = CIImage(image: self.image) else{ return nil } if self.image.size.width == self.size.width && self.image.size.height == self.size.height{ return ciImage } else{ return ciImage.resize(size: self.size) } } We now have an implementation of Sketch that can handle rendering of rasterized images (like those returned from our search). Our final task is to swap the user's sketch with an item the user selects from the UICollectionView. Return to SketchViewController class by selecting the SketchViewController.swift from the left-hand-side panel in Xcode to bring it up in the editor area. Once loaded, navigate to the method collectionView(_ collectionView:, didSelectItemAt:); this should look familiar to most of you. It is the delegate method for handling cells selected from a UICollectionView and it's where we will handle swapping of the user's current sketch with the selected item. Let's start by obtaining the current sketch and associated image that was selected. Add the following code to the body of the collectionView(_collectionView:,didSelectItemAt:) method: guard let sketch = self.sketchView.currentSketch else{ return } self.queryFacade.cancel() let image = self.queryImages[indexPath.row]   Now, with reference to the current sketch and image, we want to try and keep the size relatively the same as the user's sketch. We will do this by simply obtaining the sketch's bounding box and scaling the dimensions to respect the aspect ratio of the selected image. Add the following code, which handles this: var origin = CGPoint(x:0, y:0) var size = CGSize(width:0, height:0) if bbox.size.width > bbox.size.height{ let ratio = image.size.height / image.size.width size.width = bbox.size.width size.height = bbox.size.width * ratio } else{ let ratio = image.size.width / image.size.height size.width = bbox.size.height * ratio size.height = bbox.size.height } Next, we obtain the origin (top left of the image) by obtaining the center of the sketch and offsetting it relative to its width and height. Do this by appending the following code: origin.x = sketch.center.x - size.width / 2 origin.y = sketch.center.y - size.height / 2 We can now use the image, size, and origin to create an ImageSketch, and replace it with the current sketch simply by assigning it to the currentSketch property of the SketchView instance. Add the following code to do just that: self.sketchView.currentSketch = ImageSketch(image:image, origin:origin, size:size, label:"") Finally, some housekeeping; we'll clear the UICollectionView by removing all images from the queryImages array (its data source) and request it to reload itself. Add the following block to complete the collectionView(_ collectionView:,didSelectItemAt:) method: self.queryImages.removeAll() self.toolBarLabel.isHidden = queryImages.count == 0 self.collectionView.reloadData() Now is a good time to build and run to ensure that everything is working as planned. If so then, you should be able to swap out your sketch with one of the suggestions presented at the top, as shown in the following screenshot: We learned how to build Intelligent interfaces using Core ML. If you've enjoyed reading this post, do check out Machine Learning with Core ML to further implement Core ML for visual-based applications using the principles of transfer learning and neural networks. Introducing Intelligent Apps 5 examples of Artificial Intelligence in Web apps Voice, natural language, and conversations: Are they the next web UI?
Read more
  • 0
  • 0
  • 7937
article-image-plotting-haskell
Packt
04 Jun 2015
10 min read
Save for later

Plotting in Haskell

Packt
04 Jun 2015
10 min read
In this article by James Church, author of the book Learning Haskell Data Analysis, we will see the different methods of data analysis by plotting data using Haskell. The other topics that this article covers is using GHCi, scaling data, and comparing stock prices. (For more resources related to this topic, see here.) Can you perform data analysis in Haskell? Yes, and you might even find that you enjoy it. We are going to take a few snippets of Haskell and put some plots of the stock market data together. To get started with, the following software needs to be installed: The Haskell platform (http://www.haskell.org/platform) Gnuplot (http://www.gnuplot.info/) The cabal command-line tool is the tool used to install packages in Haskell. There are three packages that we may need in order to analyze the stock market data. To use cabal, you will use the cabal install [package names] command. Run the following command to install the CSV parsing package, the EasyPlot package, and the Either package: $ cabal install csv easyplot either Once you have the necessary software and packages installed, we are all set for some introductory analysis in Haskell. We need data It is difficult to perform an analysis of data without data. The Internet is rich with sources of data. Since this tutorial looks at the stock market data, we need a source. Visit the Yahoo! Finance website to find the history of every publicly traded stock on the New York Stock Exchange that has been adjusted to reflect splits over time. The good folks at Yahoo! provide this resource in the csv file format. We begin with downloading the entire history of the Apple company from Yahoo! Finance (http://finance.yahoo.com). You can find the content for Apple by performing a quote look up from the Yahoo! Finance home page for the AAPL symbol (that is, 2 As, not 2 Ps). On this page, you can find the link for Historical Prices. On the Historical Prices page, identify the link that says Download to Spreadsheet. The complete link to Apple's historical prices can be found at the following link: http://real-chart.finance.yahoo.com/table.csv?s=AAPL. We should take a moment to explore our dataset. Here are the column headers in the csv file: Date: This is a string that represents the date of a particular date in Apple's history Open: This is the opening value of one share High: This is the high trade value over the course of this day Low: This is the low trade value of the course of this day Close: This is the final price of the share at the end of this trading day Volume: This is the total number of shares traded on this day Adj Close: This is a variation on the closing price that adjusts the dividend payouts and company splits Another feature of this dataset is that each of the rows are written in a table in a chronological reverse order. The most recent date in the table is the first. The oldest is the last. Yahoo! Finance provides this table (Apple's historical prices) under the unhelpful name table.csv. I renamed my csv file aapl.csv, which is provided by Yahoo! Finance. Start GHCi The interactive prompt for Haskell is GHCi. On the command line, type GHCi. We begin with importing our newly installed libraries from the prompt: > import Data.List< > import Text.CSV< > import Data.Either.Combinators< > import Graphics.EasyPlot Parse the csv file that you just downloaded using the parseCSVFromFile command. This command will return an Either type, which represents one of the two things that happened: your file was parsed (Right) or something went wrong (Left). We can inspect the type of our result with the :t command: > eitherErrorOrCells <- parseCSVFromFile "aapl.csv"< > :t eitherErrorOrCells < eitherErrorOrCells :: Either Text.Parsec.Error.ParseError CSV Did we get an error for our result? For this, we are going to use the fromRight and fromLeft commands. Remember, Right is right and Left is wrong. When we run the fromLeft command, we should see this message saying that our content is in the Right: > fromLeft' eitherErrorOrCells < *** Exception: Data.Either.Combinators.fromLeft: Argument takes from 'Right _' Pull the cells of our csv file into cells. We can see the first four rows of our content using take 5 (which will pull our header line and the first four cells): > let cells = fromRight' eitherErrorOrCells< > take 5 cells< [["Date","Open","High","Low","Close","Volume","Adj Close"],["2014-11-10","552.40","560.63","551.62","558.23","1298900","558.23"],["2014-11-07","555.60","555.60","549.35","551.82","1589100","551.82"],["2014-11-06","555.50","556.80","550.58","551.69","1649900","551.69"],["2014-11-05","566.79","566.90","554.15","555.95","1645200","555.95"]] The last column in our csv file is the Adj Close, which is the column we would like to plot. Count the columns (starting with 0), and you will find that Adj Close is number 6. Everything else can be dropped. (Here, we are also using the init function to drop the last row of the data, which is an empty list. Grabbing the 6th element of an empty list will not work in Haskell.): > map (x -> x !! 6) (take 5 (init cells))< ["Adj Close","558.23","551.82","551.69","555.95"] We know that this column represents the adjusted close prices. We should drop our header row. Since we use tail to drop the header row, take 5 returns the first five adjusted close prices: > map (x -> x !! 6) (take 5 (tail (init cells)))< ["558.23","551.82","551.69","555.95","564.19"] We should store all of our adjusted close prices in a value called adjCloseOriginal: > let adjCloseAAPLOriginal = map (x -> x !! 6) (tail (init cells)) These are still raw strings. We need to convert these to a Double type with the read function: > let adjCloseAAPL = map read adjCloseAaplOriginal :: [Double] We are almost done messaging our data. We need to make sure that every value in adjClose is paired with an index position for the purpose of plotting. Remember that our adjusted closes are in a chronological reverse order. This will create a tuple, which can be passed to the plot function: > let aapl = zip (reverse [1.0..length adjCloseAAPL]) adjCloseAAPL< > take 5 aapl < [(2577,558.23),(2576,551.82),(2575,551.69),(2574,555.95),(2573,564.19)] Plotting > plot (PNG "aapl.png") $ Data3D [Title "AAPL"] [] aapl< True The following chart is the result of the preceding command: Open aapl.png, which should be newly created in your current working directory. This is a typical default chart created by EasyPlot. We can see the entire history of the Apple stock price. For most of this history, the adjusted share price was less than $10 per share. At about the 6,000 trading day, we see the quick ascension of the share price to over $100 per share. Most of the time, when we take a look at a share price, we are only interested in the tail portion (say, the last year of changes). Our data is already reversed, so the newest close prices are at the front. There are 252 trading days in a year, so we can take the first 252 elements in our value and plot them. While we are at it, we are going to change the style of the plot to a line plot: > let aapl252 = take 252 aapl< > plot (PNG "aapl_oneyear.png") $ Data2D [Title "AAPL", Style Lines] [] aapl252< True The following chart is the result of the preceding command: Scaling data Looking at the share price of a single company over the course of a year will tell you whether the price is trending upward or downward. While this is good, we can get better information about the growth by scaling the data. To scale a dataset to reflect the percent change, we subtract each value by the first element in the list, divide that by the first element, and then multiply by 100. Here, we create a simple function called percentChange. We then scale the values 100 to 105, using this new function. (Using the :t command is not necessary, but I like to use it to make sure that I have at least the desired type signature correct.): > let percentChange first value = 100.0 * (value - first) / first< > :t percentChange< percentChange :: Fractional a => a -> a -> a< > map (percentChange 100) [100..105]< [0.0,1.0,2.0,3.0,4.0,5.0] We will use this new function to scale our Apple dataset. Our tuple of values can be split using the fst (for the first value containing the index) and snd (for the second value containing the adjusted close) functions: > let firstValue = snd (last aapl252)< > let aapl252scaled = map (pair -> (fst pair, percentChange firstValue (snd pair))) aapl252< > plot (PNG "aapl_oneyear_pc.png") $ Data2D [Title "AAPL PC", Style Lines] [] aapl252scaled< True The following chart is the result of the preceding command: Let's take a look at the preceding chart. Notice that it looks identical to the one we just made, except that the y axis is now changed. The values on the left-hand side of the chart are now the fluctuating percent changes of the stock from a year ago. To the investor, this information is more meaningful. Comparing stock prices Every publicly traded company has a different stock price. When you hear that Company A has a share price of $10 and Company B has a price of $100, there is almost no meaningful content to this statement. We can arrive at a meaningful analysis by plotting the scaled history of the two companies on the same plot. Our Apple dataset uses an index position of the trading day for the x axis. This is fine for a single plot, but in order to combine plots, we need to make sure that all plots start at the same index. In order to prepare our existing data of Apple stock prices, we will adjust our index variable to begin at 0: > let firstIndex = fst (last aapl252scaled)< > let aapl252scaled = map (pair -> (fst pair - firstIndex, percentChange firstValue (snd pair))) aapl252 We will compare Apple to Google. Google uses the symbol GOOGL (spelled Google without the e). I downloaded the history of Google from Yahoo! Finance and performed the same steps that I previously wrote with our Apple dataset: > -- Prep Google for analysis< > eitherErrorOrCells <- parseCSVFromFile "googl.csv"< > let cells = fromRight' eitherErrorOrCells< > let adjCloseGOOGLOriginal = map (x -> x !! 6) (tail (init cells))< > let adjCloseGOOGL = map read adjCloseGOOGLOriginal :: [Double]< > let googl = zip (reverse [1.0..genericLength adjCloseGOOGL]) adjCloseGOOGL< > let googl252 = take 252 googl< > let firstValue = snd (last googl252)< > let firstIndex = fst (last googl252)< > let googl252scaled = map (pair -> (fst pair - firstIndex, percentChange firstValue (snd pair))) googl252 Now, we can plot the share prices of Apple and Google on the same chart, Apple plotted in red and Google plotted in blue: > plot (PNG "aapl_googl.png") [Data2D [Title "AAPL PC", Style Lines, Color Red] [] aapl252scaled, Data2D [Title "GOOGL PC", Style Lines, Color Blue] [] googl252scaled]< True The following chart is the result of the preceding command: You can compare for yourself the growth rate of the stock price for these two competing companies because I believe that the contrast is enough to let the image speak for itself. This type of analysis is useful in the investment strategy known as growth investing. I am not recommending this as a strategy, nor am I recommending either of these two companies for the purpose of an investment. I am recommending Haskell as your language of choice for performing data analysis. Summary In this article, we used data from a csv file and plotted data. The other topics covered in this article were using GHCi and EasyPlot for plotting, scaling data, and comparing stock prices. Resources for Article: Further resources on this subject: The Hunt for Data [article] Getting started with Haskell [article] Driving Visual Analyses with Automobile Data (Python) [article]
Read more
  • 0
  • 0
  • 7936

article-image-detecting-touchscreen-gestures
Packt
06 Aug 2015
18 min read
Save for later

Detecting Touchscreen Gestures

Packt
06 Aug 2015
18 min read
In this article by Kyle Mew author of the book, Android 5 Programming by Example, we will learn how to: Add a GestureDetector to a view Add an OnTouchListener and an OnGestureListener Detect and refine fling gestures Use the DDMS Logcat to observe the MotionEvent class Edit the Logcat filter configuration Simplify code with a SimpleOnGestureListener Add a GestureDetector to an Activity Edit the Manifest to control launch behavior Hide UI elements Create a splash screen Lock screen orientation (For more resources related to this topic, see here.) Adding a GestureDetector to a view Together, view.GestureDetector and view.View.OnTouchListener are all that are required to provide our ImageView with gesture functionality. The listener contains an onTouch() callback that relays each MotionEvent to the detector. We are going to program the large ImageView so that it can display a small gallery of related pictures that can be accessed by swiping left or right on the image. There are two steps to this task as, before we implement our gesture detector, we need to provide the data for it to work on. Adding the gallery data As this app is for demonstration and learning purposes, and so we can progress as quickly as possible, we will only provide extra images for one or two of the ancient sites in the project. Here is how it's done: Open the Ancient Britain project. Open the MainData.java file. Add the following arrays: static Integer[] hengeArray = {R.drawable.henge_large, R.drawable.henge_2, R.drawable.henge_3, R.drawable.henge_4}; static Integer[] horseArray = {}; static Integer[] wallArray = {R.drawable.wall_large, R.drawable.wall_2}; static Integer[] skaraArray = {}; static Integer[] towerArray = {}; static Integer[][] galleryArray = {hengeArray, horseArray, wallArray, skaraArray, towerArray}; Either download the project files from the Packt website or find four of your own images (around 640 x 480 px). Name them henge_2, henge_3, henge_4, and wall_2 and place them in your res/drawable directory. This is all very straightforward, and the code that will accompany it allows you to have individual arrays of any length. This is all we need to add to our gallery data. Now, we need to code our GestureDetector and OnTouchListener. Adding the GestureDetector Along with the OnTouchListener that we will define for our ImageView, the GestureDetector has its own listeners. Here we will use GestureDetector.OnGestureListener to detect a fling gesture and collect the MotionEvent that describe it. Follow these steps to program your ImageView to respond to fling gestures: Open the DetailActivity.java file. Declare the following class fields: private static final int MIN_DISTANCE = 150; private static final int OFF_PATH = 100; private static final int VELOCITY_THRESHOLD = 75; private GestureDetector detector; View.OnTouchListener listener; private int ImageIndex; In the onCreate() method assigns both the detector and listener like this: detector = new GestureDetector(this, new GalleryGestureDetector()); listener = new View.OnTouchListener() { @Override public boolean onTouch(View v, MotionEvent event) { return detector.onTouchEvent(event); } }; Beneath this, add the following line: ImageIndex = 0; Beneath the line detailImage = (ImageView) findViewById(R.id.detail_image);, add the following line: detailImage.setOnTouchListener(listener); Create the following inner class: class GalleryGestureDetector implements GestureDetector.OnGestureListener { } Before dealing with the errors this generates, add the following field to the class: private int item; { item = MainActivity.currentItem; } Click anywhere on the line registering the error and press Alt + Enter. Then select Implement Methods, making sure that you have the Copy JavaDoc and Insert @Override boxes checked. Complete the onDown() method like this: @Override public boolean onDown(MotionEvent e) { return true; } Fill in the onShowPress() method: @Override public void onShowPress(MotionEvent e) { detailImage.setElevation(4); } Then fill in the onFling() method: @Override public boolean onFling(MotionEvent event1, MotionEvent event2, float velocityX, float velocityY) { if (Math.abs(event1.getY() - event2.getY()) > OFF_PATH) return false; if (MainData.galleryArray[item].length != 0) { // Swipe left if (event1.getX() - event2.getX() > MIN_DISTANCE && Math.abs(velocityX) > VELOCITY_THRESHOLD) { ImageIndex++; if (ImageIndex == MainData.galleryArray[item].length) ImageIndex = 0; detailImage.setImageResource(MainData .galleryArray[item][ImageIndex]); } else { // Swipe right if (event2.getX() - event1.getX() > MIN_DISTANCE && Math.abs(velocityX) > VELOCITY_THRESHOLD) { ImageIndex--; if (ImageIndex < 0) ImageIndex = MainData.galleryArray[item].length - 1; detailImage.setImageResource(MainData .galleryArray[item][ImageIndex]); } } } detailImage.setElevation(0); return true; } Test the project on an emulator or handset. The process of gesture detection in the preceding code begins when the OnTouchListener listener's onTouch() method is called. It then passes that MotionEvent to our gesture detector class, GalleryGestureDetector, which monitors motion events, sometimes stringing them together and timing them until one of the recognized gestures is detected. At this point, we can enter our own code to control how our app responds as we did here with the onDown(), onShowPress(), and onFling() callbacks. It is worth taking a quick look at these methods in turn. It may seem, at the first glance, that the onDown() method is redundant; after all, it's the fling gesture that we are trying to catch. In fact, overriding the onDown() method and returning true from it is essential in all gesture detections as all the gestures begin with an onDown() event. The purpose of the onShowPress() method may also appear unclear as it seems to do a little more than onDown(). As the JavaDoc states, this method is handy for adding some form of feedback to the user, acknowledging that their touch has been received. The Material Design guidelines strongly recommend such feedback and here we have raised the view's elevation slightly. Without including our own code, the onFling() method will recognize almost any movement across the bounding view that ends in the user's finger being raised, regardless of direction or speed. We do not want very small or very slow motions to result in action; furthermore, we want to be able to differentiate between vertical and horizontal movement as well as left and right swipes. The MIN_DISTANCE and OFF_PATH constants are in pixels and VELOCITY_THRESHOLD is in pixels per second. These values will need tweaking according to the target device and personal preference. The first MotionEvent argument in onFling() refers to the preceding onDown() event and, like any MotionEvent, its coordinates are available through its getX() and getY() methods. The MotionEvent class contains dozens of useful classes for querying various event properties—for example, getDownTime(), which returns the time in milliseconds since the current onDown() event. In this example, we used GestureDetector.OnGestureListener to capture our gesture. However, the GestureDetector has three such nested classes, the other two being SimpleOnGestureListener and OnDoubleTapListener. SimpleOnGestureListener provides a more convenient way to detect gestures as we only need to implement those methods that relate to the gestures we are interested in capturing. We will shortly edit our Activity so that it implements the SimpleOnGestureListener instead, allowing us to tidy our code and remove the four callbacks that we do not need. The reason for taking this detour, rather than applying the simple listener to begin with, was to get to see all of the gestures available to us through a gesture listener and demonstrate how useful JavaDoc comments can be, particularly if we are new to the framework. For example, take a look at the following screenshot: Another very handy tool is the Dalvik Debug Monitor Server (DDMS), which allows us to see what is going on inside our apps while they are running. The workings of our gesture listener are a good place to do this as most of its methods operate invisibly. Viewing gesture activity with DDMS To view the workings of our OnGestureListener with DDMS, we need to first create a tag to identify our messages and then a filter to view them. The following steps demonstrate how to do this: Open the DetailActivity.java file. Declare the following constant: private static final String DEBUG_TAG = "tag"; Add the following line inside the onDown() method: Log.d(DEBUG_TAG, "onDown"); Add the line Log.d(DEBUG_TAG, "onShowPress"); to the onShowPress() method and do the same for each of our OnGestureDetector methods. Add the following lines to the appropriate clauses in onFling(): Log.d(DEBUG_TAG, "left"); Log.d(DEBUG_TAG, "right"); Open the Android DDMS pane from the Android tab at the bottom of the window or by pressing Alt + 6. If logcat is not visible, it can be opened with the icon to the right of the top-right drop-down menu. Click on this drop-down menu and select Edit Filter Configuration. Complete the dialog as shown in the following screenshot: You can now run the project on a handset or emulator and view, in the Logcat, which gestures are being triggered and how. Your output should resemble the one here: 02-17 14:39:00.990 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onDown 02-17 14:39:01.039 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onSingleTapUp 02-17 14:39:03.503 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onDown 02-17 14:39:03.601 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onShowPress 02-17 14:39:04.101 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onLongPress 02-17 14:39:10.484 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onDown 02-17 14:39:10.541 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onScroll 02-17 14:39:11.091 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onScroll 02-17 14:39:11.232 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onFling 02-17 14:39:11.680 1430- 1430/com.example.kyle.ancientbritain D/tag﹕ right 02-17 14:39:01.039   1430- 1430/com.example.kyle.ancientbritain D/tag﹕ onSingleTapUp DDMS is an invaluable tool when it comes to debugging our apps and seeing what is going on beneath the hood. Once a Log Tag has been defined in the code, we can then create a filter for it so that we see only the messages we are interested in. The Log class contains several methods to report information based on its level of importance. We used Log.d, which stands for debug. All these methods work with the same two parameters: Log.[method](String tag, String message). The full list of these methods is as follows: Log.v: Verbose Log.d: Debug Log.i: Information Log.w: Warning Log.e: Error Log.wtf: Unexpected error It is worth noting that most debug messages will be ignored during the packaging for distribution except for the verbose messages; thus, it is essential to remove them before your final build. Having seen a little more of the inner workings of our gesture detector and listener, we can now strip our code of unused methods by implementing GestureDetector.SimpleOnGestureListener. Implementing a SimpleOnGestureListener It is very simple to convert our gesture detector from one class of listener to another. All we need to do is change the class declaration and delete the unwanted methods. To do this, perform the following steps: Open the DetailActivity file. Change the class declaration for our gesture detector class to the following: class GalleryGestureDetector extends GestureDetector.SimpleOnGestureListener { Delete the onShowPress(), onSingleTapUp(), onScroll(), and onLongPress() methods. This is all you need to do to switch to the SimpleOnGestureListener. We have now successfully constructed and edited a gesture detector to allow the user to browse a series of images. You will have noticed that there is no onDoubleTap() method in the gesture listener. Double-taps can, in fact, be handled with the third GestureDetector listener, OnDoubleTapListener, which operates in a very similar way to the other two. However, Google, in its UI guidelines, recommends that a long press should be used instead, whenever possible. Before moving on to multitouch events, we will take a look at how to attach a GestureDetector listener to an entire Activity by adding a splash screen to our project. In the process, we will also see how to create a Full-Screen Activity and how to edit the Maniftest file so that our app launches with the splash screen. Adding a GestureDetector to an Activity The method we have employed so far allows us to attach a GestureDetector listener to any view or views and this, of course, applies to ViewGroups such as Layouts. There are times when we may want to detect gestures to be applied to the whole screen. For this purpose, we will create a splash screen that can be dismissed with a long press. There are two things we need to do before implementing the gesture detector: creating a layout and editing the Manifest file so that the app launches with our splash screen. Designing the splash screen layout The main difference between processing gestures for a whole Activity and an individual widget, is that we do not need an OnTouchListener as we can override the Activity's own onTouchEvent(). Here is how it is done: Create a new Blank Activity from the Project Explorer context menu called SplashActivity.java. The Activity wizard should have created an associated XML layout called activity_splash.xml. Open this and view it using the Text tab. Remove all the padding properties from the root layout so that it looks similar to this: <RelativeLayout android:layout_width="match_parent" android:layout_height="match_parent" tools:context="com.example.kyle.ancientbritain .SplashActivity"> Here we will need an image to act as the background for our splash screen. If you have not downloaded the project files from the Packt website, find an image, roughly of the size and aspect of your target device's screen, upload it to the project drawable folder, and call it splash. The file I used is 480 x 800 px. Remove the TextView that the wizard placed inside the layout and replace it with this ImageView: <ImageView android:id="@+id/splash_image" android:layout_width="wrap_content" android:layout_height="wrap_content" android:src="@drawable/splash"/> Create a TextView beneath this, such as the following: <TextView android:layout_width="match_parent" android:layout_height="wrap_content" android:layout_alignParentBottom="true" android:layout_centerHorizontal="true" android:layout_marginBottom="40dp" android:gravity="center_horizontal" android:textAppearance="?android:attr/ textAppearanceLarge" android:textColor="#fffcfcbd"/> Add the following text property: android:text="Welcome to <b>Ancient Britain</b>npress and hold nanywhere on the screennto start" To save time adding string resources to the strings.xml file, enter a hardcoded string such as the preceding one and heed the warning from the editor to have the string extracted for you like this: There is nothing in this layout that we have not encountered before. We removed all the padding so that our splash image will fill the layout; however, you will see from the preview that this does not appear to be the case. We will deal with this next in our Java code, but we need to edit our Manifest first so that the app gets launched with our SplashActivity. Editing the Manifest It is very simple to configure the AndroidManifest file so that an app will get launched with whichever Activity we choose; the way it does so is with an intent. While we are editing the Manifest, we will also configure the display to fill the screen. Simply follow these steps: Open the res/values-v21/styles.xml file and add the following style: <style name="SplashTheme" parent="android:Theme.Material. NoActionBar.Fullscreen"> </style> Open the AndroidManifest.xml file. Cut-and-paste the <intent-filter> element from MainActivity to SplashActivity. Include the following properties so that the entire <activity> node looks similar to this: <activity android:name=".SplashActivity" android:theme="@style/SplashTheme" android:screenOrientation="portrait" android:configChanges="orientation|screenSize" android:label="Old UK" > <intent-filter> <action android_name="android.intent.action.MAIN" /> <category android_name="android.intent.category.LAUNCHER" /> </intent-filter> </activity> We have encountered themes and styles before and, here, we took advantage of a built-in theme designed for full screen activities. In many cases, we might have designed a landscape layout here but, as is often the case with splash screens, we locked the orientation with the android:screenOrientation property. The android:configChanges line is not actually needed here, but is included as it is useful to know about it. Configuring any attribute such as this prevents the system from automatically reloading the Activity whenever the device is rotated or the screen size changed. Instead of the Activity restarting, the onConfigurationChanged() method is called. This was not needed here as the screen size and orientation were taken care of in the previous lines of code and this line was only included as a point of interest. Finally, we changed the value of android:label. You may have noticed that, depending on the screen size of the device you are using, the name of our app is not displayed in full on the home screen or apps drawer. In such cases, when you want to use a shortened name for your app, it can be inserted here. With everything else in place, we can get on with adding our gesture detector. This is not dissimilar to the way we did this before but, this time, we will apply the detector to the whole screen and will be listening for a long press, rather than a fling. Adding the GestureDetector Along with implementing a gesture detector for the entire Activity here, we will also take the final step in configuring our splash screen so that the image fills the screen, but maintains its aspect ratio. Follow these steps to complete the app splash screen. Open the SplashActivity file. Declare a GestureDetector as we did in the earlier exercise: private GestureDetector detector; In the onCreate() method, assign and configure our splash image and gesture detector like this: ImageView imageView = (ImageView) findViewById(R.id.splash_image); imageView.setScaleType(ImageView.ScaleType.CENTER_CROP); detector = new GestureDetector(this, new SplashListener()); Now, override the Activity's onTouchEvent() like this: @Override public boolean onTouchEvent(MotionEvent event) { this.detector.onTouchEvent(event); return super.onTouchEvent(event); } Create the following SimpleOnGestureListener class: private class SplashListener extends GestureDetector. SimpleOnGestureListener { @Override public boolean onDown(MotionEvent e) { return true; } @Override public void onLongPress(MotionEvent e) { startActivity(new Intent(getApplicationContext(), MainActivity.class)); } } Build and run the app on your phone or an emulator. The way a gesture detector is implemented across an entire Activity should be familiar by this point, as should the capturing of the long press event. The ImageView.setScaleType(ImageView.ScaleType) method is essential here; it is a very useful method in general. The CENTER_CROP constant scales the image to fill the view while maintaining the aspect ratio, cropping the edges when necessary. There are several similar ScaleTypes, such as CENTER_INSIDE, which scales the image to the maximum size possible without cropping it, and CENTER, which does not scale the image at all. The beauty of CENTER_CROP is that it means that we don't have to design a separate image for every possible aspect ratio on the numerous devices our apps will end up running on. Provided that we make allowances for very wide or very narrow screens by not including essential information too close to the edges, we only need to provide a handful of images of varying pixel densities to maintain the image quality on large, high-resolution devices. The scale type of ImageView can be set from within XML with android:scaleType="centerCrop", for example. You may have wondered why we did not use the built-in Full-Screen Activity from the wizard; we could easily have done so. The template code the wizard creates for a Full-Screen Activity provides far more features than we needed for this exercise. Nevertheless, the template is worth taking a look at, especially if you want a fullscreen that brings the status bar and other components into view when the user interacts with the Activity. That brings us to the end of this article. Not only have we seen how to make our apps interact with touch events and gestures, but also how to send debug messages to the IDE and make a Full-Screen Activity. Summary We began this article by adding a GestureDetector to our project. We then edited it so that we could filter out meaningful touch events (swipe right and left, in this case). We went on to see how the SimpleOnGestureListener can save us a lot of time when we are only interested in catching a subset of the recognized gestures. We also saw how to use DDMS to pass debug messages during runtime and how, through a combination of XML and Java, the status and action bars can be hidden and the entire screen be filled with a single view or view group. Resources for Article: Further resources on this subject: Speeding up Gradle builds for Android [Article] Saying Hello to Unity and Android [Article] Testing with the Android SDK [Article]
Read more
  • 0
  • 0
  • 7934

article-image-creating-time-series-charts-r
Packt
01 Feb 2011
5 min read
Save for later

Creating Time Series Charts in R

Packt
01 Feb 2011
5 min read
Formatting time series data for plotting Time series or trend charts are the most common form of line graphs. There are a lot of ways in R to plot such data, however it is important to first format the data in a suitable format that R can understand. In this recipe, we will look at some ways of formatting time series data using the base and some additional packages. Getting ready In addition to the basic R functions, we will also be using the zoo package in this recipe. So first we need to install it: install.packages("zoo") How to do it... Let's use the dailysales.csv example dataset and format its date column: sales<-read.csv("dailysales.csv") d1<-as.Date(sales$date,"%d/%m/%y") d2<-strptime(sales$date,"%d/%m/%y") data.class(d1) [1] "Date" data.class(d2) [1] "POSIXt" How it works... We have seen two different functions to convert a character vector into dates. If we did not convert the date column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor. The as.Date() function takes at least two arguments: the character vector to be converted to dates and the format to which we want it converted. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Please note that this order is important. If we instead use "%m/%d/%y", then our days will be read as months and vice-versa. The quotes around the value are also necessary. The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object of class POSIXlt, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hour, seconds, minutes, and a few more. POSIXlt is one of the two basic classes of date/times in R. The other class POSIXct represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human readable forms. A virtual class POSIXt inherits from both of the classes. That's why when we ran the data.class() function on d2 earlier, we get POSIXt as the result. strptime() also takes a character vector to be converted and the format as arguments. There's more... The zoo package is handy for dealing with time series data. The zoo() function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by argument which has to be an index vector with unique entries by which the observations in x are ordered: library(zoo) d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%y")) data.class(d3) [1] "zoo" See the help on DateTimeClasses to find out more details about the ways dates can be represented in R. Plotting date and time on the X axis In this recipe, we will learn how to plot formatted date or time values on the X axis. Getting ready For the first example, we only need to use the base graphics function plot(). How to do it... We will use the dailysales.csv example dataset to plot the number of units of a product sold daily in a month: sales<-read.csv("dailysales.csv") plot(sales$units~as.Date(sales$date,"%d/%m/%y"),type="l", xlab="Date",ylab="Units Sold") How it works... Once we have formatted the series of dates using as.Date(), we can simply pass it to the plot() function as the x variable in either the plot(x,y) or plot(y~x) format. We can also use strptime() instead of using as.Date(). However, we cannot pass the object returned by strptime() to plot() in the plot(y~x) format. We must use the plot(x,y) format as follows: plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l", xlab="Date",ylab="Units Sold") There's more... We can plot the example using the zoo() function as follows (assuming zoo is already installed): library(zoo) plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%y"))) Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by zoo() to plot(). We also need not specify the type as "l". Let's look at another example which has full date and time values on the X axis, instead of just dates. We will use the openair.csv example dataset for this example: air<-read.csv("openair.csv") plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l", xlab="Time", ylab="Concentration (ppb)", main="Time trend of Oxides of Nitrogen") (Move the mouse over the image to enlarge it.) The same graph can be made using zoo as follows: plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")), xlab="Time", ylab="Concentration (ppb)", main="Time trend of Oxides of Nitrogen")
Read more
  • 0
  • 0
  • 7915
article-image-data-modeling-naming-standards-ibm-infosphere-data-architect
Packt
24 Dec 2009
4 min read
Save for later

Data Modeling Naming Standards with IBM InfoSphere Data Architect

Packt
24 Dec 2009
4 min read
The Prime-Class-Modifier Words Pattern Prime words represent key business entities. In an insurance business, examples of prime word are policy and coverage. A class word is a category that qualifies a prime word; for example, in policy code name, code is a class word. policy code can further be qualified by a modifier word; for instance, previous policy code where previous is the modifier word. You can define your own naming pattern different from the above modifier prime class pattern for a specific modeling object, including the separator between words and if modifier word or class word in the pattern is optional. You can have, for instance, modifier?_prime_modifer?_class_modifier? pattern for attribute naming in a logical data model. The ? characters indicate the words are optional and the separators are _. An example name with that pattern is permanent employee last name, assuming we have defined in our standard that permanent as a modifier word, employee as a prime word, last a modifier word, and name as a class word. Note that we don’t have the last optional modifier word in this example. In a different business (not insurance), code might well be a prime word and policy might not be a prime word; hence the need to define your own specific list of prime, class and modifier words and naming patterns for their application, and that is what you build in glossary model. Building Glossary Model The InfoSphere Data Architect (IDA) allows you to build a glossary model from blank or from pre-defined enterprise model. Creating glossary model and selecting its template, blank or pre-built enterprise template The enterprise glossary model gives you a head start with its collection of words relevant across various business types, most of which would probably be applicable to your business too. You can customize the glossary: change or delete the existing words, or add new ones. Selecting an existing word or words in the list and then clicking the cross icon will delete the selected words   Clicking the plus icon allows you to add a new word into the glossary   When you add a new word, in addition to the name, you specify its Abbreviation, Alternate name, and most importantly its type (CLASS, PRIME) and if it is a Modifier word. When the glossary is applied for transforming a logical to physical model, the abbreviation is applied to the physical modeling object. Customizing a word being added   Selecting the type of a word   Before we can apply the words to naming our data model objects, we need to define the naming pattern. You can define the naming pattern for logical and physical modeling objects. The sequence of the word types in the pattern from top to bottom is left to right when you apply them in the names. You can also choose the separator for your naming pattern: space or title case for the logical model, and any character for the physical model (most preferred choice would be non alpha numeric character that is not used in any of the words in the glossary). Defining pattern for logical model objects (entity and attribute)   Defining pattern for physical model objects (table and column)   Specifying separator for logical model   Specifying separator for physical model You then choose the glossary model that you want to apply to your data models. Glossary Model.ndm in the packtpub directory is applied   When you have finished building your glossary model and defining naming pattern, you can then apply them for naming your modeling objects. (You can further adjust the words in the glossary them when such a need arises)
Read more
  • 0
  • 0
  • 7912

article-image-python-data-analysis-utilities
Packt
17 Feb 2016
13 min read
Save for later

Python Data Analysis Utilities

Packt
17 Feb 2016
13 min read
After the success of the book Python Data Analysis, Packt's acquisition editor Prachi Bisht gauged the interest of the author, Ivan Idris, in publishing Python Data Analysis Cookbook. According to Ivan, Python Data Analysis is one of his best books. Python Data Analysis Cookbook is meant for a bit more experienced Pythonistas and is written in the cookbook format. In the year after the release of Python Data Analysis, Ivan has received a lot of feedback—mostly positive, as far as he is concerned. Although Python Data Analysis covers a wide range of topics, Ivan still managed to leave out a lot of subjects. He realized that he needed a library as a toolbox. Named dautil for data analysis utilities, the API was distributed by him via PyPi so that it is installable via pip/easy_install. As you know, Python 2 will no longer be supported after 2020, so dautil is based on Python 3. For the sake of reproducibility, Ivan also published a Docker repository named pydacbk (for Python Data Analysis Cookbook). The repository represents a virtual image with preinstalled software. For practical reasons, the image doesn't contain all the software, but it still contains a fair percentage. This article has the following sections: Data analysis, data science, big data – what is the big deal? A brief history of data analysis with Python A high-level overview of dautil IPython notebook utilities Downloading data Plotting utilities Demystifying Docker Future directions (For more resources related to this topic, see here.) Data analysis, data science, big data – what is the big deal? You've probably seen Venn diagrams depicting data science as the intersection of mathematics/statistics, computer science, and domain expertise. Data analysis is timeless and was there before data science and computer science. You could perform data analysis with a pen and paper and, in more modern times, with a pocket calculator. Data analysis has many aspects with goals such as making decisions or coming up with new hypotheses and questions. The hype, status, and financial rewards surrounding data science and big data remind me of the time when data warehousing and business intelligence were the buzzwords. The ultimate goal of business intelligence and data warehousing was to build dashboards for management. This involved a lot of politics and organizational aspects, but on the technical side, it was mostly about databases. Data science, on the other hand, is not database-centric, and leans heavily on machine learning. Machine learning techniques have become necessary because of the bigger volumes of data. Data growth is caused by the growth of the world's population and the rise of new technologies such as social media and mobile devices. Data growth is in fact probably the only trend that we can be sure will continue. The difference between constructing dashboards and applying machine learning is analogous to the way search engines evolved. Search engines (if you can call them that) were initially nothing more than well-organized collections of links created manually. Eventually, the automated approach won. Since more data will be created in time (and not destroyed), we can expect an increase in automated data analysis. A brief history of data analysis with Python The history of the various Python software libraries is quite interesting. I am not a historian, so the following notes are written from my own perspective: 1989: Guido van Rossum implements the very first version of Python at the CWI in the Netherlands as a Christmas hobby project. 1995: Jim Hugunin creates Numeric, the predecessor to NumPy. 1999: Pearu Peterson writes f2py as a bridge between Fortran and Python. 2000: Python 2.0 is released. 2001: The SciPy library is released. Also, Numarray, a competing library of Numeric, is created. Fernando Perez releases IPython, which starts out as an afternoon hack. NLTK is released as a research project. 2002: John Hunter creates the matplotlib library. 2005: NumPy is released by Travis Oliphant. Initially, NumPy is Numeric extended with features inspired by Numarray. 2006: NumPy 1.0 is released. The first version of SQLAlchemy is released. 2007: The scikit-learn project is initiated as a Google Summer of Code project by David Cournapeau. Cython is forked from Pyrex. Cython is later intensively used in pandas and scikit-learn to improve performance. 2008: Wes McKinney starts working on pandas. Python 3.0 is released. 2011: The IPython 0.12 release introduces the IPython notebook. Packt releases NumPy 1.5 Beginner's Guide. 2012: Packt releases NumPy Cookbook. 2013: Packt releases NumPy Beginner's Guide - Second Edition. 2014: Fernando Perez announces Project Jupyter, which aims to make a language-agnostic notebook. Packt releases Learning NumPy Array and Python Data Analysis. 2015: Packt releases NumPy Beginner's Guide - Third Edition and NumPy Cookbook - Second Edition. A high-level overview of dautil The dautil API that Ivan made for this book is a humble toolbox, which he found useful. It is released under the MIT license. This license is very permissive, so you could in theory use the library in a production system. He doesn't recommend doing this currently (as of January, 2016), but he believes that the unit tests and documentation are of acceptable quality. The library has 3000+ lines of code and 180+ unit tests with a reasonable coverage. He has fixed as many issues reported by pep8 and flake8 as possible. Some of the functions in dautil are on the short side and are of very low complexity. This is on purpose. If there is a second edition (knock on wood), dautil will probably be completely transformed. The API evolved as Ivan wrote the book under high time pressure, so some of the decisions he made may not be optimal in retrospect. However, he hopes that people find dautil useful and, ideally, contribute to it. The dautil modules are summarized in the following table: Module Description LOC dautil.collect Contains utilities related to collections 331 dautil.conf Contains configuration utilities 48 dautil.data Contains utilities to download and load data 468 dautil.db Contains database-related utilities 98 dautil.log_api Contains logging utilities 204 dautil.nb Contains IPython/Jupyter notebook widgets and utilities 609 dautil.options Configures dynamic options of several libraries related to data analysis 71 dautil.perf Contains performance-related utilities 162 dautil.plotting Contains plotting utilities 382 dautil.report Contains reporting utilities 232 dautil.stats Contains statistical functions and utilities 366 dautil.ts Contains Utilities for time series and dates 217 dautil.web Contains utilities for web mining and HTML processing 47 IPython notebook utilities The IPython notebook has become a standard tool for data analysis. The dautil.nb has several interactive IPython widgets to help with Latex rendering, the setting of matplotlib properties, and plotting. Ivan has defined a Context class, which represents the configuration settings of the widgets. The settings are stored in a pretty-printed JSON file in the current working directory, which is named dautil.json. This could be extended, maybe even with a database backend. The following is an edited excerpt (so that it doesn't take up a lot of space) of an example dautil.json: { ... "calculating_moments": { "figure.figsize": [ 10.4, 7.7 ], "font.size": 11.2 }, "calculating_moments.latex": [ 1, 2, 3, 4, 5, 6, 7 ], "launching_futures": { "figure.figsize": [ 11.5, 8.5 ] }, "launching_futures.labels": [ [ {}, { "legend": "loc=best", "title": "Distribution of Means" } ], [ { "legend": "loc=best", "title": "Distribution of Standard Deviation" }, { "legend": "loc=best", "title": "Distribution of Skewness" } ] ], ... }  The Context object can be constructed with a string—Ivan recommends using the name of the notebook, but any unique identifier will do. The dautil.nb.LatexRenderer also uses the Context class. It is a utility class, which helps you number and render Latex equations in an IPython/Jupyter notebook, for instance, as follows: import dautil as dl lr = dl.nb.LatexRenderer(chapter=12, context=context) lr.render(r'delta! = x - m') lr.render(r'm' = m + frac{delta}{n}') lr.render(r'M_2' = M_2 + delta^2 frac{ n-1}{n}') lr.render(r'M_3' = M_3 + delta^3 frac{ (n - 1) (n - 2)}{n^2}/ - frac{3delta M_2}{n}') lr.render(r'M_4' = M_4 + frac{delta^4 (n - 1) / (n^2 - 3n + 3)}{n^3} + frac{6delta^2 M_2}/ {n^2} - frac{4delta M_3}{n}') lr.render(r'g_1 = frac{sqrt{n} M_3}{M_2^{3/2}}') lr.render(r'g_2 = frac{n M_4}{M_2^2}-3.') The following is the result:   Another widget you may find useful is RcWidget, which sets matplotlib settings, as shown in the following screenshot: Downloading data Sometimes, we require sample data to test an algorithm or prototype a visualization. In the dautil.data module, you will find many utilities for data retrieval. Throughout this book, Ivan has used weather data from the KNMI for the weather station in De Bilt. A couple of the utilities in the module add a caching layer on top of existing pandas functions, such as the ones that download data from the World Bank and Yahoo! Finance (the caching depends on the joblib library and is currently not very configurable). You can also get audio, demographics, Facebook, and marketing data. The data is stored under a special data directory, which depends on the operating system. On the machine used in the book, it is stored under ~/Library/Application Support/dautil. The following example code loads data from the SPAN Facebook dataset and computes the clique number: import networkx as nx import dautil as dl fb_file = dl.data.SPANFB().load() G = nx.read_edgelist(fb_file, create_using=nx.Graph(), nodetype=int) print('Graph Clique Number', nx.graph_clique_number(G.subgraph(list(range(2048)))))  To understand what is going on in detail, you will need to read the book. In a nutshell, we load the data and use the NetworkX API to calculate a network metric. Plotting utilities Ivan visualizes data very often in the book. Plotting helps us get an idea about how the data is structured and helps you form hypotheses or research questions. Often, we want to chart multiple variables, but we want to easily see what is what. The standard solution in matplotlib is to cycle colors. However, Ivan prefers to cycle line widths and line styles as well. The following unit test demonstrates his solution to this issue: def test_cycle_plotter_plot(self): m_ax = Mock() cp = plotting.CyclePlotter(m_ax) cp.plot([0], [0]) m_ax.plot.assert_called_with([0], [0], '-', lw=1) cp.plot([0], [1]) m_ax.plot.assert_called_with([0], [1], '--', lw=2) cp.plot([1], [0]) m_ax.plot.assert_called_with([1], [0], '-.', lw=1) The dautil.plotting module currently also has a helper tool for subplots, histograms, regression plots, and dealing with color maps. The following example code (the code for the labels has been omitted) demonstrates a bar chart utility function and a utility function from dautil.data, which downloads stock price data: import dautil as dl import numpy as np import matplotlib.pyplot as plt ratios = [] STOCKS = ['AAPL', 'INTC', 'MSFT', 'KO', 'DIS', 'MCD', 'NKE', 'IBM'] for symbol in STOCKS: ohlc = dl.data.OHLC() P = ohlc.get(symbol)['Adj Close'].values N = len(P) mu = (np.log(P[-1]) - np.log(P[0]))/N var_a = 0 var_b = 0 for k in range(1, N): var_a = (np.log(P[k]) - np.log(P[k - 1]) - mu) ** 2 var_a = var_a / N for k in range(1, N//2): var_b = (np.log(P[2 * k]) - np.log(P[2 * k - 2]) - 2 * mu) ** 2 var_b = var_b / N ratios.append(var_b/var_a - 1) _, ax = plt.subplots() dl.plotting.bar(ax, STOCKS, ratios) plt.show() Refer to the following screenshot for the end result: The code performs a random walk test and calculates the corresponding ratio for a list of stock prices. The data is retrieved whenever you run the code, so you may get different results. Some of you have a finance aversion, but rest assured that this book has very little finance-related content. The following script demonstrates a linear regression utility and caching downloader for World Bank data (the code for the watermark and plot labels has been omitted): import dautil as dl import matplotlib.pyplot as plt import numpy as np wb = dl.data.Worldbank() countries = wb.get_countries()[['name', 'iso2c']] inf_mort = wb.get_name('inf_mort') gdp_pcap = wb.get_name('gdp_pcap') df = wb.download(country=countries['iso2c'], indicator=[inf_mort, gdp_pcap], start=2010, end=2010).dropna() loglog = df.applymap(np.log10) x = loglog[gdp_pcap] y = loglog[inf_mort] dl.options.mimic_seaborn() fig, [ax, ax2] = plt.subplots(2, 1) ax.set_ylim([0, 200]) ax.scatter(df[gdp_pcap], df[inf_mort]) ax2.scatter(x, y) dl.plotting.plot_polyfit(ax2, x, y) plt.show()  The following image should be displayed by the code: The program downloads World Bank data for 2010 and plots the infant mortality rate against the GDP per capita. Also shown is a linear fit of the log-transformed data. Demystifying Docker Docker uses Linux kernel features to provide an extra virtualization layer. It was created in 2013 by Solomon Hykes. Boot2Docker allows us to install Docker on Windows and Mac OS X as well. Boot2Docker uses a VirtualBox VM that contains a Linux environment with Docker. Ivan's Docker image, which is mentioned in the introduction, is based on the continuumio/miniconda3 Docker image. The Docker installation docs are at https://docs.docker.com/index.html. Once you install Boot2Docker, you need to initialize it. This is only necessary once, and Linux users don't need this step: $ boot2docker init The next step for Mac OS X and Windows users is to start the VM: $ boot2docker start Check the Docker environment by starting a sample container: $ docker run hello-world Docker images are organized in a repository, which resembles GitHub. A producer pushes images and a consumer pulls images. You can pull Ivan's repository with the following command. The size is currently 387 MB. $ docker pull ivanidris/pydacbk Future directions The dautil API consists of items Ivan thinks will be useful outside of the context of this book. Certain functions and classes that he felt were only suitable for a particular chapter are placed in separate per-chapter modules, such as ch12util.py. In retrospect, parts of those modules may need to be included in dautil as well. In no particular order, Ivan has the following ideas for future dautil development: He is playing with the idea of creating a parallel library with "Cythonized" code, but this depends on how dautil is received Adding more data loaders as required There is a whole range of streaming (or online) algorithms that he thinks should be included in dautil as well The GUI of the notebook widgets should be improved and extended The API should have more configuration options and be easier to configure Summary In this article, Ivan roughly sketched what data analysis, data science, and big data are about. This was followed by a brief of history of data analysis with Python. Then, he started explaining dautil—the API he made to help him with this book. He gave a high-level overview and some examples of the IPython notebook utilities, features to download data, and plotting utilities. He used Docker for testing and giving readers a reproducible data analysis environment, so he spent some time on that topic too. Finally, he mentioned the possible future directions that could be taken for the library in order to guide anyone who wants to contribute. Resources for Article:   Further resources on this subject: Recommending Movies at Scale (Python) [article] Python Data Science Up and Running [article] Making Your Data Everything It Can Be [article]
Read more
  • 0
  • 0
  • 7911
Modal Close icon
Modal Close icon