Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-programmatically-creating-ssrs-report-microsoft-sql-server-2008
Packt
09 Oct 2009
4 min read
Save for later

Programmatically Creating SSRS Report in Microsoft SQL Server 2008

Packt
09 Oct 2009
4 min read
Introduction In order to design the MS SQL Server Reporting Services report programmatically you need to understand what goes into a report. We will start with a simple report shown in the next figure: The above tabular report gets its data from the SQL Server database TestNorthwind using the query shown below: Select EmployeeID, LastName, FirstName, City, Country from Employees. A report is based completely on a Report Definition file, a file in XML format. The file consists of information about the data connection, the datasource in which a dataset is defined, and the layout information together with the data bindings to the report. In the following, we will be referring to the Report Server file called RDLGenSimple.rdl. This is a file written in Report Definition Language in XML Syntax. The next figure shows this file opened as an XML file with the significant nodes collapsed. Note the namespace references. The significant items are the following: The XML Processing instructions The root element of the report collapsed and contained in the root element are: The DataSources Datasets Contained in the body are the ReportItems This is followed by the Page containing the PageHeader and PageFooter items In order to generate a RDL file of the above type the XMLTextWriter class will be used in Visual Studio 2008. In some of the hands-on you have seen how to connect to the SQL Server programmatically as well as how to retrieve data using the ADO.NET objects. This is precisely what you will be doing in this hands-on exercise. The XMLTextWriter Class In order to review the properties of the XMLTextWriter you need to add a reference to the project (or web site) indicating this item. This is carried out by right-clicking the Project (or Website) | Add Reference… and then choosing SYSTEM.XML (http://msdn.microsoft.com/en-us/library/system.xml.aspx) in the Add Reference window. After adding the reference, the ObjectBrowser can be used to look at the details of this class as shown in the next figure. You can access this from View | Object Browser, or by clicking the F2 key with your VS 2008 IDE open. A formal description of this can be found at the bottom of the next figure. The XMLTextWriter takes care of all the elements found in the XML DOM model (see for example, http://www.devarticles.com/c/a/XML/Roaming-through-XMLDOM-An-AJAX-Prerequisite). Hands-on exercise: Generating a Report Definition Language file using Visual Studio 2008 In this hands-on, you will be generating a server report that will display the report shown in the first figure. The coding you will be using is adopted from this article (http://technet.microsoft.com/en-us/library/ms167274.aspx) available  at Microsoft TechNet (http://technet.microsoft.com/en-us/sqlserver/default.aspx). Follow on In this section, you will create a project and add a reference. You add code to the page that is executed by the button click events. The code is scripted and is not generated by any tool. Create project and add reference You will create a Visual Studio 2008 Windows Forms Application and add controls to create a simple user interface for testing the code. Create a Windows Forms Application project in Visual Studio 2008 from File | New | Project… by providing a name. Herein, it is called RDLGen2. Drag-and-drop two labels, three buttons and two text boxes onto the form  as shown: When the Test Connection button Button1 in the code is clicked, a connection to the TestNorthwind database will be made. When the button is clicked, the code in the procedure Connection () is executed. If there are any errors, they will show up in the label at the bottom. When the Get list of Fields button Button2 in the code is clicked, the Query will be run against the database and the retrieved field list will be shown in the adjoining textbox. The Generate a RDL file button Button 3 in the code, creates a report file at the location indicated in the code.
Read more
  • 0
  • 0
  • 9056

article-image-prompt-engineering-principles
Martin Yanev
04 Jun 2023
5 min read
Save for later

Prompt Engineering Principles

Martin Yanev
04 Jun 2023
5 min read
Prompt Engineering and design play a very vital role in controlling the output of the model. Here are some best practices you can use to improve your prompts, as well as some practices you should avoid:Clarity: Use simple sentences and instructions that can easily be understood by ChatGPT. Conciseness: Favor short prompts and short sentences. This can be achieved by chunking your instructions into smaller sentences with clear intentions.Focus: Keep the focus of the prompt on a well-defined topic so that you don’t risk your output being too generic.Consistency: Maintain a consistent tone and language during the conversation so that you can ensure a coherent conversation.“Acting as…”: The hack of letting ChatGPT act as someone or something has proven to be extremely powerful. You can shorten the context you have to provide to the model by simply asking him to act like the person or system you want information from. We’ve already seen the interview-candidate example, where ChatGPT acted as an interviewer for a data scientist position. A very interesting prompt is that of asking ChatGPT to act as a console. Here is an example of it: Figure 1 – Example of ChatGPT acting as a Python console Note that the console, as it would be if it were real, is also reporting the error I made for the cycle, indicating that I was missing the brackets.There is a continuously growing list of Act as prompts you can try in the following GitHub repository: https://github.com/f/awesome-chatgpt-prompts.Considering the few-shot learning capabilities, there are some good tips for leveraging this feature in prompt designing. An ideal conversation is as follows: On the other hand, there are some things you should avoid while designing your prompt:Start with a concise, clear, and focused prompt. This will help you have an overview of the topic you want to discuss, as well as provide food for thought and potential expansion of particular elements. Here’s an exampleFigure 2 – Example of a clear and focused prompt to initiate a conversation with ChatGPTOnce you have identified the relevant elements in the discussion, you can ask ChatGPT to elaborate on them with much more focusFigure 3 – Example of a deep-dive follow-up question in a ChatGPT Sometimes, it might be useful to remember the model and the context in which you are inquiring, especially if the question might apply to various domainsFigure 4 – Example of a reminder about the context in a conversation with ChatGPTFinally, always in mind the limitations we mentioned in previous chapters. ChatGPT may provide partial or incorrect information, so it is always a good practice to double-check. One nice tip you could try is asking the model to provide documentation about its responses so that you can easily find proof of themFigure 5 – Example of ChatGPT providing documentation supporting its previous responses On the other hand, there are some things you should avoid while designing your prompt: Information overload: Avoid providing too much information to ChatGPT, since it could reduce the accuracy of the response.Open-ended questions: Avoid asking ChatGPT vague, open-ended questions. Prompts such as What can you tell me about the world? or Can you help me with my exam? are far too generic and will result in ChatGPT generating vague, useless, and sometimes hallucinated responses.Lack of constraints: If you are expecting an output with a specific structure, don’t forget to specify that to ChatGPT! If you think about the earlier example of ChatGPT acting as an interviewer, you can see how strict I was in specifying not to generate questions all at once. It took several tries before getting to the result since ChatGPT is thought to generate a continuous flow of text.Furthermore, as a general consideration, we still must remember that the knowledge base of ChatGPT is limited to 2021, so we should avoid asking questions about facts that occurred after that date. You can still provide context; however, all the responses will be biased toward the knowledge base before 2021. SummaryIn this article, we get to learn some strong principles that can help you learn how to prompt effectively.  We cover the importance of a good prompt, and all the important Do’s and Don'ts while designing a good prompt with a practical example. About the Author Valentina Alto graduated in 2021 in Data Science. Since 2020 she has been working in Microsoft as Azure Solution Specialist and, since 2022, she focused on Data&AI workloads within the Manufacturing and Pharmaceutical industry. She has been working on customers’ projects closely with system integrators to deploy cloud architecture with a focus on datalake house and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure cognitive services (including Azure OpenAI Service), and PowerBI for dashboarding. She holds a BSc in Finance and an MSc degree in Data Science from Bocconi University, Milan, Italy. Since her academic journey, she has been writing Tech articles about Statistics, Machine Learning, Deep Learning, and AI in various publications. She has also written a book about the fundamentals of Machine Learning with Python.LinkedInMedium
Read more
  • 0
  • 0
  • 9054

article-image-using-ai-functions-to-solve-complex-problems
Shaun McDonogh
09 Jun 2023
8 min read
Save for later

Using AI Functions to Solve Complex Problems

Shaun McDonogh
09 Jun 2023
8 min read
GPT-4 feels like Einstein's theory of general relativity, in that it will take programmers a long time to uncover all the new possibilities which is analogous to how physicists today are still extracting useful insights from Einstein's theory of general relativity.Here is one example. Big problems can be solved by breaking them down into small tasks. Our jobs usually boil down into performing one problem-solving or productive activity after the next. So, what if you could discern clearly what those smaller tasks were and give them to your side-kick brain to figure out and run for you ‘auto-magically’?Note that the above says, what if you could discern. Already we bump up against a lesson in using LLMs. Why do you need to discern it? Can’t GPT-4 be given the job of discerning the tasks required? If so, could it then orchestrate other GPT-4 “agents” to break down the task further? This is exactly where we can make use of a tool like Auto-GPT.Let’s look at an example that I am developing right now for an Auto-GPT using Rust.Say you build websites for a living. Websites need a web server. Web servers take a request (usually through an API), run some code in line with that request and send back a response. Let’s break down the goal of building a web server into what actors or agents might be required when building an Auto-GPT:Solution Architect: Work with the customer (you) to develop scope and aim of the application. Confirm the recommended technology stack to work within.Project (Agent) Manager: Compile and track preliminary list of tasks needed to complete the project.Data Manager: Identify required External and Internal API endpoints.Code Admin: Write code spec sheet for developer.Junior Developer:Take a first stab at code.Run unit testing.Senior Developer:Sense check initial code and adjust.Run unit testing.Quality Control: Test and send back code for adjustments.Customer (you): Supply feedback and request adjustments.You may have guessed that we will not be hiring 7 members of staff to build web servers. However, planning out this structure is necessary to help us organise building a bot that can itself build, test, and deploy any backend server you ask it to.Our agents will need tools to work with. They need to be given a set of instructions and functions that will get the subtask done without the usual response of “I am just an AI language model”. When developing such a tool, it became clear to me that a new paradigm has hit developers, “AI Functions”. What we are about to discuss brings up potential gaps in the guard-rails of GPT-4 which is unlikely to be news to the OpenAI team but may be of some surprise to you.What are AI Functions?The fact that AI hallucinates would annoy most people. But not to programmers and the folk who developed Marvin. Right now, when you ask ChatGPT to provide you with a response, it sends a whole bunch of jargon with it.What if I told you there was a hack, a way to get distilled and relevant information out of ChatGPT automatically through the OpenAI API that leads to a structured response that can drive actions in your code. Here is an example. Suppose you give an AI the following “function” which is just text disguised as a programming function:struct Sentiment {       happy_rating: 0, sad_rating: 0 } fn convert_comments_into_sentiment(comments: String) -> Sentiment {       // Function takes in comments from YouTube video       // Considers all aspects of what is said in comments       // Returns Sentiment struct with happy and sad rating       // Does not return anything else. No commentary.       return sentiment; }You then ask the AI to print only what the function would return and pass it the YouTube comments as an input. The AI (GPT-4) would then hallucinate a response, even without there being any actual code within the function. None. There is no code to perform this task.No jargon like “I am an AI I cannot blah blah blah” is returned. Nothing like that, you would extract the AI’s pure sentiment analysis in a format which you can then build into your application.The hack here is that we have built a pretend function, without any code in it. We described exactly what this function (written in Rust syntax in this example) does and the structure of its output using comments and a struct.GPT-4 then interprets what it thinks the function does and assesses it against the input provided (YouTube comments in this example). The output is then structured in the LLM’s response. Therefore, we can have LLMs supply responses that are structured in such a way that can be used by other agents (also the GPT-4 LLM).The entire process would look something like this: This is an oversimplified view of what is happening. Note that agents are not always needed. Sometimes the function can just be called as regular code, no LLM needed.The Bigger PictureWhy stop there? Auto-GPTs will likely then become like plugins. With Auto-GPTs popping up everywhere, most software related tasks can be taken care of. Rather than calling an API, your Auto-GPT could call another Auto-GPT. Rather than write AI Functions, a primary agent would know what Auto-GPTs should be assigned to various tasks. It could then write its own program that connects to them.This sort of recursion or self-improvement is where predicted growth in AI is already shown to function as predicted either exponentially or in sigmoid-like jumps. My own version of this self-programming task is already done. It was surprisingly trivial to do with GPT-4’s help. It works well, but it doesn’t replace human developers yet.Not All Sunshine and RosesGPT-4 is not without some annoying limitations; these include:Quality of performance at the given taskChanges in quality response depending on a slight change in input.Text formatting to occasionally include something the model should not.Size and dollar cost of outputTime, latency, delay and crashing of API output.These limitations could technically be worked around by breaking down tasks into even smaller chunks, but this is where I draw a line. What is the point of trying to automate anything, if you end up having to code everything? However, it has become clear through this exercise, that just a few improvements on the above limitations will lead to a significant amount of task automation with AutoGPTs.Quick Wins with Self-testingOne unique algorithm which has worked very well with the current technology is the ability for GPT-4 to write code and for our program to then execute and test the code and provide feedback to GPT-4 of what is not working. The AI Function then prompts GPT-4 as a “Senior Developer Agent” to re-write the code and return to the “Quality Agent” for more unit testing.What you end up with is a program that can write code, test it, and confirm all is working. This is working now and is useful. The downside is still the cost of making too many calls to the OpenAI API and some quality limitations with GPT-4. Therefore, until GPT-5, I remain working directly in my development environment or in the ChatGPT interface extracting quick code snippets.Next StepsIt sounds clichéd but I’m learning something new every day working with LLMs. A recurring realisation is that we should not only be trying to solve newer problems in less time, but rather, learning what the right questions to ask our current LLMs even are. They are capable of more than I believe we realise. When Open Source LLMs reach a GPT-5 like level, Auto-GPTs I suspect will be everywhere.Author BioShaun McDonogh is the lead developer and researcher at Crypto Wizards. He has 6 years’ experience in developing high performance trading software solutions with ML and 15 years’ experience working with frontend and backend web technologies.In his prior career, Shaun worked as a Finance Director, Data Scientist and Analytics Lead in various multi-million-dollar entities before making the transition to developing pioneering solutions for developers in emerging tech industries. His favorite programming languages to teach include Rust, Typescript and Python.Shaun’s mission is to support developers in navigating the ever-changing landscape of emerging technologies in computer software engineering through a new coding arm, Code Raiders.
Read more
  • 0
  • 0
  • 9033

article-image-copying-database-sql-server-2005-sql-server-2008-using-copy-database-wizard
Packt
24 Oct 2009
3 min read
Save for later

Copying a Database from SQL Server 2005 to SQL Server 2008 using the Copy Database Wizard

Packt
24 Oct 2009
3 min read
(For more resources on Microsoft, see here.) Using the Copy Database Wizard you will be creating an SQL Server Integration Services package which will be executed by an SQL Server Agent job. It is therefore necessary to set up the SQL Server Agent to work with a proxy that you need to create which can execute the package. Since the proxy needs a credential to workout outside the SQL 2008 boundary you need to create a Credential and a Principal who has the permissions. Creating a credential has been described elsewhere. The main steps in migration using this route are: Create an Credential Create an SQL Server Agent Proxy to work with SSIS Package execution Create the job using the Copy Database Wizard Creating the Proxy In the SQL Server 2008 Management Studio expand the SQL Server Agent node and then expand the Proxies node. You can create proxies for various actions that you may undertake. In the present case the Copy Database wizard creates an Integration Services package and therefore a proxy is needed for this. Right click the SSIS Package Execution folder as shown in the next figure. Click on New Proxy.... This opens the New Proxy Account window as shown. Here Proxy name is the one you provide which will be needed in the Copy Database Wizard. Credential name is the one you created earlier which uses a database login name and password. Description is an optional info to keep track of the proxy. As seen in the previous figure you can create different proxies to deal with different activities. In the present case a proxy will be created for Integration Service Package execution as shown in the next figure. The name CopyPubx has been created as shown. Now click on the ellipsis button along the Credential name and this brings up the Select Credential window as shown. Now click on the Browse... button. This brings up the Browse for Objects window displaying the credential you created earlier. Place a checkmark as shown and click on the OK button. The [mysorian] credential is entered into the Select Credential window. Click on the OK button on the Select Credential window. The credential name gets entered into the New Proxy Account's Credential name. The optional description can be anything suitable as shown. Place a checkmark on the SQL Server Integration Services Package as shown and click on Principals. Since the present proxy is going to be used by the sysadmin, there is no need to add it specifically. Click on the OK button to close this New Proxy Account window. You can now expand the SSIS Package Execution node of the Proxies and verify that CopyPubx has been added. There are two other proxies created in the same way in this folder. Since the SQL Server Agent is needed for this process to succeed, make sure the SQL Server Agent is running. If it has not started yet, you can start this service from the Control Panel.  
Read more
  • 0
  • 0
  • 9030

article-image-use-macros-ibm-cognos-8-report-studio
Packt
25 May 2010
13 min read
Save for later

Use of macros in IBM Cognos 8 Report Studio

Packt
25 May 2010
13 min read
Cognos Report Studio is widely used for creating and managing business reports in medium to large companies. It is simple enough for any business analyst, power user, or developer to pick up and start developing basic reports. However, when it comes to developing more sophisticated, fully functional business reports for wider audiences, report authors will need guidance. In this article, by Abhishek Sanghani, author of IBM Cognos 8 Report Studio Cookbook, we will show you  that even though macros are often considered a Framework Modeler's tool, they can be used within Report Studio as well. These recipes will show you some very useful macros around security, string manipulation, and prompting. (Read more interesting articles on Compiere here.) Introduction This article will introduce you to an interesting and useful tool of Cognos BI, called 'macros'. They can be used in Framework Manager as well as Report Studio. The Cognos engine understands the presence of a macro as it is written within a pair of hashes (#). It executes the macros first and puts the result back into report specification like a literal string replacement. We can use this to alter data items, filters, and slicers at run time. You won't find the macro functions and their details within Report Studio environment (which is strange, as it fully supports them). Anyways, you can always open Framework Manager and check different macro functions and their syntaxes from there. Also, there is documentation available in Cognos' help and online materials. Working with Dimensional Model (in the"Swapping dimension" recipe). In this article, I will show you more examples and introduce you to more functions which you can later build upon to achieve sophisticated functionalities. We will be writing some SQL straight against the GO Data Warehouse data source. Also, we will use the "GO Data Warehouse (Query)" package for some recipes. Add data level security using CSVIdentityMap macro A report shows the employee names by region and country. We need to implement data security in this report such that a user can see the records only for the country he belongs to. There are already User Groups defined on the Cognos server (in the directory) and users are made members of appropriate groups. For this sample, I have added my user account to a user group called 'Spain'. Getting ready Open a new list report with GO Data Warehouse (Query) as the package. How to do it... Drag the appropriate columns (Region, Country, and Employee name) on to the report from Employee by Region query subject. Go to Query Explorer and drag a new detail filter. Define the filter as: [Country] in (#CSVIdentityNameList(',')#) Run the report to test it. You will notice that a user can see only the rows of the country/countries of which he is a member. How it works... Here we are using a macro function called CSVIdentityNameList. This function returns a list of groups and roles that the user belongs to, along with the user's account name. Hence, when I run the report, one of the values returned will be 'Spain' and I will see data for Spain. The function accepts a string parameter which is used as a separator in the result. Here we are passing a comma (,) as the separator. If a user belongs to multiple country groups, he will see data for all the countries listed in the result of a macro. There's more... This solution, conspicuously, has its limitations. None of the user accounts or roles should be same as a country name, because that will wrongly show data for a country the user doesnot belong to. For example, for a user called 'Paris', it will show data for the 'Paris' region. So, there need to be certain restrictions. However, you can build upon the knowledge of this macro function and use it in many practical business scenarios. Using prompt macro in native SQL In this recipe, we will write an SQL statement straight to be fired on the data source. We will use the Prompt macro to dynamically change the filter condition. We will write a report that shows list of employee by Region and Country. We will use the Prompt macro to ask the users to enter a country name. Then the SQL statement will search for the employee belonging to that country. Getting ready Create a new blank list report against 'GO Data Warehouse (Query)' package. How to do it... Go to the Query Explorer and drag an SQL object on the Query Subject that is linked to the list (Query1 in usual case). Select the SQL object and ensure that great_outdoor_warehouse is selected as the data source. Open the SQL property and add the following statement: select distinct "Branch_region_dimension"."REGION_EN" "Region" ,"Branch_region_dimension"."COUNTRY_EN" "Country" , "EMP_EMPLOYEE_DIM"."EMPLOYEE_NAME" "Employee_name"from "GOSALESDW"."GO_REGION_DIM" "Branch_region_dimension","GOSALESDW"."EMP_EMPLOYEE_DIM" "EMP_EMPLOYEE_DIM","GOSALESDW"."GO_BRANCH_DIM" "GO_BRANCH_DIM"where ("Branch_region_dimension"."COUNTRY_EN" in(#prompt('Region')#))and "Branch_region_dimension"."COUNTRY_CODE" = "GO_BRANCH_DIM"."COUNTRY_CODE" and "EMP_EMPLOYEE_DIM"."BRANCH_CODE" = "GO_BRANCH_DIM"."BRANCH_CODE" Hit the OK button. This will validate the query and will close the dialog box. You will see that three data items (Region, Country, and Employee_Name) are added to Query1. Now go to the report page. Drag these data items on the list and run the report to test it. How it works... Here we are using the macro in native SQL statement. Native SQL allows us to directly fire a query on the data source and use the result on the report. This is useful in certain scenarios where we don't need to define any Framework Model. If you examine the SQL statement, you will notice that it is a very simple one that joins three tables and returns appropriate columns. We have added a filter condition on country name which is supposed to dynamically change depending on the value entered by user. The macro function that we have used here is Prompt(). As the name suggests, it is used to generate a prompt and returns the parameter value back to be used in an SQL statement. Prompt() function takes five arguments. The first argument is the parameter name and it is mandatory. It allows us to link a prompt page object (value prompt, date prompt, and so on) to the prompt function. The rest of the four arguments are optional and we are not using them here. You will read about them in the next recipe. Please note that we also have an option of adding a detail filter in the query subject instead of using PROMPT() macro within query. However, sometimes you would want to filter a table before joining it with other tables. In that case, using PROMPT() macro within the query helps. There's more... Similar to the Prompt() function, there is a i macro function. This works in exactly the same way and allows users to enter multiple values for the parameter. Those values are returned as a comma-separated list. Making prompt optional The previous recipe showed you how to generate a prompt through a macro. In this recipe, we will see how to make it optional using other arguments of the function. We will generate two simple list reports, both based on a native SQL. These lists will show product details for selected product line. However, the product line prompt will be made optional using two different approaches. Getting ready Create a report with two simple list objects based on native SQL. For that, create the Query Subjects in the same way as we did in the previous recipe. Use the following query in the SQL objects: select distinct "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" "Product_line" , "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME" "Product_name" , "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_EN" "Product_color" , "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_EN" "Product_size"from "GOSALESDW"."SLS_PRODUCT_DIM" "SLS_PRODUCT_DIM","GOSALESDW"."SLS_PRODUCT_LINE_LOOKUP" "SLS_PRODUCT_LINE_LOOKUP","GOSALESDW"."SLS_PRODUCT_TYPE_LOOKUP" "SLS_PRODUCT_TYPE_LOOKUP", "GOSALESDW"."SLS_PRODUCT_LOOKUP" "SLS_PRODUCT_LOOKUP","GOSALESDW"."SLS_PRODUCT_COLOR_LOOKUP" "SLS_PRODUCT_COLOR_LOOKUP","GOSALESDW"."SLS_PRODUCT_SIZE_LOOKUP" "SLS_PRODUCT_SIZE_LOOKUP","GOSALESDW"."SLS_PRODUCT_BRAND_LOOKUP" "SLS_PRODUCT_BRAND_LOOKUP"where "SLS_PRODUCT_LOOKUP"."PRODUCT_LANGUAGE" = N'EN' and "SLS_PRODUCT_DIM"."PRODUCT_LINE_CODE" = "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_NUMBER" = "SLS_PRODUCT_LOOKUP"."PRODUCT_NUMBER" and "SLS_PRODUCT_DIM"."PRODUCT_SIZE_CODE"= "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_TYPE_CODE" = "SLS_PRODUCT_TYPE_LOOKUP"."PRODUCT_TYPE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_COLOR_CODE" = "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_CODE" and "SLS_PRODUCT_BRAND_LOOKUP"."PRODUCT_BRAND_CODE" = "SLS_PRODUCT_DIM"."PRODUCT_BRAND_CODE" This is a simple query that joins product related tables and retrieves required columns. How to do it... We have created two list reports based on two SQL query subjects. Both the SQL objects use the same query as mentioned above. Now, we will start with altering them. For that open Query Explorer. Rename first query subject as Optional_defaultValue and the second one as Pure_Optional. In the Optional_defaultValue SQL object, amend the query with following lines: and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = #sq(prompt ('ProductLine','string','Golf Equipment'))# Similarly, amend the Pure_Optional SQL object query with the following line: #prompt ('Product Line','string','and 1=1', ' and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = ')# Now run the report. You will be prompted to enter a product line. Don't enter any value and just hit OK button. Notice that the report runs (which means the prompt is optional). First, list object returns rows for 'Golf Equipment'. The second list is populated by all the products. How it works... Fundamentally, this report works the same as the one in the previous report. We are firing the SQL statements straight on the data source. The filter condition in the WHERE clause are using the PROMPT macro. Optional_defaultValue In this query, we are using the second and third arguments of Prompt() function. Second argument defines the data type of value which is 'String' in our case. The third argument defines default value of the prompt. When the user doesn't enter any value for the prompt, this default value is used. This is what makes the prompt optional. As we have defined 'Golf Equipment' as the default value, the first list object shows data for 'Golf Equipment' when prompt is left unfilled. Pure_Optional In this query, we are using fourth argument of Prompt() function. This argument is of string type. If the user provides any value for the prompt, the prompt value is concatenated to this string argument and the result is returned. In our case, the fourth argument is the left part of filtering condition that is, 'and . 'and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" ='. So, if the user enters the value as 'XYZ', the macro is replaced by the following filter: and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = 'XYZ' Interestingly, if the user doesn't provide any prompt value, then the fourth argument is simply ignored. The macro is then replaced by the third argument which is in our case is 'and 1=1'. Hence, the second list returns all the rows when user doesn't provide any value for the prompt. This way it makes the PRODUCT_LINE_EN filter purely optional. There's more... Prompt macro accepts two more arguments (fifth and sixth). Please check the help documents or internet sources to find information and examples about them. Adding token using macro In this recipe, we will see how to dynamically change the field on which filter is being applied using macro. We will use prompt macro to generate one of the possible tokens and then use it in the query. Getting ready Create a list report based on native SQL similar to the previous recipe. We will use the same query that works on the product tables but filtering will be different. For that, define the SQL as following: select distinct "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" "Product_line" , "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME" "Product_name" , "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_EN" "Product_color" , "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_EN" "Product_size"from "GOSALESDW"."SLS_PRODUCT_DIM" "SLS_PRODUCT_DIM","GOSALESDW"."SLS_PRODUCT_LINE_LOOKUP" "SLS_PRODUCT_LINE_LOOKUP","GOSALESDW"."SLS_PRODUCT_TYPE_LOOKUP" "SLS_PRODUCT_TYPE_LOOKUP", "GOSALESDW"."SLS_PRODUCT_LOOKUP" "SLS_PRODUCT_LOOKUP","GOSALESDW"."SLS_PRODUCT_COLOR_LOOKUP" "SLS_PRODUCT_COLOR_LOOKUP","GOSALESDW"."SLS_PRODUCT_SIZE_LOOKUP" "SLS_PRODUCT_SIZE_LOOKUP","GOSALESDW"."SLS_PRODUCT_BRAND_LOOKUP" "SLS_PRODUCT_BRAND_LOOKUP"where "SLS_PRODUCT_LOOKUP"."PRODUCT_LANGUAGE" = N'EN' and "SLS_PRODUCT_DIM"."PRODUCT_LINE_CODE" = "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_NUMBER" = "SLS_PRODUCT_LOOKUP"."PRODUCT_NUMBER" and "SLS_PRODUCT_DIM"."PRODUCT_SIZE_CODE"= "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_TYPE_CODE" = "SLS_PRODUCT_TYPE_LOOKUP"."PRODUCT_TYPE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_COLOR_CODE" = "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_CODE" and "SLS_PRODUCT_BRAND_LOOKUP"."PRODUCT_BRAND_CODE" = "SLS_PRODUCT_DIM"."PRODUCT_BRAND_CODE"and#prompt ('Field','token','"SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN"')# like #prompt ('Value','string')# This is the same basic query that joins the product related tables and fetches required columns. The last statement in WHERE clause uses two prompt macros. We will talk about it in detail. How to do it... We have already created a list report based on an SQL query subject as mentioned previously. Drag the columns from the query subject on the list over the report page. Now create a new prompt page. Add a value prompt on the prompt page. Define two static choices for this. Display value Use value Filter on product line "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" Filter on product name "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME Set the parameter for this prompt to 'Field'. This will come pre-populated as existing parameter, as it is defined in the query subject. Choose the UI as radio button group and Filter on Product Line as default selection. Now add a text box prompt on to the prompt page. Set its parameter to Value which comes as a choice in an existing parameter (as it is already defined in the query). Run the report to test it. You will see an option to filter on product line or product name. The value you provide in the text box prompt will be used to filter either of the fields depending on the choice selected in radio buttons.
Read more
  • 0
  • 0
  • 9025

article-image-driving-visual-analyses-automobile-data-python
Packt
19 Sep 2014
19 min read
Save for later

Driving Visual Analyses with Automobile Data (Python)

Packt
19 Sep 2014
19 min read
This article written by Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, and Abhijit Dasgupta, authors of the book Practical Data Science Cookbook, will cover the following topics: Getting started with IPython Exploring IPython Notebook Preparing to analyze automobile fuel efficiencies Exploring and describing the fuel efficiency data with Python (For more resources related to this topic, see here.) The dataset, available at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip, contains fuel efficiency performance metrics over time for all makes and models of automobiles in the United States of America. This dataset also contains numerous other features and attributes of the automobile models other than fuel economy, providing an opportunity to summarize and group the data so that we can identify interesting trends and relationships. We will perform the entire analysis using Python. However, we will ask the same questions and follow the same sequence of steps as before, again following the data science pipeline. With study, this will allow you to see the similarities and differences between the two languages for a mostly identical analysis. In this article, we will take a very different approach using Python as a scripting language in an interactive fashion that is more similar to R. We will introduce the reader to the unofficial interactive environment of Python, IPython, and the IPython notebook, showing how to produce readable and well-documented analysis scripts. Further, we will leverage the data analysis capabilities of the relatively new but powerful pandas library and the invaluable data frame data type that it offers. pandas often allows us to complete complex tasks with fewer lines of code. The drawback to this approach is that while you don't have to reinvent the wheel for common data manipulation tasks, you do have to learn the API of a completely different package, which is pandas. The goal of this article is not to guide you through an analysis project that you have already completed but to show you how that project can be completed in another language. More importantly, we want to get you, the reader, to become more introspective with your own code and analysis. Think not only about how something is done but why something is done that way in that particular language. How does the language shape the analysis? Getting started with IPython IPython is the interactive computing shell for Python that will change the way you think about interactive shells. It brings to the table a host of very useful functionalities that will most likely become part of your default toolbox, including magic functions, tab completion, easy access to command-line tools, and much more. We will only scratch the surface here and strongly recommend that you keep exploring what can be done with IPython. Getting ready If you have completed the installation, you should be ready to tackle the following recipes. Note that IPython 2.0, which is a major release, was launched in 2014. How to do it… The following steps will get you up and running with the IPython environment: Open up a terminal window on your computer and type ipython. You should be immediately presented with the following text: Python 2.7.5 (default, Mar 9 2014, 22:15:05)Type "copyright", "credits" or "license" for more information. IPython 2.1.0 -- An enhanced Interactive Python.?         -> Introduction and overview of IPython's features.%quickref -> Quick reference.help     -> Python's own help system.object?   -> Details about 'object', use 'object??' for extra details.In [1]: Note that your version might be slightly different than what is shown in the preceding command-line output. Just to show you how great IPython is, type in ls, and you should be greeted with the directory listing! Yes, you have access to common Unix commands straight from your Python prompt inside the Python interpreter. Now, let's try changing directories. Type cd at the prompt, hit space, and now hit Tab. You should be presented with a list of directories available from within the current directory. Start typing the first few letters of the target directory, and then, hit Tab again. If there is only one option that matches, hitting the Tab key automatically will insert that name. Otherwise, the list of possibilities will show only those names that match the letters that you have already typed. Each letter that is entered acts as a filter when you press Tab. Now, type ?, and you will get a quick introduction to and overview of IPython's features. Let's take a look at the magic functions. These are special functions that IPython understands and will always start with the % symbol. The %paste function is one such example and is amazing for copying and pasting Python code into IPython without losing proper indentation. We will try the %timeit magic function that intelligently benchmarks Python code. Enter the following commands: n = 100000%timeit range(n)%timeit xrange(n) We should get an output like this: 1000 loops, best of 3: 1.22 ms per loop1000000 loops, best of 3: 258 ns per loop This shows you how much faster xrange is than range (1.22 milliseconds versus 2.58 nanoseconds!) and helps show you the utility of generators in Python. You can also easily run system commands by prefacing the command with an exclamation mark. Try the following command: !ping www.google.com You should see the following output: PING google.com (74.125.22.101): 56 data bytes64 bytes from 74.125.22.101: icmp_seq=0 ttl=38 time=40.733 ms64 bytes from 74.125.22.101: icmp_seq=1 ttl=38 time=40.183 ms64 bytes from 74.125.22.101: icmp_seq=2 ttl=38 time=37.635 ms Finally, IPython provides an excellent command history. Simply press the up arrow key to access the previously entered command. Continue to press the up arrow key to walk backwards through the command list of your session and the down arrow key to come forward. Also, the magic %history command allows you to jump to a particular command number in the session. Type the following command to see the first command that you entered: %history 1 Now, type exit to drop out of IPython and back to your system command prompt. How it works… There isn't much to explain here and we have just scratched the surface of what IPython can do. Hopefully, we have gotten you interested in diving deeper, especially with the wealth of new features offered by IPython 2.0, including dynamic and user-controllable data visualizations. See also IPython at http://ipython.org/ The IPython Cookbook at https://github.com/ipython/ipython/wiki?path=Cookbook IPython: A System for Interactive Scientific Computing at http://fperez.org/papers/ipython07_pe-gr_cise.pdf Learning IPython for Interactive Computing and Data Visualization, Cyrille Rossant, Packt Publishing, available at http://www.packtpub.com/learning-ipython-for-interactive-computing-and-data-visualization/book The future of IPython at http://www.infoworld.com/print/236429 Exploring IPython Notebook IPython Notebook is the perfect complement to IPython. As per the IPython website: "The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document." While this is a bit of a mouthful, it is actually a pretty accurate description. In practice, IPython Notebook allows you to intersperse your code with comments and images and anything else that might be useful. You can use IPython Notebooks for everything from presentations (a great replacement for PowerPoint) to an electronic laboratory notebook or a textbook. Getting ready If you have completed the installation, you should be ready to tackle the following recipes. How to do it… These steps will get you started with exploring the incredibly powerful IPython Notebook environment. We urge you to go beyond this simple set of steps to understand the true power of the tool. Type ipython notebook --pylab=inline in the command prompt. The --pylab=inline option should allow your plots to appear inline in your notebook. You should see some text quickly scroll by in the terminal window, and then, the following screen should load in the default browser (for me, this is Chrome). Note that the URL should be http://127.0.0.1:8888/, indicating that the browser is connected to a server running on the local machine at port 8888. You should not see any notebooks listed in the browser (note that IPython Notebook files have a .ipynb extension) as IPython Notebook searches the directory you launched it from for notebook files. Let's create a notebook now. Click on the New Notebook button in the upper right-hand side of the page. A new browser tab or window should open up, showing you something similar to the following screenshot: From the top down, you can see the text-based menu followed by the toolbar for issuing common commands, and then, your very first cell, which should resemble the command prompt in IPython. Place the mouse cursor in the first cell and type 5+5. Next, either navigate to Cell | Run or press Shift + Enter as a keyboard shortcut to cause the contents of the cell to be interpreted. You should now see something similar to the following screenshot. Basically, we just executed a simple Python statement within the first cell of our first IPython Notebook. Click on the second cell, and then, navigate to Cell | Cell Type | Markdown. Now, you can easily write markdown in the cell for documentation purposes. Close the two browser windows or tabs (the notebook and the notebook browser). Go back to the terminal in which you typed ipython notebook, hit Ctrl + C, then hit Y, and press Enter. This will shut down the IPython Notebook server. How it works… For those of you coming from either more traditional statistical software packages, such as Stata, SPSS, or SAS, or more traditional mathematical software packages, such as MATLAB, Mathematica, or Maple, you are probably used to the very graphical and feature-rich interactive environments provided by the respective companies. From this background, IPython Notebook might seem a bit foreign but hopefully much more user friendly and less intimidating than the traditional Python prompt. Further, IPython Notebook offers an interesting combination of interactivity and sequential workflow that is particularly well suited for data analysis, especially during the prototyping phases. R has a library called Knitr (http://yihui.name/knitr/) that offers the report-generating capabilities of IPython Notebook. When you type in ipython notebook, you are launching a server running on your local machine, and IPython Notebook itself is really a web application that uses a server-client architecture. The IPython Notebook server, as per ipython.org, uses a two-process kernel architecture with ZeroMQ (http://zeromq.org/) and Tornado. ZeroMQ is an intelligent socket library for high-performance messaging, helping IPython manage distributed compute clusters among other tasks. Tornado is a Python web framework and asynchronous networking module that serves IPython Notebook's HTTP requests. The project is open source and you can contribute to the source code if you are so inclined. IPython Notebook also allows you to export your notebooks, which are actually just text files filled with JSON, to a large number of alternative formats using the command-line tool called nbconvert (http://ipython.org/ipython-doc/rel-1.0.0/interactive/nbconvert.html). Available export formats include HTML, LaTex, reveal.js HTML slideshows, Markdown, simple Python scripts, and reStructuredText for the Sphinx documentation. Finally, there is IPython Notebook Viewer (nbviewer), which is a free web service where you can both post and go through static, HTML versions of notebook files hosted on remote servers (these servers are currently donated by Rackspace). Thus, if you create an amazing .ipynb file that you want to share, you can upload it to http://nbviewer.ipython.org/ and let the world see your efforts. There's more… We will try not to sing too loudly the praises of Markdown, but if you are unfamiliar with the tool, we strongly suggest that you try it out. Markdown is actually two different things: a syntax for formatting plain text in a way that can be easily converted to a structured document and a software tool that converts said text into HTML and other languages. Basically, Markdown enables the author to use any desired simple text editor (VI, VIM, Emacs, Sublime editor, TextWrangler, Crimson Editor, or Notepad) that can capture plain text yet still describe relatively complex structures such as different levels of headers, ordered and unordered lists, and block quotes as well as some formatting such as bold and italics. Markdown basically offers a very human-readable version of HTML that is similar to JSON and offers a very human-readable data format. See also IPython Notebook at http://ipython.org/notebook.html The IPython Notebook documentation at http://ipython.org/ipython-doc/stable/interactive/notebook.html An interesting IPython Notebook collection at https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks The IPython Notebook development retrospective at http://blog.fperez.org/2012/01/ipython-notebook-historical.html Setting up a remote IPython Notebook server at http://nbviewer.ipython.org/github/Unidata/tds-python-workshop/blob/master/ipython-notebook-server.ipynb The Markdown home page at https://daringfireball.net/projects/markdown/basics Preparing to analyze automobile fuel efficiencies In this recipe, we are going to start our Python-based analysis of the automobile fuel efficiencies data. Getting ready If you completed the first installation successfully, you should be ready to get started. How to do it… The following steps will see you through setting up your working directory and IPython for the analysis for this article: Create a project directory called fuel_efficiency_python. Download the automobile fuel efficiency dataset from http://fueleconomy.gov/feg/epadata/vehicles.csv.zip and store it in the preceding directory. Extract the vehicles.csv file from the zip file into the same directory. Open a terminal window and change the current directory (cd) to the fuel_efficiency_python directory. At the terminal, type the following command: ipython notebook Once the new page has loaded in your web browser, click on New Notebook. Click on the current name of the notebook, which is untitled0, and enter in a new name for this analysis (mine is fuel_efficiency_python). Let's use the top-most cell for import statements. Type in the following commands: import pandas as pdimport numpy as npfrom ggplot import *%matplotlib inline Then, hit Shift + Enter to execute the cell. This imports both the pandas and numpy libraries, assigning them local names to save a few characters while typing commands. It also imports the ggplot library. Please note that using the from ggplot import * command line is not a best practice in Python and pours the ggplot package contents into our default namespace. However, we are doing this so that our ggplot syntax most closely resembles the R ggplot2 syntax, which is strongly not Pythonic. Finally, we use a magic command to tell IPython Notebook that we want matploblib graphs to render in the notebook. In the next cell, let's import the data and look at the first few records: vehicles = pd.read_csv("vehicles.csv")vehicles.head Then, press Shift + Enter. The following text should be shown: However, notice that a red warning message appears as follows: /Library/Python/2.7/site-packages/pandas/io/parsers.py:1070: DtypeWarning: Columns (22,23,70,71,72,73) have mixed types. Specify dtype option on import or set low_memory=False.   data = self._reader.read(nrows) This tells us that columns 22, 23, 70, 71, 72, and 73 contain mixed data types. Let's find the corresponding names using the following commands: column_names = vehicles.columns.values column_names[[22, 23, 70, 71, 72, 73]]   array([cylinders, displ, fuelType2, rangeA, evMotor, mfrCode], dtype=object) Mixed data types sounds like it could be problematic so make a mental note of these column names. Remember, data cleaning and wrangling often consume 90 percent of project time. How it works… With this recipe, we are simply setting up our working directory and creating a new IPython Notebook that we will use for the analysis. We have imported the pandas library and very quickly read the vehicles.csv data file directly into a data frame. Speaking from experience, pandas' robust data import capabilities will save you a lot of time. Although we imported data directly from a comma-separated value file into a data frame, pandas is capable of handling many other formats, including Excel, HDF, SQL, JSON, Stata, and even the clipboard using the reader functions. We can also write out the data from data frames in just as many formats using writer functions accessed from the data frame object. Using the bound method head that is part of the Data Frame class in pandas, we have received a very informative summary of the data frame, including a per-column count of non-null values and a count of the various data types across the columns. There's more… The data frame is an incredibly powerful concept and data structure. Thinking in data frames is critical for many data analyses yet also very different from thinking in array or matrix operations (say, if you are coming from MATLAB or C as your primary development languages). With the data frame, each column represents a different variable or characteristic and can be a different data type, such as floats, integers, or strings. Each row of the data frame is a separate observation or instance with its own set of values. For example, if each row represents a person, the columns could be age (an integer) and gender (a category or string). Often, we will want to select the set of observations (rows) that match a particular characteristic (say, all males) and examine this subgroup. The data frame is conceptually very similar to a table in a relational database. See also Data structures in pandas at http://pandas.pydata.org/pandas-docs/stable/dsintro.html Data frames in R at http://www.r-tutor.com/r-introduction/data-frame Exploring and describing the fuel efficiency data with Python Now that we have imported the automobile fuel efficiency dataset into IPython and witnessed the power of pandas, the next step is to replicate the preliminary analysis performed in R, getting your feet wet with some basic pandas functionality. Getting ready We will continue to grow and develop the IPython Notebook that we started in the previous recipe. If you've completed the previous recipe, you should have everything you need to continue. How to do it… First, let's find out how many observations (rows) are in our data using the following command: len(vehicles) 34287 If you switch back and forth between R and Python, remember that in R, the function is length and in Python, it is len. Next, let's find out how many variables (columns) are in our data using the following command: len(vehicles.columns) 74 Let's get a list of the names of the columns using the following command: print(vehicles.columns) Index([u'barrels08', u'barrelsA08', u'charge120', u'charge240', u'city08', u'city08U', u'cityA08', u'cityA08U', u'cityCD', u'cityE', u'cityUF', u'co2', u'co2A', u'co2TailpipeAGpm', u'co2TailpipeGpm', u'comb08', u'comb08U', u'combA08', u'combA08U', u'combE', u'combinedCD', u'combinedUF', u'cylinders', u'displ', u'drive', u'engId', u'eng_dscr', u'feScore', u'fuelCost08', u'fuelCostA08', u'fuelType', u'fuelType1', u'ghgScore', u'ghgScoreA', u'highway08', u'highway08U', u'highwayA08', u'highwayA08U', u'highwayCD', u'highwayE', u'highwayUF', u'hlv', u'hpv', u'id', u'lv2', u'lv4', u'make', u'model', u'mpgData', u'phevBlended', u'pv2', u'pv4', u'range', u'rangeCity', u'rangeCityA', u'rangeHwy', u'rangeHwyA', u'trany', u'UCity', u'UCityA', u'UHighway', u'UHighwayA', u'VClass', u'year', u'youSaveSpend', u'guzzler', u'trans_dscr', u'tCharger', u'sCharger', u'atvType', u'fuelType2', u'rangeA', u'evMotor', u'mfrCode'], dtype=object) The u letter in front of each string indicates that the strings are represented in Unicode (http://docs.python.org/2/howto/unicode.html) Let's find out how many unique years of data are included in this dataset and what the first and last years are using the following command: len(pd.unique(vehicles.year)) 31 min(vehicles.year) 1984 max(vehicles["year"]) 2014 Note that again, we have used two different syntaxes to reference individual columns within the vehicles data frame. Next, let's find out what types of fuel are used as the automobiles' primary fuel types. In R, we have the table function that will return a count of the occurrences of a variable's various values. In pandas, we use the following: pd.value_counts(vehicles.fuelType1)Regular Gasoline     24587Premium Gasoline     8521Diesel                    1025Natural Gas               57Electricity                 56Midgrade Gasoline      41dtype: int64 Now if we want to explore what types of transmissions these automobiles have, we immediately try the following command: pd.value_counts(vehicles.trany) However, this results in a bit of unexpected and lengthy output: What we really want to know is the number of cars with automatic and manual transmissions. We notice that the trany variable always starts with the letter A when it represents an automatic transmission and M for manual transmission. Thus, we create a new variable, trany2, that contains the first character of the trany variable, which is a string: vehicles["trany2"] = vehicles.trany.str[0]pd.value_counts(vehicles.trany2) The preceding command yields the answer that we wanted or twice as many automatics as manuals: A   22451M   11825dtype: int64 How it works… In this recipe, we looked at some basic functionality in Python and pandas. We have used two different syntaxes (vehicles['trany'] and vehicles.trany) to access variables within the data frame. We have also used some of the core pandas functions to explore the data, such as the incredibly useful unique and the value_counts function. There's more... In terms of the data science pipeline, we have touched on two stages in a single recipe: data cleaning and data exploration. Often, when working with smaller datasets where the time to complete a particular action is quite short and can be completed on our laptop, we will very quickly go through multiple stages of the pipeline and then loop back, depending on the results. In general, the data science pipeline is a highly iterative process. The faster we can accomplish steps, the more iterations we can fit into a fixed time, and often, we can create a better final analysis. See also The pandas API overview at http://pandas.pydata.org/pandas-docs/stable/api.html Summary This article took you through the process of analyzing and visualizing automobile data to identify trends and patterns in fuel efficiency over time using the powerful programming language, Python. Resources for Article: Further resources on this subject: Importing Dynamic Data [article] MongoDB data modeling [article] Report Data Filtering [article]
Read more
  • 0
  • 0
  • 9021
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-brainnet-an-interface-to-communicate-between-human-brains-could-soon-make-telepathy-real
Sunith Shetty
28 Sep 2018
3 min read
Save for later

BrainNet, an interface to communicate between human brains, could soon make Telepathy real

Sunith Shetty
28 Sep 2018
3 min read
BrainNet provides the first multi-person brain-to-brain interface which allows a nonthreatening direct collaboration between human brains. It can help small teams collaborate to solve a range of tasks using direct brain-to-brain communication. How does BrainNet operate? The noninvasive interface combines electroencephalography (EEG) to record brain signals and transcranial magnetic stimulation (TMS) to deliver the required information to the brain. For now, the interface allows three human subjects to collaborate, handle and solve a task using direct brain-to-brain communication. Two out of three human subjects are “Senders”. The senders’ brain signals are decoded using real-time EEG data analysis. This technique allows extracting decisions which are vital in communicating in order to solve the required challenges. Let’s take an example of a Tetris-like game--where you need quick decisions to decide whether to rotate a block or drop as it is in order to fill a line. The senders’ signals (decisions) are transmitted to the third subject human brain via the Internet, the “Receiver” in this case. The decisions are sent to the receiver brain via magnetic stimulation of the occipital cortex. The receiver can’t see the game screen to decide if the rotation of the block is required. The receiver integrates the decisions received and makes an informed call using an EEG interface regarding turning the position of the block or keeping it in the same position. The second round of the game allows the senders to validate the previous move and provide the necessary feedback to the receiver’s action. How did the results look? The group of researchers has implemented this technique for the Tetris game to evaluate the performance of BrainNet considering the following factors: Group-level performance during the game True/False positive rates of subject’s decisions Mutual information between subjects This was implemented among five groups of three human brain subjects to perform the Tetris task using BrainNet interface. The average accuracy result for the task was 0.813. Furthermore, they also tried varying the information reliability by injecting artificially generated noise into one of the senders’ signals. However, the receiver was able to classify which sender is more reliable based on the information transmitted to their brains. These positive results have open the gates and the possibilities of future brain-to-brain interfaces which holds the power of enabling cooperative problem solving by humans using a "social network" of connected brains. To know more, you can refer to the research paper. Read more Diffractive Deep Neural Network (D2NN): UCLA-developed AI device can identify objects at the speed of light Baidu announces ClariNet, a neural network for text-to-speech synthesis Optical training of Neural networks is making AI more efficient
Read more
  • 0
  • 0
  • 9020

article-image-ibm-lotus-domino-exploring-view-options-web
Packt
11 May 2011
8 min read
Save for later

IBM Lotus Domino: exploring view options for the web

Packt
11 May 2011
8 min read
Views are important to most Domino applications. They provide the primary means by which documents are located and retrieved. But working with views on the Web is often more complicated or less satisfactory than using views with the Notes client. Several classic view options are available for web applications, all of which have draw-backs and implementation issues. A specific view can be displayed on the Web in several different ways. So it is helpful to consider view attributes that influence design choices, in particular: View content View structure How a view is translated for the Web How a view looks in a browser Whether or not a view template is used View performance Document hierarchy In terms of content, a view contains: Data only Data and HTML tags In terms of structure, views are: Uncategorized Categorized In terms of the techniques used by Domino to translate views for the Web, there are four basic methods: Domino-generated HTML (the default) Developer-generated HTML (the view contains data and HTML tags) View Applet (used with data only views) XML (the view is transmitted to the browser as an XML document) The first three of these methods are easier to implement. Two options on the Advanced tab of View Properties control which of these three methods is used: Treat view contents as HTML Use applet in the browser If neither option is checked, then Domino translates the view into an HTML table and then sends the page to the browser. If Treat view contents as HTML is checked, then Domino sends the view to the browser as is, assuming that the developer has encoded HTML table tags in the view. If Use applet in the browser is checked, then Domino uses the Java View Applet to display the view. (As mentioned previously, the Java Applets can be slow to load, and they do require a locally installed JVM (Java Virtual Machine)). Using XML to display views in a browser is a more complicated proposition, and we will not deal with it here. Pursue this and other XML-related topics in Designer Help or on the Web. Here is a starting point: http://www.ibm.com/developerworks/xml/ In terms of how a view looks when displayed in a browser, two alternatives can be used: Native styling with Designer Styling with Cascading Style Sheets In terms of whether or not a view template is used, there are three choices: A view template is not used The default view template is used A view template created for a specific view is used Finally, view performance can be an issue for views with many: Documents Columns Column formulas Column sorting options Each view is indexed and refreshed according to a setting on the Advanced tab of View Properties. By default, view indices are set to refresh automatically when documents are added or deleted. If the re-indexing process takes longer, then application response time can suffer. In general, smaller and simpler views with fewer column formulas perform better than long, complicated and computationally intensive views. The topics in this section deal with designing views for the Web. The first few topics review the standard options for displaying views. Later topics offer suggestions about improving view look and feel. Understand view Action buttons As you work with views on the Web, keep in mind that Action buttons are always placed at the top of the page regardless of how the view is displayed on the Web (standard view, view contents as HTML) and regardless of whether or not a view template is used. Unless the Action Bar is displayed with the Java applet, Action buttons are rendered in a basic HTML table; a horizontal rule separates the Action buttons from the rest of the form. Bear in mind that the Action buttons are functionally connected to but stylistically independent of the view and view template design elements that display lower on the form. Use Domino-generated default views When you look at a view on the Web, the view consists only of column headings and data rows. Everything else on the page (below any Action buttons) is contained on a view template form. You can create view templates in your design, or you can let Domino provide a default form. If Domino supplies the view template, the rendered page is fairly basic. Below the Action buttons and the horizontal rule, standard navigational hotspots are displayed; these navigational hotspots are repeated below the view. Expand and Collapse hotspots are included to support categorized views and views that include documents in response hierarchies. The view title displays below the top set of navigational hotspots, and then the view itself appears. If you supply a view template for a view, you must design the navigational hotspots, view title, and other design elements that may be required. View contents are rendered as an HTML table with columns that expand or contract depending upon the width of cell contents. If view columns enable sorting, then sorting arrows appear to the right of column headings. Here is an example of how Domino displays a view by default on the Web: In this example, clicking the blue underscored values in the left-most Last Name column opens the corresponding documents. By default, values in the left-most column are rendered as URL links, but any other column—or several columns—can serve this purpose. To change which column values are clickable, enable or disable the Show values in this column as links option on the Advanced tab of Column Properties: Typically a title, subject, or another unique document attribute is enabled as the link. Out of the box default views are a good choice for rapid prototyping or for one-time needs where look-and-feel are less important. Beyond designing the views, nothing else is required. Domino merges the views with HTML tags and a little JavaScript to produce fully functional pages. On the down side, what you see is what you get. Default views are stylistically uninspiring, and there is not a lot that can be done with them beyond some modest Designer-applied styling. Many Designer-applied styles, such as column width, are not translated to the Web. Still, some visual improvements can be made. In this example, the font characteristics are modified, and an alternate row background color is added: Include HTML tags to enhance views Some additional styling and behavior can be coded into standard views using HTML tags and CSS rules. Here is how this is done: In this example, <font> tags surround the column Title. Note the square brackets that identify the tags as HTML: Tags can also be inserted into column value formulas: "[<font color='darkblue'>]" + ContactLast + "[</font>]" When viewed with a browser, the new colors are displayed as expected. But when the view is opened in Notes, it looks like this: (Move the mouse over the image to enlarge it.) The example illustrates how to code the additional tags, but frankly the same effects can be achieved using Designer-applied formatting, so there is no real gain here. The view takes longer to code and the final result is not very reader-friendly when viewed with the Notes client. That being said, there still may be occasions when you want to add HTML tags to achieve a particular result. Here is a somewhat more complicated application of the same technique. This next line of code is added to the Title of a column. Note the use of <sup> and <font> tags. These tags apply only to the message See footnote 1: Last Name[<sup><font color='red'>]See footnote 1[</font></sup>] The result achieves the desired effect: More challenging is styling values in view columns. You do not have access to the <td> or <font> tags that Domino inserts into the page to define table cell contents. But you can add <span> tags around a column value, and then use CSS rules to style the span. Here is what the column formula might look like: "[<span class='column1'>]" + ContactLast + "[</span>]" Here is the CSS rule for the column1 class: .column1 { background-color: #EEE; cursor: default; display: block; font-weight: bold; text-decoration: none; width: 100%; } These declarations change the background color of the cell to a light gray and the pointer to the browser default. The display and width declarations force the span to occupy the width of the table cell. The text underscoring (for the link) is removed and the text is made bold. Without the CSS rule, the view displays as expected: With the CSS rule applied, a different look for the first column is achieved:
Read more
  • 0
  • 0
  • 9010

article-image-beagle-boards
Packt
23 Dec 2014
10 min read
Save for later

Beagle Boards

Packt
23 Dec 2014
10 min read
In this article by Hunyue Yau, author of Learning BeagleBone, we will provide a background on the entire family of Beagle boards with brief highlights of what is unique about every member, such as things that favor one member over the other. This article will help you identify Beagle boards that might have been mislabeled. The following topics will be covered here: What are Beagle boards How do they relate to other development boards BeagleBoard Classic BeagleBoard-xM BeagleBone White BeagleBone Black (For more resources related to this topic, see here.) The Beagle board family The Beagle boards are a family of low-cost, open development boards that provide everyday students, developers, and other interested people with access to the current mobile processor technology on a path toward developing ideas. Prior to the invention of the Beagle family of boards, the available options with the user were primarily limited to either low-computing power boards, such as the 8-bit microcontroller-based Arduino boards, or dead-end options, such as repurposing existing products. There were even other options such as compromising the physical size or electrical power consumption by utilizing the nonmobile-oriented technology, for example, embedding a small laptop or desktop into a project. The Beagle boards attempt to address these points and more. The Beagle board family provides you with access to the technologies that were originally developed for mobile devices, such as phones and tablets, and use them to develop projects and for educational purposes. By leveraging the same technology for education, students can be less reliant on obsolete technologies. All this access comes affordably. Prior to the Beagle boards being available, development boards of this class easily exceeded thousands of dollars. In contrast, the initial Beagle board offering was priced at a mere 150 dollars! The Beagle boards The Beagle family of boards began in late 2008 with the original Beagle board. The original board has quite a few characteristics similar to all members of the Beagle board family. All the current boards are based on an ARM core and can be powered by a single 5V source or by varying degrees from a USB port. All boards have a USB port for expansion and provide direct access to the processor I/O for advance interfacing and expansion. Examples of the processor I/O available for expansion include Serial Peripheral Interface (SPI), I2C, pulse width modulation (PWM), and general-purpose input/output (GPIO). The USB expansion path was introduced at an early stage providing a cheap path to add features by leveraging the existing desktop and laptop accessories. All the boards are designed keeping the beginner board in mind and, as such, are impossible to brick on software basis. To brick a board is a common slang term that refers to damaging a board beyond recovery, thus, turning the board from an embedded development system to something as useful for embedded development as a brick. This doesn't mean that they cannot be damaged electrically or physically. For those who are interested, the design and manufacturing material is also available for all the boards. The bill of material is designed to be available via the distribution so that the boards themselves can be customized and manufactured even in small quantities. This allows projects to be manufactured if desired. Do not power up the board on any conductive surfaces or near conductive materials, such as metal tools or exposed wires. The board is fully exposed and doing so can subject your board to electrical damage. The only exception is a proper ESD mat design for use with electrons. The proper ESD mats are designed to be only conductive enough to discharge static electricity without damaging the circuits. The following sections highlight the specifications for each member presented in the order they were introduced. They are based on the latest revision of the board. As these boards leverage mobile technology, the availability changes and the designs are partly revised to accommodate the available parts. The design information for older versions is available at http://www.beagleboard.org/. BeagleBoard Classic The initial member of the Beagle board family is the BeagleBoard Classic (BBC), which features the following specs: OMAP3530 clocked up to 720 MHz, featuring an ARM Cortex-A8 core along with integrated 3D and video decoding accelerators 256 MB of LPDDR (low-power DDR) memory with 512 MB of integrated (NAND) flash memory on board; older revisions had less memory USB OTG (switchable between a USB device and a USB host) along with a pure USB high-speed host only port A low-level debug port accessible using a common desktop DB-9 adapter Analog audio in and out DVI-D video output to connect to a desktop monitor or a digital TV A full-size SD card interface A 28-pin general expansion header along with two 20-pin headers for video expansion 1.8V I/O Only a nominal 5V is available on the expansion connector. Expansion boards should have their own regulator. At the original release of the BBC in 2008, OMAP3530 was comparable to the processors of mobile phones of that time. The BBC is the only member to feature a full-size SD card interface. You can see the BeagleBoard Classic in the following image: BeagleBoard-xM As an upgrade to the BBC, the BeagleBoard-xM (BBX) was introduced later. It features the following specs: DM3730 clocked up to 1 GHz, featuring an ARM Cortex-A8 core along with integrated 3D and video decoding accelerators compared to 720 MHz of the BBC. 512 MB of LPDDR but no onboard flash memory compared to 256 MB of LPDDR with up to 512 MB of onboard flash memory. USB OTG (switchable between a USB device and a USB host) along with an onboard hub to provide four USB host ports and an onboard USB connected the Ethernet interface. The hub and Ethernet connect to the same port as the only high-speed port of the BBC. The hub allows low-speed devices to work with the BBX. A low-level debug port accessible with a standard DB-9 serial cable. An adapter is no longer needed. Analog audio in and out. This is the same analog audio in and out as that of the BBC. DVI-D video output to connect to a desktop monitor or a digital TV. This is the same DVI-D video output as used in the BBC. A microSD interface. It replaces the full-size SD interface on the BBC. The difference is mainly the physical size. A 28-pin expansion interface and two 20-pin video expansion interfaces along with an additional camera interface board. The 28-pin and two 20-pin interfaces are physically and electrically compatible with the BBC. 1.8V I/O. Only a nominal 5V is available on the expansion connector. Expansion boards should have their own regulator. The BBX has a faster processor and added capabilities when compared to the BBC. The camera interface is a unique feature for the BBX and provides a direct interface for raw camera sensors. The 28-pin interface, along with the two 20-pin video interfaces, is electrically and mechanically compatible with the BBC. Mechanical mounting holes were purposely made backward compatible. Beginning with the BBX, boards were shipped with a microSD card containing the Angström Linux distribution. The latest version of the kernel and bootloader are shared between the BBX and BBC. The software can detect and utilize features available on each board as the DM3730 and the OMAP3530 processors are internally very similar. You can see the BeagleBoard-xM in the following image: BeagleBone To simplify things and to bring in a low-entry cost, the BeagleBone subfamily of boards was introduced. While many concepts in this article can be shared with the entire Beagle family, this article will focus on this subfamily. All current members of BeagleBone can be purchased for less than 100 dollars. BeagleBone White The initial member of this subfamily is the BeagleBone White (BBW). This new form factor has a footprint to allow the board itself to be stored inside an Altoids tin. The Altoids tin is conductive and can electrically damage the board if an operational BeagleBone without additional protection is placed inside it. The BBW features the following specs: AM3358 clocked at up to 720 MHz, featuring an ARM Cortex-A8 core along with a 3D accelerator, an ARM Cortex-M3 for power management, and a unique feature—the Programmable Real-time Unit Subsystem (PRUSS) 256 MB of DDR2 memory Two USB ports, namely, a dedicated USB host and dedicated USB device An onboard JTAG debugger An onboard USB interface to access the low-level serial interfaces 10/100 MB Ethernet interfaces Two 46-pin expansion interfaces with up to eight channels of analog input 10-pin power expansion interface A microSD interface 3.3V digital I/O 1.8V analog I/O As with the BBX, the BBW ships with the Angström Linux distribution. You can see the BeagleBone White in the following image: BeagleBone Black Intended as a lower cost version of the BeagleBone, the BeagleBone Black (BBB) features the following specs: AM3358 clocked at up to 1 GHz, featuring an ARM Cortex-A8 core along with a 3D accelerator, an ARM Cortex-M3 for power management, and a unique feature: the PRUSS. This is an improved revision of the same processor in BBW. 512 MB of DDR3 memory compared to 256 MB of DDR2 memory on the BBW. 4 GB of onboard flash embedded MMC (eMMC) memory for the latest version compared to a complete lack of onboard flash memory on the BBW. Two USB ports, namely, a dedicated USB host and dedicated USB device. A low-level serial interface is available as a dedicated 6-pin header. 10/100 MB Ethernet interfaces. Two 46-pin expansion interfaces with up to eight channels of analog input. A microSD interface. A micro HDMI interface to connect to a digital monitor or a digital TV. A digital audio is available on the same interface. This is new to the BBB. 3.3V digital I/O. 1.8V analog I/O. The overall mechanical form factor of the BBB is the same as that of the BBW. However, due to the added features, there are some slight electrical changes in the expansion interface. The power expansion header was removed to make room for added features. Unlike other boards, the BBB is shipped with a Linux distribution on the internal flash memory. Early revisions shipped with Angström Linux and later revisions shipped with Debian Linux as an attempt to simplify things for new users. Unlike the BBW, the BBB does not provide an onboard JTAG debugger or an onboard USB to serial converter. Both these features were provided by a single chip on the BBW and were removed from the BBB for cost reasons. JTAG debugging is possible on the BBB by soldering a connector to the back of the BBB and using an external debugger. A serial port access on the BBB is provided by a serial header. This article will focus solely on the BeagleBone subfamily (BBW and BBB). The difference between them will be noted where applicable. It should be noted that for more advanced projects, the BBC/BBX should be considered as they offer additional unique features that are not available in the BBW/BBB. Most concepts learned on the BBB/BBW boards are entirely applicable to the BBC/BBX boards. You can see the BeagleBone Black in the following image: Summary In this article, we looked at the Beagle board offerings and a few unique features of each offering. Then, we went through the process of setting up a BeagleBone board and understood the basics to access it from a laptop/desktop. Resources for Article: Further resources on this subject: Protecting GPG Keys in BeagleBone [article] Making the Unit Very Mobile - Controlling Legged Movement [article] Pulse width modulator [article]
Read more
  • 0
  • 0
  • 9003

article-image-introduction-logging-tomcat-7
Packt
21 Mar 2012
9 min read
Save for later

Introduction to Logging in Tomcat 7

Packt
21 Mar 2012
9 min read
(For more resources on Apache, see here.) JULI Previous versions of Tomcat (till 5.x) use Apache common logging services for generating logs. A major disadvantage with this logging mechanism is that it can handle only single JVM configuration and makes it difficult to configure separate logging for each class loader for independent application. In order to resolve this issue, Tomcat developers have introduced a separate API for Tomcat 6 version, that comes with the capability of capturing each class loader activity in the Tomcat logs. It is based on java.util.logging framework. By default, Tomcat 7 uses its own Java logging API to implement logging services. This is also called as JULI. This API can be found in TOMCAT_HOME/bin of the Tomcat 7 directory structures (tomcat-juli.jar). The following screenshot shows the directory structure of the bin directory where tomcat-juli.jar is placed. JULI also provides the feature for custom logging for each web application, and it also supports private per-application logging configurations. With the enhanced feature of separate class loader logging, it also helps in detecting memory issues while unloading the classes at runtime. For more information on JULI and the class loading issue, please refer to http://tomcat.apache.org/tomcat-7.0-doc/logging.html and http://tomcat.apache.org/tomcat-7.0-doc/class-loader-howto.html respectively. Loggers, appenders, and layouts There are some important components of logging which we use at the time of implementing the logging mechanism for applications. Each term has its individual importance in tracking the events of the application. Let's discuss each term individually to find out their usage: Loggers:It can be defined as the logical name for the log file. This logical name is written in the application code. We can configure an independent logger for each application. Appenders: The process of generation of logs are handled by appenders. There are many types of appenders, such as FileAppender, ConsoleAppender, SocketAppender, and so on, which are available in log4j. The following are some examples of appenders for log4j: log4j.appender.CATALINA=org.apache.log4j.DailyRollingFileAppender log4j.appender.CATALINA.File=${catalina.base}/logs/catalina.out log4j.appender.CATALINA.Append=true log4j.appender.CATALINA.Encoding=UTF-8 The previous four lines of appenders define the DailyRollingFileAppender in log4j, where the filename is catalina.out . These logs will have UTF-8 encoding enabled. If log4j.appender.CATALINA.append=false, then logs will not get updated in the log files. # Roll-over the log once per day log4j.appender.CATALINA.DatePattern='.'dd-MM-yyyy'.log' log4j.appender.CATALINA.layout = org.apache.log4j.PatternLayout log4j.appender.CATALINA.layout.ConversionPattern = %d [%t] %-5p %c- %m%n The previous three lines of code show the roll-over of log once per day. Layout: It is defined as the format of logs displayed in the log file. The appender uses layout to format the log files (also called as patterns). The highlighted code shows the pattern for access logs: <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt" pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/> Loggers, appenders, and layouts together help the developer to capture the log message for the application event. Types of logging in Tomcat 7 We can enable logging in Tomcat 7 in different ways based on the requirement. There are a total of five types of logging that we can configure in Tomcat, such as application, server, console, and so on. The following figure shows the different types of logging for Tomcat 7. These methods are used in combination with each other based on environment needs. For example, if you have issues where Tomcat services are not displayed, then console logs are very helpful to identify the issue, as we can verify the real-time boot sequence. Let's discuss each logging method briefly. Application log These logs are used to capture the application event while running the application transaction. These logs are very useful in order to identify the application level issues. For example, suppose your application performance is slow on a particular transition, then the details of that transition can only be traced in application log. The biggest advantage of application logs is we can configure separate log levels and log files for each application, making it very easy for the administrators to troubleshoot the application. Log4j is used in 90 percent of the cases for application log generation. Server log Server logs are identical to console logs. The only advantage of server logs is that they can be retrieved anytime but console logs are not available after we log out from the console. Console log This log gives you the complete information of Tomcat 7 startup and loader sequence. The log file is named as catalina.out and is found in TOMCAT_HOME/logs. This log file is very useful in checking the application deployment and server startup testing for any environment. This log is configured in the Tomcat file catalina.sh, which can be found in TOMCAT_HOME/bin. The previous screenshot shows the definition for Tomcat logging. By default, the console logs are configured as INFO mode. There are different levels of logging in Tomcat such as WARNING, INFORMATION, CONFIG, and FINE. The previous screenshot shows the Tomcat log file location, after the start of Tomcat services. The previous screenshot shows the output of the catalina.out file, where Tomcat services are started in 1903 ms. Access log Access logs are customized logs, which give information about the following: Who has accessed the application What components of the application are accessed Source IP and so on These logs play a vital role in traffic analysis of many applications to analyze the bandwidth requirement and also helps in troubleshooting the application under heavy load. These logs are configured in server.xml in TOMCAT_HOME/conf. The following screenshot shows the definition of access logs. You can customize them according to the environment and your auditing requirement. Let's discuss the pattern format of the access logs and understand how we can customize the logging format: <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs" prefix="localhost_access_log." suffix=".txt" pattern="%h %l %u %t &quot;%r&quot; %s %b" resolveHosts="false"/> Class Name: This parameter defines the class name used for generation of logs. By default, Apache Tomcat 7 uses the org.apache.catalina.valves.AccessLogValve class for access logs. Directory: This parameter defines the directory location for the log file. All the log files are generated in the log directory—TOMCAT_HOME/logs—but we can customize the log location based on our environment setup and then update the directory path in the definition of access logs. Prefix: This parameter defines the prefix of the access log filename, that is, by default, access log files are generated by the name localhost_access_log.yy-mm-dd.txt. Suffix: This parameter defines the file extension of the log file. Currently it is in .txt format. Pattern: This parameter defines the format of the log file. The pattern is a combination of values defined by the administrator, for example, %h = remote host address. The following screenshot shows the default log format for Tomcat 7. Access logs show the remote host address, date/time of request, method used for response, URI mapping, and HTTP status code. In case you have installed the web traffic analysis tool for application, then you have to change the access logs to a different format. Host manager These logs define the activity performed using Tomcat Manager, such as various tasks performed, status of application, deployment of application, and lifecycle of Tomcat. These configurations are done on the logging.properties, which can be found in TOMCAT_HOME/conf. The previous screenshot shows the definition of host, manager, and host-manager details. If you see the definitions, it defines the log location, log level, and prefix of the filename. In logging.properties, we are defining file handlers and appenders using JULI. The log file for manager looks similar to the following: I28 Jun, 2011 3:36:23 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:37:13 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:37:42 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: undeploy: Undeploying web application at '/sample' 28 Jun, 2011 3:37:43 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:42:59 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:43:01 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' 28 Jun, 2011 3:53:44 AM org.apache.catalina.core.ApplicationContext log INFO: HTMLManager: list: Listing contexts for virtual host 'localhost' Types of log levels in Tomcat 7 There are seven levels defined for Tomcat logging services (JULI). They can be set based on the application requirement. The following figure shows the sequence of log levels for JULI: Every log level in JULI had its own functionality. The following table shows the functionality of each log level in JULI: Log level Description SEVERE(highest) Captures exception and Error WARNING Warning messages INFO Informational message, related to server activity CONFIG Configuration message FINE Detailed activity of server transaction (similar to debug) FINER More detailed logs than FINE FINEST(least) Entire flow of events (similar to trace) For example, let's take an appender from logging.properties and find out the log level used; the first log appender for localhost is using FINE as the log level, as shown in the following code snippet: localhost.org.apache.juli.FileHandler.level = FINE localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs localhost.org.apache.juli.FileHandler.prefix = localhost. The following code shows the default file handler configuration for logging in Tomcat 7 using JULI. The properties are mentioned and log levels are highlighted: ############################################################ # Facility specific properties. # Provides extra control for each logger. ############################################################ org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager] .level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager] .handlers = 3manager.org.apache.juli.FileHandler org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host- manager].level = INFO org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/host- manager].handlers = 4host-manager.org.apache.juli.FileHandler
Read more
  • 0
  • 0
  • 8999
article-image-integrating-muzzley
Packt
10 Aug 2015
12 min read
Save for later

Integrating with Muzzley

Packt
10 Aug 2015
12 min read
In this article by Miguel de Sousa, author of the book Internet of Things with Intel Galileo, we will cover the following topics: Wiring the circuit The Muzzley IoT ecosystem Creating a Muzzley app Lighting up the entrance door (For more resources related to this topic, see here.) One identified issue regarding IoT is that there will be lots of connected devices and each one speaks its own language, not sharing the same protocols with other devices. This leads to an increasing number of apps to control each of those devices. Every time you purchase connected products, you'll be required to have the exclusive product app, and, in the near future, where it is predicted that more devices will be connected to the Internet than people, this is indeed a problem, which is known as the basket of remotes. Many solutions have been appearing for this problem. Some of them focus on creating common communication standards between the devices or even creating their own protocol such as the Intel Common Connectivity Framework (CCF). A different approach consists in predicting the device's interactions, where collected data is used to predict and trigger actions on the specific devices. An example using this approach is Muzzley. It not only supports a common way to speak with the devices, but also learns from the users' interaction, allowing them to control all their devices from a common app, and on collecting usage data, it can predict users' actions and even make different devices work together. In this article, we will start by understanding what Muzzley is and how we can integrate with it. We will then do some development to allow you to control your own building's entrance door. For this purpose, we will use Galileo as a bridge to communicate with a relay and the Muzzley cloud, allowing you to control the door from a common mobile app and from anywhere as long as there is Internet access. Wiring the circuit In this article, we'll be using a real home AC inter-communicator with a building entrance door unlock button and this will require you to do some homework. This integration will require you to open your inter-communicator and adjust the inner circuit, so be aware that there are always risks of damaging it. If you don't want to use a real inter-communicator, you can replace it by an LED or even by the buzzer module. If you want to use a real device, you can use a DC inter-communicator, but in this guide, we'll only be explaining how to do the wiring using an AC inter-communicator. The first thing you have to do is to take a look at the device manual and check whether it works with AC current, and the voltage it requires. If you can't locate your product manual, search for it online. In this article, we'll be using the solid state relay. This relay accepts a voltage range from 24 V up to 380 V AC, and your inter-communicator should also work in this voltage range. You'll also need some electrical wires and electrical wires junctions: Wire junctions and the solid state relay This equipment will be used to adapt the door unlocking circuit to allow it to be controlled from the Galileo board using a relay. The main idea is to use a relay to close the door opener circuit, resulting in the door being unlocked. This can be accomplished by joining the inter-communicator switch wires with the relay wires. Use some wire and wire junctions to do it, as displayed in the following image: Wiring the circuit The building/house AC circuit is represented in yellow, and S1 and S2 represent the inter-communicator switch (button). On pressing the button, we will also be closing this circuit, and the door will be unlocked. This way, the lock can be controlled both ways, using the original button and the relay. Before starting to wire the circuit, make sure that the inter-communicator circuit is powered off. If you can't switch it off, you can always turn off your house electrical board for a couple of minutes. Make sure that it is powered off by pressing the unlock button and trying to open the door. If you are not sure of what you must do or don't feel comfortable doing it, ask for help from someone more experienced. Open your inter-communicator, locate the switch, and perform the changes displayed in the preceding image (you may have to do some soldering). The Intel Galileo board will then activate the relay using pin 13, where you should wire it to the relay's connector number 3, and the Galileo's ground (GND) should be connected to the relay's connector number 4. Beware that not all the inter-communicator circuits work the same way and although we try to provide a general way to do it, there're always the risk of damaging your device or being electrocuted. Do it at your own risk. Power on your inter-communicator circuit and check whether you can open the door by pressing the unlock door button. If you prefer not using the inter-communicator with the relay, you can always replace it with a buzzer or an LED to simulate the door opening. Also, since the relay is connected to Galileo's pin 13, with the same relay code, you'll have visual feedback from the Galileo's onboard LED. The Muzzley IoT ecosystem Muzzley is an Internet of Things ecosystem that is composed of connected devices, mobile apps, and cloud-based services. Devices can be integrated with Muzzley through the device cloud or the device itself: It offers device control, a rules system, and a machine learning system that predicts and suggests actions, based on the device usage. The mobile app is available for Android, iOS, and Windows phone. It can pack all your Internet-connected devices in to a single common app, allowing them to be controlled together, and to work with other devices that are available in real-world stores or even other homemade connected devices, like the one we will create in this article. Muzzley is known for being one of the first generation platforms with the ability to predict a users' actions by learning from the user's interaction with their own devices. Human behavior is mostly unpredictable, but for convenience, people end up creating routines in their daily lives. The interaction with home devices is an example where human behavior can be observed and learned by an automated system. Muzzley tries to take advantage of these behaviors by identifying the user's recurrent routines and making suggestions that could accelerate and simplify the interaction with the mobile app and devices. Devices that don't know of each others' existence get connected through the user behavior and may create synergies among themselves. When the user starts using the Muzzley app, the interaction is observed by a profiler agent that tries to acquire a behavioral network of the linked cause-effect events. When the frequency of these network associations becomes important enough, the profiler agent emits a suggestion for the user to act upon. For instance, if every time a user arrives home, he switches on the house lights, check the thermostat, and adjust the air conditioner accordingly, the profiler agent will emit a set of suggestions based on this. The cause of the suggestion is identified and shortcuts are offered for the effect-associated action. For instance, the user could receive in the Muzzley app the following suggestions: "You are arriving at a known location. Every time you arrive here, you switch on the «Entrance bulb». Would you like to do it now?"; or "You are arriving at a known location. The thermostat «Living room» says that the temperature is at 15 degrees Celsius. Would you like to set your «Living room» air conditioner to 21 degrees Celsius?" When it comes to security and privacy, Muzzley takes it seriously and all the collected data is used exclusively to analyze behaviors to help make your life easier. This is the system where we will be integrating our door unlocker. Creating a Muzzley app The first step is to own a Muzzley developer account. If you don't have one yet, you can obtain one by visiting https://muzzley.com/developers, clicking on the Sign up button, and submitting the displayed form. To create an app, click on the top menu option Apps and then Create app. Name your App Galileo Lock and if you want to, add a description to your project. As soon as you click on Submit, you'll see two buttons displayed, allowing you to select the integration type: Muzzley allows you to integrate through the product manufacturer cloud or directly with a device. In this example, we will be integrating directly with the device. To do so, click on Device to Cloud integration. Fill in the provider name as you wish and pick two image URLs to be used as the profile (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/Commercial1.jpg) and channel (for example, http://hub.packtpub.com/wp-content/uploads/2015/08/lock.png) images. We can select one of two available ways to add our device: it can be done using UPnP discovery or by inserting a custom serial number. Pick the device discovery option Serial number and ignore the fields Interface UUID and Email Access List; we will come back for them later. Save your changes by pressing the Save changes button. Lighting up the entrance door Now that we can unlock our door from anywhere using the mobile phone with an Internet connection, a nice thing to have is the entrance lights turn on when you open the building door using your Muzzley app. To do this, you can use the Muzzley workers to define rules to perform an action when the door is unlocked using the mobile app. To do this, you'll need to own one of the Muzzley-enabled smart bulbs such as Philips Hue, WeMo LED Lighting, Milight, Easybulb, or LIFX. You can find all the enabled devices in the app profiles selection list: If you don't have those specific lighting devices but have another type of connected device, search the available list to see whether it is supported. If it is, you can use that instead. Add your bulb channel to your account. You should now find it listed in your channels under the category Lighting. If you click on it, you'll be able to control the lights. To activate the trigger option in the lock profile we created previously, go to the Muzzley website and head back to the Profile Spec app, located inside App Details. Expand the property lock status by clicking on the arrow sign in the property #1 - Lock Status section and then expand the controlInterfaces section. Create a new control interface by clicking on the +controlInterface button. In the new controlInterface #1 section, we'll need to define the possible choices of label-values for this property when setting a rule. Feel free to insert an id, and in the control interface option, select the text-picker option. In the config field, we'll need to specify each of the available options, setting the display label and the real value that will be published. Insert the following JSON object: {"options":[{"value":"true","label":"Lock"}, {"value":"false","label":"Unlock"}]}. Now we need to create a trigger. In the profile spec, expand the trigger section. Create a new trigger by clicking on the +trigger button. Inside the newly created section, select the equals condition. Create an input by clicking on +input, insert the ID value, insert the ID of the control interface you have just created in the controlInterfaceId text field. Finally, add the [{"source":"selection.value","target":"data.value"}].path to map the data. Open your mobile app and click on the workers icon. Clicking on Create Worker will display the worker creation menu to you. Here, you'll be able to select a channel component property as a trigger to some other channel component property: Select the lock and select the Lock Status is equal to Unlock trigger. Save it and select the action button. In here, select the smart bulb you own and select the Status On option: After saving this rule, give it a try and use your mobile phone to unlock the door. The smart bulb should then turn on. With this, you can configure many things in your home even before you arrive there. In this specific scenario, we used our door locker as a trigger to accomplish an action on a lightbulb. If you want, you can do the opposite and open the door when a lightbulb lights up a specific color for instance. To do it, similar to how you configured your device trigger, you just have to set up the action options in your device profile page. Summary Everyday objects that surround us are being transformed into information ecosystems and the way we interact with them is slowly changing. Although IoT is growing up fast, it is nowadays in an early stage, and many issues must be solved in order to make it successfully scalable. By 2020, it is estimated that there will be more than 25 billion devices connected to the Internet. This fast growth without security regulations and deep security studies are leading to major concerns regarding the two biggest IoT challenges—security and privacy. Devices in our home that are remotely controllable or even personal data information getting into the wrong hands could be the recipe for a disaster. In this article you have learned the basic steps in wiring the circuit of your Galileo board, creating a Muzzley app, and lighting up the entrance door of your building through your Muzzley app, by using Intel Galileo board as a bridge to communicate with Muzzley cloud. Resources for Article: Further resources on this subject: Getting Started with Intel Galileo [article] Getting the current weather forecast [article] Controlling DC motors using a shield [article]
Read more
  • 0
  • 0
  • 8991

article-image-common-performance-issues
Packt
19 Jun 2014
16 min read
Save for later

Common performance issues

Packt
19 Jun 2014
16 min read
(For more resources related to this topic, see here.) Threading performance issues Threading performance issues are the issues related to concurrency, as follows: Lack of threading or excessive threading Threads blocking up to starvation (usually from competing on shared resources) Deadlock until the complete application hangs (threads waiting for each other) Memory performance issues Memory performance issues are the issues that are related to application memory management, as follows: Memory leakage: This issue is an explicit leakage or implicit leakage as seen in improper hashing Improper caching: This issue is due to over caching, inadequate size of the object, or missing essential caching Insufficient memory allocation: This issue is due to missing JVM memory tuning Algorithmic performance issues Implementing the application logic requires two important parameters that are related to each other; correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows: Costive algorithmic logic Unnecessary logic Work as designed performance issues The work as designed performance issue is a group of issues related to the application design. The application behaves exactly as designed but if the design has issues, it will lead to performance issues. Some examples of performance issues are as follows: Using synchronous when asynchronous should be used Neglecting remoteness, that is, using remote calls as if they are local calls Improper loading technique, that is, eager versus lazy loading techniques Selection of the size of the object Excessive serialization layers Web services granularity Too much synchronization Non-scalable architecture, especially in the integration layer or middleware Saturated hardware on a shared infrastructure Interfacing performance issues Whenever the application is dealing with resources, we may face the following interfacing issues that could impact our application performance: Using an old driver/library Missing frequent database housekeeping Database issues, such as, missing database indexes Low performing JMS or integration service bus Logging issues (excessive logging or not following the best practices while logging) Network component issues, that is, load balancer, proxy, firewall, and so on Miscellaneous performance issues Miscellaneous performance issues include different performance issues, as follows: Inconsistent performance of application components, for example, having slow components can cause the whole application to slow down Introduced performance issues to delay the processing speed Improper configuration tuning of different components, for example, JVM, application server, and so on Application-specific performance issues, such as excessive validations, apply many business rules, and so on Fake performance issues Fake performance issues could be a temporary issue or not even an issue. The famous examples are as follows: Networking temporary issues Scheduled running jobs (detected from the associated pattern) Software automatic updates (it must be disabled in production) Non-reproducible issues In the following sections, we will go through some of the listed issues. Threading performance issues Multithreading has the advantage of maximizing the hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application. For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources or they will be blocked, waiting for each other. We will discuss some of the common threading issues in the next section. Blocking threads A common issue where threads are blocked is waiting to obtain the monitor(s) of certain shared resources (objects), that is, holding by other threads. If most of the application server threads are consumed in a certain blocked status, the application becomes gradually unresponsive to user requests. In the Weblogic application server, if a thread keeps executing for more than a configurable period of time (not idle), it gets promoted to the Stuck thread. The more the threads are in the stuck status, the more the server status becomes critical. Configuring the stuck thread parameters is part of the Weblogic performance tuning. Performance symptoms The following symptoms are the performance symptoms that usually appear in cases of thread blocking: Slow application response (increased single request latency and pending user requests) Application server logs might show some stuck threads. The server's healthy status becomes critical on monitoring tools (application server console or different monitoring tools) Frequent application server restarts either manually or automatically Thread dump shows a lot of threads in the blocked status waiting for different resources Application profiling shows a lot of thread blocking An example of thread blocking To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes for execution by adding the following additional lines: long start= new Date().getTime(); .. .. long duration= new Date().getTime()-start; System.err.println("total time = "+duration); Now, try to execute the code with a different number of the thread pool size. We can try using the thread pool size as 50 and 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than 50 threads! Let's now compare the NetBeans profiling results of both the executions to understand the reason behind this unexpected difference. The following screenshot shows the profiling of 50 threads; we can see a lot of blocking for the monitor in the column and the percentage of Monitor to the left waiting around at 75 percent: To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor and check all the available options, and then click on Run. The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, less threads compete on these resources: Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results. Another factor that also affects the selection of the pool size, especially when the thread execution takes long time, is the context switching overhead. This overhead requires the selection of the optimal pool size, usually related to the number of available processors for our application. Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters). The context switching includes suspension of the current executing process, storing its current data, picking up the next process for execution according to its priority, and restoring its data. Although it's supported on the hardware level and is faster, most operating systems do this on the level of software context switching to improve the performance. The main reason behind this is the ability of the software context switching to selectively choose the required registers to save. Thread deadlock When many threads hold the monitor for objects that they need, this will result in a deadlock unless the implementation uses the new explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held. The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot: Usually, this happens if the order of obtaining the locks is not planned. For example, if we need to have a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest or the largest bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need to have a much more mature way to handle this by using the Lock interface or some other technique. Memory performance issues In spite of all this great effort put into the allocated and free memory in an optimized way, we still see memory issues in Java Enterprise applications mainly due to the way people are dealing with memory in these applications. We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching. Memory leakage Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly the design/coding issues where the object is no longer required but it remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to object size and involved scenarios), it may lead to an out of memory error. The most common example of memory leakage is adding objects to the static collections (or an instance collection of long living objects, such as a servlet) and forgetting to clean collections totally or partially. Performance symptoms The following symptoms are some of the expected performance symptoms during a memory leakage in our application: The application uses heap memory increased by time The response slows down gradually due to memory congestion OutOfMemoryError occurs frequently in the logs and sometimes an application server restart is required Aggressive execution of garbage collection activities Heap dump shows a lot of objects retained (from the leakage types) A sudden increase of memory paging as reported by the operating system monitoring tools An example of memory leakage We have a sample application ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage. The following screenshot shows the product catalog browser page: One of the bad issues is the usage of the servlet instance (or static members), as it causes a lot of issues in multiple threads and has a common location for unnoticed memory leakages. We have added the following instance variable as a leakage location: private final HashMap<String, HashMap> cachingAllUsersCollection = new HashMap(); We will add some collections to the preceding code to cause memory leakage. We also used the caching in the session scope, which causes implicit leakage. The session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch. Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should only be restricted to the user-specific data. Also, forgetting to remove data that is not required from a session makes the memory utilization worse. Refer to the following code: @Stateful public class CacheSessionBean Instead of using a singleton class here or stateless bean with a static member, we used the Stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using View Objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on. It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti. We can use our knowledge in Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and details of adding some products to the basket: Executing the test plan using many concurrent users over many iterations will show the bad behavior of our application, where the used memory is increased by time. There is no justification as the catalog is the same for all users and there's no specific user data, except for the IDs of the selected products. Actually, it needs to be saved inside the user session, which won't take any remarkable memory space. In our example, we intend to save a lot of objects in the session, implement a wrong session level, cache, and implement meaningless servlet level caching. All this will contribute to memory leakage. This gradual increase in the memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!): Improper data caching Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues. The main concerns in the cache implementation are caching refresh rate, caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain the cached object's lock, and so on. Improper caching issue types The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections. No caching (disabled caching) Disabled caching will definitely cause a big load over the interfacing resources (for example, database) by hitting it in with almost every interaction. This should be avoided while designing an enterprise application; otherwise; the application won't be usable. Fortunately, this has less impact than using wrong caching implementation! Most of the application components such as database, JPA, and application servers already have an out-of-the-box caching support. Too small caching size Too small caching size is a common performance issue, where the cache size is initially determined but doesn't get reviewed with the increase of the application data. The cache sizing is affected by many factors such as the memory size. If it allows more caching and the type of the data, lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required under a very strict locking mechanism. Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on. As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. Also, the proper replacement policy contributes here; if we are working—as in our example—on an online product catalog, we may use the least recently used policy so all the old products will be removed, which makes sense as the users usually look for the new products. Monitoring of the caching utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the missed cache ratio is high, a cache resizing is required. Missed cache hits are very costive as they hit the cache first and then the resource itself (for example, database) to get the required object, and then add this loaded object into the cache again by releasing another object (if the cache is 100 percent), according to the used cache replacement policy. Too big caching size Too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some memory. This will increase the garbage collection pause time and decrease the cache throughput. If the cache throughput is decreased, the latency to get objects from the cache will increase causing the cache retrieval cost to be high to the level it might be slower than hitting the actual resources (for example, database). Using the wrong caching policy Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it. Performance symptoms According to the cache issue type and different cache configurations, we will see the following symptoms: Decreased cache hit rate (and increased cache missed ratio) Increased cache loading because of the improper size Increased cache latency with a huge caching size Spiky pattern in the performance testing response time, in case the cache size is not correct, causes continuous invalidation and reloading of the cached objects An example of improper caching techniques In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, global cache is wrong, local cache is improper, and no cache invalidation is implemented. So, we can have stale objects inside the cache. Cache invalidation is the process of refreshing or updating the existing object inside the cache or simply removing it from the cache. So in the next load, it reflects its recent values. This is to keep the cached objects always updated. Cache hit rate is the rate or ratio in which cache hits match (finds) the required cached object. It is the main measure for cache effectiveness together with the retrieval cost. Cache miss rate is the rate or ratio at which the cache hits the required object that is not found in the cache. Last access time is the timestamp of the last access (successful hit) to the cached objects. Caching replacement policies or algorithms are algorithms implemented by a cache to replace the existing cached objects with other new objects when there are no rooms available for any additional objects. This follows missed cache hits for these objects. Some examples of these policies are as follows: First-in-first-out (FIFO): In this policy, the cached object is aged and the oldest object is removed in favor of the new added ones. Least frequently used (LFU): In this policy, the cache picks the less frequently used object to free the memory, which means the cache will record statistics against each cached object. Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information like the last access time of all cached objects. Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits the application where items are no longer needed after the access, such as used exam vouchers. Aging policy: Every object in the cache will have an age limit, and once it exceeds this limit, it will be removed from the cache in the simple type. In the advanced type, it will also consider the invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on. It is important for us to understand that caching is not our magic bullet and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to real application needs.
Read more
  • 0
  • 0
  • 8988

article-image-faqs-backtrack-4
Packt
26 May 2011
8 min read
Save for later

FAQs on BackTrack 4

Packt
26 May 2011
8 min read
BackTrack 4: Assuring Security by Penetration Testing Master the art of penetration testing with BackTrack         Q: Which version of Backtrack is to be chosen? A: On the BackTrack website (http://www.backtrack-linux.org/downloads/) or using third-party mirrors like http://mirrors.rit.edu/backtrack/ OR ftp://mirror.switch.ch/mirror/backtrack/), you will find two different formats of BackTrack version 4 (recently BackTrack 5 is out and hence you may not find version 4 on the official site but the mirror sites like http://mirrors.rit.edu/backtrack/ OR ftp://mirror.switch.ch/mirror/backtrack/ still provide it). One is formatted in ISO image file. You can use this file format if you want to burn it to a DVD, USB, Memory Cards (SSD, SDHC, SDXC, etc) or want to install BackTrack directly to your machine. The second file format is VMWare image. If you want to use BackTrack in a virtual environment, you might want to use this image to speed up the installation and configuration.   Q: What is Portable BackTrack? A: You can also install BackTrack to a USB flash disk, we call this method Portable BackTrack. After you install it to the USB flash disk, you can easily boot up into BackTrack from any machine provided with USB port. The key advantage of this method compared to the Live DVD is that you can permanently save changes to the USB flash disk. When compared to the hard disk installation, this method is more portable and convenient. To create a portable BackTrack, you can use several tools including UNetbootin (http://unetbootin.sourceforge.net), LinuxLive USB Creator (http://www.linuxliveusb.com) and LiveUSB MultiBoot (http://liveusb.info/dotclear/). These tools are available for Windows, Linux/UNIX, and Mac operating system.   Q: How to install BackTrack in a dual-boot environment? A: One of the resources that describe how to install BackTrack with other operating systems such as Windows XP can be found at: http://www.backtrack-linux.org/tutorials/dual-boot-install/.   Q: What types of penetration testing tools are available under Backtrack 4? A: BackTrack 4 comes with number of security tools that can be used during the penetration testing process. These are categorized into the following: Information gathering: This category contains tools that can be used to collect information regarding target DNS, routing, e-mail addresses, websites, mail servers, and so on. This information is usually gathered from the publicly available resources such as Internet, without touching the target environment. Network mapping: This category contains tools that can be used to assess the live status of the target host, fingerprint the operating system and, probe and map the applications/services through various port scanning techniques. Vulnerability identification: In this category you will find different set of tools to scan for vulnerabilities in various IT technologies. It also contains tools to carry out manual and automated fuzzy testing and analyzing SMB and SNMP protocols. Web application analysis: This category contains tools that can be used to assess the security of web servers and web applications. Radio network analysis: To audit wireless networks, Bluetooth and radio-frequency identification (RFID) technologies, you can use the tools in this category. Penetration: This category contains tools that can be used to exploit the vulnerabilities found in the target environment. Privilege escalation: After exploiting the vulnerabilities and gaining access to the target system, you can use the tools in this category to escalate your privileges to the highest level. Maintaining access: Tools in this category will be able to help you in maintaining access to the target machine. Note that, you might need to escalate your privileges before attempting to install any of these tools on the compromised host. Voice over IP (VOIP): In order to analyze the security of VOIP technology you can utilize the tools in this category. BackTrack 4 also contains tools that can be used for: Digital forensics: In this category you can find several tools that can be used to perform digital forensics and investigation, such as acquiring hard disk image, carving files, and analyzing disk archive. Some practical forensic procedures require you to mount the hard drive in question and swap the files in read-only mode to preserve evidence integrity. Reverse engineering: This category contains tools that can be used to debug, decompile and disassemble the executable file.   Q: Do I have to install additional tools with BackTrack 4? A: Although BackTrack 4 comes preloaded with so many security tools, however there are situations where you may need to add additional tools or packages because: It is not included with the default BackTrack 4 installation. You want to have the latest version of a particular tool which is not available in the repository. Our first suggestion is to try search the package in the software repository. If you find the package in the repository, please use that package, but if you can't find it, then you can get the software package from the author's website and install it by yourself. However, the prior method is highly recommended to avoid any installation and configuration conflicts. You can search for tools in the BackTrack repository using the apt-cache search command. However, if you can't find the package in the repository and you are sure that the package will not cause any problems later on, you can install the package by yourself.   Q: Why do we use the WebSecurify tool? A: WebSecurify is a web security testing environment that can be used to find vulnerabilities in the web applications. It can be used to check for the following vulnerabilities: SQL injection Local and remote file include Cross-site scripting Cross-site request forgery Information disclosure Session security flaws WebSecurify is readily available from the BackTrack repository. To install it you can use the apt-get command: # apt-get install websecurify You can search for tools in the BackTrack repository using the apt-cache search command.   Q: What are the types of penetration testing? A: Black-box testing: The black-box approach is also known as external testing. While applying this approach, the security auditor will be assessing the network infrastructure from a remote location and will not be aware of any internal technologies deployed by the concerning organization. By employing the number of real world hacker techniques and following through organized test phases, it may reveal some known and unknown set of vulnerabilities which may otherwise exist on the network. White-box testing: The white-box approach is also referred to as internal testing. An auditor involved in this kind of penetration testing process should be aware of all the internal and underlying technologies used by the target environment. Hence, it opens a wide gate for an auditor to view and critically evaluate the security vulnerabilities with minimum possible efforts. Grey-Box testing: The combination of both types of penetration testing provides a powerful insight for internal and external security viewpoints. This combination is known as Grey-Box testing. The key benefit in devising and practicing a gray-box approach is a set of advantages posed by both approaches mentioned earlier.   Q: What is the difference between vulnerability assessment and penetration testing? A: A key difference between vulnerability assessment and penetration testing is that penetration testing goes beyond the level of identifying vulnerabilities and hooks into the process of exploitation, privilege escalation, and maintaining access to the target system. On the other hand, vulnerability assessment provides a broad view of any existing flaws in the system without measuring the impact of these flaws to the system under consideration. Another major difference between both of these terms is that the penetration testing is considerably more intrusive than vulnerability assessment and aggressively applies all the technical methods to exploit the live production environment. However, the vulnerability assessment process carefully identifies and quantifies all the vulnerabilities in a non-invasive manner. Penetration testing is an expensive service when compared to vulnerability assessment   Q: Which class of vulnerability is considered to be the worst to resolve? A: "Design vulnerability" takes a developer to derive the specifications based on the security requirements and address its implementation securely. Thus, it takes more time and effort to resolve the issue when compared to other classes of vulnerability.   Q: Which OSSTMM test type follows the rules of Penetration Testing? A: Double blind testing   Q: What is an Application Layer? A: Layer-7 of the Open Systems Interconnection (OSI) model is known as the “Application Layer”. The key function of this model is to provide a standardized way of communication across heterogeneous networks. A model is divided into seven logical layers, namely, Physical, Data link, Network, Transport, Session, Presentation, and Application. The basic functionality of the application layer is to provide network services to user applications. More information on this can be obtained from: http://en.wikipedia.org/wiki/OSI_model.   Q: What are the steps for BackTrack testing methodology? A: The illustration below shows the BackTrack testing process. Summary In this article we took a look at some of the frequently asked questions on BackTrack 4 so that we can use it more efficiently Further resources on this subject: Tips and Tricks on BackTrack 4 [Article] BackTrack 4: Target Scoping [Article] BackTrack 4: Security with Penetration Testing Methodology [Article] Blocking Common Attacks using ModSecurity 2.5 [Article] Telecommunications and Network Security Concepts for CISSP Exam [Article] Preventing SQL Injection Attacks on your Joomla Websites [Article]
Read more
  • 0
  • 0
  • 8987
article-image-getting-started-inkscape
Packt
09 Mar 2011
9 min read
Save for later

Getting Started with Inkscape

Packt
09 Mar 2011
9 min read
Inkscape 0.48 Essentials for Web Designers Use the fascinating Inkscape graphics editor to create attractive layout designs, images, and icons for your website   Vector graphics Vector graphics are made up of paths. Each path is basically a line with a start and end point, curves, angles, and points that are calculated with a mathematical equation. These paths are not limited to being straight—they can be of any shape, size, and even encompass any number of curves. When you combine them, they create drawings, diagrams, and can even help create certain fonts. These characteristics make vector graphics very different than JPEGs, GIFs, or BMP images—all of which are considered rasterized or bitmap images made up of tiny squares which are called pixels or bits. If you magnify these images, you will see they are made up of a grid (bitmaps) and if you keep magnifying them, they will become blurry and grainy as each pixel with bitmap square's zoom level grows larger. Computer monitors also use pixels in a grid. However, they use millions of them so that when you look at a display, your eyes see a picture. In high-resolution monitors, the pixels are smaller and closer together to give a crisper image. How does this all relate to vector-based graphics? Vector-based graphics aren't made up of squares. Since they are based on paths, you can make them larger (by scaling) and the image quality stays the same, lines and edges stay clean, and the same images can be used on items as small as letterheads or business cards or blown up to be billboards or used in high definition animation sequences. This flexibility, often accompanied by smaller file sizes, makes vector graphics ideal—especially in the world of the Internet, varying computer displays, and hosting services for web spaces, which leads us nicely to Inkscape, a tool that can be invaluable for use in web design. What is Inkscape and how can it be used? Inkscape is a free, open source program developed by a group of volunteers under the GNU General Public License (GPL). You not only get a free download but can use the program to create items with it and freely distribute them, modify the program itself, and share that modified program with others. Inkscape uses Scalable Vector Graphics (SVG), a vector-based drawing language that uses some basic principles: A drawing can (and should) be scalable to any size without losing detail A drawing can use an unlimited number of smaller drawings used in any number of ways (and reused) and still be a part of a larger whole SVG and World Wide Web Consortium (W3C) web standards are built into Inkscape which give it a number of features including a rich body of XML (eXtensible Markup Language) format with complete descriptions and animations. Inkscape drawings can be reused in other SVG-compliant drawing programs and can adapt to different presentation methods. It has support across most web browsers (Firefox, Chrome, Opera, Safari, Internet Explorer). When you draw your objects (rectangles, circles, and so on.), arbitrary paths, and text in Inkscape, you also give them attributes such as color, gradient, or patterned fills. Inkscape automatically creates a web code (XML) for each of these objects and tags your images with this code. If need be, the graphics can then be transformed, cloned, and grouped in the code itself, Hyperlinks can even be added for use in web browsers, multi-lingual scripting (which isn't available in most commercial vector-based programs) and more—all within Inkscape or in a native programming language. It makes your vector graphics more versatile in the web space than a standard JPG or GIF graphic. There are still some limitations in the Inkscape program, even though it aims to be fully SVG compliant. For example, as of version 0.48 it still does not support animation or SVG fonts—though there are plans to add these capabilities into future versions. Installing Inkscape Inkscape is available for download for Windows, Macintosh, Linux, or Solaris operating systems. To run on the Mac OS X operating system, it typically runs under X11—an implementation of the X Window System software that makes it possible to run X11-based applications in Mac OS X. The X11 application has shipped with the Mac OS X since version 10.5. When you open Inkscape on a Mac, it will first open X11 and run Inkscape within that program. Loss of some shortcut key options will occur but all functionality is present using menus and toolbars. Let's briefly go over how to download and install Inkscape: Go to the official Inkscape website at: http://www.inkscape.org/ and download the appropriate version of the software for your computer. For the Mac OS X Leopard software, you will also need to download an additional application. It is the X11 application package 2.4.0 or greater from this website: http://xquartz.macosforge.org/trac/wiki/X112.4.0. Once downloaded, double-click the X11-2.4.0.DMG package first. It will open another folder with the X11 application installer. Double-click that icon to be prompted through an installation wizard. Double-click the downloaded Inkscape installation package to start the installation. For the Mac OS, a DMG file is downloaded. Double-click on it and then drag and drop the Inkscape package to the Application Folder. For any Windows device, an .EXE file is downloaded. Double-click that file to start and complete the installation. For Linux-based computers, there are a number of distributions available. Be sure to download and install the correct installation package for your system. Now find the Inkscape icon in the Application or Programs folders to open the program. Double-click the Inkscape icon and the program will automatically open to the main screen. The basics of the software When you open Inkscape for the first time, you'll see that the main screen and a new blank document opened are ready to go. If you are using a Macintosh computer, Inkscape opens within the X11 application and may take slightly longer to load. The Inkscape interface is based on the GNOME UI standard which uses visual cues and feedback for any icons. For example: Hovering your mouse over any icon displays a pop-up description of the icon. If an icon has a dark gray border, it is active and can be used. If an icon is grayed out, it is not currently available to use with the current selection. All icons that are in execution mode (or busy) are covered by a dark shadow. This signifies that the application is busy and won't respond to any edit request. There is a Notification Display on the main screen that displays dynamic help messages to key shortcuts and basic information on how to use the Inkscape software in its current state or based on what objects and tools are selected. Main screen basics Within the main screen there is the main menu, a command, snap and status bar, tool controls, and a palette bar. Main menu You will use the main menu bar the most when working on your projects. This is the central location to find every tool and menu item in the program—even those found in the visual-based toolbars below it on the screen. When you select a main menu item the Inkscape dialog displays the icon, a text description, and shortcut key combination for the feature. This can be helpful while first learning the program—as it provides you with easier and often faster ways to use your most commonly used functions of the program. Toolbars Let's take a general tour of the tool bars seen on this main screen. We'll pay close attention to the tools we'll use most frequently. If you don't like the location of any of the toolbars, you can also make them as floating windows on your screen. This lets you move them from their pre-defined locations and move them to a location of your liking. To move any of the toolbars, from their docking point on the left side, click and drag them out of the window. When you click the upper left button to close the toolbar window, it will be relocated back into the screen. Command bar This toolbar represents the common and most frequently used commands in Inkscape: As seen in the previous screenshot you can create a new document, open an existing one, save, print, cut, paste, zoom, add text, and much more. Hover your mouse over each icon for details on its function. By default, when you open Inkscape, this toolbar is on the right side of the main screen. Snap bar Also found vertically on the right side of the main screen, this toolbar is designed to help with the Snap to features of Inkscape. It lets you easily align items (snap to guides), force objects to align to paths (snap to paths), or snap to bounding boxes and edges. Tool controls This toolbar's options change depending on which tool you have selected in the toolbox (described in the next section). When you are creating objects, it provides you all the detailed options—size, position, angles, and attributes specific to the tool you are currently using. By default, it looks like the following screenshot: (Move the mouse over the image to enlarge.) You have options to select/deselect objects within a layer, rotate or mirror objects, adjust object locations on the canvas, and scaling options and much more. Use it to define object properties when they are selected on the canvas. Toolbox bar You'll use the tool box frequently. It contains all of the main tools for creating objects, selecting and modifying objects, and drawing. To select a tool, click the icon. If you double-click a tool, you can see that tool's preferences (and change them). If you are new to Inkscape, there are a couple of hints about creating and editing text. The Text tool (A icon) in the Tool Box shown above is the only way of creating new text on the canvas. The T icon shown in the Command Bar is used only while editing text that already exists on the canvas.  
Read more
  • 0
  • 0
  • 8985

article-image-apache-solr-php-integration
Packt
25 Nov 2013
7 min read
Save for later

Apache Solr PHP Integration

Packt
25 Nov 2013
7 min read
(For more resources related to this topic, see here.) We will be looking at installation on both Windows and Linux environments. We will be using the Solarium library for communication between Solr and PHP. This article will give a brief overview of the Solarium library and showcase some of the concepts and configuration options on Solr end for implementing certain features. Calling Solr using PHP code A ping query is used in Solr to check the status of the Solr server. The Solr URL for executing the ping query is http://localhost:8080/solr/collection1/admin/ping/?wt=json. Response of Solr ping query in browser We can use Curl to get the ping response from Solr via PHP code; a sample code for executing the previous ping query is as below $curl = curl_init("http://localhost:8080/solr/collection1/admin/ping/?wt=json"); curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); $output = curl_exec($curl); $data = json_decode($output, true); echo "Ping Status : ".$data["status"].PHP_EOL; Though Curl can be used to execute almost any query on Solr, but it is preferable to use a library which does the work for us. In our case we will be using Solarium. To execute the same query on Solr using the Solarium library the code is as follows. include_once("vendor/autoload.php"); $config = array("endpoint" => array("localhost" => array("host"=>"127.0.0.1", "port"=>"8080", "path"=>"/solr", "core"=>"collection1",) ) ); We have included the Solarium library in our code. And defined the connection parameters for our Solr server. Next we will need to create a Solarium client with the previous Solr configuration. And call the createPing() function to create the ping query. $client = new SolariumClient($config); $ping = $client->createPing(); Finally execute the ping query and get the result. $result = $client->ping($ping); $result->getStatus(); The output should be similar to the one shown below. Output of ping query using PHP Adding documents to Solr index To create a Solr index, we need to add documents to the Solr index using the command line, Solr web interface or our PHP program. But before we create a Solr index, we need to define the structure or the schema of the Solr index. Schema consists of fields and field types. It defines how each field will be treated and handled during indexing or during search. Let us see a small piece of code for adding documents to the Solr index using PHP and Solarium library. Create a solarium client. Create an instance of the update query. Create the document in PHP and finally add fields to the document. $client = new SolariumClient($config); $updateQuery = $client->createUpdate(); $doc1 = $updateQuery->createDocument(); $doc1->id = 112233445; $doc1->cat = 'book'; $doc1->name = 'A Feast For Crows'; $doc1->price = 8.99; $doc1->inStock = 'true'; $doc1->author = 'George R.R. Martin'; $doc1->series_t = '"A Song of Ice and Fire"'; Id field has been marked as unique in our schema. So we will have to keep different values for Id field for different documents that we add to Solr. Add documents to the update query followed by commit command. Finally execute the query. $updateQuery->addDocuments(array($doc1)); $updateQuery->addCommit(); $result = $client->update($updateQuery); Let us execute the code. php insertSolr.php After executing the code, a search for martin will give these documents in the result. http://localhost:8080/solr/collection1/select/?q=martin Document added to Solr index Executing search on Solr Index Documents added to the Solr index can be searched using the following piece of PHP code. $selectConfig = array( 'query' => 'cat:book AND author:Martin', 'start' => 3, 'rows' => 3,'fields' => array('id','name','price','author'), 'sort' => array('price' => 'asc') ); $query = $client->createSelect($selectConfig); $resultSet = $client->select($query); The above code creates a simple Solr query and searches for book in cat field and Martin in author field. The results are sorted in ascending order or price and fields returned are id, name of book, price and author of book. Pagination has been implemented as 3 results per page, so this query returns results for 2nd page starting from 3rd result. In addition to this simple select query, Solr also supports some advanced query modes known as dismax and edismax. With the help of these query modes, we can boost certain fields to give more importance to certain fields in our query. We can also use function queries to do some type of dynamic boosting based on values in fields. If no sorting is provided, the Solr results are sorted by the score of documents which are calculated based on the terms in the query and the matching terms in the documents in the index. Score is calculated for each document in the result set using two main factors - term frequency known as tf and inverse document frequency known as idf. In addition to these, Solr provides a way of narrowing down the results using filter queries. Also facets can be created based on fields in the index and it can be used by the end users to narrow down the results. Highlighting search results using PHP and Solr Solr can be used to highlight the fields returned in a search result based on the query. Here is a sample code for highlighting the results for search keyword harry. $query->setQuery('harry'); $query->setFields(array('id','name','author','series_t','score','last_modified')); Get the highlighting component from the query, set the fields to be highlighted and also set the html tags to be used for highlighting. $hl = $query->getHighlighting(); $hl->setFields('name,series_t'); $hl->setSimplePrefix('<strong>')->setSimplePostfix('</strong>'); Once the query is run and result set is received, we will need to retrieve the highlighted results from the result set. Here is the output for the highlighting code. Highlighted search results In addition to highlighting, Solr can be used to create a spelling suggester and a spell checker. Spelling suggester can be used to prompt input query to the end user as the user keeps on typing. Spell check can be used to prompt spelling corrections similar to 'did you mean' to the user. Solr can also be used for finding documents which are similar to a certain document based on words in certain fields. This functionality of Solr is known as more like this and is exposed via Solarium by the MoreLikeThis component. Solr also provides grouping of the result based on a particular query or a certain field. Scaling Solr Solr can be scaled to handle large number of search requests by using master slave architecture. Also if the index is huge, it can be sharded across multiple Solr instances and we can run a distributed search to get results for our query from all the sharded instances. Solarium provides a load balancing plug-in which can be used to load balance queries across master-slave architecture. Summary Solr provides an extensive list of features for implementing search. These features can be easily accessed in PHP using the Solarium library to build a full features search application which can be used to power search on any website. Resources for Article: Further resources on this subject: Apache Solr Configuration [Article] Getting Started with Apache Solr [Article] Making Big Data Work for Hadoop and Solr [Article]
Read more
  • 0
  • 0
  • 8984
Modal Close icon
Modal Close icon