Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7008 Articles
article-image-top-5-free-business-intelligence-tools
Amey Varangaonkar
02 Apr 2018
7 min read
Save for later

Top 5 free Business Intelligence tools

Amey Varangaonkar
02 Apr 2018
7 min read
There is no shortage of business intelligence tools available to modern businesses today. But they're not always easy on the pocket. Great functionality, stylish UI and ease of use always comes with a price tag. If you can afford it, great - if not, it's time to start thinking about open source and free business intelligence tools.  Free business intelligence tools can power your business Take a look at 5 of the best free or open source business intelligence tools. They're all as effective and powerful as anything you'd pay a premium for. You simply need to know what you're doing with them. BIRT BIRT (Business Intelligence and Reporting Tools) is an open-source project that offers industry-standard reporting and BI capabilities. It's available as both a desktop and web application. As a top-level project within the umbrella of the Eclipse Foundation, it's got a good pedigree that means you can be confident in its potency. BIRT is especially useful for businesses which have a working environment built around Java and Java EE, as its reporting and charting engines can integrate seamlessly with Java. From creating a range of reports to different types of charts and graphs, BIRT can also be used for advanced analytical tasks. You can learn about the impressive reporting capabilities that BIRT offers on its official features page. Pros: The BIRT platform is one of the most popularly used open source business intelligence tools across the world, with more than 12 million downloads and 2.5 million users across more than 150 countries. With a large community of users, getting started with this tool, or getting solutions to problems that you might come across should be easy. Cons: Some programming experience, preferably in Java, is required to make the best use of this tool. The complex functions and features may not be easy to grasp for absolute beginners. Jaspersoft Community Jaspersoft, formerly known as Panscopic, is one of the leading open source suites of tools for a variety of reporting and business intelligence tasks. It was acquired by TIBCO in 2014 in a deal worth approximately $185 million, and has grown in popularity ever since. Jaspersoft began with the promise of “saving the world from the oppression of complex, heavyweight business intelligence”, and the Community edition offers the following set of tools for easier reporting and analytics: JasperReports Server: This tool is used for designing standalone or embeddable reports which can be used across third party applications JasperReports Library: You can design pixel-perfect reports from different kinds of datasets Jaspersoft ETL: This is a popular warehousing tool powered by Talend for extracting useful insights from a variety of data sources Jaspersoft Studio: Eclipse-based report designer for JasperReports and JasperReports Server Visualize.js: A JavaScript-based framework to embed Jaspersoft applications Pros: Jaspersoft, like BIRT, has a large community of developers looking to actively solve any problem they might come across. More often than not, your queries are bound to be answered satisfactorily. Cons: Absolute beginners might struggle with the variety of offerings and their applications. The suite of Jaspersoft tools is more suited for someone with an intermediate programming experience. KNIME KNIME is a free, open-source data analytics and business intelligence company that offers a robust platform for reporting and data integration. 
Used commonly by data scientists and analysts, KNIME offers features for data mining, machine learning and data visualization in order to build effective end-to-end data pipelines. There are 2 major product offerings from KNIME: KNIME Analytics Platform KNIME Cloud Analytics Platform Considered to be one of the most established players in the Analytics and business intelligence market, KNIME has customers in over 60 countries worldwide. You can often find KNIME featured as a ‘Leader’ in the Gartner Magic Quadrant. It finds applications in a variety of enterprise use-cases, including pharma, CRM, finance, and more. Pros: If you want to leverage the power of predictive analytics and machine learning, KNIME offers you just the perfect environment to build industry-standard, accurate models. You can create a wide variety of visualizations including complex plots and charts, and perform complex ETL tasks with relative ease. Cons: KNIME is not suited for beginners. It's built instead for established professionals such as data scientists and analysts who want to conduct analyses quickly and efficiently. Tableau Public Tableau Public’s promise is simple - “Visualize and share your data in minutes - for free”. Tableau is one of the most popular business intelligence tools out there, rivalling the likes of Qlik, Spotfire, Power BI among others. Along with its enterprise edition which offers premium analytics, reporting and dashboarding features, Tableau also offers a freely available Public version for effective visual analytics. Last year, Tableau released an announcement that the interactive stories and reports published on the Tableau Public platform had received more than 1 billion views worldwide. Leading news organizations around the world, including BBC and CNBC, use Tableau Public for data visualization. Pros: Tableau Public is a very popular tool with a very large community of users. If you find yourself struggling to understand or execute any feature on this platform, there are ample number of solutions available on the community forums and also on forums such as Stack Overflow. The quality of visualizations is industry-standard, and you can publish them anywhere on the web without any hassle. Cons:It’s quite difficult to think of any drawback of using Tableau Public, to be honest. Having limited features as compared to the enterprise edition of Tableau is obviously a shortcoming, though. [box type="info" align="" class="" width=""]Editor’s tip: If you want to get started with Tableau Public and create interesting data stories using it, Creating Data Stories with Tableau Public is one book you do not want to miss out on![/box] Microsoft Power BI Microsoft Power BI is a paid, enterprise-ready offering by Microsoft to empower businesses to find intuitive data insights across a variety of data formats. Microsoft also offers a stripped-down version of Power BI with limited Business Intelligence capabilities called as Power BI Desktop. In this free version, users are offered up to 1 GB of data to work on, and the ability to create different kinds of visualizations on CSV data as well as Excel spreadsheets. The reports and visualizations built using Power BI Desktop can be viewed on mobile devices as well as on browsers, and can be updated on the go. Pros: Free, very easy to use. Power BI Desktop allows you to create intuitive visualizations and reports. For beginners looking to learn the basics of Business Intelligence and data visualization, this is a great tool to use. 
You can also work with any kind of data and connect it to the Power BI Desktop effortlessly. Cons: You don’t get the full suite of features on Power BI Desktop which make Power BI such an elegant and wonderful Business Intelligence tool. Also, new reports and dashboards cannot be created via the mobile platform. [box type="info" align="" class="" width=""]Editor’s Tip: If you want to get started with Microsoft Power BI, or want handy tips on using Power BI effectively, our Microsoft Power BI Cookbook will prove to be of great use! [/box] There are a few other free and open source tools which are quite effective and find a honorary mention in this article. We were absolutely spoilt for choices, and choosing the top 5 tools list among all these options was a lot of hard work! Some other tools which deserve a honorary mention are - Dataiku Free Edition, Pentaho Community Edition, QlikView Personal Edition, Rapidminer, among others. You may want to check them out as well. What do you think about this list? Are there any other free/open source business intelligence tools which should’ve made it into list?
Read more
  • 0
  • 0
  • 44955

article-image-how-to-handle-exceptions-and-synchronization-methods-with-selenium-webdriver-api
Amey Varangaonkar
02 Apr 2018
11 min read
Save for later

How to handle exceptions and synchronization methods with Selenium WebDriver API

Amey Varangaonkar
02 Apr 2018
11 min read
One of the areas often misunderstood, but is important in framework design is exception handling. Users must program into their tests and methods on how to handle exceptions that might occur in tests, including those that are thrown by applications themselves, and those that occur using the Selenium WebDriver API. In this article, we will see how to do that effectively. Let us look at different kinds of exceptions that users must account for: Implicit exceptions: Implicit exceptions are internal exceptions raised by the API method when a certain condition is not met, such as an illegal index of an array, null pointer, file not found, or something unexpected occurring at runtime. Explicit exceptions: Explicit exceptions are thrown by the user to transfer control out of the current method, and to another event handler when certain conditions are not met, such as an object is not found on the page, a test verification fails, or something expected as a known state is not met. In other words, the user is predicting that something will occur, and explicitly throws an exception if it does not. WebDriver exceptions: The Selenium WebDriver API has its own set of exceptions that can implicitly occur when elements are not found, elements are not visible, elements are not enabled or clickable, and so on. They are thrown by the WebDriver API method, but users can catch those exceptions and explicitly handle them in a predictable way. Try...catch blocks: In Java, exception handling can be completely controlled using a try...catch block of statements to transfer control to another method, so that the exit out of the current routine doesn't transfer control to the call handler up the chain, but rather, is handled in a predictable way before the exception is thrown. Let us examine the different ways of handling exceptions during automated testing. Implicit exception handling A simple example of Selenium WebDriver implicit exception handling can be described as follows: Define an element on a page Create a method to retrieve the text from the element on the page In the signature of the method, add throws Exception Do not handle a specific exception like ElementNotFoundException: // create a method to retrieve the text from an element on a page @FindBy(id="submit") protected M submit; public String getText(WebElement element) throws Exception { return element.getText(); } // use the method LoginPO.getText(submit); Now, when using an assertion method, TestNG will implicitly throw an exception if the condition is not met: Define an element on a page Create a method to verify the text of the element on a page Cast the expected and actual text to the TestNG's assertEquals method TestNG will throw an AssertionError TestNG engages the difference viewer to compare the result if it fails: // create a method to verify the text from an element on a page @FindBy(id="submit") protected M submit; public void verifyText(WebElement element, String expText) throws AssertionError { assertEquals(element.getText(), expText, "Verify Submit Button Text"); } // use the method LoginPO.verifyText(submit, "Sign Inx"); // throws AssertionError java.lang.AssertionError: Verify Text Label expected [ Sign Inx] but found [ Sign In] Expected : Sign Inx Actual : Sign In <Click to see difference> TestNG difference viewer When using the TestNG's assertEquals methods, a difference viewer will be engaged if the comparison fails. There will be a link in the stacktrace in the console to open it. 
Since it is an overloaded method, it can take a number of data types, such as String, Integer, Boolean, Arrays, Objects, and so on. The following screenshot displays the TestNG difference viewer: Explicit exception handling In cases where the user can predict when an error might occur in the application, they can check for that error and explicitly raise an exception if it is found. Take the login function of a browser or mobile application as an example. If the user credentials are incorrect, the app will throw an exception saying something like "username invalid, try again" or "password incorrect, please re-enter". The exception can be explicitly handled in a way that the actual error message can be thrown in the exception. Here is an example of the login method we wrote earlier with exception handling added to it: @FindBy(id="myApp_exception") protected M error; /** * login - method to login to app with error handling * * @param username * @param password * @throws Exception */ public void login(String username, String password) throws Exception { if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } } Try...catch exception handling Now, sometimes the user will want to trap an exception instead of throwing it, and perform some other action such as retry, reload page, cleanup dialogs, and so on. In cases like that, the user can use try...catch in Java to trap the exception. The action would be included in the try clause, and the user can decide what to do in the catch condition. Here is a simple example that uses the ExpectedConditions method to look for an element on a page, and only return true or false if it is found. No exception will be raised:  /** * elementExists - wrapper around the WebDriverWait method to * return true or false * * @param element * @param timer * @throws Exception */ public static boolean elementExists(WebElement element, int timer) { try { WebDriver driver = CreateDriver.getInstance().getCurrentDriver(); WebDriverWait exists = new WebDriverWait(driver, timer); exists.until(ExpectedConditions.refreshed( ExpectedConditions.visibilityOf(element))); return true; } catch (StaleElementReferenceException | TimeoutException | NoSuchElementException e) { return false; } } In cases where the element is not found on the page, the Selenium WebDriver will return a specific exception such as ElementNotFoundException. If the element is not visible on the page, it will return ElementNotVisibleException, and so on. Users can catch those specific exceptions in a try...catch...finally block, and do something specific for each type (reload page, re-cache element, and so on): try { .... } catch(ElementNotFoundException e) { // do something } catch(ElementNotVisibleException f) { // do something else } finally { // cleanup } Synchronizing methods Earlier, the login method was introduced, and in that method, we will now call one of the synchronization methods waitFor(title, timer) that we created in the utility classes. This method will wait for the login page to appear with the title element as defined. 
So, in essence, after the URL is loaded, the login method is called, and it synchronizes against a predefined page title. If the waitFor method doesn't find it, it will throw an exception, and the login will not be attempted. It's important to predict and synchronize the page object methods so that they do not get out of "sync" with the application and continue executing when a state has not been reached during the test. This becomes a tedious process during the development of the page object methods, but pays big dividends in the long run when making those methods "robust". Also, users do not have to synchronize before accessing each element. Usually, you would synchronize against the last control rendered on a page when navigating between them. In the same login method, it's not enough to just check and wait for the login page title to appear before logging in; users must also wait for the next page to render, that being the home page of the application. So, finally, in the login method we just built, another waitFor will be added: public void login(String username, String password) throws Exception { BrowserUtils.waitFor(getPageTitle(), getElementWait()); if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } // wait for the home page to appear BrowserUtils.waitFor(new MyAppHomePO<WebElement>().getPageTitle(), getElementWait()); } Table classes When building the page object classes, there will frequently be components on a page that are common to multiple pages, but not all pages, and rather than including the similar locators and methods in each class, users can build a common class for just that portion of the page. HTML tables are a typical example of a common component that can be classed. So, what users can do is create a generic class for the common table rows and columns, extend the subclasses that have a table with this new class, and pass in the dynamic ID or locator to the constructor when extending the subclass with that table class. 
Let's take a look at how this is done: Create a new page object class for the table component in the application, but do not derive it from the base class in the framework In the constructor of the new class, add a parameter of the type WebElement, requiring users to pass in the static element defined in each subclass for that specific table Create generic methods to get the row count, column count, row data, and cell data for the table In each subclass that inherits these methods, implement them for each page, varying the starting row number and/or column header rows if <th> is used rather than <tr> When the methods are called on each table, it will identify them using the WebElement passed into the constructor: /** * WebTable Page Object Class * * @author Name */ public class WebTablePO { private WebElement table; /** constructor * * @param table * @throws Exception */ public WebTablePO(WebElement table) throws Exception { setTable(table); } /** * setTable - method to set the table on the page * * @param table * @throws Exception */ public void setTable(WebElement table) throws Exception { this.table = table; } /** * getTable - method to get the table on the page * * @return WebElement * @throws Exception */ public WebElement getTable() throws Exception { return this.table; } .... Now, the structure of the class is simple so far, so let's add in some common "generic" methods that can be inherited and extended by each subclass that extends the class: // Note: JavaDoc will be eliminated in these examples for simplicity sake public int getRowCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); return tableRows.size(); } public int getColumnCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(1); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public int getColumnCount(int index) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(index); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public String getRowData(int rowIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); return currentRow.getText(); } public String getCellData(int rowIndex, int colIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); List<WebElement> tableCols = currentRow.findElements(By.tagName("td")); WebElement cell = tableCols.get(colIndex - 1); return cell.getText(); } Finally, let's extend a subclass with the new WebTablePO class, and implement some of the methods: /** * Homepage Page Object Class * * @author Name */ public class MyHomepagePO<M extends WebElement> extends WebTablePO<M> { public MyHomepagePO(M table) throws Exception { super(table); } @FindBy(id = "my_table") protected M myTable; // table methods public int getTableRowCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowCount(); } public int getTableColumnCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(); } public int getTableColumnCount(int index) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(index); } public String getTableCellData(int row, int column) throws Exception { WebTablePO table = new WebTablePO(getTable()); return 
table.getCellData(row, column); } public String getTableRowData(int row) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowData(row).replace("\n", " "); } public void verifyTableRowData(String expRowText) { String actRowText = ""; int totalNumRows = getTableRowCount(); // parse each row until row data found for ( int i = 0; i < totalNumRows; i++ ) { if ( this.getTableRowData(i).contains(expRowText) ) { actRowText = this.getTableRowData(i); break; } } // verify the row data try { assertEquals(actRowText, expRowText, "Verify Row Data"); } catch (AssertionError e) { String error = "Row data '" + expRowText + "' Not found!"; throw new Exception(error); } } } We saw, how fairly effective it is to handle object class methods, especially when it comes to handling synchronization and exceptions. You read an excerpt from the book Selenium Framework Design in Data-Driven Testing by Carl Cocchiaro. The book will show you how to design your own automation testing framework without any hassle.
Read more
  • 0
  • 0
  • 22693

article-image-3-ways-to-deploy-a-qt-and-opencv-application
Gebin George
02 Apr 2018
16 min read
Save for later

3 ways to deploy a QT and OpenCV application

Gebin George
02 Apr 2018
16 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Computer Vision with OpenCV 3 and Qt5 written by Amin Ahmadi Tazehkandi.  This book covers how to build, test, and deploy Qt and OpenCV apps, either dynamically or statically.[/box] Today, we will learn three different methods to deploy a QT + OpenCV application. It is extremely important to provide the end users with an application package that contains everything it needs to be able to run on the target platform. And demand very little or no effort at all from the users in terms of taking care of the required dependencies. Achieving this kind of works-out-of-the-box condition for an application relies mostly on the type of the linking (dynamic or static) that is used to create an application, and also the specifications of the target operating system. Deploying using static linking Deploying an application statically means that your application will run on its own and it eliminates having to take care of almost all of the needed dependencies, since they are already inside the executable itself. It is enough to simply make sure you select the Release mode while building your application, as seen in the following screenshot: When your application is built in the Release mode, you can simply pick up the produced executable file and ship it to your users. If you try to deploy your application to Windows users, you might face an error similar to the following when your application is executed: The reason for this error is that on Windows, even when building your Qt application statically, you still need to make sure that Visual C++ Redistributables exist on the target system. This is required for C++ applications that are built by using Microsoft Visual C++, and the version of the required redistributables correspond to the Microsoft Visual Studio installed on your computer. In our case, the official title of the installer for these libraries is Visual C++ Redistributables for Visual Studio 2015, and it can be downloaded from the following link: https:/ / www. microsoft. com/en- us/ download/ details. aspx? id= 48145. It is a common practice to include the redistributables installer inside the installer for our application and perform a silent installation of them if they are not already installed. This process happens with most of the applications you use on your Windows PCs, most of the time, without you even noticing it. We already quite briefly talked about the advantages (fewer files to deploy) and disadvantages (bigger executable size) of static linking. But when it is meant in the context of deployment, there are some more complexities that need to be considered. So, here is another (more complete) list of disadvantages, when using static linking to deploy your applications: The building takes more time and the executable size gets bigger and bigger. You can't mix static and shared (dynamic) Qt libraries, which means you can't use the power of plugins and extending your application without building everything from scratch. Static linking, in a sense, means hiding the libraries used to build an application. Unfortunately, this option is not offered with all libraries, and failing to comply with it can lead to licensing issues with your application. This complexity arises partly because of the fact that Qt Framework uses some third-party libraries that do not offer the same set of licensing options as Qt itself. 
Talking about licensing issues is not a discussion suitable for this book, so we'll suffice with mentioning that you must be careful when you plan to create commercial applications using static linking of Qt libraries. For a detailed list of licenses used by third-party libraries within Qt, you can always refer to the Licenses Used in Qt web page from the following link: http://doc.qt.io/qt-5/ licenses-used-in-qt.html Static linking, even with all of its disadvantages that we just mentioned, is still an option, and a good one in some cases, provided that you can comply with the licensing options of the Qt Framework. For instance, in Linux operating systems where creating an installer for our application requires some extra work and care, static linking can help extremely reduce the effort needed to deploy applications (merely a copy and paste). So, the final decision of whether to use static linking or not is mostly on you and how you plan to deploy your application. Making this important decision will be much easier by the end of this chapter, when you have an overview of the possible linking and deployment methods. Deploying using dynamic linking When you deploy an application built with Qt and OpenCV using shared libraries (or dynamic linking), you need to make sure that the executable of your application is able to reach the runtime libraries of Qt and OpenCV, in order to load and use them. This reachability or visibility of runtime libraries can have different meanings depending on the operating system. For instance, on Windows, you need to copy the runtime libraries to the same folder where your application executable resides, or put them in a folder that is appended to the PATH environment value. Qt Framework offers command-line tools to simplify the deployment of Qt applications on Windows and macOS. As mentioned before, the first thing you need to do is to make sure your application is built in the Release mode, and not Debug mode. Then, if you are on Windows, first copy the executable (let us assume it is called app.exe) from the build folder into a separate folder (which we will refer to as deploy_path) and execute the following commands using a command-line instance: cd deploy_path QT_PATHbinwindeployqt app.exe The windeployqt tool is a deployment helper tool that simplifies the process of copying the required Qt runtime libraries into the same folder as the application executable. Itsimply takes an executable as a parameter and after determining the modules used to create it, copies all required runtime libraries and any additional required dependencies, such as Qt plugins, translations, and so on. This takes care of all the required Qt runtime libraries, but we still need to take care of OpenCV runtime libraries. If you followed all of the steps in Chapter 1, Introduction to OpenCV and Qt, for building OpenCV libraries dynamically, then you only need to manually copy the opencv_world330.dll and opencv_ffmpeg330.dll files from OpenCV installation folder (inside the x86vc14bin folder) into the same folder where your application executable resides. We didn't really go into the benefits of turning on the BUILD_opencv_world option when we built OpenCV in the early chapters of the book; however, it should be clear now that this simplifies the deployment and usage of the OpenCV libraries, by requiring only a single entry for LIBS in the *.pro file and manually copying only a single file (not counting the ffmpeg library) when deploying OpenCV applications. 
It should be also noted that this method has the disadvantage of copying all OpenCV codes (in a single library) along your application even when you do not need or use all of its modules in a project. Also note that on Windows, as mentioned in the Deploying using static linking section, you still need to similarly provide the end users of your application with Microsoft Visual C++ Redistributables. On a macOS operating system, it is also possible to easily deploy applications written using Qt Framework. For this reason, you can use the macdeployqt command-line tool provided by Qt. Similar to windeployqt, which accepts a Windows executable and fills the same folder with the required libraries, macdeployqt accepts a macOS application bundle and makes it deployable by copying all of the required Qt runtimes as private frameworks inside the bundle itself. Here is an example: cd deploy_path QT_PATH/bin/macdeployqt my_app_bundle Optionally, you can also provide an additional -dmg parameter, which leads to the creation of a macOS *.dmg (disk image) file. As for the deployment of OpenCV libraries when dynamic linking is used, you can create an installer using Qt Installer Framework (which we will learn about in the next section), a third-party provider, or a script that makes sure the required runtime libraries are copied to their required folders. This is because of the fact that simply copying your runtime libraries (whether it is OpenCV or anything else) to the same folder as the application executable does not help with making them visible to an application on macOS. The same also applies to the Linux operating system, where unfortunately even a tool for deploying Qt runtime libraries does not exist (at least for the moment), so we also need to take care of Qt libraries in addition to OpenCV libraries, either by using a trusted third-party provider (which you can search for online) or by using the cross-platform installer provided by Qt itself, combined with some scripting to make sure everything is in place when our application is executed. Deploy using Qt Installer Framework Qt Installer Framework allows you to create cross-platform installers of your Qt applications for Windows, macOS, and Linux operating systems. It allows for creating standard installer wizards where the user is taken through consecutive dialogs that provide all the necessary information, and finally display the progress for when the application is being installed and so on, similar to most of installations you have probably faced, and especially the installation of Qt Framework itself. Qt Installer Framework is based on Qt Framework itself but is provided as a different package and does not require Qt SDK (Qt Framework, Qt Creator, and so on) to be present on a computer. It is also possible to use Qt Installer Framework in order to create installer packages for any application, not just Qt applications. In this section, we are going to learn how to create a basic installer using Qt Installer Framework, which takes care of installing your application on a target computer and copying all the necessary dependencies. The result will be a single executable installer file that you can put on a web server to be downloaded or provide it in a USB stick or CD, or any other media type. This example project will help you get started with working your way around the many great capabilities of Qt Installer Framework by yourself. You can use the following link to download and install the Qt Installer Framework. 
Make sure to simply download the latest version when you use this link, or any other source for downloading it. At the moment, the latest version is 3.0.2: https://download.qt.io/official_releases/qt-installer-framework After you have downloaded and installed Qt Installer Framework, you can start creating the required files that Qt Installer Framework needs in order to create an installer. You can do this by simply browsing to the Qt Installer Framework, and from the examples folder copying the tutorial folder, which is also a template in case you want to quickly rename and re-edit all of the files and create your installer quickly. We will go the other way and create them manually; first because we want to understand the structure of the required files and folders for the Qt Installer Framework, and second, because it is still quite easy and simple. Here are the required steps for creating an installer: Assuming that you have already finished developing your Qt and OpenCV application, you can start by creating a new folder that will contain the installer files. Let's assume this folder is called deploy. Create an XML file inside the deploy folder and name it config.xml. This XML file must contain the following: <?xml version="1.0" encoding="UTF-8"?> <Installer> <Name>Your application</Name> <Version>1.0.0</Version> <Title>Your application Installer</Title> <Publisher>Your vendor</Publisher> <StartMenuDir>Super App</StartMenuDir> <TargetDir>@HomeDir@/InstallationDirectory</TargetDir> </Installer> Make sure to replace the required XML fields in the preceding code with information relevant to your application and then save and close this file: Now, create a folder named packages inside the deploy folder. This folder will contain the individual packages that you want the user to be able to install, or make them mandatory or optional so that the user can review and decide what will be installed. In the case of simpler Windows applications that are written using Qt and OpenCV, usually it is enough to have just a single package that includes the required files to run your application, and even do silent installation of Microsoft Visual C++ Redistributables. But for more complex cases, and especially when you want to have more control over individual installable elements of your application, you can also go for two or more packages, or even sub-packages. This is done by using domain-like folder names for each package. Each package folder can have a name like com.vendor.product, where vendor and product are replaced by the developer name or company and the application. A subpackage (or sub-component) of a package can be identified by adding. subproduct to the name of the parent package. For instance, you can have the following folders inside the packages folder: com.vendor.product com.vendor.product.subproduct1 com.vendor.product.subproduct2 com.vendor.product.subproduct1.subsubproduct1 … This can go on for as many products (packages) and sub-products (sub-packages) as we like. For our example case, let's create a single folder that contains our executable, since it describes it all and you can create additional packages by simply adding them to the packages folder. Let's name it something like com.amin.qtcvapp. Now, follow these required steps: Now, create two folders inside the new package folder that we created, the com.amin.qtcvapp folder. Rename them to data and meta. These two folders must exist inside all packages. Copy your application files inside the data folder. 
This folder will be extracted into the target folder exactly as it is (we will talk about setting the target folder of a package in the later steps). In case you are planning to create more than one package, then make sure to separate their data correctly and in a way that it makes sense. Of course, you won't be faced with any errors if you fail to do so, but the users of your application will probably be confused, for instance by skipping a package that should be installed at all times and ending up with an installed application that does not work. Now, switch to the meta folder and create the following two files inside that folder, and fill them with the codes provided for each one of them. The package.xml file should contain the following. There's no need to mention that you must fill the fields inside the XML with values relevant to your package: <?xml version="1.0" encoding="UTF-8"?> <Package> <DisplayName>The component</DisplayName> <Description>Install this component.</Description> <Version>1.0.0</Version> <ReleaseDate>1984-09-16</ReleaseDate> <Default>script</Default> <Script>installscript.qs</Script> </Package> The script in the previous XML file, which is probably the most important part of the creation of an installer, refers to a Qt Installer Script (*.qs file), which is named installerscript.qs and can be used to further customize the package, its target folder, and so on. So, let us create a file with the same name (installscript.qs) inside the meta folder, and use the following code inside it: function Component() { // initializations go here } Component.prototype.isDefault = function() { // select (true) or unselect (false) the component by default return true; } Component.prototype.createOperations = function() { try { // call the base create operations function component.createOperations(); } catch (e) { console.log(e); } } This is the most basic component script, which customizes our package (well, it only performs the default actions) and it can optionally be extended to change the target folder, create shortcuts in the Start menu or desktop (on Windows), and so on. It is a good idea to keep an eye on the Qt Installer Framework documentation and learn about its scripting to be able to create more powerful installers that can put all of the required dependencies of your app in place, and automatically. You can also browse through all of the examples inside the examples folder of the Qt Installer Framework and learn how to deal with different deployment cases. For instance, you can try to create individual packages for Qt and OpenCV dependencies and allow the users to deselect them, in case they already have the Qt runtime libraries on their computer. The last step is to use the binarycreator tool to create our single and standalone installer. Simply run the following command by using a Command Prompt (or Terminal) instance: binarycreator -p packages -c config.xml myinstaller The binarycreator is located inside the Qt Installer Framework bin folder. It requires two parameters that we have already prepared. -p must be followed by our packages folder and -c must be followed by the configuration file (or config.xml) file. After executing this command, you will get myinstaller (on Windows, you can append *.exe to it), which you can execute to install your application. This single file should contain all of the required files needed to run your application, and the rest is taken care of. 
You only need to provide a download link to this file, or provide it on a CD to your users. The following are the dialogs you will face in this default and most basic installer, which contains most of the usual dialogs you would expect when installing an application: If you go to the installation folder, you will notice that it contains a few more files than you put inside the data folder of your package. Those files are required by the installer to handle modifications and uninstall your application. For instance, the users of your application can easily uninstall your application by executing the maintenance tool executable, which would produce another simple and user-friendly dialog to handle the uninstall process: We saw how to deploy a QT + OpenCV applications using static linking, dynamic linking, and QT installer. If you found our post useful, do check out this book Computer Vision with OpenCV 3 and Qt5  to accentuate your OpenCV applications by developing them with Qt.  
Read more
  • 0
  • 0
  • 37237

article-image-3-best-practices-to-develop-effective-test-automation-with-selenium
Amey Varangaonkar
30 Mar 2018
5 min read
Save for later

3 best practices to develop effective test automation with Selenium

Amey Varangaonkar
30 Mar 2018
5 min read
In this article, we will look at some of the industry best practices and standards to use in order to develop and maintain effective test automation strategies with Selenium. 1. Naming Convention When developing the framework, it is important to establish some naming convention standards for each type of file created. In general, this is completely subjective. But it is important to establish them upfront so users can use the same file naming conventions for the same file types to avoid confusion later on, when there are many users building them. Here are a few suggestions: Utility classes: Utility classes don't use any prefix or suffix in their names, but do follow Java standards such as having the first letter of each word capitalized, and ending with .java extensions. (Acronyms used can be all caps). Examples include CreateDriver.java, Global_VARS.java, BrowserUtils.java, DataProvider_JSON.java, and so on. Page object classes: It is useful to be able to differentiate the page object classes from the utility classes. A good way to name them is FeaturePO.java, where PO stands for page object and is capitalized, along with the first letter of each word. End the name with a .java extension. Test classes: It is useful to be able to differentiate the test classes from the PO and utility classes. A good way to name them is FeatureTest.java, where Test stands for test class, and the first letter of each word is capitalized. End the name with a .java extension. Data files: Data files are obviously named with an extension for the type of file, such as .json, .csv, .xls, and so on. But, in the case of this framework, the files can be named the same as the corresponding test class, but without the word Test. For example, LoginCredsTest.java would have the data file LoginCreds.json. Setup classes: Usually, there is a common setup class for setup and teardown for all test classes, that can be named AUTSetup.java. So, as an example, GmailSetup.java would be the setup class for all test classes derived from it, and contains only TestNG annotated methods. Test methods: Most test methods in each test class are named using sequential numbering, followed by a feature and action. For example: tc001_gmailLoginCreds, tc002_gmailLoginPassword, and so on. Setup/teardown methods: The setup and teardown methods can be named according to the setup or teardown action they perform. The following naming conventions can be used in conjunction with the TestNG annotations: @BeforeSuite: The suiteSetup method @AfterSuite: The suiteTeardown method @BeforeClass: The classSetup method @AfterClass: The classTeardown method @BeforeMethod: The methodSetup method @AfterMethod: The methodTeardown method 2. Comments Although obvious and somewhat subjective, it is good practice to comment on code when it is not obvious why something is done, there is a complex routine, or there is a "kluge" added to work around a problem. In Java, there are two types of comments used, as well as a set of standards for JavaDoc. We will look at a couple of examples here: [box type="info" align="" class="" width=""]There is an Oracle article on using comments in Java located at http://www. oracle.com/ technetwork/java/codeconventions-141999.html#385[/box] Block comment: /* single line block comment */ code goes here… /* * multi-line block * comment */ code goes here... 
End-of-line comment: code goes here // end of line comment JavaDoc comments: /** * Description of the method * * @param arg1 to the method * @param arg2 to the method * return value returned from the method */ [box type="info" align="" class="" width=""]The Oracle documentation on using the JavaDoc tool is located at http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html. [/box] 3. Folder names and structures As the framework starts to evolve, there needs to be some organization around the folder structure in the IDE, along with a naming convention. The IntelliJ IDE uses modules to organize the repo, and under those modules, users can create the folder structures. It is common to also separate the page object and utility classes from the test classes. So, as an example, under the top-level folder src, create main/java/com/yourCo/page objects and test/java/com/yourCo/tests folders. From there, under each structure, users can create feature-based folders. Also, to retain a completely independent set of libraries for the Selenium driver and utility classes, create a separate module called something like Selenium3 with the same folder structures. This will allow users to use the same driver class and utilities for any additional modules that are added to the repo/framework. It is common to automate testing for more than one application, and this will allow the inclusion of the module in those additional modules. Here are a few suggestions regarding folder naming conventions: Name all the folders using lowercase names, so there won't be a mix-and-match of different standards. Name the page object class folders after the features they pertain to; for instance, login for the LoginPO.java, email for the GmailPO.java, and so on. Name the test class folders after the same features as the PO classes, but under the test folder. Then there can be a one-to-one correlation between the PO and test class folders. Store the common base classes under a common folder under main. Store the common setup classes under a common folder under test. Store all the utility classes for the AUT under a utils folder under main. Store all the suite files for the tests under a suites folder under test. Here is an example of a folder structure for the Selenium3 module. Of course, there are no test folders under this one: Here is an example of a folder structure for an AUT module showing the PO and test class Folders: You read an excerpt from the book Selenium Framework Design in Data-Driven Testing  written by Carl Cocchiaro. This book presents effective techniques for building data-driven test frameworks using Selenium WebDriver.
Read more
  • 0
  • 0
  • 41407

article-image-analyzing-moby-dick-frequency-distribution-nltk
Richard Gall
30 Mar 2018
2 min read
Save for later

Analyzing Moby Dick through frequency distribution with NLTK

Richard Gall
30 Mar 2018
2 min read
What is frequency distribution and why does it matter? In the context of natural language processing, frequency distribution is simply a tally of the number of times each unique word is used in a text. Recording the individual word counts of a text can better help us understand not only what topics are being discussed and what information is important but also how that information is being discussed as well. It's a useful method for better understanding language and different types of texts. This video tutorial has been taken from from Natural Language Processing with Python. Word frequency distribution is central to performing content analysis with NLP. Its applications are wide ranging. From understanding and characterizing an author’s writing style to analyzing the vocabulary of rappers, the technique is playing a large part in wider cultural conversations. It’s also used in psychological research in a number of ways to analyze how patients use language to form frameworks for thinking about themselves and the world. Trivial or serious, word frequency distribution is becoming more and more important in the world of research. Of course, manually creating such a word frequency distribution models would be time consuming and inconvenient for data scientists. Fortunately for us, NLTK, Python’s toolkit for natural language processing, makes life much easier. How to use NLTK for frequency distribution Take a look at how to use NLTK to create a frequency distribution for Herman Melville’s Moby Dick in the video tutorial above. In it, you'll find a step by step guide to performing an important data analysis task. Once you've done that, you can try it for yourself, or have a go at performing a similar analysis on another data set. Read Analyzing Textual information using the NLTK library. Learn more about natural language processing - read How to create a conversational assistant or chatbot using Python.
Read more
  • 0
  • 0
  • 9766

article-image-django-and-django-rest-frameworks-build-restful-app
Sugandha Lahoti
29 Mar 2018
12 min read
Save for later

Getting started with Django and Django REST frameworks to build a RESTful app

Sugandha Lahoti
29 Mar 2018
12 min read
In this article, we will learn how to install Django and Django REST framework in an isolated environment. We will also look at the Django folders, files, and configurations, and how to create an app with Django. We will also introduce various command-line and GUI tools that are use to interact with the RESTful Web Services. Installing Django and Django REST frameworks in an isolated environment First, run the following command to install the Django web framework: pip install django==1.11.5 The last lines of the output will indicate that the django package has been successfully installed. The process will also install the pytz package that provides world time zone definitions. Take into account that you may also see a notice to upgrade pip. The next lines show a sample of the four last lines of the output generated by a successful pip installation: Collecting django Collecting pytz (from django) Installing collected packages: pytz, django Successfully installed django-1.11.5 pytz-2017.2 Now that we have installed the Django web framework, we can install Django REST framework. Django REST framework works on top of Django and provides us with a powerful and flexible toolkit to build RESTful Web Services. We just need to run the following command to install this package: pip install djangorestframework==3.6.4 The last lines for the output will indicate that the djangorestframework package has been successfully installed, as shown here: Collecting djangorestframework Installing collected packages: djangorestframework Successfully installed djangorestframework-3.6.4 After following the previous steps, we will have Django REST framework 3.6.4 and Django 1.11.5 installed in our virtual environment. Creating an app with Django Now, we will create our first app with Django and we will analyze the directory structure that Django creates. First, go to the root folder for the virtual environment: 01. In Linux or macOS, enter the following command: cd ~/HillarDjangoREST/01 If you prefer Command Prompt, run the following command in the Windows command line: cd /d %USERPROFILE%\HillarDjangoREST\01 If you prefer Windows PowerShell, run the following command in Windows PowerShell: cd /d $env:USERPROFILE\HillarDjangoREST\01 In Linux or macOS, run the following command to create a new Django project named restful01. The command won't produce any output: python bin/django-admin.py startproject restful01 In Windows, in either Command Prompt or PowerShell, run the following command to create a new Django project named restful01. The command won't produce any output: python Scripts\django-admin.py startproject restful01 The previous command creates a restful01 folder with other subfolders and Python files. Now, go to the recently created restful01 folder. Just execute the following command on any platform: cd restful01 Then, run the following command to create a new Django app named toys within the restful01 Django project. The command won't produce any output: python manage.py startapp toys The previous command creates a new restful01/toys subfolder, with the following files: views.py tests.py models.py apps.py admin.py   init  .py In addition, the restful01/toys folder will have a migrations subfolder with an init  .py Python script. 
The following diagram shows the folders and files in the directory tree, starting at the restful01 folder with two subfolders - toys and restful01: Understanding Django folders, files, and configurations After we create our first Django project and then a Django app, there are many new folders and files. First, use your favorite editor or IDE to check the Python code in the apps.py file within the restful01/toys folder (restful01\toys in Windows). The following lines show the code for this file: from django.apps import AppConfig class ToysConfig(AppConfig): name = 'toys' The code declares the ToysConfig class as a subclass of the django.apps.AppConfig class that represents a Django application and its configuration. The ToysConfig class just defines the name class attribute and sets its value to 'toys'. Now, we have to add toys.apps.ToysConfig as one of the installed apps in the restful01/settings.py file that configures settings for the restful01 Django project. I built the previous string by concatenating many values as follows: app name + .apps. + class name, which is, toys + .apps. + ToysConfig. In addition, we have to add the rest_framework app to make it possible for us to use Django REST framework. The restful01/settings.py file is a Python module with module-level variables that define the configuration of Django for the restful01 project. We will make some changes to this Django settings file. Open the restful01/settings.py file and locate the highlighted lines that specify the strings list that declares the installed apps. The following code shows the first lines for the settings.py file. Note that the file has more code: """ Django settings for restful01 project. Generated by 'django-admin startproject' using Django 1.11.5. For more information on this file, see https://docs.djangoproject.com/en/1.11/topics/settings/ For the full list of settings and their values, see https://docs.djangoproject.com/en/1.11/ref/settings/ """ import os # Build paths inside the project like this: os.path.join(BASE_DIR, ...) BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath( file ))) # Quick-start development settings - unsuitable for production # See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/ # SECURITY WARNING: keep the secret key used in production secret! SECRET_KEY = '+uyg(tmn%eo+fpg+fcwmm&x(2x0gml8)=cs@$nijab%)y$a*xe' # SECURITY WARNING: don't run with debug turned on in production! DEBUG = True ALLOWED_HOSTS = [] # Application definition INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', ]         Add the following two strings to the INSTALLED_APPS strings list and save the changes to the restful01/settings.py file: 'rest_framework' 'toys.apps.ToysConfig' The following lines show the new code that declares the INSTALLED_APPS string list with the added lines highlighted and with comments to understand what each added string means. The code file for the sample is included in the hillar_django_restful_01 folder: INSTALLED_APPS = [ 'django.contrib.admin', 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', # Django REST framework 'rest_framework', # Toys application 'toys.apps.ToysConfig', ] This way, we have added Django REST framework and the toys application to our initial Django project named restful01. 
Installing tools Now, we will leave Django for a while and we will install many tools that we will use to interact with the RESTful Web Services that we will develop throughout this book. We will use the following different kinds of tools to compose and send HTTP requests and visualize the responses throughout our book: Command-line tools GUI tools Python code Web browser JavaScript code You can use any other application that allows you to compose and send HTTP requests. There are many apps that run on tablets and smartphones that allow you to accomplish this task. However, we will focus our attention on the most useful tools when building RESTful Web Services with Django. Installing Curl We will start installing command-line tools. One of the key advantages of command-line tools is that you can easily run again the HTTP requests again after we have built them for the first time, and we don't need to use the mouse or tap the screen to run requests. We can also easily build a script with batch requests and run them. As happens with any command-line tool, it can take more time to perform the first requests compared with GUI tools, but it becomes easier once we have performed many requests and we can easily reuse the commands we have written in the past to compose new requests. Curl, also known as cURL, is a very popular open source command-line tool and library that allows us to easily transfer data. We can use the curl command-line tool to easily compose and send HTTP requests and check their responses. In Linux or macOS, you can open a Terminal and start using curl from the command line. In Windows, you have two options. You can work with curl in Command Prompt or you can decide to install curl as part of the Cygwin package installation option and execute it from the Cygwin terminal. You can read more about the Cygwin terminal and its installation procedure at: http://cygwin.com/install.html. Windows Powershell includes a curl alias that calls the Invoke-WebRequest command, and therefore, if you want to work with Windows Powershell with curl, it is necessary to remove the curl alias. If you want to use the curl command within Command Prompt, you just need to download and unzip the latest version of the curl download page: https://curl.haxx.se/download.html. Make sure you download the version that includes SSL and SSH. The following screenshot shows the available downloads for Windows. The Win64 - Generic section includes the versions that we can run in Command Prompt or Windows Powershell. After you unzip the .7zip or .zip file you have downloaded, you can include the folder in which curl.exe is included in your path. For example, if you unzip the Win64 x86_64.7zip file, you will find curl.exe in the bin folder. The following screenshot shows the results of executing curl --version on Command Prompt in Windows 10. The --version option makes curl display its version and all the libraries, protocols, and features it supports: Installing HTTPie Now, we will install HTTPie, a command-line HTTP client written in Python that makes it easy to send HTTP requests and uses a syntax that is easier than curl. By default, HTTPie displays colorized output and uses multiple lines to display the response details. In some cases, HTTPie makes it easier to understand the responses than the curl utility. 
However, one of the great disadvantages of HTTPie as a command-line utility is that it takes more time to load than curl, and therefore, if you want to code scripts with too many commands, you have to evaluate whether it makes sense to use HTTPie. We just need to make sure we run the following command in the virtual environment we have just created and activated. This way, we will install HTTPie only for our virtual environment. Run the following command in the terminal, Command Prompt, or Windows PowerShell to install the httpie package: pip install --upgrade httpie The last lines of the output will indicate that the httpie package has been successfully installed: Collecting httpie Collecting colorama>=0.2.4 (from httpie) Collecting requests>=2.11.0 (from httpie) Collecting Pygments>=2.1.3 (from httpie) Collecting idna<2.7,>=2.5 (from requests>=2.11.0->httpie) Collecting urllib3<1.23,>=1.21.1 (from requests>=2.11.0->httpie) Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.11.0->httpie) Collecting certifi>=2017.4.17 (from requests>=2.11.0->httpie) Installing collected packages: colorama, idna, urllib3, chardet, certifi, requests, Pygments, httpie Successfully installed Pygments-2.2.0 certifi-2017.7.27.1 chardet-3.0.4 colorama-0.3.9 httpie-0.9.9 idna-2.6 requests-2.18.4 urllib3-1.22 Now, we will be able to use the http command to easily compose and send HTTP requests to our future RESTful Web Services build with Django. The following screenshot shows the results of executing http on Command Prompt in Windows 10. HTTPie displays the valid options and indicates that a URL is required: Installing the Postman REST client So far, we have installed two terminal-based or command-line tools to compose and send HTTP requests to our Django development server: cURL and HTTPie. Now, we will start installing Graphical User Interface (GUI) tools. Postman is a very popular API testing suite GUI tool that allows us to easily compose and send HTTP requests, among other features. Postman is available as a standalone app in Linux, macOS, and Windows. You can download the versions of the Postman app from the following URL: https://www.getpostman.com. The following screenshot shows the HTTP GET request builder in Postman: Installing Stoplight Stoplight is a very useful GUI tool that focuses on helping architects and developers to model complex APIs. If we need to consume our RESTful Web Service in many different programming languages, we will find Stoplight extremely helpful. Stoplight provides an HTTP request maker that allows us to compose and send requests and generate the necessary code to make them in different programming languages, such as JavaScript, Swift, C#, PHP, Node, and Go, among others. Stoplight provides a web version and is also available as a standalone app in Linux, macOS, and Windows. You can download the versions of Stoplight from the following URL: http://stoplight.io/. The following screenshot shows the HTTP GET request builder in Stoplight with the code generation at the bottom: Installing iCurlHTTP We can also use apps that can compose and send HTTP requests from mobile devices to work with our RESTful Web Services. For example, we can work with the iCurlHTTP app on iOS devices such as iPad and iPhone: https://itunes.apple.com/us/app/icurlhttp/id611943891. On Android devices, we can work with the HTTP Request app: https://play.google.com/store/apps/details?id=air.http.request&hl=en. 
The following screenshot shows the UI for the iCurlHTTP app running on an iPad Pro: At the time of writing, the mobile apps that allow you to compose and send HTTP requests do not provide all the features you can find in Postman or command-line utilities. We learnt to set up a virtual environment with Django and Django REST framework and created an app with Django. We looked at Django folders, files, and configurations and installed command-line and GUI tools to interact with the RESTful Web Services. This article is an excerpt from the book, Django RESTful Web Services, written by Gaston C. Hillar. This book serves as an easy guide to build Python RESTful APIs and web services with Django. The code bundle for the article is hosted on GitHub.
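As noted earlier, plain Python code is another valid way to compose and send HTTP requests, alongside the command-line and GUI tools installed above. The following sketch uses the popular requests package (pip install requests); it assumes the toys web service built later in the book is already running on the Django development server's default port, so treat it as a preview rather than something to run right now:

# python_client.py: composing an HTTP GET request without curl, HTTPie, or Postman
import requests

response = requests.get('http://localhost:8000/toys/')
print(response.status_code)                    # for example, 200
print(response.headers.get('Content-Type'))    # for example, application/json
print(response.json())                         # the JSON body parsed into Python objects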
How to build and deploy Microservices using Payara Micro
Gebin George
28 Mar 2018
9 min read
Payara Micro offers a new way to run Java EE or microservice applications. It is based on the Web profile of GlassFish and bundles a few additional APIs. The distribution is designed with modern containerized environments in mind. Payara Micro is available to download as a standalone executable JAR, as well as a Docker image. It's an open source MicroProfile-compatible runtime. Today, we will learn to use Payara Micro to build and deploy microservices. Here's a list of APIs that are supported in Payara Micro:

Servlets, JSTL, EL, and JSPs
WebSockets
JSF
JAX-RS
EJB lite
JTA
JPA
Bean Validation
CDI
Interceptors
JBatch
Concurrency
JCache

We will be exploring how to build our services using Payara Micro in the next section.

Building services with Payara Micro

Let's start building parts of our Issue Management System (IMS), which is going to be a one-stop destination for collaboration among teams. As the name implies, this system will be used for managing issues that are raised as tickets and get assigned to users for resolution. To begin the project, we will identify our microservice candidates based on the business model of IMS. Here, let's define three functional services, which will be hosted in their own independent Git repositories:

ims-micro-users
ims-micro-tasks
ims-micro-notify

You might wonder, why these three and why separate repositories? We could create much more fine-grained services and perhaps it wouldn't be wrong to do so. The answer lies in understanding the following points:

Isolating what varies: We need to be able to independently develop and deploy each unit. Changes to one business capability or domain shouldn't require changes in other services more often than desired.
Organisation or team structure: If you define teams by business capability, then they can work independently of others and release features with greater agility. The tasks team should be able to evolve independently of the teams that are handling users or notifications. The functional boundaries should allow independent version and release cycle management.
Transactional boundaries for consistency: Distributed transactions are not easy; creating services that are too fine-grained for related features leads to more complexity than desired. You would need to become familiar with concepts like eventual consistency, but these are not easy to achieve in practice.
Source repository per service: Setting up a single repository that hosts all the services is ideal when it's the same team that works on these services and the project is relatively small. But we are building our fictional IMS, which is a large complex system with many moving parts. Separate teams would get tightly coupled by sharing a repository. Moreover, versioning and tagging of releases will be yet another problem to solve.

The projects are created as standard Java EE projects, which are Skinny WARs, that will be deployed using the Payara Micro server. Payara Micro allows us to delay the decision of using a Fat JAR or Skinny WAR. This gives us flexibility in picking the deployment choice at a later stage.
As Maven is a widely adopted build tool among developers, we will use the same to create our example projects, using the following steps:

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-users -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false
mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-tasks -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false
mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-notify -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

Once the structure is generated, update the properties and dependencies section of pom.xml with the following contents, for all three projects:

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <failOnMissingWebXml>false</failOnMissingWebXml>
</properties>
<dependencies>
  <dependency>
    <groupId>javax</groupId>
    <artifactId>javaee-api</artifactId>
    <version>8.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Next, create a beans.xml file under the WEB-INF folder for all three projects:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
       http://xmlns.jcp.org/xml/ns/javaee/beans_2_0.xsd"
       bean-discovery-mode="all">
</beans>

You can delete the index.jsp and web.xml files, as we won't be needing them. The following is the project structure of ims-micro-users. The same structure will be used for ims-micro-tasks and ims-micro-notify: The package names for the users, tasks, and notify services will be as shown in the following:

org.jee8ng.ims.users (inside ims-micro-users)
org.jee8ng.ims.tasks (inside ims-micro-tasks)
org.jee8ng.ims.notify (inside ims-micro-notify)

Each of the above will in turn have sub-packages called boundary, control, and entity. The structure follows the Boundary-Control-Entity (BCE)/Entity-Control-Boundary (ECB) pattern. The JaxrsActivator shown as follows is required to enable the JAX-RS API and thus needs to be placed in each of the projects:

import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("resources")
public class JaxrsActivator extends Application {}

All three projects will have REST endpoints that we can invoke over HTTP. When doing RESTful API design, a popular convention is to use plural names for resources, especially if the resource could represent a collection. For example:

/users
/tasks

The resource class names in the projects use the plural form, as it's consistent with the resource URL naming used. This avoids confusions such as a resource URL being called a users resource, while the class is named UserResource. Given that this is an opinionated approach, feel free to use singular class names if desired. Here's the relevant code for the ims-micro-users, ims-micro-tasks, and ims-micro-notify projects respectively.
Under ims-micro-users, define the UsersResource endpoint:

package org.jee8ng.ims.users.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("users")
public class UsersResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("user works").build();
  }
}

Under ims-micro-tasks, define the TasksResource endpoint:

package org.jee8ng.ims.tasks.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("tasks")
public class TasksResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("task works").build();
  }
}

Under ims-micro-notify, define the NotificationsResource endpoint:

package org.jee8ng.ims.notify.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("notifications")
public class NotificationsResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("notification works").build();
  }
}

Once you build all three projects using mvn clean install, you will get your Skinny WAR files generated in the target directory, which can be deployed on the Payara Micro server.

Running services with Payara Micro

Download the Payara Micro server if you haven't already, from this link: https://www.payara.fish/downloads. The micro server will have the name payara-micro-xxx.jar, where xxx will be the version number, which might be different when you download the file. Here's how you can start Payara Micro with our services deployed locally. When doing so, we need to ensure that the instances start on different ports, to avoid any port conflicts:

>java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --port 8081
>java -jar payara-micro-xxx.jar --deploy ims-micro-tasks/target/ims-micro-tasks.war --port 8082
>java -jar payara-micro-xxx.jar --deploy ims-micro-notify/target/ims-micro-notify.war --port 8083

This will start three instances of Payara Micro running on the specified ports. This makes our applications available under these URLs:

http://localhost:8081/ims-micro-users/resources/users/
http://localhost:8082/ims-micro-tasks/resources/tasks/
http://localhost:8083/ims-micro-notify/resources/notifications/

Payara Micro can be started on a non-default port by using the --port parameter, as we did earlier. This is useful when running multiple instances on the same machine. Another option is to use the --autoBindHttp parameter, which will attempt to connect on 8080 as the default port, and if that port is unavailable, it will try to bind on the next port up, repeating until it finds an available port. Examples of starting Payara Micro:

Uber JAR option: Now, there's one more feature that Payara Micro provides. We can generate an Uber JAR as well, which would be the Fat JAR approach that we learnt in the Fat JAR section. To package our ims-micro-users project as an Uber JAR, we can run the following command:

java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --outputUberJar users.jar

This will generate the users.jar file in the directory where you run this command. The size of this JAR will naturally be larger than our WAR file, since it will also bundle the Payara Micro runtime in it. Here's how you can start the application using the generated JAR:

java -jar users.jar

The server parameters that we used earlier can be passed to this runnable JAR file too. Apart from the two choices we saw for running our microservice projects, there's a third option as well.
Payara Micro provides an API based approach, which can be used to programmatically start the embedded server. We will expand upon these three services as we progress further into the realm of cloud based Java EE. We saw how to leverage the power of Payara Micro to run Java EE or microservice applications. You read an excerpt from the book, Java EE 8 and Angular written by Prashant Padmanabhan. This book helps you build high-performing enterprise applications using Java EE powered by Angular at the frontend.  
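As a practical follow-up, the three running instances can be smoke-tested from a short script instead of a browser. The sketch below uses Python and the requests package purely for illustration; it assumes the instances were started on ports 8081, 8082, and 8083 with the --deploy commands shown earlier and is not part of the book's code:

# smoke_test.py: check that the three IMS services respond
import requests

endpoints = [
    'http://localhost:8081/ims-micro-users/resources/users/',
    'http://localhost:8082/ims-micro-tasks/resources/tasks/',
    'http://localhost:8083/ims-micro-notify/resources/notifications/',
]

for url in endpoints:
    response = requests.get(url, timeout=5)
    # Each endpoint currently returns a simple confirmation string such as "user works"
    print(url, '->', response.status_code, response.text)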
How to build Microservices using REST framework
Gebin George
28 Mar 2018
7 min read
Today, we will learn to build microservices using REST framework. Our microservices are Java EE 8 web projects, built using maven and published as separate Payara Micro instances, running within docker containers. The separation allows them to scale individually, as well as have independent operational activities. Given the BCE pattern used, we have the business component split into boundary, control, and entity, where the boundary comprises of the web resource (REST endpoint) and business service (EJB). The web resource will publish the CRUD operations and the EJB will in turn provide the transactional support for each of it along with making external calls to other resources. Here's a logical view for the boundary consisting of the web resource and business service: The microservices will have the following REST endpoints published for the projects shown, along with the boundary classes XXXResource and XXXService: Power Your APIs with JAXRS and CDI, for Server-Sent Events. In IMS, we publish task/issue updates to the browser using an SSE endpoint. The code observes for the events using the CDI event notification model and triggers the broadcast. The ims-users and ims-issues endpoints are similar in API format and behavior. While one deals with creating, reading, updating, and deleting a User, the other does the same for an Issue. Let's look at this in action. After you have the containers running, we can start firing requests to the /users web resource. The following curl command maps the URI /users to the @GET resource method named getAll() and returns a collection (JSON array) of users. The Java code will simply return a Set<User>, which gets converted to JsonArray due to the JSON binding support of JSON-B. The method invoked is as follows: @GET public Response getAll() {... } curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users ... HTTP/1.1 200 OK ... [{ "id":1,"name":"Marcus","email":"marcus_jee8@testem.com" "credential":{"password":"1234","username":"marcus"} }, { "id":2,"name":"Bob","email":"bob@testem.com" "credential":{"password":"1234","username":"bob"} }] Next, for selecting one of the users, such as Marcus, we will issue the following curl command, which uses the /users/xxx path. This will map the URI to the @GET method which has the additional @Path("{id}") annotation as well. The value of the id is captured using the @PathParam("id") annotation placed before the field. The response is a User entity wrapped in the Response object returned. The method invoked is as follows: @GET @Path("{id}") public Response get(@PathParam("id") Long id) { ... } curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users/1 ... HTTP/1.1 200 OK ... { "id":1,"name":"Marcus","email":"marcus_jee8@testem.com" "credential":{"password":"1234","username":"marcus"} } In both the preceding methods, we saw the response returned as 200 OK. This is achieved by using a Response builder. Here's the snippet for the method: return Response.ok( ..entity here..).build(); Next, for submitting data to the resource method, we use the @POST annotation. You might have noticed earlier that the signature of the method also made use of a UriInfo object. This is injected at runtime for us via the @Context annotation. A curl command can be used to submit the JSON data of a user entity. The method invoked is as follows: @POST public Response add(User newUser, @Context UriInfo uriInfo) We make use of the -d flag to send the JSON body in the request. 
The POST request is implied: curl -v -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users -d '{"name": "james", "email":"james@testem.io", "credential": {"username":"james","password":"test123"}}' ... HTTP/1.1 201 Created ... Location: http://localhost:8081/ims-users/resources/users/3 The 201 status code is sent by the API to signal that an entity has been created, and it also returns the location for the newly created entity. Here's the relevant snippet to do this: //uriInfo is injected via @Context parameter to this method URI location = uriInfo.getAbsolutePathBuilder() .path(newUserId) // This is the new entity ID .build(); // To send 201 status with new Location return Response.created(location).build(); Similarly, we can also send an update request using the PUT method. The method invoked is as follows: @PUT @Path("{id}") public Response update(@PathParam("id") Long id, User existingUser) curl -v -X PUT -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users/3 -d '{"name": "jameson", "email":"james@testem.io"}' ... HTTP/1.1 200 Ok The last method we need to map is the DELETE method, which is similar to the GET operation, with the only difference being the HTTP method used. The method invoked is as follows: @DELETE @Path("{id}") public Response delete(@PathParam("id") Long id) curl -v -X DELETE http://localhost:8081/ims-users/resources/users/3 ... HTTP/1.1 200 Ok You can try out the Issues endpoint in a similar manner. For the GET requests of /users or /issues, the code simply fetches and returns a set of entity objects. But when requesting an item within this collection, the resource method has to look up the entity by the passed in id value, captured by @PathParam("id"), and if found, return the entity, or else a 404 Not Found is returned. Here's a snippet showing just that: final Optional<Issue> issueFound = service.get(id); //id obtained if (issueFound.isPresent()) { return Response.ok(issueFound.get()).build(); } return Response.status(Response.Status.NOT_FOUND).build(); The issue instance can be fetched from a database of issues, which the service object interacts with. The persistence layer can return a JPA entity object which gets converted to JSON for the calling code. We will look at persistence using JPA in a later section. For the update request which is sent as an HTTP PUT, the code captures the identifier ID using @PathParam("id"), similar to the previous GET operation, and then uses that to update the entity. The entity itself is submitted as a JSON input and gets converted to the entity instance along with the passed in message body of the payload. Here's the code snippet for that: @PUT @Path("{id}") public Response update(@PathParam("id") Long id, Issue updated) { updated.setId(id); boolean done = service.update(updated); return done ? Response.ok(updated).build() : Response.status(Response.Status.NOT_FOUND).build(); } The code is simple to read and does one thing—it updates the identified entity and returns the response containing the updated entity or a 404 for a non-existing entity. The service references that we have looked at so far are @Stateless beans which are injected into the resource class as fields: // Project: ims-comments @Stateless public class CommentsService {... } // Project: ims-issues @Stateless public class IssuesService {... } // Project: ims-users @Stateless public class UsersService {... } These will in turn have the EntityManager injected via @PersistenceContext. 
Combined with the resource and service, our components have made the boundary ready for clients to use. Similar to the WebSockets section in Chapter 6, Power Your APIs with JAX-RS and CDI, in IMS we use a @ServerEndpoint which maintains the list of active sessions and then uses that to broadcast a message to all users who are connected. A ChatThread keeps track of the messages being exchanged through the @ServerEndpoint class. For the message to be sent, we use the stream of sessions and filter it by open sessions, then send the message for each of the sessions:

chatSessions.getSessions().stream().filter(Session::isOpen)
    .forEach(s -> {
        try {
            s.getBasicRemote().sendObject(chatMessage);
        } catch(Exception e) {...}
    });

To summarize, we practically saw how to leverage the REST framework to build microservices. This article is an excerpt from the book, Java EE 8 and Angular written by Prashant Padmanabhan. The book covers building modern, user-friendly web apps with Java EE.
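As a follow-up, the same CRUD conversation shown with curl can also be scripted. The sketch below uses Python and the requests package and assumes the ims-users container is reachable at the base URL used throughout this article; it is an illustration, not part of the book's code bundle:

# users_crud_demo.py: replay the curl calls against the ims-users service
import requests

base_url = 'http://localhost:8081/ims-users/resources/users'

# Create a user (POST); the API answers with 201 and a Location header
new_user = {
    'name': 'james',
    'email': 'james@testem.io',
    'credential': {'username': 'james', 'password': 'test123'},
}
created = requests.post(base_url, json=new_user)
print(created.status_code, created.headers.get('Location'))

# Read the collection and the newly created user (GET)
print(requests.get(base_url).json())
user_url = created.headers['Location']
print(requests.get(user_url).json())

# Update the user (PUT) and then remove it (DELETE)
print(requests.put(user_url, json={'name': 'jameson', 'email': 'james@testem.io'}).status_code)
print(requests.delete(user_url).status_code)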
Getting started with Django RESTful Web Services
Sugandha Lahoti
27 Mar 2018
19 min read
In this article, we will kick start with learning RESTful Web Service and subsequently learn how to create models and perform migration, serialization and deserialization in Django. Here’s a quick glance of what to expect in this article: Defining the requirements for RESTful Web Service Analyzing and understanding Django tables and the database Controlling, serialization and deserialization in Django Defining the requirements for RESTful Web Service Imagine a team of developers working on a mobile app for iOS and Android and requires a RESTful Web Service to perform CRUD operations with toys. We definitely don't want to use a mock web service and we don't want to spend time choosing and configuring an ORM (short for Object-Relational Mapping). We want to quickly build a RESTful Web Service and have it ready as soon as possible to start interacting with it in the mobile app. We really want the toys to persist in a database but we don't need it to be production-ready. Therefore, we can use the simplest possible relational database, as long as we don't have to spend time performing complex installations or configurations. Django REST framework, also known as DRF, will allow us to easily accomplish this task and start making HTTP requests to the first version of our RESTful Web Service. In this case, we will work with a very simple SQLite database, the default database for a new Django REST framework project. First, we must specify the requirements for our main resource: a toy. We need the following attributes or fields for a toy entity: An integer identifier A name An optional description A toy category description, such as action figures, dolls, or playsets A release date A bool value indicating whether the toy has been on the online store's homepage at least once In addition, we want to have a timestamp with the date and time of the toy's addition to the database table, which will be generated to persist toys. In a RESTful Web Service, each resource has its own unique URL. In our web service, each toy will have its own unique URL. The following table shows the HTTP verbs, the scope, and the semantics of the methods that our first version of the web service must support. Each method is composed of an HTTP verb and a scope. All the methods have a well-defined meaning for toys and collections: HTTP verb Scope Semantics GET Toy Retrieve a single toy GET Collection of toys Retrieve all the stored toys in the collection, sorted by their name in ascending order POST Collection of toys Create a new toy in the collection PUT Toy Update an existing toy DELETE Toy Delete an existing toy In the previous table, the GET HTTP verb appears twice but with two different scopes: toys and collection of toys. The first row shows a GET HTTP verb applied to a toy, that is, to a single resource. The second row shows a GET HTTP verb applied to a collection of toys, that is, to a collection of resources. We want our web service to be able to differentiate collections from a single resource of the collection in the URLs. When we refer to a collection, we will use a slash (/) as the last character for the URL, as in http://localhost:8000/toys. When we refer to a single resource of the collection we won't use a slash (/) as the last character for the URL, as in http://localhost:8000/toys/5. Let's consider that http://localhost:8000/toys/ is the URL for the collection of toys. If we add a number to the previous URL, we identify a specific toy with an ID or primary key equal to the specified numeric value. 
For example, http://localhost:8000/toys/42 identifies the toy with an ID equal to 42. We have to compose and send an HTTP request with the POST HTTP verb and http://localhost:8000/toys/ request URL to create a new toy and add it to the toys collection. In this example, our RESTful Web Service will work with JSON (short for JavaScript Object Notation), and therefore we have to provide the JSON key-value pairs with the field names and the values to create the new toy. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and persist it in the database. The server will insert a new row with the new toy in the appropriate table and it will return a 201 Created status code and a JSON body with the recently added toy serialized to JSON, including the assigned ID that was automatically generated by the database and assigned to the toy object: POST http://localhost:8000/toys/ We have to compose and send an HTTP request with the GET HTTP verb and http://localhost:8000/toys/{id} request URL to retrieve the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/25, the server will retrieve the toy whose ID matches 25. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will serialize the toy object into JSON, return a 200 OK status code, and return a JSON body with the serialized toy object. If no toy matches the specified ID, the server will return only a 404 Not Found status: GET http://localhost:8000/toys/{id} We have to compose and send an HTTP request with the PUT HTTP verb and request URL http://localhost:8000/toys/{id} to retrieve the toy whose ID matches the value in {id} and replace it with a toy created with the provided data. In addition, we have to provide the JSON key-value pairs with the field names and the values to create the new toy that will replace the existing one. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and replace the one that matches the specified ID with the new one in the database. The ID for the toy will be the same after the update operation. The server will update the existing row in the appropriate table and it will return a 200 OK status code and a JSON body with the recently updated toy serialized to JSON. If we don't provide all the necessary data for the new toy, the server will return a 400 Bad Request status code. If the server doesn't find a toy with the specified ID, the server will only return a 404 Not Found status: PUT http://localhost:8000/toys/{id} We have to compose and send an HTTP request with the DELETE HTTP verb and request URL http://localhost:8000/toys/{id} to remove the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/34, the server will delete the toy whose ID matches 34. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will request the ORM delete the toy row associated with this toy object and the server will return a 204 No Content status code. 
If no toy matches the specified ID, the server will return only a 404 Not Found status: DELETE http://localhost:8000/toys/{id} Creating first Django model Now, we will create a simple Toy model in Django, which we will use to represent and persist toys. Open the toys/models.py file. The following lines show the initial code for this file with just one import statement and a comment that indicates we should create the models: from django.db import models #Create your models here. The following lines show the new code that creates a Toy class, specifically, a Toy model in the toys/models.py file. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/models.py file: from django.db import models class Toy(models.Model): created = models.DateTimeField(auto_now_add=True) name = models.CharField(max_length=150, blank=False, default='') description = models.CharField(max_length=250, blank=True, default='') toy_category = models.CharField(max_length=200, blank=False, default='') release_date = models.DateTimeField() was_included_in_home = models.BooleanField(default=False) class Meta: ordering = ('name',) The Toy class is a subclass of the django.db.models.Model class and defines the following attributes: created, name, description, toy_category, release_date, and was_included_in_home. Each of these attributes represents a database column or field. We specified the field types, maximum lengths, and defaults for many attributes. The class declares a Meta inner class that declares an ordering attribute and sets its value to a tuple of string whose first value is the 'name' string. This way, the inner class indicates to Django that, by default, we want the results ordered by the name attribute in ascending order. Running initial migration Now, it is necessary to create the initial migration for the new Toy model we recently coded. We will also synchronize the SQLite database for the first time. By default, Django uses the popular self-contained and embedded SQLite database, and therefore we don't need to make changes in the initial ORM configuration. In this example, we will be working with this default configuration. Of course, we will upgrade to another database after we have a sample web service built with Django. We will only use SQLite for this example. We just need to run the following Python script in the virtual environment that we activated in the previous chapter. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command: python manage.py makemigrations toys The following lines show the output generated after running the previous command: Migrations for 'toys': toys/migrations/0001_initial.py: - Create model Toy The output indicates that the restful01/toys/migrations/0001_initial.py file includes the code to create the Toy model. The following lines show the code for this file that was automatically generated by Django. 
The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/migrations/0001_initial.py file:

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-10-08 05:19
from __future__ import unicode_literals

from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Toy',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('created', models.DateTimeField(auto_now_add=True)),
                ('name', models.CharField(default='', max_length=150)),
                ('description', models.CharField(blank=True, default='', max_length=250)),
                ('toy_category', models.CharField(default='', max_length=200)),
                ('release_date', models.DateTimeField()),
                ('was_included_in_home', models.BooleanField(default=False)),
            ],
            options={
                'ordering': ('name',),
            },
        ),
    ]

Understanding migrations

The automatically generated code defines a subclass of the django.db.migrations.Migration class named Migration, which defines an operation that creates the Toy model's table and includes it in the operations attribute. The call to the migrations.CreateModel method specifies the model's name, the fields, and the options to instruct the ORM to create a table that will allow the underlying database to persist the model. The fields argument is a list of tuples that includes information about the field name, the field type, and additional attributes based on the data we provided in our model, that is, in the Toy class. Now, run the following Python script to apply all the generated migrations. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command:

python manage.py migrate

The following lines show the output generated after running the previous command:

Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions, toys
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying sessions.0001_initial... OK
  Applying toys.0001_initial... OK

After we run the previous command, we will notice that the root folder for our restful01 project now has a db.sqlite3 file that contains the SQLite database. We can use the SQLite command line or any other application that allows us to easily check the contents of the SQLite database to check the tables that Django generated. The first migration will generate many tables required by Django and its installed apps before running the code that creates the table for the Toys model. These tables provide support for user authentication, permissions, groups, logs, and migration management. We will work with the models related to these tables after we add more features and security to our web services. After the migration process creates all these Django tables in the underlying database, the first migration runs the Python code that creates the table required to persist our model.
Thus, the last line of the running migrations section displays Applying toys.0001_initial. Analyzing the database In most modern Linux distributions and macOS, SQLite is already installed, and therefore you can run the sqlite3 command-line utility. In Windows, if you want to work with the sqlite3.exe command-line utility, you have to download the bundle of command-line tools for managing SQLite database files from the downloads section of the SQLite webpage at http://www.sqlite.org/download.html. For example, the ZIP file that includes the command-line tools for version 3.20.1 is sqlite- tools-win32-x8 6-3200100.zip. The name for the file changes with the SQLite version. You just need to make sure that you download the bundle of command-line tools and not the ZIP file that provides the SQLite DLLs. After you unzip the file, you can include the folder that includes the command-line tools in the PATH environment variable, or you can access the sqlite3.exe command-line utility by specifying the full path to it. Run the following command to list the generated tables. The first argument, db.sqlite3, specifies the file that contains that SQLite database and the second argument indicates the command that we want the sqlite3 command-line utility to run against the specified database: sqlite3 db.sqlite3 ".tables" The following lines show the output for the previous command with the list of tables that Django generated in the SQLite database: auth_group                django_admin_log auth_group_permissions                          django_content_type auth_permission                         django_migrations auth_user                 django_session auth_user_groups                      toys_toy auth_user_user_permissions The following command will allow you to check the contents of the toys_toy table after we compose and send HTTP requests to the RESTful Web Service and the web service makes CRUD operations to the toys_toy table: sqlite3 db.sqlite3 "SELECT * FROM toys_toy ORDER BY name;" Instead of working with the SQLite command-line utility, you can use a GUI tool to check the contents of the SQLite database. DB Browser for SQLite is a useful, free, multiplatform GUI tool that allows us to easily check the database contents of an SQLite database in Linux, macOS, and Windows. You can read more information about this tool and download its different versions from http://sqlitebrowser.org. Once you have installed the tool, you just need to open the db.sqlite3 file and you can check the database structure and browse the data for the different tables. After we start working with the first version of our web service, you need to check the contents of the toys_toy table with this tool The SQLite database engine and the database file name are specified in the restful01/settings.py Python file. The following lines show the declaration of the DATABASES dictionary, which contains the settings for all the databases that Django uses. The nested dictionary maps the database named default with the django.db.backends.sqlite3 database engine and the db.sqlite3 database file located in the BASE_DIR folder (restful01): DATABASES = { 'default': { 'ENGINE': 'django.db.backends.sqlite3', 'NAME': os.path.join(BASE_DIR, 'db.sqlite3'), } } After we execute the migrations, the SQLite database will have the following tables. Django uses prefixes to identify the modules and applications that each table belongs to. The tables that start with the auth_ prefix belong to the Django authentication module. 
The table that starts with the toys_ prefix belongs to our toys application. If we add more models to our toys application, Django will create new tables with the toys_ prefix: auth_group: Stores authentication groups auth_group_permissions: Stores permissions for authentication groups auth_permission: Stores permissions for authentication auth_user: Stores authentication users auth_user_groups: Stores authentication user groups auth_user_groups_permissions: Stores permissions for authentication user groups django_admin_log: Stores the Django administrator log django_content_type: Stores Django content types django_migrations: Stores the scripts generated by Django migrations and the date and time at which they were applied django_session: Stores Django sessions toys_toy: Persists the Toys model sqlite_sequence: Stores sequences for SQLite primary keys with autoincrement fields Understanding the table generated by Django The toys_toy table persists in the database the Toy class we recently created, specifically, the Toy model. Django's integrated ORM generated the toys_toy table based on our Toy model. Run the following command to retrieve the SQL used to create the toys_toy table: sqlite3 db.sqlite3 ".schema toys_toy" The following lines show the output for the previous command together with the SQL that the migrations process executed, to create the toys_toy table that persists the Toy model. The next lines are formatted to make it easier to understand the SQL code. Notice that the output from the command is formatted in a different way: CREATE TABLE IF NOT EXISTS "toys_toy" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "created" datetime NOT NULL, "name" varchar(150) NOT NULL, "description" varchar(250) NOT NULL, "toy_category" varchar(200) NOT NULL, "release_date" datetime NOT NULL, "was_included_in_home" bool NOT NULL ); The toys_toy table has the following columns (also known as fields) with their SQLite types, all of them not nullable: id: The integer primary key, an autoincrement row created: DateTime name: varchar(150) description: varchar(250) toy_category: varchar(200) release_date: DateTime was_included_in_home: bool Controlling, serialization and deserialization Our RESTful Web Service has to be able to serialize and deserialize the Toy instances into JSON representations. In Django REST framework, we just need to create a serializer class for the Toy instances to manage serialization to JSON and deserialization from JSON. Now, we will dive deep into the serialization and deserialization process in Django REST framework. It is very important to understand how it works because it is one of the most important components for all the RESTful Web Services we will build. Django REST framework uses a two-phase process for serialization. The serializers are mediators between the model instances and Python primitives. Parser and renderers handle as mediators between Python primitives and HTTP requests and responses. We will configure our mediator between the Toy model instances and Python primitives by creating a subclass of the rest_framework.serializers.Serializer class to declare the fields and the necessary methods to manage serialization and deserialization. We will repeat some of the information about the fields that we have included in the Toy model so that we understand all the things that we can configure in a subclass of the Serializer class. However, we will work with shortcuts, which will reduce boilerplate code later in the following examples. 
We will write less code in the following examples by using the ModelSerializer class. Now, go to the restful01/toys folder and create a new Python code file named serializers.py. The following lines show the code that declares the new ToySerializer class. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/serializers.py file: from rest_framework import serializers from toys.models import Toy class ToySerializer(serializers.Serializer): pk = serializers.IntegerField(read_only=True) name = serializers.CharField(max_length=150) description = serializers.CharField(max_length=250) release_date = serializers.DateTimeField() toy_category = serializers.CharField(max_length=200) was_included_in_home = serializers.BooleanField(required=False) def create(self, validated_data): return Toy.objects.create(**validated_data) def update(self, instance, validated_data): instance.name = validated_data.get('name', instance.name) instance.description = validated_data.get('description', instance.description) instance.release_date = validated_data.get('release_date', instance.release_date) instance.toy_category = validated_data.get('toy_category', instance.toy_category) instance.was_included_in_home = validated_data.get('was_included_in_home', instance.was_included_in_home) instance.save() return instance The ToySerializer class declares the attributes that represent the fields that we want to be serialized. Notice that we have omitted the created attribute that was present in the Toy model. When there is a call to the save method that ToySerializer inherits from the serializers.Serializer superclass, the overridden create and update methods define how to create a new instance or update an existing instance. In fact, these methods must be implemented in our class because they only raise a NotImplementedError exception in their base declaration in the serializers.Serializer superclass. The create method receives the validated data in the validated_data argument. The code creates and returns a new Toy instance based on the received validated data. The update method receives an existing Toy instance that is being updated and the new validated data in the instance and validated_data arguments. The code updates the values for the attributes of the instance with the updated attribute values retrieved from the validated data. Finally, the code calls the save method for the updated Toy instance and returns the updated and saved instance. We designed a RESTful Web Service to interact with a simple SQLite database and perform CRUD operations with toys. We defined the requirements for our web service and understood the tasks performed by each HTTP method and different scope. You read an excerpt from the book, Django RESTful Web Services, written by Gaston C. Hillar. This book helps developers build complex RESTful APIs from scratch with Django and the Django REST Framework. The code bundle for the article is hosted on GitHub.  
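To see the two-phase serialization process in action before wiring up any views, the serializer can be exercised from the Django shell (python manage.py shell). The following is a minimal sketch based on standard Django REST framework calls; the sample field values are made up for illustration only:

# Run inside "python manage.py shell" from the restful01 folder
from django.utils import timezone
from rest_framework.renderers import JSONRenderer
from toys.models import Toy
from toys.serializers import ToySerializer

# Create and persist a toy, then serialize it to Python primitives and JSON
toy = Toy.objects.create(
    name='Snoopy talking action figure',
    description='Snoopy speaks five languages',
    toy_category='Action figures',
    release_date=timezone.now(),
    was_included_in_home=False,
)

serializer = ToySerializer(toy)
print(serializer.data)                         # a dictionary of Python primitives
print(JSONRenderer().render(serializer.data))  # the JSON bytes sent over the wire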
How to perform full-text search (FTS) in PostgreSQL
Sugandha Lahoti
27 Mar 2018
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Mastering  PostgreSQL 10, written by Hans-Jürgen Schönig. This book provides expert techniques on PostgreSQL 10 development and administration.[/box] If you are looking up names or for simple strings, you are usually querying the entire content of a field. In Full-Text-Search (FTS), this is different. The purpose of the full-text search is to look for words or groups of words, which can be found in a text. Therefore, FTS is more of a contains operation as you are basically never looking for an exact string. In this article, we will show how to perform a full-text search operation in PostgreSQL. In PostgreSQL, FTS can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (= "preprocessed tokens of words"), and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed. Here is an example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars'); to_tsvector --------------------------------------------------------------- 'car':2,6,14 'even':10 'mani':13 'mind':11 'want':4 'would':8 (1 row) The example shows a simple sentence. The to_tsvector function will take the string, apply English rules, and perform a stemming process. Based on the configuration (english), PostgreSQL will parse the string, throw away stop words, and stem individual words. For example, car and cars will be transformed to the car. Note that this is not about finding the word stem. In the case of many, PostgreSQL will simply transform the string to mani by applying standard rules working nicely with the English language. Note that the output of the to_tsvector function is highly language dependent. If you tell PostgreSQL to treat the string as dutch, the result will be totally different: test=# SELECT to_tsvector('dutch', 'A car, I want a car. I would not even mind having many cars'); to_tsvector ----------------------------------------------------------------- 'a':1,5 'car':2,6,14 'even':10 'having':12 'i':3,7 'many':13 'mind':11 'not':9 'would':8 (1 row) To figure out which configurations are supported, consider running the following query: SELECT cfgname FROM pg_ts_config; Comparing strings After taking a brief look at the stemming process, it is time to figure out how a stemmed text can be compared to a user query. The following code snippet checks for the word wanted: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted'); ?column? ---------- t (1 row) Note that wanted does not actually show up in the original text. Still, PostgreSQL will return true. The reason is that want and wanted are both transformed to the same lexeme, so the result is true. Practically, this makes a lot of sense. Imagine you are looking for a car on Google. If you find pages selling cars, this is totally fine. Finding common lexemes is, therefore, an intelligent idea. Sometimes, people are not only looking for a single word, but want to find a set of words. With to_tsquery, this is possible, as shown in the next example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted & bmw'); ?column? ---------- f (1 row) In this case, false is returned because bmw cannot be found in our input string. In the to_tsquery function, & means and and | means or. 
It is therefore easily possible to build complex search strings. Defining GIN indexes If you want to apply text search to a column or a group of columns, there are basically two choices: Create a functional index using GIN Add a column containing ready-to-use tsvectors and a trigger to keep them in sync In this section, both options will be outlined. To show how things work, I have created some sample data: test=# CREATE TABLE t_fts AS SELECT comment FROM pg_available_extensions; SELECT 43 Indexing the column directly with a functional index is definitely a slower but more space efficient way to get things done: test=# CREATE INDEX idx_fts_func ON t_fts USING gin(to_tsvector('english', comment)); CREATE INDEX Deploying an index on the function is easy, but it can lead to some overhead. Adding a materialized column needs more space, but will lead to a better runtime behavior: test=# ALTER TABLE t_fts ADD COLUMN ts tsvector; ALTER TABLE The only trouble is, how do you keep this column in sync? The answer is by using a trigger: test=# CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON t_fts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(somename, 'pg_catalog.english', 'comment'); Fortunately, PostgreSQL already provides a C function that can be used by a trigger to sync the tsvector column. Just pass a name, the desired language, as well as a couple of columns to the function, and you are already done. The trigger function will take care of all that is needed. Note that a trigger will always operate within the same transaction as the statement making the modification. Therefore, there is no risk of being inconsistent. Debugging your search Sometimes, it is not quite clear why a query matches a given search string. To debug your query, PostgreSQL offers the ts_debug function. From a user's point of view, it can be used just like to_tsvector. It reveals a lot about the inner workings of the FTS infrastructure: test=# x Expanded display is on. test=# SELECT * FROM ts_debug('english', 'go to www.postgresql-support.de'); -[ RECORD 1 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | go dictionaries | {english_stem} dictionary           | english_stem lexemes    | {go} -[ RECORD 2 ]+---------------------------- alias  | blank Description | Space symbols token   |         dictionaries | {}         dictionary     |         lexemes       | -[ RECORD 3 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | to dictionaries | {english_stem} dictionary   | english_stem lexemes    | {} -[ RECORD 4 ]+---------------------------- alias  | blank description | Space symbols token | dictionaries | {} dictionary   | lexemes          | -[ RECORD 5 ]+---------------------------- alias  | host description | Host token      | www.postgresql-support.de dictionaries | {simple} dictionary | simple lexemes    | {www.postgresql-support.de} ts_debug will list every token found and display information about the token. You will see which token the parser found, the dictionary used, as well as the type of object. In my example, blanks, words, and hosts have been found. You might also see numbers, email addresses, and a lot more. Depending on the type of string, PostgreSQL will handle things differently. For example, it makes absolutely no sense to stem hostnames and e-mail addresses. Gathering word statistics Full-text search can handle a lot of data. 
To give end users more insights into their texts, PostgreSQL offers the ts_stat function, which returns a list of words:

SELECT * FROM ts_stat('SELECT to_tsvector(''english'', comment)
                       FROM pg_available_extensions')
ORDER BY 2 DESC LIMIT 3;

   word   | ndoc | nentry
----------+------+--------
 function |   10 |     10
 data     |   10 |     10
 type     |    7 |      7
(3 rows)

The word column contains the stemmed word, ndoc tells us the number of documents in which a certain word occurs, and nentry indicates how often a word was found altogether.

Taking advantage of exclusion operators

So far, indexes have been used to speed things up and to ensure uniqueness. However, a couple of years ago, somebody came up with the idea of using indexes for even more. As you have seen in this chapter, GiST supports operations such as intersects, overlaps, contains, and a lot more. So, why not use those operations to manage data integrity? Here is an example:

test=# CREATE EXTENSION btree_gist;
test=# CREATE TABLE t_reservation (
           room     int,
           from_to  tsrange,
           EXCLUDE USING GiST (room with =, from_to with &&)
       );
CREATE TABLE

The EXCLUDE USING GiST clause defines additional constraints. If you are selling rooms, you might want to allow different rooms to be booked at the same time. However, you don't want to sell the same room twice during the same period. What the EXCLUDE clause says in my example is this: if a room is booked twice at the same time, an error should pop up (the data in from_to must not overlap (&&) if it is related to the same room). The following two rows will not violate the constraint:

test=# INSERT INTO t_reservation VALUES (10, '["2017-01-01", "2017-03-03"]');
INSERT 0 1
test=# INSERT INTO t_reservation VALUES (13, '["2017-01-01", "2017-03-03"]');
INSERT 0 1

However, the next INSERT will cause a violation because the data overlaps:

test=# INSERT INTO t_reservation VALUES (13, '["2017-02-02", "2017-08-14"]');
ERROR:  conflicting key value violates exclusion constraint "t_reservation_room_from_to_excl"
DETAIL:  Key (room, from_to)=(13, ["2017-02-02 00:00:00","2017-08-14 00:00:00"]) conflicts with existing key (room, from_to)=(13, ["2017-01-01 00:00:00","2017-03-03 00:00:00"]).

Exclusion operators are very useful and can provide you with highly advanced means to handle integrity.

To summarize, we learnt how to perform full-text search operations in PostgreSQL. If you liked our article, check out the book Mastering PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.

How to secure data in Salesforce Einstein Analytics

Amey Varangaonkar
22 Mar 2018
5 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book includes techniques to build effective dashboards and Business Intelligence metrics to gain useful insights from data.[/box] Before getting into security in Einstein Analytics, it is important to set up your organization, define user types so that it is available to use. In this article we explore key aspects of security in Einstein Analytics. The following are key points to consider for data security in Salesforce: Salesforce admins can restrict access to data by setting up field-level security and object-level security in Salesforce. These settings prevent data flow from loading sensitive Salesforce data into a dataset. Dataset owners can restrict data access by using row-level security. Analytics supports security predicates, a robust row-level security feature that enables you to model many different types of access control on datasets. Analytics also supports sharing inheritance. Take a look at the following diagram: Salesforce data security In Einstein Analytics, dataflows bring the data to the Analytics Cloud from Salesforce. It is important that Einstein Analytics has all the necessary permissions and access to objects as well as fields. If an object or a field is not accessible to Einstein then the data flow fails and it cannot extract data from Salesforce. So we need to make sure that the required access is given to the integration user and security user. We can configure the permission set for these users. Let’s configure permissions for an integration user by performing the following steps: Switch to classic mode and enter Profiles in the Quick Find / Search… box Select and clone the Analytics Cloud Integration User profile and Analytics Cloud Security User profile for the integration user and security user respectively: Save the cloned profiles and then edit them Set the permission to Read for all objects and fields Save the profile and assign it to users Take a look at the following diagram: Data pulled from Salesforce can be made secure from both sides: Salesforce as well as Einstein Analytics. It is important to understand that Salesforce and Einstein Analytics are two independent databases. So, a user security setting given to Einstein will not affect the data in Salesforce. There are the following ways to secure data pulled from Salesforce: Salesforce Security Einstein Analytics Security Roles and profiles Inheritance security Organization-Wide Defaults (OWD) and record ownership Security predicates Sharing rules Application-level security Sharing mechanism in Einstein All Analytics users start off with Viewer access to the default Shared App that’s available out-of-the-box; administrators can change this default setting to restrict or extend access. All other applications created by individual users are private, by default; the application owner and administrators have Manager access and can extend access to other Users, groups, or roles. The following diagram shows how the sharing mechanism works in Einstein Analytics: Here’s a summary of what users can do with Viewer, Editor, and Manager access: Action / Access level Viewer Editor Manager View dashboards, lenses, and datasets in the application. If the underlying dataset is in a different application than a lens or dashboard, the user must have access to both applications to view the lens or dashboard. Yes Yes Yes See who has access to the application. 
Yes Yes Yes Save contents of the application to another application that the user has Editor or Manager access to. Yes Yes Yes Save changes to existing dashboards, lenses, and datasets in the application (saving dashboards requires the appropriate permission set license and permission). Yes Yes Change the application’s sharing settings. Yes Rename the application. Yes Delete the application. Yes Confidentiality, integrity, and availability together are referred to as the CIA Triad and it is designed to help organizations decide what security policies to implement within the organization. Salesforce knows that keeping information private and restricting access by unauthorized users is essential for business. By sharing the application, we can share a lens, dashboard, and dataset all together with one click. To share the entire application, do the following: Go to your Einstein Analytics and then to Analytics Studio Click on the APPS tab and then the icon for your application that you want to share, as shown in the following screenshot: 3. Click on Share and it will open a new popup window, as shown in the following screenshot: Using this window, you can share the application with an individual user, a group of users, or a particular role. You can define the access level as Viewer, Editor, or Manager After selecting User, click on the user you wish to add and click on Add Save and then close the popup And that’s it. It’s done. Mass-sharing the application Sometimes, we are required to share the application with a wide audience: There are multiple approaches to mass-sharing the Wave application such as by role or by username In Salesforce classic UI, navigate to Setup|Public Groups | New For example, to share a sales application, label a public group as Analytics_Sales_Group Search and add users to a group by Role, Roles and Subordinates, or by Users (username): 5. Search for the Analytics_Sales public group 6. Add the Viewer option as shown in the following screenshot: 7. Click on Save Protecting data from breaches, theft, or from any unauthorized user is very important. And we saw that Einstein Analytics provides the necessary tools to ensure the data is secure. If you found this excerpt useful and want to know more about securing your analytics in Einstein, make sure to check out this book Learning Einstein Analytics.  

Creating 2D and 3D plots using Matplotlib

Pravin Dhandre
22 Mar 2018
10 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by L. Felipe Martins, Ruben Oliva Ramos and V Kishore Ayyadevara titled SciPy Recipes. This book provides data science recipes for users to effectively process, manipulate, and visualize massive datasets using SciPy.[/box] In today’s tutorial, we will demonstrate how to create two-dimensional and three-dimensional plots for displaying graphical representation of data using a full-fledged scientific library -  Matplotlib. Creating two-dimensional plots of functions and data We will present the basic kind of plot generated by Matplotlib: a two-dimensional display, with axes, where datasets and functional relationships are represented by lines. Besides the data being displayed, a good graph will contain a title (caption), axes labels, and, perhaps, a legend identifying each line in the plot. Getting ready Start Jupyter and run the following commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following code in a single Jupyter cell: xvalues = np.linspace(-np.pi, np.pi) yvalues1 = np.sin(xvalues) yvalues2 = np.cos(xvalues) plt.plot(xvalues, yvalues1, lw=2, color='red', label='sin(x)') plt.plot(xvalues, yvalues2, lw=2, color='blue', label='cos(x)') plt.title('Trigonometric Functions') plt.xlabel('x') plt.ylabel('sin(x), cos(x)') plt.axhline(0, lw=0.5, color='black') plt.axvline(0, lw=0.5, color='black') plt.legend() None This code will insert the plot shown in the following screenshot into the Jupyter Notebook: How it works… We start by generating the data to be plotted, with the three following statements: xvalues = np.linspace(-np.pi, np.pi, 300) yvalues1 = np.sin(xvalues) yvalues2 = np.cos(xvalues) We first create an xvalues array, containing 300 equally spaced values between -π and π. We then compute the sine and cosine functions of the values in xvalues, storing the results in the yvalues1 and yvalues2 arrays. Next, we generate the first line plot with the following statement: plt.plot(xvalues, yvalues1, lw=2, color='red', label='sin(x)') The arguments to the plot() function are described as follows: xvalues and yvalues1 are arrays containing, respectively, the x and y coordinates of the points to be plotted. These arrays must have the same length. The remaining arguments are formatting options. lw specifies the line width and color the line color. The label argument is used by the legend() function, discussed as follows. The next line of code generates the second line plot and is similar to the one explained previously. After the line plots are defined, we set the title for the plot and the legends for the axes with the following commands: plt.title('Trigonometric Functions') plt.xlabel('x') plt.ylabel('sin(x), cos(x)') We now generate axis lines with the following statements: plt.axhline(0, lw=0.5, color='black') plt.axvline(0, lw=0.5, color='black') The first arguments in axhline() and axvline() are the locations of the axis lines and the options specify the line width and color. We then add a legend for the plot with the following statement: plt.legend() Matplotlib tries to place the legend intelligently, so that it does not interfere with the plot. In the legend, one item is being generated by each call to the plot() function and the text for each legend is specified in the label option of the plot() function. 
Generating multiple plots in a single figure Wouldn't it be interesting to know how to generate multiple plots in a single figure? Well, let's get started with that. Getting ready Start Jupyter and run the following three commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following commands in a Jupyter cell: plt.figure(figsize=(6,6)) xvalues = np.linspace(-2, 2, 100) plt.subplot(2, 2, 1) yvalues = xvalues plt.plot(xvalues, yvalues, color='blue') plt.xlabel('$x$') plt.ylabel('$x$') plt.subplot(2, 2, 2) yvalues = xvalues ** 2 plt.plot(xvalues, yvalues, color='green') plt.xlabel('$x$') plt.ylabel('$x^2$') plt.subplot(2, 2, 3) yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') plt.xlabel('$x$') plt.ylabel('$x^3$') plt.subplot(2, 2, 4) yvalues = xvalues ** 4 plt.plot(xvalues, yvalues, color='black') plt.xlabel('$x$') plt.ylabel('$x^3$') plt.suptitle('Polynomial Functions') plt.tight_layout() plt.subplots_adjust(top=0.90) None Running this code will produce results like those in the following screenshot: How it works… To start the plotting constructions, we use the figure() function, as shown in the following line of code: plt.figure(figsize=(6,6)) The main purpose of this call is to set the figure size, which needs adjustment, since we plan to make several plots in the same figure. After creating the figure, we add four plots with code, as demonstrated in the following segment: plt.subplot(2, 2, 3) yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') plt.xlabel('$x$') plt.ylabel('$x^3$') In the first line, the plt.subplot(2, 2, 3) call tells pyplot that we want to organize the plots in a two-by-two layout, that is, in two rows and two columns. The last argument specifies that all following plotting commands should apply to the third plot in the array. Individual plots are numbered, starting with the value 1 and counting across the rows and columns of the plot layout. We then generate the line plot with the following statements: yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') The first line of the preceding code computes the yvalues array, and the second draws the corresponding graph. Notice that we must set options such as line color individually for each subplot. After the line is plotted, we use the xlabel() and ylabel() functions to create labels for the axes. Notice that these have to be set up for each individual subplot too. After creating the subplots, we explain the subplots: plt.suptitle('Polynomial Functions') sets a common title for all Subplots plt.tight_layout() adjusts the area taken by each subplot, so that axes' legends do not overlap plt.subplots_adjust(top=0.90) adjusts the overall area taken by the plots, so that the title displays correctly Creating three-dimensional plots Matplotlib offers several different ways to visualize three-dimensional data. 
In this recipe, we will demonstrate the following methods: Drawing surfaces plots Drawing two-dimensional contour plots Using color maps and color bars Getting ready Start Jupyter and run the following three commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following code in a Jupyter code cell: from mpl_toolkits.mplot3d import Axes3D from matplotlib import cm f = lambda x,y: x**3 - 3*x*y**2 fig = plt.figure(figsize=(12,6)) ax = fig.add_subplot(1,2,1,projection='3d') xvalues = np.linspace(-2,2,100) yvalues = np.linspace(-2,2,100) xgrid, ygrid = np.meshgrid(xvalues, yvalues) zvalues = f(xgrid, ygrid) surf = ax.plot_surface(xgrid, ygrid, zvalues, rstride=5, cstride=5, linewidth=0, cmap=cm.plasma) ax = fig.add_subplot(1,2,2) plt.contourf(xgrid, ygrid, zvalues, 30, cmap=cm.plasma) fig.colorbar(surf, aspect=18) plt.tight_layout() None Running this code will produce a plot of the monkey saddle surface, which is a famous example of a surface with a non-standard critical point. The displayed graph is shown in the following screenshot: How it works… We start by importing the Axes3D class from the mpl_toolkits.mplot3d library, which is the Matplotlib object used for creating three-dimensional plots. We also import the cm class, which represents a color map. We then define a function to be plotted, with the following line of code: f = lambda x,y: x**3 - 3*x*y**2 The next step is to define the Figure object and an Axes object with a 3D projection, as done in the following lines of code: fig = plt.figure(figsize=(12,6)) ax = fig.add_subplot(1,2,1,projection='3d') Notice that the approach used here is somewhat different than the other recipes in this chapter. We are assigning the output of the figure() function call to the fig variable and then adding the subplot by calling the add_subplot() method from the fig object. This is the recommended method of creating a three-dimensional plot in the most recent version of Matplotlib. Even in the case of a single plot, the add_subplot() method should be used, in which case the command would be ax = fig.add_subplot(1,1,1,projection='3d'). The next few lines of code, shown as follows, compute the data for the plot: xvalues = np.linspace(-2,2,100) yvalues = np.linspace(-2,2,100) xgrid, ygrid = np.meshgrid(xvalues, yvalues) zvalues = f(xgrid, ygrid) The most important feature of this code is the call to meshgrid(). This is a NumPy convenience function that constructs grids suitable for three-dimensional surface plots. To understand how this function works, run the following code: xvec = np.arange(0, 4) yvec = np.arange(0, 3) xgrid, ygrid = np.meshgrid(xvec, yvec) After running this code, the xgrid array will contain the following values: array([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]) The ygrid array will contain the following values: array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]) Notice that the two arrays have the same dimensions. Each grid point is represented by a pair of the (xgrid[i,j],ygrid[i,j]) type. This convention makes the computation of a vectorized function on a grid easy and efficient, with the f(xgrid, ygrid) expression. The next step is to generate the surface plot, which is done with the following function call: surf = ax.plot_surface(xgrid, ygrid, zvalues, rstride=5, cstride=5, linewidth=0, cmap=cm.plasma) The first three arguments, xgrid, ygrid, and zvalues, specify the data to be plotted. 
We then use the rstride and cstride options to select a subset of the grid points. Notice that the xvalues and yvalues arrays both have length 100, so that xgrid and ygrid will have 10,000 entries each. Using all grid points would be inefficient and produce a poor plot from the visualization point of view. Thus, we set rstride=5 and cstride=5, which results in a plot containing every fifth point across each row and column of the grid. The next option, linewidth=0, sets the line width of the plot to zero, preventing the display of a wireframe. The final argument, cmap=cm.plasma, specifies the color map for the plot. We use the cm.plasma color map, which has the effect of plotting higher functional values with a hotter color. Matplotlib offer as large number of built-in color maps, listed at https:/​/​matplotlib.​org/​examples/​color/​colormaps_​reference.​html.​ Next, we add the filled contour plot with the following code: ax = fig.add_subplot(1,2,2) ax.contourf(xgrid, ygrid, zvalues, 30, cmap=cm.plasma) Notice that, when selecting the subplot, we do not specify the projection option, which is not necessary for two-dimensional plots. The contour plot is generated with the contourf() method. The first three arguments, xgrid, ygrid, zvalues, specify the data points, and the fourth argument, 30, sets the number of contours. Finally, we set the color map to be the same one used for the surface plot. The final component of the plot is a color bar, which provides a visual representation of the value associated with each color in the plot, with the fig.colorbar(surf, aspect=18) method call. Notice that we have to specify in the first argument which plot the color bar is associated to. The aspect=18 option is used to adjust the aspect ratio of the bar. Larger values will result in a narrower bar. To finish the plot, we call the tight_layout() function. This adjusts the sizes of each plot, so that axis labels are displayed correctly.   We generated 2D and 3D plots using Matplotlib and represented the results of technical computation in graphical manner. If you want to explore other types of plots such as scatter plot or bar chart, you may read Visualizing 3D plots in Matplotlib 2.0. Do check out the book SciPy Recipes to take advantage of other libraries of the SciPy stack and perform matrices, data wrangling and advanced computations with ease.

How to write high quality code in Python: 15+ tips for data scientists and researchers

Aarthi Kumaraswamy
21 Mar 2018
5 min read
Writing code is easy. Writing high quality code is much harder. Quality is to be understood both in terms of actual code (variable names, comments, docstrings, and so on) and architecture (functions, modules, and classes). In general, coming up with a well-designed code architecture is much more challenging than the implementation itself. In this post, we will give a few tips about how to write high quality code. This is a particularly important topic in academia, as more and more scientists without prior experience in software development need to code. High quality code writing first principles Writing readable code means that other people (or you in a few months or years) will understand it quicker and will be more willing to use it. It also facilitates bug tracking. Modular code is also easier to understand and to reuse. Implementing your program's functionality in independent functions that are organized as a hierarchy of packages and modules is an excellent way of achieving high code quality. It is easier to keep your code loosely coupled when you use functions instead of classes. Spaghetti code is really hard to understand, debug, and reuse. Iterate between bottom-up and top-down approaches while working on a new project. Starting with a bottom-up approach lets you gain experience with the code before you start thinking about the overall architecture of your program. Still, make sure you know where you're going by thinking about how your components will work together. How these high quality code writing first principles translate in Python? Take the time to learn the Python language seriously. Review the list of all modules in the standard library—you may discover that functions you implemented already exist. Learn to write Pythonic code, and do not translate programming idioms from other languages such as Java or C++ to Python. Learn common design patterns; these are general reusable solutions to commonly occurring problems in software engineering. Use assertions throughout your code (the assert keyword) to prevent future bugs (defensive programming). Start writing your code with a bottom-up approach; write independent Python functions that implement focused tasks. Do not hesitate to refactor your code regularly. If your code is becoming too complicated, think about how you can simplify it. Avoid classes when you can. If you can use a function instead of a class, choose the function. A class is only useful when you need to store persistent state between function calls. Make your functions as pure as possible (no side effects). In general, prefer Python native types (lists, tuples, dictionaries, and types from Python's collections module) over custom types (classes). Native types lead to more efficient, readable, and portable code. Choose keyword arguments over positional arguments in your functions. Argument names are easier to remember than argument ordering. They make your functions self-documenting. Name your variables carefully. Names of functions and methods should start with a verb. A variable name should describe what it is. A function name should describe what it does. The importance of naming things well cannot be overstated. Every function should have a docstring describing its purpose, arguments, and return values, as shown in the following example. You can also look at the conventions chosen in popular libraries such as NumPy. The exact convention does not matter, the point is to be consistent within your code. 
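The docstring example itself does not survive in this excerpt, so here is a minimal, illustrative sketch of what such a function might look like. The function, its name, and its NumPy-style docstring are our own illustration rather than code from the book, but they show several of the tips above working together: a descriptive verb-based name, keyword arguments, a docstring covering purpose, arguments, and return value, and a defensive assert:

import numpy as np

def normalize(values, lower=0.0, upper=1.0):
    """Rescale an array linearly to the interval [lower, upper].

    Parameters
    ----------
    values : array-like
        The numbers to rescale.
    lower : float, optional
        Lower bound of the target interval (default 0.0).
    upper : float, optional
        Upper bound of the target interval (default 1.0).

    Returns
    -------
    numpy.ndarray
        The rescaled values, with the same shape as the input.
    """
    values = np.asarray(values, dtype=float)
    # Defensive programming: fail early on impossible input.
    assert upper > lower, "upper must be greater than lower"
    span = values.max() - values.min()
    assert span > 0, "cannot normalize a constant array"
    return lower + (values - values.min()) * (upper - lower) / span

# Keyword arguments make the call site self-documenting:
# normalize([3, 5, 9], lower=-1.0, upper=1.0)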
You can use a markup language such as Markdown or reST to do that. Follow (at least partly) Guido van Rossum's Style Guide for Python, also known as Python Enhancement Proposal number 8 (PEP8). It is a long read, but it will help you write well-readable Python code. It covers many little things such as spacing between operators, naming conventions, comments, and docstrings. For instance, you will learn that it is considered a good practice to limit any line of your code to 79 or 99 characters. This way, your code can be correctly displayed in most situations (such as in a command-line interface or on a mobile device) or side by side with another file. Alternatively, you can decide to ignore certain rules. In general, following common guidelines is beneficial on projects involving many developers. You can check your code automatically against most of the style conventions in PEP8 with the pycodestyle Python package. You can also automatically make your code PEP8-compatible with the autopep8 package. Use a tool for static code analysis such as flake8 or Pylint. It lets you find potential errors or low-quality code statically, that is, without running your code. Use blank lines to avoid cluttering your code (see PEP8). You can also demarcate sections in a long Python module with salient comments. A Python module should not contain more than a few hundreds lines of code. Having too many lines of code in a module may be a sign that you need to split it into several modules. Organize important projects (with tens of modules) into subpackages (subdirectories). Take a look at how major Python projects are organized. For example, the code of IPython is well-organized into a hierarchy of subpackages with focused roles. Reading the code itself is also quite instructive. Learn best practices to create and distribute a new Python package. Make sure that you know setuptools, pip, wheels, virtualenv, PyPI, and so on. Also, you are highly encouraged to take a serious look at conda, a powerful and generic packaging system created by Anaconda. Packaging has long been a rapidly evolving topic in Python, so read only the most recent references. You enjoyed an excerpt from Cyrille Rossant’s latest book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the Ipython Cookbook Github page. If you loved what you saw, support Cyrille’s work by buying a copy of the book today!

How to embed Einstein dashboards on Salesforce Classic

Amey Varangaonkar
21 Mar 2018
5 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book highlights the key techniques and know-how to unlock critical insights from your data using Salesforce Einstein Analytics.[/box] With Einstein Analytics, users have the power to embed their dashboards on various third-party applications and even on their web applications. In this article, we will show how to embed an Einstein dashboard on Salesforce Classic. In order to start embedding the dashboard, let's create a sample dashboard by performing the following steps: Navigate to Analytics Studio | Create | Dashboard. Add three chart widgets on the dashboard. Click on the Chart button in the middle and select the Opportunity dataset. Select Measures as Sum of Amount and select BillingCountry under Group by. Click on Done. Repeat the second step for the second widget, but select Account Source under Group by and make it a donut chart. Repeat the second step for the third widget but select Stage under Group by and make it a funnel chart. Click on Save (s) and enter Embedding Opportunities in the title field, as shown in the following screenshot: Now that we have created a dashboard, let's embed this dashboard in Salesforce Classic. In order to start embedding the dashboard, exit from the Einstein Analytics platform and go to Classic mode. The user can embed the dashboard on the record detail page layout in Salesforce Classic. The user can view the dashboard, drill in, and apply a filter, just like in the Einstein Analytics window. Let's add the dashboard to the account detail page by performing the following steps: Navigate to Setup | Customize | Accounts | Page Layouts as shown in the following screenshot: Click on Edit of Account Layout and it will open a page layout editor which has two parts: a palette on the upper portion of the screen, and the page layout on the lower portion of the screen. The palette contains the user interface elements that you can add to your page layout, such as Fields, Buttons, Links, and Actions, and Related Lists, as shown in the following screenshot: Click on the Wave Analytics Assets option from the palette and you can see all the dashboards on the right-side panel. Drag and drop a section onto the page layout, name it Einstein Dashboard, and click on OK. Drag and drop the dashboard which you wish to add to the record detail page. We are going to add Embedded Opportunities. Click on Save. Go to any accounting record and you should see a new section within the dashboard: Users can easily configure the embedded dashboards by using attributes. To access the dashboard properties, go to edit page layout again, and go to the section where we added the dashboard to the layout. Hover over the dashboard and click on the Tool icon. It will open an Asset Properties window: The Asset Properties window gives the user the option to change the following features: Width (in pixels or %): This feature allows you to adjust the width of the dashboard section. Height (in pixels): This feature allows you to adjust the height of the dashboard section. Show Title: This feature allows you to display or hide the title of the dashboard. Show Sharing Icon: Using this feature, by default, the share icon is disabled. The Show Sharing Icon option gives the user a flexibility to include the share icon on the dashboard. Show Header: This feature allows you to display or hide the header. 
Hide on error: This feature gives you control over whether the Analytics asset appears if there is an error.
Field mapping: Last but not least, field mapping is used to filter the data on the dashboard down to what is relevant to the record. To set up the dashboard to show only the data that's relevant to the record being viewed, use field mapping. Field mapping links data fields in the dashboard to the object's fields.

We are using the Embedded Opportunity dashboard. Let's add field mapping to it. The following is the format for field mapping:

{
  "datasets": {
    "datasetName": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset Fieldname"]}
    }]
  }
}

Let's add field mapping for account by using the following format:

{
  "datasets": {
    "Account": [{
      "fields": ["Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }]
  }
}

If your dashboard uses multiple datasets, then you can use the following format:

{
  "datasets": {
    "datasetName1": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset1 Fieldname"]}
    }],
    "datasetName2": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset2 Fieldname"]}
    }]
  }
}

Let's add field mapping for account and opportunities:

{
  "datasets": {
    "Opportunities": [{
      "fields": ["Account.Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }],
    "Account": [{
      "fields": ["Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }]
  }
}

Now that we have added field mapping, save the page layout and go to the actual record. Observe that the dashboard is now filtered per record, as shown in the following screenshot:

To summarize, we saw it's fairly easy to embed your custom dashboards in Salesforce. Similarly, you can do so on other platforms such as Lightning, Visualforce pages, and even on your websites and web applications. If you are keen to learn more, you may check out the book Learning Einstein Analytics.

Getting started with Python Web Scraping

Amarabha Banerjee
20 Mar 2018
13 min read
[box type="note" align="" class="" width=""]Our article is an excerpt from the book Web Scraping with Python, written by Richard Lawson. This book contains step by step tutorials on how to leverage Python programming techniques for ethical web scraping. [/box] The amount of data available on the web is consistently growing both in quantity and in form. Businesses require this data to make decisions, particularly with the explosive growth of machine learning tools which require large amounts of data for training. Much of this data is available via Application Programming Interfaces, but at the same time a lot of valuable data is still only available through the process of web scraping. Python is the choice of programing language for many who build systems to perform scraping. It is an easy to use programming language with a rich ecosystem of tools for other tasks. In this article, we will focus on the fundamentals of setting up a scraping environment and perform basic requests for data with several tools of trade. Setting up a Python development environment If you have not used Python before, it is important to have a working development  environment. The recipes in this book will be all in Python and be a mix of interactive examples, but primarily implemented as scripts to be interpreted by the Python interpreter. This recipe will show you how to set up an isolated development environment with virtualenv and manage project dependencies with pip . We also get the code for the book and install it into the Python virtual environment. Getting ready We will exclusively be using Python 3.x, and specifically in my case 3.6.1. While Mac and Linux normally have Python version 2 installed, and Windows systems do not. So it is likely that in any case that Python 3 will need to be installed. You can find references for Python installers at www.python.org. You can check Python's version with python --version pip comes installed with Python 3.x, so we will omit instructions on its installation. Additionally, all command line examples in this book are run on a Mac. For Linux users the commands should be identical. On Windows, there are alternate commands (like dir instead of ls), but these alternatives will not be covered. How to do it We will be installing a number of packages with pip. These packages are installed into a Python environment. There often can be version conflicts with other packages, so a good practice for following along with the recipes in the book will be to create a new virtual Python environment where the packages we will use will be ensured to work properly. Virtual Python environments are managed with the virtualenv tool. This can be installed with the following command: ~ $ pip install virtualenv Collecting virtualenv Using cached virtualenv-15.1.0-py2.py3-none-any.whl Installing collected packages: virtualenv Successfully installed virtualenv-15.1.0 Now we can use virtualenv. But before that let's briefly look at pip. This command installs Python packages from PyPI, a package repository with literally 10's of thousands of packages. We just saw using the install subcommand to pip, which ensures a package is installed. We can also see all currently installed packages with pip list: ~ $ pip list alabaster (0.7.9) amqp (1.4.9) anaconda-client (1.6.0) anaconda-navigator (1.5.3) anaconda-project (0.4.1) aniso8601 (1.3.0) Packages can also be uninstalled using pip uninstall followed by the package name. I'll leave it to you to give it a try. Now back to virtualenv. 
Using virtualenv is very simple. Let's use it to create an environment and install the code from github. Let's walk through the steps: Create a directory to represent the project and enter the directory. ~ $ mkdir pywscb ~ $ cd pywscb Initialize a virtual environment folder named env: pywscb $ virtualenv env Using base prefix '/Users/michaelheydt/anaconda' New python executable in /Users/michaelheydt/pywscb/env/bin/python copying /Users/michaelheydt/anaconda/bin/python => /Users/michaelheydt/pywscb/env/bin/python copying /Users/michaelheydt/anaconda/bin/../lib/libpython3.6m.dylib => /Users/michaelheydt/pywscb/env/lib/libpython3. 6m.dylib Installing setuptools, pip, wheel...done. This creates an env folder. Let's take a look at what was installed. pywscb $ ls -la env total 8 drwxr-xr-x 6 michaelheydt staff 204 Jan 18 15:38 . drwxr-xr-x 3 michaelheydt staff 102 Jan 18 15:35 .. drwxr-xr-x 16 michaelheydt staff 544 Jan 18 15:38 bin drwxr-xr-x 3 michaelheydt staff 102 Jan 18 15:35 include drwxr-xr-x 4 michaelheydt staff 136 Jan 18 15:38 lib -rw-r--r-- 1 michaelheydt staff 60 Jan 18 15:38 pipselfcheck. json New we activate the virtual environment. This command uses the content in the env folder to configure Python. After this all python activities are relative to this virtual environment. pywscb $ source env/bin/activate (env) pywscb $ We can check that python is indeed using this virtual environment with the following command: (env) pywscb $ which python /Users/michaelheydt/pywscb/env/bin/python With our virtual environment created, let's clone the books sample code and take a look at its structure. (env) pywscb $ git clone https://github.com/PacktBooks/PythonWebScrapingCookbook.git Cloning into 'PythonWebScrapingCookbook'... remote: Counting objects: 420, done. remote: Compressing objects: 100% (316/316), done. remote: Total 420 (delta 164), reused 344 (delta 88), pack-reused 0 Receiving objects: 100% (420/420), 1.15 MiB | 250.00 KiB/s, done. Resolving deltas: 100% (164/164), done. Checking connectivity... done. This created a PythonWebScrapingCookbook directory. (env) pywscb $ ls -l total 0 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 PythonWebScrapingCookbook drwxr-xr-x 6 michaelheydt staff 204 Jan 18 15:38 env Let's change into it and examine the content. (env) PythonWebScrapingCookbook $ ls -l total 0 drwxr-xr-x 15 michaelheydt staff 510 Jan 18 16:21 py drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 www There are two directories. Most the the Python code is is the py directory. www contains some web content that we will use from time-to-time using a local web server. Let's look at the contents of the py directory: (env) py $ ls -l total 0 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 01 drwxr-xr-x 25 michaelheydt staff 850 Jan 18 16:21 03 drwxr-xr-x 21 michaelheydt staff 714 Jan 18 16:21 04 drwxr-xr-x 10 michaelheydt staff 340 Jan 18 16:21 05 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 06 drwxr-xr-x 25 michaelheydt staff 850 Jan 18 16:21 07 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 08 drwxr-xr-x 7 michaelheydt staff 238 Jan 18 16:21 09 drwxr-xr-x 7 michaelheydt staff 238 Jan 18 16:21 10 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 11 drwxr-xr-x 8 michaelheydt staff 272 Jan 18 16:21 modules Code for each chapter is in the numbered folder matching the chapter (there is no code for chapter 2 as it is all interactive Python). Note that there is a modules folder. Some of the recipes throughout the book use code in those modules. 
Make sure that your Python path points to this folder. On Mac and Linux you can sets this in your .bash_profile file (and environments variables dialog on Windows): Export PYTHONPATH="/users/michaelheydt/dropbox/packt/books/pywebscrcookbook/code/py/modules" export PYTHONPATH The contents in each folder generally follows a numbering scheme matching the sequence of the recipe in the chapter. The following is the contents of the chapter 6 folder: (env) py $ ls -la 06 total 96 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 . drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:26 .. -rw-r--r-- 1 michaelheydt staff 902 Jan 18 16:21 01_scrapy_retry.py -rw-r--r-- 1 michaelheydt staff 656 Jan 18 16:21 02_scrapy_redirects.py -rw-r--r-- 1 michaelheydt staff 1129 Jan 18 16:21 03_scrapy_pagination.py -rw-r--r-- 1 michaelheydt staff 488 Jan 18 16:21 04_press_and_wait.py -rw-r--r-- 1 michaelheydt staff 580 Jan 18 16:21 05_allowed_domains.py -rw-r--r-- 1 michaelheydt staff 826 Jan 18 16:21 06_scrapy_continuous.py -rw-r--r-- 1 michaelheydt staff 704 Jan 18 16:21 07_scrape_continuous_twitter.py -rw-r--r-- 1 michaelheydt staff 1409 Jan 18 16:21 08_limit_depth.py -rw-r--r-- 1 michaelheydt staff 526 Jan 18 16:21 09_limit_length.py -rw-r--r-- 1 michaelheydt staff 1537 Jan 18 16:21 10_forms_auth.py -rw-r--r-- 1 michaelheydt staff 597 Jan 18 16:21 11_file_cache.py -rw-r--r-- 1 michaelheydt staff 1279 Jan 18 16:21 12_parse_differently_based_on_rules.py In the recipes I'll state that we'll be using the script in <chapter directory>/<recipe filename>. Now just the be complete, if you want to get out of the Python virtual environment, you can exit using the following command: (env) py $ deactivate py $ And checking which python we can see it has switched back: py $ which python /Users/michaelheydt/anaconda/bin/python Scraping Python.org with Requests and Beautiful Soup In this recipe we will install Requests and Beautiful Soup and scrape some content from www.python.org. We'll install both of the libraries and get some basic familiarity with them. We'll come back to them both in subsequent chapters and dive deeper into each. Getting ready In this recipe, we will scrape the upcoming Python events from https:/ / www. python. org/events/ pythonevents. The following is an an example of The Python.org Events Page (it changes frequently, so your experience will differ): We will need to ensure that Requests and Beautiful Soup are installed. We can do that with the following: pywscb $ pip install requests Downloading/unpacking requests Downloading requests-2.18.4-py2.py3-none-any.whl (88kB): 88kB downloaded Downloading/unpacking certifi>=2017.4.17 (from requests) Downloading certifi-2018.1.18-py2.py3-none-any.whl (151kB): 151kB downloaded Downloading/unpacking idna>=2.5,<2.7 (from requests) Downloading idna-2.6-py2.py3-none-any.whl (56kB): 56kB downloaded Downloading/unpacking chardet>=3.0.2,<3.1.0 (from requests) Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB): 133kB downloaded Downloading/unpacking urllib3>=1.21.1,<1.23 (from requests) Downloading urllib3-1.22-py2.py3-none-any.whl (132kB): 132kB downloaded Installing collected packages: requests, certifi, idna, chardet, urllib3 Successfully installed requests certifi idna chardet urllib3 Cleaning up... pywscb $ pip install bs4 Downloading/unpacking bs4 Downloading bs4-0.0.1.tar.gz Running setup.py (path:/Users/michaelheydt/pywscb/env/build/bs4/setup.py) egg_info for package bs4 How to do it Now let's go and learn to scrape a couple events. 
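Before moving on to the scraping itself, it can be worth a quick sanity check that both libraries import cleanly inside the virtual environment. This little snippet is not part of the book's recipe; it simply confirms the installs and prints the installed versions:

# Quick check that Requests and Beautiful Soup are importable
import requests
import bs4

print(requests.__version__)   # the install above pulled requests 2.18.4
print(bs4.__version__)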
For this recipe we will start by using interactive python. Start it with the ipython command: $ ipython Python 3.6.1 |Anaconda custom (x86_64)| (default, Mar 22 2017, 19:25:17) Type "copyright", "credits" or "license" for more information. IPython 5.1.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. In [1]: Next we import Requests In [1]: import requests We now use requests to make a GET HTTP request for the following url:https://www.python.org/events/ python-events/ by making a GET request: In [2]: url = 'https://www.python.org/events/python-events/' In [3]: req = requests.get(url) That downloaded the page content but it is stored in our requests object req. We can retrieve the content using the .text property. This prints the first 200 characters. req.text[:200] Out[4]: '<!doctype html>n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->n<!--[if IE 8]> <h' We now have the raw HTML of the page. We can now use beautiful soup to parse the HTML and retrieve the event data. First import Beautiful Soup In [5]: from bs4 import BeautifulSoup Now we create a BeautifulSoup object and pass it the HTML. In [6]: soup = BeautifulSoup(req.text, 'lxml') Now we tell Beautiful Soup to find the main <ul> tag for the recent events, and then to get all the <li> tags below it. In [7]: events = soup.find('ul', {'class': 'list-recentevents'}). findAll('li') And finally we can loop through each of the <li> elements, extracting the event details, and print each to the console: In [13]: for event in events: ...: event_details = dict() ...: event_details['name'] = event_details['name'] = event.find('h3').find("a").text ...: event_details['location'] = event.find('span', {'class' 'event-location'}).text ...: event_details['time'] = event.find('time').text ...: print(event_details) ...: {'name': 'PyCascades 2018', 'location': 'Granville Island Stage, 1585 Johnston St, Vancouver, BC V6H 3R9, Canada', 'time': '22 Jan. – 24 Jan. 2018'} {'name': 'PyCon Cameroon 2018', 'location': 'Limbe, Cameroon', 'time': '24 Jan. – 29 Jan. 2018'} {'name': 'FOSDEM 2018', 'location': 'ULB Campus du Solbosch, Av. F. Roosevelt 50, 1050 Bruxelles, Belgium', 'time': '03 Feb. – 05 Feb. 2018'} {'name': 'PyCon Pune 2018', 'location': 'Pune, India', 'time': '08 Feb. – 12 Feb. 2018'} {'name': 'PyCon Colombia 2018', 'location': 'Medellin, Colombia', 'time': '09 Feb. – 12 Feb. 2018'} {'name': 'PyTennessee 2018', 'location': 'Nashville, TN, USA', 'time': '10 Feb. – 12 Feb. 2018'} This entire example is available in the 01/01_events_with_requests.py script file. The following is its content and it pulls together all of what we just did step by step: import requests from bs4 import BeautifulSoup def get_upcoming_events(url): req = requests.get(url) soup = BeautifulSoup(req.text, 'lxml') events = soup.find('ul', {'class': 'list-recent-events'}).findAll('li') for event in events: event_details = dict() event_details['name'] = event.find('h3').find("a").text event_details['location'] = event.find('span', {'class', 'eventlocation'}). 
text
event_details['time'] = event.find('time').text
print(event_details)

get_upcoming_events('https://www.python.org/events/python-events/')

You can run this using the following command from the terminal:

$ python 01_events_with_requests.py
{'name': 'PyCascades 2018', 'location': 'Granville Island Stage, 1585 Johnston St, Vancouver, BC V6H 3R9, Canada', 'time': '22 Jan. – 24 Jan. 2018'}
{'name': 'PyCon Cameroon 2018', 'location': 'Limbe, Cameroon', 'time': '24 Jan. – 29 Jan. 2018'}
{'name': 'FOSDEM 2018', 'location': 'ULB Campus du Solbosch, Av. F. D. Roosevelt 50, 1050 Bruxelles, Belgium', 'time': '03 Feb. – 05 Feb. 2018'}
{'name': 'PyCon Pune 2018', 'location': 'Pune, India', 'time': '08 Feb. – 12 Feb. 2018'}
{'name': 'PyCon Colombia 2018', 'location': 'Medellin, Colombia', 'time': '09 Feb. – 12 Feb. 2018'}
{'name': 'PyTennessee 2018', 'location': 'Nashville, TN, USA', 'time': '10 Feb. – 12 Feb. 2018'}

How it works

We will dive into the details of both Requests and Beautiful Soup in the next chapter, but for now let's just summarize a few key points about how this works. The following are important points about Requests:

Requests is used to execute HTTP requests. We used it to make a GET verb request of the URL for the events page.
The Requests object holds the results of the request. This is not only the page content, but also many other items about the result such as HTTP status codes and headers.
Requests is used only to get the page; it does not do any parsing.

We use Beautiful Soup to do the parsing of the HTML and also the finding of content within the HTML. To understand how this worked, the content of the page has the following HTML to start the Upcoming Events section:

We used the power of Beautiful Soup to:

Find the <ul> element representing the section, which is found by looking for a <ul> with a class attribute that has a value of list-recent-events.
From that object, we find all the <li> elements.

Each of these <li> tags represents a different event. We iterate over each of those, making a dictionary from the event data found in child HTML tags:

The name is extracted from the <a> tag that is a child of the <h3> tag
The location is the text content of the <span> with a class of event-location
And the time is extracted from the datetime attribute of the <time> tag.

To summarize, we saw how to set up a Python environment for effective data scraping from the web and also explored ways to use Beautiful Soup to perform preliminary data scraping for ethical purposes. If you liked this post, be sure to check out Web Scraping with Python, which consists of useful step-by-step recipes for ethical web scraping with Python.