Data | Tech News, Tutorials & Expert Insights

article-image-creating-your-own-functions-mysql-python

27 Sep 2010

6 min read

Creating Your Own Functions in MySQL for Python

27 Sep 2010

MySQL for Python Integrate the flexibility of Python and the power of MySQL to boost the productivity of your Python applications Implement the outstanding features of Python's MySQL library to their full potential See how to make MySQL take the processing burden from your programs Learn how to employ Python with MySQL to power your websites and desktop applications Apply your knowledge of MySQL and Python to real-world problems instead of hypothetical scenarios A manual packed with step-by-step exercises to integrate your Python applications with the MySQL database server Read more about this book (For more resources on Phython see here.) Hello() To create a function, we necessarily have to go back to the CREATE statement. As in a Python function definition, MySQL expects us to declare the name of the function as well as any arguments it requires. Unlike Python, MySQL also wants the type of data that will be received by the function. The beginning of a basic MySQL function definition looks like this: CREATE FUNCTION hello(s CHAR(20)) MySQL then expects to know what kind of data to return. Again, we use the MySQL data type definitions for this. RETURNS CHAR(50) This just tells MySQL that the function will return a character string of 50 characters or less. If the function will always perform the same task, it is best for the sake of performance to include the keyword DETERMINISTIC next. If the behavior of the function varies, use the keyword NON-DETERMINISTIC. If no keyword is set for the characteristic of the function, MySQL defaults to NON-DETERMINISTIC. You can learn more about the characteristic keywords used in function definitions at: http://dev.mysql.com/doc/refman/5.5/en/create-procedure.html Finally comes the meat of the function definition. Here we can set variables and perform any calculations that we want. For our basic definition, we will simply return a concatenated string: RETURN CONCAT('Hello, ', s, '!'); The function obviously concatenates the word 'Hello' with whatever argument is passed to it and appends an exclamation point at the end. To call it we use SELECT as with the other functions: mysql> SELECT hello('world') as Greeting; Capitalise() A function to capitalize every initial letter in a string follows the same pattern. The main point of the function is to walk through the string, character by character, and use UPPER() on every character that does not follow a letter. DELIMITER Obviously, we need a way to pass the entire function to MySQL without having any of the lines evaluated until we call it. To do this, we use the keyword DELIMITER. DELIMITER allows users to tell MySQL to evaluate lines that end in the character(s) we set. So the process for complex function definitions becomes: Change the delimiter. Pass the function with the usual semicolons to indicate the end of the line. Change the delimiter back to a semicolon. Call the function. The DELIMITER keyword allows us to specify more than one character as the line delimiter. So in order to ensure we don't need to worry about our code inadvertently conflicting with a line delimiter, let's make the delimiter @@: DELIMITER @@ The function definition From here, we are free to define a function to our specification. The definition line will read as follows: CREATE FUNCTION `Capitalise`(instring VARCHAR(1000)) The function will return a character string of similar length and variability: RETURNS VARCHAR(1000) When MySQL functions extend beyond the simplest calculations, such as hello(), MySQL requires us to specify the beginning and ending of the function. We do that with the keywords BEGIN and END. So let's begin the function: BEGIN Next, we need to declare our variables and their types using the keyword DECLARE: DECLARE i INT DEFAULT 1;DECLARE achar, imark CHAR(1);DECLARE outstring VARCHAR(1000) DEFAULT LOWER(instring); The DEFAULT keyword allows us to specify what should happen if outstring should fail for some reason. Next, we define a WHILE loop: WHILE i <= CHAR_LENGTH(instring) DO The WHILE loop obviously begins with a conditional statement based on the character length of instring. The resulting action begins with the keyword DO. From here, we set a series of variables and express what should happen where a character follows one of the following: blank space & '' _ ? ; : ! , - / ( . The operational part of the function looks like this: SET achar = SUBSTRING(instring, i, 1); SET imark = CASE WHEN i = 1 THEN ' ' ELSE SUBSTRING(instring, i - 1, 1) END CASE; IF imark IN (' ', '&', '''', '_', '?', ';', ':', '!', ',', '-', '/', '(', '.') THEN SET outstring = INSERT(outstring, i, 1, UPPER(achar)); END IF; SET i = i + 1; Much of this code is self-explanatory. It is worth noting, however, that the apodosis of any conditional in MySQL must end with the keyword END. In the case of IF, we use END IF. In the second SET statement, the keyword CASE is an evaluative keyword that functions similar to the try...except structure in Python. If the WHEN condition is met, the empty THEN apodosis is executed. Otherwise, the ELSE exception applies and the SUBSTRING function is run. The CASE structure ends with END CASE. MySQL will equally recognize the use of END instead. The subsequent IF clause evaluates whether imark, defined as the character before achar, is one of the declared characters. If it is, then that character in instring is replaced with its uppercase equivalent in outstring. After the IF clause is finished, the loop is incremented by one. After the entire string is processed, we then end the WHILE loop with: END WHILE; After the function's operations are completed, we return the value of outstring and indicate the end of the function: RETURN outstring;END@@ Finally, we must not forget to return the delimiter to a semicolon: DELIMITER ; It is worth noting that, instead of defining a function in a MySQL session we can define it in a separate file and load it on the fly with the SOURCE command. If we save the function to a file called capfirst.sql in a directory temp, we can source it relatively: We can also use: SOURCE /home/skipper/temp/capfirst.sql;

0
0
9802

Packt

27 Apr 2015

9 min read

Apache Solr and Big Data – integration with MongoDB

Packt

27 Apr 2015

9 min read

In this article by Hrishikesh Vijay Karambelkar, author of the book Scaling Big Data with Hadoop and Solr - Second Edition, we will go through Apache Solr and MongoDB together. In an enterprise, data is generated from all the software that is participating in day-to-day operations. This data has different formats, and bringing in this data for big-data processing requires a storage system that is flexible enough to accommodate a data with varying data models. A NoSQL database, by its design, is best suited for this kind of storage requirements. One of the primary objectives of NoSQL is horizontal scaling, that is, the P in CAP theorem, but this works at the cost of sacrificing Consistency or Availability. Visit http://en.wikipedia.org/wiki/CAP_theorem to understand more about CAP theorem (For more resources related to this topic, see here.) What is NoSQL and how is it related to Big Data? As we have seen, data models for NoSQL differ completely from that of a relational database. With the flexible data model, it becomes very easy for developers to quickly integrate with the NoSQL database, and bring in large sized data from different data sources. This makes the NoSQL database ideal for Big Data storage, since it demands that different data types be brought together under one umbrella. NoSQL also has different data models, like KV store, document store and Big Table storage. In addition to flexible schema, NoSQL offers scalability and high performance, which is again one of the most important factors to be considered while running big data. NoSQL was developed to be a distributed type of database. When traditional relational stores rely on the high computing power of CPUs and the high memory focused on a centralized system, NoSQL can run on your low-cost, commodity hardware. These servers can be added or removed dynamically from the cluster running NoSQL, making the NoSQL database easier to scale. NoSQL enables most advanced features of a database, like data partitioning, index sharding, distributed query, caching, and so on. Although NoSQL offers optimized storage for big data, it may not be able to replace the relational database. While relational databases offer transactional (ACID), high CRUD, data integrity, and a structured database design approach, which are required in many applications, NoSQL may not support them. Hence, it is most suited for Big Data where there is less possibility of need for data to be transactional. MongoDB at glance MongoDB is one of the popular NoSQL databases, (just like Cassandra). MongoDB supports the storing of any random schemas in the document oriented storage of its own. MongoDB supports the JSON-based information pipe for any communication with the server. This database is designed to work with heavy data. Today, many organizations are focusing on utilizing MongoDB for various enterprise applications. MongoDB provides high availability and load balancing. Each data unit is replicated and the combination of a data with its copes is called a replica set. Replicas in MongoDB can either be primary or secondary. Primary is the active replica, which is used for direct read-write operations, while the secondary replica works like a backup for the primary. MongoDB supports searches by field, range queries, and regular expression searches. Queries can return specific fields of documents and also include user-defined JavaScript functions. Any field in a MongoDB document can be indexed. More information about MongoDB can be read at https://www.mongodb.org/. The data on MongoDB is eventually consistent. Apache Solr can be used to work with MongoDB, to enable database searching capabilities on a MongoDB-based data store. Unlike Cassandra, where the Solr indexes are stored directly in Cassandra through solandra, MongoDB integration with Solr brings in the indexes in the Solr-based optimized storage. There are various ways in which the data residing in MongoDB can be analyzed and searched. MongoDB's replication works by recording all operations made on a database in a log file, called the oplog (operation log). Mongo's oplog keeps a rolling record of all operations that modify the data stored in your databases. Many of the implementers suggest reading this log file using a standard file IO program to push the data directly to Apache Solr, using CURL, SolrJ. Since oplog is a collection of data with an upper limit on maximum storage, it is feasible to synch such querying with Apache Solr. Oplog also provides tailable cursors on the database. These cursors can provide a natural order to the documents loaded in MongoDB, thereby, preserving their order. However, we are going to look at a different approach. Let's look at the schematic following diagram: In this case, MongoDB is exposed as a database to Apache Solr through the custom database driver. Apache Solr reads MongoDB data through the DataImportHandler, which in turns calls the JDBC-based MongoDB driver for connecting to MongoDB and running data import utilities. Since MongoDB supports replica sets, it manages the distribution of data across nodes. It also supports Sharding just like Apache Solr. Installing MongoDB To install MongoDB in your development environment, please follow the following steps: Download the latest version of MongoDB from https://www.mongodb.org/downloads for your supported operating system. Unzip the zipped folder. MongoDB comes up with a default set of different command-line components and utilities: bin/mongod: The database process. bin/mongos: Sharding controller. bin/mongo: The database shell (uses interactive JavaScript). Now, create a directory for MongoDB, which it will use for user data creation and management, and run the following command to start the single node server: $ bin/mongod –dbpath <path to your data directory> --rest In this case, --rest parameter enables support for simple rest APIs that can be used for getting the status. Once the server is started, access http://localhost:28017 from your favorite browser, you should be able to see following administration status page: Now that you have successfully installed MongoDB, try loading a sample data set from the book on MongoDB by opening a new command-line interface. Change the directory to $MONGODB_HOME and run the following command: $ bin/mongoimport --db solr-test --collection zips --file "<file-dir>/samples/zips.json" Please note that the database name is solr-test. You can see the stored data using the MongoDB-based CLI by running the following set of commands from your shell: $ bin/mongo MongoDB shell version: 2.4.9 connecting to: test Welcome to the MongoDB shell. For interactive help, type "help". For more comprehensive documentation, see http://docs.mongodb.org/ Questions? Try the support group http://groups.google.com/group/mongodb-user > use test Switched to db test > show dbs exampledb 0.203125GB local 0.078125GB test 0.203125GB > db.zips.find({city:"ACMAR"}) { "city" : "ACMAR", "loc" : [ -86.51557, 33.584132 ], "pop" : 6055, "state" :"AL", "_id" : "35004" } Congratulations! MongoDB is installed successfully Creating Solr indexes from MongoDB To run MongoDB as a database, you will need a JDBC driver built for MongoDB. However, the Mongo-JDBC driver has certain limitations, and it does not work with the Apache Solr DataImportHandler. So, I have extended Mongo-JDBC to work under the Solr-based DataImportHandler. The project repository is available at https://github.com/hrishik/solr-mongodb-dih. Let's look at the setting-up procedure for enabling MongoDB based Solr integration: You may not require a complete package from the solr-mongodb-dih repository, but just the jar file. This can be downloaded from https://github.com/hrishik/solr-mongodb-dih/tree/master/sample-jar. You will also need the following additional jar files: jsqlparser.jar mongo.jar These jars are available with the book Scaling Big Data with Hadoop and Solr, Second Edition for download. In your Solr setup, copy these jar files into the library path, that is, the $SOLR_WAR_LOCATION/WEB-INF/lib folder. Alternatively, point your container classpath variable to link them up. Using simple Java source code DataLoad.java (link https://github.com/hrishik/solr-mongodb-dih/blob/master/examples/DataLoad.java), populate the database with some sample schema and tables that you will use to load in Apache Solr. Now create a data source file (data-source-config.xml) as follows: <dataConfig> <dataSource name="mongod" type="JdbcDataSource" driver="com.mongodb. jdbc.MongoDriver" url="mongodb://localhost/solr-test"/> <document> <entity name="nameage" dataSource="mongod" query="select name, price from grocery"> <field column="name" name="name"/> <field column="name" name="id"/>  </entity> </document> </dataConfig> Copy the solr-dataimporthandler-*.jar from your contrib directory to a container/application library path. Modify $SOLR_COLLECTION_ROOT/conf/solr-config.xml with DIH entry:  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config"><path to config>/data-source-config.xml</str> </lst> </requestHandler>  Once this configuration is done, you are ready to test it out. Access http://localhost:8983/solr/dataimport?command=full-import from your browser to run the full import on Apache Solr, where you will see that your import handler has successfully ran, and has loaded the data in Solr store, as shown in the following screenshot: You can validate the content created by your new MongoDB DIH by accessing the Solr Admin page, and running a query: Using this connector, you can perform operations for full-import on various data elements. Since MongoDB is not a relational database, it does support join queries. However, it supports selects, order by, and so on. Summary In this article, we have understood the distributed aspects of any enterprise search where went through Apache Solr and MongoDB together. Resources for Article: Further resources on this subject: Evolution of Hadoop [article] In the Cloud [article] Learning Data Analytics with R and Hadoop [article]

0
0
9767

article-image-analyzing-moby-dick-frequency-distribution-nltk

Richard Gall

30 Mar 2018

2 min read

Analyzing Moby Dick through frequency distribution with NLTK

Richard Gall

30 Mar 2018

2 min read

What is frequency distribution and why does it matter? In the context of natural language processing, frequency distribution is simply a tally of the number of times each unique word is used in a text. Recording the individual word counts of a text can better help us understand not only what topics are being discussed and what information is important but also how that information is being discussed as well. It's a useful method for better understanding language and different types of texts. This video tutorial has been taken from from Natural Language Processing with Python. Word frequency distribution is central to performing content analysis with NLP. Its applications are wide ranging. From understanding and characterizing an author’s writing style to analyzing the vocabulary of rappers, the technique is playing a large part in wider cultural conversations. It’s also used in psychological research in a number of ways to analyze how patients use language to form frameworks for thinking about themselves and the world. Trivial or serious, word frequency distribution is becoming more and more important in the world of research. Of course, manually creating such a word frequency distribution models would be time consuming and inconvenient for data scientists. Fortunately for us, NLTK, Python’s toolkit for natural language processing, makes life much easier. How to use NLTK for frequency distribution Take a look at how to use NLTK to create a frequency distribution for Herman Melville’s Moby Dick in the video tutorial above. In it, you'll find a step by step guide to performing an important data analysis task. Once you've done that, you can try it for yourself, or have a go at performing a similar analysis on another data set. Read Analyzing Textual information using the NLTK library. Learn more about natural language processing - read How to create a conversational assistant or chatbot using Python.

0
0
9766

article-image-interact-with-hbase-using-hbase-shell-tutorial

Amey Varangaonkar

25 Jun 2018

9 min read

How to interact with HBase using HBase shell [Tutorial]

Amey Varangaonkar

25 Jun 2018

9 min read

0
0
9711

article-image-getting-started-tableau-public

Packt

04 Nov 2015

12 min read

Getting Started with Tableau Public

Packt

04 Nov 2015

12 min read

In this article by Ashley Ohmann and Matthew Floyd, the authors of Creating Data Stories with tableau Public. Making sense of data is a valued service in today's world. It may be a cliché, but it's true that we are drowning in data and yet, we are thirsting for knowledge. The ability to make sense of data and the skill of using data to tell a compelling story is becoming one of the most valued capabilities in almost every field—business, journalism, retail, manufacturing, medicine, and public service. Tableau Public (for more information, visit www.tableaupublic.com), which is Tableau 's free Cloud-based data visualization client, is a powerfully transformative tool that you can use to create rich, interactive, and compelling data stories. It's a great platform if you wish to explore data through visualization. It enables your consumers to ask and answer questions that are interesting to them. This article is written for people who are new to Tableau Public and would like to learn how to create rich, interactive data visualizations from publicly available data sources that they can easily share with others. Once you publish visualizations and data to Tableau Public, they are accessible to everyone, and they can be viewed and downloaded. A typical Tableau Public data visualization contains public data sets such as sports, politics, public works, crime, census, socioeconomic metrics, and social media sentiment data (you also can create and use your own data). Many of these data sets either are readily available on the Internet, or can accessed via a public records request or search (if they are harder to find, they can be scraped from the Internet). You can now control who can download your visualizations and data sets, which is a feature that was previously available only to the paid subscribers. Tableau Public has a current maximum data set size of 10 million rows and/or 10 GB of data. (For more resources related to this topic, see here.) In this article, we will walk through an introduction to Tableau, which includes the following topics: A discussion on how you can use Tableau Public to tell your data story Examples of organizations that use Tableau Public Downloading and installing the Tableau Public software Logging in to Tableau Public Creating your very own Tableau Public profile Discovering the Tableau Public features and resources Taking a look at the author profiles and galleries on the Tableau website to browse other authors' data visualizations (this is a great way to learn and gather ideas on how to best present our data) An Tableau Public overview Tableau Public allows everyone to tell their data story and create compelling and interactive data visualizations that encourage discovery and learning. Tableau Public is sold at a great price—free! It allows you as a data storyteller to create and publish data visualizations without learning how to code or having special knowledge of web publishing. In fact, you can publish data sets of up to 10 million rows or 10 GB to Tableau Public in a single workbook. Tableau Public is a data discovery tool. It should not be confused with enterprise-grade business intelligence tools, such as Tableau Desktop and Tableau Server, QlikView, and Cognos Insight. Those tools integrate with corporate networks and security protocol as well as server-based data warehouses. Data visualization software is not a new thing. Businesses have used software to generate dashboards and reports for decades. The twist comes with data democracy tools, such as Tableau Public. Journalists and bloggers who would like to augment their reporting of static text and graphics can use these data discovery tools, such as Tableau Public, to create riveting, rich data visualizations, which may comprise one or more charts, graphs, tables, and other objects that may be controlled by the readers to allow for discovery. The people who are active members of the Tableau Public community have a few primary traits in common, they are curious, generous with their knowledge and time, and enjoy conversations that relate data to the world around us. Tableau Public maintains a list of blogs of data visualization experts using Tableau software. In the following screenshot, Tableau Zen Masters, Anya A'hearn of Databrick and Allan Walker, used data on San Francisco bike sharing to show the financial benefits of the Bay Area Bike Share, a city-sponsored 30-minute bike sharing program, as well as a map of both the proposed expansion of the program and how far a person can actually ride a bike in half an hour. This dashboard is featured in the Tableau Public gallery because it relates data to users clearly and concisely. It presents a great public interest story (commuting more efficiently in a notoriously congested city) and then grabs the viewer's attention with maps of current and future offerings. The second dashboard within the analysis is significant as well. The authors described the Geographic Information Systems (GIS) tools that they used to create their innovative maps as well as the methodology that went into the final product so that the users who are new to the tool can learn how to create a similar functionality for their own purposes: Image republished under the terms of fair use, creators: Anya A'hearn and Allan Walker. Source: https://public.tableausoftware.com/views/30Minutes___BayAreaBikeShare/30Minutes___?:embed=y&:loadOrderID=0&:display_count=yes As humans, we relate our experiences to each other in stories, and data points are an important component of stories. They quantify phenomena and, when combined with human actions and emotions, can make them more memorable. When authors create public interest story elements with Tableau Public, readers can interact with the analyses, which creates a highly personal experience and translates into increased participation and decreased abandonment. It's not difficult to embed the Tableau Public visualizations into websites and blogs. It is as easy as copying and pasting JavaScript that Tableau Public renders for you automatically. Using Tableau Public increases accessibility to stories, too. You can view data stories on mobile devices with a web browser and then share it with friends on social media sites such as Twitter and Facebook using Tableau Public's sharing functionality. Stories can be told with the help of text as well as popular and tried-and-true visualization types such as maps, bar charts, lists, heat maps, line charts, and scatterplots. Maps are particularly easier to build in Tableau Public than most other data visualization offerings because Tableau has integrated geocoding (down to the city and postal code) directly into the application. Tableau Public has a built-in date hierarchy that makes it easy for users to drill through time dimensions just by clicking on a button. One of Tableau Software's taglines, Data to the People, is a reflection not only of the ability to distribute analysis sets to thousands of people in one go, but also of the enhanced abilities of nontechnical users to explore their own data easily and derive relevant insights for their own community without having to learn a slew of technical skills. Telling your story with Tableau Public Tableau was originally developed in the Stanford University Computer Science department, where a research project sponsored by the U.S. Department of Defense was launched to study how people can analyze data rapidly. This project merged two branches of computer science, understanding data relationships and computer graphics. This mash-up was discovered to be the best way for people to understand and sometimes digest complex data relationships rapidly and, in effect, to help readers consume data. This project eventually moved from the Stanford campus to the corporate world, and Tableau Software was born. The Tableau usage and adoption has since skyrocketed at the time of writing this book. Tableau is the fastest growing software company in the world and now, Tableau competes directly with the older software manufacturers for data visualization and discovery—Microsoft, IBM, SAS, Qlik, and Tibco, to name a few. Most of these are compared to each other by Gartner in its annual Magic Quadrant. For more information, visit http://www.gartner.com/technology/home.jsp. Tableau Software's flagship program, Tableau Desktop, is commercial software used by many organizations and corporations throughout the world. Tableau Public is the free version of Tableau's offering. It is typically used with nonconfidential data either from the public domain or that which we collected ourselves. This free public offering of Tableau Public is truly unique in the business intelligence and data discovery industry. There is no other software like it—powerful, free, and open to data story authors. There are a few terms in this article that might be new to you. You, as an author, will load data into a workbook, which will be saved by you in the Tableau Public cloud. A visualization is a single graph. It is typically on a worksheet. One or more visualizations are on a dashboard, which is where your users will interact with your data. One of the wonderful features about Tableau Public is that you can load data and visualize it on your own. Traditionally, this has been an activity that was undertaken with the help of programmers at work. With Tableau Public and newer blogging platforms, nonprogrammers can develop data visualization, publish it to the Tableau Public website, and then embed the data visualization on their own website. The basic steps that are required to create a Tableau Public visualization are as follows: Gather your data sources, usually in a spreadsheet or a .csv file. Prepare and format your data to make it usable in Tableau Public. Connect to the data and start building the data visualizations (charts, graphs, and many other objects). Save and publish your data visualization to the Tableau Public website. Embed your data visualization in your web page by using the code that Tableau Public provides. Tableau Public is used by some of the leading news organizations across the world, including The New York Times, The Guardian (UK), National Geographic (US), the Washington Post (US), the Boston Globe (US), La Informacion (Spain), and Época (Brazil). Now, we will discuss installing Tableau Public. Then, we will take a look at how we can find some of these visualizations out there in the wild so that we can learn from others and create our own original visualizations. Installing Tableau Public Let's look at the steps required for the installation of Tableau Public: To download Tableau Public, visit the Tableau Software website at http://public.tableau.com/s/. Enter your e-mail address and click on the Download the App button located at the center of the screen, as shown in following screenshot: The downloaded version of Tableau Public is free, and it is not a limited release or demo version. It is a fully functional version of Tableau Public. Once the download begins, a Thank You screen gives you an option of retrying the download if it does not begin automatically or starts downloading a different version. The version of Tableau Public that gets downloaded automatically is the 64-bit version for Windows. Users of Macs should download the appropriate version for their computers, and users with 32-bit Windows machines should download the 32-bit version. Check your Windows computer system type (32- or 64-bit) by navigating to Start then Computer and right-clicking on the Computer option. Select Properties, and view the System properties. 64-bit systems will be noted as such. 32-bit systems will either state that they are 32-bit ones, or not have any indication of being a 32- or 64-bit system. While the Tableau Public executable file downloads, you can scroll the Thank You page to the lower section to learn more about the new features of Tableau Public 9.0. The speed with which Tableau Public downloads depends on the download speed of your network, and the 109 MB file usually takes a few minutes to download. The TableauPublicDesktop-xbit.msi (where x=32 or 64, depending on which version you selected) is downloaded. Navigate to the .msi file in Windows Explorer or in the browser window and click on Open. Then, click on Run in the Open File - Security Warning dialog box that appears in the following screenshot. The Windows installer starts the Tableau installation process: Once you have opted to Run the application, the next screen prompts you to view the License Agreement and accept its terms: If you wish to read the terms of the license agreement, click on the View License Agreement… button. You can customize the installation if you'd like. Options include the directory in which the files are installed as well as the creation of a desktop icon and a Start Menu shortcut (for Windows machines). If you do not customize the installation, Tableau Public will be installed in the default directory on your computer, and the desktop icon and Start Menu shortcut will be created. Select the checkbox that indicates I have read and accept the terms of this License Agreement, and click on Install. If a User Account Control dialog box appears with the Do you want to allow the following program to install software on this computer? prompt, click on Yes: Tableau Public will be installed on your computer, with the status bar indicating the progress: When Tableau Public has been installed successfully, the home screen opens. Exploring Tableau Public The Tableau Public home screen has several features that allow you to do following operations: Connect to data Open your workbooks Discover the features of Tableau Public Tableau encourages new users to watch the video on this first welcome page. To do so, click on the button named Watch the Getting Started Video. You can start building your first Tableau Public workbook any time. Connecting to data You can connect to the following four different data source types in Tableau Public by clicking on the appropriate format name: Microsoft Excel files Text files with a variety of delimiters Microsoft Access files Odata files Summary In this article, we learned how Tableau Public is commonly used. We also learned how to download and install Tableau Public, explore Tableau Public's features and learn about the Tableau Desktop tool, and discover other authors' data visualizations using the Tableau Galleries and Recommended Authors and Profile Finder function on the Tableau website. Resources for Article: Further resources on this subject: Data Acquisition and Mapping [article] Interacting with Data for Dashboards [article] Moving from Foundational to Advanced Visualizations [article]

0
0
9700

article-image-python-text-processing-nltk-storing-frequency-distributions-redis

Packt

09 Nov 2010

9 min read

Python Text Processing with NLTK: Storing Frequency Distributions in Redis

Packt

09 Nov 2010

9 min read

0
0
9688

Packt

25 Sep 2015

11 min read

The Dashboard Design – Best Practices

Packt

25 Sep 2015

11 min read

In this article by Julian Villafuerte, author of the book Creating Stunning Dashboards with QlikView you know more about the best practices for the dashboard design. (For more resources related to this topic, see here.) Data visualization is a field that is constantly evolving. However, some concepts have proven their value time and again through the years and have become what we call best practices. These notions should not be seen as strict rules that must be applied without any further consideration but as a series of tips that will help you create better applications. If you are a beginner, try to stick to them as much as you can. These best practices will save you a lot of trouble and will greatly enhance your first endeavors. On the other hand, if you are an advanced developer, combine them with your personal experiences in order to build the ultimate dashboard. Some guidelines in this article come from the widely known characters in the field of data visualization, such as Stephen Few, Edward Tufte, John Tukey, Alberto Cairo, and Nathan Yau. So, if a concept strikes your attention, I strongly recommend you to read more about it in their books. Throughout this article, we will review some useful recommendations that will help you create not only engaging, but also effective and user-friendly dashboards. Remember that they may apply differently depending on the information displayed and the audience you are working with. Nevertheless, they are great guidelines to the field of data visualization, so do not hesitate to consider them in all of your developments. Gestalt principles In the early 1900s, the Gestalt school of psychology conducted a series of studies on human perception in order to understand how our brain interprets forms and recognizes patterns. Understanding these principles may help you create a better structure for your dashboard and make your charts easier to interpret: Proximity: When we see multiple elements located near one another, we tend to see them as groups. For example, we can visually distinguish clusters in a scatter plot by grouping the dots according to their position. Similarity: Our brain associates the elements that are similar to each other (in terms of shape, size, color, or orientation). For example, in color-coded bar charts, we can associate the bars that share the same color even if they are not grouped. Enclosure: If a border surrounds a series of objects, we perceive them as part of a group. For example, if a scatter plot has reference lines that wrap the elements between 20 and 30 percent, we will automatically see them as a cluster. Closure: When we detect a figure that looks incomplete, we tend to perceive it as a closed structure. For example, even if we discard the borders of a bar chart, the axes will form a region that our brain will isolate without needing the extra lines. Continuity: If a number of objects are aligned, we will perceive them as a continuous body. For example, the different blocks of code when you indent QlikView script are percieved as one continuous code. Connection: If objects are connected by a line, we will see them as a group. For example, we tend to associate the dots connected by lines on a scatter plot with lines and symbols. Giving context to the data When it comes to analyzing data, context is everything. If you present isolated figures, the users will have a hard time trying to find the story hidden behind them. For example, if I told you that the gross margin of our company was 16.5 percent during the first quarter of 2015, would you evaluate it as a positive or negative sign? This is pretty difficult, right? However, what if we added some extra information to complement this KPI? Then, the following image would make a lot more sense: As you can see, adding context to the data can make the landscape look quite different. Now, it is easy to see that even though the gross margin has substantially improved during the last year, our company has some work to do in order to be competitive and surpass the industry standard. The appropriate references may change depending on the KPI you are dealing with and the goals of the organization, but some common examples are as follows: Last year's performance The quota, budget, or objective Comparison with the closest competitor, product, or employee The market share The industry standards Another good tip in this regard is to anticipate the comparisons. If you display figures regarding the monthly quota and the actual sales, you can save the users the mental calculations by including complementary indicators, such as the gap between them and the percentage of completion. Data-Ink Ratio One of the most interesting principles in the field of data visualization is Data-Ink Ratio, introduced by Edward R. Tufte in his book, The Visual Display of Quantitative Information, which must be read by every designer. In this publication, he states that there are two different types of ink (or in our case, pixels) in any chart, as follows: Data-ink: This includes all the nonerasable portions of graphic that are used to represent the actual data. These pixels are at the core of the visualization and cannot be removed without losing some of its content. Non-data-ink: This includes any element that is not directly related to the data or doesn't convey anything meaningful to the reader. Based on these concepts, he defined the Data Ink Ratio as the proportion of the graphic's ink that is devoted to the nonredundant display of data information: Data Ink Ratio = Data Ink / Total Ink As you can imagine, our goal is to maximize this number by decreasing the non-data-ink used in our dashboards. For example, the chart to the left has a low data-ink ratio due to the usage of 3D effects, shadows, backgrounds, and multiple grid lines. On the contrary, the chart to the right presents a higher ratio as most of the pixels are data-related. Avoiding chart junk Chart junk is another term coined by Tufte that refers to all the elements that distract the viewer from the actual information in a graphic. Evidently, chart junk is considered as non-data-ink and comprises of features such as heavy gridlines, frames, redundant labels, ornamental axes, backgrounds, overly complex fonts, shadows, images, or other effects included only as decoration. Take for instance the following charts: As you can see, by removing all the unnecessary elements in a chart, it becomes easier to interpret and looks much more elegant. Balance Colors, icons, reference lines, and other visual cues can be very useful to help the users focus on the most important elements in a dashboard. However, misusing or overusing these features can be a real hazard, so try to find the adequate balance for each of them. Excessive precision QlikView applications should use the appropriate language for each audience. When designing, think about whether precise figures will be useful or if they are going to become a distraction. Most of the time, dashboards show high-level KPIs, so it may be more comfortable for certain users to see rounded numbers, as in the following image: 3D charts One of Microsoft Excel's greatest wrongdoings is making everyone believe that 3D charts are good for data analysis. For some reason, people seem to love them; but, believe me, they are a real threat to business analysts. Despite their visual charm, these representations can easily hide some parts of the information and convey wrong perceptions depending on their usage of colors, shadows, and axis inclination. I strongly recommend you to avoid them in any context. Sorting Whether you are working with a list box, a bar chart, or a straight table, sorting an object is always advisable, as it adds context to the data. It can help you find the most commonly selected items in a list box, distinguish which slice is bigger on a pie chart when the sizes are similar, or easily spot the outliners in other graphic representations. Alignment and distribution Most of my colleagues argue that I am on the verge of an obsessive-compulsive disorder, but I cannot stand an application with unaligned objects. (Actually, I am still struggling with the fact that the paragraphs in this book are not justified, but anyway...). The design toolbar offers useful options in this regard, so there is no excuse for not having a tidy dashboard. If you take care of the quadrature of all the charts and filters, your interface will display a clean and professional look that every user will appreciate: Animations I have a rule of thumb regarding chart animation in QlikView—If you are Hans Rosling, go ahead. If not, better think it over twice. Even though they can be very illustrative, chart animations end up being a distraction rather than a tool to help us visualize data most of the time, so be conservative about their use. For those of you who do not know him, Hans Rosling is a Swedish professor of international health who works in Stockholm. However, he is best known for his amazing way of presenting data with GapMinder, a simple piece of software that allows him to animate a scatter plot. If you are a data enthusiast, you ought to watch his appearances in TED Talks. Avoiding scroll bars Throughout his work, Stephen Few emphasizes that all the information in a dashboard must fit on a single screen. Whilst I believe that there is no harm in splitting the data in multiple sheets, it is undeniable that scroll bars reduce the overall usability of an application. If the user has to continuously scroll right and left to read all the figures in a table, or if she must go up and down to see the filter panel, she will end up getting tired and eventually discard your dashboard. Consistency If you want to create an easy way to navigate your dashboard, you cannot forget about consistency. Locating standard objects (such as Current Selections Box, Search Object, and Filter Panels) in the same area in every tab will help the users easily find all the items they need. In addition, applying the same style, fonts, and color palettes in all your charts will make your dashboard look more elegant and professional. White space The space between charts, tables, and filters is often referred to as white space, and even though you may not notice it, it is a vital part of any dashboard. Displaying dozens of objects without letting them breathe makes your interface look cluttered and, therefore, harder to understand. Some of the benefits of using white space adequately are: The improvement in readability It focuses and emphasizes the important objects It guides the users' eyes, creating a sense of hierarchy in the dashboard It fosters a balanced layout, making your interface look clear and sophisticated Applying makeup Every now and then, you stumble upon delicate situations where some business users try their best to hide certain parts of the data. Whether it is about low sales or the insane amount of defective products, they often ask you to remove a few charts or avoid visual cues so that those numbers go unnoticed. Needless to say, dashboards are tools intended to inform and guide the decisions of the viewers, so avoid presenting misleading visualizations. Meaningless variety As a designer, you will often hesitate to use the same chart type multiple times in your application fearing that the users will get bored of it. Though this may be a haunting perception, if you present valuable data in an adequate format, there is no need to add new types of charts just for variety's sake. We want to keep the users engaged with great analyses, not just with pretty graphics. Summary In this article, you learned all about the best practices to be followed in Qlikview. Resources for Article: Further resources on this subject: Analyzing Financial Data in QlikView[article] Securing QlikView Documents[article] Common QlikView script errors [article]

0
0
9683

article-image-creating-sample-web-scraper

Packt

11 Sep 2013

10 min read

Creating a sample web scraper

Packt

11 Sep 2013

10 min read

(For more resources related to this topic, see here.) As web scrapers like to say: "Every website has an API. Sometimes it's just poorly documented and difficult to use." To say that web scraping is a useful skill is an understatement. Whether you're satisfying a curiosity by writing a quick script in an afternoon or building the next Google, the ability to grab any online data, in any amount, in any format, while choosing how you want to store it and retrieve it, is a vital part of any good programmer's toolkit. By virtue of reading this article, I assume that you already have some idea of what web scraping entails, and perhaps have specific motivations for using Java to do it. Regardless, it is important to cover some basic definitions and principles. Very rarely does one hear about web scraping in Java—a language often thought to be solely the domain of enterprise software and interactive web apps of the 90's and early 2000's. However, there are many reasons why Java is an often underrated language for web scraping: Java's excellent exception-handling lets you compile code that elegantly handles the often-messy Web Reusable data structures allow you to write once and use everywhere with ease and safety Java's concurrency libraries allow you to write code that can process other data while waiting for servers to return information (the slowest part of any scraper) The Web is big and slow, but the Java RMI allows you to write code across a distributed network of machines, in order to collect and process data quickly There are a variety of standard libraries for getting data from servers, as well as third-party libraries for parsing this data, and even executing JavaScript (which is needed for scraping some websites) In this article, we will explore these, and other benefits of Java in web scraping, and build several scrapers ourselves. Although it is possible, and recommended, to skip to the sections you already have a good grasp of, keep in mind that some sections build up the code and concepts of other sections. When this happens, it will be noted in the beginning of the section. How is this legal? Web scraping has always had a "gray hat" reputation. While websites are generally meant to be viewed by actual humans sitting at a computer, web scrapers find ways to subvert that. While APIs are designed to accept obviously computer-generated requests, web scrapers must find ways to imitate humans, often by modifying headers, forging POST requests and other methods. Web scraping often requires a great deal of problem solving and ingenuity to figure out how to get the data you want. There are often few roadmaps or tried-and-true procedures to follow, and you must carefully tailor the code to each website—often riding between the lines of what is intended and what is possible. Although this sort of hacking can be fun and challenging, you have to be careful to follow the rules. Like many technology fields, the legal precedent for web scraping is scant. A good rule of thumb to keep yourself out of trouble is to always follow the terms of use and copyright documents on websites that you scrape (if any). There are some cases in which the act of web crawling is itself in murky legal territory, regardless of how the data is used. Crawling is often prohibited in the terms of service of websites where the aggregated data is valuable (for example, a site that contains a directory of personal addresses in the United States), or where a commercial or rate-limited API is available. Twitter, for example, explicitly prohibits web scraping (at least of any actual tweet data) in its terms of service: "crawling the Services is permissible if done in accordance with the provisions of the robots.txt file, however, scraping the Services without the prior consent of Twitter is expressly prohibited" Unless explicitly prohibited by the terms of service, there is no fundamental difference between accessing a website (and its information) via a browser, and accessing it via an automated script. The robots.txt file alone has not been shown to be legally binding, although in many cases the terms of service can be. Writing a simple scraper (Simple) Wikipedia is not just a helpful resource for researching or looking up information but also a very interesting website to scrape. They make no efforts to prevent scrapers from accessing the site, and, with a very well-marked-up HTML, they make it very easy to find the information you're looking for. In this project, we will scrape an article from Wikipedia and retrieve the first few lines of text from the body of the article. Getting ready It is recommended that you have some working knowledge of Java, and the ability to create and execute Java programs at this point. As an example, we will use the article from the following Wikipedia link: http://en.wikipedia.org/wiki/Java Note that this article is about the Indonesian island nation Java, not the programming language. Regardless, it seems appropriate to use it as a test subject. We will be using the jsoup library to parse HTML data from websites and convert it into a manipulatable object, with traversable, indexed values (much like an array). In this xercise, we will show you how to download, install, and use Java libraries. In addition, we'll also be covering some of the basics of the jsoup library in particular. How to do it... Now that we're starting to get into writing scrapers, let's create a new project to keep them all bundled together. Carry out the following steps for this task: Open Eclipse and create a new Java project called Scraper. Packages are still considered to be handy for bundling collections of classes together within a single project (projects contain multiple packages, and packages contain multiple classes). You can create a new package by highlighting the Scraper project in Eclipse and going to File | New | Package . By convention, in order to prevent programmers from creating packages with the same name (and causing namespace problems), packages are named starting with the reverse of your domain name (for example, com.mydomain.mypackagename). For the rest of the article, we will begin all our packages with com.packtpub.JavaScraping appended with the package name. Let's create a new package called com.packtpub.JavaScraping.SimpleScraper. Create a new class, WikiScraper, inside the src folder of the package. Download the jsoup core library, the first link, from the following URL: http://jsoup.org/download Place the .jar file you downloaded into the lib folder of the package you just created. In Eclipse, right-click in the Package Explorer window and select Refresh. This will allow Eclipse to update the Package Explorer to the current state of the workspace folder. Eclipse should show your jsoup-1.7.2.jar file (this file may have a different name depending on the version you're using) in the Package Explorer window. Right-click on the jsoup JAR file and select Build Path | Add to Build Path. In your WikiScraper class file, write the following code: package com.packtpub.JavaScraping.SimpleScraper; import org.jsoup.Jsoup; import org.jsoup.nodes.Document; import java.net.*; import java.io.*; public class WikiScraper { public static void main(String[] args) { scrapeTopic("/wiki/Python"); } public static void scrapeTopic(String url){ String html = getUrl("http://www.wikipedia.org/"+url); Document doc = Jsoup.parse(html); String contentText = doc.select("#mw-content-text > p").first().text(); System.out.println(contentText); } public static String getUrl(String url){ URL urlObj = null; try{ urlObj = new URL(url); } catch(MalformedURLException e){ System.out.println("The url was malformed!"); return ""; } URLConnection urlCon = null; BufferedReader in = null; String outputText = ""; try{ urlCon = urlObj.openConnection(); in = new BufferedReader(new InputStreamReader(urlCon.getInputStream())); String line = ""; while((line = in.readLine()) != null){ outputText += line; } in.close(); }catch(IOException e){ System.out.println("There was an error connecting to the URL"); return ""; } return outputText; } } Assuming you're connected to the internet, this should compile and run with no errors, and print the first paragraph of text from the article. How it works... Unlike our HelloWorld example, there are a number of libraries needed to make this code work. We can incorporate all of these using the import statements before the class declaration. There are a number of jsoup objects needed, along with two Java libraries, java.io and java.net , which are needed for creating the connection and retrieving information from the Web. As always, our program starts out in the main method of the class. This method calls the scrapeTopic method, which will eventually print the data that we are looking for (the first paragraph of text in the Wikipedia article) to the screen. scrapeTopic requires another method, getURL, in order to do this. getUrl is a function that we will be using throughout the article. It takes in an arbitrary URL and returns the raw source code as a string. Essentially, it creates a Java URL object from the URL string, and calls the openConnection method on that URL object. The openConnection method returns a URLConnection object, which can be used to create a BufferedReader object. BufferedReader objects are designed to read from, potentially very long, streams of text, stopping at a certain size limit, or, very commonly, reading streams one line at a time. Depending on the potential size of the pages you might be reading (or if you're reading from an unknown source), it might be important to set a buffer size here. To simplify this exercise, however, we will continue to read as long as Java is able to. The while loop here retrieves the text from the BufferedReader object one line at a time and adds it to outputText, which is then returned. After the getURL method has returned the HTML string to scrapeTopic, jsoup is used. jsoup is a Java library that turns HTML strings (such as the string returned by our scraper) into more accessible objects. There are many ways to access and manipulate these objects, but the function you'll likely find yourself using most often is the select function. The jsoup select function returns a jsoup object (or an array of objects, if more than one element matches the argument to the select function), which can be further manipulated, or turned into text, and printed. The crux of our script can be found in this line: String contentText = doc.select("#mw-content-text > p").first().text(); This finds all the elements that match #mw-content-text > p (that is, all p elements that are the children of the elements with the CSS ID mw-content-text), selects the first element of this set, and turns the resulting object into plain text (stripping out all tags, such as <a> tags or other formatting that might be in the text). The program ends by printing this line out to the console. There's more... Jsoup is a powerful library that we will be working with in many applications throughout this article. For uses that are not covered in the article, I encourage you to read the complete documentation at http://jsoup.org/apidocs/. What if you find yourself working on a project where jsoup's capabilities aren't quite meeting your needs? There are literally dozens of Java-based HTML parsers on the market. Summary Thus in this article we took the first step towards web scraping with Java, and also learned how to scrape an article from Wikipedia and retrieve the first few lines of text from the body of the article. Resources for Article : Further resources on this subject: Making a simple cURL request (Simple) [Article] Web Scraping with Python [Article] Generating Content in WordPress Top Plugins—A Sequel [Article]

0
0
9647

article-image-facebooks-ceo-mark-zuckerberg-summoned-for-hearing-by-uk-and-canadian-houses-of-commons

Bhagyashree R

01 Nov 2018

2 min read

Facebook's CEO, Mark Zuckerberg summoned for hearing by UK and Canadian Houses of Commons

Bhagyashree R

01 Nov 2018

2 min read

Yesterday, the chairs of the UK and Canadian Houses of Commons issued a letter calling for Mark Zuckerberg, Facebook’s CEO to appear before them. The primary aim of this hearing is to get a clear idea of what measures Facebook is taking to avoid the spreading of disinformation on the social media platform and to protect user data. It is scheduled to happen at the Westminster Parliament on Tuesday 27th November. The committee has already gathered evidence regarding several data breaches and process failures including the Cambridge Analytica scandal and is now seeking answers from Mark Zuckerberg on what led to all of these incidents. Mark last attended a hearing in April with the Senate's Commerce and Judiciary committees this year in which he was asked about the company’s failure to protect its user data, its perceived bias against conservative speech, and its use for selling illegal material like drugs. After which he has not attended any of the hearings and instead sent other senior representatives such as Sheryl Sandberg, COO at Facebook. The letter pointed out: “You have chosen instead to send less senior representatives, and have not yourself appeared, despite having taken up invitations from the US Congress and Senate, and the European Parliament.” Throughout this year we saw major security and data breaches involving Facebook. The social media platform faced a security issue last month which impacted almost 50 million user accounts. Its engineering team discovered that hackers were able to find a way to exploit a series of bugs related to the View As Facebook feature. Earlier this year, Facebook witnessed a backlash for the Facebook-Cambridge Analytica data scandal. It was a major political scandal about Cambridge Analytica using personal data of millions of Facebook users for political purposes without their permission. The reports of this hearing will be shared in December if at all Zuckerberg agrees to attend it. The committee has requested his response till 7th November. Read the full letter issued by the committee. Facebook is at it again. This time with Candidate Info where politicians can pitch on camera Facebook finds ‘no evidence that hackers accessed third party Apps via user logins’, from last week’s security breach How far will Facebook go to fix what it broke: Democracy, Trust, Reality

0
0
9560

Packt

18 Jan 2016

19 min read

Controlling Relevancy

Packt

18 Jan 2016

19 min read

0
0
9544

article-image-techniques-for-creating-a-multimedia-database

Packt

17 May 2013

37 min read

Techniques for Creating a Multimedia Database

Packt

17 May 2013

37 min read

0
0
9514

article-image-working-aspnet-datalist-control

Packt

19 Feb 2010

8 min read

Working With ASP.NET DataList Control

Packt

19 Feb 2010

8 min read

0
1
9507

article-image-new-programming-video-courses-for-march-2019

Richard Gall

04 Mar 2019

6 min read

New programming video courses for March 2019

Richard Gall

04 Mar 2019

6 min read

It’s not always easy to know what to learn next if you’re a programmer. Industry shifts can be subtle but they can sometimes be dramatic, making it incredibly important to stay on top of what’s happening both in your field and beyond. No one person can make that decision for you. All the thought leadership and mentorship in the world isn’t going to be able to tell you what’s right for you when it comes to your career. But this list of videos, released last month, might give you a helping hand as to where to go next when it comes to your learning… New data science and artificial intelligence video courses for March Apache Spark is carving out a big presence as the go-to software for big data. Two videos from February focus on Spark - Distributed Deep Learning with Apache Spark and Apache Spark in 7 Days. If you’re new to Spark and want a crash course on the tool, then clearly, our video aims to get you up and running quickly. However, Distributed Deep Learning with Apache Spark offers a deeper exploration that shows you how to develop end to end deep learning pipelines that can leverage the full potential of cutting edge deep learning techniques. While we’re on the subject of machine learning, other choice video courses for March include TensorFlow 2.0 New Features (we’ve been eagerly awaiting it and it finally looks like we can see what it will be like), Hands On Machine Learning with JavaScript (yes, you can now do machine learning in the browser), and a handful of interesting videos on artificial intelligence and finance: AI for Finance Machine Learning for Algorithmic Trading Bots with Python Hands on Python for Finance Elsewhere, a number of data visualization video courses prove that communicating and presenting data remains an urgent challenge for those in the data space. Tableau remains one of the definitive tools - you can learn the latest version with Tableau 2019.1 for Data Scientists and Data Visualization Recipes with Python and Matplotlib 3. New app and web development video courses for March 2019 There are a wealth of video courses for web and app developers to choose from this month. True, Hands-on Machine Learning for JavaScript is well worth a look, but moving past the machine learning hype, there are a number of video courses that take a practical look at popular tools and new approaches to app and web development. Angular’s death has been greatly exaggerated - it remains a pillar of the JavaScript world. While the project’s versioning has arguably been lacking some clarity, if you want to get up to speed with where the framework is today, try Angular 7: A Practical Guide. It’s a video that does exactly what it says on the proverbial tin - it shows off Angular 7 and demonstrates how to start using it in web projects. We’ve also been seeing some uptake of Angular by ASP.NET developers, as it offers a nice complement to the Microsoft framework on the front end side. Our latest video on the combination, Hands-on Web Development with ASP.NET Core and Angular, is another practical look at an effective and increasingly popular approach to full-stack development. Other picks for March include Building Mobile Apps with Ionic 4, a video that brings you right up to date with the recent update that launched in January (interestingly, the project is now backed by web components, not Angular), and a couple of Redux videos - Mastering Redux and Redux Recipes. Redux is still relatively new. Essentially, it’s a JavaScript library that helps you manage application state - because it can be used with a range of different frameworks and libraries, including both Angular and React, it’s likely to go from strength to strength in 2019. Infrastructure, admin and security video courses for March 2019 Node.js is becoming an important library for infrastructure and DevOps engineers. As we move to a cloud native world, it’s a great tool for developing lightweight and modular services. That’s why we’re picking Learn Serverless App Development with Node.js and Azure Functions as one of our top videos for this month. Azure has been growing at a rapid rate over the last 12 months, and while it’s still some way behind AWS, Microsoft’s focus on developer experience is making Azure an increasingly popular platform with developers. For Node developers, this video is a great place to begin - it’s also useful for anyone who simply wants to find out what serverless development actually feels like. Read next: Serverless computing wars: AWS Lambda vs. Azure Functions A partner to this, for anyone beginning Node, is the new Node.js Design Patterns video. In particular, if Node.js is an important tool in your architecture, following design patterns is a robust method of ensuring reliability and resilience. Elsewhere, we have Modern DevOps in Practice, cutting through the consultancy-speak to give you useful and applicable guidance on how to use DevOps thinking in your workflows and processes, and DevOps with Azure, another video that again demonstrates just how impressive Azure is. For those not Azure-inclined, there’s AWS Certified Developer Associate - A Practical Guide, a video that takes you through everything you need to know to pass the AWS Developer Associate exam. There’s also a completely cloud-agnostic video course in the form of Creating a Continuous Deployment Pipeline for Cloud Platforms that’s essential for infrastructure and operations engineers getting to grips with cloud native development. Learn a new programming language with these new video courses for March Finally, there are a number of new video courses that can help you get to grips with a new programming language. So, perfect if you’ve been putting off your new year’s resolution to learn a new language… Java 11 in 7 Days is a new video that brings you bang up to date with everything in the latest version of Java, while Hands-on Functional Programming with Java will help you rethink and reevaluate the way you use Java. Together, the two videos are a great way for Java developers to kick start their learning and update their skill set.

0
0
9473

article-image-scientific-computing-apis-python

Packt

20 Aug 2015

23 min read

Scientific Computing APIs for Python

Packt

20 Aug 2015

23 min read

In this article, by Hemant Kumar Mehta author of the book Mastering Python Scientific Computing we will have comprehensive discussion of features and capabilities of various scientific computing APIs and toolkits in Python. Besides the basics, we will also discuss some example programs for each of the APIs. As symbolic computing is relatively different area of computerized mathematics, we have kept a special sub section within the SymPy section to discuss basics of computerized algebra system. In this article, we will cover following topics: Scientific numerical computing using NumPy and SciPy Symbolic Computing using SymPy (For more resources related to this topic, see here.) Numerical Scientific Computing in Python The scientific computing mainly demands for facility of performing calculations on algebraic equations, matrices, differentiations, integrations, differential equations, statistics, equation solvers and much more. By default Python doesn't come with these functionalities. However, development of NumPy and SciPy has enabled us to perform these operations and much more advanced functionalities beyond these operations. NumPy and SciPy are very powerful Python packages that enable the users to efficiently perform the desired operations for all types of scientific applications. NumPy package NumPy is the basic Python package for the scientific computing. It provides facility of multi-dimensional arrays and basic mathematical operations such as linear algebra. Python provides several data structure to store the user data, while the most popular data structures are lists and dictionaries. The list objects may store any type of Python object as an element. These elements can be processed using loops or iterators. The dictionary objects store the data in key, value format. The ndarrays data structure The ndaarays are also similar to the list but highly flexible and efficient. The ndarrays is an array object to represent multidimensional array of fixed-size items. This array should be homogeneous. It has an associated object of dtype to define the data type of elements in the array. This object defines type of the data (integer, float, or Python object), size of data in bytes, byte ordering (big-endian or little-endian). Moreover, if the type of data is record or sub-array then it also contains details about them. The actual array can be constructed using any one of the array, zeros or empty methods. Another important aspect of ndarrays is that the size of arrays can be dynamically modified. Moreover, if the user needs to remove some elements from the arrays then it can be done using the module for masked arrays. In a number of situations, scientific computing demands deletion/removal of some incorrect or erroneous data. The numpy.ma module provides the facility of masked array to easily remove selected elements from arrays. A masked array is nothing but the normal ndarrays with a mask. Mask is another associated array with true or false values. If for a particular position mask has true value then the corresponding element in the main array is valid and if the mask is false then the corresponding element in the main array is invalid or masked. In such case while performing any computation on such ndarrays the masked elements will not be considered. File handling Another important aspect of scientific computing is storing the data into files and NumPy supports reading and writing on both text as well as binary files. Mostly, text files are good way for reading, writing and data exchange as they are inherently portable and most of the platforms by default have capabilities to manipulate them. However, for some of the applications sometimes it is better to use binary files or the desired data for such application can only be stored in binary files. Sometimes the size of data and nature of data like image, sound etc. requires them to store in binary files. In comparison to text files binary files are harder to manage as they have specific formats. Moreover, the size of binary files are comparatively very small and the read/ write operations are very fast then the read/ write text files. This fast read/ write is most suitable for the application working on large datasets. The only drawback of binary files manipulated with NumPy is that they are accessible only through NumPy. Python has text file manipulation functions such as open, readlines and writelines functions. However, it is not performance efficient to use these functions for scientific data manipulation. These default Python functions are very slow in reading and writing the data in file. NumPy has high performance alternative that load the data into ndarrays before actual computation. In NumPy, text files can be accessed using numpy.loadtxt and numpy.savetxt functions. The loadtxt function can be used to load the data from text files to the ndarrays. NumPy also has a separate functions to manipulate the data in binary files. The function for reading and writing are numpy.load and numpy.save respectively. Sample NumPy programs The NumPy array can be created from a list or tuple using the array, this method can transform sequences of sequences into two dimensional array. import numpy as np x = np.array([4,432,21], int) print x #Output [ 4 432 21] x2d = np.array( ((100,200,300), (111,222,333), (123,456,789)) ) print x2d Output: [ 4 432 21] [[100 200 300] [111 222 333] [123 456 789]] Basic matrix arithmetic operation can be easily performed on two dimensional arrays as used in the following program. Basically these operations are actually applied on elements hence the operand arrays must be of equal size, if the size is not matching then performing these operations will cause a runtime error. Consider the following example for arithmetic operations on one dimensional array. import numpy as np x = np.array([4,5,6]) y = np.array([1,2,3]) print x + y # output [5 7 9] print x * y # output [ 4 10 18] print x - y # output [3 3 3] print x / y # output [4 2 2] print x % y # output [0 1 0] There is a separate subclass named as matrix to perform matrix operations. Let us understand matrix operation by following example which demonstrates the difference between array based multiplication and matrix multiplication. The NumPy matrices are 2-dimensional and arrays can be of any dimension. import numpy as np x1 = np.array( ((1,2,3), (1,2,3), (1,2,3)) ) x2 = np.array( ((1,2,3), (1,2,3), (1,2,3)) ) print "First 2-D Array: x1" print x1 print "Second 2-D Array: x2" print x2 print "Array Multiplication" print x1*x2 mx1 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) ) mx2 = np.matrix( ((1,2,3), (1,2,3), (1,2,3)) ) print "Matrix Multiplication" print mx1*mx2 Output: First 2-D Array: x1 [[1 2 3] [1 2 3] [1 2 3]] Second 2-D Array: x2 [[1 2 3] [1 2 3] [1 2 3]] Array Multiplication [[1 4 9] [1 4 9] [1 4 9]] Matrix Multiplication [[ 6 12 18] [ 6 12 18] [ 6 12 18]] Following is a simple program to demonstrate simple statistical functions given in NumPy: import numpy as np x = np.random.randn(10) # Creates an array of 10 random elements print x mean = x.mean() print mean std = x.std() print std var = x.var() print var First Sample Output: [ 0.08291261 0.89369115 0.641396 -0.97868652 0.46692439 -0.13954144 -0.29892453 0.96177167 0.09975071 0.35832954] 0.208762357623 0.559388806817 0.312915837192 Second Sample Output: [ 1.28239629 0.07953693 -0.88112438 -2.37757502 1.31752476 1.50047537 0.19905071 -0.48867481 0.26767073 2.660184 ] 0.355946458357 1.35007701045 1.82270793415 The above programs are some simple examples of NumPy. SciPy package SciPy extends Python and NumPy support by providing advanced mathematical functions such as differentiation, integration, differential equations, optimization, interpolation, advanced statistical functions, equation solvers etc. SciPy is written on top of the NumPy array framework. SciPy has utilized the arrays and the basic operations on the arrays provided in NumPy and extended it to cover most of the mathematical aspects regularly required by scientists and engineers for their applications. In this article we will cover examples of some basic functionality. Optimization package The optimization package in SciPy provides facility to solve univariate and multivariate minimization problems. It provides solutions to minimization problems using a number of algorithms and methods. The minimization problem has wide range of application in science and commercial domains. Generally, we perform linear regression, search for function's minimum and maximum values, finding the root of a function, and linear programming for such cases. All these functionalities are supported by the optimization package. Interpolation package A number of interpolation methods and algorithms are provided in this package as built-in functions. It provides facility to perform univariate and multivariate interpolation, one dimensional and two dimensional Splines etc. We use univariate interpolation when data is dependent of one variable and if data is around more than one variable then we use multivariate interpolation. Besides these functionalities it also provides additional functionality for Lagrange and Taylor polynomial interpolators. Integration and differential equations in SciPy Integration is an important mathematical tool for scientific computations. The SciPy integrations sub-package provides functionalities to perform numerical integration. SciPy provides a range of functions to perform integration on equations and data. It also has ordinary differential equation integrator. It provides various functions to perform numerical integrations using a number of methods from mathematics using numerical analysis. Stats module SciPy Stats module contains a functions for most of the probability distributions and wide range or statistical functions. Supported probability distributions include various continuous distribution, multivariate distributions and discrete distributions. The statistical functions range from simple means to the most of the complex statistical concepts, including skewness, kurtosis chi-square test to name a few. Clustering package and Spatial Algorithms in SciPy Clustering analysis is a popular data mining technique having wide range of application in scientific and commercial applications. In Science domain biology, particle physics, astronomy, life science, bioinformatics are few subjects widely using clustering analysis for problem solution. Clustering analysis is being used extensively in computer science for computerized fraud detection, security analysis, image processing etc. The clustering package provides functionality for K-mean clustering, vector quantization, hierarchical and agglomerative clustering functions. The spatial class has functions to analyze distance between data points using triangulations, Voronoi diagrams, and convex hulls of a set of points. It also has KDTree implementations for performing nearest-neighbor lookup functionality. Image processing in SciPy SciPy provides support for performing various image processing operations including basic reading and writing of image files, displaying images, simple image manipulations operations such as cropping, flipping, rotating etc. It has also support for image filtering functions such as mathematical morphing, smoothing, denoising and sharpening of images. It also supports various other operations such as image segmentation by labeling pixels corresponding to different objects, Classification, Feature extraction for example edge detection etc. Sample SciPy programs In the subsequent subsections we will discuss some example programs using SciPy modules and packages. We start with a simple program performing standard statistical computations. After this, we will discuss a program performing finding a minimal solution using optimizations. At last we will discuss image processing programs. Statistics using SciPy The stats module of SciPy has functions to perform simple statistical operations and various probability distributions. The following program demonstrates simple statistical calculations using SciPy stats.describe function. This single function operates on an array and returns number of elements, minimum value, maximum value, mean, variance, skewness and kurtosis. import scipy as sp import scipy.stats as st s = sp.randn(10) n, min_max, mean, var, skew, kurt = st.describe(s) print("Number of elements: {0:d}".format(n)) print("Minimum: {0:3.5f} Maximum: {1:2.5f}".format(min_max[0], min_max[1])) print("Mean: {0:3.5f}".format(mean)) print("Variance: {0:3.5f}".format(var)) print("Skewness : {0:3.5f}".format(skew)) print("Kurtosis: {0:3.5f}".format(kurt)) Output: Number of elements: 10 Minimum: -2.00080 Maximum: 0.91390 Mean: -0.55638 Variance: 0.93120 Skewness : 0.16958 Kurtosis: -1.15542 Optimization in SciPY Generally, in mathematical optimization a non convex function called Rosenbrock function is used to test the performance of the optimization algorithm. The following program is demonstrating the minimization problem on this function. The Rosenbrock function of N variable is given by following equation and it has minimum value 0 at xi =1. The program for the above function is: import numpy as np from scipy.optimize import minimize # Definition of Rosenbrock function def rosenbrock(x): return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0) x0 = np.array([1, 0.7, 0.8, 2.9, 1.1]) res = minimize(rosenbrock, x0, method = 'nelder-mead', options = {'xtol': 1e-8, 'disp': True}) print(res.x) Output is: Optimization terminated successfully. Current function value: 0.000000 Iterations: 516 Function evaluations: 827 [ 1. 1. 1. 1. 1.] The last line is the output of print(res.x) where all the elements of array are 1. Image processing using SciPy Following two programs are developed to demonstrate the image processing functionality of SciPy. First of these program is simply displaying the standard test image widely used in the field of image processing called Lena. The second program is applying geometric transformation on this image. It performs image cropping and rotation by 45 %. The following program is displaying Lena image using matplotlib API. The imshow method renders the ndarrays into an image and the show method displays the image. from scipy import misc l = misc.lena() misc.imsave('lena.png', l) import matplotlib.pyplot as plt plt.gray() plt.imshow(l) plt.show() Output: The output of the above program is the following screen shot: The following program is performing geometric transformation. This program is displaying transformed images and along with the original image as a four axis array. import scipy from scipy import ndimage import matplotlib.pyplot as plt import numpy as np lena = scipy.misc.lena() lx, ly = lena.shape crop_lena = lena[lx/4:-lx/4, ly/4:-ly/4] crop_eyes_lena = lena[lx/2:-lx/2.2, ly/2.1:-ly/3.2] rotate_lena = ndimage.rotate(lena, 45) # Four axes, returned as a 2-d array f, axarr = plt.subplots(2, 2) axarr[0, 0].imshow(lena, cmap=plt.cm.gray) axarr[0, 0].axis('off') axarr[0, 0].set_title('Original Lena Image') axarr[0, 1].imshow(crop_lena, cmap=plt.cm.gray) axarr[0, 1].axis('off') axarr[0, 1].set_title('Cropped Lena') axarr[1, 0].imshow(crop_eyes_lena, cmap=plt.cm.gray) axarr[1, 0].axis('off') axarr[1, 0].set_title('Lena Cropped Eyes') axarr[1, 1].imshow(rotate_lena, cmap=plt.cm.gray) axarr[1, 1].axis('off') axarr[1, 1].set_title('45 Degree Rotated Lena') plt.show() Output: The SciPy and NumPy are core of Python's support for scientific computing as they provide solid functionality of numerical computing. Symbolic computations using SymPy Computerized computations performed over the mathematical symbols without evaluating or changing their meaning is called as symbolic computations. Generally the symbolic computing is also called as computerized algebra and such computerized system are called computer algebra system. The following subsection has a brief and good introduction to SymPy. Computer Algebra System (CAS) Let us discuss the concept of CAS. CAS is a software or toolkit to perform computations on mathematical expressions using computers instead of doing it manually. In the beginning, using computers for these applications was named as computer algebra and now this concept is called as symbolic computing. CAS systems may be grouped into two types. First is the general purpose CAS and the second type is the CAS specific to particular problem. The general purpose systems are applicable to most of the area of algebraic mathematics while the specialized CAS are the systems designed for the specific area such as group theory or number theory. Most of the time, we prefer the general purpose CAS to manipulate the mathematical expressions for scientific applications. Features of a general purpose CAS Various desired features of a general purpose computer algebra system for scientific applications are as: A user interface to manipulate mathematical expressions. An interface for programming and debugging Such systems requires simplification of various mathematical expressions hence, a simplifier is a most essential component of such computerized algebra system. The general purpose CAS system must support exhaustive set of functions to perform various mathematical operations required by any algebraic computations Most of the applications perform extensive computations an efficient memory management is highly essential. The system must provide support to perform mathematical computations on high precision numbers and large quantities. A brief idea of SymPy SymPy is an open source and Python based implementation of computerized algebra system (CAS). The philosophy behind the SymPy development is to design and develop a CAS having all the desired features yet its code as simple as possible so that it will be highly and easily extensible. It is written completely in Python and do not requires any external library. The basic idea about using SymPy is the creation and manipulation of expressions. Using SymPy, the user represents mathematical expressions in Python language using SymPy classes and objects. These expressions are composed of numbers, symbols, operators, functions etc. The functions are the modules to perform a mathematical functionality such as logarithms, trigonometry etc. The development of SymPy was started by Ondřej Čertíkin August 2006. Since then, it has been grown considerably with the contributions more than hundreds of the contributors. This library now consists of 26 different integrated modules. These modules have capability to perform computations required for basic symbolic arithmetic, calculus, algebra, discrete mathematics, quantum physics, plotting and printing with the option to export the output of the computations to LaTeX and other formats. The capabilities of SymPy can be divided into two categories as core capability and advanced capabilities as SymPy library is divided into core module with several advanced optional modules. The various supported functionality by various modules are as follows: Core capabilities The core capability module supports basic functionalities required by any mathematical algebra operations to be performed. These operations include basic arithmetic like multiplications, addition, subtraction and division, exponential etc. It also supports simplification of expressions to simplify complex expressions. It provides the functionality of expansion of series and symbols. Core module also supports functions to perform operations related to trigonometry, hyperbola, exponential, roots of equations, polynomials, factorials and gamma functions, logarithms etc. and a number of special functions for B-Splines, spherical harmonics, tensor functions, orthogonal polynomials etc. There is strong support also given for pattern matching operations in the core module. Core capabilities of the SymPy also include the functionalities to support substitutions required by algebraic operations. It not only supports the high precision arithmetic operations over integers, rational and gloating point numbers but also non-commutative variables and symbols required in polynomial operations. Polynomials Various functions to perform polynomial operations belong to the polynomial module. These functions includes basic polynomial operations such as division, greatest common divisor (GCD) least common multiplier (LCM), square-free factorization, representation of polynomials with symbolic coefficients, some special operations like computation of resultant, deriving trigonometric identities, Partial fraction decomposition, facilities for Gröbner basis over polynomial rings and fields. Calculus Various functionalities supporting different operations required by basic and advanced calculus are provided in this module. It supports functionalities required by limits, there is a limit function for this. It also supports differentiation and integrations and series expansion, differential equations and calculus of finite differences. SymPy is also having special support for definite integrals and integral transforms. In differential it supports numerical differential, composition of derivatives and fractional derivatives. Solving equations Solver is the name of the SymPy module providing equations solving functionality. This module supports solving capabilities for complex polynomials, roots of polynomials and solving system of polynomial equations. There is a function to solve the algebraic equations. It not only provides support for solutions for differential equations including ordinary differential equations, some forms of partial differential equations, initial and boundary values problems etc. but also supports solution of difference equations. In mathematics, difference equation is also called recurrence relations, that is an equation that recursively defines a sequence or multidimensional array of values. Discrete math Discrete mathematics includes those mathematical structures which are discrete in nature rather than the continuous mathematics like calculus. It deals with the integers, graphs, statements from logic theory etc. This module has full support for binomial coefficient, products, summations etc. This module also supports various functions from number theory including residual theory, Euler's Totient, partition and a number of functions dealing with prime numbers and their factorizations. SymPy also supports creation and manipulations of logic expressions using symbolic and Boolean values. Matrices SymPy has a strong support for various operations related to the matrices and determinants. Matrix belongs to linear algebra category of mathematics. It supports creation of matrix, basic matrix operations like multiplication, addition, matrix of zeros and ones, creation of random matrix and performing operations on matrix elements. It also supports special functions line computation of Hessian matrix for a function, Gram-Schmidt process on set of vectors, Computation of Wronskian for matrix of functions etc. It has also full support for Eigenvalues/eigenvectors, matrix inversion, solution of matrix and determinants. For computing determinants of the matrix, it also supports Bareis' fraction-free algorithm and berkowitz algorithms besides the other methods. For matrices it also supports nullspace calculation, cofactor expansion tools, derivative calculation for matrix elements, calculation of dual of matrix etc. Geometry SymPy is also having module that supports various operations associated with the two-dimensional (2-D) geometry. It supports creation of various 2-D entity or objects such as point, line, circle, ellipse, polygon, triangle, ray, segment etc. It also allows us to perform query on these entities such as area of some of the suitable objects line ellipse/ circle or triangle, intersection points of lines etc. It also supports other queries line tangency determination, finding similarity and intersection of entities. Plotting There is a very good module that allows us to draw two-dimensional and three-dimensional plots. At present, the plots are rendered using the matplotlib package. It also supports other packages such as TextBackend, Pyglet, textplot etc. It has a very good interactive interface facility of customizations and plotting of various geometric entities. The plotting module has the following functions: Plotting 2-D line plots Plotting of 2-D parametric plots. Plotting of 2-D implicit and region plots. Plotting of 3-D plots of functions involving two variables. Plotting of 3-D line and surface plots etc. Physics There is a module to solve the problem from Physics domain. It supports functionality for mechanics including classical and quantum mechanics, high energy physics. It has functions to support Pauli Algebra, quantum harmonic oscillators in 1-D and 3-D. It is also having functionality for optics. There is a separate module that integrates unit systems into SymPy. This will allow users to select the specific unit system for performing his/ her computations and conversion between the units. The unit systems are composed of units and constant for computations. Statistics The statistics module introduced in SymPy to support the various concepts of statistics required in mathematical computations. Apart from supporting various continuous and discrete statistical distributions, it also supports functionality related to the symbolic probability. Generally, these distributions support functions for random number generations in SymPy. Printing SymPy is having a module for provide full support for Pretty-Printing. Pretty-print is the idea of conversions of various stylistic formatting into the text files such as source code, text files and markup files or similar content. This module produces the desired output by printing using ASCII and or Unicode characters. It supports various printers such as LATEX and MathML printer. It is also capable of producing source code in various programming languages such as c, Python or FORTRAN. It is also capable of producing contents using markup languages like HTML/ XML. SymPy modules The following list has formal names of the modules discussed in above paragraphs: Assumptions: assumption engine Concrete: symbolic products and summations Core: basic class structure: Basic, Add, Mul, Pow etc. functions: elementary and special functions galgebra: geometric algebra geometry: geometric entities integrals: symbolic integrator interactive: interactive sessions (e.g. IPython) logic: boolean algebra, theorem proving matrices: linear algebra, matrices mpmath: fast arbitrary precision numerical math ntheory: number theoretical functions parsing: Mathematica and Maxima parsers physics: physical units, quantum stuff plotting: 2D and 3D plots using Pyglet polys: polynomial algebra, factorization printing: pretty-printing, code generation series: symbolic limits and truncated series simplify: rewrite expressions in other forms solvers: algebraic, recurrence, differential statistics: standard probability distributions utilities: test framework, compatibility stuf There are numerous symbolic computing systems available in various mathematical toolkits. There are some proprietary software such as Maple/ Mathematica and there are some open source alternatives also such as Singular/ AXIOM. However, these products have their own scripting language, difficult to extend their functionality and having slow development cycle. Whereas SymPy is highly extensible, designed and developed in Python language and open source API that supports speedy development life cycle. Simple exemplary programs These are some very simple examples to get idea about the capacity of SymPy. These are less than ten lines of SymPy source codes which covers topics ranging from basis symbol manipulations to limits, differentiations and integrations. We can test the execution of these programs on SymPy live running SymPy online on Google App Engine available on http://live.sympy.org/. Basic symbol manipulation The following code is defines three symbols, an expression on these symbols and finally prints the expression. import sympy a = sympy.Symbol('a') b = sympy.Symbol('b') c = sympy.Symbol('c') e = ( a * b * b + 2 * b * a * b) + (a * a + c * c) print e Output: a**2 + 3*a*b**2 + c**2 (here ** represents power operation). Expression expansion in SymPy The following program demonstrates the concept of expression expansion. It defines two symbols and a simple expression on these symbols and finally prints the expression and its expanded form. import sympy a = sympy.Symbol('a') b = sympy.Symbol('b') e = (a + b) ** 4 print e print e.expand() Output: (a + b)**4 a**4 + 4*a**3*b + 6*a**2*b**2 + 4*a*b**3 + b**4 Simplification of expression or formula The SymPy has facility to simplify the mathematical expressions. The following program is having two expressions to simplify and displays the output after simplifications of the expressions. import sympy x = sympy.Symbol('x') a = 1/x + (x*exp(x) - 1)/x simplify(a) simplify((x ** 3 + x ** 2 - x - 1)/(x ** 2 + 2 * x + 1)) Output: ex x – 1 Simple integrations The following program is calculates the integration of two simple functions. import sympy from sympy import integrate x = sympy.Symbol('x') integrate(x ** 3 + 2 * x ** 2 + x, x) integrate(x / (x ** 2 + 2 * x), x) Output: x**4/4+2*x**3/3+x**2/2 log(x + 2) Summary In this article, we have discussed the concepts, features and selective sample programs of various scientific computing APIs and toolkits. The article started with a discussion of NumPy and SciPy. After covering NymPy, we have discussed concepts associated with symbolic computing and SymPy. In the remaining article we have discussed the Interactive computing and data analysis & visualization alog with their APIs or toolkits. IPython is the python toolkit for interactive computing. We have also discussed the data analysis package Pandas and the data visualization API names Matplotlib. Resources for Article: Further resources on this subject: Optimization in Python [article] How to do Machine Learning with Python [article] Bayesian Network Fundamentals [article]

0
0
9389

article-image-common-qlikview-script-errors

Packt

22 Nov 2013

3 min read

Common QlikView script errors

Packt

22 Nov 2013

3 min read

(For more resources related to this topic, see here.) QlikView error messages displayed during the running of the script, during reload, or just after the script is run are key to understanding what errors are contained in your code. After an error is detected and the error dialog appears, review the error, and click on OK or Cancel on the Script Error dialog box. If you have the debugger open, click on Close, then click on Cancel on the Sheet Properties dialog. Re-enter the Script Editor and examine your script to fix the error. Errors can come up as a result of syntax, formula or expression errors, join errors, circular logic, or any number of issues in your script. The following are a few common error messages you will encounter when developing your QlikView script. The first one, illustrated in the following screenshot, is the syntax error we received when running the code that missed a comma after Sales. This is a common syntax error. It's a little bit cryptic, but the error is contained in the code snippet that is displayed. The error dialog does not exactly tell you that it expected a comma in a certain place, but with practice, you will realize the error quickly. The next error is a circular reference error. This error will be handled automatically by QlikView. You can choose to accept QlikView's fix of loosening one of the tables in the circular reference (view the data model in Table Viewer for more information on which table is loosened, or view the Document Properties dialog, Tables tab to find out which table is marked Loosely Coupled). Alternatively, you can choose another table to be loosely coupled in the Document Properties, Tables tab, or you can go back into the script and fix the circular reference with one of the methods. The following screenshot is a warning/error dialog displayed when you have a circular reference in a script: Another common issue is an unknown statement error that can be caused by an error in writing your script—missed commas, colons, semicolons, brackets, quotation marks, or an improperly written formula. In the case illustrated in the following screenshot, the error has encountered an unknown statement—namely, the Customers line that QlikView is attempting to interpret as Customers Load *…. The fix for this error is to add a colon after Customers in the following way: Customers: There are instances when a load script will fail silently. Attempting to store a QVD or CSV to a file that is locked by another user viewing it is one such error. Another example is when you have two fields with the same name in your load statement. The debugger can help you find the script lines in which the silent error is present. Summary In this article we learned about QlikView error messages displayed during the script execution. Resources for Article: Further resources on this subject: Meet QlikView [Article] Introducing QlikView elements [Article] Linking Section Access to multiple dimensions [Article]

0
0
9388

How-To Tutorials - Data

Creating Your Own Functions in MySQL for Python

Apache Solr and Big Data – integration with MongoDB

Analyzing Moby Dick through frequency distribution with NLTK

How to interact with HBase using HBase shell [Tutorial]

Getting Started with Tableau Public

Python Text Processing with NLTK: Storing Frequency Distributions in Redis

The Dashboard Design – Best Practices

Creating a sample web scraper

Facebook's CEO, Mark Zuckerberg summoned for hearing by UK and Canadian Houses of Commons

Controlling Relevancy

Trending Topics

Techniques for Creating a Multimedia Database

Working With ASP.NET DataList Control

New programming video courses for March 2019

Scientific Computing APIs for Python

Common QlikView script errors

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access