
How-To Tutorials


Unit Testing and End-To-End Testing

Packt
09 Mar 2017
4 min read
In this article by Andrea Passaglia, the author of the book Vue.js 2 Cookbook, we will cover stubbing external API calls with Sinon.JS.

Stub external API calls with Sinon.JS

Normally, when you do end-to-end testing and integration testing, you would have the backend server running and ready to respond to you. There are many situations in which this is not desirable. As a frontend developer, you take every opportunity to blame the backend guys.

Getting started

No particular skills are required to complete this recipe, except that you should have Jasmine installed as a dependency.

How to do it...

First of all, let's install some dependencies, starting with Jasmine, as we are going to use it to run the whole thing. Also install Sinon.JS and axios before continuing; you just need to add the .js files.

We are going to build an application that retrieves a post at the click of a button. In the HTML part, write the following:

    <div id="app">
      <button @click="retrieve">Retrieve Post</button>
      <p v-if="post">{{post}}</p>
    </div>

The JavaScript part, instead, is going to look like the following:

    const vm = new Vue({
      el: '#app',
      data: {
        post: undefined
      },
      methods: {
        retrieve () {
          axios
            .get('https://jsonplaceholder.typicode.com/posts/1')
            .then(response => {
              console.log('setting post')
              this.post = response.data.body
            })
        }
      }
    })

If you launch your application now, you should be able to see it working. Now we want to test the application, but we don't want to connect to the real server. That would take additional time and it would not be reliable; instead, we are going to take a sample, correct response from the server and use that instead.

Sinon.JS has the concept of a sandbox. It means that whenever a test starts, some dependencies, such as axios, are overwritten. After each test we can discard the sandbox and everything returns to normal. An empty test with Sinon.JS looks like the following (add it after the Vue instance):

    describe('my app', () => {
      let sandbox
      beforeEach(() => sandbox = sinon.sandbox.create())
      afterEach(() => sandbox.restore())
    })

We want to stub the call to the get function of axios:

    describe('my app', () => {
      let sandbox
      beforeEach(() => sandbox = sinon.sandbox.create())
      afterEach(() => sandbox.restore())
      it('should save the returned post body', done => {
        const resolved = new Promise(resolve =>
          resolve({ data: { body: 'Hello World' } })
        )
        sandbox.stub(axios, 'get').returns(resolved)
        ...
        done()
      })
    })

We are overwriting axios here. We are saying that the get method should now return the resolved promise:

    describe('my app', () => {
      let sandbox
      beforeEach(() => sandbox = sinon.sandbox.create())
      afterEach(() => sandbox.restore())
      it('should save the returned post body', done => {
        const resolved = new Promise(resolve =>
          resolve({ data: { body: 'Hello World' } })
        )
        sandbox.stub(axios, 'get').returns(resolved)
        vm.retrieve()
        resolved.then(() => {
          expect(vm.post).toEqual('Hello World')
          done()
        })
      })
    })

Since we are returning a promise (and we need to return a promise, because the retrieve method calls then on it), we have to wait until it resolves. We can then launch the page and see that the test passes.

How it works...

In our case, we used the sandbox to stub a method of one of our dependencies. This way, the get method of axios never gets fired, and we receive an object similar to what the backend would give us. Stubbing API responses isolates you from the backend and its quirks. If something goes wrong there, you won't mind; moreover, you can run your tests without relying on the backend running, and running correctly.

There are many libraries and techniques for stubbing API calls in general, not only those related to HTTP. Hopefully this recipe has given you a head start.

Summary

In this article, we covered how to stub an external API call with Sinon.JS.


Reading the Fine Manual

Packt
09 Mar 2017
34 min read
In this article by Simon Riggs, Gabriele Bartolini, Hannu Krosing, and Gianni Ciolli, the authors of the book PostgreSQL Administration Cookbook - Third Edition, you will learn the following recipes:

- Reading The Fine Manual (RTFM)
- Planning a new database
- Changing parameters in your programs
- Finding the current configuration settings
- Which parameters are at nondefault settings?
- Updating the parameter file
- Setting parameters for particular groups of users
- The basic server configuration checklist
- Adding an external module to PostgreSQL
- Using an installed module
- Managing installed extensions

I get asked many questions about parameter settings in PostgreSQL. Everybody's busy, and most people want a 5-minute tour of how things work. That's exactly what a Cookbook does, so we'll do our best.

Some people believe that there are some magical parameter settings that will improve their performance, spending hours combing the pages of books to glean insights. Others feel comfortable because they have found some website somewhere that "explains everything", and they "know" they have their database configured OK. For the most part, the settings are easy to understand. Finding the best setting can be difficult, and the optimal setting may change over time in some cases. This article is mostly about knowing how, when, and where to change parameter settings.

Reading The Fine Manual (RTFM)

RTFM is often used rudely to mean "don't bother me, I'm busy", or it is used as a stronger form of abuse. The strange thing is that asking you to read a manual is most often very good advice. Don't flame the advisors back; take the advice! The most important point to remember is that you should refer to a manual whose release version matches that of the server on which you are operating.

The PostgreSQL manual is very well written and comprehensive in its coverage of specific topics. However, one of its main failings is that the documents aren't organized in a way that helps somebody who is trying to learn PostgreSQL. They are organized from the perspective of people checking specific technical points, so that they can decide whether their difficulty is a user error or not. It sometimes answers "What?" but seldom "Why?" or "How?"

I've helped write sections of the PostgreSQL documents, so I'm not embarrassed to steer you towards reading them. There are, nonetheless, many things to read here that are useful.

How to do it…

The main documents for each PostgreSQL release are available at http://www.postgresql.org/docs/manuals/. The most frequently accessed parts of the documents are as follows:

- SQL command reference, as well as client and server tools reference: http://www.postgresql.org/docs/current/interactive/reference.html
- Configuration: http://www.postgresql.org/docs/current/interactive/runtime-config.html
- Functions: http://www.postgresql.org/docs/current/interactive/functions.html

You can also grab yourself a PDF version of the manual, which can allow easier searching in some cases. Don't print it! The documents are more than 2000 pages of A4-sized sheets.

How it works…

The PostgreSQL documents are written in SGML, which is similar to, but not the same as, XML. These files are then processed to generate HTML files, PDFs, and so on. This ensures that all the formats have exactly the same content. Then, you can choose the format you prefer, and you can even compile it in other formats, such as EPUB, INFO, and so on.
Moreover, the PostgreSQL manual is actually a subset of the PostgreSQL source code, so it evolves together with the software. It is written by the same people who make PostgreSQL. Even more reasons to read it!

There's more…

More information is also available at http://wiki.postgresql.org. Many distributions offer packages that install static versions of the HTML documentation. For example, on Debian and Ubuntu, the docs for the most recent stable PostgreSQL version are named postgresql-9.6-docs (unsurprisingly).

Planning a new database

Planning a new database can be a daunting task. It's easy to get overwhelmed by it, so here we present some planning ideas. It's also easy to charge headlong at the task, thinking that whatever you know is all you'll ever need to consider.

Getting ready

You are ready. Don't wait to be told what to do. If you haven't been told what the requirements are, then write down what you think they are, clearly labeling them as "assumptions" rather than "requirements"; we mustn't confuse the two things. Iterate until you get some agreement, and then build a prototype.

How to do it…

Write a document that covers the following items:

- Database design: plan your database design
  - Calculate the initial database sizing
- Transaction analysis: how will we access the database?
  - Look at the most frequent access paths
  - What are the requirements for response times?
- Hardware configuration
  - Initial performance thoughts: will all of the data fit into RAM?
- Choice of operating system and filesystem type
  - How do we partition the disk?
- Localization plan
  - Decide server encoding, locale, and time zone
- Access and security plan
  - Identify client systems and specify required drivers
  - Create roles according to a plan for access control
  - Specify pg_hba.conf
- Maintenance plan: who will keep it working? How?
- Availability plan: consider the availability requirements
  - checkpoint_timeout
  - Plan your backup mechanism and test it
- High-availability plan
  - Decide which form of replication you'll need, if any

How it works…

One of the most important reasons for planning your database ahead of time is that retrofitting some things is difficult. This is especially true of server encoding and locale, which can cause much downtime and exertion if we need to change them later. Security is also much more difficult to set up after the system is live.

There's more…

Planning always helps. You may know what you're doing, but others may not. Tell everybody what you're going to do before you do it to avoid wasting time. If you're not sure yet, then build a prototype to help you decide. Approach the administration framework as if it were a development task. Make a list of things you don't know yet, and work through them one by one.

This is deliberately a very short recipe. Everybody has their own way of doing things, and it's very important not to be too prescriptive about how to do things. If you already have a plan, great! If you don't, think about what you need to do, make a checklist, and then do it.

Changing parameters in your programs

PostgreSQL allows you to set some parameter settings for each session or transaction.

How to do it…

You can change the value of a setting during your session, like this:

    SET work_mem = '16MB';

This value will then be used for every future transaction in the session.
You can also change it only for the duration of the current transaction:

    SET LOCAL work_mem = '16MB';

The setting will last until you issue this command:

    RESET work_mem;

Alternatively, you can issue the following command:

    RESET ALL;

SET and RESET are SQL commands that can be issued from any interface. They apply only to PostgreSQL server parameters, but this does not mean that they affect the entire server. In fact, the parameters you can change with SET and RESET apply only to the current session. Also, note that there may be other parameters, such as JDBC driver parameters, that cannot be set in this way.

How it works…

Suppose you change the value of a setting during your session, for example, by issuing this command:

    SET work_mem = '16MB';

Then, the following will show up in the pg_settings catalog view:

    postgres=# SELECT name, setting, reset_val, source
               FROM pg_settings WHERE source = 'session';
       name   | setting | reset_val | source
    ----------+---------+-----------+---------
     work_mem | 16384   | 1024      | session

This holds until you issue this command:

    RESET work_mem;

After issuing it, the setting returns to reset_val and the source returns to default:

       name   | setting | reset_val | source
    ----------+---------+-----------+---------
     work_mem | 1024    | 1024      | default

There's more…

You can change the value of a setting during your transaction as well, like this:

    SET LOCAL work_mem = '16MB';

Then, this will show up in the pg_settings catalog view:

    postgres=# SELECT name, setting, reset_val, source
               FROM pg_settings WHERE source = 'session';
       name   | setting | reset_val | source
    ----------+---------+-----------+---------
     work_mem | 1024    | 1024      | session

Huh? What happened to your parameter setting? The SET LOCAL command takes effect only for the transaction in which it was executed, which in our case was just the SET LOCAL command itself. We need to execute it inside a transaction block to be able to see the setting take hold, as follows:

    BEGIN;
    SET LOCAL work_mem = '16MB';

Here is what shows up in the pg_settings catalog view:

    postgres=# SELECT name, setting, reset_val, source
               FROM pg_settings WHERE source = 'session';
       name   | setting | reset_val | source
    ----------+---------+-----------+---------
     work_mem | 16384   | 1024      | session

You should also note that the value of source is session rather than transaction, as you might have been expecting.

Finding the current configuration settings

At some point, it will occur to you to ask, "What are the current configuration settings?" Most settings can be changed in more than one way, and some ways do not affect all users or all sessions, so it is quite possible to get confused.

How to do it…

Your first thought is probably to look in postgresql.conf, the configuration file, which is described in detail in the Updating the parameter file recipe. That works, but only as long as there is only one parameter file. If there are two, then maybe you're reading the wrong file! (How do you know?) So, the cautious and accurate way is not to trust a text file, but to trust the server itself.

Moreover, you learned in the previous recipe, Changing parameters in your programs, that each parameter has a scope that determines when it can be set. Some parameters can be set through postgresql.conf, but others can be changed afterwards, so the current value of a configuration setting may have been subsequently changed.
We can use the SHOW command like this:

    postgres=# SHOW work_mem;

Its output is as follows:

     work_mem
    ----------
     1MB
    (1 row)

However, remember that it reports the current setting at the time it is run, and that can be changed in many places.

Another way of finding the current settings is to access a PostgreSQL catalog view named pg_settings:

    postgres=# \x
    Expanded display is on.
    postgres=# SELECT * FROM pg_settings WHERE name = 'work_mem';
    -[ RECORD 1 ]---------------------------------------------------------
    name       | work_mem
    setting    | 1024
    unit       | kB
    category   | Resource Usage / Memory
    short_desc | Sets the maximum memory to be used for query workspaces.
    extra_desc | This much memory can be used by each internal sort operation and hash table before switching to temporary disk files.
    context    | user
    vartype    | integer
    source     | default
    min_val    | 64
    max_val    | 2147483647
    enumvals   |
    boot_val   | 1024
    reset_val  | 1024
    sourcefile |
    sourceline |

Thus, you can use the SHOW command to retrieve the value of a setting, or you can access the full details via the catalog table.

There's more…

The actual location of each configuration file can be asked of the PostgreSQL server directly, as shown in this example:

    postgres=# SHOW config_file;

This returns the following output:

                    config_file
    ------------------------------------------
     /etc/postgresql/9.4/main/postgresql.conf
    (1 row)

The other configuration files can be located by querying similar variables, hba_file and ident_file.

How it works…

Each parameter setting is cached within each session, so we get fast access to the parameter settings. This allows us to access the parameter settings with ease. Remember that the values displayed are not necessarily settings for the server as a whole; many of those parameters will be specific to the current session. That's different from what you experience with many other database systems, and it is also very useful.

Which parameters are at nondefault settings?

Often, we need to check which parameters have been changed or whether our changes have correctly taken effect. In the previous two recipes, we saw that parameters can be changed in several ways and with different scopes. You learned how to inspect the value of one parameter and how to get the full list of parameters. In this recipe, we will show you how to use SQL to list only those parameters whose value in the current session differs from the system-wide default value.

This list is valuable for several reasons. First, it includes only a few of the 200-plus available parameters, so it is easier to take in. Also, it is difficult to remember all our past actions, especially in the middle of a long or complicated session.

Version 9.4 introduced the ALTER SYSTEM syntax, which we will describe in the next recipe, Updating the parameter file. From the viewpoint of this recipe, its behavior is quite different from all other setting-related commands: you run it from within your session, and it changes the default value, but not the value in your session.
How to do it…

We write a SQL query that lists all parameter values, excluding those whose current value is the default or an internal override:

    postgres=# SELECT name, source, setting
               FROM pg_settings
               WHERE source != 'default'
               AND source != 'override'
               ORDER BY 2, 1;

The output is as follows:

                name            |        source        |      setting
    ----------------------------+----------------------+--------------------
     application_name           | client               | psql
     client_encoding            | client               | UTF8
     DateStyle                  | configuration file   | ISO, DMY
     default_text_search_config | configuration file   | pg_catalog.english
     dynamic_shared_memory_type | configuration file   | posix
     lc_messages                | configuration file   | en_GB.UTF-8
     lc_monetary                | configuration file   | en_GB.UTF-8
     lc_numeric                 | configuration file   | en_GB.UTF-8
     lc_time                    | configuration file   | en_GB.UTF-8
     log_timezone               | configuration file   | Europe/Rome
     max_connections            | configuration file   | 100
     port                       | configuration file   | 5460
     shared_buffers             | configuration file   | 16384
     TimeZone                   | configuration file   | Europe/Rome
     max_stack_depth            | environment variable | 2048

How it works…

You can see from pg_settings which parameters have nondefault values and what the source of the current value is. The SHOW command doesn't tell you whether a parameter is set at a nondefault value. It just tells you the value, which isn't much help if you're trying to understand what is set and why. If the source is a configuration file, then the sourcefile and sourceline columns are also set. These can be useful in understanding where the configuration came from.

There's more…

The setting column of pg_settings shows the current value, but you can also look at boot_val and reset_val. The boot_val column shows the value assigned when the PostgreSQL database cluster was initialized (initdb), while reset_val shows the value that the parameter will return to if you issue the RESET command.

The max_stack_depth parameter is an exception, because pg_settings says it is set by the environment variable, though it is actually set by ulimit -s on Linux and Unix systems. The max_stack_depth parameter only needs to be set directly on Windows. The time zone settings are also picked up from the OS environment, so you shouldn't need to set those directly. In older releases, pg_settings showed them as command-line settings; from version 9.1 onwards, they are written to postgresql.conf when the data directory is initialized, so they show up as configuration file settings.

Updating the parameter file

The parameter file is the main location for defining parameter values for the PostgreSQL server. All the parameters can be set in the parameter file, which is known as postgresql.conf. There are also two other parameter files: pg_hba.conf and pg_ident.conf. Both of these relate to connections and security.

Getting ready

First, locate postgresql.conf, as described earlier.

How to do it…

Some of the parameters take effect only when the server is first started. A typical example is shared_buffers, which defines the size of the shared memory cache. Many of the parameters can be changed while the server is still running. After changing the required parameters, we issue a reload operation, forcing PostgreSQL to reread the postgresql.conf file (and all other configuration files). There are a number of ways to do that, depending on your distribution and OS. The most common is to issue the following command, as the same OS user that runs the PostgreSQL server process:

    pg_ctl reload

This assumes the default data directory; otherwise, you have to specify the correct data directory with the -D option.
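If you prefer to trigger the reload from inside a database session, the same effect can be had with a built-in SQL function; a minimal example that is not part of the original recipe (run it as a superuser):

    SELECT pg_reload_conf();  -- asks the running server to re-read its configuration files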
As noted earlier, Debian and Ubuntu have a different multiversion architecture, so instead of calling pg_ctl directly, you should issue the following command:

    pg_ctlcluster 9.6 main reload

On modern distributions you can also use systemd, as follows:

    sudo systemctl reload postgresql@9.6-main

Some other parameters require a restart of the server for changes to take effect, for instance, max_connections, listen_addresses, and so on. The syntax is very similar to a reload operation, as shown here:

    pg_ctl restart

For Debian and Ubuntu, use this command:

    pg_ctlcluster 9.6 main restart

And with systemd:

    sudo systemctl restart postgresql@9.6-main

The postgresql.conf file is a normal text file that can simply be edited. Most of the parameters are listed in the file, so you can just search for them and then insert the desired value in the right place.

How it works…

If you set the same parameter twice in different parts of the file, the last setting is what applies. This can cause lots of confusion if you add settings to the bottom of the file, so you are advised against doing that. The best practice is to either leave the file as it is and edit the values, or to start with a blank file and include only the values that you wish to change. I personally prefer a file with only the nondefault values; that makes it easier to see what's happening. Whichever method you use, you are strongly advised to keep all the previous versions of your .conf files. You can do this by copying, or you can use a version control system such as Git or SVN.

There's more…

The postgresql.conf file also supports an include directive. This allows the postgresql.conf file to reference other files, which can then reference other files, and so on. That may help you organize your parameter settings better, if you don't make it too complicated.

If you are working with PostgreSQL version 9.4 or later, you can change the values stored in the parameter files directly from your session, with syntax such as the following:

    ALTER SYSTEM SET shared_buffers = '1GB';

This command does not actually edit postgresql.conf. Instead, it writes the new setting to another file named postgresql.auto.conf. The effect is equivalent, albeit achieved in a safer way: the original configuration file is never written to, so it cannot be damaged in the event of a crash. If you mess up with too many ALTER SYSTEM commands, you can always delete postgresql.auto.conf manually and reload the configuration, or restart PostgreSQL, depending on which parameters you had changed.

Setting parameters for particular groups of users

PostgreSQL supports a variety of ways of defining parameter settings for various user groups. This is very convenient, especially for managing user groups that have different requirements.

How to do it…

For all users in the saas database, use the following command:

    ALTER DATABASE saas SET configuration_parameter = value1;

For a user named simon connected to any database, use this:

    ALTER ROLE simon SET configuration_parameter = value2;

Alternatively, you can set a parameter for a user only when connected to a specific database, as follows:

    ALTER ROLE simon IN DATABASE saas SET configuration_parameter = value3;

The user won't know that these commands have been executed specifically for them. These are default settings, and in most cases, they can be overridden if the user requires nondefault values.
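To review which of these per-database and per-role defaults are currently stored, you can query the pg_db_role_setting catalog (psql's \drds meta-command shows the same information). This is a minimal sketch, not part of the original recipe; saas and simon are just the examples used above:

    SELECT coalesce(d.datname, '(all databases)') AS database,
           coalesce(r.rolname, '(all roles)')     AS role,
           s.setconfig                            AS settings
    FROM pg_db_role_setting s
         LEFT JOIN pg_database d ON d.oid = s.setdatabase
         LEFT JOIN pg_roles    r ON r.oid = s.setrole;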
How it works…

You can set parameters for each of the following:

- Database
- User (which is named a role by PostgreSQL)
- Database/user combination

Each of these parameter defaults is overridden by the one below it. For the preceding three SQL statements:

- If user hannu connects to the saas database, then value1 will apply
- If user simon connects to a database other than saas, then value2 will apply
- If user simon connects to the saas database, then value3 will apply

PostgreSQL implements this in exactly the same way as if the user had manually issued the equivalent SET statements immediately after connecting.

The basic server configuration checklist

PostgreSQL arrives configured for use on a shared system, though many people want to run dedicated database systems. The PostgreSQL project wishes to ensure that PostgreSQL will play nicely with other server software and will not assume that it has access to the full server resources. If you, as the system administrator, know that there is no other important server software running on this system, then you can crank up the values much higher.

Getting ready

Before we start, we need to know two sets of information:

- The size of the physical RAM that will be dedicated to PostgreSQL
- Something about the types of applications for which we will use PostgreSQL

How to do it…

If your database is larger than 32 MB, then you'll probably benefit from increasing shared_buffers. You can increase this to much larger values, but remember that running out of memory induces many problems. For instance, PostgreSQL is able to spill information to disk when the available memory is too small, and it employs sophisticated algorithms to treat each case differently and to place each piece of data either on disk or in memory, depending on the use case. On the other hand, overstating the amount of available memory confuses such abilities and results in suboptimal behavior: if memory is swapped to disk, PostgreSQL will inefficiently treat all data as if it were in RAM. Another unfortunate circumstance is when the Linux Out-Of-Memory (OOM) killer terminates one of the various processes spawned by the PostgreSQL server.

So, it's better to be conservative. It is good practice to set a low value in your postgresql.conf and increase it slowly to ensure that you get the benefits from each change.

If you increase shared_buffers and you're running on a non-Windows server, you will almost certainly need to increase the value of the SHMMAX OS parameter (and on some platforms, other parameters as well). On Linux, Mac OS, and FreeBSD, you will need to either edit the /etc/sysctl.conf file or use sysctl -w with the following values:

- For Linux: kernel.shmmax=value
- For Mac OS: kern.sysv.shmmax=value
- For FreeBSD: kern.ipc.shmmax=value

There's more…

For more information, you can refer to http://www.postgresql.org/docs/9.6/static/kernel-resources.html#SYSVIPC. For example, on Linux, add the following line to /etc/sysctl.conf:

    kernel.shmmax=value

Don't worry about setting effective_cache_size. It is a much less important parameter than you might think, and there is no need for too much fuss when selecting the value.

If you're doing heavy write activity, then you may want to set wal_buffers to a much higher value than the default.
In fact, wal_buffers is set automatically from the value of shared_buffers, following a rule that fits most cases; however, it is always possible to specify an explicit value that overrides the computation, for the very few cases where the rule is not good enough.

If you're doing heavy write activity and/or large data loads, you may want to set checkpoint_segments higher than the default to avoid wasting I/O on excessively frequent checkpoints.

If your database has many large queries, you may wish to set work_mem to a value higher than the default. However, remember that such a limit applies separately to each node in the query plan, so there is a real risk of overallocating memory, with all the problems discussed earlier.

Ensure that autovacuum is turned on, unless you have a very good reason to turn it off; most people don't.

Leave the other settings as they are for now. Don't fuss too much about getting the settings right; you can change most of them later, so you can take an iterative approach to improving things. And remember, don't touch the fsync parameter. It's keeping you safe.

Adding an external module to PostgreSQL

Another strength of PostgreSQL is its extensibility. Extensibility was one of the original design goals, going back to the late 1980s. Now, in PostgreSQL 9.6, there are many additional modules that plug into the core PostgreSQL server. There are many kinds of additional module offerings, such as the following:

- Additional functions
- Additional data types
- Additional operators
- Additional indexes

Getting ready

First, you'll need to select an appropriate module to install. The path towards a complete, automated package management system for PostgreSQL is not finished yet, so you need to look in more than one place for the available modules, such as the following:

- Contrib: The PostgreSQL "core" includes many functions. There is also an official section for add-in modules, known as contrib modules. They are always available for your database server, but are not automatically enabled in every database, because not all users might need them. On PostgreSQL version 9.6 there are more than 40 such modules, documented at http://www.postgresql.org/docs/9.6/static/contrib.html.
- PGXN: This is the PostgreSQL Extension Network, a central distribution system dedicated to sharing PostgreSQL extensions. The website started in 2010 as a repository dedicated to the sharing of extension files.
- Separate projects: These are large external projects, such as PostGIS, offering extensive and complex PostgreSQL modules. For more information, take a look at http://www.postgis.org/.

How to do it…

There are several ways to make additional modules available for your database server, as follows:

- Using a software installer
- Installing from PGXN
- Installing from a manually downloaded package
- Installing from source code

Often, a particular module will be available in more than one way, and users are free to choose their favorite, exactly like PostgreSQL itself, which can be downloaded and installed through many different procedures.

Installing modules using a software installer

Certain modules are available exactly like any other software package that you may want to install on your server. All main Linux distributions provide packages for the most popular modules, such as PostGIS, SkyTools, procedural languages other than those distributed with the core, and so on.
In some cases, modules can be added during installation if you're using a standalone installer application, for example, the OneClick installer, or tools such as rpm, apt-get, and YaST on Linux distributions. The same procedure can also be followed after the PostgreSQL installation, when the need for a certain module arises. We will describe this case, which is far more common.

For example, let's say that you need to manage a collection of Debian package files, and that one of your tasks is to be able to pick the latest version of one of them. You start by building a database that records all the package files. Clearly, you need to store the version number of each package. However, Debian version numbers are much more complex than what we usually call "numbers". For instance, on my Debian laptop, I currently have version 9.2.18-1.pgdg80+1 of the PostgreSQL client package. Despite being complicated, that string follows a clearly defined specification, which includes many bits of information, including how to compare two versions to establish which of them is older.

Since this recipe discusses extending PostgreSQL with custom data types and operators, you might have already guessed that I will now consider a custom data type for Debian version numbers, capable of tasks such as understanding the Debian version number format, sorting version numbers, choosing the latest version number in a given group, and so on.

It turns out that somebody else has already done all the work of creating the required PostgreSQL data type, endowed with all the useful accessories: comparison operators, input/output functions, support for indexes, and maximum/minimum aggregates. All of this has been packaged as a PostgreSQL extension, as well as a Debian package (not a big surprise), so it is just a matter of installing the postgresql-9.2-debversion package with a Debian tool such as apt-get, aptitude, or synaptic. On my laptop, that boils down to the command line:

    apt-get install postgresql-9.2-debversion

This will download the required package and unpack all the files in the right locations, making them available to my PostgreSQL server.
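As a rough sketch of what that enables (this is not from the original text: it assumes the package ships an extension and data type both named debversion, enabled in the database as described later in the Using an installed module recipe, and the packages table here is purely hypothetical), Debian version strings can then be compared and aggregated with their proper ordering:

    CREATE EXTENSION debversion;

    -- true under Debian ordering rules, although plain text ordering would say otherwise
    SELECT '9.2.18-1.pgdg80+1'::debversion > '9.2.9-2'::debversion;

    -- hypothetical packages table: the max() aggregate mentioned above picks the latest version
    SELECT package_name, max(version) AS latest_version
    FROM packages
    GROUP BY package_name;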
Installing modules from PGXN

The PostgreSQL Extension Network, PGXN for short, is a website (http://pgxn.org) launched in late 2010 with the purpose of providing "a central distribution system for open source PostgreSQL extension libraries". Anybody can register and upload their own module, packaged as an extension archive. The website allows browsing available extensions and their versions, either via a search interface or from a directory of package and user names.

The simple way is to use a command-line utility called pgxnclient. It can be easily installed on most systems; see the PGXN website for how to do so. Its purpose is to interact with PGXN and take care of administrative tasks, such as browsing available extensions, downloading the package, compiling the source code, installing files in the proper place, and removing installed package files. Alternatively, you can download the extension files from the website and put them in the right place by following the installation instructions.

PGXN is different from official repositories because it serves another purpose. Official repositories usually contain only seasoned extensions, because they accept new software only after a certain amount of evaluation and testing. On the other hand, anybody can ask for a PGXN account and upload their own extensions, so there is no filter except the requirement that the extension has an open source license and the few files that any extension must have.

Installing modules from a manually downloaded package

You might have to install a module that is correctly packaged for your system but is not available from the official package archives. For instance, it could be the case that the module has not been accepted into the official repository yet, or you could have repackaged a bespoke version of that module with some custom tweaks, which are so specific that they will never become official. Whatever the case, you will have to follow the installation procedure for standalone packages specific to your system.

Here is an example with the Oracle compatibility module, described at http://postgres.cz/wiki/Oracle_functionality_(en). First, we get the package, say for PostgreSQL 8.4 on a 64-bit architecture, from http://pgfoundry.org/frs/download.php/2414/orafce-3.0.1-1.pg84.rhel5.x86_64.rpm. Then, we install the package in the standard way:

    rpm -ivh orafce-3.0.1-1.pg84.rhel5.x86_64.rpm

If all the dependencies are met, we are done. I mention dependencies because that's one more potential problem when installing packages that are not officially part of the installed distribution: you can no longer assume that all software version numbers have been tested, all requirements are available, and there are no conflicts. If you get error messages that indicate problems in these areas, you may have to solve them yourself, by manually installing missing packages and/or uninstalling conflicting packages.

Installing modules from source code

In many cases, useful modules may not have full packaging. In these cases, you may need to install the module manually. This isn't very hard, and it's a useful exercise that helps you understand what happens. Each module will have different installation requirements. There are generally two aspects of installing a module:

- Building the libraries (only for modules that have libraries)
- Installing the module files in the appropriate locations

You need to follow the instructions for the specific module in order to build the libraries, if any are required. Installation will then be straightforward; usually there will be a suitably prepared configuration file for the make utility, so that you just need to type the following command:

    make install

Each file will be copied to the right directory. Remember that you normally need to be a system superuser in order to install files into system directories. Once a library file is in the directory expected by the PostgreSQL server, it will be loaded automatically as soon as it is requested by a function. Modules such as auto_explain do not provide any additional user-defined functions, so they won't be auto-loaded; that needs to be done manually by a superuser with a LOAD statement.

How it works…

PostgreSQL can dynamically load libraries in the following ways:

- Using the explicit LOAD command in a session
- Using the shared_preload_libraries parameter in postgresql.conf at server start
- At session start, using the local_preload_libraries parameter for a specific user, as set using ALTER ROLE
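For example, here is a minimal sketch of the session-level approach, loading the auto_explain module mentioned above by hand and adjusting one of its settings (the threshold chosen here is arbitrary); the other two approaches are configuration changes rather than SQL statements:

    LOAD 'auto_explain';                            -- load the library into this session only
    SET auto_explain.log_min_duration = '250ms';    -- log plans of statements slower than 250 ms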
PostgreSQL functions and objects can reference code in these libraries, allowing extensions to be bound tightly to the running server process. The tight binding makes this method suitable for use even in very high-performance applications, and there's no significant difference between additionally supplied features and native features.

Using an installed module

In this recipe, we will explain how to enable an installed module so that it can be used in a particular database. The additional types, functions, and so on will exist only in those databases where we have carried out this step.

Although most modules require this procedure, there are actually a couple of notable exceptions. For instance, the auto_explain module mentioned earlier, which is shipped together with PostgreSQL, does not create any function, type, or operator. To use it, you must load its object file using the LOAD command. From that moment, all statements that run longer than a configurable threshold will be logged together with their execution plan.

In the rest of this recipe, we will cover all the other modules. They do not require a LOAD statement, because PostgreSQL can automatically load the relevant libraries when they are required. As mentioned in the previous recipe, Adding an external module to PostgreSQL, specially packaged modules are called extensions in PostgreSQL. They can be managed with dedicated SQL commands.

Getting ready

Suppose you have chosen to install a certain module among those available for your system (see the previous recipe, Adding an external module to PostgreSQL); all you need to know is the extension name.

How to do it…

Each extension has a unique name, so it is just a matter of issuing the following command:

    CREATE EXTENSION myextname;

This will automatically create all the required objects inside the current database. For security reasons, you need to do so as a database superuser. For instance, if you want to install the dblink extension, type this:

    CREATE EXTENSION dblink;
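Once created, the extension's objects can be used like any built-in ones. As a quick illustration that is not part of the original recipe (the connection string and query are placeholders), dblink can run a query in another database and return its rows:

    SELECT *
    FROM dblink('dbname=postgres',
                'SELECT datname FROM pg_database')
         AS remote(datname name);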
How it works…

When you issue a CREATE EXTENSION command, the database server looks for a file named EXTNAME.control in the SHAREDIR/extension directory. That file tells PostgreSQL some properties of the extension, including a description, some installation information, and the default version number of the extension (which is unrelated to the PostgreSQL version number). Then, a creation script is executed in a single transaction, so if it fails, the database is unchanged. The database server also notes in a catalog table the extension name and all the objects that belong to it.

Managing installed extensions

In the last two recipes, we showed you how to install external modules in PostgreSQL to augment its capabilities. In this recipe, we will show you some more capabilities offered by the extension infrastructure.

How to do it…

First, we list all available extensions:

    postgres=# \x on
    Expanded display is on.
    postgres=# SELECT *
    postgres-# FROM pg_available_extensions
    postgres-# ORDER BY name;
    -[ RECORD 1 ]-----+--------------------------------------------------
    name              | adminpack
    default_version   | 1.0
    installed_version |
    comment           | administrative functions for PostgreSQL
    -[ RECORD 2 ]-----+--------------------------------------------------
    name              | autoinc
    default_version   | 1.0
    installed_version |
    comment           | functions for autoincrementing fields
    (...)

In particular, if the dblink extension is installed, then we see a record like this:

    -[ RECORD 10 ]----+--------------------------------------------------
    name              | dblink
    default_version   | 1.0
    installed_version | 1.0
    comment           | connect to other PostgreSQL databases from within a database

Now, we can list all the objects in the dblink extension, as follows:

    postgres=# \x off
    Expanded display is off.
    postgres=# \dx+ dblink
                     Objects in extension "dblink"
                          Object Description
    ---------------------------------------------------------------------
     function dblink_build_sql_delete(text,int2vector,integer,text[])
     function dblink_build_sql_insert(text,int2vector,integer,text[],text[])
     function dblink_build_sql_update(text,int2vector,integer,text[],text[])
     function dblink_cancel_query(text)
     function dblink_close(text)
     function dblink_close(text,boolean)
     function dblink_close(text,text)
    (...)

Objects created as part of an extension are not special in any way, except that you can't drop them individually. This is done to protect you from mistakes:

    postgres=# DROP FUNCTION dblink_close(text);
    ERROR:  cannot drop function dblink_close(text) because extension dblink requires it
    HINT:  You can drop extension dblink instead.

Extensions might have dependencies too. The cube and earthdistance contrib extensions provide a good example, since the latter depends on the former:

    postgres=# CREATE EXTENSION earthdistance;
    ERROR:  required extension "cube" is not installed
    postgres=# CREATE EXTENSION cube;
    CREATE EXTENSION
    postgres=# CREATE EXTENSION earthdistance;
    CREATE EXTENSION

As you can reasonably expect, dependencies are taken into account when dropping objects, just as for other objects:

    postgres=# DROP EXTENSION cube;
    ERROR:  cannot drop extension cube because other objects depend on it
    DETAIL:  extension earthdistance depends on extension cube
    HINT:  Use DROP ... CASCADE to drop the dependent objects too.
    postgres=# DROP EXTENSION cube CASCADE;
    NOTICE:  drop cascades to extension earthdistance
    DROP EXTENSION

How it works…

The pg_available_extensions system view shows one row for each extension control file in the SHAREDIR/extension directory (see the Using an installed module recipe). The pg_extension catalog table records only the extensions that have actually been created.

The psql command-line utility provides the \dx meta-command to examine extensions. It supports an optional plus sign (+) to control verbosity, and an optional pattern for the extension name to restrict its range. Consider the following command:

    \dx+ db*

This will list all extensions whose names start with db, together with all their objects.

The CREATE EXTENSION command creates all objects belonging to a given extension, and then records the dependency of each object on the extension in pg_depend. That's how PostgreSQL can ensure that you cannot drop one such object without dropping its extension.

The extension control file admits an optional line, requires, that names one or more extensions on which the current one depends. The implementation of dependencies is still quite simple. For instance, there is no way to specify a dependency on a specific version number of other extensions, and there is no command that installs one extension together with all its prerequisites.

As a general PostgreSQL rule, the CASCADE keyword tells the DROP command to delete all the objects that depend on the one being dropped: the earthdistance extension, in this example.

There's more…

Another system view, pg_available_extension_versions, shows all the versions available for each extension. It can be valuable when there are multiple versions of the same extension available at the same time, for example, when making preparations for an extension upgrade.
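For instance, a short query along these lines (dblink is used purely as an example name) lists every version of an extension known to the server and flags the one currently installed:

    SELECT name, version, installed
    FROM pg_available_extension_versions
    WHERE name = 'dblink'
    ORDER BY version;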
When a more recent version of an already installed extension becomes available to the database server, for instance because of a distribution upgrade that installs updated package files, the superuser can perform an upgrade by issuing the following command:

    ALTER EXTENSION myext UPDATE TO '1.1';

This assumes that the author of the extension taught it how to perform the upgrade.

Starting from version 9.6, the CASCADE option is also accepted by the CREATE EXTENSION syntax, with the meaning of "issue CREATE EXTENSION recursively to cover all dependencies". So, instead of creating the cube extension before creating the earthdistance extension, you could have just issued the following command:

    postgres=# CREATE EXTENSION earthdistance CASCADE;
    NOTICE:  installing required extension "cube"
    CREATE EXTENSION

Remember that CREATE EXTENSION … CASCADE will only work if all the extensions it tries to install have already been placed in the appropriate location.

Summary

In this article, we studied RTFM, how to plan a new database, changing parameters in your programs, finding and changing configuration settings, updating the parameter file, and how to add, use, and manage installed modules and extensions.


Microservices and Service Oriented Architecture

Packt
09 Mar 2017
6 min read
Microservices are an architecture style and an approach to software development that satisfies modern business demands. They are not a new invention as such; they are instead an evolution of previous architecture styles. Many organizations today use them - they can improve organizational agility, speed of delivery, and the ability to scale. Microservices give you a way to develop more physically separated, modular applications.

This tutorial has been taken from Spring 5.0 Microservices - Second Edition.

Microservices are similar to conventional service-oriented architectures. In this article, we will see how microservices are related to SOA.

The emergence of microservices

Many organizations, such as Netflix, Amazon, and eBay, successfully used what is known as the 'divide and conquer' technique to functionally partition their monolithic applications into smaller atomic units, each performing a single function - a 'service'. These organizations solved a number of prevailing issues they were experiencing with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern to refactor their monolithic applications. Later, evangelists termed this pattern the microservices architecture.

Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn back in 2005. Hexagonal Architecture, or the Hexagonal pattern, is also known as the Ports and Adapters pattern. Cockburn defined microservices as:

"...an architectural style or an approach for building IT systems as a set of business capabilities that are autonomous, self contained, and loosely coupled."

The following diagram depicts a traditional N-tier application architecture with a presentation layer, business layer, and database layer:

Modules A, B, and C represent three different business capabilities. The layers in the diagram represent separation of architectural concerns. Each layer holds all three business capabilities pertaining to that layer: the presentation layer has the web components of all three modules, the business layer has the business components of all three modules, and the database hosts the tables of all three modules. In most cases, layers are physically spreadable, whereas modules within a layer are hardwired.

Let's now examine a microservice-based architecture:

As we can see in the preceding diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice. Each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned toward business capabilities; by doing so, changes to one microservice do not impact the others.

There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro.

As microservices are more aligned to business capabilities and have independently manageable lifecycles, they are an ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two facets of microservices.

How do microservices compare to Service Oriented Architectures?

One of the common questions that arises when dealing with the microservices architecture is: how is it different from SOA? SOA and microservices follow similar concepts.
Earlier in this article, we saw that microservices evolved from SOA, and that many service characteristics are common to both approaches. However, are they the same or different? Because microservices evolved from SOA, many characteristics of microservices are similar to SOA. Let's first examine the definition of SOA.

The Open Group definition of SOA is as follows:

"SOA is an architectural style that supports service-orientation. Service-orientation is a way of thinking in terms of services and service-based development and the outcomes of services. [A service] is self-contained, may be composed of other services, and is a "black box" to consumers of the service."

You have learned about similar aspects in microservices as well. So, in what way are microservices different? The answer is: it depends. The answer to the previous question could be yes or no, depending upon the organization and its adoption of SOA. SOA is a broader term, and different organizations have approached SOA differently to solve different organizational problems. The difference between microservices and SOA lies in how an organization approaches SOA. In order to get clarity, a few cases will be examined here.

Service oriented integration

Service-oriented integration refers to a service-based integration approach used by many organizations. Many organizations would have used SOA primarily to solve their integration complexities, also known as integration spaghetti. Generally, this is termed Service Oriented Integration (SOI). In such cases, applications communicate with each other through a common integration layer using standard protocols and message formats, such as SOAP/XML-based web services over HTTP or Java Message Service (JMS). These types of organizations focus on Enterprise Integration Patterns (EIP) to model their integration requirements. This approach strongly relies on a heavyweight Enterprise Service Bus (ESB), such as TIBCO Business Works, WebSphere ESB, Oracle ESB, and the like. Most ESB vendors also packaged a set of related products, such as rules engines, business process management engines, and so on, as a SOA suite. Such organizations' integrations are deeply rooted in these products. They either write heavy orchestration logic in the ESB layer, or the business logic itself in the service bus. In both cases, all enterprise services are deployed and accessed through the ESB, and these services are managed through an enterprise governance model. For such organizations, microservices are altogether different from SOA.

Legacy modernization

SOA is also used to build service layers on top of legacy applications, as shown in the following diagram. Another category of organizations would have used SOA in transformation or legacy modernization projects. In such cases, the services are built and deployed in the ESB, connecting to backend systems using ESB adapters. For these organizations, microservices are different from SOA.

Service oriented application

Some organizations have adopted SOA at an application level. In this approach, as shown in the preceding diagram, lightweight integration frameworks, such as Apache Camel or Spring Integration, are embedded within applications to handle service-related cross-cutting capabilities, such as protocol mediation, parallel execution, orchestration, and service integration.
As some of the lightweight integration frameworks had native Java object support, such applications would even have used native Plain Old Java Object (POJO) services for integration and data exchange between services. As a result, all services have to be packaged as one monolithic web archive. Such organizations could see microservices as the next logical step in their SOA.

Monolithic migration using SOA

The following diagram represents the logical system boundaries:

The last possibility is transforming a monolithic application into smaller units after hitting the breaking point with the monolithic system. Such organizations would have broken the application into smaller, physically deployable subsystems, similar to the Y-axis scaling approach explained earlier, and deployed them as web archives on web servers or as JARs deployed on home-grown containers. These subsystems, as services, would have used web services or other lightweight protocols to exchange data between services. They would also have used SOA and service design principles to achieve this. Such organizations may tend to think that microservices are the same old wine in a new bottle.


Replication Solutions in PostgreSQL

Packt
09 Mar 2017
14 min read
In this article by Chitij Chauhan and Dinesh Kumar, the authors of the book PostgreSQL High Performance Cookbook, we will talk about various high availability and replication solutions, including a popular third-party replication tool, Slony.

Setting up hot streaming replication

In this recipe, we are going to set up master/slave streaming replication.

Getting ready

For this exercise, you will need two Linux machines, each with the latest version of PostgreSQL 9.6 installed. We will be using the following IP addresses for the master and slave servers:

- Master IP address: 192.168.0.4
- Slave IP address: 192.168.0.5

How to do it…

The following is the sequence of steps for setting up master/slave streaming replication:

1. Set up passwordless authentication between the master and the slave for the postgres user.

2. Create a user ID on the master that will be used by the slave server to connect to the PostgreSQL database on the master server:

    psql -c "CREATE USER repuser REPLICATION LOGIN ENCRYPTED PASSWORD 'charlie';"

3. Allow the replication user created in the previous step to access the master PostgreSQL server. This is done by making the necessary changes in the pg_hba.conf file:

    vi pg_hba.conf
    host    replication    repuser    192.168.0.5/32    md5

4. Configure the parameters in the postgresql.conf file. These parameters need to be set in order to get streaming replication working:

    vi /var/lib/pgsql/9.6/data/postgresql.conf
    listen_addresses = '*'
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 8
    archive_mode = on
    archive_command = 'cp %p /var/lib/pgsql/archive/%f && scp %p postgres@192.168.0.5:/var/lib/pgsql/archive/%f'

5. Once the parameter changes have been made in the postgresql.conf file, restart the PostgreSQL server on the master in order for the changes to come into effect:

    pg_ctl -D /var/lib/pgsql/9.6/data restart

6. Before the slave can replicate the master, we need to give it the initial database to build off. For this purpose, we make a base backup by copying the primary server's data directory to the standby:

    psql -U postgres -h 192.168.0.4 -c "SELECT pg_start_backup('label', true)"
    rsync -a /var/lib/pgsql/9.6/data/ 192.168.0.5:/var/lib/pgsql/9.6/data/ --exclude postmaster.pid
    psql -U postgres -h 192.168.0.4 -c "SELECT pg_stop_backup()"

7. Once the data directory on the slave is populated, configure the following parameter in the postgresql.conf file on the slave server:

    hot_standby = on

8. Copy recovery.conf.sample into the $PGDATA location on the slave server and then configure the following parameters:

    cp /usr/pgsql-9.6/share/recovery.conf.sample /var/lib/pgsql/9.6/data/recovery.conf
    standby_mode = on
    primary_conninfo = 'host=192.168.0.4 port=5432 user=repuser password=charlie'
    trigger_file = '/tmp/trigger.replication'
    restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'

9. Start the slave server:

    service postgresql-9.6 start

Now that the preceding replication steps are set up, we will test for replication.
On the master server, login and issue the following mentioned SQL commands: psql -h 192.168.0.4 -d postgres -U postgres -W postgres=# create database test; postgres=# c test; test=# create table testtable ( testint int, testchar varchar(40) ); CREATE TABLE test=# insert into testtable values ( 1, 'What A Sight.' ); INSERT 0 1 On the slave server we will now check if the newly created database and the corresponding table in the previous step are replicated: psql -h 192.168.0.5 -d test -U postgres -W test=# select * from testtable; testint | testchar --------+---------------- 1 | What A Sight. (1 row) The wal_keep_segments parameter ensures that how many WAL files should be retained in the master pg_xlog in case of network delays. However if you do not want assume a value for this, you can create a replication slot which makes sure master does not remove the WAL files in pg_xlog until they have been received by standbys. For more information refer to: https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION-SLOTS. How it works… The following is the explanation given for the steps done in the preceding section: In the initial step of the preceding section we create a user called repuser which will be used by the slave server to make a connection to the primary server. In step 2 of the preceding section we make the necessary changes in the pg_hba.conf file to allow the master server to be accessed by the slave server using the user ID repuser that was created in the step 2. We then make the necessary parameter changes on the master in step 4 of the preceding section for configuring streaming replication. Given is a description for these parameters:     Listen_Addresses: This parameter is used to provide the IP address that you want to have PostgreSQL listen too. A value of * indicates that all available IP addresses.     wal_level: This parameter determines the level of WAL logging done. Specify hot_standby for streaming replication.    wal_keep_segments: This parameter specifies the number of 16 MB WAL files to retain in the pg_xlog directory. The rule of thumb is that more such files may be required to handle a large checkpoint.     archive_mode: Setting this parameter enables completed WAL segments to be sent to archive storage.     archive_command: This parameter is basically a shell command is executed whenever a WAL segment is completed. In our case we are basically copying the file to the local machine and then we are using the secure copy command to send it across to the slave.     max_wal_senders: This parameter specifies the total number of concurrent connections allowed from the slave servers. Once the necessary configuration changes have been made on the master server we then restart the PostgreSQL server on the master in order to get the new configuration changes come into effect. This is done in step 5 of the preceding section. In step 6 of the preceding section, we were basically building the slave by copying the primary's data directory to the slave. Now with the data directory available on the slave the next step is to configure it. We will now make the necessary parameter replication related parameter changes on the slave in the postgresql.conf directory on the slave server. We set the following mentioned parameter on the slave:    hot_standby: This parameter determines if we can connect and run queries during times when server is in the archive recovery or standby mode In the next step we are configuring the recovery.conf file. 
This is required to be setup so that the slave can start receiving logs from the master. The following mentioned parameters are configured in the recovery.conf file on the slave:    standby_mode: This parameter when enabled causes PostgreSQL to work as a standby in a replication configuration.    primary_conninfo: This parameter specifies the connection information used by the slave to connect to the master. For our scenario the our master server is set as 192.168.0.4 on port 5432 and we are using the user ID repuser with password charlie to make a connection to the master. Remember that the repuser was the user ID which was created in the initial step of the preceding section for this purpose that is, connecting to the  master from the slave.    trigger_file: When slave is configured as a standby it will continue to restore the XLOG records from the master. The trigger_file parameter specifies what is used to trigger a slave to switch over its duties from standby and take over as master or being the primary server. At this stage the slave has been now fully configured and we then start the slave server and then replication process begins. In step 10 and 11 of the preceding section we are simply testing our replication. We first begin by creating a database test and then log into the test database and create a table by the name test table and then begin inserting some records into the test table. Now our purpose is to see whether these changes are replicated across the slave. To test this we then login into slave on the test database and then query the records from the test table as seen in step 10 of the preceding section. The final result that we see is that the all the records which are changed/inserted on the primary are visible on the slave. This completes our streaming replication setup and configuration. Replication using Slony Here in this recipe we are going to setup replication using Slony which is widely used replication engine. It replicates a desired set of tables data from one database to other. This replication approach is based on few event triggers which will be created on the source set of tables which will log the DML and DDL statements into a Slony catalog tables. By using Slony, we can also setup the cascading replication among multiple nodes. Getting ready The steps followed in this recipe are carried out on a CentOS Version 6 machine. We would first need to install Slony. The following mentioned are the steps needed to install Slony: First go to the mentioned web link and download the given software at http://slony.info/downloads/2.2/source/. 
Once you have downloaded the following mentioned software the next step is to unzip the tarball and then go the newly created directory: tar xvfj slony1-2.2.3.tar.bz2 cd slony1-2.2.3 In the next step we are going to configure, compile, and build the software: /configure --with-pgconfigdir=/usr/pgsql-9.6/bin/ make make install How to do it… The following mentioned are the sequence of steps required to replicate data between two tables using Slony replication: First start the PostgreSQL server if not already started: pg_ctl -D $PGDATA start In the next step we will be creating two databases test1 and test 2 which will be used as source and target databases: createdb test1 createdb test2 In the next step we will create the table t_test on the source database test1 and will insert some records into it: psql -d test1 test1=# create table t_test (id numeric primary key, name varchar); test1=# insert into t_test values(1,'A'),(2,'B'), (3,'C'); We will now set up the target database by copying the table definitions from the source database test1: pg_dump -s -p 5432 -h localhost test1 | psql -h localhost -p 5432 test2 We will now connect to the target database test2 and verify that there is no data in the tables of the test2 database: psql -d test2 test2=# select * from t_test; We will now setup a slonik script for master/slave that is, source/target setup: vi init_master.slonik #! /bin/slonik cluster name = mycluster; node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres'; node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres'; init cluster ( id=1); create set (id=1, origin=1); set add table(set id=1, origin=1, id=1, fully qualified name = 'public.t_test'); store node (id=2, event node = 1); store path (server=1, client=2, conninfo='dbname=test1 host=localhost port=5432 user=postgres password=postgres'); store path (server=2, client=1, conninfo='dbname=test2 host=localhost port=5432 user=postgres password=postgres'); store listen (origin=1, provider = 1, receiver = 2); store listen (origin=2, provider = 2, receiver = 1); We will now create a slonik script for subscription on the slave that is, target: vi init_slave.slonik #! 
/bin/slonik cluster name = mycluster; node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres'; node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres'; subscribe set ( id = 1, provider = 1, receiver = 2, forward = no); We will now run the init_master.slonik script created in step 6 and will run this on the master: cd /usr/pgsql-9.6/bin slonik init_master.slonik We will now run the init_slave.slonik script created in step 7 and will run this on the slave that is, target: cd /usr/pgsql-9.6/bin slonik init_slave.slonik In the next step we will start the master slon daemon: nohup slon mycluster "dbname=test1 host=localhost port=5432 user=postgres password=postgres" & In the next step we will start the slave slon daemon: nohup slon mycluster "dbname=test2 host=localhost port=5432 user=postgres password=postgres" & In the next step we will connect to the master that is, source database test1 and insert some records in the t_test table: psql -d test1 test1=# insert into t_test values (5,'E'); We will now test for replication by logging to the slave that is, target database test2 and see if the inserted records into the t_test table in the previous step are visible: psql -d test2 test2=# select * from t_test; id | name ----+------ 1 | A 2 | B 3 | C 5 | E (4 rows) How it works… We will now discuss about the steps followed in the preceding section: In step 1, we first start the PostgreSQL server if not already started. In step 2 we create two databases namely test1 and test2 that will serve as our source (master) and target (slave) databases. In step 3 of the preceding section we log into the source database test1 and create a table t_test and insert some records into the table. In step 4 of the preceding section we set up the target database test2 by copying the table definitions present in the source database and loading them into the target database test2 by using pg_dump utility. In step 5 of the preceding section we login into the target database test2 and verify that there are no records present in the table t_test because in step 5 we only extracted the table definitions into test2 database from test1 database. In step 6 we setup a slonik script for master/slave replication setup. In the file init_master.slonik we first define the cluster name as mycluster. We then define the nodes in the cluster. Each node will have a number associated to a connection string which contains database connection information. The node entry is defined both for source and target databases. The store_path commands are necessary so that each node knows how to communicate with the other. In step 7 we setup a slonik script for subscription of the slave that is, target database test2. Once again the script contains information such as cluster name, node entries which are designed a unique number related to connect string information. It also contains a subscriber set. In step 8 of the preceding section we run the init_master.slonik on the master. Similarly in step 9 we run the init_slave.slonik on the slave. In step 10 of the preceding section we start the master slon daemon. In step 11 of the preceding section we start the slave slon daemon. Subsequent section from step 12 and 13 of the preceding section are used to test for replication. For this purpose in step 12 of the preceding section we first login into the source database test1 and insert some records into the t_test table. 
To check whether the newly inserted records have been replicated to the target database test2, we log into test2 in step 13; the result set returned by the query confirms that the records changed/inserted in the t_test table of the test1 database have been successfully replicated to the target database test2. You may refer to http://slony.info/documentation/tutorial.html for more information regarding Slony replication.

Summary

We have seen how to set up streaming replication, and then we looked at how to install and replicate using Slony, a popular third-party replication tool. A short monitoring sketch for the streaming replication setup follows the resource links below.

Resources for Article:

Further resources on this subject:

Introducing PostgreSQL 9 [article]

PostgreSQL Cookbook - High Availability and Replication [article]

PostgreSQL in Action [article]
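The following is a minimal sketch, not part of the original recipe, of two queries that are useful once the master/slave streaming replication from the first recipe is running. Both are run with psql against the master (192.168.0.4). The slot name standby1_slot is only an example; using a replication slot also requires max_replication_slots to be set greater than zero in postgresql.conf on the master, followed by a restart.

-- List connected standbys and how far they have replayed.
-- In PostgreSQL 9.6 the columns are named *_location; from version 10 onwards they are *_lsn.
SELECT application_name, client_addr, state,
       sent_location, write_location, flush_location, replay_location
FROM pg_stat_replication;

-- Create a physical replication slot so the master retains WAL files
-- until the standby has actually consumed them.
SELECT * FROM pg_create_physical_replication_slot('standby1_slot');

-- The standby then points at the slot in its recovery.conf:
--   primary_slot_name = 'standby1_slot'

With a slot in place, wal_keep_segments becomes a safety margin rather than the only protection against the master recycling WAL files that the standby still needs.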
The Interface

Packt
09 Mar 2017
18 min read
In this article by Tim Woodruff, authors of the book Learning ServiceNow, No matter what software system you're interested in learning about, understanding the interface is likely to be the first step toward success. ServiceNow is a very robust IT service management tool, and has an interface to match. Designed both to be easy to use, and to support a multitude of business processes and applications (both foreseen and unforeseen), it must be able to bend to the will of the business, and be putty in the hands of a capable developer. (For more resources related to this topic, see here.) You'll learn all the major components of the UI (user interface), and how to manipulate them to suit your business needs, and look good doing it. You'll also learn some time-saving tips, tricks, and UI shortcuts that have been built into the interface for power users to get around more quickly. This article will cover the key components of the user interface, including: The content and ServiceNow frames The application navigator UI settings and personalization We recommend that you follow along in your own development instance as you read through this section, to gain a more intimate familiarity with the interface. Frames ServiceNow is a cloud platform that runs inside your browser window. Within your browser, the ServiceNow interface is broken up into frames. Frames, in web parlance, are just separately divided sections of a page. In ServiceNow, there are two main frames: the ServiceNow frame, and the content frame.  Both have different controls and display different information. This section will show you what the different frames are, what they generally contain, and the major UI elements within them. The ServiceNow frame consists of many UI elements spanning across both the top, and left side of the ServiceNow window in your browser. ServiceNow frame Technically, the ServiceNow frame can be further broken up into two frames: The banner frame along the top edge of the interface, and the application navigator along the left side. Banner frame The banner frame runs along the top of every page in ServiceNow, save for a few exceptions. It's got room for some branding and a logo, but the more functional components for administrators and developers is on the right. There, you'll find: System settings cog Help and documentation button Conversations panel button Instance search button Profile/session dropdown System settings In your developer instance, on the far-top-right, you will see a sort of cog or sprocket. That is a universal sort of the Settings menu icon. Clicking on that icon reveals the System settings menu. This menu is broken down into several sections: General Theme Lists Forms Notifications Developer (Admins only) Fig 1.1: System Settings The settings in this menu generally apply only to the current user who's signed in, so you can freely toggle and modify these settings without worrying about breaking anything. In the General tab (as seen in the preceding figure) of the System settings UI, you'll find toggles to control accessibility options, compact the user interface, select how date/time fields are shown, select your time-zone, and even an option to display a printer-friendly version of the page you're on. In Geneva, you'll also see an option to Wrap Longer Text in List Columns and Compact list date/time. On the Theme tab (in the preceding figure), you'll find several pre-made ServiceNow themes with names like System and Blues. 
One of the first things that a company often does when deploying ServiceNow, is to create a custom-branded theme. We'll go over how to do that in a later section, and you'll be able to see your custom themes there. The Lists tab (not available in Geneva) contains the option to wrap longer text in list columns (which was under the General tab in Geneva), as well as options to enable striped table rows (which alternates rows in a table between contrasting shades of gray, making it easier to follow with the eye from left to right) and modern cell styles. All options in the Lists tab except Wrap longer text in list columns require the List V3 plugin to be enabled before they'll show up, as they only apply to List V3. If you've installed a fresh ServiceNow instance using Helsinki or a later version, the List V3 plugin will be enabled by default. However, if you've upgraded from Geneva or an earlier version, to Helsinki, you'll be on List V2 by default, and list V3 will need to be enabled. This, and any other plugins, can be enabled from System Definition | Plugins in the application navigator. The Forms tab contains settings to enable tabbed forms, as well as to control how and when related lists load. Related lists are lists (like tables in a spreadsheet) of related that appear at the bottom of forms. Forms are where key data about an individual record are displayed. The Notifications tab (not available in Geneva) allows you to choose whether to get notifications on your mobile device, desktop toast notifications, e-mail notifications, or audio notifications. Finally, the Developer tab (only available to users with the admin role) is where you can find settings relating to application and update set-based development. By default, your selected update set should say Default [Global], which means that any configuration changes you make in the instance will not be captured in a portable update set that you can move between instances. We'll go into detail about what these things mean later on. For now, follow along with the following steps in your developer instance using your Administrator account, as we create a new update set to contain any configuration changes we'll be making in this article: If you don't already have the System Settings menu open, click on the System Settings gear in the top-right of the ServiceNow interface. If you haven't already done so, click on the Developer tab on the bottom-left. Next, navigate to the Local Update Setstable. In the main section of the System Settings dialog, you should see the third row down labeled Update Sets. To the right of that should be a dropdown with Default [Global] selected, followed by three buttons. The first button () is called a Reference icon. Clicking it will take you to the currently selected update set (in this case, Default). The second button () will take you to the list view, showing you all of the local update sets. The third button will refresh the currently selected update set, in case you've changed update sets in another window or tab. Click on the second button, to navigate to the Local Update Sets list view. Click on the blue New button at the top-left of the page to go to the new update set form. Give this update set a name. Let's enter Article 1 into the Name field. Fill out the Description by writing in something like Learning about the ServiceNow interface! Leave State and Release date to their default values. Click Submit and Make Current. 
Alternately, you could click Submitor right-click the header and click Save, then return to the record and click the Make This My Current Set related link. Now that we've created an update set, any configuration changes we make will be captured and stored in a nice little package that we can back out or move into another instance to deploy the same changes. Now let's just confirm that we've got the right update set selected: Once again, click on the System Settings gear at the top-right of the ServiceNow window, and open the Developer tab. If the selected update set still shows as Default, click the Refresh button (the third icon to the right of the selected update set). If the update set still shows as Default, just select your new Article1 update set from the Update Set drop-down list. Help Next on the right side of the banner frame, is the Help icon. Clicking on this icon opens up the Help panel on the right side of the page. The Help menu has three sections: What's New, User Guide, and Search Documentation. Or, if you're in Geneva, it shows only What's New and Search Product Documentation. Clicking What's New just brings up the introduction to your instance version, with a couple of examples of the more prominent new features over the previous version. The User Guidewill redirect you to an internal mini-guide with some useful pocket-reference types of info in Helsinki. It's very slim on the details though, so you might be better off searching the developer site (http://developer.servicenow.com) or documentation (http://docs.servicenow.com ) if you have any specific questions. Speaking of the documentation site, Search Documentation is essentially a link. Clicking this link from a form or list will automatically populate a query relating to the type of record(s) you were viewing. Conversations Moving further left in the banner frame, you'll find the Conversations button. This opens up the Conversations side-bar, showing an (initially blank) list of the conversations you've recently been a part of. You can enter text in the filter box to filter the conversation list by participant name. Unfortunately, it doesn't allow you to filter/search by message contents at this point. You can also click the Plus icon to initiate a new conversation with a user of your choice. Global text search The next link to the right in the banner frame is probably the most useful one of all – the global text search. The global text search box allows you to enter a term, ticket number, or keyword and search a configurable multitude of tables. As an example of this functionality, let's search for a user that should be present in the demo data that came with your developer instance: Click on the Search icon (the one that looks like a magnifying glass). It should expand to the left, displaying a search keyword input box. In that input box, type in abel tuter. This is the name of one of the demo users that comes with your developer instance. Press Enter, and you should see the relevant search results divided into sections. Entering an exact ticket number for a given task (such as an incident, request, or problem ticket) will take you directly to that ticket rather than showing the search results. This is a great way to quickly navigate to a ticket you've received an e-mail notification about, or for a service desk agent to look up a ticket number provided by a customer. The search results from the Global Text Search are divided into search groups. The default groups are Tasks, Live Feed, Policy, and People & Places. 
To the right of each search group is a list of the tables that the search is run against for that group. The Policy search group, for example, contains several script types, including Business Rules, UI Actions, Client Scripts, and UI Policies. Profile The last item on our list of banner-frame elements, is the profile link. This will show your photo/icon (if you've uploaded one), and your name. As indicated by the small down-facing arrow to the right of your name (or System Administrator), clicking on this will show a little drop-down menu. This menu consists of up to four main components: Profile Impersonate User Elevate Roles Logout The Profile link in the dropdown will take you directly to the Self Service view of your profile. This is generally not what Administrators want, but it's a quick way for users to view their profile information. Impersonate User is a highly useful tool for administrators and developers, allowing them to view the instance as though they were another user, including that user's security permissions, and viewing the behavior of UI policies and scripts when that user is logged in. Elevate Roles is an option only available when the High Security plugin is enabled (which may or may not be turned on by default in your organization). Clicking this option opens a dialog that allows you to check a box, and re-initialize your session with a special security role called security_admin (assuming you have this role in your instance). With high security settings enabled, the security_admin role allows you to perform certain actions, such as modifying ACLs (Access Control Lists – security rules), and running background scripts (scripts you can write and execute directly on the server). Finally, the Logout link does just what you'd expect: logs you out. If you have difficulty with a session that you can't log out, you can always log out by visiting /logout.do on your instance. For example: http://your-instance.service-now.com/logout.do/. The application navigator The application navigator is one of the UI components with which you will become most familiar, as you work in ServiceNow. Nearly everything you do will begin either by searching in the Global Text Search box, or by filtering the application navigator. The contents of the Application Navigator consists of Modules nested underneath application menu. The first application menu in the application navigator is Self-Service. This application menu is generally what's available to a user who doesn't have any special roles or permissions. Underneath this application menu, you'll see various modules such as Homepage, Service Catalog, Knowledge, and so on. The Self-Service application menu, and several modules under it. When you hear the term application as it relates to ServiceNow, you might think of an application on your smartphone. Applications in ServiceNow and applications on your smartphone both generally consist of packaged functionality, presented in a coherent way. However in ServiceNow, there are some differences. For example, an application header might consist only of links to other areas in ServiceNow, and contain no new functionality of its' own. An application might not even necessarily have an application header. 
Generally, we refer to the major ITIL processes in ServiceNow as applications (Incident, Change, Problem, Knowledge, and so on) – but these can often consist of various components linked up with one another; so the functionality within an application need not necessarily be packaged in a way that it's closed off from the rest of the system. You'll often be given instructions to navigate to a particular module in a way similar to this: Self-Service | My Requests. In this example, the left portion (Self-Service) is the application menu header, and the right portion (My Requests) is the module. Filter text box The filter text box in the Application Navigator allows you to enter a string to – you guessed it – filter the Application Navigator list with! It isn't strictly a search, it's just filtering the list of items in the application navigator, which means that the term you enter must appear somewhere in the name of either an application menu, or a module. So if you enter the term Incident, you'll see modules with names like Incidents and Watched Incidents, as well as every module inside the Incident application menu. However, if you enter Create Incident, you won't get any results. This is because the module for creating a new Incident, is called Create New, inside the Incident module, and the term Create Incident doesn't appear in that title. In addition to filtering the application navigator, the filter text box has some hidden shortcuts that ServiceNow wizards use to fly around the interface with the speed of a ninja. Here are a few pro tips for you: Once you've entered a term into the filter text box in the application navigator, the first module result is automatically selected. You can navigate to it by pressing Enter. Enter a table name followed by .list and then press Enter to navigate directly to the default list view for that table. For example, entering sc_req_item.list [Enter] will direct you to the list view for the sc_req_item (Requested Item) table. Enter a table name followed by either .form, or .do and then press Enter to take you directly to the default view of that table's form (allowing you to quickly create a new record). For example, entering sc_request.form [Enter] will take you to the New Record intake form for the sc_request (Request) table. Each table has a corresponding form, with certain fields displayed by default. Use either .FORM or .LIST in caps, to navigate to the list or form view in a new tab or window!  Opening a list or form in a new tab (either using this method, by middle-clicking a link, or otherwise) breaks it out of the ServiceNow frame, showing only the Content frame. Try it yourself: Enter sys_user.list into the application navigator filter text field in your developer instance, and press Enter. You should see the list of all the demo users in your instance! No matter which application navigator tab you have selected when you start typing in the filter text box, it will always show you results from the all applications tab, with any of your favorites that match the filter showing up first. Favorites Users can add favorites within the Application Navigator by clicking the star icon, visible on the right when hovering over any application menu or module in the application navigator. Adding a favorite will make it come up first when filtering the application navigator using any term that it matches. 
It'll also show up under your favorites list, which you can see by clicking the tab at the top of the application navigator, below the filter text box, with the same star icon you see when adding a module to your favorites. Let's try out favorites now by adding some favorites that an admin or developer is likely to want to come back to on frequent occasions. Add the following modules to your favorites list by filtering the application navigator by the module name, hovering over the module, and clicking the star icon on the right: Workflow | Workflow Editor System Definition | Script Includes System Definition | Dictionary System Update Sets | Local Update Sets System Logs | System Log | All This one (All) is nested under a module (System Log) that doesn't point anywhere, but it is just there to serve as a separator for other modules. It's not much use searching for All, so try searching for System Log! Now that we've got a few favorites, let's rename them so they're easier to identify at a glance. While we're at it, we'll give them some new icons as well: Click the favorites tab in the application navigator, and you should see your newly added favorites in the list. At the bottom-right of the application navigator in the ServiceNow frame, click on Edit Favorites. Click on the favorite item called Workflow – Workflow Editor. This will select it so you can edit it in the content frame on the right: 01-10-Editing workflow favorite.png In the Name field, give it something simpler, such as Workflow Editor. Then choose a color and an icon. I chose white, and the icon that looks like a flowchart. I also removed my default Home favorite, but you don't have to. Here is what my favorites look like after I make my modifications: 01-11-Favorites after customizing.png Another way to add something to your favorites is to drag it there. Certain specific elements in the ServiceNow UI can be dragged directly into your Favorites tab. Let's give it a try! Head over to the Incident table by using the .list trick. In your developer instance, enter incident.list into the filter text box in the application navigator; and then press Enter. Click on the Filter icon at the top-left of the Incident list, and filter the Incident list using the condition builder. Add some conditions so that it only displays records where Active is true, and Assigned to is empty. Then click on Run. 01-12-Incident condition for favorites.png The list should now be filtered, after you hit Run. You should see just a few incidents in the list. Now, at the top-left of the Incident table, to the left of the Incidents table label, click on the hamburger menu (yeah, that's really what it's called). It looks like three horizontal bars atop one another. In that menu, click on Create Favorite. Choose a good name, like Unassigned Incidents, and an appropriate icon and color. Then click Done. You should now have an Unassigned Incidents favorite listed! Finally, if you click on the little white left-facing arrow at the bottom-left of the application navigator, you'll notice that whichever navigator tab you have selected, your favorites show up in a stacked list on the left. This gives you a bit more screen real-estate for the content frame. Summary In this article, we learned about: How content is organized on the screen, within frames – the banner frame, Application Navigator, and content frame. How to access the built-in help and documentation for ServiceNow. How to use the global text search functionality, to find the records we're looking for. 
What it means to elevate roles or impersonate a user. How to get around the Application Navigator, including some pro tips on navigating like a power user from the filter text box. How to use favorites and navigation history within ServiceNow to our advantage. What UI settings are available, and how to personalize our interface. A small server-side scripting sketch that reproduces the Unassigned Incidents filter follows the resource links below.

Resources for Article:

Further resources on this subject:

Getting Things Done with Tasks [article]

Events, Notifications, and Reporting [article]

Start Treating your Infrastructure as Code [article]
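The following is a minimal background-script sketch, not from the original article, showing how the Unassigned Incidents condition we built in the list filter (Active is true, Assigned to is empty) looks in server-side JavaScript using the GlideRecord API. Run it from the Scripts - Background module (typically found under System Definition) on a non-production instance; the table and field names are the standard ones used throughout this article.

// List active incidents that have nobody assigned, like the Unassigned Incidents favorite.
var gr = new GlideRecord('incident');
gr.addActiveQuery();            // Active is true
gr.addNullQuery('assigned_to'); // Assigned to is empty
gr.query();
while (gr.next()) {
    gs.print(gr.getValue('number') + ' - ' + gr.getValue('short_description'));
}

An equivalent single call is gr.addEncodedQuery('active=true^assigned_toISEMPTY'), which should mirror the encoded query that the condition builder generates for the same filter.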
Members Inheritance and Polymorphism

Packt
09 Mar 2017
16 min read
In this article by Gastón C. Hillar, the author of the book Java 9 with JShell, we will learn about one of the most exciting features of object-oriented programming in Java 9: polymorphism. We will code many classes and then we will work with their instances in JShell to understand how objects can take many different forms. We will: Create concrete classes that inherit from abstract superclasses Work with instances of subclasses Understand polymorphism Control whether subclasses can or cannot override members Control whether classes can be subclassed Use methods that perform operations with instances of different subclasses (For more resources related to this topic, see here.) Creating concrete classes that inherit from abstract superclasses We will consider the existence of an abstract base class named VirtualAnimal and the following three abstract subclasses: VirtualMammal, VirtualDomesticMammal, and VirtualHorse. Next, we will code the following three concrete classes. Each class represents a different horse breed and is a subclass of the VirtualHorse abstract class. AmericanQuarterHorse: This class represents a virtual horse that belongs to the American Quarter Horse breed. ShireHorse: This class represents a virtual horse that belongs to the Shire Horse breed. Thoroughbred: This class represents a virtual horse that belongs to the Thoroughbred breed. The three concrete classes will implement the following three abstract methods they inherited from abstract superclasses: String getAsciiArt(): This abstract method is inherited from the VirtualAnimal abstract class. String getBaby(): This abstract method is inherited from the VirtualAnimal abstract class. String getBreed(): This abstract method is inherited from the VirtualHorse abstract class. The following UML diagram shows the members for the three concrete classes that we will code: AmericanQuarterHorse, ShireHorse, and Thoroughbred. We don’t use bold text format for the three methods that each of these concrete classes will declare because they aren’t overriding the methods, they are implementing the abstract methods that the classes inherited. First, we will create the AmericanQuarterHorse concrete class. The following lines show the code for this class in Java 9. Notice that there is no abstract keyword before class, and therefore, our class must make sure that it implements all the inherited abstract methods. public class AmericanQuarterHorse extends VirtualHorse { public AmericanQuarterHorse( int age, boolean isPregnant, String name, String favoriteToy) { super(age, isPregnant, name, favoriteToy); System.out.println("AmericanQuarterHorse created."); } public AmericanQuarterHorse( int age, String name, String favoriteToy) { this(age, false, name, favoriteToy); } public String getBaby() { return "AQH baby "; } public String getBreed() { return "American Quarter Horse"; } public String getAsciiArt() { return " >>\.n" + " /* )`.n" + " // _)`^)`. _.---. _n" + " (_,' \ `^-)'' `.\n" + " | | \n" + " \ / |n" + " / \ /.___.'\ (\ (_n" + " < ,'|| \ |`. \`-'n" + " \\ () )| )/n" + " |_>|> /_] //n" + " /_] /_]n"; } } Now, we will create the ShireHorse concrete class. 
The following lines show the code for this class in Java 9: public class ShireHorse extends VirtualHorse { public ShireHorse( int age, boolean isPregnant, String name, String favoriteToy) { super(age, isPregnant, name, favoriteToy); System.out.println("ShireHorse created."); } public ShireHorse( int age, String name, String favoriteToy) { this(age, false, name, favoriteToy); } public String getBaby() { return "ShireHorse baby "; } public String getBreed() { return "Shire Horse"; } public String getAsciiArt() { return " ;;n" + " .;;'*\n" + " __ .;;' ' \n" + " /' '\.~~.~' \ /'\.)n" + " ,;( ) / |n" + " ,;' \ /-.,,( )n" + " ) /| ) /|n" + " ||(_\ ||(_\n" + " (_\ (_\n"; } } Finally, we will create the Thoroughbred concrete class. The following lines show the code for this class in Java 9: public class Thoroughbred extends VirtualHorse { public Thoroughbred( int age, boolean isPregnant, String name, String favoriteToy) { super(age, isPregnant, name, favoriteToy); System.out.println("Thoroughbred created."); } public Thoroughbred( int age, String name, String favoriteToy) { this(age, false, name, favoriteToy); } public String getBaby() { return "Thoroughbred baby "; } public String getBreed() { return "Thoroughbred"; } public String getAsciiArt() { return " })\-=--.n" + " // *._.-'n" + " _.-=-...-' /n" + " {{| , |n" + " {{\ | \ /_n" + " }} \ ,'---'\___\n" + " / )/\\ \\ >\n" + " // >\ >\`-n" + " `- `- `-n"; } } We have more than one constructor defined for the three concrete classes. The first constructor that requires four arguments uses the super keyword to call the constructor from the base class or superclass, that is, the constructor defined in the VirtualHorse class. After the constructor defined in the superclass finishes its execution, the code prints a message indicating that an instance of each specific concrete class has been created. The constructor defined in each class prints a different message. The second constructor uses the this keyword to call the previously explained constructor with the received arguments and with false as the value for the isPregnant argument. Each class returns a different String in the implementation of the getBaby and getBreed methods. In addition, each class returns a different ASCII art representation for a virtual horse in the implementation of the getAsciiArt method. Understanding polymorphism We can use the same method, that is, a method with the same name and arguments, to cause different things to happen according to the class on which we invoke the method. In object-oriented programming, this feature is known as polymorphism. Polymorphism is the ability of an object to take on many forms, and we will see it in action by working with instances of the previously coded concrete classes. The following lines create a new instance of the AmericanQuarterHorse class named american and use one of its constructors that doesn’t require the isPregnant argument: AmericanQuarterHorse american = new AmericanQuarterHorse( 8, "American", "Equi-Spirit Ball"); american.printBreed(); The following lines show the messages that the different constructors displayed in JShell after we enter the previous code: VirtualAnimal created. VirtualMammal created. VirtualDomesticMammal created. VirtualHorse created. AmericanQuarterHorse created. The constructor defined in the AmericanQuarterHorse calls the constructor from its superclass, that is, the VirtualHorse class. 
Remember that each constructor calls its superclass constructor and prints a message indicating that an instance of the class is created. We don’t have five different instances; we just have one instance that calls the chained constructors of five different classes to perform all the necessary initialization to create an instance of AmericanQuarterHorse. If we execute the following lines in JShell, all of them will display true as a result, because american belongs to the VirtualAnimal, VirtualMammal, VirtualDomesticMammal, VirtualHorse, and AmericanQuarterHorse classes. System.out.println(american instanceof VirtualAnimal); System.out.println(american instanceof VirtualMammal); System.out.println(american instanceof VirtualDomesticMammal); System.out.println(american instanceof VirtualHorse); System.out.println(american instanceof AmericanQuarterHorse); The results of the previous lines mean that the instance of the AmericanQuarterHorse class, whose reference is saved in the american variable of type AmericanQuarterHorse, can take on the form of an instance of any of the following classes: VirtualAnimal VirtualMammal VirtualDomesticMammal VirtualHorse AmericanQuarterHorse The following screenshot shows the results of executing the previous lines in JShell: We coded the printBreed method within the VirtualHorse class, and we didn’t override this method in any of the subclasses. The following is the code for the printBreed method: public void printBreed() { System.out.println(getBreed()); } The code prints the String returned by the getBreed method, declared in the same class as an abstract method. The three concrete classes that inherit from VirtualHorse implemented the getBreed method and each of them returns a different String. When we called the american.printBreed method, JShell displayed American Quarter Horse. The following lines create an instance of the ShireHorse class named zelda. Note that in this case, we use the constructor that requires the isPregnant argument. As happened when we created an instance of the AmericanQuarterHorse class, JShell will display a message for each constructor that is executed as a result of the chained constructors we coded. ShireHorse zelda = new ShireHorse(9, true, "Zelda", "Tennis Ball"); The next lines call the printAverageNumberOfBabies and printAsciiArt instance methods for american, the instance of AmericanQuarterHorse, and zelda, which is the instance of ShireHorse. american.printAverageNumberOfBabies(); american.printAsciiArt(); zelda.printAverageNumberOfBabies(); zelda.printAsciiArt(); We coded the printAverageNumberOfBabies and printAsciiArt methods in the VirtualAnimal class, and we didn’t override them in any of its subclasses. Hence, when we call these methods for either american or zelda, Java will execute the code defined in the VirtualAnimal class. The printAverageNumberOfBabies method uses the int value returned by the getAverageNumberOfBabies and the String returned by the getBaby method to generate a String that represents the average number of babies for a virtual animal. The VirtualHorse class implemented the inherited getAverageNumberOfBabies abstract method with code that returns 1. The AmericanQuarterHorse and ShireHorse classes implemented the inherited getBaby abstract method with code that returns a String that represents a baby for the virtual horse breed: "AQH baby" and "ShireHorse baby". 
Thus, our call to the printAverageNumberOfBabies method will produce different results in each instance because they belong to a different class. The printAsciiArt method uses the String returned by the getAsciiArt method to print the ASCII art that represents a virtual horse. The AmericanQuarterHorse and ShireHorse classes implemented the inherited getAsciiArt abstract method with code that returns a String with the ASCII art that is appropriate for each virtual horse that the class represents. Thus, our call to the printAsciiArt method will produce different results in each instance because they belong to a different class. The following screenshot shows the results of executing the previous lines in JShell. Both instances run the same code for the two methods that were coded in the VirtualAnimal abstract class. However, each class provided a different implementation for the methods that end up being called to generated the result and cause the differences in the output. The following lines create an instance of the Thoroughbred class named willow, and then call its printAsciiArt method. As happened before, JShell will display a message for each constructor that is executed as a result of the chained constructors we coded. Thoroughbred willow = new Thoroughbred(5, "Willow", "Jolly Ball"); willow.printAsciiArt(); The following screenshot shows the results of executing the previous lines in JShell. The new instance is from a class that provides a different implementation of the getAsciiArt method, and therefore, we will see a different ASCII art than in the previous two calls to the same method for the other instances. The following lines call the neigh method for the instance named willow with a different number of arguments. This way, we take advantage of the neigh method that we overloaded four times with different arguments. Remember that we coded the four neigh methods in the VirtualHorse class and the Thoroughbred class inherits the overloaded methods from this superclass through its hierarchy tree. willow.neigh(); willow.neigh(2); willow.neigh(2, american); willow.neigh(3, zelda, true); american.nicker(); american.nicker(2); american.nicker(2, willow); american.nicker(3, willow, true); The following screenshot shows the results of calling the neigh and nicker methods with the different arguments in JShell: We called the four versions of the neigh method defined in the VirtualHorse class for the Thoroughbred instance named willow. The third and fourth lines that call the neigh method specify a value for the otherDomesticMammal argument of type VirtualDomesticMammal. The third line specifies american as the value for otherDomesticMammal and the fourth line specifies zelda as the value for the same argument. Both the AmericanQuarterHorse and ShireHorse concrete classes are subclasses of VirtualHorse, and VirtualHorse is a subclass or VirtualDomesticMammal. Hence, we can use american and zelda as arguments where a VirtualDomesticMammal instance is required. Then, we called the four versions of the nicker method defined in the VirtualHorse class for the AmericanQuarterHorse instance named american. The third and fourth lines that call the nicker method specify willow as the value for the otherDomesticMammal argument of type VirtualDomesticMammal. The Thoroughbred concrete class is also a subclass of VirtualHorse, and VirtualHorse is a subclass or VirtualDomesticMammal. Hence, we can use willow as an argument where a VirtualDomesticMammal instance is required. 
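Before moving on, here is a small sketch, not part of the original listings, that you can paste into the same JShell session to see polymorphism from one more angle. The american, zelda, and willow instances created above are stored in an array whose element type is the VirtualHorse superclass, yet each call dispatches to the implementation provided by the concrete class of each instance.

VirtualHorse[] horses = { american, zelda, willow };

for (VirtualHorse horse : horses) {
    // The static type is VirtualHorse, but the breed, average number of
    // babies, and ASCII art printed for each element come from
    // AmericanQuarterHorse, ShireHorse, and Thoroughbred respectively.
    horse.printBreed();
    horse.printAverageNumberOfBabies();
    horse.printAsciiArt();
}

One loop, one static type, three different outputs: that is polymorphism at work.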
Controlling overridability of members in subclasses We will code the VirtualDomesticCat abstract class and its concrete subclass: MaineCoon. Then, we will code the VirtualBird abstract class, its VirtualDomesticBird abstract subclass and the Cockatiel concrete subclass. Finally, we will code the VirtualDomesticRabbit concrete class. While coding these classes, we will use Java 9 features that allow us to decide whether the subclasses can or cannot override specific members. All the virtual domestic cats must be able to talk, and therefore, we will override the talk method inherited from VirtualDomesticMammal to print the word that represents a cat meowing: "Meow". We also want to provide a method to print "Meow" a specific number of times. Hence, at this point, we realize that we can take advantage of the printSoundInWords method we had declared in the VirtualHorse class. We cannot access this instance method in the VirtualDomesticCat abstract class because it doesn’t inherit from VirtualHorse. Thus, we will move this method from the VirtualHorse class to its superclass: VirtualDomesticMammal. We will use the final keyword before the return type for the methods that we don’t want to be overridden in subclasses. When a method is marked as a final method, the subclasses cannot override the method and the Java 9 compiler shows an error if they try to do so. Not all the birds are able to fly in real-life. However, all our virtual birds are able to fly, and therefore, we will implement the inherited isAbleToFly abstract method as a final method that returns true. This way, we make sure that all the classes that inherit from the VirtualBird abstract class will always run this code for the isAbleToFly method and that they won’t be able to override it. The following UML diagram shows the members for the new abstract and concrete classes that we will code. In addition, the diagram shows the printSoundInWords method moved from the VirtualHorse abstract class to the VirtualDomesticMammal abstract class. First, we will create a new version of the VirtualDomesticMammal abstract class. We will add the printSoundInWords method that we have in the VirtualHorse abstract class and we will use the final keyword to indicate that we don’t want to allow subclasses to override this method. The following lines show the new code for the VirtualDomesticMammal class. public abstract class VirtualDomesticMammal extends VirtualMammal { public final String name; public String favoriteToy; public VirtualDomesticMammal( int age, boolean isPregnant, String name, String favoriteToy) { super(age, isPregnant); this.name = name; this.favoriteToy = favoriteToy; System.out.println("VirtualDomesticMammal created."); } public VirtualDomesticMammal( int age, String name, String favoriteToy) { this(age, false, name, favoriteToy); } protected final void printSoundInWords( String soundInWords, int times, VirtualDomesticMammal otherDomesticMammal, boolean isAngry) { String message = String.format("%s%s: %s%s", name, otherDomesticMammal == null ? "" : String.format(" to %s ", otherDomesticMammal.name), isAngry ? 
"Angry " : "", new String(new char[times]).replace(" ", soundInWords)); System.out.println(message); } public void talk() { System.out.println( String.format("%s: says something", name)); } } After we enter the previous lines, JShell will display the following messages: | update replaced class VirtualHorse which cannot be referenced until this error is corrected: | printSoundInWords(java.lang.String,int,VirtualDomesticMammal,boolean) in VirtualHorse cannot override printSoundInWords(java.lang.String,int,VirtualDomesticMammal,boolean) in VirtualDomesticMammal | overridden method is final | protected void printSoundInWords(String soundInWords, int times, | ^---------------------------------------------------------------... | update replaced class AmericanQuarterHorse which cannot be referenced until class VirtualHorse is declared | update replaced class ShireHorse which cannot be referenced until class VirtualHorse is declared | update replaced class Thoroughbred which cannot be referenced until class VirtualHorse is declared | update replaced variable american which cannot be referenced until class AmericanQuarterHorse is declared | update replaced variable zelda which cannot be referenced until class ShireHorse is declared | update replaced variable willow which cannot be referenced until class Thoroughbred is declared | update overwrote class VirtualDomesticMammal JShell indicates us that the VirtualHorse class and its subclasses cannot be referenced until we correct an error for this class. The class declares the printSoundInWords method and overrides the recently added method with the same name and arguments in the VirtualDomesticMammal. We used the final keyword in the new declaration to make sure that any subclass cannot override it, and therefore, the Java compiler generates the error message that JShell displays. Now, we will create a new version of the VirtualHorse abstract class. The following lines show the new version that removes the printSoundInWords method and uses the final keyword to make sure that many methods cannot be overridden by any of the subclasses. The declarations that use the final keyword to avoid the methods to be overridden are highlighted in the next lines. 
public abstract class VirtualHorse extends VirtualDomesticMammal { public VirtualHorse( int age, boolean isPregnant, String name, String favoriteToy) { super(age, isPregnant, name, favoriteToy); System.out.println("VirtualHorse created."); } public VirtualHorse( int age, String name, String favoriteToy) { this(age, false, name, favoriteToy); } public final boolean isAbleToFly() { return false; } public final boolean isRideable() { return true; } public final boolean isHervibore() { return true; } public final boolean isCarnivore() { return false; } public int getAverageNumberOfBabies() { return 1; } public abstract String getBreed(); public final void printBreed() { System.out.println(getBreed()); } public final void printNeigh( int times, VirtualDomesticMammal otherDomesticMammal, boolean isAngry) { printSoundInWords("Neigh ", times, otherDomesticMammal, isAngry); } public final void neigh() { printNeigh(1, null, false); } public final void neigh(int times) { printNeigh(times, null, false); } public final void neigh(int times, VirtualDomesticMammal otherDomesticMammal) { printNeigh(times, otherDomesticMammal, false); } public final void neigh(int times, VirtualDomesticMammal otherDomesticMammal, boolean isAngry) { printNeigh(times, otherDomesticMammal, isAngry); } public final void printNicker(int times, VirtualDomesticMammal otherDomesticMammal, boolean isAngry) { printSoundInWords("Nicker ", times, otherDomesticMammal, isAngry); } public final void nicker() { printNicker(1, null, false); } public final void nicker(int times) { printNicker(times, null, false); } public final void nicker(int times, VirtualDomesticMammal otherDomesticMammal) { printNicker(times, otherDomesticMammal, false); } public final void nicker(int times, VirtualDomesticMammal otherDomesticMammal, boolean isAngry) { printNicker(times, otherDomesticMammal, isAngry); } @Override public final void talk() { nicker(); } } After we enter the previous lines, JShell will display the following messages: | update replaced class AmericanQuarterHorse | update replaced class ShireHorse | update replaced class Thoroughbred | update replaced variable american, reset to null | update replaced variable zelda, reset to null | update replaced variable willow, reset to null | update overwrote class VirtualHorse We could replace the definition for the VirtualHorse class and the subclasses were also updated. It is important to know that the variables we declared in JShell that held references to instances of subclasses of VirtualHorse were set to null. Summary In this article, we created many abstract and concrete classes. We learned to control whether subclasses can or cannot override members, and whether classes can be subclassed. We worked with instances of many subclasses and we understood that objects can take many forms. We worked with many instances and their methods in JShell to understand how the classes and the methods that we coded are executed. We used methods that performed operations with instances of different classes that had a common superclass. Resources for Article: Further resources on this subject: Getting Started with Sorting Algorithms in Java [article]  Introduction to JavaScript [article]  Using Spring JMX within Java Applications [article]
Learn from Data

Packt
09 Mar 2017
6 min read
In this article by Rushdi Shams, the author of the book Java Data Science Cookbook, we will cover recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting a value from a nominal class. In contrast to classification, regression models attempt to predict a value from a numeric class. (For more resources related to this topic, see here.) Generating linear regression models Most of the linear regression modelling follows a general pattern—there will be many independent variables that will be collectively produce a result, which is a dependent variable. For instance, we can generate a regression model to predict the price of a house based on different attributes/features of a house (mostly numeric, real values) like its size in square feet, number of bedrooms, number of washrooms, importance of its location, and so on. In this recipe, we will use Weka’s Linear Regression classifier to generate a regression model. Getting ready In order to perform the recipes in this section, we will require the following: To download Weka, go to http://www.cs.waikato.ac.nz/ml/weka/downloading.html and you will find download options for Windows, Mac, and other operating systems such as Linux. Read through the options carefully and download the appropriate version. During the writing of this article, 3.9.0 was the latest version for the developers and as the author already had version 1.8 JVM installed in his 64-bit Windows machine, he has chosen to download a self-extracting executable for 64-bit Windows without a Java Virtual Machine (JVM)   After the download is complete, double-click on the executable file and follow on screen instructions. You need to install the full version of Weka. Once the installation is done, do not run the software. Instead, go to the directory where you have installed it and find the Java Archive File for Weka (weka.jar). Add this file in your Eclipse project as external library. If you need to download older versions of Weka for some reasons, all of them can be found at https://sourceforge.net/projects/weka/files/. Please note that there is a possibility that many of the methods from old versions are deprecated and therefore not supported any more. How to do it… In this recipe, the linear regression model we will be creating is based on the cpu.arff dataset that can be found in the data directory of the Weka installation directory. Our code will have two instance variables: the first variable will contain the data instances of cpu.arff file and the second variable will be our linear regression classifier. Instances cpu = null; LinearRegression lReg ; Next, we will be creating a method to load the ARFF file and assign the last attribute of the ARFF file as its class attribute. public void loadArff(String arffInput){ DataSource source = null; try { source = new DataSource(arffInput); cpu = source.getDataSet(); cpu.setClassIndex(cpu.numAttributes() - 1); } catch (Exception e1) { } } We will be creating a method to build the linear regression model. To do so, we simply need to call the buildClassifier() method of our linear regression variable. The model can directly be sent as parameter to System.out.println(). 
public void buildRegression(){ lReg = new LinearRegression(); try { lReg.buildClassifier(cpu); } catch (Exception e) { } System.out.println(lReg); } The complete code for the recipe is as follows: import weka.classifiers.functions.LinearRegression; import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; public class WekaLinearRegressionTest { Instances cpu = null; LinearRegression lReg ; public void loadArff(String arffInput){ DataSource source = null; try { source = new DataSource(arffInput); cpu = source.getDataSet(); cpu.setClassIndex(cpu.numAttributes() - 1); } catch (Exception e1) { } } public void buildRegression(){ lReg = new LinearRegression(); try { lReg.buildClassifier(cpu); } catch (Exception e) { } System.out.println(lReg); } public static void main(String[] args) throws Exception{ WekaLinearRegressionTest test = new WekaLinearRegressionTest(); test.loadArff("path to the cpu.arff file"); test.buildRegression(); } } The output of the code is as follows: Linear Regression Model class = 0.0491 * MYCT + 0.0152 * MMIN + 0.0056 * MMAX + 0.6298 * CACH + 1.4599 * CHMAX + -56.075 Generating logistic regression models Weka has a class named Logistic that can be used for building and using a multinomial logistic regression model with a ridge estimator. Although original Logistic Regression does not deal with instance weights, the algorithm in Weka has been modified to handle the instance weights. In this recipe, we will use Weka to generate logistic regression model on iris dataset. How to do it… We will be generating a logistic regression model from the iris dataset that can be found in the data directory in the installed folder of Weka. Our code will have two instance variables: one will be containing the data instances of iris dataset and the other will be the logistic regression classifier.  Instances iris = null; Logistic logReg ; We will be using a method to load and read the dataset as well as assign its class attribute (the last attribute of iris.arff file): public void loadArff(String arffInput){ DataSource source = null; try { source = new DataSource(arffInput); iris = source.getDataSet(); iris.setClassIndex(iris.numAttributes() - 1); } catch (Exception e1) { } } Next, we will be creating the most important method of our recipe that builds a logistic regression classifier from the iris dataset: public void buildRegression(){ logReg = new Logistic(); try { logReg.buildClassifier(iris); } catch (Exception e) { } System.out.println(logReg); } The complete executable code for the recipe is as follows: import weka.classifiers.functions.Logistic; import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; public class WekaLogisticRegressionTest { Instances iris = null; Logistic logReg ; public void loadArff(String arffInput){ DataSource source = null; try { source = new DataSource(arffInput); iris = source.getDataSet(); iris.setClassIndex(iris.numAttributes() - 1); } catch (Exception e1) { } } public void buildRegression(){ logReg = new Logistic(); try { logReg.buildClassifier(iris); } catch (Exception e) { } System.out.println(logReg); } public static void main(String[] args) throws Exception{ WekaLogisticRegressionTest test = new WekaLogisticRegressionTest(); test.loadArff("path to the iris.arff file "); test.buildRegression(); } } The output of the code is as follows: Logistic Regression with ridge parameter of 1.0E-8 Coefficients... 
                       Class
Variable          Iris-setosa  Iris-versicolor
===============================================
sepallength           21.8065           2.4652
sepalwidth             4.5648           6.6809
petallength          -26.3083          -9.4293
petalwidth            -43.887         -18.2859
Intercept              8.1743           42.637

Odds Ratios...
                       Class
Variable          Iris-setosa  Iris-versicolor
===============================================
sepallength   2954196659.8892          11.7653
sepalwidth            96.0426         797.0304
petallength                 0           0.0001
petalwidth                  0                0

The interpretation of the results from the recipe is beyond the scope of this article. Interested readers are encouraged to see a Stack Overflow discussion here: http://stackoverflow.com/questions/19136213/how-to-interpret-weka-logistic-regression-output.

Summary

In this article, we have covered the recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting a value from a nominal class.

Resources for Article: Further resources on this subject:
The Data Science Venn Diagram [article]
Data Science with R [article]
Data visualization [article]

What is D3.js?

Packt
08 Mar 2017
13 min read
In this article by Ændrew H. Rininsland, the author of the book Learning D3.JS 4.x Data Visualization, we'll see what is new in D3 v4 and get started with Node and Git on the command line. (For more resources related to this topic, see here.)

D3 (Data-Driven Documents), developed by Mike Bostock and the D3 community since 2011, is the successor to Bostock's earlier Protovis library. It allows pixel-perfect rendering of data by abstracting the calculation of things such as scales and axes into an easy-to-use domain-specific language (DSL), and uses idioms that should be immediately familiar to anyone with experience of using the popular jQuery JavaScript library. Much like jQuery, in D3, you operate on elements by selecting them and then manipulating them via a chain of modifier functions. Especially within the context of data visualization, this declarative approach makes using it easier and more enjoyable than a lot of other tools out there.

The official website, https://d3js.org/, features many great examples that show off the power of D3, but understanding them is tricky to start with. After finishing this book, you should be able to understand D3 well enough to figure out the examples, tweaking them to fit your needs. If you want to follow the development of D3 more closely, check out the source code hosted on GitHub at https://github.com/d3.

The fine-grained control and its elegance make D3 one of the most powerful open source visualization libraries out there. This also means that it's not very suitable for simple jobs such as drawing a line chart or two - in that case you might want to use a library designed for charting; many of them use D3 internally anyway. For a massive list, visit https://github.com/sorrycc/awesome-javascript#data-visualization.

D3 is ultimately based around functional programming principles, which are currently experiencing a renaissance in the JavaScript community. This book really isn't about functional programming, but a lot of what we'll be doing will seem really familiar if you've ever used functional programming principles before.

What happened to all the classes?!

The second edition of this book contained quite a number of examples using the class feature that is new in ES2015. The revised examples in this edition all use factory functions instead, and the class keyword never appears. Why is this, exactly? ES2015 classes are essentially just syntactic sugaring for factory functions. By this I mean that they ultimately compile down to that anyway. While classes can provide a certain level of organization to a complex piece of code, they ultimately hide what is going on underneath it all. Not only that, using OO paradigms like classes effectively avoids one of the most powerful and elegant aspects of JavaScript as a language, which is its focus on first-class functions and objects. Your code will be simpler and more elegant using functional paradigms than OO, and you'll find it less difficult to read examples in the D3 community, which almost never use classes. There are many, much more comprehensive arguments against using classes than I'm able to make here. For one of the best, please read Eric Elliott's excellent "The Two Pillars of JavaScript" pieces, at medium.com/javascript-scene/the-two-pillars-of-javascript-ee6f3281e7f3.

What's new in D3 v4?

One of the key changes to D3 since the last edition of this book is the release of version 4. Among its many changes, the most significant is a complete overhaul of the D3 namespace.
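To give a flavour of that overhaul (a rough sketch rather than a complete migration guide, and assuming the full D3 build is loaded as a global), APIs that were nested under sub-namespaces in 3.x are now flat, camelCased functions in 4.x:

// D3 3.x
var x = d3.scale.linear();
var xAxis = d3.svg.axis().scale(x).orient('bottom');
var formatYear = d3.time.format('%Y');

// D3 4.x
var x = d3.scaleLinear();
var xAxis = d3.axisBottom(x);
var formatYear = d3.timeFormat('%Y');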
This means that none of the examples in this book will work with D3 3.x, and the examples from the last book will not work with D3 4.x. This is quite possibly the cruelest thing Mr. Bostock could ever do to educational authors such as myself (I am totally joking here). Kidding aside, it also means many of the "block" examples in the D3 community are out-of-date and may appear rather odd if this book is your first encounter with the library. For this reason, it is very important to note the version of D3 an example uses - if it uses 3.x, it might be worth searching for a 4.x example just to prevent this cognitive dissonance. Related to this is how D3 has been broken up from a single library into many smaller libraries. There are two approaches you can take: you can use D3 as a single library in much the same way as version 3, or you can selectively use individual components of D3 in your project. This book takes the latter route, even if it does take a bit more effort - the benefit is primarily in that you'll have a better idea of how D3 is organized as a library and it reduces the size of the final bundle people who view your graphics will have to download. What's ES2017? One of the main changes to this book since the first edition is the emphasis on modern JavaScript; in this case, ES2017. Formerly known as ES6 (Harmony), it pushes the JavaScript language's features forward significantly, allowing for new usage patterns that simplify code readability and increase expressiveness. If you've written JavaScript before and the examples in this article look pretty confusing, it means you're probably familiar with the older, more common ES5 syntax. But don't sweat! It really doesn't take too long to get the hang of the new syntax, and I will try to explain the new language features as we encounter them. Although it might seem a somewhat steep learning curve at the start, by the end, you'll have improved your ability to write code quite substantially and will be on the cutting edge of contemporary JavaScript development. For a really good rundown of all the new toys you have with ES2016, check out this nice guide by the folks at Babel.js, which we will use extensively throughout this book: https://babeljs.io/docs/learn-es2015/. Before I go any further, let me clear some confusion about what ES2017 actually is. Initially, the ECMAScript (or ES for short) standards were incremented by cardinal numbers, for instance, ES4, ES5, ES6, and ES7. However, with ES6, they changed this so that a new standard is released every year in order to keep pace with modern development trends, and thus we refer to the year (2017) now. The big release was ES2015, which more or less maps to ES6. ES2016 was ratified in June 2016, and builds on the previous year's standard, while adding a few fixes and two new features. ES2017 is currently in the draft stage, which means proposals for new features are being considered and developed until it is ratified sometime in 2017. As a result of this book being written while these features are in draft, they may not actually make it into ES2017 and thus need to wait until a later standard to be officially added to the language. You don't really need to worry about any of this, however, because we use Babel.js to transpile everything down to ES5 anyway, so it runs the same in Node.js and in the browser. 
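As a quick taste of the syntax you'll keep running into (an illustrative sketch, not code from the book's repository), here are a few ES2015 features side by side:

const add = (a, b) => a + b;               // arrow function with an implicit return
const [first, ...rest] = [1, 2, 3];        // destructuring with a rest element
const library = 'D3';
console.log(`Hello, ${library}! 1 + 2 = ${add(1, 2)}`); // template literal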
I try to refer to the relevant spec where a feature is added when I introduce it for the sake of accuracy (for instance, modules are an ES2015 feature), but when I refer to JavaScript, I mean all modern JavaScript, regardless of which ECMAScript spec it originated in. Getting started with Node and Git on the command line I will try not to be too opinionated in this book about which editor or operating system you should use to work through it (though I am using Atom on Mac OS X), but you are going to need a few prerequisites to start. The first is Node.js. Node is widely used for web development nowadays, and it's actually just JavaScript that can be run on the command line. Later on in this book, I'll show you how to write a server application in Node, but for now, let's just concentrate on getting it and npm (the brilliant and amazing package manager that Node uses) installed. If you're on Windows or Mac OS X without Homebrew, use the installer at https://nodejs.org/en/. If you're on Mac OS X and are using Homebrew, I would recommend installing "n" instead, which allows you to easily switch between versions of Node: $ brew install n $ n latest Regardless of how you do it, once you finish, verify by running the following lines: $ node --version $ npm --version If it displays the versions of node and npm it means you're good to go. I'm using 6.5.0 and 3.10.3, respectively, though yours might be slightly different-- the key is making sure node is at least version 6.0.0. If it says something similar to Command not found, double-check whether you've installed everything correctly, and verify that Node.js is in your $PATH environment variable. In the last edition of this book, we did a bunch of annoying stuff with Webpack and Babel and it was a bit too configuration-heavy to adequately explain. This time around we're using the lovely jspm for everything, which handles all the finicky annoying stuff for us. Install it now, using npm: npm install -g jspm@beta jspm-server This installs the most up-to-date beta version of jspm and the jspm development server. We don't need Webpack this time around because Rollup (which is used to bundle D3 itself) is used to bundle our projects, and jspm handles our Babel config for us. How helpful! Next, you'll want to clone the book's repository from GitHub. Change to your project directory and type this: $ git clone https://github.com/aendrew/learning-d3-v4 $ cd $ learning-d3-v4 This will clone the development environment and all the samples in the learning-d3-v4/ directory, as well as switch you into it. Another option is to fork the repository on GitHub and then clone your fork instead of mine as was just shown. This will allow you to easily publish your work on the cloud, enabling you to more easily seek support, display finished projects on GitHub Pages, and even submit suggestions and amendments to the parent project. This will help us improve this book for future editions. To do this, fork aendrew/learning-d3-v4 by clicking the "fork" button on GitHub, and replace aendrew in the preceding code snippet with your GitHub username. To switch between them, type the following command: $ git checkout <folder name> Replace <folder name> with the appropriate name of your folder. Stay at master for now though. To get back to it, type this line: $ git stash save && git checkout master The master branch is where you'll do a lot of your coding as you work through this book. 
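If you want to sanity-check that module loading works before going any further, you could drop something like the following into lib/main.js. This is a minimal sketch that assumes d3-selection is already mapped in the prebuilt config.js (if it isn't, installing it with jspm will add it):

import { select } from 'd3-selection';

// Append a heading to the empty page so we can see that everything is wired up.
select('body')
  .append('h1')
  .text('Hello, D3 v4!');

Reload the page and the heading should appear; if nothing shows up, the Console and Sources tabs described next are the first places to look.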
It includes a prebuilt config.js file (used by jspm to manage dependencies), which we'll use to aid our development over the course of this book. We still need to install our dependencies, so let's do that now: $ npm install All of the source code that you'll be working on is in the lib/ folder. You'll notice it contains a just a main.js file; almost always, we'll be working in main.js, as index.html is just a minimal container to display our work in. This is it in its entirety, and it's the last time we'll look at any HTML in this book: <!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>Learning D3</title> </head> <body> <script src="jspm_packages/system.js"></script> <script src="config.js"></script> <script> System.import('lib/main.js'); </script> </body> </html> There's also an empty stylesheet in styles/index.css, which we'll add to in a bit. To get things rolling, start the development server by typing the following line: $ npm start This starts up the jspm development server, which will transform our new-fangled ES2017 JavaScript into backwards-compatible ES5, which can easily be loaded by most browsers. Instead of loading in a compiled bundle, we use SystemJS directly and load in main.js. When we're ready for production, we'll use jspm bundle to create an optimized JS payload. Now point Chrome (or whatever, I'm not fussy - so long as it's not Internet Explorer!) to localhost:8080 and fire up the developer console ( Ctrl +  Shift + J for Linux and Windows and option + command + J for Mac). You should see a blank website and a blank JavaScript console with a Command Prompt waiting for some code: A quick Chrome Developer Tools primer Chrome Developer Tools are indispensable to web development. Most modern browsers have something similar, but to keep this book shorter, we'll stick to just Chrome here for the sake of simplicity. Feel free to use a different browser. Firefox's Developer Edition is particularly nice, and - yeah yeah, I hear you guys at the back; Opera is good too! We are mostly going to use the Elements and Console tabs, Elements to inspect the DOM and Console to play with JavaScript code and look for any problems. The other six tabs come in handy for large projects: The Network tab will let you know how long files are taking to load and help you inspect the Ajax requests. The Profiles tab will help you profile JavaScript for performance. The Resources tab is good for inspecting client-side data. Timeline and Audits are useful when you have a global variable that is leaking memory and you're trying to work out exactly why your library is suddenly causing Chrome to use 500 MB of RAM. While I've used these in D3 development, they're probably more useful when building large web applications with frameworks such as React and Angular. The main one you want to focus on, however, is Sources, which shows all the source code files that have been pulled in by the webpage. Not only is this useful in determining whether your code is actually loading, it contains a fully functional JavaScript debugger, which few mortals dare to use. While explaining how to debug code is kind of boring and not at the level of this article, learning to use breakpoints instead of perpetually using console.log to figure out what your code is doing is a skill that will take you far in the years to come. 
For a good overview, visit https://developers.google.com/web/tools/chrome-devtools/debug/breakpoints/step-code?hl=en Most of what you'll do with Developer Tools, however, is look at the CSS inspector at the right-hand side of the Elements tab. It can tell you what CSS rules are impacting the styling of an element, which is very good for hunting rogue rules that are messing things up. You can also edit the CSS and immediately see the results, as follows: Summary In this article, you learned what D3 is and took a glance at the core philosophy behind how it works. You also set up your computer for prototyping of ideas and to play with visualizations. Resources for Article: Further resources on this subject: Learning D3.js Mapping [article] Integrating a D3.js visualization into a simple AngularJS application [article] Simple graphs with d3.js [article]

Getting Started with Kotlin

Brent Watson
08 Mar 2017
6 min read
Kotlin has been gaining more and more attention recently. With its 1.1 release, the language has proved both stable and usable.  In this article we’re going to cover a few different things: what is Kotlin and why it’s become popular, how to get started, where it’s most effective, and where to go next if you want to learn more. What is Kotlin?  Kotlin is a programing language.  Much like Scala or Groovy.It is a language that targets the JVM. It is developed by JetBrains, who make the IDEs that you know and love so much. The same diligence and finesse that is put into IntelliJ, PyCharm, Resharper, and their many other tools also shines through with Kotlin.  There are two secret ingredients that make Kotlin such a joy to use. First, since Kotlin compiles to Java bytecode, it is 100% interoperable with your existing Java code. Second, since JetBrains has control over both language and IDE, the tooling support is beyond excellent.  Here’s a quick example of some interoperability between Java and Kotlin:  Person.kt data class Person( val title: String?, // String type ends with “?”, so it is nullable. val name: String, // val's are immutable. “String” field, so non-null. var age: Int// var's are mutable. ) PersonDemo.java Person person = new Person("Mr.", "John Doe", 23); person.getAge(); // data classes provide getter and setters automatically. person.toString(); // ... in addition to toString, equals, hashCode, and copy methods.  The above example shows a “data class” (think Value Object / Data Transfer Object) in Kotlin being used by a Java class. Not only does the code work seamlessly, but also the JetBrains IDE allows you to navigate, auto-complete, debug, and refactor these together without skipping a beat. Continuing on with the above example, we’ll show how you might want to filter a list of Person objects using Kotlin.  Filters.kt fun demoFilter(people: List<Person>) : List<String> { return people .filter { it.age>35 } .map { it.name} } FiltersApplied.java List<String> names = new Filters().demoFilter(people); The simple addition of higher order functions (such as map, filter, reduce, sum, zip, and so on) in Kotlin greatly reduces the boilerplate code you usually have to write in Java programs when iterating through collections. The above filtering code in Java would require you to create a temporary list of results, iterate through people, perform an if check on age, add name to the list, then finally return the list. The above Kotlin version is not only 1 line, but it can actually be reduced even further since Kotlin supports Single Expression Functions: fun demoFilter(people: List<Person>) = people.filter{ it.age>35 }.map { it.name } // return type and return keyword replaced with “=”. Another Kotlin feature that greatly reduces the boilerplate found in Java code comes from its more advanced type system that treats nullable types differently from non-null types. The Java version of: if (people != null && !people.isEmpty() &&people.get(0).getAddress() != null &&people.get(0).getAddress().getStreet() != null) { return people.get(0).getAddress().getStreet(); }  Using Kotlin’s “?” operator that checks for null before continuing to evaluate an expression, we can simplify this statement drastically: return people?.firstOrNull()?.address?.street Not only does this reduce the verbosity inherent in Java, but it also helps to eliminate the “Billion dollar mistake” in Java: NullPointerExceptions.  The ability to mark a type as either nullable or not-null (Address? 
vs Address) means that the compiler can ensure null checks are properly done at compile time, not at runtime in production.  These are just a couple of examples of how Kotlin helps to both reduce the number of lines in your code and also reduce the often unneeded complexity. The more you use Kotlin, the more of these idioms you will find hard to start living without.  Android  More than any other industry, Kotlin has gained the most ground with Android developers.  Since Kotlin compiles to Java 6 bytecode, it can be used to build Android applications.  Since the interoperability between Kotlin and Java is so simple, it can be slowly added into a larger Java Android project over time (or any Java project for that matter). Given that Android developers do not yet have access to the nice features of Java 8, Kotlin provides these and gives Android developers many of the new language features they otherwise can only read about. The Kotlin team realized this early on and has provided many libraries and tools targeted at Android devs. Here are a couple short examples of how Kotlin can simplify working with the Android SDK: context.runOnUiThread{ ... } button?.setOnClickListener{ Toast.makeText(...) } Android developers will immediately understand the reduced boilerplate here.  If you are not an Android developer, trust me, this is much better.  If you are an Android developer I would suggest to you take a look at both Kotlin Android Extensions and Anko.  Extension Functions The last feature of Kotlin we will look at today is one of the most useful features.  This is the ability to write Extension Functions. Think of these as the ability to add your own method to any existing class.  Here is a quick example of extending Java’s String class to add a prepend method: fun String.prepend(str: String) = str + this Once imported, you can use this from any Kotlin code as though it were a method on String.  Consider the ability to extend any system or framework class.  Suddenly all of your utility methods become extension methods and your code starts to look intentionally designed instead of patched together. Maybe you’d like to add a dpToPx() method on your Android Context class. Or, maybe you’d like to add a subscribeOnNewObserveOnMain() method on your RxJava Observable class.  Well, now you can.  Next Steps If you’re interested in trying Kotlin, grab a copy of the IntelliJ IDEA IDE or Android Studio and install the Kotlin plugin to get started. There is also a very well built online IDE maintained by JetBrains along with a series of exercises called Kotin Koans. These can be found at http://try.kotlinlang.org/.  For more information on Kotlin, check out https://kotlinlang.org/ .  About the author  Brent Watson is an Android engineer in NYC. He is a developer, entrepreneur, author, TEDx speaker, and Kotlin advocate. He can be found at http://brentwatson.com/.

Toy Bin

Packt
08 Mar 2017
8 min read
In this article by Steffen Damtoft Sommer and Jim Campagno, the author of the book Swift 3 Programming for Kids, we will walk you through what an array is. These are considered collection types in Swift and are very powerful. (For more resources related to this topic, see here.) Array An array stores values of the same type in an ordered list. The following is an example of an array: let list = ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Rubix Cube"] This is an array (which you can think of as a list). Arrays are an ordered collections of values. We've created a constant called list of the [String] type and assigned it a value that represents our list of toys that we want to take with us. When describing arrays, you surround the type of the values that are being stored in the array by square brackets, [String]. Following is another array called numbers which contains four values being 5, 2, 9 and 22: let numbers = [5, 2, 9, 22] You would describe numbers as being an array which contains Int values which can be written as [Int]. We can confirm this by holding Alt and selecting the numbers constant to see what its type is in a playground file: What if we were to go back and confirm that list is an array of String values. Let's Alt click that constant to make sure: Similar to how we created instances of String and Int without providing any type information, we're doing the same thing here when we create list and numbers. Both list and numbers are created taking advantage of type inference. In creating our two arrays, we weren't explicit in providing any type information, we just created the array and Swift was able to figure out the type of the array for us. If we want to, though, we can provide type information, as follows: let colors: [String] = ["Red", "Orange", "Yellow"] colors is a constant of the [String] type. Now that we know how to create an array in swift, which can be compared to a list in real life, how can we actually use it? Can we access various items from the array? If so, how? Also, can we add new items to the list in case we forgot to include any items? Yes to all of these questions. Every element (or item) in an array is indexed. What does that mean? Well, you can think of being indexed as being numbered. Except that there's one big difference between how we humans number things and how arrays number things. Humans start from 1 when they create a list (just like we did when we created our preceding list). An array starts from 0. So, the first element in an array is considered to be at index 0:  Always remember that the first item in any array begins at 0. If we want to grab the first item from an array, we will do so as shown using what is referred to as subscript syntax: That 0 enclosed in two square brackets is what is known as subscript syntax. We are looking to access a certain element in the array at a certain index. In order to do that, we need to use subscript index, including the index of the item we want within square brackets. In doing so, it will return the value at the index. The value at the index in our preceding example is Legos. The = sign is also referred to as the assignment operator. So, we are assigning the Legos value to a new constant, called firstItem. If we were to print out firstItem, Legos should print to the console: print(firstItem) // Prints "Legos" If we want to grab the last item in this array, how do we do it? Well, there are five items in the array, so the last item should be at index 5, right? Wrong! 
What if we wrote the following code (which would be incorrect!): let lastItem = list[5] This would crash our application, which would be bad. When working with arrays, you need to ensure that you don't attempt to grab an item at a certain index which doesn't exist. There is no item in our array at index 5, which would make our application crash. When you run your app, you will receive the fatal error: Index out of range error. This is shown in the screenshot below: Let's correctly grab the last item in the array: let lastItem = list[4] print("I'm not as good as my sister, but I love solving the (lastItem)") // Prints "I'm not as good as my sister, but I love solving the Rubix Cube" Comments in code are made by writing text after //. None of this text will be considered code and will not be executed; it's a way for you to leave notes in your code. All of a sudden, you've now decided that you don't want to take the rubix cube as it's too difficult to play with. You were never able to solve it on Earth, so you start wondering why bringing it to the moon would help solve that problem. Bringing crayons is a much better idea. Let's swap out the rubix cube for crayons, but how do we do that? Using subscript syntax, we should be able to assign a new value to the array. Let's give it a shot: list[4] = "Crayons" This will not work! But why, can you take a guess? It's telling us that we cannot assign through subscript because list is a constant (we declared it using the let keyword). Ah! That's exactly how String and Int work. We decide whether or not we can change (mutate) the array based upon the let or var keyword just like every other type in Swift. Let's change the list array to a variable using the var keyword: var list = ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Rubix Cube"] After doing so, we should be able to run this code without any problem: list[4] = "Crayons" If we decide to print the entire array, we will see the following print to console: ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Crayons"] Note how Rubix Cube is no longer the value at index 4 (our last index); it has been changed to Crayons.  That's how we can mutate (or change) elements at certain indexes in our array. What if we want to add a new item to the array, how do we do that? We've just saw that trying to use subscript syntax with an index that doesn't exist in our array crashes our application, so we know we can't use that to add new items to our array. Apple (having created Swift) has created hundreds, if not thousands, of functions that are available in all the different types (like String, Int, and array). You can consider yourself an instance of a person (person being the name of the type). Being an instance of a person, you can run, eat, sleep, study, and exercise (among other things). These things are considered functions (or methods) that are available to you. Your pet rock doesn't have these functions available to it, why? This is because it's an instance of a rock and not an instance of a person. An instance of a rock doesn't have the same functions available to it that an instance of a person has. All that being said, an array can do things that a String and Int can't do. No, arrays can't run or eat, but they can append (or add) new items to themselves. An array can do this by calling the append(_:) method available to it. This method can be called on an instance of an array (like the preceding list) using what is known as dot syntax. 
In dot syntax, you write the name of the method immediately after the instance name, separated by a period (.), without any space: list.append("Play-Doh") Just as if we were to tell a person to run, we are telling the list to append. However, we can't just tell it to append, we have to pass an argument to the append function so that it can add it to the list. Our list array now looks like this: ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Crayons", "Play-Doh"] Summary We have covered a lot of material important to understanding Swift and writing iOS apps here. Feel free to reread what you've read so far as well as write code in a playground file. Create your own arrays, add whatever items you want to it, and change values at certain indexes. Get used to the syntax of working with creating an arrays as well as appending new items. If you can feel comfortable up to this point with how arrays work, that's awesome, keep up the great work! Resources for Article:  Further resources on this subject: Introducing the Swift Programming Language [article] The Swift Programming Language [article] Functions in Swift [article]

The NumPy array object

Packt
03 Mar 2017
18 min read
In this article by Armando Fandango, the author of the book Python Data Analysis - Second Edition, we discuss how NumPy provides a multidimensional array object called ndarray. NumPy arrays are typed arrays of fixed size. Python lists are heterogeneous, and thus the elements of a list may be of any object type, while NumPy arrays are homogeneous and can contain objects of only one type. An ndarray consists of two parts, which are as follows:

The actual data that is stored in a contiguous block of memory
The metadata describing the actual data

Since the actual data is stored in a contiguous block of memory, loading a large dataset as an ndarray depends on the availability of a large enough contiguous block of memory. Most of the array methods and functions in NumPy leave the actual data unaffected and only modify the metadata. Actually, we made a one-dimensional array that held a set of numbers. The ndarray can have more than a single dimension. (For more resources related to this topic, see here.)

Advantages of NumPy arrays

The NumPy array is, in general, homogeneous (there is a particular record array type that is heterogeneous): the items in the array have to be of the same type. The advantage is that if we know that the items in an array are of the same type, it is easy to ascertain the storage size needed for the array. NumPy arrays can execute vectorized operations, processing a complete array, in contrast to Python lists, where you usually have to loop through the list and execute the operation on each element. NumPy arrays are indexed from 0, just like lists in Python. NumPy utilizes an optimized C API to make the array operations particularly quick.

We will make an array with the arange() subroutine again. You will see snippets from Jupyter Notebook sessions where NumPy is already imported with the instruction import numpy as np. Here's how to get the data type of an array:

In: a = np.arange(5)
In: a.dtype
Out: dtype('int64')

The data type of the array a is int64 (at least on my computer), but you may get int32 as the output if you are using 32-bit Python. In both cases, we are dealing with integers (64 bit or 32 bit). Besides the data type of an array, it is crucial to know its shape. A vector is commonly used in mathematics, but most of the time we need higher-dimensional objects. Let's find out the shape of the vector we produced a few minutes ago:

In: a
Out: array([0, 1, 2, 3, 4])
In: a.shape
Out: (5,)

As you can see, the vector has five components with values ranging from 0 to 4. The shape property of the array is a tuple; in this instance, a tuple of 1 element, which holds the length in each dimension.

Creating a multidimensional array

Now that we know how to create a vector, we are set to create a multidimensional NumPy array. After we produce the matrix, we will again need to show its shape, as demonstrated in the following code snippets:

Create a multidimensional array as follows:

In: m = np.array([np.arange(2), np.arange(2)])
In: m
Out: array([[0, 1],
            [0, 1]])

We can show the array shape as follows:

In: m.shape
Out: (2, 2)

We made a 2 x 2 array with the arange() subroutine. The array() function creates an array from an object that you pass to it. The object has to be array-like, for example, a Python list. In the previous example, we passed a list of arrays. The object is the only required parameter of the array() function. NumPy functions tend to have a heap of optional arguments with predefined default options.
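To make the point about vectorized operations concrete, here is a small illustrative comparison (not from the book's code bundle) between looping over a plain Python list and operating on a whole NumPy array at once:

In: values = list(range(5))
In: [v * 2 for v in values]    # plain Python: visit each element in a loop
Out: [0, 2, 4, 6, 8]
In: a = np.arange(5)
In: a * 2                      # NumPy: one vectorized expression over the whole array
Out: array([0, 2, 4, 6, 8])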
Selecting NumPy array elements From time to time, we will wish to select a specific constituent of an array. We will take a look at how to do this, but to kick off, let's make a 2 x 2 matrix again: In: a = np.array([[1,2],[3,4]]) In: a Out: array([[1, 2], [3, 4]]) The matrix was made this time by giving the array() function a list of lists. We will now choose each item of the matrix one at a time, as shown in the following code snippet. Recall that the index numbers begin from 0: In: a[0,0] Out: 1 In: a[0,1] Out: 2 In: a[1,0] Out: 3 In: a[1,1] Out: 4 As you can see, choosing elements of an array is fairly simple. For the array a, we just employ the notation a[m,n], where m and n are the indices of the item in the array. Have a look at the following figure for your reference: NumPy numerical types Python has an integer type, a float type, and complex type; nonetheless, this is not sufficient for scientific calculations. In practice, we still demand more data types with varying precisions and, consequently, different storage sizes of the type. For this reason, NumPy has many more data types. The bulk of the NumPy mathematical types ends with a number. This number designates the count of bits related to the type. The following table (adapted from the NumPy user guide) presents an overview of NumPy numerical types: Type Description bool Boolean (True or False) stored as a bit inti Platform integer (normally either int32 or int64) int8 Byte (-128 to 127) int16 Integer (-32768 to 32767) int32 Integer (-2 ** 31 to 2 ** 31 -1) int64 Integer (-2 ** 63 to 2 ** 63 -1) uint8 Unsigned integer (0 to 255) uint16 Unsigned integer (0 to 65535) uint32 Unsigned integer (0 to 2 ** 32 - 1) uint64 Unsigned integer (0 to 2 ** 64 - 1) float16 Half precision float: sign bit, 5 bits exponent, and 10 bits mantissa float32 Single precision float: sign bit, 8 bits exponent, and 23 bits mantissa float64 or float Double precision float: sign bit, 11 bits exponent, and 52 bits mantissa complex64 Complex number, represented by two 32-bit floats (real and imaginary components) complex128 or complex Complex number, represented by two 64-bit floats (real and imaginary components) For each data type, there exists a matching conversion function: In: np.float64(42) Out: 42.0 In: np.int8(42.0) Out: 42 In: np.bool(42) Out: True In: np.bool(0) Out: False In: np.bool(42.0) Out: True In: np.float(True) Out: 1.0 In: np.float(False) Out: 0.0 Many functions have a data type argument, which is frequently optional: In: np.arange(7, dtype= np.uint16) Out: array([0, 1, 2, 3, 4, 5, 6], dtype=uint16) It is important to be aware that you are not allowed to change a complex number into an integer. Attempting to do that sparks off a TypeError: In: np.int(42.0 + 1.j) Traceback (most recent call last): <ipython-input-24-5c1cd108488d> in <module>() ----> 1 np.int(42.0 + 1.j) TypeError: can't convert complex to int The same goes for conversion of a complex number into a floating-point number. By the way, the j component is the imaginary coefficient of a complex number. Even so, you can convert a floating-point number to a complex number, for example, complex(1.0). The real and imaginary pieces of a complex number can be pulled out with the real() and imag() functions, respectively. Data type objects Data type objects are instances of the numpy.dtype class. Once again, arrays have a data type. To be exact, each element in a NumPy array has the same data type. The data type object can tell you the size of the data in bytes. 
The size in bytes is given by the itemsize property of the dtype class : In: a.dtype.itemsize Out: 8 Character codes Character codes are included for backward compatibility with Numeric. Numeric is the predecessor of NumPy. Its use is not recommended, but the code is supplied here because it pops up in various locations. You should use the dtype object instead. The following table lists several different data types and character codes related to them: Type Character code integer i Unsigned integer u Single precision float f Double precision float d bool b complex D string S unicode U Void V Take a look at the following code to produce an array of single precision floats: In: arange(7, dtype='f') Out: array([ 0., 1., 2., 3., 4., 5., 6.], dtype=float32) Likewise, the following code creates an array of complex numbers: In: arange(7, dtype='D') In: arange(7, dtype='D') Out: array([ 0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j]) The dtype constructors We have a variety of means to create data types. Take the case of floating-point data (have a look at dtypeconstructors.py in this book's code bundle): We can use the general Python float, as shown in the following lines of code: In: np.dtype(float) Out: dtype('float64') We can specify a single precision float with a character code: In: np.dtype('f') Out: dtype('float32') We can use a double precision float with a character code: In: np.dtype('d') Out: dtype('float64') We can pass the dtype constructor a two-character code. The first character stands for the type; the second character is a number specifying the number of bytes in the type (the numbers 2, 4, and 8 correspond to floats of 16, 32, and 64 bits, respectively): In: np.dtype('f8') Out: dtype('float64') A (truncated) list of all the full data type codes can be found by applying sctypeDict.keys(): In: np.sctypeDict.keys() In: np.sctypeDict.keys() Out: dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3, 'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8, 'long', 'l', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11, 'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15, 'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22, 'bool8', 'Bool', 'b1', 'float16', 'Float16', 'f2', 'float32', 'Float32', 'f4', 'float64', ' Float64', 'f8', 'float128', 'Float128', 'f16', 'complex64', 'Complex32', 'c8', 'complex128', 'Complex64', 'c16', 'complex256', 'Complex128', 'c32', 'object0', 'Object0', 'bytes0', 'Bytes0', 'str0', 'Str0', 'void0', 'Void0', 'datetime64', 'Datetime64', 'M8', 'timedelta64', 'Timedelta64', 'm8', 'int64', 'uint64', 'Int64', 'UInt64', 'i8', 'u8', 'int32', 'uint32', 'Int32', 'UInt32', 'i4', 'u4', 'int16', 'uint16', 'Int16', 'UInt16', 'i2', 'u2', 'int8', 'uint8', 'Int8', 'UInt8', 'i1', 'u1', 'complex_', 'int0', 'uint0', 'single', 'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat', 'longcomplex', 'bool_', 'unicode_', 'object_', 'bytes_', 'str_', 'string_', 'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a']) The dtype attributes The dtype class has a number of useful properties. 
For instance, we can get information about the character code of a data type through the properties of dtype: In: t = np.dtype('Float64') In: t.char Out: 'd' The type attribute corresponds to the type of object of the array elements: In: t.type Out: numpy.float64 The str attribute of dtype gives a string representation of a data type. It begins with a character representing endianness, if appropriate, then a character code, succeeded by a number corresponding to the number of bytes that each array item needs. Endianness, here, entails the way bytes are ordered inside a 32- or 64-bit word. In the big-endian order, the most significant byte is stored first, indicated by >. In the little-endian order, the least significant byte is stored first, indicated by <, as exemplified in the following lines of code: In: t.str Out: '<f8' One-dimensional slicing and indexing Slicing of one-dimensional NumPy arrays works just like the slicing of standard Python lists. Let's define an array containing the numbers 0, 1, 2, and so on up to and including 8. We can select a part of the array from indexes 3 to 7, which extracts the elements of the arrays 3 through 6: In: a = np.arange(9) In: a[3:7] Out: array([3, 4, 5, 6]) We can choose elements from indexes the 0 to 7 with an increment of 2: In: a[:7:2] Out: array([0, 2, 4, 6]) Just as in Python, we can use negative indices and reverse the array: In: a[::-1] Out: array([8, 7, 6, 5, 4, 3, 2, 1, 0]) Manipulating array shapes We have already learned about the reshape() function. Another repeating chore is the flattening of arrays. Flattening in this setting entails transforming a multidimensional array into a one-dimensional array. Let us create an array b that we shall use for practicing the further examples: In: b = np.arange(24).reshape(2,3,4) In: print(b) Out: [[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]) We can manipulate array shapes using the following functions: Ravel: We can accomplish this with the ravel() function as follows: In: b Out: array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]]) In: b.ravel() Out: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]) Flatten: The appropriately named function, flatten(), does the same as ravel(). However, flatten() always allocates new memory, whereas ravel gives back a view of the array. This means that we can directly manipulate the array as follows: In: b.flatten() Out: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]) Setting the shape with a tuple: Besides the reshape() function, we can also define the shape straightaway with a tuple, which is exhibited as follows: In: b.shape = (6,4) In: b Out: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]) As you can understand, the preceding code alters the array immediately. Now, we have a 6 x 4 array. Transpose: In linear algebra, it is common to transpose matrices. Transposing is a way to transform data. For a two-dimensional table, transposing means that rows become columns and columns become rows. 
We can do this too by using the following code: In: b.transpose() Out: array([[ 0, 4, 8, 12, 16, 20], [ 1, 5, 9, 13, 17, 21], [ 2, 6, 10, 14, 18, 22], [ 3, 7, 11, 15, 19, 23]]) Resize: The resize() method works just like the reshape() method, In: b.resize((2,12)) In: b Out: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]]) Stacking arrays Arrays can be stacked horizontally, depth wise, or vertically. We can use, for this goal, the vstack(), dstack(), hstack(), column_stack(), row_stack(), and concatenate() functions. To start with, let's set up some arrays: In: a = np.arange(9).reshape(3,3) In: a Out: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In: b = 2 * a In: b Out: array([[ 0, 2, 4], [ 6, 8, 10], [12, 14, 16]]) As mentioned previously, we can stack arrays using the following techniques: Horizontal stacking: Beginning with horizontal stacking, we will shape a tuple of ndarrays and hand it to the hstack() function to stack the arrays. This is shown as follows: In: np.hstack((a, b)) Out: array([[ 0, 1, 2, 0, 2, 4], [ 3, 4, 5, 6, 8, 10], [ 6, 7, 8, 12, 14, 16]]) We can attain the same thing with the concatenate() function, which is shown as follows: In: np.concatenate((a, b), axis=1) Out: array([[ 0, 1, 2, 0, 2, 4], [ 3, 4, 5, 6, 8, 10], [ 6, 7, 8, 12, 14, 16]]) The following diagram depicts horizontal stacking: Vertical stacking: With vertical stacking, a tuple is formed again. This time it is given to the vstack() function to stack the arrays. This can be seen as follows: In: np.vstack((a, b)) Out: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 0, 2, 4], [ 6, 8, 10], [12, 14, 16]]) The concatenate() function gives the same outcome with the axis parameter fixed to 0. This is the default value for the axis parameter, as portrayed in the following code: In: np.concatenate((a, b), axis=0) Out: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 0, 2, 4], [ 6, 8, 10], [12, 14, 16]]) Refer to the following figure for vertical stacking: Depth stacking: To boot, there is the depth-wise stacking employing dstack() and a tuple, of course. This entails stacking a list of arrays along the third axis (depth). For example, we could stack 2D arrays of image data on top of each other as follows: In: np.dstack((a, b)) Out: array([[[ 0, 0], [ 1, 2], [ 2, 4]], [[ 3, 6], [ 4, 8], [ 5, 10]], [[ 6, 12], [ 7, 14], [ 8, 16]]]) Column stacking: The column_stack() function stacks 1D arrays column-wise. This is shown as follows: In: oned = np.arange(2) In: oned Out: array([0, 1]) In: twice_oned = 2 * oned In: twice_oned Out: array([0, 2]) In: np.column_stack((oned, twice_oned)) Out: array([[0, 0], [1, 2]]) 2D arrays are stacked the way the hstack() function stacks them, as demonstrated in the following lines of code: In: np.column_stack((a, b)) Out: array([[ 0, 1, 2, 0, 2, 4], [ 3, 4, 5, 6, 8, 10], [ 6, 7, 8, 12, 14, 16]]) In: np.column_stack((a, b)) == np.hstack((a, b)) Out: array([[ True, True, True, True, True, True], [ True, True, True, True, True, True], [ True, True, True, True, True, True]], dtype=bool) Yes, you guessed it right! We compared two arrays with the == operator. Row stacking: NumPy, naturally, also has a function that does row-wise stacking. 
It is named row_stack() and for 1D arrays, it just stacks the arrays in rows into a 2D array: In: np.row_stack((oned, twice_oned)) Out: array([[0, 1], [0, 2]]) The row_stack() function results for 2D arrays are equal to the vstack() function results: In: np.row_stack((a, b)) Out: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 0, 2, 4], [ 6, 8, 10], [12, 14, 16]]) In: np.row_stack((a,b)) == np.vstack((a, b)) Out: array([[ True, True, True], [ True, True, True], [ True, True, True], [ True, True, True], [ True, True, True], [ True, True, True]], dtype=bool) Splitting NumPy arrays Arrays can be split vertically, horizontally, or depth wise. The functions involved are hsplit(), vsplit(), dsplit(), and split(). We can split arrays either into arrays of the same shape or indicate the location after which the split should happen. Let's look at each of the functions in detail: Horizontal splitting: The following code splits a 3 x 3 array on its horizontal axis into three parts of the same size and shape (see splitting.py in this book's code bundle): In: a Out: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In: np.hsplit(a, 3) Out: [array([[0], [3], [6]]), array([[1], [4], [7]]), array([[2], [5], [8]])] Liken it with a call of the split() function, with an additional argument, axis=1: In: np.split(a, 3, axis=1) Out: [array([[0], [3], [6]]), array([[1], [4], [7]]), array([[2], [5], [8]])] Vertical splitting: vsplit() splits along the vertical axis: In: np.vsplit(a, 3) Out: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])] The split() function, with axis=0, also splits along the vertical axis: In: np.split(a, 3, axis=0) Out: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])] Depth-wise splitting: The dsplit() function, unsurprisingly, splits depth-wise. We will require an array of rank 3 to begin with: In: c = np.arange(27).reshape(3, 3, 3) In: c Out: array([[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8]], [[ 9, 10, 11], [12, 13, 14], [15, 16, 17]], [[18, 19, 20], [21, 22, 23], [24, 25, 26]]]) In: np.dsplit(c, 3) Out: [array([[[ 0], [ 3], [ 6]], [[ 9], [12], [15]], [[18], [21], [24]]]), array([[[ 1], [ 4], [ 7]], [[10], [13], [16]], [[19], [22], [25]]]), array([[[ 2], [ 5], [ 8]], [[11], [14], [17]], [[20], [23], [26]]])] NumPy array attributes Let's learn more about the NumPy array attributes with the help of an example. Let us create an array b that we shall use for practicing the further examples: In: b = np.arange(24).reshape(2, 12) In: b Out: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]]) Besides the shape and dtype attributes, ndarray has a number of other properties, as shown in the following list: ndim gives the number of dimensions, as shown in the following code snippet: In: b.ndim Out: 2 size holds the count of elements. This is shown as follows: In: b.size Out: 24 itemsize returns the count of bytes for each element in the array, as shown in the following code snippet: In: b.itemsize Out: 8 If you require the full count of bytes the array needs, you can have a look at nbytes. 
This is just a product of the itemsize and size properties: In: b.nbytes Out: 192 In: b.size * b.itemsize Out: 192 The T property has the same result as the transpose() function, which is shown as follows: In: b.resize(6,4) In: b Out: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]) In: b.T Out: array([[ 0, 4, 8, 12, 16, 20], [ 1, 5, 9, 13, 17, 21], [ 2, 6, 10, 14, 18, 22], [ 3, 7, 11, 15, 19, 23]]) If the array has a rank of less than 2, we will just get a view of the array: In: b.ndim Out: 1 In: b.T Out: array([0, 1, 2, 3, 4]) Complex numbers in NumPy are represented by j. For instance, we can produce an array with complex numbers as follows: In: b = np.array([1.j + 1, 2.j + 3]) In: b Out: array([ 1.+1.j, 3.+2.j]) The real property returns to us the real part of the array, or the array itself if it only holds real numbers: In: b.real Out: array([ 1., 3.]) The imag property holds the imaginary part of the array: In: b.imag Out: array([ 1., 2.]) If the array holds complex numbers, then the data type will automatically be complex as well: In: b.dtype Out: dtype('complex128') In: b.dtype.str Out: '<c16' The flat property gives back a numpy.flatiter object. This is the only means to get a flatiter object; we do not have access to a flatiter constructor. The flat iterator enables us to loop through an array as if it were a flat array, as shown in the following code snippet: In: b = np.arange(4).reshape(2,2) In: b Out: array([[0, 1], [2, 3]]) In: f = b.flat In: f Out: <numpy.flatiter object at 0x103013e00> In: for item in f: print(item) Out: 0 1 2 3 It is possible to straightaway obtain an element with the flatiter object: In: b.flat[2] Out: 2 Also, you can obtain multiple elements as follows: In: b.flat[[1,3]] Out: array([1, 3]) The flat property can be set. Setting the value of the flat property leads to overwriting the values of the entire array: In: b.flat = 7 In: b Out: array([[7, 7], [7, 7]]) We can also obtain selected elements as follows: In: b.flat[[1,3]] = 1 In: b Out: array([[7, 1], [7, 1]]) The next diagram illustrates various properties of ndarray: Converting arrays We can convert a NumPy array to a Python list with the tolist() function . The following is a brief explanation: Convert to a list: In: b Out: array([ 1.+1.j, 3.+2.j]) In: b.tolist() Out: [(1+1j), (3+2j)] The astype() function transforms the array to an array of the specified data type: In: b Out: array([ 1.+1.j, 3.+2.j]) In: b.astype(int) /usr/local/lib/python3.5/site-packages/ipykernel/__main__.py:1: ComplexWarning: Casting complex values to real discards the imaginary part … Out: array([1, 3]) In: b.astype('complex') Out: array([ 1.+1.j, 3.+2.j]) We are dropping off the imaginary part when casting from the complex type to int. The astype() function takes the name of a data type as a string too. The preceding code won't display a warning this time because we used the right data type. Summary In this article, we found out a heap about the NumPy basics: data types and arrays. Arrays have various properties that describe them. You learned that one of these properties is the data type, which, in NumPy, is represented by a full-fledged object. NumPy arrays can be sliced and indexed in an effective way, compared to standard Python lists. NumPy arrays have the extra ability to work with multiple dimensions. The shape of an array can be modified in multiple ways, such as stacking, resizing, reshaping, and splitting. 
Resources for Article: Further resources on this subject: Big Data Analytics [article] Python Data Science Up and Running [article] R and its Diverse Possibilities [article]
Data Pipelines

Packt
03 Mar 2017
17 min read
In this article by Andrew Morgan, Antoine Amend, Matthew Hallett, and David George, the authors of the book Mastering Spark for Data Science, readers will learn how to construct a content register and use it to track all input loaded to the system, and to deliver metrics on ingestion pipelines, so that these flows can be reliably run as an automated, lights-out process. In this article we will cover the following topics:

Welcome to the GDELT dataset
Data pipelines
Universal ingestion framework
Real-time monitoring for new data
Receiving streaming data via Kafka
Registering new content and vaulting for tracking purposes
Visualization of content metrics in Kibana, to monitor ingestion processes and data health

(For more resources related to this topic, see here.)

Data Pipelines
Even with the most basic of analytics, we always require some data. In fact, finding the right data is probably among the hardest problems to solve in data science (but that's a whole topic for another book!). We have already seen that the way in which we obtain our data can be as simple or as complicated as needed. In practice, we can break this decision into two distinct areas: ad-hoc and scheduled. Ad-hoc data acquisition is the most common method during prototyping and small-scale analytics, as it usually doesn't require any additional software to implement - the user requires some data and simply downloads it from the source as and when required. This method is often a matter of clicking on a web link and storing the data somewhere convenient, although the data may still need to be versioned and secured. Scheduled data acquisition is used in more controlled environments for large-scale and production analytics; there is also an excellent case for ingesting a dataset into a data lake for possible future use. With the Internet of Things (IoT) on the increase, huge volumes of data are being produced, and in many cases, if the data is not ingested now, it is lost forever. Much of this data may not have an immediate or apparent use today, but it could in the future; so the mindset is to gather all of the data in case it is needed, and delete it later once we are sure it is not. It's clear we need a flexible approach to data acquisition that supports a variety of procurement options.

Universal Ingestion Framework
There are many ways to approach data acquisition, ranging from home-grown bash scripts through to high-end commercial tools. The aim of this section is to introduce a highly flexible framework that we can use for small-scale data ingest and then grow as our requirements change - all the way through to a full, corporately managed workflow if needed - and that framework will be built using Apache NiFi. NiFi enables us to build large-scale integrated data pipelines that move data around the planet. In addition, it's also incredibly flexible and easy to build simple pipelines with - usually quicker even than using Bash or any other traditional scripting method. If an ad-hoc approach is taken to source the same dataset on a number of occasions, then some serious thought should be given as to whether it falls into the scheduled category, or at least whether a more robust storage and versioning setup should be introduced.
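Before building the NiFi flow, it may help to see what the ad-hoc approach looks like in code. The following is a minimal Python sketch, not part of the book's code bundle; the GDELT URL is the real one used later in this article, but the local staging directory and the timestamp-based versioning scheme are illustrative assumptions:

# Ad-hoc acquisition sketch: download the latest GDELT GKG file by hand.
# The staging directory and naming scheme below are assumptions.
import re
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

LASTUPDATE_URL = "http://data.gdeltproject.org/gdeltv2/lastupdate.txt"
DOWNLOAD_DIR = Path("gdelt_downloads")  # assumed local staging directory

def fetch_latest_gkg():
    DOWNLOAD_DIR.mkdir(exist_ok=True)
    listing = urllib.request.urlopen(LASTUPDATE_URL).read().decode("utf-8")
    match = re.search(r"(\S*gkg\.csv\.zip)", listing)  # pick the GKG entry
    if match is None:
        raise RuntimeError("No GKG file found in the latest file list")
    url = match.group(1)
    # Timestamp the local copy so repeated ad-hoc pulls are versioned.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = DOWNLOAD_DIR / "{}_{}".format(stamp, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, str(target))
    return target

if __name__ == "__main__":
    print("Saved", fetch_latest_gkg())

Running a script like this by hand every time fresh data is needed is exactly the kind of repetition that suggests moving to the scheduled, flow-based approach described next.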
We have chosen to use Apache NiFi as it offers a solution that provides the ability to create many, varied complexity pipelines that can be scaled to truly Big Data and IoT levels, and it also provides a great drag & drop interface (using what’s known as flow-based programming[1]). With patterns, templates and modules for workflow production, it automatically takes care of many of the complex features that traditionally plague developers such as multi-threading, connection management and scalable processing. For our purposes it will enable us to quickly build simple pipelines for prototyping, and scale these to full production where required. It’s pretty well documented and easy to get running https://nifi.apache.org/download.html, it runs in a browser and looks like this: https://en.wikipedia.org/wiki/Flow-based_programming We leave the installation of NiFi as an exercise for the reader - which we would encourage you to do - as we will be using it in the following section. Introducing the GDELT News Stream Hopefully, we have NiFi up and running now and can start to ingest some data. So let’s start with some global news media data from GDELT. Here’s our brief, taken from the GDELT website http://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/: “Within 15 minutes of GDELT monitoring a news report breaking anywhere the world, it has translated it, processed it to identify all events, counts, quotes, people, organizations, locations, themes, emotions, relevant imagery, video, and embedded social media posts, placed it into global context, and made all of this available via a live open metadata firehose enabling open research on the planet itself. [As] the single largest deployment in the world of sentiment analysis, we hope that by bringing together so many emotional and thematic dimensions crossing so many languages and disciplines, and applying all of it in realtime to breaking news from across the planet, that this will spur an entirely new era in how we think about emotion and the ways in which it can help us better understand how we contextualize, interpret, respond to, and understand global events.” In order to start consuming this open data, we’ll need to hook into that metadata firehose and ingest the news streams onto our platform.  How do we do this?  Let’s start by finding out what data is available. Discover GDELT Real-time GDELT publish a list of the latest files on their website - this list is updated every 15 minutes. In NiFi, we can setup a dataflow that will poll the GDELT website, source a file from this list and save it to HDFS so we can use it later. Inside the NiFi dataflow designer, create a HTTP connector by dragging a processor onto the canvas and selecting GetHTTP. To configure this processor, you’ll need to enter the URL of the file list as: http://data.gdeltproject.org/gdeltv2/lastupdate.txt And also provide a temporary filename for the file list you will download. In the example below, we’ve used the NiFi’s expression language to generate a universally unique key so that files are not overwritten (UUID()). It’s worth noting that with this type of processor (GetHTTP), NiFi supports a number of scheduling and timing options for the polling and retrieval. For now, we’re just going to use the default options and let NiFi manage the polling intervals for us. An example of latest file list from GDELT is shown below. Next, we will parse the URL of the GKG news stream so that we can fetch it in a moment. 
Create a Regular Expression parser by dragging a processor onto the canvas and selecting ExtractText. Now position the new processor underneath the existing one and drag a line from the top processor to the bottom one. Finish by selecting the success relationship in the connection dialog that pops up. This is shown in the example below. Next, let’s configure the ExtractText processor to use a regular expression that matches only the relevant text of the file list, for example: ([^ ]*gkg.csv.*) From this regular expression, NiFi will create a new property (in this case, called url) associated with the flow design, which will take on a new value as each particular instance goes through the flow. It can even be configured to support multiple threads. Again, this is example is shown below. It’s worth noting here that while this is a fairly specific example, the technique is deliberately general purpose and can be used in many situations. Our First GDELT Feed Now that we have the URL of the GKG feed, we fetch it by configuring an InvokeHTTP processor to use the url property we previously created as it’s remote endpoint, and dragging the line as before. All that remains is to decompress the zipped content with a UnpackContent processor (using the basic zip format) and save to HDFS using a PutHDFS processor, like so: Improving with Publish and Subscribe So far, this flow looks very “point-to-point”, meaning that if we were to introduce a new consumer of data, for example, a Spark-streaming job, the flow must be changed. For example, the flow design might have to change to look like this: If we add yet another, the flow must change again. In fact, each time we add a new consumer, the flow gets a little more complicated, particularly when all the error handling is added. This is clearly not always desirable, as introducing or removing consumers (or producers) of data, might be something we want to do often, even frequently. Plus, it’s also a good idea to try to keep your flows as simple and reusable as possible. Therefore, for a more flexible pattern, instead of writing directly to HDFS, we can publish to Apache Kafka. This gives us the ability to add and remove consumers at any time without changing the data ingestion pipeline. We can also still write to HDFS from Kafka if needed, possibly even by designing a separate NiFi flow, or connect directly to Kafka using Spark-streaming. To do this, we create a Kafka writer by dragging a processor onto the canvas and selecting PutKafka. We now have a simple flow that continuously polls for an available file list, routinely retrieving the latest copy of a new stream over the web as it becomes available, decompressing the content and streaming it record-by-record into Kafka, a durable, fault-tolerant, distributed message queue, for processing by spark-streaming or storage in HDFS. And what’s more, without writing a single line of bash! Content Registry We have seen in this article that data ingestion is an area that is often overlooked, and that its importance cannot be underestimated. At this point we have a pipeline that enables us to ingest data from a source, schedule that ingest and direct the data to our repository of choice. But the story does not end there. Now we have the data, we need to fulfil our data management responsibilities. Enter the content registry. We’re going to build an index of metadata related to that data we have ingested. 
The data itself will still be directed to storage (HDFS, in our example) but, in addition, we will store metadata about the data, so that we can track what we’ve received and understand basic information about it, such as, when we received it, where it came from, how big it is, what type it is, etc. Choices and More Choices The choice of which technology we use to store this metadata is, as we have seen, one based upon knowledge and experience. For metadata indexing, we will require at least the following attributes: Easily searchable Scalable Parallel write ability Redundancy There are many ways to meet these requirements, for example we could write the metadata to Parquet, store in HDFS and search using Spark SQL. However, here we will use Elasticsearch as it meets the requirements a little better, most notably because it facilitates low latency queries of our metadata over a REST API - very useful for creating dashboards. In fact, Elasticsearch has the advantage of integrating directly with Kibana, meaning it can quickly produce rich visualizations of our content registry. For this reason, we will proceed with Elasticsearch in mind. Going with the Flow Using our current NiFi pipeline flow, let’s fork the output from “Fetch GKG files from URL” to add an additional set of steps to allow us to capture and store this metadata in Elasticsearch. These are: Replace the flow content with our metadata model Capture the metadata Store directly in Elasticsearch Here’s what this looks like in NiFi: Metadata Model So, the first step here is to define our metadata model. And there are many areas we could consider, but let’s select a set that helps tackle a few key points from earlier discussions. This will provide a good basis upon which further data can be added in the future, if required. So, let’s keep it simple and use the following three attributes: File size Date ingested File name These will provide basic registration of received files. Next, inside the NiFi flow, we’ll need to replace the actual data content with this new metadata model. An easy way to do this, is to create a JSON template file from our model. We’ll save it to local disk and use it inside a FetchFile processor to replace the flow’s content with this skeleton object. This template will look something like: { "FileSize": SIZE, "FileName": "FILENAME", "IngestedDate": "DATE" } Note the use of placeholder names (SIZE, FILENAME, DATE) in place of the attribute values. These will be substituted, one-by-one, by a sequence of ReplaceText processors, that swap the placeholder names for an appropriate flow attribute using regular expressions provided by the NiFi Expression Language, for example DATE becomes ${now()}. The last step is to output the new metadata payload to Elasticsearch. Once again, NiFi comes ready with a processor for this; the PutElasticsearch processor. An example metadata entry in Elasticsearch: { "_index": "gkg", "_type": "files", "_id": "AVZHCvGIV6x-JwdgvCzW", "_score": 1, "source": { "FileSize": 11279827, "FileName": "20150218233000.gkg.csv.zip", "IngestedDate": "2016-08-01T17:43:00+01:00" } } Now that we have added the ability to collect and interrogate metadata, we now have access to more statistics that can be used for analysis. This includes: Time based analysis e.g. file sizes over time Loss of data, for example are there data “holes” in the timeline? If there is a particular analytic that is required, the NIFI metadata component can be adjusted to provide the relevant data points. 
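As a sketch of the time-based analysis mentioned above, the following Python snippet (not part of the original pipeline) asks Elasticsearch for the total bytes ingested per day from the gkg metadata index. The host address is an assumption, and depending on your Elasticsearch version the date_histogram parameter may need to be calendar_interval rather than interval:

# Query the content registry for ingested bytes per day (sketch only).
# ES_SEARCH_URL is an assumed local endpoint; adjust for your cluster.
import json
import urllib.request

ES_SEARCH_URL = "http://localhost:9200/gkg/_search"

query = {
    "size": 0,
    "aggs": {
        "files_per_day": {
            "date_histogram": {"field": "IngestedDate", "interval": "day"},
            "aggs": {"total_bytes": {"sum": {"field": "FileSize"}}},
        }
    },
}

request = urllib.request.Request(
    ES_SEARCH_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    results = json.load(response)

for bucket in results["aggregations"]["files_per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"],
          bucket["total_bytes"]["value"])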
Indeed, an analytic could be built to look at historical data and update the index accordingly if the metadata does not exist in current data. Kibana Dashboard We have mentioned Kibana a number of times in this article, now that we have an index of metadata in Elasticsearch, we can use the tool to visualize some analytics. The purpose of this brief section is to demonstrate that we can immediately start to model and visualize our data. In this simple example we have completed the following steps: Added the Elasticsearch index for our GDELT metadata to the “Settings” tab Selected “file size” under the “Discover” tab Selected Visualize for “file size” Changed the Aggregation field to “Range” Entered values for the ranges The resultant graph displays the file size distribution: From here we are free to create new visualizations or even a fully featured dashboard that can be used to monitor the status of our file ingest. By increasing the variety of metadata written to Elasticsearch from NiFi, we can make more fields available in Kibana and even start our data science journey right here with some ingest based actionable insights. Now that we have a fully-functioning data pipeline delivering us real-time feeds of data, how do we ensure data quality of the payload we are receiving?  Let’s take a look at the options. Quality Assurance With an initial data ingestion capability implemented, and data streaming onto your platform, you will need to decide how much quality assurance is required at the front door. It’s perfectly viable to start with no initial quality controls and build them up over time (retrospectively scanning historical data as time and resources allow). However, it may be prudent to install a basic level of verification to begin with. For example, basic checks such as file integrity, parity checking, completeness, checksums, type checking, field counting, overdue files, security field pre-population, denormalization, etc. You should take care that your up-front checks do not take too long. Depending on the intensity of your examinations and the size of your data, it’s not uncommon to encounter a situation where there is not enough time to perform all processing before the next dataset arrives. You will always need to monitor your cluster resources and calculate the most efficient use of time. Here are some examples of the type of rough capacity planning calculation you can perform: Example 1: Basic Quality Checking, No Contending Users Data is ingested every 15 minutes and takes 1 minute to pull from the source Quality checking (integrity, field count, field pre-population) takes 4 minutes There are no other users on the compute cluster There are 10 minutes of resources available for other tasks. As there are no other users on the cluster, this is satisfactory - no action needs to be taken. Example 2: Advanced Quality Checking, No Contending Users Data is ingested every 15 minutes and takes 1 minute to pull from the source Quality checking (integrity, field count, field pre-population, denormalization, sub dataset building) takes 13 minutes There are no other users on the compute cluster There is only 1 minute of resource available for other tasks. 
We probably need to consider either:

Configuring a resource scheduling policy
Reducing the amount of data ingested
Reducing the amount of processing we undertake
Adding additional compute resources to the cluster

Example 3: Basic Quality Checking, 50% Utility Due to Contending Users

Data is ingested every 15 minutes and takes 1 minute to pull from the source
Quality checking (integrity, field count, field pre-population) takes 4 minutes (100% utility)
There are other users on the compute cluster
There are 6 minutes of resources available for other tasks (15 - 1 - (4 * (100 / 50)))

Since there are other users, there is a danger that, at least some of the time, we will not be able to complete our processing and a backlog of jobs will occur. When you run into timing issues, you have a number of options available to circumvent any backlog:

Negotiating sole use of the resources at certain times
Configuring a resource scheduling policy, including:
YARN Fair Scheduler: allows you to define queues with differing priorities and target your Spark jobs by setting the spark.yarn.queue property on start-up, so your job always takes precedence
Dynamic Resource Allocation: allows concurrently running jobs to automatically scale to match their utilization
Spark Scheduler Pool: allows you to define queues when sharing a SparkContext using a multithreading model, and target your Spark job by setting the spark.scheduler.pool property per execution thread, so your thread takes precedence
Running processing jobs overnight when the cluster is quiet

In any case, you will eventually get a good idea of how the various parts of your jobs perform and will then be in a position to calculate what changes could be made to improve efficiency. There's always the option of throwing more resources at the problem, especially when using a cloud provider, but we would certainly encourage the intelligent use of existing resources - this is far more scalable, cheaper, and builds data expertise.

Summary
In this article, we walked through the full setup of an Apache NiFi GDELT ingest pipeline, complete with metadata forks and a brief introduction to visualizing the resultant data. This section is particularly important, as GDELT is used extensively throughout the book and the NiFi method is a highly effective way to source data in a scalable and modular way. Resources for Article: Further resources on this subject: Integration with Continuous Delivery [article] Amazon Web Services [article] AWS Fundamentals [article]
Getting Started with Salesforce Lightning Experience

Packt
02 Mar 2017
8 min read
In this article by Rakesh Gupta, author of the book Mastering Salesforce CRM Administration, we will start with the overview of the Salesforce Lightning Experience and its benefits, which takes the discussion forward to the various business use cases where it can boost the sales representatives’ productivity. We will also discuss different Sales Cloud and Service Cloud editions offered by Salesforce. (For more resources related to this topic, see here.) Getting started with Lightning Experience Lightning Experience is a new generation productive user interface designed to help your sales team to close more deals and sell quicker and smarter. The upswing in mobile usages is influencing the way people work. Sales representatives are now using mobile to research potential customers, get the details of nearby customer offices, socially connect with their customers, and even more. That's why Salesforce synced the desktop Lightning Experience with mobile Salesforce1. Salesforce Lighting Editions With its Summer'16 release, Salesforce announced the Lightning Editions of Sales Cloud and Service Cloud. The Lightning Editions are a completely reimagined packaging of Sales Cloud and Service Cloud, which offer additional functionality to their customers and increased productivity with a relatively small increase in cost. Sales Cloud Lightning Editions Sales Cloud is a product designed to automate your sales process. By implementing this, an organization can boost its sales process. It includes Campaign,Lead, Account, Contact,OpportunityReport, Dashboard, and many other features as well,. Salesforce offers various Sales Cloud editions, and as per business needs, an organization can buy any of these different editions, which are shown in the following image: Let’s take a closer look at the three Sales Cloud Lightning Editions: Lightning Professional: This edition is for small and medium enterprises (SMEs). It is designed for business needs where a full-featured CRM functionality is required. It provides the CRM functionality for marketing, sales, and service automation. Professional Edition is a perfect fit for small- to mid-sized businesses. After the Summer'16 release, in this edition, you can create a limited number of processes, record types, roles, profiles, and permission sets. For each Professional Edition license, organizations have to pay USD 75 per month. Lightning Enterprise: This edition is for businesses with large and complex business requirements. It includes all the features available in the Professional Edition, plus it provides advanced customization capabilities to automate business processes and web service API access for integration with other systems. Enterprise Editions also include processes, workflow, approval process, profile, page layout, and custom app development. In addition, organizations also get the Salesforce Identity feature with this edition. For each Enterprise Edition license, organizations have to pay USD 150 per month. Lightning Unlimited: This edition includes all Salesforce.com features for an entire enterprise. It provides all the features of Enterprise Edition and a new level of Platform flexibility for managing and sharing all of their information on demand. The key features of Salesforce.com Unlimited Edition (in addition to Enterprise features) are premier support, full mobile access, and increased storage limits. It also includes Work.com, Service Cloud, knowledge base, live agent chat, multiple sandboxes and unlimited custom app development. 
While purchasing Salesforce.com licenses, organizations have to negotiate with Salesforce to get the maximum number of sandboxes. To know more about these license types, please visit the Salesforce website at https://www.salesforce.com/sales-cloud/pricing/. Service Cloud Lightning Editions Service Cloud helps your organization to streamline the customer service process. Users can access it anytime, anywhere, and from any device. It will help your organization to close a case faster. Service agents can connect with customers through the agent console, meaning agents can interact with customers through multiple channels. Service Cloud includes case management, computer telephony integration (CTI), Service Cloud console, knowledge base, Salesforce communities, Salesforce Private AppExchange, premier+ success plan, report, and dashboards, with many other analytics features. The various Service Cloud Lightning Editions are shown in the following image: Let’s take a closer look at the three Service Cloud Lightning Edition: Lightning Professional: This edition is for SMEs. It provides CRM functionality for customer support through various channels. It is a perfect fit for small- to mid-sized businesses. It includes features, such as case management, CTI integration, mobile access, solution management, content library, reports, and analytics, along with Sales features such as opportunity management and forecasting. After the Summer'16 release, in this edition, you can create a limited number of processes, record types, roles, profiles, and permission sets. For each Professional Edition license, organizations have to pay USD 75 per month. Lightning Enterprise: This edition is for businesses with large and complex business requirements. It includes all the features available in the Professional edition, plus it provides advanced customization capabilities to automate business processes and web service API access for integration with other systems. It also includes Service console, Service contract and entitlement management, workflow and approval process, web chat, offline access, and knowledge base. Organizations get Salesforce Identity feature with this edition. For each Enterprise Edition license, organizations have to pay USD 150 per month. Lightning Unlimited: This edition includes all Salesforce.com features for an entire enterprise. It provides all the features of Enterprise Edition and a new level of platform flexibility for managing and sharing all of their information on demand. The key features of Salesforce.com Unlimited edition (in addition to the Enterprise features) are premier support, full mobile access, unlimited custom apps, and increased storage limits. It also includes Work.com, Service Cloud, knowledge base, live agent chat, multiple sandboxes, and unlimited custom app development. While purchasing the licenses, organizations have to negotiate with Salesforce to get the maximum number of sandboxes. To know more about these license types, please visit the Salesforce website at https://www.salesforce.com/service-cloud/pricing/. Creating a Salesforce developer account To get started with the given topics in this, it is recommended to use a Salesforce developer account. Using Salesforce production instance is not essential for practicing. If you currently do not have your developer account, you can create a new Salesforce developer account. 
The Salesforce developer account is completely free and can be used to practice newly learned concepts, but you cannot use this for commercial purposes. To create a Salesforce developer account follow these steps: Visit the website http://developer.force.com/. Click on the Sign Up button. It will open a sign up page; fill it out to create one for you. The signup page will look like the following screenshot: Once you register for the developer account, Salesforce.com will send you login details on the e-mail ID you have provided during the registration. By following the instructions in the e-mail, you are ready to get started with Salesforce. Enabling the Lightning Experience for Users Once you are ready to roll out the Lightning Experience for your users, navigate to the Lightning Setup page, which is available in Setup, by clicking Lightning Experience. The slider button at the bottom of the Lightning Setup page, shown in the following screenshot, enables Lightning Experience for your organization:. Flip that switch, and Lightning Experience will be enabled for your Salesforce organization. The Lightning Experience is now enabled for all standard profiles by default. Granting permission to users through Profile Depending on the number of users for a rollout, you have to decide how to enable the Lightning Experience for them. If you are planning to do a mass rollout, it is better to update Profiles. Business scenario:Helina Jolly is working as a system administrator in Universal Container. She has received a requirement to enable Lightning Experience for a custom profile, Training User. First of all, create a custom profile for the license type, Salesforce, and give it the name, Training User. To enable the Lightning Experience for a custom profile, follow these instructions: In the Lightning Experience user interface, click on page-level action-menu | ADMINISTRATION | Users | Profiles, and then select the Training User profile, as shown in the following screenshot: Then, navigate to theSystem Permission section, and select the Lightning Experience User checkbox. Granting permission to users through permission sets If you want to enable the Lightning Experience for a small group of users, or if you are not sure whether you will keep the Lightning Experience on for a group of users, consider using permission sets. Permission sets are mainly a collection of settings and permissions that give the users access to numerous tools and functions within Salesforce. By creating a permission set, you can grant the Lightning Experience user permission to the users in your organization. Switching between Lightning Experience and Salesforce Classic If you have enabled Lightning Experience for your users, they can use the switcher to switch back and forth between Lightning Experience and Salesforce Classic. The switcher is very smart. Every time a user switches, it remembers that user experience as their new default preference. So, if a user switches to Lightning Experience, it is now their default user experience until they switch back to Salesforce Classic. If you want to restrict your users to switch back to Salesforce Classic, you have to develop an Apex trigger or process with flow. When the UserPreferencesLightningExperiencePreferred field on the user object is true, then it redirects the user to the Lightning Experience interface. Summary In this article, we covered the overview of Salesforce Lightning Experience. We also covered various Salesforce editions available in the market. 
We also went through standard and custom objects. Resources for Article: Further resources on this subject: Configuration in Salesforce CRM [article] Salesforce CRM Functions [article] Introduction to vtiger CRM [article]
Functions with Arduino

Packt
02 Mar 2017
13 min read
In this article by Syed Omar Faruk Towaha, the author of the book Learning C for Arduino, we will learn about functions and file handling with Arduino. We learned about loops and conditions. Let’s begin out journey into Functions with Arduino. (For more resources related to this topic, see here.) Functions Do you know how to make instant coffee? Don’t worry; I know. You will need some water, instant coffee, sugar, and milk or creamer. Remember, we want to drink coffee, but we are doing something that makes coffee. This procedure can be defined as a function of coffee making. Let’s finish making coffee now. The steps can be written as follows: Boil some water. Put some coffee inside a mug. Add some sugar. Pour in boiled water. Add some milk. And finally, your coffee is ready! Let’s write a pseudo code for this function: function coffee (water, coffee, sugar, milk) { add coffee beans; add sugar; pour boiled water; add milk; return coffee; } In our pseudo code we have four items (we would call them parameters) to make coffee. We did something with our ingredients, and finally, we got our coffee, right? Now, if anybody wants to get coffee, he/she will have to do the same function (in programming, we will call it calling a function) again. Let’s move into the types of functions. Types of functions A function returns something, or a value, which is called the return value. The return values can be of several types, some of which are listed here: Integers Float Double Character Void Boolean Another term, argument, refers to something that is passed to the function and calculated or used inside the function. In our previous example, the ingredients passed to our coffee-making process can be called arguments (sugar, milk, and so on), and we finally got the coffee, which is the return value of the function. By definition, there are two types of functions. They are, a system-defined function and a user-defined function. In our Arduino code, we have often seen the following structure: void setup() { } void loop() { } setup() and loop() are also functions. The return type of these functions is void. Don’t worry, we will discuss the type of function soon. The setup() and loop() functions are system-defined functions. There are a number of system-defined functions. The user-defined functions cannot be named after them. Before going deeper into function types, let’s learn the syntax of a function. Functions can be as follows: void functionName() { //statements } Or like void functionName(arg1, arg2, arg3) { //statements } So, what’s the difference? Well, the first function has no arguments, but the second function does. There are four types of function, depending on the return type and arguments. They are as follows: A function with no arguments and no return value A function with no arguments and a return value A function with arguments and no return value A function with arguments and a return value Now, the question is, can the arguments be of any type? Yes, they can be of any type, depending on the function. They can be Boolean, integers, floats, or characters. They can be a mixture of data types too. We will look at some examples later. Now, let’s define and look at examples of the four types of function we just defined. Function with no arguments and no return value, these functions do not accept arguments. The return type of these functions is void, which means the function returns nothing. Let me clear this up. As we learned earlier, a function must be named by something. 
The naming of a function will follow the rule for the variable naming. If we have a name for a function, we need to define its type also. It’s the basic rule for defining a function. So, if we are not sure of our function’s type (what type of job it will do), then it is safe to use the void keyword in front of our function, where void means no data type, as in the following function: void myFunction(){ //statements } Inside the function, we may do all the things we need. Say we want to print I love Arduino! ten times if the function is called. So, our function must have a loop that continues for ten times and then stops. So, our function can be written as follows: void myFunction() { int i; for (i = 0; i < 10; i++) { Serial.println(“I love Arduino!“); } } The preceding function does not have a return value. But if we call the function from our main function (from the setup() function; we may also call it from the loop() function, unless we do not want an infinite loop), the function will print I love Arduino! ten times. No matter how many times we call, it will print ten times for each call. Let’s write the full code and look at the output. The full code is as follows: void myFunction() { int i; for (i = 0; i < 10; i++) { Serial.println(“I love Arduino!“); } } void setup() { Serial.begin(9600); myFunction(); // We called our function Serial.println(“................“); //This will print some dots myFunction(); // We called our function again } void loop() { // put your main code here, to run repeatedly: } In the code, we placed our function (myFunction) after the loop() function. It is a good practice to declare the custom function before the setup() loop. Inside our setup() function, we called the function, then printed a few dots, and finally, we called our function again. You can guess what will happen. Yes, I love Arduino! will be printed ten times, then a few dots will be printed, and finally, I love Arduino! will be printed ten times. Let’s look at the output on the serial monitor: Yes. Your assumption is correct! Function with no arguments and a return value In this type of function, no arguments are passed, but they return a value. You need to remember that the return value depends on the type of the function. If you declare a function as an integer function, the return value’s type will have to have be an integer also. If you declare a function as a character, the return type must be a character. This is true for all other data types as well. Let’s look at an example. We will declare an integer function, where we will define a few integers. We will add them and store them to another integer, and finally, return the addition. The function may look as follows: int addNum() { int a = 3, b = 5, c = 6, addition; addition = a + b + c; return addition; } The preceding function should return 14. Let’s store the function’s return value to another integer type of variable in the setup() function and print in on the serial monitor. The full code will be as follows: void setup() { Serial.begin(9600); int fromFunction = addNum(); // added values to an integer Serial.println(fromFunction); // printed the integer } void loop() { } int addNum() { int a = 3, b = 5, c = 6, addition; //declared some integers addition = a + b + c; // added them and stored into another integers return addition; // Returned the addition. 
} The output will look as follows: Function with arguments and no return value This type of function processes some arguments inside the function, but does not return anything directly. We can do the calculations inside the function, or print something, but there will be no return value. Say we need find out the sum of two integers. We may define a number of variables to store them, and then print the sum. But with the help of a function, we can just pass two integers through a function; then, inside the function, all we need to do is sum them and store them in another variable. Then we will print the value. Every time we call the function and pass our values through it, we will get the sum of the integers we pass. Let’s define a function that will show the sum of the two integers passed through the function. We will call the function sumOfTwo(), and since there is no return value, we will define the function as void. The function should look as follows: void sumOfTwo(int a, int b) { int sum = a + b; Serial.print(“The sum is “ ); Serial.println(sum); } Whenever we call this function with proper arguments, the function will print the sum of the number we pass through the function. Let’s look at the output first; then we will discuss the code: We pass the arguments to a function, separating them with commas. The sequence of the arguments must not be messed up while we call the function. Because the arguments of a function may be of different types, if we mess up while calling, the program may not compile and will not execute correctly: Say a function looks as follows: void myInitialAndAge(int age, char initial) { Serial.print(“My age is “); Serial.println(age); Serial.print(“And my initial is “); Serial.print(initial); } Now, we must call the function like so: myInitialAndAge(6,’T’); , where 6 is my age and T is my initial. We should not do it as follows: myInitialAndAge(‘T’, 6);. We called the function and passed two values through it (12 and 24). We got the output as The sum is 36. Isn’t it amazing? Let’s go a little bit deeper. In our function, we declared our two arguments (a and b) as integers. Inside the whole function, the values (12 and 24) we passed through the function are as follows: a = 12 and b =24; If we called the function this sumOfTwo(24, 12), the values of the variables would be as follows: a = 24 and b = 12; I hope you can now understand the sequence of arguments of a function. How about an experiment? Call the sumOfTwo() function five times in the setup() function, with different values of a and b, and compare the outputs. Function with arguments and a return value This type of function will have both the arguments and the return value. Inside the function, there will be some processing or calculations using the arguments, and later, there would be an outcome, which we want as a return value. Since this type of function will return a value, the function must have a type. Let‘s look at an example. We will write a function that will check if a number is prime or not. From your math class, you may remember that a prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. The basic logic behind checking whether a number is prime or not is to check all the numbers starting from 2 to the number before the number itself by dividing the number. Not clear? Ok, let’s check if 9 is a prime number. No, it is not a prime number. Why? Because it can be divided by 3. 
And according to the definition, the prime number cannot be divisible by any number other than 1 and the number itself. So, we will check if 9 is divisible by 2. No, it is not. Then we will divide by 3 and yes, it is divisible. So, 9 is not a prime number, according to our logic. Let’s check if 13 is a prime number. We will check if the number is divisible by 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. No, the number is not divisible by any of those numbers. We may also shorten our checking by only checking the number that is half of the number we are checking. Look at the following code: int primeChecker(int n) // n is our number which will be checked { int i; // Driver variable for the loop for (i = 2; i <= n / 2; i++) // Continued the loop till n/2 { if (n % i == 0) // If no reminder return 1; // It is not prime } return 0; // else it is prime } The code is quite simple. If the remainder is equal to zero, our number is fully divisible by a number other than 1 and itself, so it is not a prime number. If there is any remainder, the number is a prime number. Let’s write the full code and look at the output for the following numbers: 23 65 235 4,543 4,241 The full source code to check if the numbers are prime or not is as follows: void setup() { Serial.begin(9600); primeChecker(23); // We called our function passing our number to test. primeChecker(65); primeChecker(235); primeChecker(4543); primeChecker(4241); } void loop() { } int primeChecker(int n) { int i; //driver variable for the loop for (i = 2; i <= n / 2; ++i) //loop continued until the half of the numebr { if (n % i == 0) return Serial.println(“Not a prime number“); // returned the number status } return Serial.println(“A prime number“); } This is a very simple code. We just called our primeChecker() function and passed our numbers. Inside our primeChecker() function, we wrote the logic to check our number. Now let’s look at the output: From the output, we can see that, other than 23 and 4,241, none of the numbers are prime. Let’s look at an example, where we will write four functions: add(), sub(), mul(), and divi(). Into these functions, we will pass two numbers, and print the value on the serial monitor. The four functions can be defined as follows: float sum(float a, float b) { float sum = a + b; return sum; } float sub(float a, float b) { float sub = a - b; return sub; } float mul(float a, float b) { float mul = a * b; return mul; } float divi(float a, float b) { float divi = a / b; return divi; } Now write the rest of the code, which will give the following outputs: Usages of functions You may wonder which type of function we should use. The answer is simple. The usages of the functions are dependent on the operations of the programs. But whatever the function is, I would suggest to do only a single task with a function. Do not do multiple tasks inside a function. This will usually speed up your processing time and the calculation time of the code. You may also want to know why we even need to use functions. Well, there a number of uses of functions, as follows: Functions help programmers write more organized code Functions help to reduce errors by simplifying code Functions make the whole code smaller Functions create an opportunity to use the code multiple times Exercise To extend your knowledge of functions, you may want to do the following exercise: Write a program to check if a number is even or odd. (Hint: you may remember the % operator). Write a function that will find the largest number among four numbers. 
The numbers will be passed as arguments through the function. (Hint: use if-else conditions). Suppose you work in a garment factory. They need to know the area of a cloth. The area can be in float. They will provide you the length and height of the cloth. Now write a program using functions to find out the area of the cloth. (Use basic calculation in the user-defined function). Summary This article gave us a hint about the functions that Arduino perform. With this information we can create even more programs that is supportive in nature with Arduino. Resources for Article: Further resources on this subject: Connecting Arduino to the Web [article] Getting Started with Arduino [article] Arduino Development [article]
Preparing the Initial Two Nodes

Packt
02 Mar 2017
8 min read
In this article by Carlos R. Morrison the authors of the book Build Supercomputers with Raspberry Pi 3, we will learn following topics: Preparing master node Transferring the code Preparing slave node (For more resources related to this topic, see here.) Preparing master node During boot-up, select the US or (whatever country you reside in) keyboard option located at the middle-bottom of the screen. After boot-up completes, start with the following figure. Click on Menu, then Preferences. Select Raspberry Pi Configuration, refer the following screenshot: Menu options The System tab appears (see following screenshot). Type in a suitable Hostname for your master Pi. Auto login is already checked: System tab Go ahead and change the password to your preference. Refer the following screenshot: Change password Select the Interfaces tab (see following screenshot). Click Enable on all options, especially Secure Shell (SSH), as you will be remotely logging into the Pi2 or Pi3 from your main PC. The other options are enabled for convenience, as you may be using the Pi2 or Pi3 in other projects outside of supercomputing: Interface options The next important step is to boost the processing speed from 900 MHz to 1000 MHz or 1 GHz (this option is only available on the Pi2). Click on the Performance tab (see following screenshot) and select High (1000MHz). You are indeed building a supercomputer, so you need to muster all the available computing muscle. Leave the GPU Memory as is (default). The author used a 32-GB SD card in the master Pi2, hence the 128 Mb default setting: Performance, overclock option Changing the processor clock speed requires reboot of the Pi2. Go ahead and click the Yes button. After reboot, the next step is to update and upgrade the Pi2 or Pi3 software. Go ahead and click on the Terminal icon located at the top left of the Pi2 or Pi3 monitor. Refer the following screenshot: Terminal icon After the terminal window appears, enter sudo apt-get update at the “$” prompt. Refer the following screenshot: Terminal screen This update process takes several minutes. After update concludes, enter sudo apt-get upgrade; again, this upgrade process takes several minutes. At the completion of the upgrade, enter at the $ prompt each of the following commands: sudo apt-get install build-essential sudo apt-get install manpages-dev sudo apt-get install gfortran sudo apt-get install nfs-common sudo apt-get install nfs-kernel-server sudo apt-get install vim sudo apt-get install openmpi-bin sudo apt-get install libopenmpi-dev sudo apt-get install openmpi-doc sudo apt-get install keychain sudo apt-get install nmap These updates and installs will allow you to edit (vim), and run Fortran, C, MPI codes, and allow you to manipulate and further configure your master Pi2 or Pi3. We will now transfer codes from the main PC to the master Pi2 or Pi3. Transferring the code IP address The next step is to transfer your codes from the main PC to the master Pi2 or Pi3, after which you will again compile and run the codes, same as you did earlier. But before we proceed, you need to ascertain the IP address of the master Pi2 or Pi3. Go ahead and enter the command ifconfig, in the Pi terminal window: The author’s Pi2 IP address is 192.168.0.9 (see second line of the displayed text). The MAC address is b8:27:eb:81:e5:7d (see first line of displayed text), and the net mask address is 255.255.255.0. Write down these numbers, as they will be needed later when configuring the Pis and switch. 
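If you would rather capture these values from the command line than copy them from the screen, the following shell sketch is one way to do it; the interface name (eth0), the output file, and the subnet range are assumptions that should be adjusted for your own network:

hostname -I                          # prints the Pi's IP address(es)
cat /sys/class/net/eth0/address      # prints the MAC address of eth0
{ hostname -I; cat /sys/class/net/eth0/address; } > ~/network-info.txt
# From the main PC, a ping scan with nmap lists the hosts that are up on
# the subnet, which is handy for locating each Pi as you add nodes:
nmap -sn 192.168.0.0/24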
Your IP address may be similar, possibly except for the last two highlighted numbers depicted previously. Return to your main PC, and list the contents in the code folder located in the Desktop directory; that is, type, and enter ls -la. The list of the folder content is displayed. You can now Secure File Transfer Protocol (SFTP) your codes to your master Pi. Note the example as shown in following screenshot. The author’s processor name is gamma in the Ubuntu/Linux environment: If you are not in the correct code folder, change directory by typing cd Desktop, followed by the path to your code files. The author’s files are stored in the Book_b folder. The requisite files are highlighted in red. At the $ prompt, enter sftp pi@192.168.0.9, using, of course, your own Pi IP address following the @ character. You will be prompted for a password. Enter the password for your Pi. At the sftp> prompt, enter put MPI_08_b.c, again replacing the author’s file name with your own, if so desired. Go ahead and sftp the other files also. Next, enter Exit. You should now be back at your code folder. Now, here comes the fun part. You will, from here onward, communicate remotely with your Pi from your main PC, same as the way hackers due to remote computers. So, go ahead now and log into your master Pi. Enter ssh pi@192.168.09 at the $ prompt, and enter your password. Now do a listing of the files in the home directory; enter ls -la, refer the following screenshot: You should see the files (the files shown in the previous figure) in your home directory that you recently sftp over from your main PC. You can now roam around freely inside your Pi, and see what secret data can be stolen – never mind, I’m getting a little carried away here. Go ahead and compile the requisite files, call-procs.c, C_PI.c, and MPI_08_b.c, by entering the command mpicc [file name.c] -o [file name] –lm in each case. The extension -lm is required when compiling C files containing the math header file <math.h>. Execute the file call-procs, or the file name you created, using the command mpiexec -n 1 call-procs. After the first execution, use a different process number in each subsequent execution. Your run should approximate to the ones depicted following. Note that, because of the multithreaded nature of the program, the number of processes executes in random order, and there is clearly no dependence on one another: Note, from here onward, the run data was from a Pi2 computer. The Pi3 computer will give a faster run time (its processor clock speed is 1.2 GHz, as compared to the Pi2, which has an overclocked speed of 1 GHz). Execute the serial Pi code C_PI as depicted following: Execute the M_PI code MPI_08_b, as depicted following: The difference in execution time between the four cores on gamma (0m59.076s), and the four cores on Mst0 (16m35.140s). Each core in gamma is running at 4 GHz, while each core in Mst0 is running at 1 GHz. Core clock speed matters. Preparing slave node You will now prepare or configure the first slave node of your supercomputer. Switch the HDMI monitor cable from the master Pi to the slave Pi. Check to see whether the SD NOOBS/Raspbian card is inserted in the drive. Label side should face outwards. The drive is spring-loaded, so you must gently insert the card. To remove the card, apply a gentle inward pressure to unlock the card. Insert the slave power cord into a USB power slot adjacent the master Pi, in the rapid charger. 
Follow the same procedure outlined previously for installing, updating, and upgrading the Raspbian OS on the master Pi, this time naming the slave node Slv1, or any other name you desire. Use the same password for all the Pi2s or Pi3s in the cluster, as it simplifies the configuration procedure. At the completion of the update, upgrade, and installs, acquire the IP address of the slave1 Pi, as you will be remotely logging into it. Use the command ifconfig to obtain its IP address. Return to your main PC, and use sftp to transfer the requisite files from your main computer – same as you did for the master Pi2. Compile the transferred .c files. Test or run the codes as discussed earlier. You are now ready to configure the master Pi and slave Pi for communication between themselves and the main PC. We now discuss configuring the static IP address for the Pis, and the switch they are connected to. Summary In this article we have learned how to prepare master node and slave node, and also how to transfer the code for supercomputer. Resources for Article: Further resources on this subject: Bug Tracking [article] API with MongoDB and Node.js [article] Basic Website using Node.js and MySQL database [article]