How-To Tutorials

14 Sep 2015

6 min read

Getting Started – Understanding Citrix XenDesktop and its Architecture

14 Sep 2015

In this article written by Gurpinder Singh, author of the book Troubleshooting Citrix Xendesktop, the author wants us to learn about the following topics: Hosted shared vs hosted virtual desktops Citrix FlexCast delivery technology Modular framework architecture What's new in XenDesktop 7.x (For more resources related to this topic, see here.) Hosted shared desktops (HSD) vs hosted virtual desktops (HVD) Instead of going through the XenDesktop architecture; firstly, we would like to explain the difference between the two desktop delivery platforms HSD and HVD. It is a common question that is asked by every System Administrator whenever there is a discussion on the most suited desktop delivery platform for the enterprises. Desktop Delivery platform depends on the requirements for the enterprise. Some choose Hosted Shared Desktops (HSD)or Server Based Computing (XenApp) over Hosted Virtual Desktop (XenDesktop); where single server desktop is shared among multiple users, and the environment is locked down using Active Directory GPOs. XenApp is cost effective platform when compared between XenApp and XenDesktop and many small to mid-sized enterprises prefer to choose this platform due to its cost benefits and less complexity. However, the preceding model does pose some risks to the environment as the same server is being shared by multiple users and a proper design plan is required to configure proper HSD or XenApp Published Desktop environment. Many enterprises have security and other user level dependencies where they prefer to go with hosted virtual desktops solution. Hosted virtual desktop or XenDesktop runs a Windows 7 or Windows 8 desktop running as virtual machine hosted on a data centre. In this model, single user connects to single desktop and therefore, there is a very less risk of having desktop configuration impacted for all users. XenDesktop 7.x and above versions now also enable you to deliver server based desktops (HSD) along with HVD within one product suite. XenDesktop also provides HVD pooled desktops which work on a shared OS image concept which is similar to HSD desktops with a difference of running Desktop Operating System instead of Server Operating System. Please have a look at the following table which would provide you a fair idea on the requirement and recommendation on both delivery platforms for your enterprise. Customer Requirement Delivery Platform User needs to work on one or two applications and often need not to do any updates or installation on their own. Hosted Shared Desktop User work on their own core set of applications for which they need to change system level settings, installations and so on. Hosted virtual Desktops (Dedicated) User works on MS Office and other content creation tools Hosted Shared Desktop User needs to work on CPU and graphic intensive applications that requires video rendering Hosted Virtual Desktop (Blade PCs) User needs to have admin privileges to work on specific set of applications. Hosted Virtual Desktop (Pooled) You can always have mixed set of desktop delivery platforms in your environment focussed on the customer need and requirements. Citrix FlexCast delivery technology Citrix FlexCast is a delivery technology that allows Citrix administrator to personalize virtual desktops to meet the performance, security and flexibility requirements of end users. There are different types of user requirements; some need standard desktops with standard set of apps and others require high performance personalized desktops. Citrix has come up with a solution to meet these demands with Citrix FlexCast Technology. You can deliver any kind of virtualized desktop with FlexCast technology, there are five different categories in which FlexCast models are available. Hosted Shared or HSD Hosted Virtual Desktop or HVD Streamed VHD Local VMs On-Demand Apps The detailed discussion on these models is out of scope for this article. To read more about the FlexCast models, please visit http://support.citrix.com/article/CTX139331. Modular framework architecture To understand the XenDesktop architecture, it is better to break down the architecture into discrete independent modules rather than visualizing it as an integrated one single big piece. Citrix provided this modularized approach to design and architect XenDesktop to solve end customers set of requirements and objectives. This modularized approach solves customer requirements by providing a platform that is highly resilient, flexible and scalable. This reference architecture is based on information gathered by multiple Citrix consultants working on a wide range of XenDesktop implementations. Have a look at the basic components of the XenDesktop architecture that everyone should be aware of before getting involved with troubleshooting: We won't be spending much time on understanding each component of the reference architecture, http://www.citrix.com/content/dam/citrix/en_us/documents/products-solutions/xendesktop-deployment-blueprint.pdf in detail as this is out of scope for this book. We would be going through each component quickly. What's new in XenDesktop 7.x With the release of Citrix XenDesktop 7, Citrix has introduced a lot of improvements over previous releases. With every new product release, there is lot of information published and sometimes it becomes very difficult to get the key information that all system administrators would be looking for to understand what has been changed and what the key benefits of the new release are. The purpose of this section would be to highlight the new key features that XenDesktop 7.x brings to the kitty for all Citrix administrators. This section would not provide you all the details regarding the new features and changes that XenDesktop 7.x has introduced but highlights the key points that every Citrix administrator should be aware of while administrating XenDesktop 7. Key Highlights: XenApp and XenDesktop are part of now single setup Cloud integration to support desktop deployments on the cloud IMA database doesn't exist anymore IMA is replaced by FMA (Flexcast Management Architecture) Zones Concept are no more zones or ZDC (Data Collectors) Microsoft SQL is the only supported Database Sites are used instead of Farms XenApp and XenDesktop can now share consoles, Citrix Studio and Desktop Director are used for both products Shadowing feature is deprecated; Citrix recommends Microsoft Remote Assistance to be used Locally installed applications integrated to be used with Server based desktops HDX & mobility features Profile Management is included MCS can now be leveraged for both Server & Desktop OS MCS now works with KMS Storefront replaces Web Interface Remote-PC Access No more Citrix Streaming Profile Manager; Citrix recommends MS App-V Core component is being replaced by a VDA agent Summary We should now have a basic understanding on desktop virtualization concepts, Architecture, new features in XenDesktop 7.x, XenDesktop delivery models based on FlexCast Technology. Resources for Article: Further resources on this subject: High Availability, Protection, and Recovery using Microsoft Azure [article] Designing a XenDesktop® Site [article] XenMobile™ Solutions Bundle [article]

0
0
20630

Packt

14 Sep 2015

41 min read

Understanding the Datastore

Packt

14 Sep 2015

41 min read

0
0
2744

How-To Tutorials

Packt

14 Sep 2015

9 min read

Apache Spark

Packt

14 Sep 2015

9 min read

In this article by Mike, author of the book Mastering Apache Spark many Hadoop-based tools built on Hadoop CDH cluster are introduced. (For more resources related to this topic, see here.) His premise, when approaching any big data system, is that none of the components exist in isolation. There are many functions that need to be addressed in a big data system with components passing data along an ETL (Extract Transform and Load) chain, or calling the subcomponents to carry out processing. Some of the functions are: Data Movement Scheduling Storage Data Acquisition Real Time Data Processing Batch Data Processing Monitoring Reporting This list is not exhaustive, but it gives you an idea of the functional areas that are involved. For instance, HDFS (Hadoop Distributed File System) might be used for storage, Oozie for scheduling, Hue for monitoring, and Spark for real-time processing. His point, though, is that none of these systems exists in isolation; they either exist in an ETL chain when processing data, and rely on other sub components as in Oozie, or depend on other components to provide functionality that they do not have. His contention is that integration between big data systems is an important factor. One needs to consider from where the data is coming, how it will be processed, and where it is then going to. Given this consideration, the integration options for a big data component need to be investigated both, in terms of what is available now, and what might be available in the future. In the book, the author has distributed the system functionality by chapters, and tried to determine what tools might be available to carry out these functions. Then, with the help of simple examples by using code and data, he has shown how the systems might be used together. The book is based upon Apache Spark, so as you might expect, it investigates the four main functional modules of Spark: MLlib for machine learning Streaming for the data stream processing SQL for data processing in a tabular format GraphX for graph-based processing However, the book attempts to extend these common, real-time big data processing areas by examining extra areas such as graph-based storage and real-time cloud-based processing via Databricks. It provides examples of integration with external tools, such as Kafka and Flume, as well as Scala-based development examples. In order to Spark your interest, and prepare you for the book's contents, he has described the contents of the book by subject, and given you a sample of the content. Overview The introduction sets the scene for the book by examining topics such as Spark cluster design, and the choice of cluster managers. It considers the issues, affecting the cluster performance, and explains how real-time big data processing can be carried out in the cloud. The following diagram, describes the topics that are explained in the book: The Spark Streaming examples are provided along with details for checkpointing to avoid data loss. Installation and integration examples are provided for Kafka (messaging) and Flume (data movement). The functionality of Spark MLlib is extended via 0xdata H2O, and a deep learning example neural system is created and tested. The Spark SQL is investigated, and integrated with Hive to show that Spark can become a real-time processing engine for Hive. Spark storage is considered, by example, using Aurelius (Datastax) Titan along with underlying storage in HBase and Cassandra. The use of Tinkerpop and Gremlin shell are explained by example for graph processing. Finally, of course many, methods of integrating Spark to HDFS are shown with the help of an example. This gives you a flavor of what is in the book, but it doesn't give you the detail. Keep reading to find out what is in each area. Spark MLlib Spark MLlib examines data classification with Naïve Bayes, data clustering with K-Means, and neural processing with ANN (Artificial Neural Network). If these terms do not mean anything to you, don't worry. They are explained both, in terms of theory, and then practically with examples. The author has always been interested in neural networks, and was pleased to be able to base the ANN section on the work by Bert Greevenbosch (www.bertgreevenbosch.nl). This allows to show how Apache Spark can be built from source code, and be extended in the same process with extra functionality. The following diagram shows a real, biological neuron to the left, and a simulated neuron to the right. It also explains how computational neurons are simulated in a step-by-step process from real neurons in your head. It then goes on to describe how neural networks are created, and how processing takes place. It's an interesting topic. The integration of big data systems, and neural processing. Spark Streaming An important issue, when processing stream-based data, is failure recover. Here, we examine error recovery, and checkpointing with the help of an example for Apache Spark. It also provides examples for TCP, file, Flume, and Kafka-based stream processing using Spark. Even though he has provided step-by-step, code-based examples, data stream processing can become complicated. He has tried to reduce complexity, so that learning does not become a challenge. For example, when introducing a Kafka-based example, The following diagram is used to explain the test components with the data flow, and the component set up in a logical, step-by-step manner: Spark SQL When introducing Spark SQL, he has described the data file formats that might be used to assist with data integration. Then move on to describe with the help of an example the use of the data frames, followed closely by practical SQL examples. Finally, integration with Apache Hive is introduced to provide big data warehouse real-time processing by example. The user-defined functions are also explained, showing how they can be defined in multiple ways, and be used with Spark SQL. Spark GraphX Graph processing is examined by showing how a simple graph can be created in Scala. Then, sample graph algorithms are introduced like PageRank and Triangles. With permission from Kenny Bastani (http://www.kennybastani.com/), the Mazerunner prototype application is discussed. A step-by-step approach is described by which Docker, Neo4j, and Mazerunner can be installed. Then, the functionality of both, Neo4j and Mazerunner, is used to move the data between Neo4j and HDFS. The following diagram gives an overview of the architecture that will be introduced: Spark storage Apache Spark is a highly functional, real-time, distributed big data processing system. However, it does not provide any data storage. In many places within the book, the examples are provided for using HDFS-based storage, but what if you want graph-based storage? What if you want to process and store data as a graph? The Aurelius (Datastax) Titan graph database is examined in the book. The underlying storage options with Cassandra, and HBase are used with Scala examples. The graph-based processing is examined using Tinkerpop and Gremlin-based scripts. Using a simple, example-based approach, both: the architecture involved, and multiple ways of using Gremlin shell are introduced in the following diagram: Spark H2O While Apache Spark is highly functional and agile, allowing data to move easily between its modules, how might we extend it? By considering the H2O product from http://h2o.ai/, the machine learning functionality of Apache Spark can be extended. H2O plus Spark equals Sparkling Water. Sparkling Water is used to create a deep learning neural processing example for data processing. The H2O web-based Flow application is also introduced for analytics, and data investigation. Spark Databricks Having created big data processing clusters on the physical machines, the next logical step is to move processing into the cloud. This might be carried out by obtaining cloud-based storage, using Spark as a cloud-based service, or using a Spark-based management system. The people who designed Apache Spark have created a Spark cloud-based processing platform called https://databricks.com/. He has dedicated two chapters in the book to this service, because he feels that it is important to investigate the future trends. All the aspects of Databricks are examined from the user and cluster management to the use of Notebooks for data processing. The languages that can be used are investigated as the ways of developing code on local machines, and then they can be moved to the cloud, in order to save money. The data import is examined with examples, as is the DbUtils package for data processing. The REST interface for the Spark cloud instance management is investigated, because it offers integration options between your potential cloud instance, and the external systems. Finally, options for moving data and functionality are investigated in terms of data and folder import/export, along with library import, and cluster creation on demand. Databricks visualisation The various options of cloud-based big data visualization using Databricks are investigated. Multiple ways are described for creating reports with the help of tables and SQL bar graphs. Pie charts and world maps are used to present data. Databricks allows geolocation data to be combined with your raw data to create geographical real-time charts. The following figure, taken from the book, shows the result of a worked example, combining GeoNames data with geolocation data. The color coded country-based data counts are the result. It's difficult to demonstrate this in a book, but imagine this map, based upon the stream-based data, and continuously updating in real time. In a similar way, it is possible to create dashboards from your Databricks reports, and make them available to your external customers via a web-based URL. Summary Mike hopes that this article has given you an idea of the book's contents. And also that it has intrigued you, so that you will search out a copy of the Spark-based book, Mastering Apache Spark, and try out all of these examples for yourself. The book comes with a code package that provides the example-based sample code, as well as build and execution scripts. This should provide you with an easy start, and a platform to build your own Spark based-code. Resources for Article: Further resources on this subject: Sabermetrics with Apache Spark[article] Getting Started with Apache Spark[article] Machine Learning Using Spark MLlib[article]

0
0
2801

article-image-introducing-boost-c-libraries

Packt

14 Sep 2015

22 min read

Introducing the Boost C++ Libraries

Packt

14 Sep 2015

22 min read

In this article written by John Torjo and Wisnu Anggoro, authors of the book Boost.Asio C++ Network Programming - Second Edition, the authors state that "Many programmers have used libraries since this simplifies the programming process. Because they do not need to write the function from scratch anymore, using a library can save much code development time". In this article, we are going to get acquainted with Boost C++ libraries. Let us prepare our own compiler and text editor to prove the power of Boost libraries. As we do so, we will discuss the following topics: Introducing the C++ standard template library Introducing Boost C++ libraries Setting up Boost C++ libraries in MinGW compiler Building Boost C++ libraries Compiling code that contains Boost C++ libraries (For more resources related to this topic, see here.) Introducing the C++ standard template library The C++ Standard Template Library (STL) is a generic template-based library that offers generic containers among other things. Instead of dealing with dynamic arrays, linked lists, binary trees, or hash tables, programmers can easily use an algorithm that is provided by STL. The STL is structured by containers, iterators, and algorithms, and their roles are as follows: Containers: Their main role is to manage the collection of objects of certain kinds, such as arrays of integers or linked lists of strings. Iterators: Their main role is to step through the element of the collections. The working of an iterator is similar to that of a pointer. We can increment the iterator by using the ++ operator and access the value by using the * operator. Algorithms: Their main role is to process the element of collections. An algorithm uses an iterator to step through all elements. After it iterates the elements, it processes each element, for example, modifying the element. It can also search and sort the element after it finishes iterating all the elements. Let us examine the three elements that structure STL by creating the following code: /* stl.cpp */ #include <vector> #include <iostream> #include <algorithm> int main(void) { int temp; std::vector<int> collection; std::cout << "Please input the collection of integer numbers, input 0 to STOP!n"; while(std::cin >> temp != 0) { if(temp == 0) break; collection.push_back(temp); } std::sort(collection.begin(), collection.end()); std::cout << "nThe sort collection of your integer numbers:n"; for(int i: collection) { std::cout << i << std::endl; } } Name the preceding code stl.cpp, and run the following command to compile it: g++ -Wall -ansi -std=c++11 stl.cpp -o stl Before we dissect this code, let us run it to see what happens. This program will ask users to enter as many as integer, and then it will sort the numbers. To stop the input and ask the program to start sorting, the user has to input 0. This means that 0 will not be included in the sorting process. Since we do not prevent users from entering non-integer numbers such as 3.14, or even the string, the program will soon stop waiting for the next number after the user enters a non-integer number. The code yields the following output: We have entered six integer: 43, 7, 568, 91, 2240, and 56. The last entry is 0 to stop the input process. Then the program starts to sort the numbers and we get the numbers sorted in sequential order: 7, 43, 56, 91, 568, and 2240. Now, let us examine our code to identify the containers, iterators, and algorithms that are contained in the STL. std::vector<int> collection; The preceding code snippet has containers from STL. There are several containers, and we use a vector in the code. A vector manages its elements in a dynamic array, and they can be accessed randomly and directly with the corresponding index. In our code, the container is prepared to hold integer numbers so we have to define the type of the value inside the angle brackets <int>. These angle brackets are also called generics in STL. collection.push_back(temp); std::sort(collection.begin(), collection.end()); The begin() and end() functions in the preceding code are algorithms in STL. They play the role of processing the data in the containers that are used to get the first and the last elements in the container. Before that, we can see the push_back() function, which is used to append an element to the container. for(int i: collection) { std::cout << i << std::endl; } The preceding for block will iterate each element of the integer which is called as collection. Each time the element is iterated, we can process the element separately. In the preceding example, we showed the number to the user. That is how the iterators in STL play their role. #include <vector> #include <algorithm> We include vector definition to define all vector functions and algorithm definition to invoke the sort() function. Introducing the Boost C++ libraries The Boost C++ libraries is a set of libraries to complement the C++ standard libraries. The set contains more than a hundred libraries that we can use to increase our productivity in C++ programming. It is also used when our requirements go beyond what is available in the STL. It provides source code under Boost Licence, which means that it allows us to use, modify, and distribute the libraries for free, even for commercial use. The development of Boost is handled by the Boost community, which consists of C++ developers from around the world. The mission of the community is to develop high-quality libraries as a complement to STL. Only proven libraries will be added to the Boost libraries. For detailed information about Boost libraries go to www.boost.org. And if you want to contribute developing libraries to Boost, you can join the developer mailing list at lists.boost.org/mailman/listinfo.cgi/boost. The entire source code of the libraries is available on the official GitHub page at github.com/boostorg. Advantages of Boost libraries As we know, using Boost libraries will increase programmer productivity. Moreover, by using Boost libraries, we will get advantages such as these: It is open source, so we can inspect the source code and modify it if needed. Its license allows us to develop both open source and close source projects. It also allows us to commercialize our software freely. It is well documented and we can find it libraries all explained along with sample code from the official site. It supports almost any modern operating system, such as Windows and Linux. It also supports many popular compilers. It is a complement to STL instead of a replacement. It means using Boost libraries will ease those programming processes which are not handled by STL yet. In fact, many parts of Boost are included in the standard C++ library. Preparing Boost libraries for MinGW compiler Before we go through to program our C++ application by using Boost libraries, the libraries need to be configured in order to be recognized by MinGW compiler. Here we are going to prepare our programming environment so that our compiler is able use Boost libraries. Downloading Boost libraries The best source from which to download Boost is the official download page. We can go there by pointing our internet browser to www.boost.org/users/download. Find the Download link in Current Release section. At the time of writing, the current version of Boost libraries is 1.58.0, but when you read this article, the version may have changed. If so, you can still choose the current release because the higher version must be compatible with the lower. However, you have to adjust as we're goning to talk about the setting later. Otherwise, choosing the same version will make it easy for you to follow all the instructions in this article. There are four file formats to be choose from for download; they are .zip, .tar.gz, .tar.bz2, and .7z. There is no difference among the four files but their file size. The largest file size is of the ZIP format and the lowest is that of the 7Z format. Because of the file size, Boost recommends that we download the 7Z format. See the following image for comparison: We can see, from the preceding image, the size of ZIP version is 123.1 MB while the size of the 7Z version is 65.2 MB. It means that the size of the ZIP version is almost twice that of the 7Z version. Therefore they suggest that you choose the 7Z format to reduce download and decompression time. Let us choose boost_1_58_0.7z to be downloaded and save it to our local storage. Deploying Boost libraries After we have got boost_1_58_0.7z in our local storage, decompress it using the 7ZIP application and save the decompression files to C:boost_1_58_0. The 7ZIP application can be grabbed from www.7-zip.org/download.html. The directory then should contain file structures as follows: Instead of browsing to the Boost download page and searching for the Boost version manually, we can go directly to sourceforge.net/projects/boost/files/boost/1.58.0. It will be useful when the 1.58.0 version is not the current release anymore. Using Boost libraries Most libraries in Boost are header-only; this means that all declarations and definitions of functions, including namespaces and macros, are visible to the compiler and there is no need to compile them separately. We can now try to use Boost with the program to convert the string into int value as follows: /* lexical.cpp */ #include <boost/lexical_cast.hpp> #include <string> #include <iostream> int main(void) { try { std::string str; std::cout << "Please input first number: "; std::cin >> str; int n1 = boost::lexical_cast<int>(str); std::cout << "Please input second number: "; std::cin >> str; int n2 = boost::lexical_cast<int>(str); std::cout << "The sum of the two numbers is "; std::cout << n1 + n2 << "n"; return 0; } catch (const boost::bad_lexical_cast &e) { std::cerr << e.what() << "n"; return 1; } } Open the Notepad++ application, type the preceding code, and save it as lexical.cpp in C:CPP. Now open the command prompt, point the active directory to C:CPP, and then type the following command: g++ -Wall -ansi lexical.cpp –Ic:boost_1_58_0 -o lexical We have a new option here, which is –I (the "include" option). This option is used along with the full path of the directory to inform the compiler that we have another header directory that we want to include to our code. Since we store our Boost libraries in c: boost_1_58_0, we can use –Ic:boost_1_58_0 as an additional parameter. In lexical.cpp, we apply boost::lexical_cast to convert string type data into int type data. The program will ask the user to input two numbers and will then automatically find the sum of both numbers. If a user inputs an inappropriate number, it will inform that an error has occurred. The Boost.LexicalCast library is provided by Boost for casting data type purpose (converting numeric types such as int, double, or floats into string types, and vice versa). Now let us dissect lexical.cpp to for a more detailed understanding of what it does: #include <boost/lexical_cast.hpp> #include <string> #include <iostream> We include boost/lexical_cast.hpp because the boost::lexical_cast function is declared lexical_cast.hpp header file whilst string header is included to apply std::string function and iostream header is included to apply std::cin, std::cout and std::cerr function. Other functions, such as std::cin and std::cout, and we saw what their functions are so we can skip those lines. #include <boost/lexical_cast.hpp> #include <string> #include <iostream> We used the preceding two separate lines to convert the user-provided input string into the int data type. Then, after converting the data type, we summed up both of the int values. We can also see the try-catch block in the preceding code. It is used to catch the error if user inputs an inappropriate number, except 0 to 9. catch (const boost::bad_lexical_cast &e) { std::cerr << e.what() << "n"; return 1; } The preceding code snippet will catch errors and inform the user what the error message exactly is by using boost::bad_lexical_cast. We call the e.what() function to obtain the string of the error message. Now let us run the application by typing lexical at the command prompt. We will get output like the following: I put 10 for first input and 20 for the second input. The result is 30 because it just sums up both input. But what will happen if I put in a non-numerical value, for instance Packt. Here is the output to try that condition: Once the application found the error, it will ignore the next statement and go directly to the catch block. By using the e.what() function, the application can get the error message and show it to the user. In our example, we obtain bad lexical cast: source type value could not be interpreted as target as the error message because we try to assign the string data to int type variable. Building Boost libraries As we discussed previously, most libraries in Boost are header-only, but not all of them. There are some libraries that have to be built separately. They are: Boost.Chrono: This is used to show the variety of clocks, such as current time, the range between two times, or calculating the time passed in the process. Boost.Context: This is used to create higher-level abstractions, such as coroutines and cooperative threads. Boost.Filesystem: This is used to deal with files and directories, such as obtaining the file path or checking whether a file or directory exists. Boost.GraphParallel: This is an extension to the Boost Graph Library (BGL) for parallel and distributed computing. Boost.IOStreams: This is used to write and read data using stream. For instance, it loads the content of a file to memory or writes compressed data in GZIP format. Boost.Locale: This is used to localize the application, in other words, translate the application interface to user's language. Boost.MPI: This is used to develop a program that executes tasks concurrently. MPI itself stands for Message Passing Interface. Boost.ProgramOptions: This is used to parse command-line options. Instead of using the argv variable in the main parameter, it uses double minus (--) to separate each command-line option. Boost.Python: This is used to parse Python language in C++ code. Boost.Regex: This is used to apply regular expression in our code. But if our development supports C++11, we do not depend on the Boost.Regex library anymore since it is available in the regex header file. Boost.Serialization: This is used to convert objects into a series of bytes that can be saved and then restored again into the same object. Boost.Signals: This is used to create signals. The signal will trigger an event to run a function on it. Boost.System: This is used to define errors. It contains four classes: system::error_code, system::error_category, system::error_condition, and system::system_error. All of these classes are inside the boost namespace. It is also supported in the C++11 environment, but because many Boost libraries use Boost.System, it is necessary to keep including Boost.System. Boost.Thread: This is used to apply threading programming. It provides classes to synchronize access on multiple-thread data. It is also supported in C++11 environments, but it offers extensions, such as we can interrupt thread in Boost.Thread. Boost.Timer: This is used to measure the code performance by using clocks. It measures time passed based on usual clock and CPU time, which states how much time has been spent to execute the code. Boost.Wave: This provides a reusable C preprocessor that we can use in our C++ code. There are also a few libraries that have optional, separately compiled binaries. They are as follows: Boost.DateTime: It is used to process time data; for instance, calendar dates and time. It has a binary component that is only needed if we use to_string, from_string, or serialization features. It is also needed if we target our application in Visual C++ 6.x or Borland. Boost.Graph: It is used to create two-dimensional graphics. It has a binary component that is only needed if we intend to parse GraphViz files. Boost.Math: It is used to deal with mathematical formulas. It has binary components for cmath functions. Boost.Random: It is used to generate random numbers. It has a binary component which is only needed if we want to use random_device. Boost.Test: It is used to write and organize test programs and their runtime execution. It can be used in header-only or separately compiled mode, but separate compilation is recommended for serious use. Boost.Exception: It is used to add data to an exception after it has been thrown. It provides non-intrusive implementation of exception_ptr for 32-bit _MSC_VER==1310 and _MSC_VER==1400, which requires a separately compiled binary. This is enabled by #define BOOST_ENABLE_NON_INTRUSIVE_EXCEPTION_PTR. Let us try to recreate the random number generator. But now we will use the Boost.Random library instead of std::rand() from the C++ standard function. Let us take a look at the following code: /* rangen_boost.cpp */ #include <boost/random/mersenne_twister.hpp> #include <boost/random/uniform_int_distribution.hpp> #include <iostream> int main(void) { int guessNumber; std::cout << "Select number among 0 to 10: "; std::cin >> guessNumber; if(guessNumber < 0 || guessNumber > 10) { return 1; } boost::random::mt19937 rng; boost::random::uniform_int_distribution<> ten(0,10); int randomNumber = ten(rng); if(guessNumber == randomNumber) { std::cout << "Congratulation, " << guessNumber << " is your lucky number.n"; } else { std::cout << "Sorry, I'm thinking about number " << randomNumber << "n"; } return 0; } We can compile the preceding source code by using the following command: g++ -Wall -ansi -Ic:/boost_1_58_0 rangen_boost.cpp -o rangen_boost Now, let us run the program. Unfortunately, for the three times that I ran the program, I always obtained the same random number as follows: As we can see from this example, we always get number 8. This is because we apply Mersenne Twister, a Pseudorandom Number Generator (PRNG), which uses the default seed as a source of randomness so it will generate the same number every time the program is run. And of course it is not the program that we expect. Now, we will rework the program once again, just in two lines. First, find the following line: #include <boost/random/mersenne_twister.hpp> Change it as follows: #include <boost/random/random_device.hpp> Next, find the following line: boost::random::mt19937 rng; Change it as follows: boost::random::random_device rng; Then, save the file as rangen2_boost.cpp and compile the rangen2_boost.cpp file by using the command like we compiled rangen_boost.cpp. The command will look like this: g++ -Wall -ansi -Ic:/boost_1_58_0 rangen2_boost.cpp -o rangen2_boost Sadly, there will be something wrong and the compiler will show the following error message: cc8KWVvX.o:rangen2_boost.cpp:(.text$_ZN5boost6random6detail20generate _uniform_intINS0_13random_deviceEjEET0_RT_S4_S4_N4mpl_5bool_ILb1EEE[_ ZN5boost6random6detail20generate_uniform_intINS0_13random_deviceEjEET 0_RT_S4_S4_N4mpl_5bool_ILb1EEE]+0x24f): more undefined references to boost::random::random_device::operator()()' follow collect2.exe: error: ld returned 1 exit status This is because, as we have discussed earlier, the Boost.Random library needs to be compiled separately if we want to use the random_device attribute. Boost libraries have a system to compile or build Boost itself, called Boost.Build library. There are two steps we have to achieve to install the Boost.Build library. First, run Bootstrap by pointing the active directory at the command prompt to C:boost_1_58_0 and typing the following command: bootstrap.bat mingw We use our MinGW compiler, as our toolset in compiling the Boost library. Wait a second and then we will get the following output if the process is a success: Building Boost.Build engine Bootstrapping is done. To build, run: .b2 To adjust configuration, edit 'project-config.jam'. Further information: - Command line help: .b2 --help - Getting started guide: http://boost.org/more/getting_started/windows.html - Boost.Build documentation: http://www.boost.org/build/doc/html/index.html In this step, we will find four new files in the Boost library's root directory. They are: b2.exe: This is an executable file to build Boost libraries. bjam.exe: This is exactly the same as b2.exe but it is a legacy version. bootstrap.log: This contains logs from the bootstrap process project-config.jam: This contains a setting that will be used in the building process when we run b2.exe. We also find that this step creates a new directory in C:boost_1_58_0toolsbuildsrcenginebin.ntx86 , which contains a bunch of .obj files associated with Boost libraries that needed to be compiled. After that, run the second step by typing the following command at the command prompt: b2 install toolset=gcc Grab yourself a cup of coffee after running that command because it will take about twenty to fifty minutes to finish the process, depending on your system specifications. The last output we will get will be like this: ...updated 12562 targets... This means that the process is complete and we have now built the Boost libraries. If we check in our explorer, the Boost.Build library adds C:boost_1_58_0stagelib, which contains a collection of static and dynamic libraries that we can use directly in our program. bootstrap.bat and b2.exe use msvc (Microsoft Visual C++ compiler) as the default toolset, and many Windows developers already have msvc installed on their machines. Since we have installed GCC compiler, we set the mingw and gcc toolset options in Boost's build. If you also have mvsc installed and want to use it in Boost's build, the toolset options can be omitted. Now, let us try to compile the rangen2_boost.cpp file again, but now with the following command: c:CPP>g++ -Wall -ansi -Ic:/boost_1_58_0 rangen2_boost.cpp - Lc:boost_1_58_0stagelib -lboost_random-mgw49-mt-1_58 - lboost_system-mgw49-mt-1_58 -o rangen2_boost We have two new options here, they are –L and –l. The -L option is used to define the path that contains the library file if it is not in the active directory. The –l option is used to define the name of library file but omitting the first lib word in front of the file name. In this case, the original library file name is libboost_random-mgw49-mt-1_58.a, and we omit the lib phrase and the file extension for option -l. The new file called rangen2_boost.exe will be created in C:CPP. But before we can run the program, we have to ensure that the directory which the program installed has contained the two dependencies library file. These are libboost_random-mgw49-mt-1_58.dll and libboost_system-mgw49-mt-1_58.dll, and we can get them from the library directory c:boost_1_58_0_1stagelib. Just to make it easy for us to run that program, run the following copy command to copy the two library files to C:CPP: copy c:boost_1_58_0_1stageliblibboost_random-mgw49-mt-1_58.dll c:cpp copy c:boost_1_58_0_1stageliblibboost_system-mgw49-mt-1_58.dll c:cpp And now the program should run smoothly. In order to create a network application, we are going to use the Boost.Asio library. We do not find Boost.Asio—the library we are going to use to create a network application—in the non-header-only library. It seems that we do not need to build the boost library since Boost.Asio is header-only library. This is true, but since Boost.Asio depends on Boost.System and Boost.System needs to be built before being used, it is important to build Boost first before we can use it to create our network application. For option –I and –L, the compiler does not care if we use backslash () or slash (/) to separate each directory name in the path because the compiler can handle both Windows and Unix path styles. Summary We saw that Boost C++ libraries were developed to complement the standard C++ library We have also been able to set up our MinGW compiler in order to compile the code which contains Boost libraries and build the binaries of libraries which have to be compiled separately. Please remember that though we can use the Boost.Asio library as a header-only library, it is better to build all Boost libraries by using the Boost.Build library. It will be easy for us to use all libraries without worrying about compiling failure. Resources for Article: Further resources on this subject: Actors and Pawns[article] What is Quantitative Finance?[article] Program structure, execution flow, and runtime objects [article]

0
0
13548

article-image-overview-common-machine-learning-tasks

Packt

14 Sep 2015

29 min read

Introducing Bayesian Inference

Packt

14 Sep 2015

29 min read

0
0
3455

Packt

14 Sep 2015

13 min read

Continuous Delivery and Continuous Deployment

Packt

14 Sep 2015

13 min read

0
0
2662

Packt

14 Sep 2015

10 min read

PostgreSQL in Action

Packt

14 Sep 2015

10 min read

In this article by Salahadin Juba, Achim Vannahme, and Andrey Volkov, authors of the book Learning PostgreSQL, we will discuss PostgreSQL (pronounced Post-Gres-Q-L) or Postgres is an open source, object-relational database management system. It emphasizes extensibility, creativity, and compatibility. It competes with major relational database vendors, such as Oracle, MySQL, SQL servers, and others. It is used by different sectors, including government agencies and the public and private sectors. It is cross-platform and runs on most modern operating systems, including Windows, Mac, and Linux flavors. It conforms to SQL standards and it is ACID complaint. (For more resources related to this topic, see here.) An overview of PostgreSQL PostgreSQL has many rich features. It provides enterprise-level services, including performance and scalability. It has a very supportive community and very good documentation. The history of PostgreSQL The name PostgreSQL comes from post-Ingres database. the history of PostgreSQL can be summarized as follows: Academia: University of California at Berkeley (UC Berkeley) 1977-1985, Ingres project: Michael Stonebraker created RDBMS according to the formal relational model 1986-1994, postgres: Michael Stonebraker created postgres in order to support complex data types and the object-relational model. 1995, Postgres95: Andrew Yu and Jolly Chen changed postgres to postgres query language (P) with an extended subset of SQL. Industry 1996, PostgreSQL: Several developers dedicated a lot of labor and time to stabilize Postgres95. The first open source version was released on January 29, 1997. With the introduction of new features, and enhancements, and at the start of open source projects, the Postgres95 name was changed to PostgreSQL. PostgreSQL began at version 6, with a very strong starting point by taking advantage of several years of research and development. Being an open source with a very good reputation, PostgreSQL has attracted hundreds of developers. Currently, PostgreSQL has innumerable extensions and a very active community. Advantages of PostgreSQL PostgreSQL provides many features that attract developers, administrators, architects, and companies. Business advantages of PostgreSQL PostgreSQL is free, open source software (OSS); it has been released under the PostgreSQL license, which is similar to the BSD and MIT licenses. The PostgreSQL license is highly permissive, and PostgreSQL is not a subject to monopoly and acquisition. This gives the company the following advantages. There is no associated licensing cost to PostgreSQL. The number of deployments of PostgreSQL is unlimited. A more profitable business model. PostgreSQL is SQL standards compliant. Thus finding professional developers is not very difficult. PostgreSQL is easy to learn and porting code from one database vendor to PostgreSQL is cost efficient. Also, PostgreSQL administrative tasks are easy to automate. Thus, the staffing cost is significantly reduced. PostgreSQL is cross-platform, and it has drivers for all modern programming languages; so, there is no need to change the company policy about the software stack in order to use PostgreSQL. PostgreSQL is scalable and it has a high performance. PostgreSQL is very reliable; it rarely crashes. Also, PostgreSQL is ACID compliant, which means that it can tolerate some hardware failure. In addition to that, it can be configured and installed as a cluster to ensure high availability (HA). User advantages of PostgreSQL PostgreSQL is very attractive for developers, administrators, and architects; it has rich features that enable developers to perform tasks in an agile way. The following are some attractive features for the developer: There is a new release almost each year; until now, starting from Postgres95, there have been 23 major releases. Very good documentation and an active community enable developers to find and solve problems quickly. The PostgreSQL manual is over than 2,500 pages in length. A rich extension repository enables developers to focus on the business logic. Also, it enables developers to meet requirement changes easily. The source code is available free of charge, it can be customized and extended without a huge effort. Rich clients and administrative tools enable developers to perform routine tasks, such as describing database objects, exporting and importing data, and dumping and restoring databases, very quickly. Database administration tasks do not requires a lot of time and can be automated. PostgreSQL can be integrated easily with other database management systems, giving software architecture good flexibility in putting software designs. Applications of PostgreSQL PostgreSQL can be used for a variety of applications. The main PostgreSQL application domains can be classified into two categories: Online transactional processing (OLTP): OLTP is characterized by a large number of CRUD operations, very fast processing of operations, and maintaining data integrity in a multiaccess environment. The performance is measured in the number of transactions per second. Online analytical processing (OLAP): OLAP is characterized by a small number of requests, complex queries that involve data aggregation, and a huge amount of data from different sources, with different formats and data mining and historical data analysis. OLTP is used to model business operations, such as customer relationship management (CRM). OLAP applications are used for business intelligence, decision support, reporting, and planning. An OLTP database size is relatively small compared to an OLAP database. OLTP normally follows the relational model concepts, such as normalization when designing the database, while OLAP is less relational and the schema is often star shaped. Unlike OLTP, the main operation of OLAP is data retrieval. OLAP data is often generated by a process called Extract, Transform and Load (ETL). ETL is used to load data into the OLAP database from different data sources and different formats. PostgreSQL can be used out of the box for OLTP applications. For OLAP, there are many extensions and tools to support it, such as the PostgreSQL COPY command and Foreign Data Wrappers (FDW). Success stories PostgreSQL is used in many application domains, including communication, media, geographical, and e-commerce applications. Many companies provide consultation as well as commercial services, such as migrating proprietary RDBMS to PostgreSQL in order to cut off licensing costs. These companies often influence and enhance PostgreSQL by developing and submitting new features. The following are a few companies that use PostgreSQL: Skype uses PostgreSQL to store user chats and activities. Skype has also affected PostgreSQL by developing many tools called Skytools. Instagram is a social networking service that enables its user to share pictures and photos. Instagram has more than 100 million active users. The American Chemical Society (ACS): More than one terabyte of data for their journal archive is stored using PostgreSQL. In addition to the preceding list of companies, PostgreSQL is used by HP, VMware, and Heroku. PostgreSQL is used by many scientific communities and organizations, such as NASA, due to its extensibility and rich data types. Forks There are more than 20 PostgreSQL forks; PostgreSQL extensible APIs makes postgres a great candidate to fork. Over years, many groups have forked PostgreSQL and contributed their findings to PostgreSQL. The following is a list of popular PostgreSQL forks: HadoopDB is a hybrid between the PostgreSQL, RDBMS, and MapReduce technologies to target analytical workload. Greenplum is a proprietary DBMS that was built on the foundation of PostgreSQL. It utilizes the shared-nothing and massively parallel processing (MPP) architectures. It is used as a data warehouse and for analytical workloads. The EnterpriseDB advanced server is a proprietary DBMS that provides Oracle capabilities to cap Oracle fees. Postgres-XC (eXtensible Cluster) is a multi-master PostgreSQL cluster based on the shared-nothing architecture. It emphasis write-scalability and provides the same APIs to applications that PostgreSQL provides. Vertica is a column-oriented database system, which was started by Michael Stonebraker in 2005 and acquisitioned by HP in 2011. Vertica reused the SQL parser, semantic analyzer, and standard SQL rewrites from the PostgreSQL implementation. Netzza is a popular data warehouse appliance solution that was started as a PostgreSQL fork. Amazon Redshift is a popular data warehouse management system based on PostgreSQL 8.0.2. It is mainly designed for OLAP applications. The PostgreSQL architecture PostgreSQL uses the client/server model; the client and server programs could be on different hosts. The communication between the client and server is normally done via TCP/IP protocols or Linux sockets. PostgreSQL can handle multiple connections from a client. A common PostgreSQL program consists of the following operating system processes: Client process or program (frontend): The database frontend application performs a database action. The frontend could be a web server that wants to display a web page or a command-line tool to perform maintenance tasks. PostgreSQL provides frontend tools, such as psql, createdb, dropdb, and createuser. Server process (backend): The server process manages database files, accepts connections from client applications, and performs actions on behalf of the client; the server process name is postgres. PostgreSQL forks a new process for each new connection; thus, the client and server processes communicate with each other without the intervention of the server main process (postgres), and they have a certain lifetime determined by accepting and terminating a client connection. The abstract architecture of PostgreSQL The aforementioned abstract, conceptual PostgreSQL architecture can give an overview of PostgreSQL's capabilities and interactions with the client as well as the operating system. The PostgreSQL server can be divided roughly into four subsystems as follows: Process manager: The process manager manages client connections, such as the forking and terminating processes. Query processor: When a client sends a query to PostgreSQL, the query is parsed by the parser, and then the traffic cop determines the query type. A Utility query is passed to the utilities subsystem. The Select, insert, update, and delete queries are rewritten by the rewriter, and then an execution plan is generated by the planner; finally, the query is executed, and the result is returned to the client. Utilities: The utilities subsystem provides the means to maintain the database, such as claiming storage, updating statistics, exporting and importing data with a certain format, and logging. Storage manager: The storage manager handles the memory cache, disk buffers, and storage allocation. Almost all PostgreSQL components can be configured, including the logger, planner, statistical analyzer, and storage manager. PostgreSQL configuration is governed by the application nature, such as OLAP and OLTP. The following diagram shows the PostgreSQL abstract, conceptual architecture: PostgreSQL's abstract, conceptual architecture The PostgreSQL community PostgreSQL has a very cooperative, active, and organized community. In the last 8 years, the PostgreSQL community published eight major releases. Announcements are brought to developers via the PostgreSQL weekly newsletter. There are dozens of mailing lists organized into categories, such as users, developers, and associations. Examples of user mailing lists are pgsql-general, psql-doc, and psql-bugs. pgsql-general is a very important mailing list for beginners. All non-bug-related questions about PostgreSQL installation, tuning, basic administration, PostgreSQL features, and general discussions are submitted to this list. The PostgreSQL community runs a blog aggregation service called Planet PostgreSQL—https://planet.postgresql.org/. Several PostgreSQL developers and companies use this service to share their experience and knowledge. Summary PostgreSQL is an open source, object-oriented relational database system. It supports many advanced features and complies with the ANSI-SQL:2008 standard. It has won industry recognition and user appreciation. The PostgreSQL slogan "The world's most advanced open source database" reflects the sophistication of the PostgreSQL features. PostgreSQL is a result of many years of research and collaboration between academia and industry. Companies in their infancy often favor PostgreSQL due to licensing costs. PostgreSQL can aid profitable business models. PostgreSQL is also favoured by many developers because of its capabilities and advantages. Resources for Article: Further resources on this subject: Introducing PostgreSQL 9 [article] PostgreSQL – New Features [article] Installing PostgreSQL [article]

0
0
4653

article-image-deploy-nodejs-apps-aws-code-deploy

Ankit Patial

14 Sep 2015

7 min read

Deploy Node.js Apps with AWS Code Deploy

Ankit Patial

14 Sep 2015

7 min read

As an application developer, you must be familiar with the complexity of deploying apps to a fleet of servers with minimum down time. AWS introduced a new service called AWS Code Deploy to ease out the deployment of applications to an EC2 instance on the AWS cloud. Before explaining the full process, I will assume that you are using AWS VPC and are having all of your EC2 instances inside VPC, and that each instance is having an IAM role. Let's see how we can deploy a Node.js application to AWS. Install AWS Code Deploy Agent The first thing you need to do is to install aws-codedeploy-agent on each machine that you want your code deployed on. Before installing a client, please make sure that you have trust relationship for codedeploy.us-west-2.amazonaws.com and codedeploy.us-east-1.amazonaws.com added in IAM role that EC2 instance is using. Not sure what it is? Then, click on the top left dropdown with your account name in AWS console, select Security Credentials option you will be redirected to a new page, select Roles from left menu and look for IAM role that EC2 instance is using, click it and scroll to them bottom, and you will see Edit Trust Relationship button. Click this button to edit trust relationship, and make sure it looks like the following. ... "Principal": { "Service": [ "ec2.amazonaws.com", "codedeploy.us-west-2.amazonaws.com", "codedeploy.us-east-1.amazonaws.com" ] } ... Ok we are good to install the AWS Code Deploy Agent, so make sure ruby2.0 is installed. Use the following script to install code deploy agent. aws s3 cp s3://aws-codedeploy-us-east-1/latest/install ./install-aws-codedeploy-agent --region us-east-1 chmod +x ./install-aws-codedeploy-agent sudo ./install-aws-codedeploy-agent auto rm install-aws-codedeploy-agent Ok, hopefully nothing will go wrong and agent will be installed up and running. To check if its running or not, try the following command: sudo service codedeploy-agent status Let's move to the next step. Create Code Deploy Application Login to your AWS account. Under Deployment & Management click on Code Deploy link, on next screen click on the Get Started Now button and complete the following things: Choose Custom Deployment and click the Skip Walkthrough button. Create New Application; the following are steps to create an application. – Application Name: display name for application you want to deploy. – Deployment Group Name: this is something similar to environments like LIVE, STAGING and QA. – Add Instances: you can choose Amazon EC2 instances by name group name etc. In case you are using autoscaling feature, you can add that auto scaling group too. – Deployment Config: its a way to specify how we want to deploy application, whether we want to deploy one server at-a-time or half of servers at-a-time or deploy all at-once. – Service Role: Choose the IAM role that has access to S3 bucket that we will use to hold code revisions. – Hit the Create Application button. Ok, we just created a Code Deploy application. Let's hold it here and move to our NodeJs app to get it ready for deployment. Code Revision Ok, you have written your app and you are ready to deploy it. The most important thing your app need is appspec.yml. This file will be used by code deploy agent to perform various steps during the deployment life cycle. In simple words the deployment process includes the following steps: Stop the previous application if already deployed; if its first time then this step will not exist. Update the latest code, such as copy files to the application directory. Install new packages or run DB migrations. Start the application. Check if the application is working. Rollback if something went wrong. All above steps seem easy, but they are time consuming and painful to perform each time. Let's see how we can perform these steps easily with AWS code deploy. Lets say we have a following appspec.yml file in our code and also we have bin folder in an app that contain executable sh scripts to perform certain things that I will explain next. First of all take an example of appspec.yml: version: 0.0 os: linux files: - source: / destination: /home/ec2-user/my-app permissions: - object: / pattern: "**" owner: ec2-user group: ec2-user hooks: ApplicationStop: - location: bin/app-stop timeout: 10 runas: ec2-user AfterInstall: - location: bin/install-pkgs timeout: 1200 runas: ec2-user ApplicationStart: - location: bin/app-start timeout: 60 runas: ec2-user ValidateService: - location: bin/app-validate timeout: 10 runas: ec2-user It's a way to tell Code Deploy to copy and provide a destination of those files. files: - source: / destination: /home/ec2-user/my-app We can specify the permissions to be set for the source file on copy. permissions: - object: / pattern: "**" owner: ec2-user group: ec2-user Hooks are executed in an order during the Code Deploy life cycle. We have ApplicationStop, DownloadBundle, BeforeInstall, Install, AfterInstall, ApplicationStart and ValidateService hooks that all have the same syntax. hooks: deployment-lifecycle-event-name - location: script-location timeout: timeout-in-seconds runas: user-name location is the relative path from code root to script file that you want to execute. timeout is the maximum time a script can run. runas is an os user to run the script, and some time you may want to run a script with diff user privileges. Lets bundle your app, exclude the unwanted files such as node_modules folder, and zip it. I use AWS CLI to deploy my code revisions, but you can install awscli using PPI (Python Package Index). sudo pip install awscli I am using awscli profile that has access to s3 code revision bucket in my account. Here is code sample that can help: aws --profile simsaw-baas deploy push --no-ignore-hidden-files --application-name MY_CODE_DEPLOY_APP_NAME --s3-location s3://MY_CODE_REVISONS/MY_CODE_DEPLOY_APP_NAME/APP_BUILD --source MY_APP_BUILD.zip Now Code Revision is published to s3 and also the same revision is registered with the Code Deploy application with the name MY_CODE_DEPLOY_APP_NAME (it will be name of the application you created earlier in the second step.) Now go back to AWS console, Code Deploy. Deploy Code Revision Select your Code Deploy application from the application list show on the Code Deploy Dashboard. It will take you to the next window where you can see the published revision(s), expand the revision and click on Deploy This Revision. You will be redirected to a new window with options like application and deployment group. Choose them carefully and hit deploy. Wait for magic to happen. Code Deploy has a another option to deploy your app from github. The process for it will be almost the same, except you need not push code revisions to S3. About the author Ankit Patial has a Masters in Computer Applications, and nine years of experience with custom APIs, web and desktop applications using .NET technologies, ROR and NodeJs. As a CTO with SimSaw Inc and Pink Hand Technologies, his job is to learn and help his team to implement the best practices of using Cloud Computing and JavaScript technologies.

0
0
29097

How-To Tutorials

article-image-how-to-run-code-in-the-cloud-with-aws-lambda

Ankit Patial

11 Sep 2015

5 min read

How to Run Code in the Cloud with AWS Lambda

Ankit Patial

11 Sep 2015

5 min read

AWS Lambda is a new compute service introduced by AWS to run a piece of code in response to events. The source of these events can be AWS S3, AWS SNS, AWS Kinesis, AWS Cognito and User Application using AWS-SDK. The idea behind this is to create backend services that are cost effective and highly scaleable. If you believe in the Unix Philosophy and you build your applications as components, then AWS Lambda is a nice feature that you can make use of. Some of Its Benefits Cost-effective: AWS Lambdas are not always executing, they are triggered on certain events and have a maximum execution time of 60 seconds (it's a lots of time to do many operations, but not all). There is zero wastage, and a maximum savings on resources used. No hassle of maintaining infrastructure: Create Lambda and forget. There is no need to worry about scaling infrastructure as load increases. It will be all done automatically by AWS. Integrations with other AWS service: The AWS Lambda function can be triggered in response to various events of other AWS Services. The following are services that can trigger a Lambda: AWS S3 AWS SNS(Publish) AWS Kinesis AWS Cognito Custom call using aws-sdk Creating a Lambda function First, login to your AWS account(create one if you haven't got one). Under Compute Services click on the Lambda option. You will see a screen with a "Get Started Now" button. Click on it, and then you will be on a screen to write your first Lambda function. Choose a name for it that will describe it best. Give it a nice description and move on to the code. We can code it in one of the following two ways: Inline code or Upload a zip file. Inline Code Inline code will be very helpful for writing simple scripts like image editing. The AMI (Amazon Machine Image) that Lambda runs on comes with preinstalled Ghostscript and ImageMagick libraries and NodeJs packages like aws-sdk and imagemagick. Let's create a Lambda that can list install packages on AMI and that runs Lambda. I will name it ls-packages The description will be list installed packages on AMI For code entry, type Edit Code Inline For the code template None, paste the below code in: var cp = require('child_process'); exports.handler = function(event, context) { cp.exec('rpm -qa', function (err, stdout, stderr ) { if (err) { return context.fail(err); } console.log(stdout); context.succeed('Done'); }); }; Handler name handler, this will be the entry point function name. You can change it as you like. Role, select Create new role Basic execution role. You will be prompted to create an IAM role with the required permission i.e. access to create logs. Press "Allow." For the Memory(MB), I am going to keep it low 128 Timeout(s), keep it default 3 Press Create Lambda function You will see your first Lambda created and showing up in Lambda: Function list, select it if it is not already selected, and click on the Actions drop-down. On the top select the Edit/Test option. You will see your Lambda function in edit mode, ignore the left side Sample event section just client Invoke button on the right bottom, wait for a few seconds and you will see nice details in Execution result. The "Execution logs" is where you will find out the list of installed packages on the machine that you can utilize. I wish there was a way to install custom packages, or at least have the latest version running of installed packages. I mean, look at ghostscript-8.70-19.23.amzn1.x86_64. It is an old version published in 2009. Maybe AWS will add such features in the future. I certainly hope so. Upload a zip file You now have created something complicated that is included in multiple code files and NPM packages that are not available on Lambda AMI. No worries, just create a simple NodeJs app, install you packages in write up your code and we are good to deploy it. Few things that need to be take care of are: Zip node_modules folder along with code don't exclude it while zipping your code. Steps will be the same as are of Inline Code online, but one addition is File name. File name will be path to entry file, so if you have lib dir in your code with index.js file then you can mention it as bin/index.js. Monitoring On the Lambda Dashboard you will see a nice graph of various events like Invocation Count, Invocation Duration, Invocation failures and Throttled invocations. You will also view the logs created by Lambda functions in AWS Cloud Watch(Administration & Security) Conclusion AWS Lambda is a unique, and very useful service. It can help us build nice scaleable backends for mobile applications. It can also help you to centralize many components that can be shared across applications that you are running on and off the AWS infrastructure. About the author Ankit Patial has a Masters in Computer Applications, and nine years of experience with custom APIs, web and desktop applications using .NET technologies, ROR and NodeJs. As a CTO with SimSaw Inc and Pink Hand Technologies, his job is to learn and and help his team to implement the best practices of using Cloud Computing and JavaScript technologies.

0
0
32855

Packt

11 Sep 2015

12 min read

Deploying a Zabbix proxy

Packt

11 Sep 2015

12 min read

In this article by Andrea Dalle Vacche, author of the book Mastering Zabbix, Second Edition, you will learn the basics on how to deploy a Zabbix proxy on a Zabbix server. (For more resources related to this topic, see here.) A Zabbix proxy is compiled together with the main server if you add --enable-proxy to the compilation options. The proxy can use any kind of database backend, just as the server does, but if you don't specify an existing DB, it will automatically create a local SQLite database to store its data. If you intend to rely on SQLite, just remember to add --with-sqlite3 to the options as well. When it comes to proxies, it's usually advisable to keep things light and simple as much as we can; of course, this is valid only if the network design permits us to take this decision. A proxy DB will just contain configuration and measurement data that, under normal circumstances, is almost immediately synchronized with the main server. Dedicating a full-blown database to it is usually an overkill, so unless you have very specific requirements, the SQLite option will provide the best balance between performance and ease of management. If you didn't compile the proxy executable the first time you deployed Zabbix, just run configure again with the options you need for the proxies: $ ./configure --enable-proxy --enable-static --with-sqlite3 --with-net-snmp --with-libcurl --with-ssh2 --with-openipmi In order to build the proxy statically, you must have a static version of every external library needed. The configure script doesn't do this kind of check. Compile everything again using the following command: $ make Be aware that this will compile the main server as well; just remember not to run make install, nor copy the new Zabbix server executable over the old one in the destination directory. The only files you need to take and copy over to the proxy machine are the proxy executable and its configuration file. The $PREFIX variable should resolve to the same path you used in the configuration command (/usr/local by default): # cp src/zabbix_proxy/zabbix_proxy $PREFIX/sbin/zabbix_proxy # cp conf/zabbix_proxy.conf $PREFIX/etc/zabbix_proxy.conf Next, you need to fill out relevant information in the proxy's configuration file. The default values should be fine in most cases, but you definitely need to make sure that the following options reflect your requirements and network status: ProxyMode=0 This means that the proxy machine is in an active mode. Remember that you need at least as many Zabbix trappers on the main server as the number of proxies you deploy. Set the value to 1 if you need or prefer a proxy in the passive mode. The following code captures this discussion: Server=n.n.n.n This should be the IP number of the main Zabbix server or of the Zabbix node that this proxy should report to: Hostname=Zabbix proxy This must be a unique, case-sensitive name that will be used in the main Zabbix server's configuration to refer to the proxy: LogFile=/tmp/zabbix_proxy.log LogFileSize=1 DebugLevel=2 If you are using a small, embedded machine, you may not have much disk space to spare. In that case, you may want to comment all the options regarding the log file and let syslog send the proxy's log to another server on the Internet: # DBHost= # DBSchema= # DBUser= # DBPassword= # DBSocket= # DBPort= We need now create the SQLite database; this can be done with the following commands: $ mkdir –p /var/lib/sqlite/ $ sqlite3 /var/lib/sqlite/zabbix.db < /usr/share/doc/zabbix-proxy-sqlite3-2.4.4/create/schema.sql Now, in the DBName parameter, we need to specify the full path to our SQLite database: DBName=/var/lib/sqlite/zabbix.db The proxy will automatically populate and use a local SQLite database. Fill out the relevant information if you are using a dedicated, external database: ProxyOfflineBuffer=1 This is the number of hours that a proxy will keep monitored measurements if communications with the Zabbix server go down. Once the limit has been reached, the proxy will housekeep away the old data. You may want to double or triple it if you know that you have a faulty, unreliable link between the proxy and server. CacheSize=8M This is the size of the configuration cache. Make it bigger if you have a large number of hosts and items to monitor. Zabbix's runtime proxy commands There is a set of commands that you can run against the proxy to change runtime parameters. This set of commands is really useful if your proxy is struggling with items, in the sense that it is taking longer to deliver the items and maintain our Zabbix proxy up and running. You can force the configuration cache to get refreshed from the Zabbix server with the following: $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf -R config_cache_reload This command will invalidate the configuration cache on the proxy side and will force the proxy to ask for the current configuration to our Zabbix server. We can also increase or decrease the log level quite easily at runtime with log_level_increase and log_level_decrease: $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf –R log_level_increase This command will increase the log level for the proxy process; the same command also supports a target that can be PID, process type or process type, number here. What follow are a few examples. Increase the log level of the three poller process: $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf -R log_level_increase=poller,3 Increase the log level of the PID to 27425: $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf -R log_level_increase=27425 Increase or decrease the log level of icmp pinger or any other proxy processes with: $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf -R log_level_increase="icmp pinger" zabbix_proxy [28064]: command sent successfully $ zabbix_proxy -c /usr/local/etc/zabbix_proxy.conf -R log_level_decrease="icmp pinger" zabbix_proxy [28070]: command sent successfully We can quickly see the changes reflected in the log file here: 28049:20150412:021435.841 log level has been increased to 4 (debug) 28049:20150412:021443.129 Got signal [signal:10(SIGUSR1),sender_pid:28034,sender_uid:501,value_int:770(0x00000302)]. 28049:20150412:021443.129 log level has been decreased to 3 (warning) Deploying a Zabbix proxy using RPMs Deploying a Zabbix proxy using the RPM is a very simple task. Here, there are fewer steps required as Zabbix itself distributes a prepackaged Zabbix proxy that is ready to use. What you need to do is simply add the official Zabbix repository with the following command that must be run from root: $ rpm –ivh http://repo.zabbix.com/zabbix/2.4/rhel/6/x86_64/zabbix-2.4.4-1.el6.x86_64.rpm Now, you can quickly list all the available zabbix-proxy packages with the following command, again from root: $ yum search zabbix-proxy ============== N/S Matched: zabbix-proxy ================ zabbix-proxy.x86_64 : Zabbix Proxy common files zabbix-proxy-mysql.x86_64 : Zabbix proxy compiled to use MySQL zabbix-proxy-pgsql.x86_64 : Zabbix proxy compiled to use PostgreSQL zabbix-proxy-sqlite3.x86_64 : Zabbix proxy compiled to use SQLite3 In this example, the command is followed by the relative output that lists all the available zabbix-proxy packages; here, all you have to do is choose between them and install your desired package: $ yum install zabbix-proxy-sqlite3 Now, you've already installed the Zabbix proxy, which can be started up with the following command: $ service zabbix-proxy start Starting Zabbix proxy: [ OK ] Please also ensure that you enable your Zabbix proxy when the server boots with the $ chkconfig zabbix-proxy on command. That done, if you're using iptables, it is important to add a rule to enable incoming traffic on the 10051 port (that is the standard Zabbix proxy port) or, in any case, against the port that is specified in the configuration file: ListenPort=10051 To do that, you simply need to edit the iptables configuration file /etc/sysconfig/iptables and add the following line right on the head of the file: -A INPUT -m state --state NEW -m tcp -p tcp --dport 10051 -j ACCEPT Then, you need to restart your local firewall from root using the following command: $ service iptables restart The log file is generated at /var/log/zabbix/zabbix_proxy.log: $ tail -n 40 /var/log/zabbix/zabbix_proxy.log 62521:20150411:003816.801 **** Enabled features **** 62521:20150411:003816.801 SNMP monitoring: YES 62521:20150411:003816.801 IPMI monitoring: YES 62521:20150411:003816.801 WEB monitoring: YES 62521:20150411:003816.801 VMware monitoring: YES 62521:20150411:003816.801 ODBC: YES 62521:20150411:003816.801 SSH2 support: YES 62521:20150411:003816.801 IPv6 support: YES 62521:20150411:003816.801 ************************** 62521:20150411:003816.801 using configuration file: /etc/zabbix/zabbix_proxy.conf As you can quickly spot, the default configuration file is located at /etc/zabbix/zabbix_proxy.conf. The only thing that you need to do is make the proxy known to the server and add monitoring objects to it. All these tasks are performed through the Zabbix frontend by just clicking on Admin | Proxies and then Create. This is shown in the following screenshot: Please take care to use the same Proxy name that you've used in the configuration file, which, in this case, is ZabbixProxy; you can quickly check with: $ grep Hostname= /etc/zabbix/zabbix_proxy.conf # Hostname= Hostname=ZabbixProxy Note how, in the case of an Active proxy, you just need to specify the proxy's name as already set in zabbix_proxy.conf. It will be the proxy's job to contact the main server. On the other hand, a Passive proxy will need an IP address or a hostname for the main server to connect to, as shown in the following screenshot: You don't have to assign hosts to proxies at creation time or only in the proxy's edit screen. You can also do that from a host configuration screen, as follows: One of the advantages of proxies is that they don't need much configuration or maintenance; once they are deployed and you have assigned some hosts to one of them, the rest of the monitoring activities are fairly transparent. Just remember to check the number of values per second that every proxy has to guarantee as expressed by the Required performance column in the proxies' list page: Values per second (VPS) is the number of measurements per second that a single Zabbix server or proxy has to collect. It's an average value that depends on the number of items and the polling frequency for every item. The higher the value, the more powerful the Zabbix machine must be. Depending on your hardware configuration, you may need to redistribute the hosts among proxies or add new ones if you notice degraded performances coupled with high VPS. Considering a different Zabbix proxy database Nowadays, from Zabbix 2.4 the support for nodes has been discontinued, and the only distributed scenario available is limited to the Zabbix proxy; those proxies now play a truly critical role. Also, with proxies deployed in many different geographic locations, the infrastructure is more subject to network outages. That said, there is a case to consider which database we want to use for those critical remote proxies. Now SQLite3 is a good product as a standalone and lightweight setup, but if, in our scenario, the proxy we've deployed needs to retain a considerable amount of metrics, we need to consider the fact that SQLite3 has certain weak spots: The atomic-locking mechanism on SQLite3 is not the most robust ever SQLite3 suffers during high-volume writes SQLite3 does not implement any kind of user authentication mechanism Apart from the point that SQLite3 does not implement any kind of authentication mechanism, the database files are created with the standard unmask, due to which, they are readable by everyone, In the event of a crash during high load it is not the best database to use. Here is an example of the sqlite3 database and how to access it using a third-party account: $ ls -la /tmp/zabbix_proxy.db -rw-r--r--. 1 zabbix zabbix 867328 Apr 12 09:52 /tmp/zabbix_proxy.db ]# su - adv [adv@localhost ~]$ sqlite3 /tmp/zabbix_proxy.db SQLite version 3.6.20 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> Then, for all the critical proxies, it is advisable to use a different database. Here, we will use MySQL, which is a well-known database. To install the Zabbix proxy with MySQL, if you're compiling it from source, you need to use the following command line: $ ./configure --enable-proxy --enable-static --with-mysql --with-net-snmp --with-libcurl --with-ssh2 --with-openipmi This should be followed by the usual: $ make Instead, if you're using the precompiled rpm, you can simply run from root: $ yum install zabbix-proxy-mysql Now, you need to start up your MySQL database and create the required database for your proxy: $ mysql -uroot -p<password> $ create database zabbix_proxy character set utf8 collate utf8_bin; $ grant all privileges on zabbix_proxy.* to zabbix@localhost identified by '<password>'; $ quit; $ mysql -uzabbix -p<password> zabbix_proxy < database/mysql/schema.sql If you've installed using rpm, the previous command will be: $ mysql -uzabbix -p<password> zabbix_proxy < /usr/share/doc/zabbix-proxy-mysql-2.4.4/create/schema.sql/schema.sql Now, we need to configure zabbix_proxy.conf and add the proper value to those parameters: DBName=zabbix_proxy DBUser=zabbix DBPassword=<password> Please note that there is no need to specify DBHost as the socket used for MySQL. Finally, we can start up our Zabbix proxy with the following command from root: $ service zabbix-proxy start Starting Zabbix proxy: [ OK ] Summary In this article, you learned how to start up a Zabbix proxy over a Zabbix server. Resources for Article: Further resources on this subject: Zabbix Configuration[article] Bar Reports in Zabbix 1.8[article] Going beyond Zabbix agents [article]

0
0
32058

article-image-creating-slash-commands-slack-using-bottle

Ellison Leao

10 Sep 2015

4 min read

Creating slash commands for Slack using Bottle

Ellison Leao

10 Sep 2015

4 min read

In this post I will show you how to make a custom slack command for your organizational chat using Python's microframework Bottle. This post is not a Bottle tutorial and I will assume that you have at least a basic amount of Python knowledge. If you want to learn more about Python, click here. For learning about Bottle, click here. We will deploy our app on Heroku, so you will need git installed as well. On our application, we will create a simple "Hello World!" command to be outputted on slack when typing the /hello command. Installing and Creating the Application We will need to install Bottle inside a Python virtualenv. Make sure you have virtualenvwrapper installed and configured on your system. After the virtualenvwrapper install, create a new virtualenv called slash by typing the following: mkvirtualenv slash After that, install Bottle project using python's pip command: pip install bottle The choice for Bottle is that you can create web applications with a few lines of code. You can use another web framework if you want, like Flask, web.py, web2py or even Django. Now, moving to the app. First let's create its structure. mkdir myslash touch myslash/app.py Open your favorite editor, and add the following lines to the app.py file. We will explain step by step how they work and what are they doing. #!/usr/bin/env python # encoding: utf-8 from bottle import run, post @post('/hello') def hello(): return'Hello World!' if__name__ == '__main__': run(host='0.0.0.0', port=5000) Explaining what this code does: from bottle import run, post` Here, we import the necessary methods we will need for our app. run method, and will create a web server that will run our application. post method is a Python decorator that will create a POST route that will be used for outputting the "Hello world!" message. @post('/hello') def hello(): return'Hello World!' This is our app's main method. You can see the post decorator creating a /hello route, which will be handled by the hello() method. if__name__ == '__main__': run(host='0.0.0.0', port=5000) The run method will be called when we run the python app.py command. For the host we need to listen on all addresses, which is why we add 0.0.0.0 as the param. You can change the port param if you want, but the default is 5000. Now open another terminal on the app folder and type: python app.py To test if the app is running okay, use the cURL command to make a POST test request curl -X POST localhost:5000/hello You should see the Hello World! message printed out. Deploying If you don't have a Heroku account yet, please go to https://signup.heroku.com/www-header. After that, go to https://dashboard.heroku.com/new to create a new application. Type your favorite app name and click on Create App. We will need to create a Procfile so the app could run on Heroku side. Create a file called Procfile on your app's main directory and add the following: web: python app.py Now, on the app's main directory, create a git repository and send the files to the new application you just created. Heroku will know this is a python app and will make the proper configuration to run it. git init git remote add heroku git@heroku.com:YOURAPPNAME.git git push heroku master Make sure your public key is configured on your account's SSH Keys (https://dashboard.heroku.com/account). If everything went well you should see the app running on YOURAPPNAME.herokuapp.com Configuring Slack Now to the Slack part. We will need to add a custom slash command on our organization settings. Go to https://YOURORGNAME.slack.com/services/new/slash-commands and on the Choose your command input, type hello. For the configurations we will have: Command: /hello URL: http://YOURAPPNAME.herokuapp.com/hello (Important: WITHOUT TRAILING SLASH!) Method: POST Check Show this command in the autocomplete list and add a Description and usage hint Click in Save integration Testing Go to your slack org chat and type /hello on any chat. You should see the "Hello world!" message printed out. And that's it! You can see the app code here. If you have any questions or suggestions you can reach me out on twitter @ellisonleao. About The Author Ellison Leao is a passionate software engineer with more than 6 years of experience in web projects and a contributor to the MelonJS framework and other open source projects. When he is not writing games, he loves to play drums.

0
0
7622

How-To Tutorials

article-image-introduction-spring-web-application-no-time

Packt

10 Sep 2015

8 min read

Introduction to Spring Web Application in No Time

Packt

10 Sep 2015

8 min read

Many official Spring tutorials have both a Gradle build and a Maven build, so you will find examples easily if you decide to stick with Maven. Spring 4 is fully compatible with Java 8, so it would be a shame not to take advantage of lambdas to simplify our code base. In this article by Geoffroy Warin, author of the book Mastering Spring MVC 4, we will see some Git commands. It's a good idea to keep track of your progress and commit when you are in a stable state. (For more resources related to this topic, see here.) Getting started with Spring Tool Suite One of the best ways to get started with Spring and discover the numerous tutorials and starter projects that the Spring community offers is to download Spring Tool Suite (STS). STS is a custom version of eclipse designed to work with various Spring projects, as well as Groovy and Gradle. Even if, like me, you have another IDE that you would rather work with, we recommend that you give STS a shot because it gives you the opportunity to explore Spring's vast ecosystem in a matter of minutes with the "Getting Started" projects. So, let's visit https://Spring.io/tools/sts/all and download the latest release of STS. Before we generate our first Spring Boot project we will need to install the Gradle support for STS. You can find a Manage IDE Extensions button on the dashboard. You will then need to download the Gradle Support software in the Language and framework tooling section. Its recommend installing the Groovy Eclipse plugin along with the Groovy 2.4 compiler, as shown in the following screenshot. These will be needed later in this article when we set up acceptance tests with geb: We now have two main options to get started. The first option is to navigate to File | New | Spring Starter Project, as shown in the following screenshot. This will give you the same options as http://start.Spring.io, embedded in your IDE: The second way is to navigate to File | New | Import Getting Started Content. This will give you access to all the tutorials available on Spring.io. You will have the choice of working with either Gradle or Maven, as shown in the following screenshot: You can also check out the starter code to follow along with the tutorial, or get the complete code directly. There is a lot of very interesting content available in the Getting Started Content. It will demonstrate the integration of Spring with various technologies that you might be interested in. For the moment, we will generate a web project as shown in the preceding image. It will be a Gradle application, producing a JAR file and using Java 8. Here is the configuration we want to use: Property Value Name masterSpringMvc Type Gradle project Packaging Jar Java version 1.8 Language Java Group masterSpringMvc Artifact masterSpringMvc Version 0.0.1-SNAPSHOT Description Be creative! Package masterSpringMvc On the second screen you will be asked for the Spring Boot version you want to use and the the dependencies that should be added to the project. At the time of writing this, the latest version of Spring boot was 1.2.5. Ensure that you always check out the latest release. The latest snapshot version of Spring boot will also be available by the time you read this. If Spring boot 1.3 isn't released by then, you can probably give it a shot. One of its big features is the awesome devs tools. Refer to https://spring.io/blog/2015/06/17/devtools-in-spring-boot-1-3 for more details. At the bottom the configuration window you will see a number of checkboxes representing the various boot starter libraries. These are dependencies that can be appended to your build file. They provide autoconfigurations for various Spring projects. We are only interested in Spring MVC for the moment, so we will check only the Web checkbox. A JAR for a web application? Some of you might find it odd to package your web application as a JAR file. While it is still possible to use WAR files for packaging, it is not always the recommended practice. By default, Spring boot will create a fat JAR, which will include all the application's dependencies and provide a convenient way to start a web server using Java -jar. Our application will be packaged as a JAR file. If you want to create a war file, refer to http://spring.io/guides/gs/convert-jar-to-war/. Have you clicked on Finish yet? If you have, you should get the following project structure: We can see our main class MasterSpringMvcApplication and its test suite MasterSpringMvcApplicationTests. There are also two empty folders, static and templates, where we will put our static web assets (images, styles, and so on) and obviously our templates (jsp, freemarker, Thymeleaf). The last file is an empty application.properties file, which is the default Spring boot configuration file. It's a very handy file and we'll see how Spring boot uses it throughout this article. The last is build.gradle file, the build file that we will detail in a moment. If you feel ready to go, run the main method of the application. This will launch a web server for us. To do this, go to the main method of the application and navigate to Run as | Spring Application in the toolbar either by right-clicking on the class or clicking on the green play button in the toolbar. Doing so and navigating to http://localhost:8080 will produce an error. Don't worry, and read on. Now we will show you how to generate the same project without STS, and we will come back to all these files. Getting started with IntelliJ IntelliJ IDEA is a very popular tool among Java developers. For the past few years I've been very pleased to pay Jetbrains a yearly fee for this awesome editor. IntelliJ also has a way of creating Spring boot projects very quickly. Go to the new project menu and select the Spring Initializr project type: This will give us exactly the same options as STS. You will need to import the Gradle project into IntelliJ. we recommend generating the Gradle wrapper first (refer to the following Gradle build section). If needed, you can reimport the project by opening its build.gradle file again. Getting started with start.Spring.io Go to http://start.Spring.io to get started with start.Spring.io. The system behind this remarkable Bootstrap-like website should be familiar to you! You will see the following screenshot when you go to the previously mentioned link: Indeed, the same options available with STS can be found here. Clicking on Generate Project will download a ZIP file containing our starter project. Getting started with the command line For those of you who are addicted to the console, it is possible to curl http://start.Spring.io. Doing so will display instructions on how to structure your curl request. For instance, to generate the same project as earlier, you can issue the following command: $ curl http://start.Spring.io/starter.tgz -d name=masterSpringMvc -d dependencies=web -d language=java -d JavaVersion=1.8 -d type=gradle-project -d packageName=masterSpringMvc -d packaging=jar -d baseDir=app | tar -xzvf - % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 1255 100 1119 100 136 1014 123 0:00:01 0:00:01 --:--:-- 1015 x app/ x app/src/ x app/src/main/ x app/src/main/Java/ x app/src/main/Java/com/ x app/src/main/Java/com/geowarin/ x app/src/main/resources/ x app/src/main/resources/static/ x app/src/main/resources/templates/ x app/src/test/ x app/src/test/Java/ x app/src/test/Java/com/ x app/src/test/Java/com/geowarin/ x app/build.Gradle x app/src/main/Java/com/geowarin/AppApplication.Java x app/src/main/resources/application.properties x app/src/test/Java/com/geowarin/AppApplicationTests.Java And viola! You are now ready to get started with Spring without leaving the console, a dream come true. You might consider creating an alias with the previous command, it will help you prototype the Spring application very quickly. Summary In this article, we leveraged Spring Boot's autoconfiguration capabilities to build an application with zero boilerplate or configuration files. We configured Spring Boot tool suite, IntelliJ,and start.spring.io and how to configure it! Resources for Article: Further resources on this subject: Welcome to the Spring Framework[article] Mailing with Spring Mail[article] Creating a Spring Application [article]

0
0
2433

How-To Tutorials

Packt

09 Sep 2015

22 min read

Sabermetrics with Apache Spark

Packt

09 Sep 2015

22 min read

In this article by Rindra Ramamonjison, the author of the book called Apache Spark Graph Processing, we will gain useful insights that are required to quickly process big data, and handle its complexities. It is not the secret analytics that have made a big impact in sports. The quest for an objective understanding of the game has a name even—"sabermetrics". Analytics has proven invaluable in many aspects, from building dream teams under tight cap constraints, to selecting game-specific strategies, to actively engaging with fans, and so on. In the following sections, we will analyze NCAA Men's college basketball game stats, gathered during a single season. As sports data experts, we are going to leverage Spark's graph processing library to answer several questions for retrospection. Apache Spark is a fast, general-purpose technology, which greatly simplifies the parallel processing of large data that is distributed over a computing cluster. While Spark handles different types of processing, here, we will focus on its graph-processing capability. In particular, our goal is to expose the powerful yet generic graph-aggregation operator of Spark—aggregateMessages. We can think of this operator as a version of MapReduce for aggregating the neighborhood information in graphs. In fact, many graph-processing algorithms, such as PageRank rely on iteratively accessing the properties of neighboring vertices and adjacent edges. By applying aggregateMessages on the NCAA College Basketball datasets, we will: Identify the basic mechanisms and understand the patterns for using aggregateMessages Apply aggregateMessages to create custom graph aggregation operations Optimize the performance and efficiency of aggregateMessages (For more resources related to this topic, see here.) NCAA College Basketball datasets As an illustrative example, the NCAA College Basketball datasets consist of two CSV datasets. This first one called teams.csv contains the list of all the college teams that played in NCAA Division I competition. Each team is associated with a 4-digit ID number. The second dataset called stats.csv contains the score and statistics of every game played during the 2014-2015 regular season. Loading team data into RDDs To start with, we parse and load these datasets into RDDs (Resilient Distributed Datasets), which are the core Spark abstraction for any data that is distributed and stored over a cluster. First, we create a class called GameStats that records a team's statistics during a game: case class GameStats( val score: Int, val fieldGoalMade: Int, val fieldGoalAttempt: Int, val threePointerMade: Int, val threePointerAttempt: Int, val threeThrowsMade: Int, val threeThrowsAttempt: Int, val offensiveRebound: Int, val defensiveRebound: Int, val assist: Int, val turnOver: Int, val steal: Int, val block: Int, val personalFoul: Int ) Loading game stats into RDDs We also add the following methods to GameStats in order to know how efficient a team's offense was: // Field Goal percentage def fgPercent: Double = 100.0 * fieldGoalMade / fieldGoalAttempt // Three Point percentage def tpPercent: Double = 100.0 * threePointerMade / threePointerAttempt // Free throws percentage def ftPercent: Double = 100.0 * threeThrowsMade / threeThrowsAttempt override def toString: String = "Score: " + score Next, we create a couple of classes for the games' result: abstract class GameResult( val season: Int, val day: Int, val loc: String ) case class FullResult( override val season: Int, override val day: Int, override val loc: String, val winnerStats: GameStats, val loserStats: GameStats ) extends GameResult(season, day, loc) FullResult has the year and day of the season, the location where the game was played, and the game statistics of both the winning and losing teams. Next, we will create a statistics graph of the regular seasons. In this graph, the nodes are the teams, whereas each edge corresponds to a specific game. To create the graph, let's parse the CSV file called teams.csv into the RDD teams: val teams: RDD[(VertexId, String)] = sc.textFile("./data/teams.csv"). filter(! _.startsWith("#")). map {line => val row = line split ',' (row(0).toInt, row(1)) } We can check the first few teams in this new RDD: scala> teams.take(3).foreach{println} (1101,Abilene Chr) (1102,Air Force) (1103,Akron) We do the same thing to obtain an RDD of the game results, which will have a type called RDD[Edge[FullResult]]. We just parse stats.csv, and record the fields that we need: The ID of the winning team The ID of the losing team The game statistics of both the teams val detailedStats: RDD[Edge[FullResult]] = sc.textFile("./data/stats.csv"). filter(! _.startsWith("#")). map {line => val row = line split ',' Edge(row(2).toInt, row(4).toInt, FullResult( row(0).toInt, row(1).toInt, row(6), GameStats( score = row(3).toInt, fieldGoalMade = row(8).toInt, fieldGoalAttempt = row(9).toInt, threePointerMade = row(10).toInt, threePointerAttempt = row(11).toInt, threeThrowsMade = row(12).toInt, threeThrowsAttempt = row(13).toInt, offensiveRebound = row(14).toInt, defensiveRebound = row(15).toInt, assist = row(16).toInt, turnOver = row(17).toInt, steal = row(18).toInt, block = row(19).toInt, personalFoul = row(20).toInt ), GameStats( score = row(5).toInt, fieldGoalMade = row(21).toInt, fieldGoalAttempt = row(22).toInt, threePointerMade = row(23).toInt, threePointerAttempt = row(24).toInt, threeThrowsMade = row(25).toInt, threeThrowsAttempt = row(26).toInt, offensiveRebound = row(27).toInt, defensiveRebound = row(28).toInt, assist = row(20).toInt, turnOver = row(30).toInt, steal = row(31).toInt, block = row(32).toInt, personalFoul = row(33).toInt ) ) ) } We can avoid typing all this by using the nice spark-csv package that reads CSV files into SchemaRDD. Let's check what we got: scala> detailedStats.take(3).foreach(println) Edge(1165,1384,FullResult(2006,8,N,Score: 75-54)) Edge(1393,1126,FullResult(2006,8,H,Score: 68-37)) Edge(1107,1324,FullResult(2006,9,N,Score: 90-73)) We then create our score graph using the collection of teams (of the type called RDD[(VertexId, String)]) as vertices, and the collection called detailedStats (of the type called RDD[(VertexId, String)]) as edges: scala> val scoreGraph = Graph(teams, detailedStats) For curiosity, let's see which team has won against the 2015 NCAA national champ Duke during the regular season. It seems Duke has lost only four games during the regular season: scala> scoreGraph.triplets.filter(_.dstAttr == "Duke").foreach(println)((1274,Miami FL),(1181,Duke),FullResult(2015,71,A,Score: 90-74)) ((1301,NC State),(1181,Duke),FullResult(2015,69,H,Score: 87-75)) ((1323,Notre Dame),(1181,Duke),FullResult(2015,86,H,Score: 77-73)) ((1323,Notre Dame),(1181,Duke),FullResult(2015,130,N,Score: 74-64)) Aggregating game stats After we have our graph ready, let's start aggregating the stats data in scoreGraph. In Spark, aggregateMessages is the operator for such a kind of jobs. For example, let's find out the average field goals made per game by the winners. In other words, the games that a team has lost will not be counted. To get the average for each team, we first need to have the number of games won by the team, and the total field goals that the team made in these games: // Aggregate the total field goals made by winning teams type Msg = (Int, Int) type Context = EdgeContext[String, FullResult, Msg] val winningFieldGoalMade: VertexRDD[Msg] = scoreGraph aggregateMessages( // sendMsg (ec: Context) => ec.sendToSrc(1, ec.attr.winnerStats.fieldGoalMade), // mergeMsg (x: Msg, y: Msg) => (x._1 + y._1, x._2+ y._2) ) The aggregateMessage operator There is a lot going on in the previous call to aggregateMessages. So, let's see it working in slow motion. When we called aggregateMessages on the scoreGraph, we had to pass two functions as arguments. SendMsg The first function has a signature called EdgeContext[VD, ED, Msg] => Unit. It takes an EdgeContext as input. Since it does not return anything, its return type is Unit. This function is needed for sending message between the nodes. Okay, but what is the EdgeContext type? EdgeContext represents an edge along with its neighboring nodes. It can access both the edge attribute, and the source and destination nodes' attributes. In addition, EdgeContext has two methods to send messages along the edge to its source node, or to its destination node. These methods are called sendToSrc and sendToDst respectively. Then, the type of messages being sent through the graph is defined by Msg. Similar to vertex and edge types, we can define the concrete type that Msg takes as we wish. Merge In addition to sendMsg, the second function that we need to pass to aggregateMessages is a mergeMsg function with the (Msg, Msg) => Msg signature. As its name implies, mergeMsg is used to merge two messages, received at each node into a new one. Its output must also be of the Msg type. Using these two functions, aggregateMessages returns the aggregated messages inside VertexRDD[Msg]. Example In our example, we need to aggregate the number of games played and the number of field goals made. Therefore, Msg is simply a pair of Int. Furthermore, each edge context needs to send a message to only its source node, that is, the winning team. This is because we want to compute the total field goals made by each team for only the games that it has won. The actual message sent to each "winner" node is the pair of integers (1, ec.attr.winnerStats.fieldGoalMade). Here, 1 serves as a counter for the number of games won by the source node. The second integer, which is the number of field goals in one game, is extracted from the edge attribute. As we set out to compute the average field goals per winning game for all teams, we need to apply the mapValues operator to the output of aggregateMessages, which is as follows: // Average field goals made per Game by the winning teams val avgWinningFieldGoalMade: VertexRDD[Double] = winningFieldGoalMade mapValues ( (id: VertexId, x: Msg) => x match { case (count: Int, total: Int) => total.toDouble/count }) Here is the output: scala> avgWinningFieldGoalMade.take(5).foreach(println) (1260,24.71641791044776) (1410,23.56578947368421) (1426,26.239436619718308) (1166,26.137614678899084) (1434,25.34285714285714) Abstracting out the aggregation This was kind of cool! We can surely do the same thing for the average points per game scored by the winning teams: // Aggregate the points scored by winning teams val winnerTotalPoints: VertexRDD[(Int, Int)] = scoreGraph.aggregateMessages( // sendMsg triplet => triplet.sendToSrc(1, triplet.attr.winnerStats.score), // mergeMsg (x, y) => (x._1 + y._1, x._2+ y._2) ) // Average field goals made per Game by winning teams var winnersPPG: VertexRDD[Double] = winnerTotalPoints mapValues ( (id: VertexId, x: (Int, Int)) => x match { case (count: Int, total: Int) => total.toDouble/count }) Let's check the output: scala> winnersPPG.take(5).foreach(println) (1260,71.19402985074628) (1410,71.11842105263158) (1426,76.30281690140845) (1166,76.89449541284404) (1434,74.28571428571429) What if the coach wants to know the top five teams with the highest average three pointers made per winning game? By the way, he might also ask about the teams that are the most efficient in three pointers. Keeping things DRY We can copy and modify the previous code, but that would be quite repetitive. Instead, let's abstract out the average aggregation operator so that it can work on any statistics that the coach needs. Luckily, Scala's higher-order functions are there to help in this task. Let's define the functions that take a team's GameStats as an input, and return specific statistic that we are interested in. For now, we will need the number of three pointer made, and the average three pointer percentage: // Getting individual stats def threePointMade(stats: GameStats) = stats.threePointerMade def threePointPercent(stats: GameStats) = stats.tpPercent Then, we create a generic function that takes as an input a stats graph, and one of the functions defined previously, which has a signature called GameStats => Double: // Generic function for stats averaging def averageWinnerStat(graph: Graph[String, FullResult])(getStat: GameStats => Double): VertexRDD[Double] = { type Msg = (Int, Double) val winningScore: VertexRDD[Msg] = graph.aggregateMessages[Msg]( // sendMsg triplet => triplet.sendToSrc(1, getStat(triplet.attr.winnerStats)), // mergeMsg (x, y) => (x._1 + y._1, x._2+ y._2) ) winningScore mapValues ( (id: VertexId, x: Msg) => x match { case (count: Int, total: Double) => total/count }) } Now, we can get the average stats by passing the threePointMade and threePointPercent to averageWinnerStat functions: val winnersThreePointMade = averageWinnerStat(scoreGraph)(threePointMade) val winnersThreePointPercent = averageWinnerStat(scoreGraph)(threePointPercent) With little efforts, we can tell the coach which five winning teams score the highest number of threes per game: scala> winnersThreePointMade.sortBy(_._2,false).take(5).foreach(println) (1440,11.274336283185841) (1125,9.521929824561404) (1407,9.008849557522124) (1172,8.967441860465117) (1248,8.915384615384616) While we are at it, let's find out the five most efficient teams in three pointers: scala> winnersThreePointPercent.sortBy(_._2,false).take(5).foreach(println) (1101,46.90555728464225) (1147,44.224282479431224) (1294,43.754532434101534) (1339,43.52308905887638) (1176,43.080814169045105) Interestingly, the teams that made the most three pointers per winning game are not always the one who are the most efficient ones at it. But it is okay because at least they have won these games. Coach wants more numbers The coach seems to argue against this argument. He asks us to get the same statistics, but he wants the average over all the games that each team has played. We then have to aggregate the information at all the nodes, and not only at the destination nodes. To make our previous abstraction more flexible, let's create the following types: trait Teams case class Winners extends Teams case class Losers extends Teams case class AllTeams extends Teams We modify the previous higher-order function to have an extra argument called Teams, which will help us specify those nodes where we want to collect and aggregate the required game stats. The new function becomes as the following: def averageStat(graph: Graph[String, FullResult])(getStat: GameStats => Double, tms: Teams): VertexRDD[Double] = { type Msg = (Int, Double) val aggrStats: VertexRDD[Msg] = graph.aggregateMessages[Msg]( // sendMsg tms match { case _ : Winners => t => t.sendToSrc((1, getStat(t.attr.winnerStats))) case _ : Losers => t => t.sendToDst((1, getStat(t.attr.loserStats))) case _ => t => { t.sendToSrc((1, getStat(t.attr.winnerStats))) t.sendToDst((1, getStat(t.attr.loserStats))) } } , // mergeMsg (x, y) => (x._1 + y._1, x._2+ y._2) ) aggrStats mapValues ( (id: VertexId, x: Msg) => x match { case (count: Int, total: Double) => total/count }) } Now, aggregateStat allows us to choose if we want to aggregate the stats for winners only, for losers only, or for the all teams. Since the coach wants the overall stats averaged over all the games played, we aggregate the stats by passing the AllTeams() flag in aggregateStat. In this case, we define the sendMsg argument in aggregateMessages to send the required stats to both source (the winner) and destination (the loser) using the EdgeContext class's sendToSrc and sendToDst functions respectively. This mechanism is pretty straightforward. We just need to make sure that we send the right information to the right node. In this case, we send winnerStats to the winner, and loserStatsto the loser. Okay, you get the idea now. So, let's apply it to please our coach. Here are the teams with the overall highest three pointers per page: // Average Three Point Made Per Game for All Teams val allThreePointMade = averageStat(scoreGraph)(threePointMade, AllTeams()) scala> allThreePointMade.sortBy(_._2, false).take(5).foreach(println) (1440,10.180811808118081) (1125,9.098412698412698) (1172,8.575657894736842) (1184,8.428571428571429) (1407,8.411149825783973) And here are the five most efficient teams overall in three pointers per game: // Average Three Point Percent for All Teams val allThreePointPercent = averageStat(scoreGraph)(threePointPercent, AllTeams()) Let's check the output: scala> allThreePointPercent.sortBy(_._2,false).take(5).foreach(println) (1429,38.8351815824302) (1323,38.522819895594) (1181,38.43052051444854) (1294,38.41227053353959) (1101,38.097896464168954) Actually, there is only a 2 percent difference between the most efficient team and the one in the fiftieth position. Most NCAA teams are therefore pretty efficient behind the line. I bet coach knew this already! Average points per game We can also reuse the averageStat function to get the average points per game for the winners. In particular, let's take a look at the two teams that won games with the highest and lowest scores: // Winning teams val winnerAvgPPG = averageStat(scoreGraph)(score, Winners()) Let's check the output: scala> winnerAvgPPG.max()(Ordering.by(_._2)) res36: (org.apache.spark.graphx.VertexId, Double) = (1322,90.73333333333333) scala> winnerAvgPPG.min()(Ordering.by(_._2)) res39: (org.apache.spark.graphx.VertexId, Double) = (1197,60.5) Apparently, the most defensive team can win game by scoring only 60 points, whereas the most offensive team can score an average of 90 points. Next, let's average the points per game for all games played and look at the two teams with the best and worst offense during the 2015 season: // Average Points Per Game of All Teams val allAvgPPG = averageStat(scoreGraph)(score, AllTeams()) Let's see the output: scala> allAvgPPG.max()(Ordering.by(_._2)) res42: (org.apache.spark.graphx.VertexId, Double) = (1322,83.81481481481481) scala> allAvgPPG.min()(Ordering.by(_._2)) res43: (org.apache.spark.graphx.VertexId, Double) = (1212,51.111111111111114) To no one's surprise, the best offensive team is the same as the one who scores the most in winning games. To win the games, 50 points are not enough in an average for a team to win the games. Defense stats – the D matters as in direction Previously, we obtained some statistics such as field goals or a three-point percentage that a team achieves. What if we want to aggregate instead the average points or rebounds that each team concedes to their opponents? To compute this, we define a new higher-order function called averageConcededStat. Compared to averageStat, this function needs to send loserStats to the winning team, and the winnerStats function to the losing team. To make things more interesting, we are going to make the team name as a part of the message Msg: def averageConcededStat(graph: Graph[String, FullResult])(getStat: GameStats => Double, rxs: Teams): VertexRDD[(String, Double)] = { type Msg = (Int, Double, String) val aggrStats: VertexRDD[Msg] = graph.aggregateMessages[Msg]( // sendMsg rxs match { case _ : Winners => t => t.sendToSrc((1, getStat(t.attr.loserStats), t.srcAttr)) case _ : Losers => t => t.sendToDst((1, getStat(t.attr.winnerStats), t.dstAttr)) case _ => t => { t.sendToSrc((1, getStat(t.attr.loserStats),t.srcAttr)) t.sendToDst((1, getStat(t.attr.winnerStats),t.dstAttr)) } } , // mergeMsg (x, y) => (x._1 + y._1, x._2+ y._2, x._3) ) aggrStats mapValues ( (id: VertexId, x: Msg) => x match { case (count: Int, total: Double, name: String) => (name, total/count) }) } With this, we can calculate the average points conceded by the winning and losing teams as follows: val winnersAvgConcededPoints = averageConcededStat(scoreGraph)(score, Winners()) val losersAvgConcededPoints = averageConcededStat(scoreGraph)(score, Losers()) Let's check the output: scala> losersAvgConcededPoints.min()(Ordering.by(_._2)) res: (VertexId, (String, Double)) = (1101,(Abilene Chr,74.04761904761905)) scala> winnersAvgConcededPoints.min()(Ordering.by(_._2)) res: (org.apache.spark.graphx.VertexId, (String, Double)) = (1101,(Abilene Chr,74.04761904761905)) scala> losersAvgConcededPoints.max()(Ordering.by(_._2)) res: (VertexId, (String, Double)) = (1464,(Youngstown St,78.85714285714286)) scala> winnersAvgConcededPoints.max()(Ordering.by(_._2)) res: (VertexId, (String, Double)) = (1464,(Youngstown St,71.125)) The previous tells us that Abilene Christian University is the most defensive team. They concede the least points whether they win a game or not. On the other hand, Youngstown has the worst defense. Joining aggregated stats into graphs The previous example shows us how flexible the aggregateMessages operator is. We can define the Msg type of the messages to be aggregated to fit our needs. Moreover, we can select which nodes receive the messages. Finally, we can also define how we want to merge the messages. As a final example, let's aggregate many statistics about each team, and join this information into the nodes of the graph. To start, we create its own class for the team stats: // Average Stats of All Teams case class TeamStat( wins: Int = 0 // Number of wins ,losses: Int = 0 // Number of losses ,ppg: Int = 0 // Points per game ,pcg: Int = 0 // Points conceded per game ,fgp: Double = 0 // Field goal percentage ,tpp: Double = 0 // Three point percentage ,ftp: Double = 0 // Free Throw percentage ){ override def toString = wins + "-" + losses } Then, we collect the average stats for all teams using aggregateMessages in the following. For this, we define the type of the message to be an 8-element tuple that holds the counter for games played, wins, losses, and other statistics that will be stored in TeamStat as listed previously: type Msg = (Int, Int, Int, Int, Int, Double, Double, Double) val aggrStats: VertexRDD[Msg] = scoreGraph.aggregateMessages( // sendMsg t => { t.sendToSrc(( 1, 1, 0, t.attr.winnerStats.score, t.attr.loserStats.score, t.attr.winnerStats.fgPercent, t.attr.winnerStats.tpPercent, t.attr.winnerStats.ftPercent )) t.sendToDst(( 1, 0, 1, t.attr.loserStats.score, t.attr.winnerStats.score, t.attr.loserStats.fgPercent, t.attr.loserStats.tpPercent, t.attr.loserStats.ftPercent )) } , // mergeMsg (x, y) => ( x._1 + y._1, x._2 + y._2, x._3 + y._3, x._4 + y._4, x._5 + y._5, x._6 + y._6, x._7 + y._7, x._8 + y._8 ) ) Given the aggregate message called aggrStats, we map them into a collection of TeamStat: val teamStats: VertexRDD[TeamStat] = aggrStats mapValues { (id: VertexId, m: Msg) => m match { case ( count: Int, wins: Int, losses: Int, totPts: Int, totConcPts: Int, totFG: Double, totTP: Double, totFT: Double) => TeamStat( wins, losses, totPts/count, totConcPts/count, totFG/count, totTP/count, totFT/count) } } Next, let's join teamStats into the graph. For this, we first create a class called Team as a new type for the vertex attribute. Team will have a name and TeamStat: case class Team(name: String, stats: Option[TeamStat]) { override def toString = name + ": " + stats } Next, we use the joinVertices operator that we have seen in the previous chapter: // Joining the average stats to vertex attributes def addTeamStat(id: VertexId, t: Team, stats: TeamStat) = Team(t.name, Some(stats)) val statsGraph: Graph[Team, FullResult] = scoreGraph.mapVertices((_, name) => Team(name, None)). joinVertices(teamStats)(addTeamStat) We can see that the join has worked well by printing the first three vertices in the new graph called statsGraph: scala> statsGraph.vertices.take(3).foreach(println) (1260,Loyola-Chicago: Some(17-13)) (1410,TX Pan American: Some(7-21)) (1426,UT Arlington: Some(15-15)) To conclude this task, let's find out the top 10 teams in the regular seasons. To do so, we define an ordering for Option[TeamStat] as follows: import scala.math.Ordering object winsOrdering extends Ordering[Option[TeamStat]] { def compare(x: Option[TeamStat], y: Option[TeamStat]) = (x, y) match { case (None, None) => 0 case (Some(a), None) => 1 case (None, Some(b)) => -1 case (Some(a), Some(b)) => if (a.wins == b.wins) a.losses compare b.losses else a.wins compare b.wins }} Finally, we get the following: import scala.reflect.classTag import scala.reflect.ClassTag scala> statsGraph.vertices.sortBy(v => v._2.stats,false)(winsOrdering, classTag[Option[TeamStat]]). | take(10).foreach(println) (1246,Kentucky: Some(34-0)) (1437,Villanova: Some(32-2)) (1112,Arizona: Some(31-3)) (1458,Wisconsin: Some(31-3)) (1211,Gonzaga: Some(31-2)) (1320,Northern Iowa: Some(30-3)) (1323,Notre Dame: Some(29-5)) (1181,Duke: Some(29-4)) (1438,Virginia: Some(29-3)) (1268,Maryland: Some(27-6)) Note that the ClassTag parameter is required in sortBy to make use of Scala's reflection. This is why we had the previous imports. Performance optimization with tripletFields In addition to sendMsg and mergeMsg, aggregateMessages can also take an optional argument called tripletsFields, which indicates what data is accessed in the EdgeContext. The main reason for explicitly specifying such information is to help optimize the performance of the aggregateMessages operation. In fact, TripletFields represents a subset of the fields of EdgeTriplet, and it enables GraphX to populate only thse fields when necessary. The default value is TripletFields. All which means that the sendMsg function may access any of the fields in the EdgeContext. Otherwise, the tripletFields argument is used to tell GraphX that only part of the EdgeContext will be required so that an efficient join strategy can be used. All the possible options for the tripletsFields are listed here: TripletFields.All: Expose all the fields (source, edge, and destination) TripletFields.Dst: Expose the destination and edge fields, but not the source field TripletFields.EdgeOnly: Expose only the edge field. TripletFields.None: None of the triplet fields are exposed TripletFields.Src: Expose the source and edge fields, but not the destination field Using our previous example, if we are interested in computing the total number of wins and losses for each team, we will not need to access any field of the EdgeContext. In this case, we should use TripletFields. None to indicate so: // Number of wins of the teams val numWins: VertexRDD[Int] = scoreGraph.aggregateMessages( triplet => { triplet.sendToSrc(1) // No attribute is passed but an integer }, (x, y) => x + y, TripletFields.None ) // Number of losses of the teams val numLosses: VertexRDD[Int] = scoreGraph.aggregateMessages( triplet => { triplet.sendToDst(1) // No attribute is passed but an integer }, (x, y) => x + y, TripletFields.None ) To see that this works, let's print the top five and bottom five teams: scala> numWins.sortBy(_._2,false).take(5).foreach(println) (1246,34) (1437,32) (1112,31) (1458,31) (1211,31) scala> numLosses.sortBy(_._2, false).take(5).foreach(println) (1363,28) (1146,27) (1212,27) (1197,27) (1263,27) Should you want the name of the top five teams, you need to access the srcAttr attribute. In this case, we need to set tripletFields to TripletFields.Src: Kentucky as undefeated team in regular season: val numWinsOfTeams: VertexRDD[(String, Int)] = scoreGraph.aggregateMessages( t => { t.sendToSrc(t.srcAttr, 1) // Pass source attribute only }, (x, y) => (x._1, x._2 + y._2), TripletFields.Src ) Et voila! scala> numWinsOfTeams.sortBy(_._2._2, false).take(5).foreach(println) (1246,(Kentucky,34)) (1437,(Villanova,32)) (1112,(Arizona,31)) (1458,(Wisconsin,31)) (1211,(Gonzaga,31)) scala> numWinsOfTeams.sortBy(_._2._2).take(5).foreach(println) (1146,(Cent Arkansas,2)) (1197,(Florida A&M,2)) (1398,(Tennessee St,3)) (1263,(Maine,3)) (1420,(UMBC,4)) Kentucky has not lost any of its 34 games during the regular season. Too bad that they could not make it into the championship final. Warning about the MapReduceTriplets operator Prior to Spark 1.2, there was no aggregateMessages method in graph. Instead, the now deprecated mapReduceTriplets was the primary aggregation operator. The API for mapReduceTriplets is: class Graph[VD, ED] { def mapReduceTriplets[Msg]( map: EdgeTriplet[VD, ED] => Iterator[(VertexId, Msg)], reduce: (Msg, Msg) => Msg) : VertexRDD[Msg] } Compared to mapReduceTriplets, the new operator called aggregateMessages is more expressive as it employs the message passing mechanism instead of returning an iterator of messages as mapReduceTriplets does. In addition, aggregateMessages explicitly requires the user to specify the TripletFields object for performance improvement as we explained previously. In addition to the API improvements, aggregateMessages is optimized for performance. Because mapReduceTriplets is now deprecated, we will not discuss it further. If you have to use it with earlier versions of Spark, you can refer to the Spark programming guide. Summary In brief, AggregateMessages is a useful and generic operator that provides a functional abstraction for aggregating neighborhood information in the Spark graphs. Its definition is summarized here: class Graph[VD, ED] { def aggregateMessages[Msg: ClassTag]( sendMsg: EdgeContext[VD, ED, Msg] => Unit, mergeMsg: (Msg, Msg) => Msg, tripletFields: TripletFields = TripletFields.All) : VertexRDD[Msg] } This operator applies a user-defined sendMsg function to each edge in the graph using an EdgeContext. Each EdgeContext access the required information about the edge and passes this information to its source node and/or destination node using the sendToSrc and/or sendToDst respectively. After all the messages are received by the nodes, the mergeMsg function is used to aggregate these messages at each node. Some interesting reads Six keys to sports analytics Moneyball: The Art Of Winning An Unfair Game Golden State Warriors at the forefront of NBA data analysis How Data and Analytics Have Changed 'The Beautiful Game' NHL, SAP partnership to lead statistical revolution Resources for Article: Further resources on this subject: The Spark programming model[article] Apache Karaf – Provisioning and Clusters[article] Machine Learning Using Spark MLlib [article]

0
0
2289

Packt

08 Sep 2015

17 min read

The Symfony Framework – Installation and Configuration

Packt

08 Sep 2015

17 min read

In this article by Wojciech Bancer, author of the book, Symfony2 Essentials, we will learn the basics of Symfony, its installation, configuration, and use. The Symfony framework is currently one of the most popular PHP frameworks existing within the PHP developer's environment. Version 2, which was released a few years ago, has been a great improvement, and in my opinion was one of the key elements for making the PHP ecosystem suitable for larger enterprise projects. The framework version 2.0 not only required the modern PHP version (minimal version required for Symfony is PHP 5.3.8), but also uses state-of-the-art technology — namespaces and anonymous functions. Authors also put a lot of efforts to provide long term support and to minimize changes, which break the compatibility between versions. Also, Symfony forced developers to use a few useful design concepts. The key one, introduced in Symfony, was DependencyInjection. (For more resources related to this topic, see here.) In most cases, the article will refer to the framework as Symfony2. If you want to look over the Internet or Google about this framework, apart from using Symfony keyword you may also try to use the Symfony2 keyword. This was the way recommended some time ago by one of the creators to make searching or referencing to the specific framework version easier in future. Key reasons to choose Symfony2 Symfony2 is recognized in the PHP ecosystem as a very well-written and well-maintained framework. Design patterns that are recommended and forced within the framework allow work to be more efficient in the group, this allows better tests and the creation of reusable code. Symfony's knowledge can also be verified through a certificate system, and this allows its developers to be easily found and be more recognized on the market. Last but not least, the Symfony2 components are used as parts of other projects, for example, look at the following: Drupal phpBB Laravel eZ Publish and more Over time, there is a good chance that you will find the parts of the Symfony2 components within other open source solutions. Bundles and extendable architecture are also some of the key Symfony2 features. They not only allow you to make your work easier through the easy development of reusable code, but also allows you to find smaller or larger pieces of code that you can embed and use within your project to speed up and make your work faster. The standards of Symfony2 also make it easier to catch errors and to write high-quality code; its community is growing every year. The history of Symfony There are many Symfony versions around, and it's good to know the differences between them to learn how the framework was evolving during these years. The first stable Symfony version — 1.0 — was released in the beginning of 2007 and was supported for three years. In mid-2008, version 1.1 was presented, which wasn't compatible with the previous release, and it was difficult to upgrade any old project to this. Symfony 1.2 version was released shortly after this, at the end of 2008. Migrating between these versions was much easier, and there were no dramatic changes in the structure. The final versions of Symfony 1's legacy family was released nearly one year later. Simultaneously, there were two version releases, 1.3 and 1.4. Both were identical, but Symfony 1.4 did not have deprecated features, and it was recommended to start new projects with it. Version 1.4 had 3 years of support. If you look into the code, version 1.x was very different from version 2. The company that was behind Symfony (the French company, SensioLabs) made a bold move and decided to rewrite the whole framework from scratch. The first release of Symfony2 wasn't perfect, but it was very promising. It relied on Git submodules (the composer did not exist back then). The 2.1 and 2.2 versions were closer to the one we use now, although it required a lot of effort to migrate to the upper level. Finally, the Symfony 2.3 was released — the first long-term support version within the 2.x branch. After this version, the changes provided within the next major versions (2.4, 2.5, and 2.6) are not so drastic and usually they do not break compatibility. This article was written based on the latest stable Symfony 2.7.4 version and was tested with PHP 5.5). This Symfony version is marked as the so called long-term support version, and updates for it will be released for 3 years since the first 2.7 version release. Installation Prior to installing Symfony2, you don't need to have a configured web server. If you have at least PHP version 5.4, you can use the standalone server provided by Symfony2. This server is suitable for development purposes and should not be used for production. It is strongly recommend to work with a Linux/UNIX system for both development and production deployment of Symfony2 framework applications. While it is possible to install and operate on a Windows box, due to its different nature, working with Windows can sometimes force you to maintain a separate fragment of code for this system. Even if your primary OS is Windows, it is strongly recommended to configure Linux system in a virtual environment. Also, there are solutions that will help you in automating the whole process. As an example, see more on https://www.vagrantup.com/ website. To install Symfony2, you can use a few methods as follows: Use a new Symfony2 installer script (currently, the only officially recommended). Please note that installer requires at least PHP 5.4. Use a composer dependency manager to install a Symfony project. Download a zip or tgz package and unpack it. It does not really matter which method you choose, as they all give you similar results. Installing Symfony2 by using an installer To install Symfony2 through an installer, go to the Symfony website at http://symfony.com/download, and install the Symfony2 installer by issuing the following commands: $ sudo curl -LsS http://symfony.com/installer -o /usr/local/bin/symfony $ sudo chmod +x /usr/local/bin/symfony After this, you can install Symfony by just typing the following command: $ symfony new <new_project_folder> To install the Symfony2 framework for a to-do application, execute the following command: $ symfony new <new_project_folder> This command installs the latest Symfony2 stable version on the newly created todoapp folder, creates the Symfony2 application, and prepares some basic structure for you to work with. After the app creation, you can verify that your local PHP is properly configured for Symfony2 by typing the following command: $ php app/check.php If everything goes fine, the script should complete with the following message: [OK] Your system is ready to run Symfony projects Symfony2 is equipped with a standalone server. It makes development easier. If you want to run this, type the following command: $ php app/console server:run If everything went alright, you will see a message that your server is working on the IP 127.0.0.1 and port 8000. If there is an error, make sure you are not running anything else that is listening on port 8000. It is also possible to run the server on a different port or IP, if you have such a requirement, by adding the address and port as a parameter, that is: $ php app/console server:run 127.0.0.1:8080 If everything works, you can now type the following: http://127.0.0.1:8000/ Now, you will visit Symfony's welcome page. This page presents you with a nice welcome information and useful documentation link. The Symfony2 directory structure Let's dive in to the initial directory structure within the typical Symfony application. Here it is: app bin src vendor web While Symfony2 is very flexible in terms of directory structure, it is recommended to keep the basic structure mentioned earlier. The following table describes their purpose: Directory Used for app This holds information about general configuration, routing, security configuration, database parameters, and many others. It is also the recommended place for putting new view files. This directory is a starting point. bin It holds some helper executables. It is not really important during the development process, and rarely modified. src This directory holds the project PHP code (usually your bundles). vendor These are third-party libraries used within the project. Usually, this directory contains all the open source third-party bundles, libraries, and other resources. It's worth to mention that it's recommended to keep the files within this directory outside the versioning system. It means that you should not modify them under any circumstances. Fortunately, there are ways to modify the code, if it suits your needs more. This will be demonstrated when we implement user management within our to-do application. web This is the directory that is accessible through the web server. It holds the main entry point to the application (usually the app.php and app_dev.php files), CSS files, JavaScript files, and all the files that need to be available through the web server (user uploadable files). So, in most cases, you will be usually modifying and creating the PHP files within the src/ directory, the view and configuration files within the app/ directory, and the JS/CSS files within the web/ directory. The main directory also holds a few files as follows: .gitignore README.md composer.json composer.lock The .gitignore file's purpose is to provide some preconfigured settings for the Git repository, while the composer.json and composer.lock files are the files used by the composer dependency manager. What is a bundle? Within the Symfony2 application, you will be using the "bundle" term quite often. Bundle is something similar to plugins. So it can literally hold any code controllers, views, models, and services. A bundle can integrate other non-Symfony2 libraries and hold some JavaScript/CSS code as well. We can say that almost everything is a bundle in Symfony2; even some of the core framework features together form a bundle. A bundle usually implements a single feature or functionality. The code you are writing when you write a Symfony2 application is also a bundle. There are two types of bundles. The first kind of bundle is the one you write within the application, which is project-specific and not reusable. For this purpose, there is a special bundle called AppBundle created for you when you install the Symfony2 project. Also, there are reusable bundles that are shared across the various projects either written by you, your team, or provided by a third-party vendors. Your own bundles are usually stored within the src/ directory, while the third-party bundles sit within the vendor/ directory. The vendor directory is used to store third-party libraries and is managed by the composer. As such, it should never be modified by you. There are many reusable open source bundles, which help you to implement various features within the application. You can find many of them to help you with User Management, writing RESTful APIs, making better documentation, connecting to Facebook and AWS, and even generating a whole admin panel. There are tons of bundles, and everyday brings new ones. If you want to explore open source bundles, and want to look around what's available, I recommend you to start with the http://knpbundles.com/ website. The bundle name is correlated with the PHP namespace. As such, it needs to follow some technical rules, and it needs to end with the Bundle suffix. A few examples of correct names are AppBundle and AcmeDemoBundle, CompanyBlogBundle or CompanySocialForumBundle, and so on. Composer Symfony2 is built based on components, and it would be very difficult to manage the dependencies between them and the framework without a dependency manager. To make installing and managing these components easier, Symfony2 uses a manager called composer. You can get it from the https://getcomposer.org/ website. The composer makes it easy to install and check all dependencies, download them, and integrate them to your work. If you want to find additional packages that can be installed with the composer, you should visit https://packagist.org/. This site is the main composer repository, and contains information about most of the packages that are installable with the composer. To install the composer, go to https://getcomposer.org/download/ and see the download instruction. The download instruction should be similar to the following: $ curl -sS https://getcomposer.org/installer | php If the download was successful, you should see the composer.phar file in your directory. Move this to the project location in the same place where you have the composer.json and composer.lock files. You can also install it globally, if you prefer to, with these two commands: $ curl -sS https://getcomposer.org/installer | php $ sudo mv composer.phar /usr/local/bin/composer You will usually need to use only three composer commands: require, install, and update. The require command is executed when you need to add a new dependency. The install command is used to install the package. The update command is used when you need to fetch the latest version of your dependencies as specified within the JSON file. The difference between install and update is subtle, but very important. If you are executing the update command, your composer.lock file gets updated with the version of the code you just fetched and downloaded. The install command uses the information stored in the composer.lock file and the fetch version stored in this file. When to use install? For example, if you deploy the code to the server, you should use install rather than update, as it will deploy the version of the code stored in composer.lock, rather than download the latest version (which may be untested by you). Also, if you work in a team and you just got an update through Git, you should use install to fetch the vendor code updated by other developers. You should use the update command if you want to check whether there is an updated version of the package you have installed, that is, whether a new minor version of Symfony2 will be released, then the update command will fetch everything. As an example, let's install one extra package for user management called FOSUserBundle (FOS is a shortcut of Friends of Symfony). We will only install it here; we will not configure it. To install FOSUserBundle, we need to know the correct package name and version. The easiest way is to look in the packagist site at https://packagist.org/ and search for the package there. If you type fosuserbundle, the search should return a package called friendsofsymfony/user-bundle as one of the top results. The download counts visible on the right-hand side might be also helpful in determining how popular the bundle is. If you click on this, you will end up on the page with the detailed information about that bundle, such as homepage, versions, and requirements of the package. Type the following command: $ php composer.phar require friendsofsymfony/user-bundle ^1.3 Using version ^1.3 for friendsofsymfony/user-bundle ./composer.json has been updated Loading composer repositories with package information Updating dependencies (including require-dev) - Installing friendsofsymfony/user-bundle (v1.3.6) Loading from cache friendsofsymfony/user-bundle suggests installing willdurand/propel-typehintable-behavior (Needed when using the propel implementation) Writing lock file Generating autoload files ... Which version of the package you choose is up to you. If you are interested in package versioning standards, see the composer website at https://getcomposer.org/doc/01-basic-usage.md#package-versions to get more information on it. The composer holds all the configurable information about dependencies and where to install them in a special JSON file called composer.json. Let's take a look at this: { "name": "wbancer/todoapp", "license": "proprietary", "type": "project", "autoload": { "psr-0": { "": "src/", "SymfonyStandard": "app/SymfonyStandard/" } }, "require": { "php": ">=5.3.9", "symfony/symfony": "2.7.*", "doctrine/orm": "~2.2,>=2.2.3,<2.5", // [...] "incenteev/composer-parameter-handler": "~2.0", "friendsofsymfony/user-bundle": "^1.3" }, "require-dev": { "sensio/generator-bundle": "~2.3" }, "scripts": { "post-root-package-install": [ "SymfonyStandard\\Composer::hookRootPackageInstall" ], "post-install-cmd": [ // post installation steps ], "post-update-cmd": [ // post update steps ] }, "config": { "bin-dir": "bin" }, "extra": { // [...] } } The most important section is the one with the require key. It holds all the information about the packages we want to use within the project. The key scripts contain a set of instructions to run post-install and post-update. The extra key in this case contains some settings specific to the Symfony2 framework. Note that one of the values in here points out to the parameter.yml file. This file is the main file holding the custom machine-specific parameters. The meaning of the other keys is rather obvious. If you look into the vendor/ directory, you will notice that our package has been installed in the vendor/friendsofsymfony/user-bundle directory. The configuration files Each application has a need to hold some global and machine-specific parameters and configurations. Symfony2 holds configuration within the app/config directory and it is split into a few files as follows: config.yml config_dev.yml config_prod.yml config_test.yml parameters.yml parameters.yml.dist routing.yml routing_dev.yml security.yml services.yml All the files except the parameters.yml* files contain global configuration, while the parameters.yml file holds machine-specific information such as database host, database name, user, password, and SMTP configuration. The default configuration file generated by the new Symfony command will be similar to the following one. This file is auto-generated during the composer install: parameters: database_driver: pdo_mysql database_host: 127.0.0.1 database_port: null database_name: symfony database_user: root database_password: null mailer_transport: smtp mailer_host: 127.0.0.1 mailer_user: null mailer_password: null secret: 93b0eebeffd9e229701f74597e10f8ecf4d94d7f As you can see, it mostly holds the parameters related to database, SMTP, locale settings, and secret key that are used internally by Symfony2. Here, you can add your custom parameters using the same syntax. It is a good practice to keep machine-specific data such as passwords, tokens, api-keys, and access keys within this file only. Putting passwords in the general config.yml file is considered as a security risk bug. The global configuration file (config.yml) is split into a few other files called routing*.yml that contain information about routing on the development and production configuration. The file called as security.yml holds information related to authentication and securing the application access. Note that some files contains information for development, production, or test mode. You can define your mode when you run Symfony through the command-line console and when you run it through the web server. In most cases, while developing you will be using the dev mode. The Symfony2 console To finish, let's take a look at the Symfony console script. We used it before to fire up the development server, but it offers more. Execute the following: $ php app/console You will see a list of supported commands. Each command has a short description. Each of the standard commands come with help, so I will not be describing each of them here, but it is worth to mention a few commonly used ones: Command Description app/console: cache:clear Symfony in production uses a lot of caching. Therefore, if you need to change values within a template (twig) or within configuration files while in production mode, you will need to clear the cache. Cache is also one of the reasons why it's worth to work in the development mode. app/console container:debug Displays all configured public services app/console router:debug Displays all routing configuration along with method, scheme, host, and path. app/console security:check Checks your composer and packages version against known security vulnerabilities. You should run this command regularly. Summary In this article, we have demonstrated how to use the Symfony2 installer, test the configuration, run the deployment server, and play around with the Symfony2 command line. We have also installed the composer and learned how to install a package using it. To demonstrate how Symfony2 enables you to make web applications faster, we will try to learn through examples that can be found in real life. To make this task easier, we will try to produce a real to-do web application with modern look and a few working features. In case you are interested in knowing other Symfony books that Packt has in store for you, here is the link: Symfony 1.3 Web Application Development, Tim Bowler, Wojciech Bancer Extending Symfony2 Web Application Framework, Sébastien Armand Resources for Article: Further resources on this subject: A Command-line Companion Called Artisan[article] Creating and Using Composer Packages[article] Services [article]

0
0
3674

How-To Tutorials

article-image-application-development-workflow

Packt

08 Sep 2015

15 min read

Application Development Workflow

Packt

08 Sep 2015

15 min read

In this article by Ivan Turkovic, author of the book PhoneGap Essentials, you will learn some of the basics on how to work with the PhoneGap application development and how to start building the application. We will go over some useful steps and tips to get the most out of your PhoneGap application. In this article, you will learn the following topics: An introduction to a development workflow Best practices Testing (For more resources related to this topic, see here.) An introduction to a development workflow PhoneGap solves a great problem of developing mobile applications for multiple platforms at the same time, but still it is pretty much open about how you want to approach the creation of an application. You do not have any predefined frameworks that come out of-the-box by default. It just allows you to use the standard web technologies such as the HTML5, CSS3, and JavaScript languages for hybrid mobile application development. The applications are executed in wrappers that are custom-built to work on every platform and the underlying web view behaves in the same way on all the platforms. For accessing device APIs, it relies on the standard API bindings to access every device's sensors or the other features. The developers who start using PhoneGap usually come from different backgrounds, as shown in the following list: Mobile developers who want to expand the functionality of their application on other platforms but do not want to learn a new language for each platform Web developers who want to port their existing desktop web application to a mobile application; if they are using a responsive design, it is quite simple to do this Experienced mobile developers who want to use both the native and web components in their application, so that the web components can communicate with the internal native application code as well The PhoneGap project itself is pretty simple. By default, it can open an index.html page and load the initial CSS file, JavaScript, and other resources needed to run it. Besides the user's resources, it needs to refer the cordova.js file, which provides the API bindings for all the plugins. From here onwards, you can take different steps but usually the process falls in two main workflows: web development workflow and native platform development. Web project development A web project development workflow can be used when you want to create a PhoneGap application that runs on many mobile operating systems with as little as possible changes to a specific one. So there is a single codebase that is working along with all the different devices. It has become possible with the latest versions since the introduction of the command-line interface (CLI). This automates the tedious work involved in a lot of the functionalities while taking care of each platform, such as building the app, copying the web assets in the correct location for every supported platform, adding platform-specific changes, and finally running build scripts to generate binaries. This process can be automated even more with build system automating tasks such as Gulp or Grunt. You can run these tasks before running PhoneGap commands. This way you can optimize the assets before they are used. Also you can run JSLint automatically for any change or doing automatic builds for every platform that is available. Native platform development A native platform development workflow can be imagined as a focus on building an application for a single platform and the need to change the lower-level platform details. The benefit of using this approach is that it gives you more flexibility and you can mix the native code with a WebView code and impose communication between them. This is appropriate for those functionalities that contain a section of the features that are not hard to reproduce with web views only; for example, a video app where you can do the video editing in the native code and all the social features and interaction can be done with web views. Even if you want to start with this approach, it is better to start the new project as a web project development workflow and then continue to separate the code for your specific needs. One thing to keep in mind is that, to develop with this approach, it is better to develop the application in more advanced IDE environments, which you would usually use for building native applications. Best practices The running of hybrid mobile applications requires some sacrifices in terms of performance and functionality; so it is good to go over some useful tips for new PhoneGap developers. Use local assets for the UI As mobile devices are limited by the connection speeds and mobile data plans are not generous with the bandwidth, you need to prepare all the UI components in the application before deploying to the app store. Nobody will want to use an application that takes a few seconds to load the server-rendered UI when the same thing could be done on the client. For example, the Google Fonts or other non-UI assets that are usually loaded from the server for the web applications are good enough as for the development process, but for the production; you need to store all the assets in the application's container and not download them during its run process. You do not want the application to wait while an important part is being loaded. The best advice on the UI that I can give you is to adopt the Single Page Application (SPA) design; it is a client-side application that is run from one request from a web page. Initial loading means taking care of loading all the assets that are required for the application in order to function, and any further updates are done via AJAX (such as loading data). When you use SPA, not only do you minimize the amount of interaction with the server, you also organize your application in a more efficient manner. One of the benefits is that the application doesn't need to wait for every deviceready event for each additional page that it loads from the start. Network access for data As you have seen in the previous section, there are many limitations that mobile applications face with the network connection—from mobile data plans to the network latency. So you do not want it to rely on the crucial elements, unless real-time communication is required for the application. Try to keep the network access only to access crucial data and everything else that is used frequently can be packed into assets. If the received data does not change often, it is advisable to cache it for offline use. There are many ways to achieve this, such as localStorage, sessionStorage, WebSQL, or a file. When loading data, try to load only the data you need at that moment. If you have a comment section, it will make sense if you load all thousand comments; the first twenty comments should be enough to start with. Non-blocking UI When you are loading additional data to show in the application, don't try to pause the application until you receive all the data that you need. You can add some animation or a spinner to show the progress. Do not let the user stare at the same screen when he presses the button. Try to disable the actions once they are in motion in order to prevent sending the same action multiple times. CSS animations As most of the modern mobile platforms now support CSS3 with a more or less consistent feature set, it is better to make the animations and transitions with CSS rather than with the plain JavaScript DOM manipulation, which was done before CSS3. CSS3 is much faster as the browser engine supports the hardware acceleration of CSS animations and is more fluid than the JavaScript animations. CSS3 supports translations and full keyframe animations as well, so you can be really creative in making your application more interactive. Click events You should avoid click events at any cost and use only touch events. They work in the same way as they do in the desktop browser. They take a longer time to process as the mobile browser engine needs to process the touch or touchhold events before firing a click event. This usually takes 300 ms, which is more than enough to give an additional impression of slow responses. So try to start using touchstart or touchend events. There is a solution for this called FastClick.js. It is a simple, easy-to-use library for eliminating the 300 ms delay between a physical tap and the firing of a click event on mobile browsers. Performance The performance that we get on the desktops isn't reflected in mobile devices. Most of the developers assume that the performance doesn't change a lot, especially as most of them test the applications on the latest mobile devices and a vast majority of the users use mobile devices that are 2-3 years old. You have to keep in mind that even the latest mobile devices have a slower CPU, less RAM, and a weaker GPU. Recently, mobile devices are catching up in the sheer numbers of these components but, in reality, they are slower and the maximum performance is limited due to the battery life that prevents it from using the maximum performance for a prolonged time. Optimize the image assets We are not limited any more by the app size that we need to deploy. However, you need to optimize the assets, especially images, as they take a large part of the assets, and make them appropriate for the device. You should prepare images in the right size; do not add the biggest size of the image that you have and force the mobile device to scale the image in HTML. Choosing the right image size is not an easy task if you are developing an application that should support a wide array of screens, especially for Android that has a very fragmented market with different screen sizes. The scaled images might have additional artifacts on the screen and they might not look so crisp. You will be hogging additional memory just for an image that could leave a smaller memory footprint. You should remember that mobile devices still have limited resources and the battery doesn't last forever. If you are going to use PhoneGap Build, you will need to make sure you do not exceed the limit as the service still has a limited size. Offline status As we all know, the network access is slow and limited, but the network coverage is not perfect so it is quite possible that your application will be working in the offline mode even in the usual locations. Bad reception can be caused by being inside a building with thick walls or in the basement. Some weather conditions can affect the reception too. The application should be able to handle this situation and respond to it properly, such as by limiting the parts of the application that require a network connection or caching data and syncing it when you are online once again. This is one of the aspects that developers usually forget to test in the offline mode to see how the app behaves under certain conditions. You should have a plugin available in order to detect the current state and the events when it passes between these two modes. Load only what you need There are a lot of developers that do this, including myself. We need some part of the library or a widget from a framework, which we don't need for anything other than this, and yet we are a bit lazy about loading a specific element and the full framework. This can load an immense amount of resources that we will never need but they will still run in the background. It might also be the root cause of some of the problems as some libraries do not mix well and we can spend hours trying to solve this problem. Transparency You should try to use as little as possible of the elements that have transparent parts as they are quite processor-intensive because you need to update screen on every change behind them. The same things apply to the other visual elements that are processor-intensive such as shadows or gradients. The great thing is that all the major platforms have moved away from flashy graphical elements and started using the flat UI design. JSHint If you use JSHint throughout the development, it will save you a lot of time when developing things in JavaScript. It is a static code analysis tool for checking whether the JavaScript source code complies with the coding rules. It will detect all the common mistakes done with JavaScript, as JavaScript is not a compiled language and you can't see the error until you run the code. At the same time, JSHint can be a very restrictive and demanding tool. Many beginners in JavaScript, PhoneGap, or mobile programming could be overwhelmed with the number of errors or bad practices that JSHint will point out. Testing The testing of applications is an important aspect of build applications, and mobile applications are no exception. With a slight difference for most of the development that doesn't require native device APIs, you can use the platform simulators and see the results. However, if you are using the native device APIs that are not supported through simulators, then you need to have a real device in order to run a test on it. It is not unusual to use desktop browsers resized to mobile device screen resolution to emulate their screen while you are developing the application just to test the UI screens, since it is much faster and easier than building and running the application on a simulator or real device for every small change. There is a great plugin for the Google Chrome browser called Apache Ripple. It can be run without any additional tools. The Apache Ripple simulator runs as a web app in the Google Chrome browser. In Cordova, it can be used to simulate your app on a number of iOS and Android devices and it provides basic support for the core Cordova plugins such as Geolocation and Device Orientation. You can run the application in a real device browser or use the PhoneGap developer app. This simplifies the workflow as you can test the application on your mobile device without the need to re-sign, recompile, or reinstall your application to test the code. The only disadvantage is that with simulators, you cannot access the device APIs that aren't available in the regular web browsers. The PhoneGap developer app allows you to access device APIs as long as you are using one of the supplied APIs. It is good if you remember to always test the application on real devices at least before deploying to the app store. Computers have almost unlimited resources as compared to mobile devices, so the application that runs flawlessly on the computer might fail on mobile devices due to low memory. As simulators are faster than the real device, you might get the impression that it will work on every device equally fast, but it won't—especially with older devices. So, if you have an older device, it is better to test the response on it. Another reason to use the mobile device instead of the simulator is that it is hard to get a good usability experience from clicking on the interface on the computer screen without your fingers interfering and blocking the view on the device. Even though it is rare that you would get some bugs with the plain PhoneGap that was introduced with the new version, it might still happen. If you use the UI framework, it is good if you try it on the different versions of the operating systems as they might not work flawlessly on each of them. Even though hybrid mobile application development has been available for some time, it is still evolving, and as yet there are no default UI frameworks to use. Even the PhoneGap itself is still evolving. As with the UI, the same thing applies to the different plugins. Some of the features might get deprecated or might not be supported, so it is good if you implement alternatives or give feedback to the users about why this will not work. From experience, the average PhoneGap application will use at least ten plugins or different libraries for the final deployment. Every additional plugin or library installed can cause conflicts with another one. Summary In this article, we learned more advanced topics that any PhoneGap developer should get into more detail once he/she has mastered the essential topics. Resources for Article: Further resources on this subject: Building the Middle-Tier[article] Working with the sharing plugin[article] Getting Ready to Launch Your PhoneGap App in the Real World [article]

0
0
8258

Getting Started – Understanding Citrix XenDesktop and its Architecture

Understanding the Datastore

Apache Spark

Introducing the Boost C++ Libraries

Introducing Bayesian Inference

Continuous Delivery and Continuous Deployment

PostgreSQL in Action

Deploy Node.js Apps with AWS Code Deploy

How to Run Code in the Cloud with AWS Lambda

Deploying a Zabbix proxy

Trending Topics

Creating slash commands for Slack using Bottle

Introduction to Spring Web Application in No Time

Sabermetrics with Apache Spark

The Symfony Framework – Installation and Configuration

Application Development Workflow

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access