
Support Vector Machines as a Classification Engine

Packt
17 Mar 2016
9 min read
In this article by Tomasz Drabas, author of the book Practical Data Analysis Cookbook, we will discuss how Support Vector Machine models can be used as a classification engine.

Support Vector Machines

Support Vector Machines (SVMs) are a family of extremely powerful models that can be used in classification and regression problems. They aim at finding decision boundaries that separate observations with differing class memberships. While many classifiers exist that can classify linearly separable data (for example, logistic regression), SVMs can handle highly non-linear problems using a kernel trick that implicitly maps the input vectors to higher-dimensional feature spaces. The transformation rearranges the dataset in such a way that it then becomes linearly solvable.

The mechanics of the machine

Given a set of n points of the form (x1, y1), ..., (xn, yn), where xi is a z-dimensional input vector and yi is a class label, the SVM aims at finding the maximum-margin hyperplane that separates the data points. In a two-dimensional dataset with linearly separable data points, the maximum-margin hyperplane is simply the line that maximizes the distance between the two classes. The hyperplane can be expressed as a dot product of an input vector x and a vector W normal to the hyperplane: W · x = b, where b is the offset from the origin of the coordinate system. To find the hyperplane, we solve the following optimization problem: minimize (1/2)||W||^2 subject to yi(W · xi - b) >= 1 for every point i. The constraint effectively states that no point may end up on the side of the hyperplane belonging to the other class.

Linear SVM

Building a linear SVM classifier in Python is easy. There are multiple Python packages that can estimate a linear SVM, but here we decided to use MLPY (http://mlpy.sourceforge.net):

import pandas as pd
import numpy as np
import mlpy as ml

First, we load the necessary modules that we will use later, namely pandas (http://pandas.pydata.org), NumPy (http://www.numpy.org), and the aforementioned MLPY. We use pandas to read the data (see the https://github.com/drabastomek/practicalDataAnalysisCookbook repository to download it):

# the file name of the dataset
r_filename = 'Data/Chapter03/bank_contacts.csv'

# read the data
csv_read = pd.read_csv(r_filename)

The dataset that we use was described in S. Moro, P. Cortez, and P. Rita, A Data-Driven Approach to Predict the Success of Bank Telemarketing, Decision Support Systems, Elsevier, 62:22-31, June 2014, and can be found at http://archive.ics.uci.edu/ml/datasets/Bank+Marketing. It consists of over 41.1k outbound marketing calls of a bank. Our aim is to classify these calls into two buckets: those that resulted in a credit application and those that did not.
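Before moving to the bank-marketing data, it can help to see the maximum-margin idea on a toy example. The sketch below is not part of the book's code and uses scikit-learn instead of MLPY (an assumption of convenience, since scikit-learn is widely available); it fits a linear SVM to two small, linearly separable clusters and prints the recovered normal vector W and offset b:

import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters in 2-D
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# a linear kernel keeps the decision boundary a straight line: W . x = b
clf = SVC(kernel='linear', C=100.0)
clf.fit(X, y)

W = clf.coef_[0]           # vector normal to the hyperplane
b = -clf.intercept_[0]     # offset, so the boundary is W . x = b
print('W =', W, 'b =', b)

# the support vectors are the points that sit on the margin
print('support vectors:\n', clf.support_vectors_)

With only six points, the support vectors returned are exactly the points closest to the opposite class, which is what the constraint in the optimization problem above encodes.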
Once the file was loaded, we split the data into training and testing datasets; we also keep the input and class indicator data separately. To this end, we use the split_data(...) method:

def split_data(data, y, x = 'All', test_size = 0.33):
    '''
        Method to split the data into training and testing
    '''
    import sys

    # dependent variable
    variables = {'y': y}

    # and all the independent
    if x == 'All':
        allColumns = list(data.columns)
        allColumns.remove(y)
        variables['x'] = allColumns
    else:
        if type(x) != list:
            print('The x parameter has to be a list...')
            sys.exit(1)
        else:
            variables['x'] = x

    # create a variable to flag the training sample
    data['train'] = np.random.rand(len(data)) < (1 - test_size)

    # split the data into training and testing
    train_x = data[data.train][variables['x']]
    train_y = data[data.train][variables['y']]
    test_x  = data[~data.train][variables['x']]
    test_y  = data[~data.train][variables['y']]

    return train_x, train_y, test_x, test_y, variables['x']

We randomly set one third of the dataset aside for testing purposes and use the remaining two thirds to train the model:

# split the data into training and testing
train_x, train_y, test_x, test_y, labels = hlp.split_data(
    csv_read,
    y = 'credit_application'
)

Once we have read the data and split it into training and testing datasets, we can estimate the model:

# create the classifier object
svm = ml.LibSvm(svm_type='c_svc', kernel_type='linear', C=100.0)

# fit the data
svm.learn(train_x, train_y)

The svm_type parameter of the .LibSvm(...) method controls which algorithm is used to estimate the SVM. Here, we use c_svc—a C-Support Vector Classifier. The C parameter specifies how much you want to avoid misclassifying observations: larger values of C shrink the margin around the hyperplane so that more of the observations are correctly classified. You can also specify nu_svc with a nu parameter that controls what fraction of your sample (at most) can be misclassified and what fraction of your observations (at least) become support vectors. Here, we estimate an SVM with a linear kernel, so let's talk about kernels.

Kernels

A kernel function K is effectively a function that computes a dot product between two n-dimensional vectors, K: R^n x R^n -> R. In other words, the kernel function takes two vectors and produces a scalar. The linear kernel does not transform the data into a higher-dimensional space. This is not true for the polynomial or Radial Basis Function (RBF) kernels, which transform the input feature space into higher dimensions. In the case of a polynomial kernel of degree d, the obtained feature space has C(n+d, d) dimensions (the binomial coefficient "n+d choose d") for an n-dimensional input feature space. As you can see, the number of additional dimensions can grow very quickly, and this would pose significant problems in estimating the model if we explicitly transformed the data into the higher-dimensional space. Thankfully, we do not have to do this—that is where the kernel trick comes into play. SVMs do not have to work explicitly in higher dimensions; they can instead map the data there implicitly by using pairwise inner products (instead of an explicit transformation) and then use those products to find the maximum-margin hyperplane. You can find a really good explanation of the kernel trick at http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html.
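To make the kernel trick concrete, the short sketch below (not from the book; it only assumes NumPy) compares a degree-2 polynomial kernel computed directly, (x · z + 1)^2, with an explicit mapping into the corresponding higher-dimensional feature space. The two agree, which is exactly why the explicit transformation is never needed:

import numpy as np

def poly_kernel(x, z, degree=2):
    # kernel trick: a dot product in the lifted space, computed in the original space
    return (np.dot(x, z) + 1) ** degree

def explicit_map(x):
    # explicit degree-2 feature map for a 2-D input vector
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(poly_kernel(x, z))                          # 25.0
print(np.dot(explicit_map(x), explicit_map(z)))   # 25.0 as well

With the linear kernel, K(x, z) is just x · z, so no lifting happens; with polynomial or RBF kernels, the same pairwise-product machinery quietly operates in a much larger space.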
Back to our example

The .learn(...) method of the LibSvm object estimates the model. Once the model is estimated, we can test how well it performs. First, we use the estimated model to predict the classes for the observations in the testing dataset:

predicted_l = svm.pred(test_x)

Next, we will use some of the scikit-learn methods to print the basic statistics for our model:

def printModelSummary(actual, predicted):
    '''
        Method to print out model summaries
    '''
    import sklearn.metrics as mt

    print('Overall accuracy of the model is {0:.2f} percent'
        .format((actual == predicted).sum() / len(actual) * 100))
    print('Classification report: \n',
        mt.classification_report(actual, predicted))
    print('Confusion matrix: \n',
        mt.confusion_matrix(actual, predicted))
    print('ROC: ', mt.roc_auc_score(actual, predicted))

First, we calculate the overall accuracy of the model, expressed as the ratio of properly classified observations to the total number of observations in the testing sample. Next, we print the classification report.

The precision is the model's ability to avoid classifying an observation as positive when it is not. It is the ratio of true positives to the overall number of positively classified records. The overall precision score is a weighted average of the individual precision scores, where the weight is the support; the support is the total number of actual observations in each class. The total precision for our model is not too bad—89 out of 100. However, when we look at the precision for the positive class, the situation is not as good—only 63 out of 100 were properly classified.

Recall can be viewed as the model's capacity to find all the positive samples. It is the ratio of true positives to the sum of true positives and false negatives. The recall for class 0.0 is almost perfect, but for class 1.0 it looks really bad. This might be caused by the fact that our sample is not balanced, but it is more likely that the features we use to classify the data do not really capture the differences between the two groups.

The f1-score is effectively a weighted amalgam of precision and recall: it is the ratio of twice the product of precision and recall to their sum. In one measure, it shows whether the model performs well or not. At the general level, the model does not perform badly, but when we look at its ability to classify the true signal, it fails gravely. This is a perfect example of why judging a model at the general level can be misleading when dealing with heavily unbalanced samples.

RBF kernel SVM

Given that the linear kernel performed poorly, our dataset might not be linearly separable, so let's try the RBF kernel. The RBF kernel is given as K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), where ||x - y||^2 is the squared Euclidean distance between the two vectors x and y, and sigma is a free parameter. The value of the RBF kernel equals 1 when x = y and gradually falls to 0 as the distance approaches infinity. To fit an RBF version of our model, we specify the svm object as follows:

svm = ml.LibSvm(svm_type='c_svc', kernel_type='rbf', gamma=0.1, C=1.0)

The gamma parameter here specifies how far the influence of a single support vector reaches. Visually, you can investigate the relationship between the gamma and C parameters at http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html. The rest of the code for the model estimation follows the same pattern as with the linear kernel, and we obtain the following results: they are even worse than with the linear kernel, as precision and recall dropped across the board.
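As an aside, the decay behavior of the RBF kernel described above is easy to verify numerically. A minimal sketch, not from the book and assuming only NumPy (note that LibSVM's gamma corresponds to 1/(2 * sigma^2)):

import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    # exp(-gamma * ||x - y||^2); gamma plays the role of 1 / (2 * sigma^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))                       # 1.0  - identical vectors
print(rbf_kernel(x, np.array([2.0, 3.0])))    # ~0.82 - nearby point
print(rbf_kernel(x, np.array([9.0, 9.0])))    # ~0.0  - distant point

Large gamma values make each support vector's influence very local, which is why gamma and C have to be tuned together.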
The SVM with the RBF kernel thus performed even worse at separating the calls that resulted in a credit application from those that did not.

Summary

In this article, we saw that the problem is not with the model; rather, the dataset that we use does not explain the variance sufficiently. This calls for going back to the drawing board and selecting other features.

Microservices – Brave New World

Packt
17 Mar 2016
9 min read
In this article by David Gonzalez, author of the book Developing Microservices with Node.js, we will cover the need for microservices, explain the monolithic approach, and study how to build and deploy microservices.

Need for microservices

The world of software development has evolved quickly over the past 40 years. One of the key points of this evolution has been the size of these systems. From the days of MS-DOS, we have taken a hundred-fold leap into our present systems. This growth in size creates a need for better ways of organizing code and software components. Usually, when a company grows due to business needs, which is known as organic growth, the software gets organized into a monolithic architecture, as this is the easiest and quickest way of building software. After a few years (or even months), adding new features becomes harder due to the coupled nature of the resulting software.

Monolithic software

There are a few companies that have already started building their software using microservices, which is the ideal scenario. The problem is that not all companies can plan their software upfront. Instead of planning, these companies build the software based on the organic growth they experience: a few software components that group business flows by affinity. It is not rare to see companies with two big software components: the user-facing website and the internal administration tools. This is usually known as a monolithic software architecture.

Some of these companies face big problems when trying to scale their engineering teams. It is hard to coordinate teams that build, deploy, and maintain a single software component. Clashes on releases and reintroduction of bugs are common problems that drain a big chunk of energy from the teams. One of the most interesting solutions to this problem (it also has other benefits) is to split the monolithic software into microservices, so that the teams can specialize in a few smaller, autonomous, and isolated software components that can be versioned, updated, and deployed without interfering with the rest of the company's systems. This enables the engineering team to create isolated and autonomous units of work that are highly specialized in a given task (such as sending e-mails, processing card payments, and so on).

Microservices in the real world

Microservices are small software components that specialize in one task and work together to achieve a higher-level task. Forget about software for a second and think about how a company works. When someone applies for a job in a company, he applies for a given position: software engineer, systems administrator, or office manager. The reason for this can be summarized in one word—specialization. If you are used to working as a software engineer, you will get better with experience and add more value to the company. The fact that you don't know how to deal with a customer won't affect your performance, as it is not your area of expertise and would hardly add any value to your day-to-day work. A microservice is an autonomous unit of work that can execute one task without interfering with other parts of the system, similar to what a job position is to a company. This has a number of benefits that the engineering team can use to help scale a company's systems.
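To make the "one task, one component" idea concrete, here is a deliberately tiny sketch—not from the book, which builds its examples with Node.js and Seneca—of a single-purpose greeting service written with nothing but Python's standard library. The point is not the technology but the shape: one small, independently deployable unit with a single responsibility:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class GreetingHandler(BaseHTTPRequestHandler):
    """A microservice that does exactly one thing: return a greeting."""

    def do_GET(self):
        name = self.path.strip('/') or 'world'
        body = json.dumps({'greeting': 'Hello, {}!'.format(name)}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

if __name__ == '__main__':
    # a real deployment would add logging, health checks, and configuration,
    # but the unit of work stays this small and this focused
    HTTPServer(('0.0.0.0', 8080), GreetingHandler).serve_forever()

Because this service owns nothing but greetings, it can be versioned, replaced, or scaled without touching any other part of the system—exactly the property that the large-scale examples below exploit.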
Nowadays, hundreds of systems are built using a microservices-oriented architectures, as follows: Netflix: They are one of the most popular streaming services and have built an entire ecosystem of applications that collaborate in order to provide a reliable and scalable streaming system used across the globe. Spotify: They are one of the leading music streaming services in the world and have built this application using microservices. Every single widget of the application (which is a website exposed as a desktop app using Chromium Embedded Framework (CEF)) is a different microservice that can be updated individually. First, there was the monolith A huge percentage (my estimate is around 90%) of the modern enterprise software is built following a monolithic approach. Huge software components that run in a single container and have a well-defined development life cycle that goes completely against the following agile principles, deliver early and deliver often (https://en.wikipedia.org/wiki/Release_early,_release_often): Deliver early: The sooner you fail, the easier it is to recover. If you are working for two years in a software component and then, it is released, there is a huge risk of deviation from the original requirements, which are usually wrong and changing every few days. Deliver often: Everything of the software is delivered to all the stake holders so that they can have their inputs and see the changes reflected in the software. Errors can be fixed in a few days and improvements are identified easily. Companies build big software components instead of smaller ones that work together as it is the natural thing to do, as follows: The developer has a new requirement. He builds a new method on an existing class on the service layer. The method is exposed on the API via HTTP, SOAP, or any other protocol. Now, repeat it by the number of developers in your company and you will obtain something called organic growth. Organic growth is the type of uncontrolled and unplanned growth on software systems under business pressure without an adequate long-term planning, and it is bad. How to tackle the organic growth? The first thing needed to tackle the organic growth is make sure that business and IT are aligned in the company. Usually, in big companies, IT is not seen as a core part of the business. Organizations outsource their IT systems, keeping the cost in mind, but not the quality so that the partners building these software components are focused on one thing: deliver on time and according to the specification, even if it is incorrect. This produces a less-than-ideal ecosystem to respond to the business needs with a working solution for an existing problem. IT is lead by people who barely understand how the systems are built and usually overlook the complexity of the software development. Fortunately, this is a changing tendency as IT systems have become the drivers of 99% of the businesses around the world, but we need to be smarter about how we build them. The first measure to tackle the organic growth is to align IT and business stakeholders in order to work together, educating the non-technical stakeholders is the key to success. If we go back to the example from the previous section (few releases with quite big changes). Can we do it better? Of course, we can. Divide the work into manageable software artifacts that model a single and well-defined business activity and give it an entity. 
It does not need to be a microservice at this stage, but keeping the logic inside a separate, well-defined, easily testable, and decoupled module will give us a huge advantage when facing future changes in the application.

Building microservices – the fallback strategy

When we design a system, we usually think about the replaceability of the existing components. For example, when using a persistence technology in Java, we tend to lean towards the standards (Java Persistence API (JPA)) so that we can replace the underlying implementation without too much effort. Microservices take the same approach, but they isolate the problem instead of working towards easy replaceability. Also, e-mailing is something that, although it seems simple, always ends up causing problems. Consider that we want to replace Mandrill with a plain SMTP server, such as Gmail. We don't need to do anything special; we just change the implementation and roll out the new version of our microservice, as follows:

var nodemailer = require('nodemailer');
var seneca = require("seneca")();

var transporter = nodemailer.createTransport({
    service: 'Gmail',
    auth: {
        user: 'info@micromerce.com',
        pass: 'verysecurepassword'
    }
});

/**
 * Sends an email including the content.
 */
seneca.add({area: "email", action: "send"}, function(args, done) {
    var mailOptions = {
        from: 'Micromerce Info ✔ <info@micromerce.com>',
        to: args.to,
        subject: args.subject,
        html: args.body
    };
    transporter.sendMail(mailOptions, function(error, info){
        if(error){
            return done({code: error}, null);
        }
        done(null, {status: "sent"});
    });
});

To the outside world, our simplest version of the e-mail sender is now, to all appearances, using SMTP through Gmail to deliver our e-mails. We could even roll out one server with this version and send some traffic to it in order to validate the implementation without affecting all the customers (in other words, contain the failure).

Deploying microservices

Deployment is usually the ugly friend of the software development life cycle party. There is a missing contact point between development and system administration, which DevOps is going to solve in the next few years (or has already solved and no one told me). The cost of fixing software bugs rises steeply the later they are found in the development life cycle. From continuous integration up to continuous delivery, the process should be automated as much as possible, where "as much as possible" means 100%. Remember, humans are imperfect… if we rely on humans carrying out a manual, repetitive process to produce bug-free software, we are walking the wrong path. Remember that a machine will always be error free (as long as the algorithm it executes is error free), so… why not let a machine control our infrastructure?

Summary

In this article, we saw why microservices are needed in complex software systems, examined the monolithic approach, and studied how to build and deploy microservices.

Welcome to Machine Learning using the .NET Framework

Oli Huggins
16 Mar 2016
26 min read
This article by, Jamie Dixon, the author of the book, Mastering .NET Machine Learning, will focus on some of the larger questions you might have about machine learning using the .NET Framework, namely: What is machine learning? Why should we consider it in the .NET Framework? How can I get started with coding? (For more resources related to this topic, see here.) What is machine learning? If you check out on Wikipedia, you will find a fairly abstract definition of machine learning: "Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions." I like to think of machine learning as computer programs that produce different results as they are exposed to more information without changing their source code (and consequently needed to be redeployed). For example, consider a game that I play with the computer. I show the computer this picture  and tell it "Blue Circle". I then show it this picture  and tell it "Red Circle". Next I show it this picture  and say "Green Triangle." Finally, I show it this picture  and ask it "What is this?". Ideally the computer would respond, "Green Circle." This is one example of machine learning. Although I did not change my code or recompile and redeploy, the computer program can respond accurately to data it has never seen before. Also, the computer code does not have to explicitly write each possible data permutation. Instead, we create models that the computer applies to new data. Sometimes the computer is right, sometimes it is wrong. We then feed the new data to the computer to retrain the model so the computer gets more and more accurate over time—or, at least, that is the goal. Once you decide to implement some machine learning into your code base, another decision has to be made fairly early in the process. How often do you want the computer to learn? For example, if you create a model by hand, how often do you update it? With every new data row? Every month? Every year? Depending on what you are trying to accomplish, you might create a real-time ML model, a near-time model, or a periodic model. Why .NET? If you are a Windows developer, using .NET is something you do without thinking. Indeed, a vast majority of Windows business applications written in the last 15 years use managed code—most of it written in C#. Although it is difficult to categorize millions of software developers, it is fair to say that .NET developers often come from nontraditional backgrounds. Perhaps a developer came to .NET from a BCSC degree but it is equally likely s/he started writing VBA scripts in Excel, moving up to Access applications, and then into VB.NET/C# applications. Therefore, most .NET developers are likely to be familiar with C#/VB.NET and write in an imperative and perhaps OO style. The problem with this rather narrow exposure is that most machine learning classes, books, and code examples are in R or Python and very much use a functional style of writing code. Therefore, the .NET developer is at a disadvantage when acquiring machine learning skills because of the need to learn a new development environment, a new language, and a new style of coding before learning how to write the first line of machine learning code. 
If, however, that same developer could use their familiar IDE (Visual Studio) and the same base libraries (the .NET Framework), they can concentrate on learning machine learning much sooner. Also, when creating machine learning models in .NET, they have immediate impact as you can slide the code right into an existing C#/VB.NET solution. On the other hand, .NET is under-represented in the data science community. There are a couple of different reasons floating around for that fact. The first is that historically Microsoft was a proprietary closed system and the academic community embraced open source systems such as Linux and Java. The second reason is that much academic research uses domain-specific languages such as R, whereas Microsoft concentrated .NET on general purpose programming languages. Research that moved to industry took their language with them. However, as the researcher's role is shifted from data science to building programs that can work at real time that customers touch, the researcher is getting more and more exposure to Windows and Windows development. Whether you like it or not, all companies which create software that face customers must have a Windows strategy, an iOS strategy, and an Android strategy. One real advantage to writing and then deploying your machine learning code in .NET is that you can get everything with one stop shopping. I know several large companies who write their models in R and then have another team rewrite them in Python or C++ to deploy them. Also, they might write their model in Python and then rewrite it in C# to deploy on Windows devices. Clearly, if you could write and deploy in one language stack, there is a tremendous opportunity for efficiency and speed to market. What version of the .NET Framework are we using? The .NET Framework has been around for general release since 2002. The base of the framework is the Common Language Runtime or CLR. The CLR is a virtual machine that abstracts much of the OS specific functionality like memory management and exception handling. The CLR is loosely based on the Java Virtual Machine (JVM). Sitting on top of the CLR is the Framework Class Library (FCL) that allows different languages to interoperate with the CLR and each other: the FCL is what allows VB.Net, C#, F#, and Iron Python code to work side-by-side with each other. Since its first release, the .NET framework has included more and more features. The first release saw support for the major platform libraries like WinForms, ASP.NET, and ADO.NET. Subsequent releases brought in things like Windows Communication Foundation (WCF), Language Integrated Query (LINQ), and Task Parallel Library (TPL). At the time of writing, the latest version is of the .Net Framework is 4.6.2. In addition to the full-Monty .NET Framework, over the years Microsoft has released slimmed down versions of the .NET Framework intended to run on machines that have limited hardware and OS support. The most famous of these releases was the Portable Class Library (PCL) that targeted Windows RT applications running Windows 8. The most recent incantation of this is Universal Windows Applications (UWA), targeting Windows 10. At Connect(); in November 2015, Microsoft announced GA of the latest edition of the .NET Framework. This release introduced the .Net Core 5. In January, they decided to rename it to .Net Core 1.0. .NET Core 1.0 is intended to be a slimmed down version of the full .NET Framework that runs on multiple operating systems (specifically targeting OS X and Linux). 
The next release of ASP.NET (ASP.NET Core 1.0) sits on top of .NET Core 1.0. ASP.NET Core 1.0 applications that run on Windows can still run the full .NET Framework. (https://blogs.msdn.microsoft.com/webdev/2016/01/19/asp-net-5-is-dead-int...) In this book, we will be using a mixture of ASP.NET 4.0, ASP.NET 5.0, and Universal Windows Applications. As you can guess, machine learning models (and the theory behind the models) change with a lot less frequency than framework releases so the most of the code you write on .NET 4.6 will work equally well with PCL and .NET Core 1.0. Saying that, the external libraries that we will use need some time to catch up—so they might work with PCL but not with .NET Core 1.0 yet. To make things realistic, the demonstration projects will use .NET 4.6 on ASP.NET 4.x for existing (Brownfield) applications. New (Greenfield) applications will be a mixture of a UWA using PCL and ASP.NET 5.0 applications. Why write your own? It seems like all of the major software companies are pitching machine learning services such as Google Analytics, Amazon Machine Learning Services, IBM Watson, Microsoft Cortana Analytics, to name a few. In addition, major software companies often try to sell products that have a machine learning component, such as Microsoft SQL Server Analysis Service, Oracle Database Add-In, IBM SPSS, or SAS JMP. I have not included some common analytical software packages such as PowerBI or Tableau because they are more data aggregation and report writing applications. Although they do analytics, they do not have a machine learning component (not yet at least). With all these options, why would you want to learn how to implement machine learning inside your applications, or in effect, write some code that you can purchase elsewhere? It is the classic build versus buy decision that every department or company has to make. You might want to build because: You really understand what you are doing and you can be a much more informed consumer and critic of any given machine learning package. In effect, you are building your internal skill set that your company will most likely prize. Another way to look at it, companies are not one tool away from purchasing competitive advantage because if they were, their competitors could also buy the same tool and cancel any advantage. However, companies can be one hire away or more likely one team away to truly have the ability to differentiate themselves in their market. You can get better performance by executing locally, which is especially important for real-time machine learning and can be implemented in disconnected or slow connection scenarios. This becomes particularly important when we start implementing machine learning with Internet of Things (IoT) devices in scenarios where the device has a lot more RAM than network bandwidth. Consider the Raspberry Pi running Windows 10 on a pipeline. Network communication might be spotty, but the machine has plenty of power to implement ML models. You are not beholden to any one vendor or company, for example, every time you implement an application with a specific vendor and are not thinking about how to move away from the vendor, you make yourself more dependent on the vendor and their inevitable recurring licensing costs. The next time you are talking to the CTO of a shop that has a lot of Oracle, ask him/her if they regret any decision to implement any of their business logic in Oracle databases. The answer will not surprise you. 
A majority of this book's code is written in F#—an open source language that runs great on Windows, Linux, and OS X. You can be much more agile and have much more flexibility in what you implement. For example, we will often re-train our models on the fly and when you write your own code, it is fairly easy to do this. If you use a third-party service, they may not even have API hooks to do model training and evaluation, so near-time model changes are impossible. Once you decide to go native, you have a choice of rolling your own code or using some of the open source assemblies out there. This book will introduce both the techniques to you, highlight some of the pros and cons of each technique, and let you decide how you want to implement them. For example, you can easily write your own basic classifier that is very effective in production but certain models, such as a neural network, will take a considerable amount of time and energy and probably will not give you the results that the open source libraries do. As a final note, since the libraries that we will look at are open source, you are free to customize pieces of it—the owners might even accept your changes. However, we will not be customizing these libraries in this book. Why open data? Many books on machine learning use datasets that come with the language install (such as R or Hadoop) or point to public repositories that have considerable visibility in the data science community. The most common ones are Kaggle (especially the Titanic competition) and the UC Irvine's datasets. While these are great datasets and give a common denominator, this book will expose you to datasets that come from government entities. The notion of getting data from government and hacking for social good is typically called open data. I believe that open data will transform how the government interacts with its citizens and will make government entities more efficient and transparent. Therefore, we will use open datasets in this book and hopefully you will consider helping out with the open data movement. Why F#? As we will be on the .NET Framework, we could use either C#, VB.NET, or F#. All three languages have strong support within Microsoft and all three will be around for many years. F# is the best choice for this book because it is unique in the .NET Framework for thinking in the scientific method and machine learning model creation. Data scientists will feel right at home with the syntax and IDE (languages such as R are also functional first languages). It is the best choice for .NET business developers because it is built right into Visual Studio and plays well with your existing C#/VB.NET code. The obvious alternative is C#. Can I do this all in C#? Yes, kind of. In fact, many of the .NET libraries we will use are written in C#. However, using C# in our code base will make it larger and have a higher chance of introducing bugs into the code. At certain points, I will show some examples in C#, but the majority of the book is in F#. Another alternative is to forgo .NET altogether and develop the machine learning models in R and Python. You could spin up a web service (such as AzureML), which might be good in some scenarios, but in disconnected or slow network environments, you will get stuck. Also, assuming comparable machines, executing locally will perform better than going over the wire. When we implement our models to do real-time analytics, anything we can do to minimize the performance hit is something to consider. 
A third alternative that the .NET developers will consider is to write the models in T-SQL. Indeed, many of our initial models have been implemented in T-SQL and are part of the SQL Server Analysis Server. The advantage of doing it on the data server is that the computation is as close as you can get to the data, so you will not suffer the latency of moving large amount of data over the wire. The downsides of using T-SQL are that you can't implement unit tests easily, your domain logic is moving away from the application and to the data server (which is considered bad form with most modern application architecture), and you are now reliant on a specific implementation of the database. F# is open source and runs on a variety of operating systems, so you can port your code much more easily. Getting ready for Machine Learning In this section, we will install Visual Studio, take a quick lap around F#, and install the major open source libraries that we will be using. Setting up Visual Studio To get going, you will need to download Visual Studio on a Microsoft Windows machine. As of this writing, the latest (free) version is Visual Studio 2015 Community. If you have a higher version already installed on your machine, you can skip this step. If you need a copy, head on over to the Visual Studio home page at https://www.visualstudio.com. Download the Visual Studio Community 2015 installer and execute it. Now, you will get the following screen: Select Custom installation and you will be taken to the following screen: Make sure Visual F# has a check mark next to it. Once it is installed, you should see Visual Studio in your Windows Start menu. Learning F# One of the great features about F# is that you can accomplish a whole lot with very little code. It is a very terse language compared to C# and VB.NET, so picking up the syntax is a bit easier. Although this is not a comprehensive introduction, this is going to introduce you to the major language features that we will use in this book. I encourage you to check out http://www.tryfsharp.org/ or the tutorials at http://fsharpforfunandprofit.com/ if you want to get a deeper understanding of the language. With that in mind, let's create our 1st F# project: Start Visual Studio. Navigate to File | New | Project as shown in the following screenshot: When the New Project dialog box appears, navigate the tree view to Visual F# | Windows | Console Application. Have a look at the following screenshot: Give your project a name, hit OK, and the Visual Studio Template generator will create the following boilerplate: Although Visual Studio created a Program.fs file that creates a basic console .exe application for us, we will start learning about F# in a different way, so we are going to ignore it for now. Right-click in the Solution Explorer and navigate to Add | New Item. When the Add New Item dialog box appears, select Script File. The Script1.fsx file is then added to the project. Once Script1.fsx is created, open it up, and enter the following into the file: let x = "Hello World" Highlight that entire row of code, right-click and select Execute In Interactive (or press Alt + Enter). And the F# Interactive console will pop up and you will see this: The F# Interactive is a type of REPL, which stands for Read-Evaluate-Print-Loop. If you are a .NET developer who has spent any time in SQL Server Management Studio, the F# Interactive will look very familiar to the Query Analyzer where you enter your code at the top and see how it executes at the bottom. 
Also, if you are a data scientist using R Studio, you are very familiar with the concept of a REPL. I have used the words REPL and FSI interchangeably in this book. There are a couple of things to notice about this first line of F# code you wrote. First, it looks very similar to C#. In fact, consider changing the code to this: It would be perfectly valid C#. Note that the red squiggly line, showing you that the F# compiler certainly does not think this is valid. Going back to the correct code, notice that type of x is not explicitly defined. F# uses the concept of inferred typing so that you don't have to write the type of the values that you create. I used the term value deliberately because unlike variables, which can be assigned in C# and VB.NET, values are immutable; once bound, they can never change. Here, we are permanently binding the name x to its value, Hello World. This notion of immutability might seem constraining at first, but it has profound and positive implications, especially when writing machine learning models. With our basic program idea proven out, let's move it over to a compliable assembly; in this case, an .exe that targets the console. Highlight the line that you just wrote, press Ctrl + C, and then open up Program.fs. Go into the code that was generated and paste it in: [<EntryPoint>] let main argv = printfn "%A" argv let x = "Hello World" 0 // return an integer exit code Then, add the following lines of code around what you just added: // Learn more about F# at http://fsharp.org // See the 'F# Tutorial' project for more help. open System [<EntryPoint>] let main argv = printfn "%A" argv let x = "Hello World" Console.WriteLine(x) let y = Console.ReadKey() 0 // return an integer exit code Press the Start button (or hit F5) and you should see your program run: You will notice that I had to bind the return value from Console.ReadKey() to y. In C# or VB.NET, you can get away with not handling the return value explicitly. In F#, you are not allowed to ignore the returned values. Although some might think this is a limitation, it is actually a strength of the language. It is much harder to make a mistake in F# because the language forces you to address execution paths explicitly versus accidentally sweeping them under the rug (or into a null, but we'll get to that later). In any event, let's go back to our script file and enter in another line of code: let ints = [|1;2;3;4;5;6|] If you send that line of code to the REPL, you should see this: val ints : int [] = [|1; 2; 3; 4; 5; 6|] This is an array, as if you did this in C#: var ints = new[] {1,2,3,4,5,6}; Notice that the separator is a semicolon in F# and not a comma. This differs from many other languages, including C#. The comma in F# is reserved for tuples, not for separating items in an array. We'll discuss tuples later. Now, let's sum up the values in our array: let summedValue = ints |> Array.sum While sending that line to the REPL, you should see this: val summedValue : int = 21 There are two things going on. We have the |> operator, which is a pipe forward operator. If you have experience with Linux or PowerShell, this should be familiar. However, if you have a background in C#, it might look unfamiliar. The pipe forward operator takes the result of the value on the left-hand side of the operator (in this case, ints) and pushes it into the function on the right-hand side (in this case, sum). The other new language construct is Array.sum. 
Array is a module in the core F# libraries, which has a series of functions that you can apply to your data. The function sum, well, sums the values in the array, as you can probably guess by inspecting the result. So, now, let's add a different function from the Array type: let multiplied = ints |> Array.map (fun i -> i * 2) If you send it to the REPL, you should see this: val multiplied : int [] = [|2; 4; 6; 8; 10; 12|] Array.map is an example of a high ordered function that is part of the Array type. Its parameter is another function. Effectively, we are passing a function into another function. In this case, we are creating an anonymous function that takes a parameter i and returns i * 2. You know it is an anonymous function because it starts with the keyword fun and the IDE makes it easy for us to understand that by making it blue. This anonymous function is also called a lambda expression, which has been in C# and VB.NET since .Net 3.5, so you might have run across it before. If you have a data science background using R, you are already quite familiar with lambdas. Getting back to the higher-ordered function Array.map, you can see that it applies the lambda function against each item of the array and returns a new array with the new values. We will be using Array.map (and its more generic kin Seq.map) a lot when we start implementing machine learning models as it is the best way to transform an array of data. Also, if you have been paying attention to the buzz words of map/reduce when describing big data applications such as Hadoop, the word map means exactly the same thing in this context. One final note is that because of immutability in F#, the original array is not altered, instead, multiplied is bound to a new array. Let's stay in the script and add in another couple more lines of code: let multiplyByTwo x = x * 2 If you send it to the REPL, you should see this: val multiplyByTwo : x:int -> int These two lines created a named function called multiplyByTwo. The function that takes a single parameter x and then returns the value of the parameter multiplied by 2. This is exactly the same as our anonymous function we created earlier in-line that we passed into the map function. The syntax might seem a bit strange because of the -> operator. You can read this as, "the function multiplyByTwo takes in a parameter called x of type int and returns an int." Note three things here. Parameter x is inferred to be an int because it is used in the body of the function as multiplied to another int. If the function reads x * 2.0, the x would have been inferred as a float. This is a significant departure from C# and VB.NET but pretty familiar for people who use R. Also, there is no return statement for the function, instead, the final expression of any function is always returned as the result. The last thing to note is that whitespace is important so that the indentation is required. If the code was written like this: let multiplyByTwo(x) = x * 2 The compiler would complain: Script1.fsx(8,1): warning FS0058: Possible incorrect indentation: this token is offside of context started at position (7:1). Since F# does not use curly braces and semicolons (or the end keyword), such as C# or VB.NET, it needs to use something to separate code. That separation is whitespace. Since it is good coding practice to use whitespace judiciously, this should not be very alarming to people having a C# or VB.NET background. If you have a background in R or Python, this should seem natural to you. 
Since multiplyByTwo is the functional equivalent of the lambda created in Array.map (fun i -> i * 2), we can do this if we want: let multiplied' = ints |> Array.map (fun i -> multiplyByTwo i) If you send it to the REPL, you should see this: val multiplied' : int [] = [|2; 4; 6; 8; 10; 12|] Typically, we will use named functions when we need to use that function in several places in our code and we use a lambda expression when we only need that function for a specific line of code. There is another minor thing to note. I used the tick notation for the value multiplied when I wanted to create another value that was representing the same idea. This kind of notation is used frequently in the scientific community, but can get unwieldy if you attempt to use it for a third or even fourth (multiplied'''') representation. Next, let's add another named function to the REPL: let isEven x = match x % 2 = 0 with | true -> "even" | false -> "odd" isEven 2 isEven 3 If you send it to the REPL, you should see this: val isEven : x:int -> string This is a function named isEven that takes a single parameter x. The body of the function uses a pattern-matching statement to determine whether the parameter is odd or even. When it is odd, then it returns the string odd. When it is even, it returns the string even. There is one really interesting thing going on here. The match statement is a basic example of pattern matching and it is one of the coolest features of F#. For now, you can consider the match statement much like the switch statement that you may be familiar within R, Python, C#, or VB.NET. I would have written the conditional logic like this: let isEven' x = if x % 2 = 0 then "even" else "odd" But I prefer to use pattern matching for this kind of conditional logic. In fact, I will attempt to go through this entire book without using an if…then statement. With isEven written, I can now chain my functions together like this: let multipliedAndIsEven = ints |> Array.map (fun i -> multiplyByTwo i) |> Array.map (fun i -> isEven i) If you send it to REPL, you should see this: val multipliedAndIsEven : string [] = [|"even"; "even"; "even"; "even"; "even"; "even"|] In this case, the resulting array from the first pipe Array.map (fun i -> multiplyByTwo i))gets sent to the next function Array.map (fun i -> isEven i). This means we might have three arrays floating around in memory: ints which is passed into the first pipe, the result from the first pipe that is passed into the second pipe, and the result from the second pipe. From your mental model point of view, you can think about each array being passed from one function into the next. In this book, I will be chaining pipe forwards frequently as it is such a powerful construct and it perfectly matches the thought process when we are creating and using machine learning models. You now know enough F# to get you up and running with the first machine learning models in this book. I will be introducing other F# language features as the book goes along, but this is a good start. As you will see, F# is truly a powerful language where a simple syntax can lead to very complex work. Third-party libraries The following are a few third-party libraries that we will cover in our book later on: Math.NET Math.NET is an open source project that was created to augment (and sometimes replace) the functions that are available in System.Math. Its home page is http://www.mathdotnet.com/. 
We will be using Math.Net's Numerics and Symbolics namespaces in some of the machine learning algorithms that we will write by hand. A nice feature about Math.Net is that it has strong support for F#. Accord.NET Accord.NET is an open source project that was created to implement many common machine learning models. Its home page is http://accord-framework.net/. Although the focus of Accord.NET was for computer vision and signal processing, we will be using Accord.Net extensively in this book as it makes it very easy to implement algorithms in our problem domain. Numl Numl is an open source project that implements several common machine learning models as experiments. Its home page is http://numl.net/. Numl is newer than any of the other third-party libraries that we will use in the book, so it may not be as extensive as the other ones, but it can be very powerful and helpful in certain situations. Summary We covered a lot of ground in this article. We discussed what machine learning is, why you want to learn about it in the .NET stack, how to get up and running using F#, and had a brief introduction to the major open source libraries that we will be using in this book. With all this preparation out of the way, we are ready to start exploring machine learning. Further resources on this subject: ASP.Net Site Performance: Improving JavaScript Loading [article] Displaying MySQL data on an ASP.NET Web Page [article] Creating a NHibernate session to access database within ASP.NET [article]

Unit Testing Apps with Android Studio

Troy Miles
15 Mar 2016
6 min read
We will need to create an Android app, get it all set up, and then add a test project to it. Let's begin.

1. Start Android Studio and select new project.
2. Change the Application name to UTest. Click Next.
3. Click Next again.
4. Click Finish.

Now that we have the project started, let's set it up. Open the layout resource file activity_main.xml and add an ID to the TextView. It should look as follows:

<RelativeLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    android:paddingBottom="@dimen/activity_vertical_margin"
    tools:context="com.tekadept.utest.app.MainActivity" >

    <TextView
        android:id="@+id/greeting"
        android:text="@string/hello_world"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content" />

</RelativeLayout>

The random message

Next we modify the MainActivity class. We are going to add some code that will display a random greeting message to the user. Modify MainActivity so that it looks like the following code:

TextView txtGreeting;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    txtGreeting = (TextView) findViewById(R.id.greeting);

    Random rndGenerator = new Random();
    int rnd = rndGenerator.nextInt(4);
    String greeting = getGreeting(rnd);
    txtGreeting.setText(greeting);
}

private String getGreeting(int msgNumber) {
    String greeting;
    switch (msgNumber) {
        case 0:
            greeting = "Holamundo";
            break;
        case 1:
            greeting = "Bonjour tout le monde";
            break;
        case 2:
            greeting = "Ciao mondo";
            break;
        case 3:
            greeting = "Hallo Welt";
            break;
        default:
            greeting = "Hello world";
            break;
    }
    return greeting;
}

At this point, if you run the app, it should display one of four random greetings each time. We want to test the getGreeting method: we need to be sure that the string it returns matches the number we sent it. Currently, however, we have no way to know that. In order to add a test package, we need to hover over the package name. For my app, the package name is com.tekadept.utest.app; it is the line directly below the java directory. The rest of the steps are as follows:

1. Right-click on the package name and choose New -> Package. Give your new package the name tests and click OK.
2. Right-click on tests and choose New -> Java Class. Enter MainActivityTest as the name and click OK.

Inside MainActivityTest, we are currently not extending the proper base class. Let's fix that. Change the MainActivityTest class so it looks like the following code:

package com.tekadept.utest.app.tests;

import android.test.ActivityInstrumentationTestCase2;
import com.tekadept.utest.app.MainActivity;

public class MainActivityTest extends ActivityInstrumentationTestCase2<MainActivity> {

    public MainActivityTest() {
        super(MainActivity.class);
    }
}

We've done two things. First, we changed the base class to ActivityInstrumentationTestCase2. Secondly, we added a constructor method. Before we can test the logic of the getGreeting method, we need to make it visible to outside classes by changing its modifier from private to public. Once we've done that, return to the MainActivityTest class and add a new method, testGetGreeting.
This is shown in the following code:

public void testGetGreeting() throws Exception {
    MainActivity activity = getActivity();

    int count = 0;
    String result = activity.getGreeting(count);
    Assert.assertEquals("Holamundo", result);

    count = 1;
    result = activity.getGreeting(count);
    Assert.assertEquals("Bonjour tout le monde", result);

    count = 2;
    result = activity.getGreeting(count);
    Assert.assertEquals("Ciao mondo", result);

    count = 3;
    result = activity.getGreeting(count);
    Assert.assertEquals("Hallo Welt", result);
}

Time to test

All we need to do now is create a configuration for our test package:

1. Click Run -> Edit Configurations….
2. On the Run/Debug Configurations dialog, click the plus sign in the upper-left corner and click on Android Tests.
3. For the name, enter test. Make sure the General tab is selected.
4. For Module, choose app. For Test, choose All in Package. For Package, browse down to the tests package.

The Android unit test must run on a device or emulator. I prefer having the chooser dialog come up, so I've selected that option; you should select whichever option works best for you. Then click OK.

At this point, you have a working app complete with a functioning unit test. To run the unit test, choose the test configuration from the drop-down menu to the left of the run button, then click the run button. After building your app and running it on your selected device, Android Studio will show the test results. If you don't see the results, click the run button in the lower-left corner of Android Studio. Green is good; red means one or more tests have failed. Currently our one test should be passing, so everything should be green.

In order to see a test fail, let's make a temporary change to the getGreeting method. Change the first greeting from "Holamundo" to "Adios mundo". Save your change and click the run button to run the tests again. This time the test should fail. The test runner shows the failure message and includes a stack trace of the failure. The first line of the stack trace shows that the test failed on line 17 of MainActivityTest. Don't forget to restore the MainActivity class' getGreeting method to fix the failing unit test.

Conclusion

That is it for this post. You now know how to add a unit test package to Android Studio. If you had any trouble with this post, be sure to check out the complete source code for the UTest project on my GitHub repo at https://github.com/Rockncoder/UTest.

About the author

Troy Miles, also known as the Rockncoder, currently has fun writing full stack code with ASP.NET MVC or Node.js on the backend and web or mobile up front. He started coding over 30 years ago, cutting his teeth writing games for C64, Apple II, and IBM PCs. After burning out, he moved on to Windows system programming before catching Internet fever just before the dot net bubble burst. After realizing that mobile devices were the perfect window into backend data, he added mobile programming to his repertoire. He loves competing in hackathons and randomly posting interesting code nuggets on his blog: http://therockncoder.blogspot.com/.

article-image-understanding-alfresco-upgrade-process
Packt
15 Mar 2016
10 min read
Save for later

Understanding the Alfresco Upgrade Process

Packt
15 Mar 2016
10 min read
In this article by Vandana Pal, author of Alfresco for Administrators, we will go deeper into the concepts of Alfresco. Upgrading Alfresco is a multistep process. Based on the customizations made and the new features available in Alfresco, you need to first decide the target version of Alfresco. On the basis of the currently installed version, there could be different paths to upgrade alfresco. Upgrading Alfresco involves moving all the content (database, content store, and indexes) to the new version of it, and if there is any feature customization to made in Alfresco, it needs to be upgraded in order to make it compatible with new system. The final step involves validating the entire system and its data. Now, let's go through all of the steps in detail. (For more resources related to this topic, see here.) Choosing an upgrade path Alfresco's major versions include 2.x, 3.x, 4.x, and 5.0. 2.x is the oldest version, and 5.0 is the latest version available in Alfresco. Recently, Alfresco has stopped providing support for any version that's older than 4.x. In general standard paths if your current Alfresco system is version 2.x, then you cannot directly upgrade to 4.x or 5.0 as there are major functionality changes. Alfresco recommends that you go to an intermediate stable version before upgrading to the final version. Here are few samples to upgrade the different versions of Alfresco Enterprise: Consider a situation where the current version is 2.x and the target version is 5.0. Now, here, you cannot directly upgrade Alfresco to v5.0. First, Alfresco needs to be upgraded to the stable version of 3.x, then to 4.x, and finally you can upgrade to 5.0. Consider a situation where the current version is 3.x and the target version is 5.0. Here, again, you need to upgrade to 4.x first before you can upgrade to 5.0. As there are major changes in the database, index changes are to new Alfresco v5.0. Before deciding on any particular path, it is very important to contact Alfresco's support to get information on the complete upgrade path. Also, before you choose the target version to be upgraded, there are few important points that need to be considered: Develop a proper understanding of the new features and changes that available in the latest version of Alfresco. For example, Alfresco explorer is deprecated in v.5.0, and if your end users use explorer to very large extent, then you might want to upgrade to the latest stable 4.x version. Impact analysis of customizations that are made in the current version of Alfresco. Decide your timeline for upgrade process. Standard upgrade guidelines Any upgrade that has standard guidelines needs to followed apart from any version you are currently upgrading. Let's go through each of the steps in details. Preparing a checklist Analyzing the current system and new target upgraded system is very important for any upgrade and planning purposes. The first step to do this should be to get details about the current infrastructure, such as the clustered environment, amount of customizations that have been made, important features that are used, data size, and so on. The second step involves deciding which upgraded target version to use and understanding all the features that your current application has. Once you have these details, decide on the target version where Alfresco will be upgraded and identify all the software requirements and environment details from the Alfresco support stack. 
For example, when you upgrade to the latest version of Alfresco, you might have to change the JDK version, the application server needs to be upgraded, the system stack might need to changed, the Solr version may need to be upgraded, and so on. Prepare a checklist of the new system requirements and the validation process. Setting up and validating of new environment Install the new Alfresco version on a separate sever from the production system. Validate whether the installation is performed correctly and has been documented. If you have a distributed clustered environment, enable clustering and validate whether the application is working as expected. Don't use the production data yet, validate it with the blank database and content store first. If your current Alfresco system has any code customizations or extensions deployed, make sure that all the code is compatible with the new version. Also, make sure that any configuration changes executed in the old system are also configured properly in the new Alfresco server. Deploy all the customized code and configurations in the new Alfresco server and perform a regression test. Ensure that your existing application works properly with the new version of Alfresco. All these validations are very important prior to the upgrade. The data upgrade process Once you have validated whether the new Alfresco server is installed and configured properly, it is time to actually upgrade the data. Here are the steps that need to be performed to do this: Back up the data you've collected from production, which will be used for the upgrade. You need to backup from the database, Content store repository, and indexes. This will be your snapshot of data that will then be upgraded to the latest version of Alfresco. Refer to following properties in their respective files to locate the data to be backed up: dir.root=<Cotnent store location. Property configured in alfresco-global.properties> db.url =<Database connection details. Property configured in alfresco-global.properties> data.dir.root=<Solr Index location. Property configured in solrcore.properties for both workspace and archive indexes> dir.indexes=<Lucene index path. Property configured in alfresco-global.properties> Restore the database content-store and indexes on separate server. If we keep  upgraded environment separate then production server can still be used and will not be impacted. As discussed in the earlier steps on installation and validation, once the new version of the Alfresco server is completely tested with customized code and configurations, make sure that the server is stopped. Configure the following properties in the alfresco-global.properties file with regard to the restored production database and content store that were discussed in step 2. Make sure that all your test data is wiped out in the new installation: dir.root=<Restored content store location> db.username=<Set correct alfresco database username> db.password=<Alfresco database password> db.name=<Database name> db.url=<Full Database URL> Point the Solr indexes to the newly restored indexes. Remove all the test data and model files that were created in the new installation. Configure the following properties for the Solr indexes in the alfresco-global.properties file. 
Also, make sure that all the old Solr configurations are also copied to the new instance:    index.subsystem.name= <Set proper subsystem name> Configure the following property in solrcore.properties to point to the new indexes for both the workspace and archive:     data.dir.root=<Set the full path for the indexes> If you are upgrading to Alfresco 5.0, there are few additional steps needed to upgrade the Solr version as Alfresco v5.0 uses Solr4. We will cover these details in the latter part of the article. If you are upgrading from Alfresco prior to version 4.x, make sure that the following jbpm properties are set to true to enable the Jbpm engine. As seen in the latest version of Alfresco, the workflow engine is now changed to activiti, and then it is enabled by default:     system.workflow.engine.jbpm.enabled=true Start the Alfresco server and monitor the logs. These will provide you with details about the upgrade process. If you have a clustered environment, you can configure all the nodes with the same code and configurations. Start only one node. Later on, you can restart all the other nodes, and all of them will have upgraded data. Once the upgrade is finished and the server starts properly, validate the upgraded server. If the server is validated properly and all goes well, you can switch over to the new upgraded server and the old production server can be removed. The Solr upgrade process for Alfresco 5 Alfresco supports only Solr4. It doesn't support Lucene or the older versions of Solr anymore. To completely the upgrade to Alfresco, a system has to recalculate all the indexes with the latest versions. In case the current system uses Lucene Indexes, the system needs to be upgraded to Solr before we can upgrade it to Solr4. Let's go through the steps to upgrade Solr: If the current repository uses Lucene indexes and is on an older version prior to Alfresco 4.x or on Alfresco 4.x, the first step you need to perform is to upgrade the system to Alfresco 4.x. Set up the new Solr1 and configure it to track all the recalculated indexes. For larger repositories, this can be long process. Once your indexes are read on Solr in Alfresco 4.x the system is ready to be upgraded to Alfresco. When you upgrade Alfresco 4.x to Alfresco, you will still be able to use the old Solr version and configure Sol4 to start tracking and recalculating the indexes. During the upgrade process, copy all the Solr indexes from Alfresco 4 to the new installation of Alfresco 5 as discussed in the upgrade process. Make sure that Alfresco 5 is running properly with the older version of Solr. Once the upgrade is finished, set up the new Solr4 instance on a separate server. It is not recommended that you have both the Solr servers on the same machine as the indexing process will consume a lot of memory and CPU resources. Verify in the Alfresco admin console whether the search subsystem is still set to an older version of Solr. 
Now, configure solrcore.properties located in workspace-SpacesStore/conf and archive-SpacesStore/conf in the new Solr4 installation to point it to the new Alfresco server and start tracking the indexes: data.dir.root=<Index Directory location> enable.alfresco.tracking=<Set to true to start the tracking> alfresco.host=<Alfresco Server Host> alfresco.port=<Alfresco Server Port> alfresco.port.ssl=<Alfresco Server SSL Port> alfresco.cron=<Cron expression to tracking Alfresco index> You can monitor the indexing status of Solr4 using JMX or the monitor service of Solr at http://<Solr Server Host and Port>/solr4/admin/cores?action=SUMMARY&wt=xml. Once Solr4 has indexed the complete repository and its report summary count matches the node count in the repository, switch the search subsystem from Solr to Solr4 using the Afresco Admin Console or by configuring the following property in the alfresco-global.properties file. Stop the older Solr version like this: index.subsystem.name=solr4 dir.keystore=<New solr certificate location> solr.port.ssl=<Solr SSL port> Summary In this article, we took a look at the different upgrading process for Alfresco in detail. We also talked about choosing an upgrade path, setting up a new environment, and the validation of a new environment. Resources for Article: Further resources on this subject: The Alfresco Platform [article] Alfresco Web Scrpits [article] Understanding WebSockets and Server-sent Events in Detail [article]

Packt
14 Mar 2016
8 min read
Save for later

Watson Analytics – Predict

Packt
14 Mar 2016
8 min read
In this article by James Miller, author of the book Learning IBM Watson Analytics, we will discuss the mining insights—those previously unknown—from your data. This typically requires complex modeling using sophisticated algorithms to process the data. With Watson though, you don't have to know which statistical test to run on your data or even how any of the algorithms actually work. The method you use with Watson is so much simpler: identify/refine your data, create a prediction, and then view the results—that's it! We have already covered identifying and refining data, so let's now look at predictions and how one would create a prediction. First, think of predictions as your virtual folders for each predictive analysis effort you are working on. Here, you identify your data, specify field properties within the data, and select targets and inputs. After you create the prediction, you can view it to see the output from the analysis. The output consists of visual and text insights. (For more resources related to this topic, see here.) Creating a Watson prediction The steps for creating a Watson prediction are straightforward: Starting on the Welcome page, click on Predict, as shown in the following screenshot: Next, on the Create new prediction dialog, you select a previously uploaded dataset from the list (or upload new data) that you want Watson Analytics to analyze: On the Create a new analysis page (shown in the next screenshot) we set some attributes for our prediction by the following ways: Giving it a name by entering it the Name your workbook field. Targets are the fields you may be most interested in and want to know more about. These are the fields that are perhaps influenced by other fields in the data. When creating a new prediction, Watson defines default targets and field properties for you, which you can remove (by clicking on the Delete icon next to it), and then add your own choices (by clicking on Select target). Keep in mind that all predictions must have at least one target (and up to five). Finally, click on Create. Once you have clicked on Create, Watson will generate the prediction. The following screenshot shows a prediction generated based on a Watson sample dataset: Viewing the results of a prediction Once a Watson prediction has been generated, you can view its results. Predictor visualization bar Across the top of the prediction page is the Top Predictors Bar (shown in the following screenshot), where you can click on To select a particular predictor that is interesting to you. Main Insights On the Main Insight section of the prediction page (shown below for our example), you can examine the top insights that Watson was able to derive from the data. Details From the Main Insights section, you can access (by clicking on the top predictor found; this is shown circled below) the Details page, which gives you the ability to drill into the details for individual fields and interactions of your prediction. Customization After you view the results, you might want to customize the prediction to refine the analysis to produce additional insights. IBM Watson allows you to change the number of targets and see the effect of the change on the prediction results. In addition, Watson allows you to save your updated prediction or revert at any time to any particular version as desired. 
Watson Analytics Assemble The Watson Assemble feature is where you can actually organize or assemble the most interesting or otherwise important artifacts exposed while using Watson to predict or to explore your data files (as well as other items collected or otherwise set aside during previous Assemble sessions). This, in a way, is where you can do some programming to create powerful methods of conveying information to others. Watson breaks assembly into two types, Views and Dashboards, both of which are made up of visualizations (visualizations are defined as a graph, chart, plot, table, map, or any other visual representation of data). Views Views are customizable containers for dashboards (defined below) and stories (sets of views over time). Dashboards Dashboards are a specific type of view that help monitor events or activities at a glance. A little help To make it easier to assemble your views and dashboards, Watson Analytics provides you with templates that contain predefined layouts and grid lines for easy arrangement and alignment of the visualizations in a view. As we did with predictions earlier, let's take a look at how the Assemble process works. From the main or welcome page, click on the plus or Add New icon (shown in the image below) and then click on Assemble: While creating a new Assemble, you'll need to choose a data file (shown in the image below) from the list displayed on the Create new view dialog (of course, you can also upload a new file). Once you select which data file you want to use (simply by clicking on the filename), Watson shows you the Create View page, as shown in the following screenshot: Notice that the Name your view field defaults to the name of the file that you selected, and you'll want to change that. Click in the textbox provided and type an appropriate name for what you are creating: Once you have entered a name for your view, you'll need to decide whether you'd like to assemble a Dashboard or a Story. Along the left side of the page (under Select a template), you can scroll vertically through a list of content types that you can use to organize your visualizations. We'll get much deeper into the process of assembling, but for now, let's select Dashboard (by clicking on the word Dashboard) and then Single Page layout (by double-clicking on the highlighted rectangle labeled Freeform). Watson will save your new dashboard and the template with a blank canvas opened (as shown here): Notice the Data set icon (circled in the following screenshot) at the bottom of the canvas. Under the dataset icon, the Data set list icon, the name of the dataset, and data columns are displayed. The list of data columns are in what is referred to as the Data tray. If you click on the Data set icon, the information below it is hidden; click on it again and the information reappears. Using the above, you can add columns to the canvas by Dragging them from the Data tray. Selecting a column (or multiple columns) from the Data set list. Selecting a column from a different data set. This is done by clicking on the dataset list icon and then the < icon to view and select a different dataset. Besides adding columns of data, you can add visualizations by clicking on the Visualization icon (shown in the following image) and selecting a visualization type that you want to use. Moving to the right (from the Visualizations icon), we have additional icons providing various other options. 
These are text, media, web page, image and shapes, each allowing you to add and enhance your dashboard view. The far-right icon (shown in the following screenshot) is the Properties icon. This icon allows you to change your dashboard's Theme and General Style. As of now, only a few themes and styles are available, but more are planned. Another option for enhancing your dashboard, should the above not be sufficient, is to access your Watson collection (by clicking on the collection icon on the far right of the main toolbar shown below) and drag selections from the collection list to the dashboard canvas. Finally, if nothing else suits your needs, you can have Watson create a new visualization based on a question you type in the What do you want to assemble? field (shown in the following screenshot): A simple use case To gain a better understanding of how to use the Watson Predict and Assemble features, let's now take a look at a simple use case. One of the best ways to learn a new tool is by using it, and to use Watson, you need data. Up to this point, we've utilized sample data for use cases that I created from various sources, but Watson has made many sample datasets available for use for your learning. To view the sample data options, simply click on Add from the main or Welcome page and then click on Sample Data: For more information about the available Watson-supplied sample data, you can go to https://community.watsonanalytics.com/resources. Summary We learned how to create prediction and to see the output from the analysis. Resources for Article: Further resources on this subject: Messaging with WebSphere Application Server 7.0 (Part 1) [article] Programming on Raspbian [article] se of macros in IBM Cognos 8 Report Studio [article]
article-image-how-to-build-an-android-todo-app-with-phonegap-html-and-jquery
Robi Sen
14 Mar 2016
12 min read
Save for later

How to Build an Android To-Do App with PhoneGap, HTML and jQuery

Robi Sen
14 Mar 2016
12 min read
In this post, we are going to create a simple HTML 5, JavaScript, and CSS application then use PhoneGap to build it and turn it into an Android application, which will be useful for game development. We will learn how to structure a PhoneGap project, leverage Eclipse ADT for development, and use Eclipse as our build tool. To follow along with this post, it is useful to have a decent working knowledge of JavaScript and HTML, otherwise you might find the examples challenging. Understanding the typical workflow Before we begin developing our application, let’s look quickly at a workflow for creating a PhoneGap application. Generally you want to design your web application UI, create your HTML, and then develop your JavaScript application code. Then you should test it on your web browser to make sure everything works the way you would like it to. Finally, you will want to build it with PhoneGap and try deploying it to an emulator or mobile phone to test. And, if you plan to sell your application on an app store, you of course need to deploy it to an app store. The To-Do app For the example in this post we are going to build a simple To-Do app. The code for the whole application can be found here, but for now we will be working with two main files: the index.html and the todo.js. Usually we would create a new application using the command line argument phonegap create myapp but for this post we will just reuse the application we already made in Post 1. So, open your Eclipse ADT bundle and navigate to your project, which is most likely called HelloWorld since that’s the default app name. Now expand the application in the left pane of Eclipse and expand the www folder. You should end up seeing something like this: When PhoneGap creates an Android project it automatically creates several directories. The www directory under the root directory is where you create all your HTML, CSS, JavaScript, and store assets to be used in your project. When you build your project, using Eclipse or the command line, PhoneGap will turn your web application into your Android application. So, now that we know where to build our web application, let’s get started. Our goal is to make something that looks like the application in the following figure, which is the HTML we want to use shown in the Chrome browser: First let’s open the existing index.html file in Eclipse. We are going to totally rewrite the file so you can just delete all the existing HTML. Now let’s add the following code as shown here: <!DOCTYPE html> <html> <head> <meta charset="utf-8" /> <meta name="format-detection" content="telephone=no" /> <meta name="msapplication-tap-highlight" content="no" /> <meta name="viewport" content="user-scalable=no, initial-scale=1, maximum-scale=1, minimum-scale=1, width=device-width, height=device-height, target-densitydpi=device-dpi" /> <title>PhoneGap ToDo</title> <link rel="stylesheet" type="text/css" href="css/jquery.mobile-1.4.3.min.css"> <link rel="stylesheet" type="text/css" href="css/index.css" /> <link rel="stylesheet" type="text/css" href="css/jquery.mobile-1.0.1.custom.css?" /> <script type="text/javascript" src="js/jquery-1.11.1.min.js"></script> <script type="text/javascript"src="js/jquery.mobile-1.4.3.min.js"></script> </head> OK; there is a bunch of stuff going on in this code. If you are familiar with HTML, you can see this is where we are importing a majority of our style sheets and JavaScript. For this example we are going to make use of JQuery and JQuery Mobile. 
You can get JQuery from here http://jquery.com/download/ and JQuery mobile from here http://jquerymobile.com/download/, but it’s easier if you just download the files from GitHub here. Those files need to go under mytestapp/www/js. Next, download the style sheets from here on GitHub and put them in mytestapp/www/cs. You will also notice the use of the meta tag. PhoneGap uses the meta tag to help set preferences for your application such as window sizing of the application, scaling, and the like. For now this topic is too big for discussion, but we will address it in further posts. OK, with that being said, let’s work on the HTML for the GUI. Now add the code shown here: <body> <script type="text/javascript"src="js/todo.js"></script> <div id="index" data-url="index" data-role="page"> <div data-role="header"> <h1>PhoneGap ToDo</h1> </div> <div data-role="content"> <ul id="task_list" data-role="listview"> <li data-role="list-divider">Add a task</li> </ul> <form id="form_336" method="GET"> <div data-role="fieldcontain"> <label for="inp_337"></label> <input type="text" name="inp_337" id="inp_337" /> </div> <input id="add" type="button" data-icon="plus" value="Add"/> </form> </div></div> <div id="confirm" data-url="confirm" data-role="page"> <div data-role="header"> <h1>Finish Task</h1> </div> <div data-role="content"> Mark this task as<br> <a class="remove_task" href="#done" data-role="button" data-icon="delete" data-theme="f">Done</a> <a class="remove_task" href="#notdone" data-role="button" data-icon="check" data-theme="g">Not Done</a> <br><br> <a href="#index" data-role="button" data-icon="minus">Cancel</a> </div></div> <div id="done" data-url="done" data-role="page"> <div data-role="header"> <h1>Right On</h1> </div> <div data-role="content"> You did it<br><br> <a href="#index" data-role="button">Good Job</a> </div></div> <div id="notdone" data-url="notdone" data-role="page"> <div data-role="header"> <h1>Get to work!</h1> </div> <div data-role="content"> Keep at it<br><br> <a href="#index" data-role="button">Back</a> </div></div> </body> </html> This HTML should make the GUI you saw earlier in this post. Go ahead and save the HTML code. Now go to the js directory under www. Create a new file by right clicking and selecting create new file, text. Name the new file todo.js. 
Now open the file in Eclipse and add the following code:

var todo = {};

/** Read the new task and add it to the list */
todo.add = function(event) {
    // Read the task from the input
    var task = $('input').val();
    if (task) {
        // Add the task to the array and refresh the list
        todo.list[todo.list.length] = task;
        todo.refresh_list();
        // Clear the input
        $('input').val('');
    }
    event.preventDefault();
};

/** Remove the task which was marked as selected */
todo.remove = function() {
    // Remove from array and refresh list
    todo.list.splice(todo.selected, 1);
    todo.refresh_list();
};

/** Recreate the entire list from the available list of tasks */
todo.refresh_list = function() {
    var $tasks = $('#task_list');
    // Clear the existing task list
    $tasks.empty();
    if (todo.list.length) {
        // Add the header
        $tasks.append('<li data-role="list-divider">To Do&#39;s</li>');
        for (var i = 0; i < todo.list.length; i++) {
            // Append each task
            var li = '<li><a data-rel="dialog" data-task="' + i + '" href="#confirm">' + todo.list[i] + '</a></li>';
            $tasks.append(li);
        }
    }
    // Add the header for addition of new tasks
    $tasks.append('<li data-role="list-divider">Add a task</li>');
    // Use jQuery Mobile's listview method to refresh
    $tasks.listview('refresh');
    // Store back the list
    localStorage.todo_list = JSON.stringify(todo.list || []);
};

// Initialize the index page
$(document).delegate('#index', 'pageinit', function() {
    // If no list is already present, initialize it
    if (!localStorage.todo_list) {
        localStorage.todo_list = "[]";
    }
    // Load the list by parsing the JSON from localStorage
    todo.list = JSON.parse(localStorage.todo_list);
    $('#add').bind('vclick', todo.add);
    $('#task_list').on('vclick', 'li a', function() {
        todo.selected = $(this).data('task');
    });
    // Refresh the list every time the page is reloaded
    $('#index').bind('pagebeforeshow', todo.refresh_list);
});

// Bind the 'Done' and 'Not Done' buttons to task removal
$(document).delegate('#confirm', 'pageinit', function() {
    $('.remove_task').bind('vclick', todo.remove);
});

// Make the transition in reverse for the buttons on the done and notdone pages
$(document).delegate('#done, #notdone', 'pageinit', function() {
    // We reverse the transition for any button linking to the index page
    $('[href="#index"]').attr('data-direction', 'reverse');
});

What todo.js does is store the task list as a JavaScript array. We then just create simple functions to add or remove items from the array, and a function to update the list. To persist the task list, we use HTML 5's localStorage to act like a simple database and store simple name/value pairs directly in the browser. Because of this, we don't need to use an actual database like SQLite or a custom file storage option.

Now save the file and try out the application in your browser. Try playing with the application a bit to test out how it's working. Once you can confirm that it's working, build and deploy the application in the Android emulator via Eclipse. To do this, create a custom "builder" in Eclipse that allows you to easily build or rebuild your PhoneGap applications each time you want to make changes.

Making Eclipse auto-build your PhoneGap apps

One of the reasons we want to use the Eclipse ADT with PhoneGap is that we can simplify our workflow, assuming you're doing most of your work targeting Android devices, by being able to do all of our web development, potentially native Android development, testing, and building, all through Eclipse.
Doing this, though, is not covered in the PhoneGap documentation and can cause a lot of confusion, since most people assume you have to use the PhoneGap CLI command line interface to do all the application building. To make your application auto-build, first right-click on the application and select Properties. Then select Builders. Now select New, which will pop up a configuration type screen. On this screen select Program. You should now see the Edit Configuration screen: Name the new builder “PhoneGap Builder” and for the location field select Browse File System and navigate to /android/cordova/build.bat under our mytestapp folder. Then, for a working directory, you will want to put in the path to your mytestapp root directory. Finally, you’ll want to use the argument - -local. Then select ok. What this will do is that every time you build the application in Eclipse it will run the build.bat file with the —local argument. This will build the .apk and update the project with your latest changes made in the application www directory. For this post that would be mytestappwww. Also, if you made any changes to the Android source code, which we will not in this post, those changes will be updated and applied to the APK build. Now that we have created a new builder, right-click on the project in the selected build. The application should now take a few seconds and then build. Once it has completed building, go ahead and select the project again and select Run As an Android application. Like what was shown in Post 1, expect this to take a few minutes as Eclipse starts the Android emulator and deploys the new Android app (you can find your Android app in mytestappplatformsandroidbin). You should now see something like the following: Go ahead and play around with the application. Summary In this post, you learned how to use PhoneGap and the Eclipse ADT to build your first real web application with HTML 5 and JQuery and then deploy it as a real Android application. You also used JQuery and HTML 5’s localStorage to simplify the creation of your GUI. Try playing around with your application and clean up the UI with CSS. In our next post we will dive deeper into working with PhoneGap to make our application more sophisticated and add additional capabilities using the phone’s camera and other sensors. About the author Robi Sen, CSO at Department 13, is an experienced inventor, serial entrepreneur, and futurist whose dynamic twenty-plus year career in technology, engineering, and research has led him to work on cutting edge projects for DARPA, TSWG, SOCOM, RRTO, NASA, DOE, and the DOD. Robi also has extensive experience in the commercial space, including the co-creation of several successful start-up companies. He has worked with companies such as UnderArmour, Sony, CISCO, IBM, and many others to help build out new products and services. Robi specializes in bringing his unique vision and thought process to difficult and complex problems allowing companies and organizations to find innovative solutions that they can rapidly operationalize or go to market with.

article-image-animation-framework-canvas
Soham Kamani
14 Mar 2016
8 min read
Save for later

Animation Framework for Canvas

Soham Kamani
14 Mar 2016
8 min read
The HTML <canvas> element is good for many things, one of them being animation. But often times, the code you write to implement this animation can get a little messy, to put it politely. To get an idea of what I mean by this, take a look at this simple example that shows you the most basic animation of moving a rectangle in one dimension from one point to the other. Although the code in that example is not that complex, once you try to imagine adding another square or circle in there, or changing the motion from a straight line to a curve, you can see how the complexity adds up and how it would be a nightmare to implement. This post will brief you on how to use object-oriented programming in JavaScript in the latest standard (ES6) to make animation in canvas less of a headache. Project structure and prerequisites Most browsers don't support all the specifications of ES6, so we will have to set up a build environment that will transpile all our ES6 code into ES5. Fortunately, this is really easy to do and there are a lot of awesome tutorials out there to help you get set up. We will also be using a node module called object-assign a lot. This is just an implementation of the native Object.assign for environments that don't support it. Our source file structure will look something like this: src ├── Component.js ├── Renderer.js ├── drawings │ └── Square.js ├── index.js └── motions └── LinearMotion.js Building blocks Let's see a top-down approach to making our animation more manageable. Firstly, we need a renderer. This will be responsible for updating and painting all our components on to the canvas. Nothing more, nothing less: //src/Renderer.js 'use strict'; let Renderer = function(canvasId){ const canvas = document.getElementById(canvasId); let self = this; self.canvas = canvas; self.ctx = canvas.getContext('2d'); self.components = []; }; module.exports = Renderer; Renderer.prototype.addComponent = function(drawObject){ this.components.push(drawObject); }; Renderer.prototype.update = function(){ this.components.forEach(component => { component.update(); }); }; Renderer.prototype.paint = function(){ let self = this; self.ctx.clearRect(0,0,self.canvas.width,self.canvas.height); self.ctx.beginPath(); this.components.forEach(component => { component.draw(self.ctx); }); self.ctx.stroke(); }; The Renderer has constructor methods: constructor - This initializes the canvas element and the 2d context to be used for drawing. addComponent - This pushes a "Component" into a list of components to be drawn. update - This calls the update method of each component. paint - This clears and repaints the canvas, by passing the context to the draw method of each component. NOTE We use polymorphism in JavaScript to achieve the functionality in Renderer. We expect each Component object to have an update and a draw method, but each component will have its own implementation. Simple English modeling - A renderer has a bunch of components that it can update and draw on to the canvas. 
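Because Renderer only relies on that update/draw contract, anything implementing those two methods can be registered — it does not have to come from the Component constructor built next. As a quick illustration (this object is hypothetical and not part of the project files):

// a throwaway component: a fixed 4x4 dot that never moves
let staticDot = {
  update: function () { /* no state to change */ },
  draw: function (ctx) {
    // ctx is the same 2d context Renderer.paint passes to every component
    ctx.rect(50, 50, 4, 4);
  }
};
// it would be registered exactly like the "real" components built later:
// renderer.addComponent(staticDot);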
Now that we have our renderer in place, we need to make the structure for a generic Component, whose methods the renderer keeps calling so often: //src/Component.js 'use strict'; import assign from 'object-assign'; let Component = function(options){ assign(this, options); }; module.exports = Component; Component.prototype.update = function(){ let {motion, drawing} = this; motion.move(); assign(drawing.position ,motion.getCurrentPosition()); }; Component.prototype.draw = function(ctx){ let {drawing} = this; drawing.draw(ctx); }; Pretty simple compared to Renderer. The constructor just assigns the options we pass it to this. Fundamentally, every component in animation will have two aspects that define it. The way it draws, and the way its state changes (in our case, this is represented by the way it moves, or its motion). motion and drawing are again two generic components, whose only requirement is that they implement a fixed set of methods. (move and getCurrentPosition in the case of motion, and draw in the case of drawing). Simple English modeling - A component has a drawing, which it can draw on to the canvas, and a motion, which updates its position state. In our example, we require a square which moves linearly (up and down a straight line). Let's define a Square drawing and a LinearMotion: //src/drawings/Square.js 'use strict'; import assign from 'object-assign'; let Square = function (options) { let self = this; assign(self, options); }; module.exports = Square; Square.prototype.draw = function (ctx) { let self = this; ctx.fillStyle = 'black'; ctx.rect(self.position.x, self.position.y, self.width, self.height); ctx.fill(); }; This is pretty self explanatory. A square implements a draw method, which draws a square on to the canvas based on the options you give it. ctx here is the 2d canvas context: //src/motions/LinearMotion.js 'use strict'; import assign from 'object-assign'; let LinearMotion = function (options) { assign(this, options); this.isMovingForward = true; this.distance = this.distance || this.center ; this.speed = this.speed || 2 ; }; module.exports = LinearMotion; LinearMotion.prototype.move = function () { let { center, distance, speed, maxDistanceFromCenter } = this; if(this.isMovingForward){ distance += speed; } else { distance -= speed; } let currentDistanceFromCenter = Math.abs(center - distance); if(currentDistanceFromCenter >= maxDistanceFromCenter ){ this.isMovingForward = !this.isMovingForward; } this.distance = distance; }; LinearMotion.prototype.getCurrentPosition = function(){ let x = this.distance; return { x }; }; The move method of LinearMotion has a bit of math in it, but in a nutshell, we assign a "center" and a "maximum distance from the center". On each call of the move method, we advance the distance from the center by the assigned speed, and in the appropriate direction (positive or negative depending on the direction). If the distance exceeds the maximum distance from the center, we reverse the direction. This will result in a back and fourth movement about the center. Finally, we implement the getCurrentPosition method to return only the x position as the resultant distance, meaning that our object will move back and fourth in the x direction. 
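To make the direction-flipping math concrete, here is a short trace of LinearMotion using the same values the example passes in later (center 100, maxDistanceFromCenter 50, speed defaulting to 2); the numbers simply follow the code above:

let m = new LinearMotion({ center: 100, maxDistanceFromCenter: 50 });
m.move(); // distance starts at 100 (it defaults to center) and becomes 102; |100 - 102| = 2, still inside the 50px limit
// ...after 25 calls the distance reaches 150; |100 - 150| = 50, so isMovingForward flips...
// ...the next calls walk it back down (148, 146, ...) until it hits 50, where it flips again
console.log(m.getCurrentPosition()); // { x: <current distance> } -- only the x coordinate is reported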
Putting it all together Now that we have all our building blocks and framework ready, let's put it all together: //src/index.js 'use strict'; import Renderer from './Renderer'; import Component from './Component'; import Square from './drawings/Square'; import LinearMotion from './motions/LinearMotion'; //Initialize a new renderer. "myCanvas" is the id of our HTML canvas element. const renderer = new Renderer('myCanvas'); //Initialize a new motion of type LinearMotion with center at 100pixels and maxDistanceFromCenter at 50 pixels let motion = new LinearMotion({ center : 100, maxDistanceFromCenter : 50 }); //Initialize a new Square with initial position at x = 100pixels and y= 10pixels let square = new Square({ width : 25, height : 25, position : { x : 100, y : 10 } }); //Initialize a new component and add it to the renderer. //The component would have our LinearMotion object as its motion and our square as its drawing. renderer.addComponent(new Component({ motion, drawing : square })); // this render function calls the update and paint method of our renderer. //"requestAnimationFrame" calls render 60 times each second, and is a native method present in browsers. const render = ()=>{ requestAnimationFrame(render); renderer.update(); renderer.paint(); }; render(); And that's it! Now bundle and compile this file using your favorite module bundler and insert it into your index.html file: <html> <head> <title>My Canvas Animation</title> </head> <body> <canvas id="myCanvas" width="600px" height="400px"></canvas> <script src="bundle.js" ></script> </body> </html> If all goes well, once you open your index.html file, you should get something that looks like this: Example Image Pretty cool, but we still haven't seen the full power of organizing your code properly. Let's put in one more square, but this time, we want a more natural kind of motion, something like how an object moves when oscillating on a string. Let's make a new SpringMotion constructor for this: //src/motions/SpringMotion.js 'use strict'; import assign from 'object-assign'; let SpringMotion = function(options){ assign(this, options); }; module.exports = SpringMotion; SpringMotion.prototype.move = function(){ let {a, v, s, center, k} = this; v = v || 0; let distanceFromCenter = center - s ; a = k * distanceFromCenter; v += a; s += v; assign(this,{a, v, s}); }; SpringMotion.prototype.getCurrentPosition = function(){ let {s} = this; return { x : s }; }; I won't get into the detail of this kind of motion as it involves a little bit of extra theory, which could take a whole blog post on its own. Now, all we have to do to add a new square with this spring motion is to modify index.js by adding the following code: let springSquare = new Square({ width : 25, height : 25, position : { x : 100, y : 40 } }); let springMotion = new SpringMotion({ center: 100, s: 150, k: 3e-3 }); renderer.addComponent(new Component({ motion : springMotion, drawing : springSquare })); So the only thing that we modified was the y position of the square and the type of motion. Example Image Awesome! As you can see, the motion of the second square looks much more natural, gradually slowing at the edges and speeding past the center. Additionally, we did not change any of our source files, just added another type of motion. Hopefully now, you're all set to get animating with HTML and canvas. If you're still doubtful, here's the live working example along with the complete source code. 
About the author Soham Kamani is a full stack web developer and electronics hobbyist. He is especially interested in JavaScript, Python, and IOT.

Packt
11 Mar 2016
11 min read
Save for later

Exploring Microsoft Dynamics NAV – An Introduction

Packt
11 Mar 2016
11 min read
In this article written by Alex Chow, author of the book Implementing Microsoft Dynamics NAV - Third Edition, we understand more on how Microsoft Dynamics NAV being an Enterprise Resource Planning (ERP) system is specifically made for growing small to mid-sized companies. (For more resources related to this topic, see here.) This is, at least, what Microsoft's marketing department says. In reality, Dynamics NAV is being used by large and publically-traded companies as well around the world. An ERP is a software that integrates the internal and external management information across an entire organization. The purpose of an ERP is to facilitate the flow of information between all business functions inside the boundaries of organizations. An ERP system is meant to handle all the functional areas within an organization on a single software system. This way, the output of an area can be used as the input of another area, without the need to duplicate data. This article will give you an idea of what Dynamics NAV is and what you can expect from it. The topics covered in this article are the following: What is Microsoft Dynamics NAV The functional areas found in Microsoft Dynamics NAV 2016 A history of Dynamics NAV Understanding Microsoft Dynamics NAV Microsoft Dynamics NAV 2016 is a Role Tailored ERP. Traditionally, ERP software is built to providing a lot of functionalities where users will need to hunt down the information. This is more of a passive approach to information in which the user will need to go somewhere within the system to retrieve information. Dynamics NAV works differently. The role-tailored experience is based on individuals within an organization, their roles, and the tasks they perform. When users first enter Dynamics NAV, they see the data needed for the daily tasks they do according to their role. Users belonging to different roles will have a different view of the system; each of them will see the functions they need to properly perform their daily tasks. Instead of the users chasing down information, the information comes to them. Here's an example of the main screen for an order processor. All the relevant information for a user who is processing sales orders are displayed in a business intelligent (BI) format: The functional areas within Dynamics NAV Dynamics NAV covers the following functional areas inside an organization: Financial management: Most of the functionalities from "off-the-shelf" accounting software can be found in this module. The functionalities include, but are not limited to, G/L budgeting, financial reporting, cash management, receivables and payables, fixed assets, VAT and tax reporting, intercompany transactions, cost accounting, consolidation, multicurrency, intrastate, and so on. Sales and marketing: This is for the companies that want to track customer orders and determine when the items can be promised to be delivered to the customer. This area covers customers, order processing, expected delivery, order promises, sales returns, pricing, contacts, marketing campaigns, and so on. Purchase: This module is required when you buy goods and services and you want to keep track of what you have ordered from your vendors and when the goods should be delivered to your door, so you can make the stuff or ship the stuff to your customers. This area includes vendors, order processing, approvals, planning, costing, and so on. Warehouse: Where are your items in your warehouse? This functional area answers this question for you. 
Under the warehouse area, you will find inventory, shipping and receiving, locations, warehouse bin contents, picking, put-aways, assembly, and so on. Manufacturing: The manufacturing area includes product design, bills of materials, routing, capacities, forecast, production planning, production order, costing, subcontracting, and so on. Job: This module is typically used for companies that deal with long and drawn out projects. Within this job area, you can create projects, phases and tasks, planning, time sheets, work in process, and likewise. Resource planning: If your company has internal resources that for which you keep track of cost and/or revenue, this module is for you. This area includes resources, capacity, and other tools to keep track of cost and revenue for resources. Service: This functional area is design for a company that sells items to their customers that needs to be serviced periodically, with or without warranty. Within this service area, you can manage service items, contract management, order processing, planning and dispatching, service tasks, and so on Human resources: This involves basic employee tracking. It allows you to manage employees, absences, and so on. One of the best-selling points about Dynamics NAV is that it can be customized. A brand new functional area can be created from scratch or new features can be added to an existing functional area. All the development is done with the programming language called C/AL. When someone creates a new functional area, a vertical (a wide range of functions for a specific industry) or horizontal (a wide range of functions that can be applied across an industry), they usually create it as an add-on. An add-on can be registered with Microsoft, with the appropriate fees of course. If some features are added to an existing area, usually it is a customization that will only be used on the database of the customer who asked for the feature. Making add-ons available greatly enhances the base Dynamics NAV functionalities to fit the needs of every industry in every business. One thing unique about Dynamics NAV is that the entire code is located on a single layer. Therefore, if you customize an area, you have to do it by modifying the standard code and adding code in the middle of the standard object definition. This made it a little tough to upgrade in the prior versions of Dynamics NAV. However, with the release of Dynamics NAV 2016, code upgrades can be done automatically using Power Shell! We will dive into Power Shell later. Dynamics NAV uses a three-tier architecture: SQL Server is the data tier and is used to store the data in a database. Microsoft Dynamics NAV Server is the middle or server tier, managing the entire business logic and communication. It also provides an additional layer of security between clients and the database and an additional layer for user authentication. On the client tier, we will find Windows clients and the web client. Dynamics NAV 2016 also supports other kinds of clients including Web Services (both SOAP and OData), mobile tablets, a SharePoint client through Microsoft Dynamics NAV Portal Framework, and NAS service. You can install Dynamics NAV in more complex scenarios, as you can have multiple instances of any of the core components. History of Dynamics NAV We are not historians, but we thought that it would be important to know where we come from and where we are going. Some of the current restrictions or features can be better understood if we know a bit of the history of Dynamics NAV. 
This is why we have added this section. Dynamics NAV was first developed by a Danish firm and the program was called Navision A/S. In 2002, Microsoft bought Navision A/S and included it in the Microsoft Business Solution division. The product has gone through several name changes. The names: Navision Financials, Navision Attain, and Microsoft Business solutions Navision Edition, have been used to refer to the product that is currently called Microsoft Dynamics NAV. Note that all the previous names included the word Navision. This is why many people keep calling it Navision instead of NAV. Prior to Dynamics NAV 2009, the development environment was actually the primary end user interface before Microsoft revamped the user interface that we call the Role Tailored Client (RTC). One of the greatest technological breakthroughs with the original Navision (the name before it was called Dynamics NAV) was that the application programming objects, the user interface, and the database resided together, in one file! Back in the late 1990s and early 2000s, no other software came close to having an efficient design like this. This was the main menu for Navision Financials version 2.0: We're now more than a decade away from 2000 and technology has changed quite a bit. Dynamics NAV has been very up to date with the latest technology that has the best impact for businesses. However, most of these improvements and updates are mostly in the backend. This is an important reason why Dynamics NAV has never faded into history. There were a couple of user interface improvements; however, largely, it mainly looks and feels very much the same as before. This is the main menu for Dynamics NAV 5.0: Then something happened. With the rise of a company called Apple, people started paying more attention to the aesthetics and the overall interface of the technology they're using. People demanded not just powerful software with a strong backend, but they also wanted an elegant design with a simple and intuitive user interface. Because of this shift in user perception, what was once the greatest innovation in accounting software since sliced bread, had become not obsolete, but outdated. When you put the old interface (called Classic Client) against some of the newer applications, even though the backend was light years ahead, the Classic Client was the ugly one. And we all know somebody who made a terrible decision based only on looks, but not really what's inside. So when NAV 2009 was introduced, the Role Tailored Client was released, which is the interface you see when you install Dynamics NAV for end users. NAV 2009 was unique in that it allowed both Classic Client and Role Tailored Client to coexist. This is mostly to appease the existing NAV gurus and users who did not want to learn the new interface. In addition, NAV 2009 replaced the classic reporting with the report definition language client-side (RDLC) reporting. RDLC reports brought in a big change because the layout of the report had to be designed in Visual Studio, outside Dynamics NAV, to bring in the advantages of SQL Server Reporting Services technology; while pages changed the way of developing the user interface. This is what NAV 2009 in the RTC looked like: At the first glance, NAV 2009 and NAV 2015 do not look too different. You will have to understand that there were significant user interface and usability changes. We can list out these changes, but if you're not already familiar with Dynamics NAV (or Navision), you'll will find this disinteresting. 
That grace period expired when NAV 2013 was released and the Classic Client user interface was completely removed. Microsoft basically renamed the Classic Client as Development Environment. For the foreseeable future, it looks like the Development Environment and the Windows Client environment will remain separated. Now we're at Dynamics NAV 2016, with tons of performance and usability enhancements, which is what this book is about. Functional areas The core functionalities of Dynamics NAV have not dramatically changed over the years. New functional areas have appeared and the existing ones still work as they did in the previous versions. In NAV 2009, Microsoft was focused on changing the entire architecture (for good), and NAV 2013 is the consolidation of the new architecture. NAV 2016 enhances what was released with NAV 2013. All these architectural changes were made to bring Dynamics NAV closer to the existing Microsoft technologies, namely, Microsoft Office 365, .NET, SQL Server, Azure, and so on; in the meantime, the core functionality has not undergone a drastic face-lift compared to the architecture. Microsoft has been adding small functional features and improving the existing functionalities with every new release. The base Dynamics NAV 2016 covers the following functional areas: Financial management Sales and marketing Purchase Warehouse Manufacturing Job Resource planning Service Human resources In Dynamics NAV, the financial management area is the epicenter of the entire application. The other areas are optional and their usage depends on the organization's needs. The sales and purchase areas are also commonly used within a Dynamics NAV implementation. Now let's have a closer view of each area. Summary In this article, we have seen that Dynamics NAV is an ERP system targeted at small and medium-sized companies. Dynamics NAV can be used on different environments such as the Windows client, the Web client, tablet client, the SharePoint client, or an external application that connects to Dynamics NAV via Web Services. The development environment is used to develop new features on top of Dynamics NAV. Resources for Article: Further resources on this subject: Implementing Microsoft Dynamics AX[article] Financial Management with Microsoft Dynamics AX 2012 R3[article] Getting Started with Microsoft Dynamics CRM 2013 Marketing[article]

article-image-exploring-hdfs
Packt
10 Mar 2016
17 min read
Save for later

Exploring HDFS

Packt
10 Mar 2016
17 min read
In this article by Tanmay Deshpande, the author of the book Hadoop Real World Solutions Cookbook- Second Edition, we'll cover the following recipes: Loading data from a local machine to HDFS Exporting HDFS data to a local machine Changing the replication factor of an existing file in HDFS Setting the HDFS block size for all the files in a cluster Setting the HDFS block size for a specific file in a cluster Enabling transparent encryption for HDFS Importing data from another Hadoop cluster Recycling deleted data from trash to HDFS Saving compressed data in HDFS Hadoop has two important components: Storage: This includes HDFS Processing: This includes Map Reduce HDFS takes care of the storage part of Hadoop. So, let's explore the internals of HDFS through various recipes. (For more resources related to this topic, see here.) Loading data from a local machine to HDFS In this recipe, we are going to load data from a local machine's disk to HDFS. Getting ready To perform this recipe, you should have an already Hadoop running cluster. How to do it... Performing this recipe is as simple as copying data from one folder to another. There are a couple of ways to copy data from the local machine to HDFS. Using the copyFromLocal commandTo copy the file on HDFS, let's first create a directory on HDFS and then copy the file. Here are the commands to do this: hadoop fs -mkdir /mydir1 hadoop fs -copyFromLocal /usr/local/hadoop/LICENSE.txt /mydir1 Using the put commandWe will first create the directory, and then put the local file in HDFS: hadoop fs -mkdir /mydir2 hadoop fs -put /usr/local/hadoop/LICENSE.txt /mydir2 You can validate that the files have been copied to the correct folders by listing the files: hadoop fs -ls /mydir1 hadoop fs -ls /mydir2 How it works... When you use HDFS copyFromLocal or the put command, the following things will occur: First of all, the HDFS client (the command prompt, in this case) contacts NameNode because it needs to copy the file to HDFS. NameNode then asks the client to break the file into chunks of different cluster block sizes. In Hadoop 2.X, the default block size is 128MB. Based on the capacity and availability of space in DataNodes, NameNode will decide where these blocks should be copied. Then, the client starts copying data to specified DataNodes for a specific block. The blocks are copied sequentially one after another. When a single block is copied, the block is sent to DataNode into packets that are 4MB in size. With each packet, a checksum is sent; once the packet copying is done, it is verified with checksum to check whether it matches. The packets are then sent to the next DataNode where the block will be replicated. The HDFS client's responsibility is to copy the data to only the first node; the replication is taken care by respective DataNode. Thus, the data block is pipelined from one DataNode to the next. When the block copying and replication is taking place, metadata on the file is updated in NameNode by DataNode. Exporting data from HDFS to Local machine In this recipe, we are going to export/copy data from HDFS to the local machine. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... Performing this recipe is as simple as copying data from one folder to the other. There are a couple of ways in which you can export data from HDFS to the local machine. 
Exporting data from HDFS to a local machine In this recipe, we are going to export/copy data from HDFS to the local machine. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... Performing this recipe is as simple as copying data from one folder to the other. There are a couple of ways in which you can export data from HDFS to the local machine. Using the copyToLocal command, you'll get this code: hadoop fs -copyToLocal /mydir1/LICENSE.txt /home/ubuntu Using the get command, you'll get this code: hadoop fs -get /mydir1/LICENSE.txt /home/ubuntu How it works... When you use the HDFS copyToLocal or get command, the following things occur: First of all, the client contacts NameNode because it needs a specific file from HDFS. NameNode then checks whether such a file exists in its FSImage. If the file is not present, an error code is returned to the client. If the file exists, NameNode checks the metadata for the blocks and their replica placements in the DataNodes. NameNode then points the client directly to the DataNodes from which the blocks can be fetched one by one. The data is copied directly from the DataNodes to the client machine; it never goes through NameNode, which avoids a bottleneck. Thus, the file is exported to the local machine from HDFS. Changing the replication factor of an existing file in HDFS In this recipe, we are going to take a look at how to change the replication factor of a file in HDFS. The default replication factor is 3. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... Sometimes, there might be a need to increase or decrease the replication factor of a specific file in HDFS. In this case, we'll use the setrep command. This is how you can use the command: hadoop fs -setrep [-R] [-w] <noOfReplicas> <path> ... In this command, a path can either be a file or a directory; if it is a directory, the command recursively sets the replication factor for everything under it. The -w option asks the command to wait until the replication is complete. The -R option is accepted for backward compatibility. First, let's check the replication factor of the file we copied to HDFS in the previous recipe: hadoop fs -ls /mydir1/LICENSE.txt -rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt Once you list the file, it will show you the read/write permissions on this file, and the very next parameter is the replication factor. We have the replication factor set to 3 for our cluster; hence, the number shown is 3. Let's change it to 2 using this command: hadoop fs -setrep -w 2 /mydir1/LICENSE.txt It will wait till the replication is adjusted. Once done, you can verify this again by running the ls command: hadoop fs -ls /mydir1/LICENSE.txt -rw-r--r-- 2 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt How it works... Once the setrep command is executed, NameNode is notified, and it then decides whether replicas need to be added to or removed from certain DataNodes. When you use the -w option, the command may take a long time to return if the file is very big.
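If you prefer to check the replication factor programmatically rather than by parsing ls output, the following small Python sketch (an illustration using the same assumed WebHDFS endpoint as before) reads the FileStatus JSON returned by the NameNode, which carries the replication factor and block size of the file:

import requests

NAMENODE = "http://localhost:50070"   # assumed NameNode HTTP address
PATH = "/mydir1/LICENSE.txt"

# GETFILESTATUS is answered by NameNode from its metadata, so no
# DataNode round trip is needed for this call.
url = "{0}/webhdfs/v1{1}?op=GETFILESTATUS".format(NAMENODE, PATH)
status = requests.get(url).json()["FileStatus"]

print("replication:", status["replication"])
print("block size :", status["blockSize"])
print("length     :", status["length"])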
Setting the HDFS block size for all the files in a cluster In this recipe, we are going to take a look at how to set a block size at the cluster level. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... The HDFS block size is configurable for all files in the cluster or for a single file as well. To change the block size at the cluster level itself, we need to modify the hdfs-site.xml file. By default, the HDFS block size is 128MB. In case we want to modify this, we need to update this property, as shown in the following code. This property changes the default block size to 64MB: <property> <name>dfs.block.size</name> <value>67108864</value> <description>HDFS Block size</description> </property> If you have a multi-node Hadoop cluster, you should update this file on all the nodes, that is, on NameNode and the DataNodes. Make sure you save these changes and restart the HDFS daemons: /usr/local/hadoop/sbin/stop-dfs.sh /usr/local/hadoop/sbin/start-dfs.sh This will set the block size for files that are added to the HDFS cluster from now on. Note that this does not change the block size of the files that are already present in HDFS; there is no way to change the block size of existing files. How it works... By default, the HDFS block size is 128MB for Hadoop 2.X. Sometimes, we may want to change this default block size for optimization purposes. When this configuration is successfully updated, all new files will be saved in blocks of this size. These changes do not affect the files that are already present in HDFS; their block size was fixed at the time they were copied. Setting the HDFS block size for a specific file in a cluster In this recipe, we are going to take a look at how to set the block size for a specific file only. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... In the previous recipe, we learned how to change the block size at the cluster level. But this is not always required. HDFS provides us with the facility to set the block size for a single file as well. The following command copies a file called myfile to HDFS, setting the block size to 1MB: hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile / Once the file is copied, you can verify whether the block size is set to 1MB and the file has been broken into exact chunks: hdfs fsck -blocks /myfile Connecting to namenode via http://localhost:50070/fsck?ugi=ubuntu&blocks=1&path=%2Fmyfile FSCK started by ubuntu (auth:SIMPLE) from /127.0.0.1 for path /myfile at Thu Oct 29 14:58:00 UTC 2015 .Status: HEALTHY Total size: 17276808 B Total dirs: 0 Total files: 1 Total symlinks: 0 Total blocks (validated): 17 (avg. block size 1016282 B) Minimally replicated blocks: 17 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 0 (0.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 1 Average block replication: 1.0 Corrupt blocks: 0 Missing replicas: 0 (0.0 %) Number of data-nodes: 3 Number of racks: 1 FSCK ended at Thu Oct 29 14:58:00 UTC 2015 in 2 milliseconds The filesystem under path '/myfile' is HEALTHY How it works... When we specify the block size at the time of copying a file, it overrides the default block size and copies the file to HDFS by breaking it into chunks of the given size. Generally, these modifications are made in order to perform other optimizations. Make these changes only when you are aware of their consequences. If the block size is too small, it will increase parallelization, but it will also increase the load on NameNode, as NameNode has to hold more entries in its FSImage. On the other hand, if the block size is too big, it will reduce parallelization and degrade processing performance.
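To make this trade-off concrete, the fsck output above shows 17 blocks for a roughly 17MB file at a 1MB block size. Here is a small, self-contained Python sketch (the 10GB file size is just an illustrative number) that shows how many blocks, and therefore how many NameNode block entries and potential map tasks, a file produces at different block sizes:

FILE_SIZE = 10 * 1024**3          # a hypothetical 10GB file

for block_mb in (1, 64, 128, 256):
    block_size = block_mb * 1024**2
    # Ceiling division: a partial last chunk still occupies one block.
    blocks = (FILE_SIZE + block_size - 1) // block_size
    # Each block is one entry in NameNode's metadata and, by default,
    # one input split (one map task) for a MapReduce job reading the file.
    print("block size {0:>4} MB -> {1:>6} blocks".format(block_mb, blocks))

At a 1MB block size, the same 10GB file turns into more than ten thousand blocks, which is exactly the NameNode pressure described above; at 256MB it becomes only 40 blocks, with correspondingly less parallelism.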
Enabling transparent encryption for HDFS When handling sensitive data, it is always important to consider security measures. Hadoop allows us to encrypt sensitive data that's present in HDFS. In this recipe, we are going to see how to encrypt data in HDFS. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... For many applications that hold sensitive data, it is very important to adhere to standards such as PCI, HIPAA, FISMA, and so on. To enable this, HDFS provides a facility called an encryption zone: a directory in which data is encrypted on write and decrypted on read. To use this encryption facility, we first need to enable the Hadoop Key Management Server (KMS): /usr/local/hadoop/sbin/kms.sh start This starts KMS in the Tomcat web server. Next, we need to append the following properties in core-site.xml and hdfs-site.xml. In core-site.xml, add the following property: <property> <name>hadoop.security.key.provider.path</name> <value>kms://http@localhost:16000/kms</value> </property> In hdfs-site.xml, add the following property: <property> <name>dfs.encryption.key.provider.uri</name> <value>kms://http@localhost:16000/kms</value> </property> Restart the HDFS daemons: /usr/local/hadoop/sbin/stop-dfs.sh /usr/local/hadoop/sbin/start-dfs.sh Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption: hadoop key create mykey This will create a key and save it in KMS. Next, we have to create an encryption zone, which is a directory in HDFS where all the encrypted data is saved: hadoop fs -mkdir /zone hdfs crypto -createZone -keyName mykey -path /zone We will change the ownership to the current user: hadoop fs -chown ubuntu:ubuntu /zone If we put any file into this directory, it is encrypted on write and decrypted automatically when read back: hadoop fs -put myfile /zone hadoop fs -cat /zone/myfile How it works... There are various levels at which one can encrypt data in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption. HDFS transparent encryption sits between the database-level and file-level encryption. KMS acts as a proxy between HDFS clients and the underlying key provider via HTTP REST APIs. There are two types of keys used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK is used to encrypt the DEK; the encrypted result is called the Encrypted Data Encryption Key (EDEK), and it is saved on NameNode. When a file needs to be written to the HDFS encryption zone, the client gets the EDEK from NameNode and uses KMS, which holds the EZK, to obtain the DEK; the DEK is used to encrypt the data before it is stored in HDFS (the encryption zone). When an encrypted file needs to be read, the client again needs the DEK, which is obtained by combining the EDEK from NameNode with the EZK held by KMS. Thus, encryption and decryption are handled automatically by HDFS; the end user does not need to worry about doing any of this on their own. You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.
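The EZK/DEK/EDEK flow described above is an instance of envelope encryption. The following toy Python sketch (using the third-party cryptography package) is only a conceptual illustration of that idea, wrapping a per-file key with a zone key; it is not how HDFS or KMS implement it internally:

from cryptography.fernet import Fernet

# The zone key (EZK) lives with the key server; the per-file key (DEK)
# is generated for each file and never stored in the clear.
ezk = Fernet(Fernet.generate_key())
dek_bytes = Fernet.generate_key()
edek = ezk.encrypt(dek_bytes)            # EDEK: what NameNode would store

# Writing: encrypt the file contents with the DEK.
ciphertext = Fernet(dek_bytes).encrypt(b"sensitive records")

# Reading: unwrap the EDEK back into a DEK, then decrypt the data.
recovered_dek = ezk.decrypt(edek)
plaintext = Fernet(recovered_dek).decrypt(ciphertext)
assert plaintext == b"sensitive records"

The point of the indirection is that the data key can be rotated or re-wrapped without re-encrypting the data, and the zone key never has to leave the key server.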
Importing data from another Hadoop cluster Sometimes, we may want to copy data from one HDFS to another, either for development, testing, or production migration. In this recipe, we will learn how to copy data from one HDFS cluster to another. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... Hadoop provides a utility called DistCp, which helps us copy data from one cluster to another. Using this utility is as simple as copying from one folder to another: hadoop distcp hdfs://hadoopCluster1:9000/source hdfs://hadoopCluster2:9000/target This uses a Map Reduce job to copy data from one cluster to another. You can also specify multiple source files to be copied to the target. There are a couple of other options that we can also use: -update: When we use DistCp with the update option, it copies only those files from the source that are not part of the target or differ from the target. -overwrite: When we use DistCp with the overwrite option, it overwrites the target directory with the source. How it works... When DistCp is executed, it uses MapReduce to copy the data and also assists in error handling and reporting. It expands the list of source files and directories and inputs them to map tasks. When copying from multiple sources, collisions are resolved in the destination based on the option (update/overwrite) that's provided. By default, it skips a file if it is already present at the target. Once the copying is complete, the count of skipped files is presented. You can read more on DistCp at https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html. Recycling deleted data from trash to HDFS In this recipe, we are going to see how to recover deleted data from the trash to HDFS. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... To recover accidentally deleted data from HDFS, we first need to enable the trash folder, which is not enabled by default in HDFS. This can be achieved by adding the following property to core-site.xml: <property> <name>fs.trash.interval</name> <value>120</value> </property> Then, restart the HDFS daemons: /usr/local/hadoop/sbin/stop-dfs.sh /usr/local/hadoop/sbin/start-dfs.sh This will set the deleted file retention to 120 minutes. Now, let's try to delete a file from HDFS: hadoop fs -rmr /LICENSE.txt 15/10/30 10:26:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 0 minutes. Moved: 'hdfs://localhost:9000/LICENSE.txt' to trash at: hdfs://localhost:9000/user/ubuntu/.Trash/Current We have 120 minutes to recover this file before it is permanently deleted from HDFS. To restore the file to its original location, we can execute the following commands. First, let's confirm whether the file exists: hadoop fs -ls /user/ubuntu/.Trash/Current Found 1 items -rw-r--r-- 1 ubuntu supergroup 15429 2015-10-30 10:26 /user/ubuntu/.Trash/Current/LICENSE.txt Now, restore the deleted file or folder; it's better to use the distcp command instead of copying each file one by one: hadoop distcp hdfs://localhost:9000/user/ubuntu/.Trash/Current/LICENSE.txt hdfs://localhost:9000/ This will start a map reduce job to restore data from the trash to the original HDFS folder. Check the HDFS path; the deleted file should be back in its original form. How it works... Enabling trash enforces a file retention policy for the specified amount of time. So, when trash is enabled, HDFS does not delete or move any blocks immediately but only updates the metadata of the file and its location. This gives us a window in which accidentally deleted files can still be recovered; make sure that trash is enabled before experimenting with this recipe.
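For a single small file, starting a DistCp job may be more than you need. The following Python sketch simply shells out to the hadoop CLI; it reuses the paths and the port from this recipe (adjust them for your own cluster) and is shown only as a lighter-weight alternative for restoring one file:

import subprocess

FS = "hdfs://localhost:9000"
TRASH = "/user/ubuntu/.Trash/Current"
FILENAME = "LICENSE.txt"
RESTORE_TO = "/"

trash_path = "{0}{1}/{2}".format(FS, TRASH, FILENAME)

# 'hadoop fs -test -e' exits with 0 when the path exists.
exists = subprocess.call(["hadoop", "fs", "-test", "-e", trash_path]) == 0

if exists:
    # -cp copies within the cluster; no MapReduce job is needed for one file.
    subprocess.check_call(["hadoop", "fs", "-cp", trash_path, FS + RESTORE_TO])
    print("Restored", FILENAME, "to", RESTORE_TO)
else:
    print(FILENAME, "is no longer in the trash")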
Saving compressed data on HDFS In this recipe, we are going to take a look at how to store and process compressed data in HDFS. Getting ready To perform this recipe, you should already have a running Hadoop cluster. How to do it... It's always good to use compression while storing data in HDFS. HDFS supports various compression algorithms such as LZO, bzip2, Snappy, GZIP, and so on. Every algorithm has its own pros and cons when you consider the time taken to compress and decompress and the space efficiency. These days, people prefer Snappy compression as it aims to achieve very high speed with a reasonable amount of compression. To store compressed data, we don't need to make any specific changes to the Hadoop cluster; you simply copy the compressed files to HDFS the same way you copy any other file. Here is an example of this: hadoop fs -mkdir /compressed hadoop fs -put file.bz2 /compressed Now, we'll run a sample program to take a look at how Hadoop automatically decompresses the file and processes it: hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /compressed /compressed_out Once the job is complete, you can verify the output. How it works... Hadoop uses native libraries to provide support for the various codecs and their implementations. Native libraries are specific to the platform that you run Hadoop on. You don't need to make any configuration changes to enable compression algorithms. As mentioned earlier, Hadoop supports various compression algorithms that are already familiar to the computing world. Based on your needs and requirements (more space or more time), you can choose your compression algorithm. Take a look at http://comphadoop.weebly.com/ for more information on this. Summary We covered the major aspects of HDFS in this article, which comprises recipes that help us load, export, import, and recover data in HDFS. It also covered enabling transparent encryption for HDFS as well as adjusting the block size of an HDFS cluster. Resources for Article: Further resources on this subject: Hadoop and MapReduce [article] Advanced Hadoop MapReduce Administration [article] Integration with Hadoop [article]

How to Set Up CoreOS Environment

Packt
10 Mar 2016
17 min read
In this article, Kingston Smiler and Shantanu Agrawal, the authors of the book Learning CoreOS, explain how CoreOS can be installed on a variety of platforms such as bare metal servers, cloud providers virtual machines, physical servers, and so on. This article  describes in detail how to bring up your first CoreOS environment focusing on deploying CoreOS on a Virtual Machine. When deploying in a virtualization environment, tools such as Vagrant comes in very handy in managing the CoreOS virtual machines. Vagrant enables setting up CoreOS with multiple nodes even on single laptops or workstations easily with minimum configuration. Vagrant supports VirtualBox, a commonly used virtualization application. Both Vagrant and VirtualBox are available for multiple architecture like Intel or AMD, and operating systems such as Windows, Linux, Solaris, and Mac. This article covers setting up CoreOS on VirtualBox, VMware VSphere, and also the following topics: VirtualBox installation Introduction to Vagrant CoreOS on VMware VSphere setup Git is used for downloading all the required software mentioned in this  article. (For more resources related to this topic, see here.) Installing Git Download the latest version of Git installation as per the host operating system from https://git-scm.com/download. After the download is complete, start the installation. The installation of Git using this procedure is useful for Mac and Windows. For all Linux distributions, the Git client is available through its package manager. For example, if the operation system is CentOS, the package manager yum can be used to install Git. Installing VirtualBox Download the latest version of VirtualBox as per the host operating system and architecture from https://www.virtualbox.org/wiki/Downloads. After the download is complete, start the installation. During installation, continue with the default options. VirtualBox installation resets the host machine’s network adapters during installation. This will result in the network connection toggle. After the installation is successful, Installer will print the status of the operation. Introduction to Vagrant Vagrant provides a mechanism to install and configure a development, test, or production environment. Vagrant works along with various virtualization applications such as VirtualBox, VMware, AWS, and so on. All installation, setup information, configuration, and dependencies are maintained in a file and virtual machine can be configured and brought up using a simple Vagrant command. This also helps to automate the process of installation and configuration of machines using commonly available scripting languages. Vagrant helps in creating an environment that is exactly the same across users and deployments. Vagrant also provides simple commands to manage the virtual machines. In the context of CoreOS, Vagrant will help to create multiple machines of the CoreOS cluster with ease and with the same environment. Installing Vagrant Download and install the latest version of Vagrant from http://www.vagrantup.com/downloads. Choose default settings during installation. Vagrant configuration files The Vagrant configuration file contains the configuration and provisioning information of the virtual machines. The configuration filename is Vagrantfile and the file syntax is Ruby. The configuration file can be present in any of the directory levels starting from the current working directory. 
The file in the current working directory is read first, then the file (if present) one directory level up, and so on until /. Files are merged as they are read. For most of the configuration parameters, newer settings overwrite older settings, except for a few parameters where they are appended. A Vagrantfile template and other associated files can be cloned from the Git repository (https://github.com/coreos/coreos-vagrant.git). Run the following command from the terminal to clone the repository. Note that the procedure to start a terminal may vary from OS to OS; in Windows, for example, Git commands are run from Git Bash. $ git clone https://github.com/coreos/coreos-vagrant/ A directory, coreos-vagrant, is created after git clone. Along with other files associated with the Git repository, the directory contains Vagrantfile, user-data.sample, and config.rb.sample. Rename user-data.sample to user-data and config.rb.sample to config.rb. git clone https://github.com/coreos/coreos-vagrant/ Cloning into 'coreos-vagrant'... remote: Counting objects: 402, done. remote: Total 402 (delta 0), reused 0 (delta 0), pack-reused 402 Receiving objects: 100% (402/402), 96.63 KiB | 31.00 KiB/s, done. Resolving deltas: 100% (175/175), done.   cd coreos-vagrant/ ls config.rb.sample*  CONTRIBUTING.md*  DCO*  LICENSE*  MAINTAINERS*  NOTICE*  README.md*  user-data.sample*  Vagrantfile* Vagrantfile contains the template configuration to create and configure the CoreOS virtual machine using VirtualBox. Vagrantfile includes the config.rb file using the require directive. … CONFIG = File.join(File.dirname(__FILE__), "config.rb") … if File.exist?(CONFIG)   require CONFIG end …   … CLOUD_CONFIG_PATH = File.join(File.dirname(__FILE__), "user-data") …       if File.exist?(CLOUD_CONFIG_PATH)         config.vm.provision :file, :source => "#{CLOUD_CONFIG_PATH}",         :destination => "/tmp/vagrantfile-user-data"         config.vm.provision :shell, :inline => "mv /tmp/vagrantfile-         user-data /var/lib/coreos-vagrant/", :privileged => true       end … Cloud-config Cloud-config files are special files that get executed by the cloud-init process when the CoreOS system starts or when the configuration is dynamically updated. Typically, the cloud-config file contains various OS-level configuration for the CoreOS instance, such as networking, user administration, systemd units, and so on. For CoreOS, user-data is the name of the cloud-config file, and it is present inside the base directory of the Vagrant folder. The systemd unit files are configuration files containing information about a process. The cloud-config file uses the YAML file format. A cloud-config file must contain #cloud-config as the first line, followed by an associative array that has zero or more of the following keys: coreos: This key provides configuration of the services provided by CoreOS. The configuration for some of the important services is described next: etcd2: This key replaces the previously used etcd key. The parameters for etcd2 are used to generate the systemd unit drop-in file for the etcd2 service. Some of the important parameters of the etcd2 configuration are: discovery: This specifies the unique token used to identify all the etcd members forming a cluster. The unique token can be generated by accessing the free discovery service (https://discovery.etcd.io/new?size=<clustersize>); a short sketch of fetching a token programmatically is shown right after this paragraph.
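As a convenience, here is a minimal Python sketch (it assumes outbound internet access and uses the public discovery service mentioned above; the cluster size of 3 is only an example) that requests a fresh discovery token:

import requests

CLUSTER_SIZE = 3  # example size; match it to $num_instances in config.rb

# The public discovery service returns a unique URL (the discovery token)
# that all members of this one cluster will share.
response = requests.get(
    "https://discovery.etcd.io/new", params={"size": CLUSTER_SIZE})
response.raise_for_status()
token_url = response.text.strip()

print(token_url)

The URL that comes back is the value to paste into the discovery field of the user-data file.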
This is used when the discovery mechanism is used to identify cluster etcd members in cases where IP addresses of all the nodes are not known beforehand. The token generated is also called the discovery URL. The discovery service helps clusters to connect to each other using initial-advertise-peer-urls provided by each member by storing the connected etcd members, the size of the cluster, and other metadata against the discovery URL. initial-advertise-peer-urls: This specifies the member’s own peer URLs that are advertised to the cluster. The IP should be accessible to all etcd members. Depending on accessibility, a public and/or private IP can be used. advertise-client-urls: This specifies the member’s own client URLs that are advertised to the cluster. The IP should be accessible to all etcd members. Depending on accessibility, a public and/or private IP can be used. listen-client-urls: This specifies the list of self URLs on which the member is listening for client traffic. All advertised client URLs should be part of this configuration. listen-peer-urls: This specifies the list of self URLs on which the member is listening for peer traffic. All advertised peer URLs should be part of this configuration. On some platforms, the providing IP can be automated by using templating feature. Instead of providing actual IP addresses, the fields $public_ipv4 or $private_ipv4 can be provided. $public_ipv4 is a substitution variable for the public IPV4 address of the machine. $private_ipv4 is a substitution variable for the private IPV4 address of the machine. The following is sample coreos configuration in the cloud-config file: #cloud-config coreos:   etcd2:     discovery: https://discovery.etcd.io/d54166dee3e709cf35b0d78913621df6     # multi-region and multi-cloud deployments need to use     $public_ipv4     advertise-client-urls: http://$public_ipv4:2379     initial-advertise-peer-urls: http://$private_ipv4:2380     # listen on both the official ports and the legacy ports     # legacy ports can be omitted if your application doesn't     depend on them     listen-client-urls:     http://0.0.0.0:2379,http://0.0.0.0:4001    listen-peer-urls: http://$private_ipv4:2380,http://$private_ipv4:7001 fleet: The parameters for fleet are used to generate environment variables for the fleet service. The fleet service manages the running of containers on clusters. Some of the important parameters of the fleet configuration are: etcd_servers: This provides the list of ULRs through which etcd services can be reached. The URLs configured should be one of the listen-client-urls for etcd services. public_ip: The IP address that should be published with the local machine’s state. The following is a sample fleet configuration in the cloud-config file: #cloud-config   fleet:     etcd_servers: http:// $public_ipv4:2379,http:// $public_ipv4:4001 public-ip: $public_ipv4  flannel: The parameters for flannel are used to generate environment variables for the flannel service. The flannel service provides communication between containers. locksmith: The parameters for locksmith are used to generate environment variables for the locksmith service. The locksmith service provides reboot management of clusters. update: These parameters manipulate settings related to how CoreOS instances are updated. Units: These parameters specify the set of systemd units that need to be started after boot-up. Some of the important parameters of unit configuration are: name: This specifies the name of the service. 
command: This parameter specifies the command to execute on the unit: start, stop, reload, restart, try-restart, reload-or-restart, reload-or-try-restart. enable: This flag (true/false) specifies if the Install section of the unit file has to be ignored or not. drop-ins: This contains a list of the unit’s drop-in files. Each unit information set contains name, which specifies the unit’s drop-in files, and content, which is plain text representing the unit’s drop-in file. The following is a sample unit configuration in the cloud-config file. #cloud-config   units:     - name: etcd2.service       command: start     - name: fleet.service       command: start     - name: docker-tcp.socket       command: start       enable: true       content: |         [Unit]         Description=Docker Socket for the API           [Socket]         ListenStream=2375         Service=docker.service         BindIPv6Only=both           [Install]         WantedBy=sockets.target ssh_authorized_keys: This parameter specifies the public SSH keys that will be authorized for the core user. hostname: This specifies the hostname of the member. users: This specifies the list of users to be created or updated on the member. Each user information contains name, password, homedir, shell, and so on. write_files: This specifies the list of files that are to be created on the member. Each file information contains path, permission, owner, content, and so on. manage_etc_hosts: This specifies the content of the /etc/hosts file for local name resolution. Currently, only localhost is supported. The config.rb configuration file This file contains information to configure the CoreOS cluster. This file provides the configuration value for the parameters used by Vagrantfile. Vagrantfile accesses the configuration by including the config.rb file. The following are the parameters: $num_instances: This parameter specifies the number of nodes in the cluster $shared_folders: This parameter specifies the list of shared folder paths on the host machine along with the respective path on the member $forwarded_ports: This specifies the port forwarding from the member to the host machine $vm_gui: This flag specifies if GUI is to be set up for the member $vm_memory: This parameter specifies the memory for the member in MBs $vm_cpus: This specifies the number of CPUs to be allocated for the member $instance_name_prefix: This parameter specifies the prefix to be used for the member name $update_channel: This parameter specifies the update channel (alpha, beta, and so on) for CoreOS The following is a sample config.rb file: $num_instances=1 $new_discovery_url="https://discovery.etcd.io/new?size=#{$num_instances}" # To automatically replace the discovery token on 'vagrant up', uncomment # the lines below: # #if File.exists?('user-data') && ARGV[0].eql?('up') #  require 'open-uri' #  require 'yaml' # #  token = open($new_discovery_url).read # #  data = YAML.load(IO.readlines('user-data')[1..-1].join) #  if data['coreos'].key? 'etcd' #    data['coreos']['etcd']['discovery'] = token #  end #  if data['coreos'].key? 'etcd2' #    data['coreos']['etcd2']['discovery'] = token #  end # #  # Fix for YAML.load() converting reboot-strategy from 'off' to      false` #  if data['coreos']['update'].key? 
'reboot-strategy' #     if data['coreos']['update']['reboot-strategy'] == false #          data['coreos']['update']['reboot-strategy'] = 'off' #       end #  end # #  yaml = YAML.dump(data) #  File.open('user-data', 'w') { |file| file.write("#cloud-    confignn#{yaml}") } #end $instance_name_prefix="coreOS-learn" $image_version = "current" $update_channel='alpha' $vm_gui = false $vm_memory = 1024 $vm_cpus = 1 $shared_folders = {} $forwarded_ports = {} Starting a CoreOS VM using Vagrant Once the config.rb and user-config files are updated with the actual configuration parameter, execute the command vagrant up in the directory where configuration files are present to start the CoreOS VM image. Once the vagrant up command is successfully executed, the CoreOS in the VM environment is ready: vagrant up Bringing machine 'core-01' up with 'virtualbox' provider... ==> core-01: Checking if box 'coreos-alpha' is up to date... ==> core-01: Clearing any previously set forwarded ports... ==> core-01: Clearing any previously set network interfaces... ==> core-01: Preparing network interfaces based on configuration... core-01: Adapter 1: nat core-01: Adapter 2: hostonly ==> core-01: Forwarding ports... core-01: 22 => 2222 (adapter 1) ==> core-01: Running 'pre-boot' VM customizations... ==> core-01: Booting VM... ==> core-01: Waiting for machine to boot. This may take a few minutes...     core-01: SSH address: 127.0.0.1:2222 core-01: SSH username: core core-01: SSH auth method: private key       core-01: Warning: Connection timeout. Retrying... ==> core-01: Machine booted and ready! ==> core-01: Setting hostname... ==> core-01: Configuring and enabling network interfaces... ==> core-01: Machine already provisioned. Run `vagrant provision` or              use the `--provision` ==> core-01: flag to force provisioning. Provisioners marked to run              always will still run. vagrant status Current machine states:   core-01                   running (virtualbox) The VM is running. To stop this VM, you can run vagrant halt to shut it down forcefully, or you can run vagrant suspend to simply suspend the virtual machine. In either case, to restart it again, simply run vagrant up. Setting up CoreOS on VMware vSphere VMware vSphere is a server virtualization platform that uses VMware’s ESX/ESXi hypervisor. VMware VSphere provides complete platform, toolsets, and virtualization infrastructure to provide and manage virtual machines in bare metal. VMware vSphere consists of VMware vCenter Server and VMware vSphere Client. VMware vCenter Server manages the virtual as well as the physical resources. VMware vSphere Client provides a GUI to install and manage virtual machines in bare metal. Installing VMware vSphere Client Download the latest version of VMware vSphere Client installation as per the host operating system and architecture from http://vsphereclient.vmware.com/vsphereclient/1/9/9/3/0/7/2/VMware-viclient-all-5.5.0-1993072.exe. After the download is complete, start the installation. During installation, continue with the default options. Once the installation is complete, open the VMware vSphere Client application. This opens a new GUI. In the IP address / Name field, enter the IP address/hostname to directly manage a single host. Enter the IP address/hostname of vCenter Server to manage multiple hosts. In the User name and Password field, enter the username and password. 
Download the latest version of the CoreOS image from http://stable.release.core-os.net/amd64-usr/current/coreos_production_vmware_ova.ova. Once the download is complete, the next step is to create the VM image using the downloaded ova file. The steps to create the VM image are as follows: Open the VMware vSphere Client application. Enter the IP address, username, and password as mentioned earlier. Click on the File menu. Click on Deploy OVF Template. This opens a new wizard. Specify the location of the ova file that was downloaded earlier. Click on Next. Specify the name of the VM and the inventory location in the Name and Location tab. Specify the host/server where this VM is to be deployed in the Host/Cluster tab. Specify the location where the VM image should be stored in the Storage tab. Specify the disk format in the Disk Format tab. Click on Next. It takes a while to deploy the VM image. Once the VM image is deployed on the VMware server, we need to start the CoreOS VM with the appropriate cloud-config file containing the required configuration properties. In VMware vSphere, the cloud-config file is supplied by attaching a config-drive, that is, a CD-ROM or a new drive whose filesystem is labeled config-2. The config-drive is an iso file; the following are the commands to create it on a Linux-based operating system: Create a folder, say /tmp/new-drive/openstack/latest, as follows: mkdir -p /tmp/new-drive/openstack/latest Copy the user_data file, which is the cloud-config file, into the folder: cp user_data /tmp/new-drive/openstack/latest/user_data Create the iso file using the command mkisofs as follows: mkisofs -R -V config-2 -o configdrive.iso /tmp/new-drive Once the config-drive file is created, perform the following steps to attach it to the VM: Transfer the iso image to the machine where the VMware vSphere Client program is running. Open VMware vSphere Client. Click on the CoreOS VM and go to the Summary tab of the VM. Right-click on the Datastore section and click on Browse Datastore. This will open a new window called Datastore Browser. Select the folder named iso. Click on the Upload file to Datastore icon. Select the iso file on the local machine and upload it to the datastore. The next step is to attach the iso file as the cloud-config source for the VM. Perform the following steps: Go to the CoreOS VM and right-click. Click on Properties. Select CD/DVD drive 1. On the right-hand side, under Device Status, select Connected as well as Connect at power on. Click on Datastore ISO File and select the uploaded iso file from the datastore. Once the iso file is uploaded and attached to the VM, start the VM. The CoreOS VM in the VMware environment is ready. Summary In this article, we set up and ran CoreOS on a single machine using Vagrant and VirtualBox, and also on VMware vSphere. Resources for Article: Further resources on this subject: Getting Started with etcd [article] CoreOS Networking and Flannel Internals [article] Deploying a Play application on CoreOS and Docker [article]

Creating a Coin Material

Packt
10 Mar 2016
7 min read
In this article by Alan Thorn, the author of Unity 5.x By Example, the coin object, as a concept, represents a basic or fundamental unit in our game logic because the player character should be actively searching the level looking for coins to collect before a timer runs out. This means that the coin is more than mere appearance; its purpose in the game is not simply eye candy, but is functional. It makes an immense difference to the game outcome whether the coin is collected by the player or not. Therefore, the coin object, as it stands, is lacking in two important respects. Firstly, it looks dull and grey—it doesn't really stand out and grab the player's attention. Secondly, the coin cannot actually be collected yet. Certainly, the player can walk into the coin, but nothing appropriate happens in response. Figure 2.1: The coin object so far The completed CollectionGame project, as discussed in this article and the next, can be found in the book companion files in the Chapter02/CollectionGame folder. (For more resources related to this topic, see here.) In this section, we'll focus on improving the coin appearance using a material. A material defines an algorithm (or instruction set) specifying how the coin should be rendered. A material doesn't just say what the coin should look like in terms of color; it defines how shiny or smooth a surface is, as opposed to rough and diffuse. This is important to recognize and is why a texture and material refer to different things. A texture is simply an image file loaded in memory, which can be wrapped around a 3D object via its UV mapping. In contrast, a material defines how one or more textures can be combined together and applied to an object to shape its appearance. To create a new material asset in Unity, right-click on an empty area in the Project panel, and from the context menu, choose Create | Material. See Figure 2.2. You can also choose Assets | Create | Material from the application menu. Figure 2.2: Creating a material A material is sometimes called a Shader. If needed, you can create custom materials using a Shader Language or you can use a Unity add-on, such as Shader Forge. After creating a new material, assign it an appropriate name from the Project panel. As I'm aiming for a gold look, I'll name the material mat_GoldCoin. Prefixing the asset name with mat helps me know, just from the asset name, that it's a material asset. Simply type a new name in the text edit field to name the material. You can also click on the material name twice to edit the name at any time later. See Figure 2.3: Figure 2.3: Naming a material asset Next, select the material asset in the Project panel, if it's not already selected, and its properties display immediately in the object Inspector. There are lots of properties listed! In addition, a material preview displays at the bottom of the object Inspector, showing you how the material would look, based on its current settings, if it were applied to a 3D object, such as a sphere. As you change material settings from the Inspector, the preview panel updates automatically to reflect your changes, offering instant feedback on how the material would look. See the following screenshot: Figure 2.4: Material properties are changed from the Object Inspector Let's now create a gold material for the coin. When creating any material, the first setting to choose is the Shader type because this setting affects all other parameters available to you. 
The Shader type determines which algorithm will be used to shade your object. There are many different choices, but most material types can be approximated using either Standard or Standard (Specular setup). For the gold coin, we can leave the Shader as Standard. See the following screenshot: Figure 2.5: Setting the material Shader type Right now, the preview panel displays the material as a dull grey, which is far from what we need. To define a gold color, we must specify the Albedo. To do this, click on the Albedo color slot to display a Color picker, and from the Color picker dialog, select a gold color. The material preview updates in response to reflect the changes. Refer to the following screenshot: Figure 2.6: Selecting a gold color for the Albedo channel The coin material is looking better than it did, but it's still supposed to represent a metallic surface, which tends to be shiny and reflective. To add this quality to our material, click and drag the Metallic slider in the object Inspector to the right-hand side, setting its value to 1. This indicates that the material represents a fully metal surface as opposed to a diffuse surface such as cloth or hair. Again, the preview panel will update to reflect the change. See Figure 2.7: Figure 2.7: Creating a metallic material We now have a gold material created, and it's looking good in the preview panel. If needed, you can change the kind of object used for a preview. By default, Unity assigns the created material to a sphere, but other primitive objects are allowed, including cubes, cylinders, and torus. This helps you preview materials under different conditions. You can change objects by clicking on the geometry button directly above the preview panel to cycle through them. See Figure 2.8: Figure 2.8: Previewing a material on an object When your material is ready, you can assign it directly to meshes in your scene just by dragging and dropping. Let's assign the coin material to the coin. Click and drag the material from the Project panel to the coin object in the scene. On dropping the material, the coin will change appearance. See Figure 2.9: Figure 2.9: Assigning the material to the coin You can confirm that material assignment occurred successfully and can even identify which material was assigned by selecting the coin object in the scene and viewing its Mesh Renderer component from the object Inspector. The Mesh Renderer component is responsible for making sure that a mesh object is actually visible in the scene when the camera is looking. The Mesh Renderer component contains a Materials field. This lists all materials currently assigned to the object. By clicking on the material name from the Materials field, Unity automatically selects the material in the Project panel, making it quick and simple to locate materials. See Figure 2.10, The Mesh Renderer component lists all materials assigned to an object: Mesh objects may have multiple materials with different materials assigned to different faces. For best in-game performance, use as few unique materials on an object as necessary. Make the extra effort to share materials across multiple objects, if possible. Doing so can significantly enhance the performance of your game. For more information on optimizing rendering performance, see the online documentation at http://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html. Figure 2.10: The Mesh Renderer component lists all materials assigned to an object That's it! 
You now have a complete and functional gold material for the collectible coin. It's looking good. However, we're still not finished with the coin. The coin looks right, but it doesn't behave right. Specifically, it doesn't disappear when touched, and we don't yet keep track of how many coins the player has collected overall. To address this, then, we'll need to script. Summary Excellent work! In this article, you've completed the coin collection game as well as your first game in Unity. Resources for Article: Further resources on this subject: Animation features in Unity 5 [article] Saying Hello to Unity and Android [article] Learning NGUI for Unity [article]

VM, It Is Not What You Think!

Packt
10 Mar 2016
10 min read
In this article by Iwan 'e1' Rahabok, the author of the book VMware Performance and Capacity Management, Second Edition, we will look at why a seemingly simple technology, a virtualized x86 machine, has huge ramifications for the IT industry. In fact, it is turning a lot of things upside down and breaking down silos that have existed for decades in large IT organizations. We will cover the following topics: Why virtualization is not what we think it is Virtualization versus partitioning A comparison between a physical server and a virtual machine (For more resources related to this topic, see here.) Our journey into the virtual world A virtual machine, or simply, VM - who doesn't know what it is? Even a business user who has never seen one knows what it is. It is just a physical server, virtualized. Nothing more. Wise men say that small leaks sink the ship. This is a good way to explain why IT departments that manage physical servers well struggle when the same servers are virtualized. We can also use the Pareto principle (80/20 rule). 80 percent of a VM is identical to a physical server. But it's the 20 percent of difference that hits you. We will highlight some of this 20 percent portion, focusing on areas that impact data center management. The change caused by virtualization is much larger than the changes brought about by previous technologies. In the past two or more decades, we transitioned from mainframes to the client/server-based model and then to the web-based model. These are commonly agreed upon as the main evolutions in IT architecture. However, all of these are just technological changes. They changed the architecture, yes, but they did not change the operation in a fundamental way. Both the client-server and web shifts did not talk about the journey. There was no journey to the client-server based model. However, with virtualization, we talk about the journey. It is a journey because the changes are massive and involve a lot of people. In 2007, Gartner correctly predicted the impact of virtualization (http://www.gartner.com/newsroom/id/505040). More than 8 years later, we are still in the midst of the journey. Proving how pervasive the change is, here is the summary on the article from Gartner: Notice how Gartner talks about a change in culture. Virtualization has a cultural impact too. In fact, if your virtualization journey is not fast enough, look at your organization's structure and culture. Have you broken the silos? Do you empower your people to take risks and do things that have never been done before? Are you willing to flatten the organizational chart? The silos that have served you well are likely your number one barrier to a hybrid cloud. So why exactly is virtualization causing such a fundamental shift? To understand this, we need to go back to the basics, which is exactly what virtualization is. It's pretty common that chief information officers (CIOs) have a misconception about what it is. Take a look at the following comments. Have you seen them in your organization? VM is just a virtualized physical machine. Even VMware says that the guest OS is not aware it's virtualized and that it does not run differently. It is still about monitoring CPU, RAM, disk, network, and other resources. No difference. It is a technological change. Our management process does not have to change. All of these VMs must still feed into our main enterprise IT management system. This is how we have run our business for decades, and it works. 
If only life were that simple, we would all be 100-percent virtualized and have no headaches! Virtualization has been around for years, and yet, most organizations have not mastered it. The proof of mastering it is when you have completed the journey and have reached the highest level of the virtualization maturity model. Not all virtualizations are equal There are plenty of misconceptions about the topic of virtualization, especially among IT folks who are not familiar with virtualization. CIOs who have not felt the strategic impact of virtualization (be it a good or bad experience) tend to carry these misconceptions. Although virtualization looks similar to a physical system from the outside, it is completely re-architected under the hood. So, let's take a look at the first misconception: what exactly is virtualization? Because it is an industry trend, virtualization is often generalized to include other technologies that are not virtualized. This is a typical strategy by IT vendors who have similar technologies. A popular technology often branded under virtualization is hardware partitioning; since it is parked under the umbrella of virtualization, both should be managed in the same way. Since both are actually different, customers who try to manage both with a single piece of management software struggle to do well. Partitioning and virtualization are two different architectures in computer engineering, resulting in there being major differences between their functionalities. They are shown in the following screenshot: Virtualization versus partitioning With partitioning, there is no hypervisor that virtualizes the underlying hardware. There is no software layer separating the VM and the physical motherboard. There is, in fact, no VM. This is why some technical manuals for partitioning technology do not even use the term VM. The manuals use the term domain, partition, or container instead. There are two variants of partitioning technology, hardware-level and OS-level partitioning, which are covered in the following bullet points: In hardware-level partitioning, each partition runs directly on the hardware. It is not virtualized. This is why it is more scalable and has less of a performance hit. Because it is not virtualized, it has to have an awareness of the underlying hardware. As a result, it is not fully portable. You cannot move the partition from one hardware model to another. The hardware has to be built for a purpose to support that specific version of the partition. The partitioned OS still needs all the hardware drivers and will not work on other hardware if the compatibility matrix does not match. As a result, even the version of the OS matters, as it is just like a physical server. In OS-level partitioning, there is a parent OS that runs directly on the server motherboard. This OS then creates an "OS partition", where other OSes can run. We use double quotes as it is not exactly the full OS that runs inside that partition. The OS has to be modified and qualified to be able to run as a zone or container. Because of this, application compatibility is affected. This is different in a VM, where there is no application compatibility issue as the hypervisor is transparent to the guest OS. Hardware partitioning We covered the difference between virtualization and partitioning from an engineering point of view. However, does it translate into different data center architectures and operations? 
We will focus on hardware partitioning since there are fundamental differences between hardware partitioning and software partitioning. The use case for both is also different. Software partitioning is typically used in native cloud applications. With that, let's do a comparison between hardware partitioning and virtualization. We will start with availability. With virtualization, all VMs are protected by vSphere High Availability (vSphere HA), which provides 100 percent protection and that too without VM awareness. Nothing needs to be done at the VM layer. No shared or quorum disk and no heartbeat-network VM is required to protect a VM with basic HA. With hardware partitioning, the protection has to be configured manually, one by one for each logical partition (LPAR) or logical domain (LDOM). The underlying platform does not provide that. With virtualization, you can even go beyond five nines (99.999 percent) and move to 100 percent with vSphere Fault Tolerance. This is not possible in the partitioning approach as there is no hypervisor that replays CPU instructions. Also, because it is virtualized and transparent to the VM, you can turn the Fault Tolerance capability on and off on demand. Fault Tolerance is completely defined in the software. Another area of difference between partitioning and virtualization is disaster recovery (DR). With partitioning technology, the DR site requires another instance to protect the production instance. It is a different instance, with its own OS image, hostname, and IP address. Yes, we can perform a Storage Area Network (SAN) boot, but that means another Logical Unit Number (LUN) is required to manage, zone, replicate, and so on. Disaster recovery is not scalable to thousands of servers. To make it scalable, it has to be simpler. Compared to partitioning, virtualization takes a different approach. The entire VM fits inside a folder; it becomes like a document and we migrate the entire folder as if the folder is one object. This is what vSphere Replication or Site Recovery Manager do. They perform a replication per VM; there is no need to configure a SAN boot. The entire DR exercise, which can cover thousands of virtual servers, is completely automated and has audit logs automatically generated. Many large enterprises have automated their DR with virtualization. There is probably no company that has automated DR for their entire LPAR, LDOM, or container. In the previous paragraph, we're not implying LUN-based or hardware-based replication as inferior solutions. We're merely driving the point that virtualization enables you to do things differently. We're also not saying that hardware partitioning is an inferior technology. Every technology has its advantages and disadvantages and addresses different use cases. Before joining VMware, the author was a Sun Microsystems sales engineer for five years, so he is aware of the benefits of UNIX partitioning. This article is merely trying to dispel the misunderstanding that hardware partitioning equals virtualization. OS partitioning We've covered the differences between hardware partitioning and virtualization. Let's switch gear to software partitioning. In 2016, the adoption of Linux containers will continue its rapid rise. You can actually use both containers and virtualization, and they complement each other in some use cases. There are two main approaches to deploying containers: Run them directly on bare metal Run them inside a virtual machine As both technologies evolve, the gap gets wider. 
As a result, managing a software partition is different from managing a VM. Securing a container is different to securing a VM. Be careful when opting for a management solution that claims to manage both. You will probably end up with the most common denominator. This is one reason why VMware is working on vSphere Integrated Containers and the Photon platform. Now that's a separate topic by itself! Summary We hope you enjoyed the comparison and found it useful. We covered, to a great extent, the impact caused by virtualization and the changes it introduces. We started by clarifying that virtualization is a different technology compared to partitioning. We then explained that once a physical server is converted to a virtual machine, it takes on a different form and has radically different properties. Resources for Article: Further resources on this subject: Deploying New Hosts with vCenter [article] VMware vCenter Operations Manager Essentials - Introduction to vCenter Operations Manager [article] VMware vRealize Operations Performance and Capacity Management [article]

Python Data Structures

Packt
09 Mar 2016
23 min read
In this article written by Dusty Phillips, author of the book Python 3 Object-oriented Programming - Second Edition we'll be discussing the object-oriented features of data structures, when they should be used instead of a regular class, and when they should not be used. In particular, we'll be covering: Tuples and named tuples Dictionaries (For more resources related to this topic, see here.) Empty objects Let's start with the most basic Python built-in, one that we've seen many times already, the one that we've extended in every class we have created: the object. Technically, we can instantiate an object without writing a subclass: >>> o = object() >>> o.x = 5 Traceback (most recent call last):   File "<stdin>", line 1, in <module> AttributeError: 'object' object has no attribute 'x' Unfortunately, as you can see, it's not possible to set any attributes on an object that was instantiated directly. This isn't because the Python developers wanted to force us to write our own classes, or anything so sinister. They did this to save memory; a lot of memory. When Python allows an object to have arbitrary attributes, it takes a certain amount of system memory to keep track of what attributes each object has, for storing both the attribute name and its value. Even if no attributes are stored, memory is allocated for potential new attributes. Given the dozens, hundreds, or thousands of objects (every class extends object) in a typical Python program; this small amount of memory would quickly become a large amount of memory. So, Python disables arbitrary properties on object, and several other built-ins, by default. It is possible to restrict arbitrary properties on our own classes using slots. You now have a search term if you are looking for more information. In normal use, there isn't much benefit to using slots, but if you're writing an object that will be duplicated thousands of times throughout the system, they can help save memory, just as they do for object. It is, however, trivial to create an empty object class of our own; we saw it in our earliest example: class MyObject:     pass And, as we've already seen, it's possible to set attributes on such classes: >>> m = MyObject() >>> m.x = "hello" >>> m.x 'hello' If we wanted to group properties together, we could store them in an empty object like this. But we are usually better off using other built-ins designed for storing data. It has been stressed that classes and objects should only be used when you want to specify both data and behaviors. The main reason to write an empty class is to quickly block something out, knowing we'll come back later to add behavior. It is much easier to adapt behaviors to a class than it is to replace a data structure with an object and change all references to it. Therefore, it is important to decide from the outset if the data is just data, or if it is an object in disguise. Once that design decision is made, the rest of the design naturally falls into place. Tuples and named tuples Tuples are objects that can store a specific number of other objects in order. They are immutable, so we can't add, remove, or replace objects on the fly. This may seem like a massive restriction, but the truth is, if you need to modify a tuple, you're using the wrong data type (usually a list would be more suitable). The primary benefit of tuples' immutability is that we can use them as keys in dictionaries, and in other locations where an object requires a hash value. 
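To see that hashability in action, here is a short interpreter session in the spirit of the other examples in this article (the prices and dates are made up): the tuple works as a dictionary key, while a list in the same position raises TypeError because it is mutable and therefore unhashable.

>>> stock_and_day = ("FB", "2014-10-31")
>>> daily_high = {stock_and_day: 75.03}
>>> daily_high[("FB", "2014-10-31")]
75.03
>>> daily_high[["FB", "2014-10-31"]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'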
Tuples are used to store data; behavior cannot be stored in a tuple. If we require behavior to manipulate a tuple, we have to pass the tuple into a function (or method on another object) that performs the action. Tuples should generally store values that are somehow different from each other. For example, we would not put three stock symbols in a tuple, but we might create a tuple of stock symbol, current price, high, and low for the day. The primary purpose of a tuple is to aggregate different pieces of data together into one container. Thus, a tuple can be the easiest tool to replace the "object with no data" idiom.

We can create a tuple by separating the values with a comma. Usually, tuples are wrapped in parentheses to make them easy to read and to separate them from other parts of an expression, but this is not always mandatory. The following two assignments are identical (they record a stock, the current price, the high, and the low for a rather profitable company):

>>> stock = "FB", 75.00, 75.03, 74.90
>>> stock2 = ("FB", 75.00, 75.03, 74.90)

If we're grouping a tuple inside of some other object, such as a function call, list comprehension, or generator, the parentheses are required. Otherwise, it would be impossible for the interpreter to know whether it is a tuple or the next function parameter. For example, the following function accepts a tuple and a date, and returns a tuple of the date and the middle value between the stock's high and low value:

import datetime

def middle(stock, date):
    symbol, current, high, low = stock
    return (((high + low) / 2), date)

mid_value, date = middle(("FB", 75.00, 75.03, 74.90),
                         datetime.date(2014, 10, 31))

The tuple is created directly inside the function call by separating the values with commas and enclosing the entire tuple in parentheses. This tuple is then followed by a comma to separate it from the second argument.

This example also illustrates tuple unpacking. The first line inside the function unpacks the stock parameter into four different variables. The tuple has to be exactly the same length as the number of variables, or it will raise an exception. We can also see an example of tuple unpacking on the last line, where the tuple returned inside the function is unpacked into two values, mid_value and date. Granted, this is a strange thing to do, since we supplied the date to the function in the first place, but it gave us a chance to see unpacking at work.

Unpacking is a very useful feature in Python. We can group variables together to make storing and passing them around simpler, but the moment we need to access all of them, we can unpack them into separate variables. Of course, sometimes we only need access to one of the variables in the tuple. We can use the same syntax that we use for other sequence types (lists and strings, for example) to access an individual value:

>>> stock = "FB", 75.00, 75.03, 74.90
>>> high = stock[2]
>>> high
75.03

We can even use slice notation to extract larger pieces of tuples:

>>> stock[1:3]
(75.0, 75.03)

These examples, while illustrating how flexible tuples can be, also demonstrate one of their major disadvantages: readability. How does someone reading this code know what is in the second position of a specific tuple? They can guess, from the name of the variable we assigned it to, that it is a high of some sort, but if we had just accessed the tuple value in a calculation without assigning it, there would be no such indication.
They would have to paw through the code to find where the tuple was declared before they could discover what it does. Accessing tuple members directly is fine in some circumstances, but don't make a habit of it. Such so-called "magic numbers" (numbers that seem to come out of thin air with no apparent meaning within the code) are the source of many coding errors and lead to hours of frustrated debugging. Try to use tuples only when you know that all the values are going to be useful at once and it's normally going to be unpacked when it is accessed. If you have to access a member directly or using a slice and the purpose of that value is not immediately obvious, at least include a comment explaining where it came from.

Named tuples

So, what do we do when we want to group values together, but know we're frequently going to need to access them individually? Well, we could use an empty object, as discussed in the previous section (but that is rarely useful unless we anticipate adding behavior later), or we could use a dictionary (most useful if we don't know exactly how many or which specific data will be stored), as we'll cover in the next section. If, however, we do not need to add behavior to the object, and we know in advance what attributes we need to store, we can use a named tuple. Named tuples are tuples with attitude. They are a great way to group read-only data together.

Constructing a named tuple takes a bit more work than a normal tuple. First, we have to import namedtuple, as it is not in the namespace by default. Then, we describe the named tuple by giving it a name and outlining its attributes. This returns a class-like object that we can instantiate with the required values as many times as we want:

from collections import namedtuple

Stock = namedtuple("Stock", "symbol current high low")
stock = Stock("FB", 75.00, high=75.03, low=74.90)

The namedtuple constructor accepts two arguments. The first is an identifier for the named tuple. The second is a string of space-separated attributes that the named tuple can have. The first attribute should be listed, followed by a space (or comma if you prefer), then the second attribute, then another space, and so on. The result is an object that can be called just like a normal class to instantiate other objects. The constructor must be given exactly the right number of values, which can be passed as positional or keyword arguments. As with normal objects, we can create as many instances of this "class" as we like, with different values for each.

The resulting namedtuple can then be packed, unpacked, and otherwise treated like a normal tuple, but we can also access individual attributes on it as if it were an object:

>>> stock.high
75.03
>>> symbol, current, high, low = stock
>>> current
75.0

Remember that creating named tuples is a two-step process. First, use collections.namedtuple to create a class, and then construct instances of that class.

Named tuples are perfect for many "data only" representations, but they are not ideal for all situations. Like tuples and strings, named tuples are immutable, so we cannot modify an attribute once it has been set. For example, the current value of my company's stock has gone down since we started this discussion, but we can't set the new value:

>>> stock.current = 74.98
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

If we need to be able to change stored data, a dictionary may be what we need instead.
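One small hedge before moving on: if you only need an occasional updated copy rather than true mutability, named tuples provide a _replace() method that builds a brand new instance with selected fields changed. A minimal sketch, reusing the Stock named tuple defined above:

>>> stock = Stock("FB", 75.00, high=75.03, low=74.90)
>>> updated = stock._replace(current=74.98)   # returns a new Stock; the original is untouched
>>> updated.current
74.98
>>> stock.current
75.0

This keeps the read-only guarantees of the tuple while still letting you record a changed value somewhere else.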
Dictionaries

Dictionaries are incredibly useful containers that allow us to map objects directly to other objects. An empty object with attributes set on it is a sort of dictionary; the names of the attributes map to the attribute values. This is actually closer to the truth than it sounds; internally, objects normally represent attributes as a dictionary, where the values are properties or methods on the objects (see the __dict__ attribute if you don't believe me). Even the attributes on a module are stored, internally, in a dictionary.

Dictionaries are extremely efficient at looking up a value, given a specific key object that maps to that value. They should always be used when you want to find one object based on some other object. The object that is being stored is called the value; the object that is being used as an index is called the key. We've already seen dictionary syntax in some of our previous examples.

Dictionaries can be created either using the dict() constructor or using the {} syntax shortcut. In practice, the latter format is almost always used. We can prepopulate a dictionary by separating each key from its value using a colon, and separating the key/value pairs using a comma. For example, in a stock application, we would most often want to look up prices by the stock symbol. We can create a dictionary that uses stock symbols as keys, and tuples of current, high, and low as values like this:

stocks = {"GOOG": (613.30, 625.86, 610.50),
          "MSFT": (30.25, 30.70, 30.19)}

As we've seen in previous examples, we can then look up values in the dictionary by requesting a key inside square brackets. If the key is not in the dictionary, it will raise an exception:

>>> stocks["GOOG"]
(613.3, 625.86, 610.5)
>>> stocks["RIM"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'RIM'

We can, of course, catch the KeyError and handle it. But we have other options. Remember, dictionaries are objects, even if their primary purpose is to hold other objects. As such, they have several behaviors associated with them. One of the most useful of these methods is the get method; it accepts a key as the first parameter and an optional default value if the key doesn't exist:

>>> print(stocks.get("RIM"))
None
>>> stocks.get("RIM", "NOT FOUND")
'NOT FOUND'

For even more control, we can use the setdefault method. If the key is in the dictionary, this method behaves just like get; it returns the value for that key. Otherwise, if the key is not in the dictionary, it will not only return the default value we supply in the method call (just like get does), it will also set the key to that same value. Another way to think of it is that setdefault sets a value in the dictionary only if that value has not previously been set. Then it returns the value in the dictionary, either the one that was already there, or the newly provided default value:

>>> stocks.setdefault("GOOG", "INVALID")
(613.3, 625.86, 610.5)
>>> stocks.setdefault("BBRY", (10.50, 10.62, 10.39))
(10.5, 10.62, 10.39)
>>> stocks["BBRY"]
(10.5, 10.62, 10.39)

The GOOG stock was already in the dictionary, so when we tried to setdefault it to an invalid value, it just returned the value already in the dictionary. BBRY was not in the dictionary, so setdefault returned the default value and set the new value in the dictionary for us. We then check that the new stock is, indeed, in the dictionary.

Three other very useful dictionary methods are keys(), values(), and items().
The first two return an iterator over all the keys and all the values in the dictionary. We can use these like lists or in for loops if we want to process all the keys or values. The items() method is probably the most useful; it returns an iterator over tuples of (key, value) pairs for every item in the dictionary. This works great with tuple unpacking in a for loop to loop over associated keys and values. This example does just that to print each stock in the dictionary with its current value:

>>> for stock, values in stocks.items():
...     print("{} last value is {}".format(stock, values[0]))
...
GOOG last value is 613.3
BBRY last value is 10.5
MSFT last value is 30.25

Each key/value tuple is unpacked into two variables named stock and values (we could use any variable names we wanted, but these both seem appropriate) and then printed in a formatted string. Notice that the stocks do not show up in the same order in which they were inserted. Dictionaries, due to the efficient algorithm (known as hashing) that is used to make key lookup so fast, are inherently unsorted. (That was true of the Python versions current when this was written; from Python 3.7 onward, dictionaries do preserve insertion order.)

So, there are numerous ways to retrieve data from a dictionary once it has been instantiated; we can use square brackets as index syntax, the get method, the setdefault method, or iterate over the items method, among others.

Finally, as you likely already know, we can set a value in a dictionary using the same indexing syntax we use to retrieve a value:

>>> stocks["GOOG"] = (597.63, 610.00, 596.28)
>>> stocks['GOOG']
(597.63, 610.0, 596.28)

Google's price is lower today, so I've updated the tuple value in the dictionary. We can use this index syntax to set a value for any key, regardless of whether the key is in the dictionary. If it is in the dictionary, the old value will be replaced with the new one; otherwise, a new key/value pair will be created.

We've been using strings as dictionary keys so far, but we aren't limited to string keys. It is common to use strings as keys, especially when we're storing data in a dictionary to gather it together (instead of using an object with named properties). But we can also use tuples, numbers, or even objects we've defined ourselves as dictionary keys. We can even use different types of keys in a single dictionary:

random_keys = {}
random_keys["astring"] = "somestring"
random_keys[5] = "aninteger"
random_keys[25.2] = "floats work too"
random_keys[("abc", 123)] = "so do tuples"

class AnObject:
    def __init__(self, avalue):
        self.avalue = avalue

my_object = AnObject(14)
random_keys[my_object] = "We can even store objects"
my_object.avalue = 12

try:
    random_keys[[1,2,3]] = "we can't store lists though"
except:
    print("unable to store list\n")

for key, value in random_keys.items():
    print("{} has value {}".format(key, value))

This code shows several different types of keys we can supply to a dictionary. It also shows one type of object that cannot be used. We've already used lists extensively, and we'll be seeing many more details of them in the next section. Because lists can change at any time (by adding or removing items, for example), they cannot hash to a specific value.

Objects that are hashable basically have a defined algorithm that converts the object into a unique integer value for rapid lookup. This hash is what is actually used to look up values in a dictionary. For example, strings map to integers based on the characters in the string, while tuples combine hashes of the items inside the tuple.
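If you want to see this for yourself, the built-in hash() function makes the idea concrete. Treat the sketch below as illustrative only: the actual integers differ between interpreter runs, because Python 3 randomizes string hashing by default.

>>> hash("FB") == hash("FB")                    # equal strings hash the same within one run
True
>>> hash(("FB", 75.03)) == hash(("FB", 75.03))  # tuples combine the hashes of their items
True
>>> hash(["FB", 75.03])                         # lists are mutable, so they refuse to hash
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'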
Any two objects that are somehow considered equal (like strings with the same characters or tuples with the same values) should have the same hash value, and the hash value for an object should never, ever change. Lists, however, can have their contents changed, which would change their hash value (two lists should only be equal if their contents are the same). Because of this, they can't be used as dictionary keys. For the same reason, dictionaries cannot be used as keys into other dictionaries. In contrast, there are no limits on the types of objects that can be used as dictionary values. We can use a string key that maps to a list value, for example, or we can have a nested dictionary as a value in another dictionary.

Dictionary use cases

Dictionaries are extremely versatile and have numerous uses. There are two major ways that dictionaries can be used. The first is dictionaries where all the keys represent different instances of similar objects; for example, our stock dictionary. This is an indexing system. We use the stock symbol as an index to the values. The values could even have been complicated self-defined objects that made buy and sell decisions or set a stop-loss, rather than our simple tuples.

The second design is dictionaries where each key represents some aspect of a single structure; in this case, we'd probably use a separate dictionary for each object, and they'd all have similar (though often not identical) sets of keys. This latter situation can often also be solved with named tuples. These should typically be used when we know exactly what attributes the data must store, and we know that all pieces of the data must be supplied at once (when the item is constructed). But if we need to create or change dictionary keys over time or we don't know exactly what the keys might be, a dictionary is more suitable.

Using defaultdict

We've seen how to use setdefault to set a default value if a key doesn't exist, but this can get a bit monotonous if we need to set a default value every time we look up a value. For example, if we're writing code that counts the number of times a letter occurs in a given sentence, we could do this:

def letter_frequency(sentence):
    frequencies = {}
    for letter in sentence:
        frequency = frequencies.setdefault(letter, 0)
        frequencies[letter] = frequency + 1
    return frequencies

Every time we access the dictionary, we need to check that it has a value already, and if not, set it to zero. When something like this needs to be done every time an empty key is requested, we can use a different version of the dictionary, called defaultdict:

from collections import defaultdict

def letter_frequency(sentence):
    frequencies = defaultdict(int)
    for letter in sentence:
        frequencies[letter] += 1
    return frequencies

This code looks like it couldn't possibly work. The defaultdict accepts a function in its constructor. Whenever a key is accessed that is not already in the dictionary, it calls that function, with no parameters, to create a default value. In this case, the function it calls is int, which is the constructor for an integer object. Normally, integers are created simply by typing an integer number into our code, and if we do create one using the int constructor, we pass it the item we want to create (for example, to convert a string of digits into an integer). But if we call int without any arguments, it returns, conveniently, the number zero.
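A quick way to convince yourself of this (a throwaway shell sketch, not part of the book's example) is to call a few constructors with no arguments; whatever they return is exactly what defaultdict would store for a missing key:

>>> int()            # the factory used by defaultdict(int)
0
>>> str(), list()    # other common factories return an empty string and an empty list
('', [])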
In this code, if the letter doesn't exist in the defaultdict, the number zero is returned when we access it. Then we add one to this number to indicate we've found an instance of that letter, and the next time we find one, that number will be returned and we can increment the value again.

The defaultdict is useful for creating dictionaries of containers. If we want to create a dictionary of stock prices for the past 30 days, we could use a stock symbol as the key and store the prices in a list; the first time we access the stock price, we would want it to create an empty list. Simply pass list into the defaultdict, and it will be called every time an empty key is accessed. We can do similar things with sets or even empty dictionaries if we want to associate one with a key.

Of course, we can also write our own functions and pass them into the defaultdict. Suppose we want to create a defaultdict where each new element contains a tuple of the number of items inserted into the dictionary at that time and an empty list to hold other things. Nobody knows why we would want to create such an object, but let's have a look:

from collections import defaultdict

num_items = 0

def tuple_counter():
    global num_items
    num_items += 1
    return (num_items, [])

d = defaultdict(tuple_counter)

When we run this code, we can access empty keys and insert into the list all in one statement:

>>> d = defaultdict(tuple_counter)
>>> d['a'][1].append("hello")
>>> d['b'][1].append('world')
>>> d
defaultdict(<function tuple_counter at 0x82f2c6c>, {'a': (1, ['hello']), 'b': (2, ['world'])})

When we print d at the end, we see that the counter really was working. This example, while succinctly demonstrating how to create our own function for defaultdict, is not actually very good code; using a global variable means that if we created four different defaultdict instances that each used tuple_counter, it would count the number of entries in all the dictionaries, rather than having a different count for each one. It would be better to create a class and pass a method on that class to defaultdict.

Counter

You'd think that you couldn't get much simpler than defaultdict(int), but the "I want to count specific instances in an iterable" use case is common enough that the Python developers created a specific class for it. The previous code that counts characters in a string can easily be calculated in a single line:

from collections import Counter

def letter_frequency(sentence):
    return Counter(sentence)

The Counter object behaves like a beefed-up dictionary where the keys are the items being counted and the values are the number of such items. One of the most useful functions is the most_common() method. It returns a list of (key, count) tuples ordered by the count. You can optionally pass an integer argument into most_common() to request only the top most common elements. For example, you could write a simple polling application as follows:

from collections import Counter

responses = [
    "vanilla",
    "chocolate",
    "vanilla",
    "vanilla",
    "caramel",
    "strawberry",
    "vanilla"
]

print(
    "The children voted for {} ice cream".format(
        Counter(responses).most_common(1)[0][0]
    )
)

Presumably, you'd get the responses from a database or by using a complicated vision algorithm to count the kids who raised their hands. Here, we hardcode it so that we can test the most_common method. It returns a list that has only one element (because we requested one element in the parameter).
This element stores the name of the top choice at position zero, hence the double [0][0] at the end of the call. I think they look like a surprised face, don't you? Your computer is probably amazed it can count data so easily. Its ancestor, Hollerith's Tabulating Machine for the 1890 US census, must be so jealous!

Summary

We've covered several built-in data structures and attempted to understand how to choose one for specific applications. Sometimes, the best thing we can do is create a new class of objects, but often, one of the built-ins provides exactly what we need. When it doesn't, we can always use inheritance or composition to adapt them to our use cases. We can even override special methods to completely change the behavior of built-in syntax.

Be sure to pick up the full title, Python 3 Object-oriented Programming - Second Edition, to continue your learning in the world of OOP Python. If you want to go above and beyond, then there's no better way than building on what you've discovered with Mastering Object-oriented Python and Learning Object-Oriented Programming too!

Resources for Article:

Further resources on this subject:

Python LDAP applications - extra LDAP operations and the LDAP URL library [article]
Exception Handling in MySQL for Python [article]
Data Transactions Made Easy with MySQL and Python [article]


An Introduction to Python Lists and Dictionaries

Packt
09 Mar 2016
10 min read
In this article by Jessica Ingrassellino, the author of Python Projects for Kids, you will learn that Python has very efficient ways of storing data, which is one reason why it is popular among many companies that make web applications. You will learn about the two most important ways to store and retrieve data in Python: lists and dictionaries.

(For more resources related to this topic, see here.)

Lists

Lists have many different uses when coding, and many different operations can be performed on lists, thanks to Python. In this article, you will only learn some of the many uses of lists. However, if you wish to learn more about lists, the Python documentation is very detailed and available at https://docs.python.org/3/tutorial/datastructures.html?highlight=lists#more-on-lists.

First, it is important to note that a list is made by assigning it a name and putting the items in the list inside of square brackets []. In your Python shell, type the following three lists, one on each line:

fruit = ['apple', 'banana', 'kiwi', 'dragonfruit']
years = [2012, 2013, 2014, 2015]
students_in_class = [30, 22, 28, 33]

The lists that you just typed each hold a particular kind of data. However, one good feature of lists is that they can mix up data types within the same list. For example, I have made this list that combines strings and integers:

computer_class = ['Cynthia', 78, 42, 'Raj', 98, 24, 35, 'Kadeem', 'Rachel']

Now that we have made the lists, we can get the contents of the list in many ways. In fact, once you create a list, the computer remembers the order of the list, and the order stays constant until it is changed purposefully.

The easiest way for us to see that the order of lists is maintained is to run tests on the lists that we have already made. The first item of a Python list is always counted as 0 (zero). So, for our first test, let's see if asking for the 0 item actually gives us the first item. Using our fruit list, we will type the name of the list inside of the print statement, and then add square brackets [] with the number 0:

print(fruit[0])

Your output will be apple, since apple is the first fruit in the list that we created earlier. So, we have evidence that counting in Python does start with 0.

Now, we can try to print the fourth item in the fruit list. You will notice that we are entering 3 in our print command. This is because the first item started at 0. Type the following code into your Python shell:

print(fruit[3])

What is your outcome? Did you expect dragonfruit to be the answer? If so, good, you are learning to count items in lists. If not, remember that the first item in a list is the 0 item. With practice, you will become better at counting items in Python lists.

For extra practice, work with the other lists that we made earlier, and try printing different items from the list by changing the number in the following line of code:

print(list_name[item_number])

Where the code says list_name, write the name of the list that you want to use. Where the code says item_number, write the number of the item that you want to print. Remember that lists begin counting at 0.

Changing the list – adding and removing information

Even though lists have order, lists can be changed. Items can be added to a list, removed from a list, or changed in a list. Again, there are many ways to interact with lists. We will only discuss a few here, but you can always read the Python documentation for more information. To add an item to our fruit list, for example, we can use a method called list.append().
To use this method, type the name of the list, a dot, the method name append, and then parentheses with the item that you would like to add contained inside. If the item is a string, remember to use single quotes. Type the following code to add an orange to the list of fruits that we have made:

fruit.append('orange')

Then, print the list of fruit to see that orange has been added to the list:

print(fruit)

Now, let's say that we no longer wish for the dragonfruit to appear on our list. We will use a method called list.remove(). To do this, we will type the name of our list, a dot, the method name called remove, and the name of the item that we wish to remove:

fruit.remove('dragonfruit')

Then, we will print the list to see that the dragonfruit has been removed:

print(fruit)

If you have more than one of the same item in the list, list.remove() will only remove the first instance of that item. The other items with the same name need to be removed separately.

Loops and lists

Lists and for loops work very well together. With lists, we can do something called iteration. By itself, the word iteration means to repeat a procedure over and over again. We know that for loops repeat things a limited and specific number of times. In this sample, we have three colors in our list. Make this list in your Python terminal:

colors = ['green', 'yellow', 'red']

Using our list, we may decide that for each color in the list, we want to print a statement that says I see, followed by the color. Using the for loop with the list, we can type the print statement one time and get three statements in return. Type the following for loop into your Python shell:

for color in colors:
    print('I see ' + str(color) + '.')

Once you are done typing the print line and pressing Enter twice, your for loop will start running, and you should see the following statements printed out in your Python shell:

I see green.
I see yellow.
I see red.

As you can imagine, lists and for loops are very powerful when used together. Instead of having to type the line three times with three different pieces of code, we only had to type two lines of code. We used str() to make sure that each color could be combined with the rest of the printed sentence. Our for loop is helpful because those two lines of code would still work if there were 20 colors in our list.

Dictionaries

Dictionaries are another way to organize data. At first glance, a dictionary may look just like a list. However, dictionaries have different jobs, rules, and syntax. Dictionaries have names and use curly braces to store information. For example, if we wanted to make a dictionary called numbers, we would put the dictionary entries inside of curly braces. Here is a simple example:

numbers = {'one': 1, 'two': 2, 'three': 3}

Key/value pairs in dictionaries

A dictionary stores information with things called keys and values. In a dictionary of items, for example, we may have keys that tell us the names of each item and values that tell us how many of each item we have in our inventory. Once we store these items in our dictionary, we can add or remove new items (keys), add new amounts (values), or change the amounts of existing items.

Here is an example of a dictionary that could hold some information for a game. Let's suppose that the hero in our game has some items needed to survive. Here is a dictionary of our hero's items:

items = {'arrows': 200, 'rocks': 25, 'food': 15, 'lives': 2}

Unlike lists, a dictionary uses keys and values to find information.
So, this dictionary has the keys called arrows, rocks, food, and lives. Each of the numbers tells us the amount of each item that our hero has. Dictionaries have different characteristics than lists do. So, we can look up a certain item in our dictionary by its key, using the print function:

print(items['arrows'])

The result of this print command will be 200, as this is the number of arrows our hero has in his inventory.

Changing the dictionary – adding and removing information

Python offers us ways not only to make a dictionary but to also add and remove things from our dictionaries. For example, let's say that in our game, we allow the player to discover a fireball later in the game. To add the item to the dictionary, we will use what is called subscript notation to add a new key and a new value to our dictionary. This means that we will use the name of the dictionary and square brackets to write the name of the item that we wish to add, and finally, we will set the value to how many items we want to put into our dictionary:

items['fireball'] = 10

If we print the entire dictionary of items, you will see that fireball has been added (the exact order of the keys may vary):

print(items)
{'arrows': 200, 'rocks': 25, 'food': 15, 'lives': 2, 'fireball': 10}

We can also change the number of items in our dictionary using the dict.update() method. This method uses the name of the dictionary and the word update. Then, in parentheses (), we use curly braces {} to type the name of the item that we wish to update, a colon (:), and the new number of items we want in the dictionary. Try this in your Python shell:

items.update({'rocks': 10})
print(items)

You will notice, when you print(items), that you now have 10 rocks instead of 25. We have successfully updated our number of items.

To remove something from a dictionary, one must reference the key, or the name of the item, and delete the item. By doing so, the value that goes with the item will also be removed. In Python, this means using del along with the name of the dictionary and the name of the item you wish to remove. Using the items dictionary as our example, let's remove lives, and then use a print statement to test and see if the lives key was removed:

del items['lives']
print(items)

The items dictionary will now contain arrows, rocks, food, and fireball, but no lives. With dictionaries, information is stored and retrieved differently than with lists, but we can still perform the same operations of adding and removing information, as well as making changes to information.

Summary

Lists and dictionaries are two different ways to store and retrieve information in Python. Although the sample data that we saw in this article was small, Python can handle large sets of data and process them at high speeds. Learning to use lists and dictionaries will allow you to solve complicated programming problems in many realms, including gaming, web app development, and data analysis.

Resources for Article:

Further resources on this subject:

Exception Handling in MySQL for Python [article]
Configuring and securing PYTHON LDAP Applications Part 1 [article]
Web scraping with Python (Part 2) [article]