How-To Tutorials


08 May 2015

2 min read

AngularJS Web Application Development Cookbook

Architect performant applications and implement best practices in AngularJS. Packed with easy-to-follow recipes, this practical guide shows you how to unleash the full might of the AngularJS framework. Skip straight to practical solutions and quick, functional answers to your problems without hand-holding or slogging through the basics.

Some highlights include:

Architecting recursive directives
Extensively customizing your search filter
Custom routing attributes
Animating ngRepeat
Animating ngInclude, ngView, and ngIf
Animating ngSwitch
Animating ngClass and class attributes
Animating ngShow and ngHide

The goal of this text is to have you walk away from reading about an AngularJS concept armed with a solid understanding of how it works, insight into the best ways to wield it in real-world applications, and annotated code examples to get you started.

Why you should buy this book

The book is a collection of recipes demonstrating optimal organization, scalable architecture, and best practices for use in small and large-scale production applications. Each recipe contains complete, functioning examples and detailed explanations of how and why they are organized and built that way, as well as alternative design choices for different situations.

The author of this book is a full stack developer at DoorDash (YC S13), where he joined as the first engineer. He led their adoption of AngularJS, and he also focuses on the infrastructural, predictive, and data projects within the company. Matt has a degree in Computer Engineering from the University of Illinois at Urbana-Champaign. He is the author of the video series Learning AngularJS, available through O'Reilly Media. Previously, he worked as an engineer at several educational technology start-ups.

Almost every example in this book has been added to JSFiddle, with the links provided in the book. This allows you to simply visit a URL to test and modify the code with no setup of any kind, on any major browser and on any major operating system.


Packt

08 May 2015

4 min read

Mastering Lumion 3D


Welcome to this treasure house of Lumion 3D! This article will guide you through the intricacies of using Lumion, the next generation graphical tool. It will also present crisp notes from the book, Mastering Lumion 3D.

Why Lumion 3D?

"To suppose that the eye with all its inimitable contrivances for adjusting the focus to different distances, for admitting different amounts of light, and for the correction of spherical and chromatic aberration, could have been formed by natural selection, seems, I confess, absurd in the highest degree..." - Charles Darwin

The eye is indeed one of the most complex structures man is gifted with. The eye beholds the beauty of nature in the most magnificent way. To replicate what the eye sees in nature on the computer screen requires the finest tools man has ever developed. Lumion 3D is one such tool.

Lumion has a strong emphasis on architectural visualization. Hobbyists use it a lot to create architectural CAD art. Professionals use it to create mock-ups of concepts. The result is ultra-realistic visualizations of concepts for buildings, outside areas, furnishings, landmarks, and skyscrapers. The core appeal of the product is that architects and designers do not need computer graphics skills; they only need to learn the tools and workflow in Lumion.

Highlights of Lumion 3D

Indeed, you can breathe life into your architectural designs with Lumion. These are some of the features that stand out:

Ease of use: Adding trees, clouds, people, artistic effects, and materials, and converting your 3DS Max design into an amazing 3D image or 3D flythrough movie is easy. Lumion enables anyone to create movies and images without any prior training. Through Lumion, you can make beautiful SketchUp renderings yourself. You don't need to outsource this work any longer.
Fine graphics: The graphics in Lumion are excellent, which gives it an edge over other contemporary software.
Speed: Generating animation is fast with Lumion. In comparison to traditional 3D rendering programs that take days to process animations, Lumion takes anything from a few minutes to a few hours.

Why Mastering Lumion 3D?

Lumion can be an intuitive tool, but that doesn't mean we can automatically produce a better architectural visualization. Ciro Cardoso, the author, wrote this book because, like you, the first time he picked up Lumion he felt that there was something missing from his projects.

Mastering Lumion 3D covers the process of picking a 3D model, preparing it, and then building layer upon layer of detail, using textures and optimized 3D models. However, it doesn't stop there, because several chapters are dedicated exclusively to explaining how to use Lumion's effects and other special features to take your project to an expert level.

This book is written in a way that will hopefully cover all the questions you may have when taking your first steps with Lumion. On the other hand, if you are an intermediate or advanced user, you can find some unique techniques that will make you look at Lumion from another perspective. The book draws not only on the author's experience, but also on what he learned while working with other great professionals.

What you need for this book

Lumion Version 4 is used for all the examples in this book, but you can follow the explanations using the free version or a previous Lumion version. Although Adobe Photoshop is used in some examples, you can use GIMP as an alternative. Also, ensure that your system has a good graphics card. This will make working with Lumion fun.

Who this book is for

This book is designed for all levels of Lumion users, from beginners to advanced users. You will find useful insights and professional techniques to improve and develop your skills in order to fully control and master Lumion. However, this book doesn't cover the process of transforming 2D information (a CAD plan) into a 3D model. If you are really interested in drawing great architectural designs, this book is for you.


Packt

08 May 2015

3 min read

Learning Selenium Testing Tools with Python


Benjamin Reed

06 May 2015

8 min read

NodeJS: Building a Maintainable Codebase


NodeJS has become the most anticipated web development technology since Ruby on Rails. This article is not an introduction to Node, so a few points should be clear before we begin.

First, you must realize that NodeJS is not a direct competitor to Rails or Django. Instead, Node is a collection of libraries that allow JavaScript to run on the V8 runtime. Node powers many tools, and some of those tools have nothing to do with scaling a web application. For instance, GitHub’s Atom editor is built on top of Node. The real competitors are Node's web application frameworks, like Express. This article can apply to all environments using Node.

Second, Node is designed around an asynchronous ideology, but not all of the operations in Node are asynchronous. Many libraries offer both synchronous and asynchronous options, and a Node developer must work out which operation best fits his or her needs.

Third, you should have a solid understanding of the concept of a callback in Node.

Over the course of two weeks, our team attempted to refactor a Rails app into an Express application. We loved the concepts behind Node, and we truly believed that all we needed was a barebones framework. We transferred our controller logic over to Express routes in a weekend. We were a team of beginners, and I will analyze some of the pitfalls that we came across. Hopefully, this will help you identify strategies to tackle Node with your team.

First, attempt to structure callbacks and avoid anonymous functions. As we added more and more logic, we added more and more callbacks. Everything was beautifully asynchronous, and our code would successfully run. However, we soon found ourselves debugging an anonymous function nested inside of other anonymous functions. In other words, the codebase was incredibly difficult to follow.
Anyone starting out with Node will likely recognize this novice “spaghetti code.” Here’s a simple example of nested callbacks:

    router.put('/:id', function(req, res) {
      console.log("attempt to update user");
      models.User.find({ where: {id: req.param('id')} }).success(function (user) {
        var raw_cell = req.param('cell') ? req.param('cell') : user.cell;
        var raw_email = req.param('email') ? req.param('email') : user.email;
        var raw_username = req.param('username') ? req.param('username') : user.username;
        var raw_digest = req.param('digest') ? req.param('digest') : user.digest;
        user.cell = raw_cell;
        user.email = raw_email;
        user.username = raw_username;
        user.digest = raw_digest;
        user.updated_on = new Date();
        user.save().success(function () {
          res.json(user);
        }).error(function () {
          res.json({"status": "error"});
        });
      })
      .error(function() {
        res.json({"status": "error"});
      })
    });

Notice that there are many success and error callbacks. Locating a specific callback is not difficult if the whitespace is perfect or the developer can count closing brackets back up to the destination. However, this is pretty nasty for any newcomer, and the illegibility will only increase as the application becomes more complex. A developer may get this response:

    {"status": "error"}

Where did this response come from? Did the ORM fail to update the object? Did it fail to find the object in the first place? A developer could add descriptions to the JSON in the chained error callbacks, but there has to be a better way. Let’s extract some of the callbacks into separate methods. Each handler returns a function, so the ORM can invoke it later, when the success or error actually occurs:

    router.put('/:id', function(req, res) {
      var id = req.param('id');
      var query = { where: {id: id} };

      // search for user
      models.User.find(query).success(function (user) {
        // parse req parameters
        var raw_cell = req.param('cell') ? req.param('cell') : user.cell;
        var raw_email = req.param('email') ? req.param('email') : user.email;
        var raw_username = req.param('username') ? req.param('username') : user.username;

        // set user attributes
        user.cell = raw_cell;
        user.email = raw_email;
        user.username = raw_username;
        user.updated_on = new Date();

        // attempt to save user
        user.save()
          .success(SuccessHandler.userSaved(res, user))
          .error(ErrorHandler.userNotSaved(res, id));
      })
      .error(ErrorHandler.userNotFound(res, id))
    });

    // Each handler builds and returns the callback that sends the response.
    var ErrorHandler = {
      userNotFound: function(res, user_id) {
        return function() {
          res.json({"status": "error", "description": "The user with the specified id could not be found.", "user_id": user_id});
        };
      },
      userNotSaved: function(res, user_id) {
        return function() {
          res.json({"status": "error", "description": "The update to the user with the specified id could not be completed.", "user_id": user_id});
        };
      }
    };

    var SuccessHandler = {
      userSaved: function(res, user) {
        return function() {
          res.json(user);
        };
      }
    };

This seemed to help clean up our minimal sample. The route now nests only one anonymous function, and the code is a lot more readable and independent. However, our code is still cluttered by chained success and error callbacks. One could make these handlers global mutable variables, or perhaps we can consider another approach. Futures, also known as promises, are becoming more prominent. Twitter has adopted them in Scala. It is definitely something to consider.

Next, do what makes your team comfortable and productive. At the same time, do not compromise the integrity of the project. There are numerous posts that encourage certain styles over others. There are also extensive posts on the subject of CoffeeScript. If you aren’t aware, CoffeeScript is a language with some added syntactic flavor that compiles to JavaScript. Our team consisted primarily of Ruby developers, and it definitely appealed to us. When we migrated some of the project over to CoffeeScript, we found that our code was a lot shorter and appeared more legible. GitHub uses CoffeeScript for the Atom text editor to this day, and the Rails community has openly embraced it.
The majority of Node module documentation uses JavaScript, so CoffeeScript developers will have to become acquainted with translation. There are some problems with CoffeeScript being ES6 ready, and there are some modules that are clearly not meant to be utilized in CoffeeScript. CoffeeScript is an open source project, and it appears to have a good backbone and a stable community. If your developers are more comfortable with it, utilize it.

When it comes to open source projects, everyone tends to trust them. In the purest form, open source projects are absolutely beautiful. They make the lives of all developers better. Nobody has to reinvent the wheel unless they choose to. Obviously, both Node and CoffeeScript are open source. However, the community is very new, and it is dangerous to assume that any package you find on NPM is stable.

For us, the problem occurred when we searched for an ORM. We truly missed ActiveRecord, and we assumed that other projects would work similarly. We tried several solutions, and none of them behaved the way we wanted. Besides having to express our entire schema in a JavaScript format, we found relations to be a bit of a hack. Settling on one ORM, we ran our server, and our database cleared out. That’s fine in development, but we struggled to find a way to get it into production. We needed more documentation. Also, the module was not designed with CoffeeScript in mind, and we practically needed to revert to JavaScript. In contrast, the Node community has openly embraced some NoSQL databases, such as MongoDB. They are definitely worth considering.

Either way, make sure that your team’s dependencies are very well documented. There should be written documentation for each exposed object, function, and so on.

To sum everything up, this article comes down to two fundamental things learned in any computer science class: write modular code and document everything.
Do your research on Node and find a style that is legible for your team and any newcomers. A NodeJS project can only be maintained if the developers using the framework recognize the importance of the project's future. If your code is messy now, it will only become messier. If you cannot find necessary information in a module’s documentation, you will probably miss other information when there is a problem in production. Don’t take shortcuts. A Node application can only be as good as its developers and dependencies.

About the Author

Benjamin Reed began Computer Science classes at a nearby university in Nashville during his sophomore year in high school. Since then, he has become an advocate for open source. He is now pursuing degrees in Computer Science and Mathematics full time. The Ruby community has intrigued him, and he openly expresses support for the Rails framework. He believes that studying Rails has led him to some of the best practices and, ultimately, has made him a better programmer. iOS development is one of his hobbies, and he enjoys scouting out new projects on GitHub. On GitHub, he’s appropriately named @codeblooded. On Twitter, he’s @benreedDev.
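As a follow-up to the promise-based approach that the article suggests considering, here is a minimal sketch of the same update flow written as one flat promise chain. It is an illustration only, not the team's code: findUser, saveUser, applyChanges, and updateUser are hypothetical names, and the in-memory users object is an assumed stand-in for the ORM so that the example runs on its own with Node's built-in Promise.

```javascript
// Hypothetical in-memory stand-in for the database used in the article.
const users = { 1: { id: 1, email: 'old@example.com', username: 'ben' } };

// Assumed replacement for the ORM lookup: resolves with the user or rejects.
function findUser(id) {
  return new Promise(function (resolve, reject) {
    const user = users[id];
    if (user) {
      resolve(user);
    } else {
      reject(new Error('The user with the specified id could not be found.'));
    }
  });
}

// Assumed replacement for user.save(): stores the record and resolves with it.
function saveUser(user) {
  return new Promise(function (resolve) {
    users[user.id] = user;
    resolve(user);
  });
}

// Copy over only the attributes that were actually supplied.
function applyChanges(user, params) {
  const updated = Object.assign({}, user);
  ['cell', 'email', 'username'].forEach(function (key) {
    if (params[key]) {
      updated[key] = params[key];
    }
  });
  updated.updated_on = new Date();
  return updated;
}

// One flat chain: a single catch reports whichever step failed, with its message.
function updateUser(id, params) {
  return findUser(id)
    .then(function (user) { return saveUser(applyChanges(user, params)); })
    .then(function (user) { return { status: 'ok', user: user }; })
    .catch(function (err) {
      return { status: 'error', description: err.message, user_id: id };
    });
}

updateUser(1, { email: 'new@example.com' }).then(function (result) {
  console.log(result.status, result.user.email);
});
updateUser(42, {}).then(function (result) {
  console.log(result.status, result.description);
});
```

The design point is that each step returns a promise, so the steps read top to bottom without nesting, and the error description travels with the rejection instead of being repeated in every callback.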


Packt

06 May 2015

17 min read

A Command-line Companion Called Artisan


Packt

06 May 2015

6 min read

Getting Started with WebSockets


In this article by Varun Chopra, author of the book WebSocket Essentials – Building Apps with HTML5 WebSockets, we will look at why we need WebSockets and why they are important, followed by when to use them and how they actually work.

Client-server communication is one of the most important parts of any web application. Data communication between the server and client has to be smooth and fast so that the user can have an excellent experience. If we look into the traditional methods of server communication, we will find that those methods were limited and were not really the best solutions. These methods have been used for a long time, and they made HTML the second choice for data communication.

Why WebSockets

The answer to why we need WebSockets lies in the question: what are the problems with the other methods of communication? Some of the methods used for server communication are request/response, polling, and long-polling, which are explained as follows:

Request/Response: This is a commonly used mechanism in which the client sends a request to the server and gets a response. This process is driven by some interaction, like the click of a button on the webpage to refresh the whole page. When AJAX came into the picture, it made webpages dynamic and helped in loading some part of the webpage without loading the whole page.
Polling: There are scenarios where we need the data to be reflected without user interaction, such as the score of a football match. In polling, the data is fetched at regular intervals, and the client keeps hitting the server regardless of whether the data has changed or not. This causes unnecessary calls to the server, opening a connection and then closing it every time.
Long-polling: This is basically a connection kept open for a particular time period.
This is one of the ways of achieving real-time communication, but it works only when you know the time interval.

The problems with these methods lead to the solution, which is WebSockets. It solves the problems faced when using the old methods.

Importance of WebSockets

WebSockets comes into the picture to save us from the old, heavy methods of server communication. WebSockets solved one of the biggest problems of server communication by providing a full-duplex, two-way communication bridge. It gives both the server and the client the ability to send data at any point of time, which none of the old methods provided. This has not only improved performance but also reduced the latency of data. It creates a lightweight connection that we can keep open for a long time without sacrificing performance. It also gives us full control to open and close the connection at any point of time.

WebSockets comes as a part of the HTML5 standard, so we do not need to worry about adding an extra plugin to make it work. The WebSocket API is fully supported in JavaScript. Almost all modern browsers now support WebSockets; this can be checked at http://caniuse.com/#feat=websockets.

WebSockets need to be implemented on both the client and the server side. On the client side, the API is a part of HTML5. On the server side, we need to use a library that implements WebSockets. Many server platforms, arguably almost all of them, now have libraries that support WebSockets. Node.js, which is a modern JavaScript-based platform, also supports WebSocket server implementations through different packages, which makes it really easy for developers to write both server-side and client-side code without learning another language.

When to use

WebSockets is a very powerful way of communicating between the client and server, so it is really useful for applications that need a lot of server interaction.
As WebSockets gives us the benefit of real-time communication, applications that require real-time data transfer, like chat applications, can leverage WebSockets. It is used not only for real-time communication but also for scenarios where we only need the server to push data to the client.

The decision to use WebSockets can be made when we know the exact purpose of its usage. We should not use WebSockets when we just have to create a website with static pages and hardly any interaction. We should use WebSockets where a lot of data passes between the client and server. There are many applications, like stock applications, where the data keeps updating in real time. Collaborative applications need real-time data sharing, such as a game of chess or a Ping-Pong game. WebSockets is widely used in real-time gaming web applications.

How it works

WebSockets communicates over TCP. The connection is established over HTTP and is basically a handshake mechanism between the client and server. After the handshake, the connection is upgraded from HTTP to the WebSocket protocol, which keeps using the same TCP connection.

[Figure: flow diagram of the WebSocket handshake]

The first step is the HTTP call that is initiated from the client side; the header of the HTTP call looks like this:

    GET /chat HTTP/1.1
    Host: server.example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: x3JJHMbDL1EzLkh9GBhXDw==
    Sec-WebSocket-Protocol: chat, superchat
    Sec-WebSocket-Version: 13
    Origin: http://example.com

Here, Host is the name of the server that we are hitting. Upgrade shows that it is an upgrade call for, in this case, WebSockets. Connection defines that it is an upgrade call. Sec-WebSocket-Key is a randomly generated key that the server uses to build its response, which lets the client verify that the response belongs to this handshake. Origin is another important parameter, which shows where the call originated from; on the server side, it is used to check the requester's authenticity.
Once the server checks the authenticity, a response is sent back, which looks like this:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: HSmrc0sMlYUkAGmm5OPpG2HaGWk=
    Sec-WebSocket-Protocol: chat

Here, Sec-WebSocket-Accept carries a value derived from the key sent in the request. The client computes the same value itself and compares the two, which confirms that the response belongs to its own handshake.

Once the connection is open, the client and server can send data to each other. The data is sent in the form of small frames over the TCP connection. These messages are not HTTP requests, so they do not appear as separate requests under the Network tab of a browser's Developer Tools.

Summary

We learned why we need WebSockets and what their importance is. Along with that, we also learned when to use WebSockets and how they actually work.
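As a rough sketch of how a server can derive Sec-WebSocket-Accept from Sec-WebSocket-Key: RFC 6455 specifies appending a fixed GUID to the key, hashing the result with SHA-1, and base64-encoding the digest. The code below uses Node's built-in crypto module; computeAcceptValue and buildHandshakeResponse are hypothetical helper names chosen for illustration and do not come from the book or from any WebSocket library.

```javascript
const crypto = require('crypto');

// Fixed GUID that RFC 6455 defines for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Hypothetical helper: build the text of the 101 response for a request key.
function buildHandshakeResponse(secWebSocketKey, protocol) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    'Sec-WebSocket-Accept: ' + computeAcceptValue(secWebSocketKey),
    'Sec-WebSocket-Protocol: ' + protocol,
    '',
    ''
  ].join('\r\n');
}

// Sample key taken from RFC 6455.
console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

In practice a server library performs this step for you; the sketch is only meant to show that the accept value is a hash of the request key and not a secret, so it confirms that the server understood the WebSocket handshake without authenticating the user.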


Packt

06 May 2015

18 min read

Controlling the Movement of a Robot with Legs


Packt

06 May 2015

23 min read

Introducing PostgreSQL 9


In this article by Simon Riggs, Gianni Ciolli, Hannu Krosing, and Gabriele Bartolini, the authors of PostgreSQL 9 Administration Cookbook - Second Edition, we will introduce PostgreSQL 9. PostgreSQL is a feature-rich, general-purpose database management system. It's a complex piece of software, but every journey begins with the first step.

We'll start with your first connection. Many people fall at the first hurdle, so we'll try not to skip that too swiftly. We'll quickly move on to enabling remote users, and from there, we will move to access through GUI administration tools. We will also introduce the psql query tool.

PostgreSQL is an advanced SQL database server, available on a wide range of platforms. One of the clearest benefits of PostgreSQL is that it is open source, meaning that you have a very permissive license to install, use, and distribute PostgreSQL without paying anyone fees or royalties. On top of that, PostgreSQL is well-known as a database that stays up for long periods and requires little or no maintenance in most cases. Overall, PostgreSQL provides a very low total cost of ownership.

PostgreSQL is also noted for its huge range of advanced features, developed over the course of more than 20 years of continuous development and enhancement. Originally developed by the Database Research Group at the University of California, Berkeley, PostgreSQL is now developed and maintained by a huge army of developers and contributors. Many of those contributors have full-time jobs related to PostgreSQL, working as designers, developers, database administrators, and trainers. Some, but not many, of those contributors work for companies that specialize in support for PostgreSQL, as we (the authors) do. No single company owns PostgreSQL, nor are you required (or even encouraged) to register your usage.
PostgreSQL has the following main features:

Excellent SQL standards compliance up to SQL:2011
Client-server architecture
Highly concurrent design where readers and writers don't block each other
Highly configurable and extensible for many types of applications
Excellent scalability and performance with extensive tuning features
Support for many kinds of data models: relational, document (JSON and XML), and key/value

What makes PostgreSQL different?

The PostgreSQL project focuses on the following objectives:

Robust, high-quality software with maintainable, well-commented code
Low maintenance administration for both embedded and enterprise use
Standards-compliant SQL, interoperability, and compatibility
Performance, security, and high availability

What surprises many people is that PostgreSQL's feature set is more comparable with Oracle or SQL Server than it is with MySQL. The only connection between MySQL and PostgreSQL is that these two projects are open source; apart from that, the features and philosophies are almost totally different.

One of the key features of Oracle, since Oracle 7, has been snapshot isolation, where readers don't block writers and writers don't block readers. You may be surprised to learn that PostgreSQL was the first database to be designed with this feature, and it offers a complete implementation. In PostgreSQL, this feature is called Multiversion Concurrency Control (MVCC).

PostgreSQL is a general-purpose database management system. You define the database that you would like to manage with it. PostgreSQL offers you many ways to work. You can use a normalized database model, augmented with features such as arrays and record subtypes, or use a fully dynamic schema with the help of JSONB and an extension named hstore. PostgreSQL also allows you to create your own server-side functions in any of a dozen different languages.

PostgreSQL is highly extensible, so you can add your own data types, operators, index types, and functional languages.
You can even override different parts of the system using plugins to alter the execution of commands or add a new optimizer.

All of these features offer a huge range of implementation options to software architects. There are many ways out of trouble when building applications and maintaining them over long periods of time.

In the early days, when PostgreSQL was still a research database, the focus was solely on the cool new features. Over the last 15 years, enormous amounts of code have been rewritten and improved, giving us one of the most stable and largest software servers available for operational use.

You may have read that PostgreSQL was, or is, slower than My Favorite DBMS, whichever that is. It's been a personal mission of mine over the last ten years to improve server performance, and the team has been successful in making the server highly performant and very scalable. That gives PostgreSQL enormous headroom for growth.

Who is using PostgreSQL?

Prominent users include Apple, BASF, Genentech, Heroku, IMDB.com, Skype, McAfee, NTT, the UK Met Office, and the U.S. National Weather Service. Five years ago, PostgreSQL received well in excess of 1 million downloads per year, according to data submitted to the European Commission, which concluded, "PostgreSQL is considered by many database users to be a credible alternative."

We need to mention one last thing. When PostgreSQL was first developed, it was named Postgres, and therefore many aspects of the project still refer to the word "postgres"; for example, the default database is named postgres, and the software is frequently installed using the postgres user ID. As a result, people shorten the name PostgreSQL to simply Postgres, and in many cases use the two names interchangeably.

PostgreSQL is pronounced as "post-grez-q-l". Postgres is pronounced as "post-grez." Some people get confused and refer to "Postgre", which is hard to say, and likely to confuse people.
Two names are enough, so please don't use a third name!

The following sections explain the key areas in more detail.

Robustness

PostgreSQL is robust, high-quality software, supported by automated testing for both features and concurrency. By default, the database provides strong disk-write guarantees, and the developers take the risk of data loss very seriously in everything they do. Options to trade robustness for performance exist, though they are not enabled by default.

All actions on the database are performed within transactions, protected by a transaction log that will perform automatic crash recovery in case of software failure.

Databases may be optionally created with data block checksums to help diagnose hardware faults. Multiple backup mechanisms exist, with full and detailed Point-In-Time Recovery, in case of the need for detailed recovery. A variety of diagnostic tools are available.

Database replication is supported natively. Synchronous Replication can provide greater than "5 Nines" (99.999 percent) availability and data protection, if properly configured and managed.

Security

Access to PostgreSQL is controllable via host-based access rules. Authentication is flexible and pluggable, allowing easy integration with any external security architecture. Full SSL-encrypted access is supported natively. A full-featured cryptographic function library is available for database users.

PostgreSQL provides role-based access privileges to access data, by command type. Functions may execute with the permissions of the definer, while views may be defined with security barriers to ensure that security is enforced ahead of other processing.

All aspects of PostgreSQL are assessed by an active security team, while known exploits are categorized and reported at http://www.postgresql.org/support/security/.

Ease of use

Clear, full, and accurate documentation exists as a result of a development process where doc changes are required.
Hundreds of small changes occur with each release that smooth off any rough edges of usage, supplied directly by knowledgeable users.

PostgreSQL works in the same way on small or large systems and across operating systems. Client access and drivers exist for every language and environment, so there is no restriction on what type of development environment is chosen now, or in the future.

The SQL Standard is followed very closely; there is no weird behavior, such as silent truncation of data. Text data is supported via a single data type that allows storage of anything from 1 byte to 1 gigabyte. This storage is optimized in multiple ways, so 1 byte is stored efficiently, and much larger values are automatically managed and compressed.

PostgreSQL has a clear policy to minimize the number of configuration parameters, and with each release, we work out ways to auto-tune settings.

Extensibility

PostgreSQL is designed to be highly extensible. Database extensions can be loaded simply and easily using CREATE EXTENSION, which automates version checks, dependencies, and other aspects of configuration.

PostgreSQL supports user-defined data types, operators, indexes, functions, and languages. Many extensions are available for PostgreSQL, including the PostGIS extension that provides world-class Geographical Information System (GIS) features.

Performance and concurrency

PostgreSQL 9.4 can achieve more than 300,000 reads per second on a 32-CPU server, and it benchmarks at more than 20,000 write transactions per second with full durability.

PostgreSQL has an advanced optimizer that considers a variety of join types, utilizing user data statistics to guide its choices. PostgreSQL provides MVCC, which enables readers and writers to avoid blocking each other.

Taken together, the performance features of PostgreSQL allow a mixed workload of transactional systems and complex search and analytical tasks.
This is important because it means we don't always need to unload our data from production systems and reload them into analytical data stores just to execute a few ad hoc queries. PostgreSQL's capabilities make it the database of choice for new systems, as well as the right long-term choice in almost every case.

Scalability

PostgreSQL 9.4 scales well on a single node up to 32 CPUs. PostgreSQL scales well up to hundreds of active sessions, and up to thousands of connected sessions when using a session pool. Further scalability is achieved in each annual release. PostgreSQL provides multinode read scalability using the Hot Standby feature. Multinode write scalability is under active development. The starting point for this is Bi-Directional Replication.

SQL and NoSQL

PostgreSQL follows the SQL Standard very closely. SQL itself does not force any particular type of model to be used, so PostgreSQL can easily be used for many types of models at the same time, in the same database. PostgreSQL supports the normal SQL language statements. With PostgreSQL acting as a relational database, we can utilize any level of normalization, from full Third Normal Form to the more denormalized Star Schema models. PostgreSQL extends the relational model to provide arrays, row types, and range types. A document-centric database is also possible using PostgreSQL's text, XML, and binary JSON (JSONB) data types, supported by indexes optimized for documents and by full text search capabilities. Key/value stores are supported using the hstore extension.

Popularity

When MySQL was taken over some years back, it was agreed in the EU monopoly investigation that followed that PostgreSQL was a viable competitor. That has certainly proved true, with the PostgreSQL user base expanding consistently for more than a decade. Various polls have indicated that PostgreSQL is the favorite database for building new, enterprise-class applications.
The PostgreSQL feature set attracts serious users who have serious applications. Financial services companies may be PostgreSQL's largest user group, though governments, telecommunication companies, and many other segments are strong users as well. This popularity extends across the world; Japan, Ecuador, Argentina, and Russia have very large user groups, and so do USA, Europe, and Australasia. Amazon Web Services' chief technology officer Dr. Werner Vogels described PostgreSQL as "an amazing database", going on to say that "PostgreSQL has become the preferred open source relational database for many enterprise developers and start-ups, powering leading geospatial and mobile applications". Commercial support Many people have commented that strong commercial support is what enterprises need before they can invest in open source technology. Strong support is available worldwide from a number of companies. 2ndQuadrant provides commercial support for open source PostgreSQL, offering 24 x 7 support in English and Spanish with bug-fix resolution times. EnterpriseDB provides commercial support for PostgreSQL as well as their main product, which is a variant of Postgres that offers enhanced Oracle compatibility. Many other companies provide strong and knowledgeable support to specific geographic regions, vertical markets, and specialized technology stacks. PostgreSQL is also available as hosted or cloud solutions from a variety of companies, since it runs very well in cloud environments. A full list of companies is kept up to date at http://www.postgresql.org/support/professional_support/. Research and development funding PostgreSQL was originally developed as a research project at the University of California, Berkeley in the late 1980s and early 1990s. Further work was carried out by volunteers until the late 1990s. Then, the first professional developer became involved. 
Over time, more and more companies and research groups became involved, supporting many professional contributors. Further funding for research and development was provided by the NSF. The project also received funding from the EU FP7 Programme in the form of the 4CaaST project for cloud computing and the AXLE project for scalable data analytics. AXLE deserves a special mention because it is a 3-year project aimed at enhancing PostgreSQL's business intelligence capabilities, specifically for very large databases. The project covers security, privacy, integration with data mining, and visualization tools and interfaces for new hardware. Further details of it are available at http://www.axleproject.eu. Other funding for PostgreSQL development comes from users who directly sponsor features and companies selling products and services based around PostgreSQL. Monitoring Databases are not isolated entities. They live on computer hardware using CPUs, RAM, and disk subsystems. Users access databases using networks. Depending on the setup, databases themselves may need network resources to function in any of the following ways: performing some authentication checks when users log in, using disks that are mounted over the network (not generally recommended), or making remote function calls to other databases. This means that monitoring only the database is not enough. As a minimum, one should also monitor everything directly involved in using the database. This means knowing the following: Is the database host available? Does it accept connections? How much of the network bandwidth is in use? Have there been network interruptions and dropped connections? Is there enough RAM available for the most common tasks? How much of it is left? Is there enough disk space available? When will it run out of disk space? Is the disk subsystem keeping up? How much more load can it take? Can the CPU keep up with the load? How many spare idle cycles do the CPUs have? 
Are other network services the database access depends on (if any) available? For example, if you use Kerberos for authentication, you need to monitor it as well. How many context switches are happening when the database is running? For most of these things, you are interested in history; that is, how have things evolved? Was everything mostly the same yesterday or last week? When did the disk usage start changing rapidly? For any larger installation, you probably have something already in place to monitor the health of your hosts and network. The two aspects of monitoring are collecting historical data to see how things have evolved and getting alerts when things go seriously wrong. Tools based on Round Robin Database Tool (RRDtool) such as Cacti and Munin are quite popular for collecting the historical information on all aspects of the servers and presenting this information in an easy-to-follow graphical form. Seeing several statistics on the same timescale can really help when trying to figure out why the system is behaving the way it is. Another popular open source solution is Ganglia, a distributed monitoring solution particularly suitable for environments with several servers and in multiple locations. Another aspect of monitoring is getting alerts when something goes really wrong and needs (immediate) attention. For alerting, one of the most widely used tools is Nagios, with its fork (Icinga) being an emerging solution. The aforementioned trending tools can integrate with Nagios. However, if you need a solution for both the alerting and trending aspects of a monitoring tool, you might want to look into Zabbix. Then, of course, there is Simple Network Management Protocol (SNMP), which is supported by a wide array of commercial monitoring solutions. Basic support for monitoring PostgreSQL through SNMP is found in pgsnmpd. This project does not seem very active though. 
However, you can find more information about pgsnmpd and download it from http://pgsnmpd.projects.postgresql.org/. Providing PostgreSQL information to monitoring tools Historical monitoring information is best to use when all of it is available from the same place and at the same timescale. Most monitoring systems are designed for generic purposes, while allowing application and system developers to integrate their specific checks with the monitoring infrastructure. This is possible through a plugin architecture. Adding new kinds of data inputs to them means installing a plugin. Sometimes, you may need to write or develop this plugin, but writing a plugin for something such as Cacti is easy. You just have to write a script that outputs monitored values in simple text format. In most common scenarios, the monitoring system is centralized and data is collected directly (and remotely) by the system itself or through some distributed components that are responsible for sending the observed metrics back to the main node. As far as PostgreSQL is concerned, some useful things to include in graphs are the number of connections, disk usage, number of queries, number of WAL files, most numbers from pg_stat_user_tables and pg_stat_user_indexes, and so on, as shown here: An example of a dashboard in Cacti The preceding Cacti screenshot includes data for CPU, disk, and network usage; pgbouncer connection pooler; and the number of PostgreSQL client connections. As you can see, they are nicely correlated. One Swiss Army knife script, which can be used from both Cacti and Nagios/Icinga, is check_postgres. It is available at http://bucardo.org/wiki/Check_postgres. It has ready-made reporting actions for a large array of things worth monitoring in PostgreSQL. For Munin, there are some PostgreSQL plugins available at the Munin plugin repository at https://github.com/munin-monitoring/contrib/tree/master/plugins/postgresql. 
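To make the idea of "a script that outputs monitored values in simple text format" concrete, here is a minimal sketch of a Munin-style plugin. It assumes the usual Munin convention that a plugin prints graph metadata when called with the config argument and prints field.value lines otherwise. The statistics are hard-coded sample numbers, and the names SAMPLE_STATS, render_config, and render_values are hypothetical; a real plugin would query PostgreSQL (for example, the blks_hit and blks_read columns of pg_stat_database) instead.

```python
import sys

# Hypothetical sample values. A real plugin would read them from PostgreSQL,
# for example from the blks_hit and blks_read columns of pg_stat_database.
SAMPLE_STATS = {"blks_hit": 120000, "blks_read": 3400}


def render_config(stats):
    """Describe the graph and its data series (output for the 'config' argument)."""
    lines = [
        "graph_title PostgreSQL buffer cache",
        "graph_vlabel blocks",
        "graph_category db",
    ]
    for name in sorted(stats):
        lines.append("%s.label %s" % (name, name))
    return "\n".join(lines)


def render_values(stats):
    """Emit one 'field.value number' line per monitored value."""
    return "\n".join("%s.value %d" % (name, stats[name]) for name in sorted(stats))


def main(argv):
    # The monitoring system calls the script with 'config' once to learn
    # what the graph looks like, and without arguments to fetch the values.
    if len(argv) > 1 and argv[1] == "config":
        return render_config(SAMPLE_STATS)
    return render_values(SAMPLE_STATS)


if __name__ == "__main__":
    print(main(sys.argv))
```

Keeping the rendering functions separate from the data collection makes it easy to swap the sample dictionary for a real query later, and to reuse the same values for another tool's text format.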
The following screenshot shows a Munin graph about PostgreSQL buffer cache hits for a specific database, where cache hits (blue line) dominate reads from the disk (green line): Finding more information about generic monitoring tools Setting up the tools themselves is a larger topic. In fact, each of these tools has more than one book written about them. The basic setup information and the tools themselves can be found at the following URLs: RRDtool: http://www.mrtg.org/rrdtool/ Cacti: http://www.cacti.net/ Ganglia: http://ganglia.sourceforge.net/ Icinga: http://www.icinga.org Munin: http://munin-monitoring.org/ Nagios: http://www.nagios.org/ Zabbix: http://www.zabbix.org/ Real-time viewing using pgAdmin You can also use pgAdmin to get a quick view of what is going on in the database. For better control, you need to install the adminpack extension in the destination database, by issuing this command: CREATE EXTENSION adminpack; This extension is a part of the additionally supplied modules of PostgreSQL (aka contrib). It provides several administration functions that PgAdmin (and other tools) can use in order to manage, control, and monitor a Postgres server from a remote location. Once you have installed adminpack, connect to the database and then go to Tools | Server Status. This will open a window similar to what is shown in the following screenshot, reporting locks and running transactions: Loading data from flat files Loading data into your database is one of the most important tasks. You need to do this accurately and quickly. Here's how. Getting ready You'll need a copy of pgloader, which is available at http://github.com/dimitri/pgloader. At the time of writing this article, the current stable version is 3.1.0. The 3.x series is a major rewrite, with many additional features, and the 2.x series is now considered obsolete. How to do it… PostgreSQL includes a command named COPY that provides the basic data load/unload mechanism. 
The COPY command doesn't do enough when loading data, so let's skip the basic command and go straight to pgloader. To load data, we need to understand our requirements, so let's break this down into a step-by-step process, as follows:

1. Identify the data files and where they are located. Make sure that pgloader is installed at the location of the files.
2. Identify the table into which you are loading, ensure that you have the permissions to load, and check the available space.
3. Work out the file type (fixed, text, or CSV) and check the encoding.
4. Specify the mapping between columns in the file and columns on the table being loaded. Make sure you know which columns in the file are not needed; pgloader allows you to include only the columns you want.
5. Identify any columns in the table for which you don't have data. Do you need them to have a default value on the table, or does pgloader need to generate values for those columns through functions or constants?
6. Specify any transformations that need to take place. The most common issue is date formats, though possibly there may be other issues.
7. Write the pgloader script.
8. pgloader will create a log file to record whether the load has succeeded or failed, and another file to store rejected rows. You need a directory with sufficient disk space if you expect them to be large. Their size is roughly proportional to the number of failing rows.
9. Finally, consider what settings you need for performance options. This is definitely last, as fiddling with things earlier can lead to confusion when you're still making the load work correctly.

You must use a script to execute pgloader. This is not a restriction; actually it is more like best practice, because it makes it much easier to iterate towards something that works. Loads never work the first time, except in the movies!
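Some of the preparation described above (checking the encoding, and comparing the file's columns with the table's columns) can be sketched as a small pre-flight check run before writing the load script. This is only an illustration and not part of pgloader: the function names are hypothetical, and the table column list is assumed to be supplied by hand rather than read from the database.

```python
import csv


def decodes_cleanly(path, encoding):
    """Return True if the whole file can be decoded with the given encoding."""
    try:
        with open(path, encoding=encoding, newline="") as handle:
            for _ in handle:
                pass
    except UnicodeDecodeError:
        return False
    return True


def header_columns(path, encoding, delimiter=","):
    """Return the column names found on the first line of a delimited file."""
    with open(path, encoding=encoding, newline="") as handle:
        return next(csv.reader(handle, delimiter=delimiter))


def unmapped_columns(file_columns, table_columns):
    """Return (columns only in the file, columns only in the table)."""
    only_in_file = [c for c in file_columns if c not in table_columns]
    only_in_table = [c for c in table_columns if c not in file_columns]
    return only_in_file, only_in_table
```

Columns that appear only in the file are candidates to leave out of the load, while columns that appear only in the table need a default, a constant, or a generated value.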
Let's look at a typical example from pgloader's documentation, the example.load file:

```
LOAD CSV
    FROM 'GeoLiteCity-Blocks.csv' WITH ENCODING iso-646-us
        HAVING FIELDS
        (
            startIpNum, endIpNum, locId
        )
    INTO postgresql://user@localhost:54393/dbname?geolite.blocks
        TARGET COLUMNS
        (
            iprange ip4r using (ip-range startIpNum endIpNum),
            locId
        )
    WITH truncate,
        skip header = 2,
        fields optionally enclosed by '"',
        fields escaped by backslash-quote,
        fields terminated by '\t'
    SET work_mem to '32 MB',
        maintenance_work_mem to '64 MB';
```

We can use the load script like this:

```
pgloader --summary summary.log example.load
```

How it works…

pgloader copes gracefully with errors. The COPY command loads all rows in a single transaction, so a single error is enough to abort the load. pgloader breaks down an input file into reasonably sized chunks, and loads them piece by piece. If some rows in a chunk cause errors, then pgloader will split it iteratively until it loads all the good rows and skips all the bad rows, which are then saved in a separate "rejects" file for later inspection. This behavior is very convenient if you have large data files with a small percentage of bad rows; for instance, you can edit the rejects, fix them, and finally, load them with another pgloader run.

Versions 2.x of pgloader were written in Python and connected to PostgreSQL through the standard Python client interface. Version 3.x is written in Common Lisp. Yes, pgloader is less efficient than loading data files using a COPY command, but running a COPY command has many more restrictions: the file has to be in the right place on the server, has to be in the right format, and must be unlikely to throw errors on loading. pgloader has additional overhead, but it also has the ability to load data using multiple parallel threads, so it can be faster to use as well. pgloader's ability to call out to reformat functions is often essential; straight COPY is just too simple.
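The "split until the bad rows are isolated" strategy described above can be illustrated with a short sketch. This is not pgloader's actual implementation, only a model of the idea: load_batch is a hypothetical callable that loads a list of rows atomically and raises an exception if any row in the batch is bad.

```python
def load_with_rejects(rows, load_batch):
    """Load rows via load_batch, isolating bad rows by repeated splitting.

    Returns a tuple (number of rows loaded, list of rejected rows).
    """
    if not rows:
        return 0, []
    try:
        # Optimistic path: the whole batch loads in one go.
        load_batch(rows)
        return len(rows), []
    except Exception:
        if len(rows) == 1:
            # A single failing row cannot be split further: reject it.
            return 0, list(rows)
        # Otherwise split the batch in half and retry each half on its own.
        middle = len(rows) // 2
        loaded_left, rejects_left = load_with_rejects(rows[:middle], load_batch)
        loaded_right, rejects_right = load_with_rejects(rows[middle:], load_batch)
        return loaded_left + loaded_right, rejects_left + rejects_right


if __name__ == "__main__":
    table = []

    def load_batch(batch):
        # Stand-in for a transactional load: fail if any row is not numeric.
        if any(not row.isdigit() for row in batch):
            raise ValueError("bad row in batch")
        table.extend(batch)

    loaded, rejects = load_with_rejects(["1", "2", "x", "4", "y", "6"], load_batch)
    print(loaded, rejects, table)
```

With a small percentage of bad rows, most batches succeed on the first attempt, so the extra retries stay cheap compared with aborting the entire load.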
pgloader also allows loading from fixed-width files, which COPY does not.

There's more…

If you need to reload the table completely from scratch, then specify the truncate option in the WITH clause of the pgloader script. There are also options to specify SQL to be executed before and after loading the data. For instance, you may have a script that creates the empty tables before, or you can add constraints after, or both.

After loading, if we have load errors, then there will be some junk loaded into the PostgreSQL tables. It is not junk that you can see, or that gives any semantic errors, but think of it more like fragmentation. You should think about whether you need to add a VACUUM command after the data load, though this may make the load take much longer.

We need to be careful to avoid loading data twice. The only easy way of doing that is to make sure that there is at least one unique index defined on every table that you load. The load should then fail very quickly.

String handling can often be difficult, because of the presence of formatting or nonprintable characters. In PostgreSQL versions before 9.1, the parameter named standard_conforming_strings defaulted to off, which means that backslashes are assumed to be escape characters. Put another way, with that setting, the \n string means line feed, which can cause data to appear truncated. From version 9.1 onwards, the default is on. If you are working with a server where it is off, you'll need to turn standard_conforming_strings to on, or you'll need to specify an escape character in the load-parameter file.

If you are reloading data that has been unloaded from PostgreSQL, then you may want to use the pg_restore utility instead. The pg_restore utility has an option to reload data in parallel, -j number_of_threads, though this is only possible if the dump was produced using the custom or directory pg_dump format. This can be useful for reloading dumps, though it lacks almost all of the other pgloader features discussed here.
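As a small illustration of the parallel reload option just mentioned, the following sketch builds the pg_restore argument list from Python. The dump file name and database name are hypothetical placeholders, and the helper only assembles the command; running it is left to the caller.

```python
def pg_restore_command(dump_path, dbname, jobs=4):
    """Build the argument list for a parallel pg_restore run.

    --jobs is the long form of -j and --dbname is the long form of -d.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["pg_restore", "--jobs", str(jobs), "--dbname", dbname, dump_path]


if __name__ == "__main__":
    # Hypothetical dump file and target database.
    command = pg_restore_command("mydb.dump", "mydb", jobs=4)
    print(" ".join(command))
    # To actually run the restore, pass the list to subprocess.check_call.
```

Building the command as a list rather than a single string avoids shell quoting problems when file or database names contain spaces.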
If you need to use rows from a read-only text file that does not have errors, and you are using version 9.1 or later of PostgreSQL, then you may consider using the file_fdw contrib module. The short story is that it lets you create a "virtual" table that will parse the text file every time it is scanned. This is different from filling a table once and for all, either with COPY or pgloader; therefore, it covers a different use case. For example, think about an external data source that is maintained by a third party and needs to be shared across different databases. You may wish to send an e-mail to Dimitri Fontaine, the current author and maintainer of most of pgloader. He always loves to receive e-mails from users. Summary PostgreSQL provides a lot of features, which make it the most advanced open source database. Resources for Article: Further resources on this subject: Getting Started with PostgreSQL [article] Installing PostgreSQL [article] PostgreSQL – New Features [article]


Packt

06 May 2015

11 min read

Introduction to Hadoop


In this article by Shiva Achari, author of the book Hadoop Essentials, you'll get an introduction about Hadoop, its uses, and advantages (For more resources related to this topic, see here.) Hadoop In big data, the most widely used system is Hadoop. Hadoop is an open source implementation of big data, which is widely accepted in the industry, and benchmarks for Hadoop are impressive and, in some cases, incomparable to other systems. Hadoop is used in the industry for large-scale, massively parallel, and distributed data processing. Hadoop is highly fault tolerant and configurable to as many levels as we need for the system to be fault tolerant, which has a direct impact to the number of times the data is stored across. As we have already touched upon big data systems, the architecture revolves around two major components: distributed computing and parallel processing. In Hadoop, the distributed computing is handled by HDFS, and parallel processing is handled by MapReduce. In short, we can say that Hadoop is a combination of HDFS and MapReduce, as shown in the following image: Hadoop history Hadoop began from a project called Nutch, an open source crawler-based search, which processes on a distributed system. In 2003–2004, Google released Google MapReduce and GFS papers. MapReduce was adapted on Nutch. Doug Cutting and Mike Cafarella are the creators of Hadoop. When Doug Cutting joined Yahoo, a new project was created along the similar lines of Nutch, which we call Hadoop, and Nutch remained as a separate sub-project. Then, there were different releases, and other separate sub-projects started integrating with Hadoop, which we call a Hadoop ecosystem. 
The following figure and description depict the history with timelines and milestones achieved in Hadoop:

2002.8: The Nutch Project was started
2003.2: The first MapReduce library was written at Google
2003.10: The Google File System paper was published
2004.12: The Google MapReduce paper was published
2005.7: Doug Cutting reported that Nutch now uses the new MapReduce implementation
2006.2: Hadoop code moved out of Nutch into a new Lucene sub-project
2006.11: The Google Bigtable paper was published
2007.2: The first HBase code drop came from Mike Cafarella
2007.4: Yahoo! ran Hadoop on a 1000-node cluster
2008.1: Hadoop became an Apache Top Level Project
2008.7: Hadoop broke the Terabyte data sort Benchmark
2008.11: Hadoop 0.19 was released
2011.12: Hadoop 1.0 was released
2012.10: Hadoop 2.0 was alpha released
2013.10: Hadoop 2.2.0 was released
2014.10: Hadoop 2.6.0 was released

Advantages of Hadoop

Hadoop has a lot of advantages, and some of them are as follows:

Low cost (runs on commodity hardware): Hadoop can run on average-performing commodity hardware and doesn't require a high performance system, which helps to control cost while achieving scalability and performance. Adding or removing nodes from the cluster is simple, as and when we require. The cost per terabyte is lower for storage and processing in Hadoop.

Storage flexibility: Hadoop can store data in raw format in a distributed environment. Hadoop can process unstructured and semi-structured data better than most of the available technologies. Hadoop gives full flexibility to process the data, and we will not have any loss of data.

Open source community: Hadoop is open source and supported by many contributors with a growing network of developers worldwide. Many organizations such as Yahoo, Facebook, Hortonworks, and others have contributed immensely toward the progress of Hadoop and other related sub-projects.

Fault tolerant: Hadoop is massively scalable and fault tolerant.
Hadoop is reliable in terms of data availability, and even if some nodes go down, Hadoop can recover the data. Hadoop architecture assumes that nodes can go down and the system should be able to process the data. Complex data analytics: With the emergence of big data, data science has also grown leaps and bounds, and we have complex and heavy computation intensive algorithms for data analysis. Hadoop can process such scalable algorithms for a very large-scale data and can process the algorithms faster. Uses of Hadoop Some examples of use cases where Hadoop is used are as follows: Searching/text mining Log processing Recommendation systems Business intelligence/data warehousing Video and image analysis Archiving Graph creation and analysis Pattern recognition Risk assessment Sentiment analysis Hadoop ecosystem A Hadoop cluster can be of thousands of nodes, and it is complex and difficult to manage manually, hence there are some components that assist configuration, maintenance, and management of the whole Hadoop system. In this article, we will touch base upon the following components: Layer Utility/Tool name Distributed filesystem Apache HDFS Distributed programming Apache MapReduce Apache Hive Apache Pig Apache Spark NoSQL databases Apache HBase Data ingestion Apache Flume Apache Sqoop Apache Storm Service programming Apache Zookeeper Scheduling Apache Oozie Machine learning Apache Mahout System deployment Apache Ambari All the components above are helpful in managing Hadoop tasks and jobs. Apache Hadoop The open source Hadoop is maintained by the Apache Software Foundation. The official website for Apache Hadoop is http://hadoop.apache.org/, where the packages and other details are described elaborately. 
The current Apache Hadoop project (version 2.6) includes the following modules:

Hadoop Common: The common utilities that support the other Hadoop modules
Hadoop Distributed File System (HDFS): A distributed filesystem that provides high-throughput access to application data
Hadoop YARN: A framework for job scheduling and cluster resource management
Hadoop MapReduce: A YARN-based system for parallel processing of large datasets

Apache Hadoop can be deployed in the following three modes:

Standalone: It is used for simple analysis or debugging.
Pseudo distributed: It helps you to simulate a multi-node installation on a single node. In pseudo-distributed mode, each of the component processes runs in a separate JVM. Instead of installing Hadoop on different servers, you can simulate it on a single server.
Distributed: A cluster with multiple worker nodes, ranging from tens to thousands of nodes.

In a Hadoop ecosystem, along with Hadoop, there are many utility components that are separate Apache projects, such as Hive, Pig, HBase, Sqoop, Flume, ZooKeeper, Mahout, and so on, which have to be configured separately. We have to be careful with the compatibility of subprojects with Hadoop versions, as not all versions are inter-compatible. Apache Hadoop is an open source project, which brings a lot of benefits: the source code can be updated by anyone, and contributors regularly add improvements. One downside of being an open source project is that companies usually offer support for their products, not for an open source project. Customers prefer support and therefore adopt Hadoop distributions supported by vendors. Let's look at some Hadoop distributions available.

Hadoop distributions

Hadoop distributions are supported by the companies managing the distribution, and some distributions also have license costs.
Companies such as Cloudera, Hortonworks, Amazon, MapR, and Pivotal have their respective Hadoop distribution in the market that offers Hadoop with required sub-packages and projects, which are compatible and provide commercial support. This greatly reduces efforts, not just for operations, but also for deployment, monitoring, and tools and utility for easy and faster development of the product or project. For managing the Hadoop cluster, Hadoop distributions provide some graphical web UI tooling for the deployment, administration, and monitoring of Hadoop clusters, which can be used to set up, manage, and monitor complex clusters, which reduce a lot of effort and time. Some Hadoop distributions which are available are as follows: Cloudera: According to The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014, this is the most widely used Hadoop distribution with the biggest customer base as it provides good support and has some good utility components such as Cloudera Manager, which can create, manage, and maintain a cluster, and manage job processing, and Impala is developed and contributed by Cloudera which has real-time processing capability. Hortonworks: Hortonworks' strategy is to drive all innovation through the open source community and create an ecosystem of partners that accelerates Hadoop adoption among enterprises. It uses an open source Hadoop project and is a major contributor to Hadoop enhancement in Apache Hadoop. Ambari was developed and contributed to Apache by Hortonworks. Hortonworks offers a very good, easy-to-use sandbox for getting started. Hortonworks contributed changes that made Apache Hadoop run natively on the Microsoft Windows platforms including Windows Server and Microsoft Azure. MapR: MapR distribution of Hadoop uses different concepts than plain open source Hadoop and its competitors, especially support for a network file system (NFS) instead of HDFS for better performance and ease of use. 
With NFS, native Unix commands can be used instead of Hadoop commands. MapR has high availability features such as snapshots, mirroring, and stateful failover.

Amazon Elastic MapReduce (EMR): AWS's Elastic MapReduce (EMR) leverages its comprehensive cloud services, such as Amazon EC2 for compute, Amazon S3 for storage, and other services, to offer a very strong Hadoop solution for customers who wish to implement Hadoop in the cloud. EMR is advisable for infrequent big data processing, where it might save you a lot of money.

Pillars of Hadoop

Hadoop is designed to be highly scalable, distributed, massively parallel, fault tolerant, and flexible, and the key aspects of the design are HDFS, MapReduce, and YARN. HDFS and MapReduce can perform very large scale batch processing at a much faster rate. Due to contributions from various organizations and institutions, the Hadoop architecture has undergone a lot of improvements, and one of them is YARN. YARN has overcome some limitations of Hadoop and allows Hadoop to integrate with different applications and environments easily, especially in streaming and real-time analysis. Two such examples that we are going to discuss are Storm and Spark. They are well known in streaming and real-time analysis, and both can integrate with Hadoop via YARN.

Data access components

MapReduce is a very powerful framework, but there is a steep learning curve to master and optimize a MapReduce job. When analyzing data in a MapReduce paradigm, a lot of our time will be spent in coding. In big data, the users come from different backgrounds such as programming, scripting, EDW, DBA, analytics, and so on. For such users, there are abstraction layers on top of MapReduce. Hive and Pig are two such layers: Hive has a SQL-like query interface, and Pig has the Pig Latin procedural language interface. Analyzing data on such layers becomes much easier.

Data storage component

HBase is a column store-based NoSQL database solution.
HBase's data model is very similar to that of Google's Bigtable. HBase can efficiently handle random and real-time access to a large volume of data, usually millions or billions of rows. HBase's important advantage is that it supports updates on larger tables and faster lookup. The HBase data store supports linear and modular scaling. HBase stores data as a multidimensional map and is distributed. HBase tables can also serve as the input and output of MapReduce jobs that run in parallel.

Data ingestion in Hadoop

In Hadoop, storage is rarely an issue, but managing the data is the driving force around which different solutions can be designed with different systems, hence managing data becomes extremely critical. A more manageable system can help a lot in terms of scalability, reusability, and even performance. In a Hadoop ecosystem, we have two widely used tools, Sqoop and Flume. Both can help manage the data and can import and export data efficiently, with good performance. Sqoop is usually used for data integration with RDBMS systems, and Flume usually performs better with streaming log data.

Streaming and real-time analysis

Storm and Spark are two fascinating new components that can run on YARN and have some amazing capabilities in terms of processing streaming and real-time analysis. Both of these are used in scenarios where we have heavy continuous streaming data that has to be processed in real time or near real time. The example which we discussed earlier for traffic analyzer is a good example for use cases of Storm and Spark.

Summary

In this article, we explored a bit about Hadoop history, and then moved on to the advantages and uses of Hadoop. Hadoop systems are complex to monitor and manage, and there are separate sub-projects, frameworks, tools, and utilities that integrate with Hadoop and help in better management of tasks. Together, these are called the Hadoop ecosystem.
Resources for Article: Further resources on this subject: Hive in Hadoop [article] Hadoop and MapReduce [article] Evolution of Hadoop [article]



Xavier Bruhiere

06 May 2015

10 min read

Provisioning Docker Containers


Docker containers are spreading fast. They're sneaking into our development environments and production servers, and the proliferation of links in this post emphasizes how hot the topic currently is. Containers encapsulate applications into a portable machine you can easily build, control and ship. They bring most modern services just one command away, along with clean development environments and agile production infrastructure, to name a few of the benefits. While getting started is insanely easy, real life applications can be tricky when you try to push a bit on the boundaries. In this post, we're going to study how to provision Docker containers, prototyping along the way our very own image builder. Hopefully, by the end of this post, you should get a good idea of the challenges and opportunities involved. As of today, Docker Hub features 15,000 images and the most popular one was downloaded 3,588,280 times. We'd better be good at crafting them!

Configuration

First things first, we need a convenient way to describe how to build the application. This is exactly what files like travis.yml aim to do, so here is a good place to start.

```
# The official base image to use
language: python

# Container build steps
install:
  # For the sake of genericity, we introduce support for templating
  {% for pkg in dependencies %}
  - pip install {{ pkg }}
  {% endfor %}

# Validating the build
script:
  - pylint {{ project }} tests/
  - nosetests --with-coverage --cover-package {{ project }}
```

YAML formatting is also a decent choice, easily processed both by humans and machines (and I think this is something Ansible and Salt get right in configuration management). I'm also biased toward Python for exploration, so here is the code to load the information into our program.
    # run (sudo) pip install pyyaml==3.11 jinja2==2.7.3
    import jinja2
    import yaml


    def load_manifest(filepath, **properties):
        # Render the manifest as a Jinja2 template, then parse the result as YAML
        with open(filepath) as manifest:
            tpl = jinja2.Template(manifest.read())
        return yaml.safe_load(tpl.render(**properties))

This setup gives us the simplest configuration interface ever (files), version control for our build, a centralized view of container definitions, trivial management, and easy integration with future tools such as, say, container provisioning. You can already enjoy those benefits with projects built by HashiCorp or with the application container specification. While I plan to borrow a lot of the concepts behind the latter, we need neither this level of precision nor to constrain our code to their layout conventions. Tools like Packer are oversized here, although we already took some inspiration from them: configuration as template files.

## Model

So far so good. We have a nice dictionary describing a simple application. However, I propose to transcribe this structure into a directed graph. It will bring hierarchical order to the steps, and whenever we parallelize them, like independent tasks or tests, we will simply branch out.

    class Node(object):
        def __init__(self, tag, **properties):
            # The node is processed later; its tag indicates how
            self.tag = tag
            self.props = properties
            # Children nodes connected to this one
            self.outgoings = []


    class Graph(object):
        def __init__(self, startnode):
            self.nodes = [startnode]

        def connect(self, node, *child_nodes):
            for child in child_nodes:
                node.outgoings.append(child)
                self.nodes.append(child)

        def walk(self, node_ptr, callback):
            callback(node_ptr)
            for node in node_ptr.outgoings:
                # Recursively follow nodes
                self.walk(node, callback)

Starting from the data we previously loaded, we finally model our application into a suitable structure.
    def build_graph(data, artifact):
        # Initialization
        node_ptr = Node("start", image=data["language"])
        graph = Graph(node_ptr)

        # Provision
        for task in data["install"]:
            task_node = Node("task", command=task)
            graph.connect(node_ptr, task_node)
            node_ptr = task_node

        # Validation, on a different branch
        test_node_ptr = node_ptr
        for test in data["script"]:
            test_node = Node("test", command=test)
            graph.connect(test_node_ptr, test_node)
            test_node_ptr = test_node

        # Finalization
        graph.connect(node_ptr, Node("commit", repo=artifact))
        return graph

## Build Flow

While our implementation is really naive, we now have a convenient structure to work on. Keeping up with our fictional model, the following graph represents the build workflow as a simple Finite State Machine.

[Figure: container FSM]

Some remarks:

* travis.yml steps, i.e. graph nodes, became events.
* We handle caching like docker build does. A new container is only started when a new task is received.

The pieces begin to fall into place. The walk() method of the Graph is a perfect fit to emit events, and the state machine is a robust solution to safely manage a container life-cycle with a conditional start. As a bonus point, it decouples the data model and the build process (loosely coupled components are cool).

## Execution

However, in order to focus on provisioning issues instead of programmatic implementations, we're going to prefer the _good enough_ Factory below.

    # pip install docker-py==1.1.0
    import os

    import docker


    class Factory(object):
        """ Manage the build workflow. """

        def __init__(self, endpoint=None):
            endpoint = endpoint or os.environ.get('DOCKER_HOST', 'unix://var/run/docker.sock')
            self.conn = docker.Client(endpoint)
            self.container = None

        def start(self, image):
            # Keep the container alive long enough to provision it
            self.container = self.conn.create_container(image=image, command='sleep 360')
            self.conn.start(self.container['Id'])

        def provision(self, command):
            self.conn.execute(self.container['Id'], command)

        def teardown(self, artifact):
            # Save the provisioned container as an image, then clean up
            self.conn.commit(self.container['Id'], repository=artifact, tag='awesome')
            self.conn.stop(self.container['Id'])
            self.conn.remove_container(self.container['Id'])

        def callback(self, node):
            # print("[factory] new step {}: {}".format(node.tag, node.props))
            if node.tag == "start":
                self.start(node.props["image"])
            elif node.tag == "task":
                self.provision(node.props["command"])
            elif node.tag == "commit":
                self.teardown(node.props["repo"])

We leverage the docker exec feature to run commands inside the container. This approach gives us an important asset: zero requirements on the target to make it work with our project. We're compatible with every container and we have nothing to pre-install, i.e. no overhead and no extra bytes in our final image.

At this point, you should be able to synthesize a cute, completely useless, little Python container.

    data = load_manifest('travis.yml', project='factory', dependencies=['requests', 'ipython'])
    graph = build_graph(data, "test/factory")
    graph.walk(graph.nodes[0], Factory().callback)

## Getting smarter

As mentioned, the docker CLI optimizes subsequent builds by skipping previous successful steps, speeding up the development workflow. But it also has its flaws. What if we could run commands that come with strong security guarantees and that we know are pinned to the exact same version across different runs? Basically, we want reliable, reproducible builds, and tools like Snappy and Nix come in handy for the task.
Both solutions ensure the security and the stability of what we're provisioning, avoiding side effects on or from other unrelated OS components.

## Going further

Our modest tool takes shape, but we're still lacking an important feature: copying files from the host into the container (code, configuration files). The former is straightforward as Docker supports mapping volumes. The latter can be solved by what I think is an elegant solution, powered by consul-template and explained below.

* First we build a container full of useful binaries our other future containers may need (at least consul-template).

      FROM scratch
      MAINTAINER Xavier Bruhiere <xavier.bruhiere@gmail.com>
      # This directory contains the programs
      ADD ./tools /tools
      # And we expose it to the world
      VOLUME /tools
      WORKDIR /tools
      ENTRYPOINT ["/bin/sh"]

      docker build -t factory/toolbox .
      # It just needs to exist to be available, not even run
      docker create --name toolbox factory/toolbox

* We make those tools available by mapping the toolbox to the target container. This is in fact a common practice known as data containers.

      self.conn.start(self.container['Id'], volumes_from='toolbox')

* Files, optionally Go templates, are grouped inside a directory on the host, along with a configuration specifying where to move them. The project's readme explains it all.

* Finally we insert the following task before the others to perform the copy, rendering templates in the process with values from the Consul key/value store.

      cmd = '/tools/consul-template -config /root/app/templates/template.hcl -consul 192.168.0.17:8500 -once'
      task_node = Node("task", command=cmd)
      graph.connect(node_ptr, task_node)

We now know how to provide useful binary tools and any parametric file inside the build.

### Base image

Keeping our tools outside the container lets us factor out common utilities and avoid fat images. But we can go further and take a special look at the base we're using.
Small images improve download and build speed, and are therefore much easier to deal with, both in development and production. Projects like docker-alpine try to define the minimal common ground for applications, while unikernels want to compile and link the necessary OS components along with the app to produce an ultra-specialized artifact (and we can go even further and strip down the final image). Those philosophies also limit maintenance overhead (fewer moving parts reduce side effects and unexpected behaviors) and attack surface, and are especially efficient when keeping a single responsibility per container (not necessarily a single process, though).

Having a common base image is also a good opportunity to solve once and for all some issues with Docker defaults, like phusion suggests. On the other hand, using a common layer for all future builds prevents us from exploiting community creations. Official language images allow one to quickly containerize one's application on top of solid ground. As always, it really depends on the use case.

## Brainstorm of improvements

What's more, here is a totally non-exhaustive list of ideas to push our investigation further:

* Container engine agnostic: who knows who will be the big player tomorrow. Instead of a docker client we could implement drivers for rkt or lxd. We could also split the Factory into engine and provisioner components.
* Since we fully control the build flow, we could change the graph walker callback into an interactive prompt to manually build, debug and inspect the container.
* Given multiple apps and remote docker endpoints, builds could be parallel and distributed.
* We could modify our load_manifest function to recursively load other required manifests.
* With reusable modules we could share the best ones (much like Ansible Galaxy).
* Built-in integration tests with the help of docker-compose and third-party containers.
* Currently, the container is launched with a sleep command. We could instead place terminus within our toolbox and use it at runtime to gather host information and eventually reuse it in our templates (again, very similar to Salt pillars for example).

## Wrapping up

We merely scratched the surface of container provisioning, yet there are plenty of exciting opportunities for supporting developers' efficiency. While the fast progress in container technologies might seem overwhelming, I hope the directions provided here gave you a modest overview of what is happening. There are a lot of open questions and interesting experiments, so I encourage you to be part of it!

## About the Author

Xavier Bruhiere is the CEO of Hive Tech. He contributes to many community projects, including Oculus Rift, Myo, Docker and Leap Motion. In his spare time he enjoys playing tennis, the violin and the guitar. You can reach him at @XavierBruhiere.


How-To Tutorials


Packt

06 May 2015

7 min read

Why Big Data in the Financial Sector?



Packt

06 May 2015

21 min read

Preparing our Solution


This article by Simon Buxton and Mat Fergusson, the authors of Microsoft Dynamics AX 2012 R3 Programming – Getting Started, covers the preparation work required before we start cutting code. Some parts of this may be skipped or reduced, depending on the scale of the development.

This article does not cover the installation and configuration of the required environments; it is assumed that this is already done. We also assume that our development environment has the AX client, management tools, and Visual Studio 2010 Professional installed. If you are using cumulative update 8 (CU8), you need to use Visual Studio 2013 Professional. If we are to use Team Foundation Server (TFS), each developer must have their own development environment. Typically, we will have a virtual server as a single box AX installation.

We will cover the following topics in this article:

* Creating the models
* Designing the technical solution

Creating the models

The models required depend on your organization's requirements. In this section, we will create the models based on our Fleet Management System, from a customer or end user perspective. Models should also include your prefix. In this case, we will use the Con prefix for Contoso. We will create our models in the USR layer, as explained later in this section.

ISV models

ISVs will normally have a base model that contains shared code between all models, and a model per add-on or vertical solution. Some care is required in ensuring that there isn't a circular dependency chain between models; that is, two models that reference each other, which would require the installation to have special instructions. By following the naming convention of prefixing elements, an ISV with the Axp prefix and an add-on named Documental can name the model AxpDocumental.
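The circular dependency concern above can be checked mechanically before models are shipped. The following is a minimal sketch in Python (not X++, and not an AX tool), assuming the references between models have been written down by hand as a dictionary. The `find_cycle` helper and every model name other than AxpDocumental are hypothetical and only serve the illustration.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of model names, or None.

    `dependencies` maps a model name to the models it references.
    This is a plain depth-first search with three visit states.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # models on the current search path, in visit order

    def visit(model):
        state[model] = IN_PROGRESS
        path.append(model)
        for ref in dependencies.get(model, ()):
            ref_state = state.get(ref, UNSEEN)
            if ref_state == IN_PROGRESS:
                # `ref` is already on the current path: we found a loop
                return path[path.index(ref):] + [ref]
            if ref_state == UNSEEN:
                cycle = visit(ref)
                if cycle:
                    return cycle
        path.pop()
        state[model] = DONE
        return None

    for model in dependencies:
        if state.get(model, UNSEEN) == UNSEEN:
            cycle = visit(model)
            if cycle:
                return cycle
    return None


# Hypothetical ISV layout: a base model plus two add-on models
healthy = {
    "AxpBase": [],
    "AxpDocumental": ["AxpBase"],
    "AxpReporting": ["AxpBase", "AxpDocumental"],
}
# Same layout, but the base model now references an add-on
broken = dict(healthy, AxpBase=["AxpDocumental"])

print(find_cycle(healthy))  # None: models can be installed in dependency order
print(find_cycle(broken))   # the loop between AxpBase and AxpDocumental
```

A check of this kind only tells us that a loop exists; deciding which reference to remove, or whether the models should be merged, remains a design decision.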
VAR models

If we are a VAR building a solution to customer-specific requirements, we will have three models: one for the actual modifications, another for changes to security, and the third for the labels. For example, if our prefix is Bcl and the customer is Contoso, we will have BclContoso, BclContosoLabels, and BclContosoSecurity. Creating security in a separate model is not mandatory, but helps when implementing projects with the Sure Step methodology because it allows security to be worked on in separate streams.

Customer or end user models

If we are a global organization, with separate Dynamics AX installations, we may decide to develop a central application, which is then installed on each site. In this case, we will have three models placed in the CUS layer. For the Contoso example, we have ConGlobalApplication, ConGlobalLabels, and ConGlobalSecurity.

The most common scenario is to host Dynamics AX centrally, therefore having one application. The same three models are required, but this time in the USR layer: ConApplication, ConLabels, and ConSecurity.

It may make sense to place distinct sets of functionality in separate models, and this would certainly help in managing separate development streams. Over time, the models tend to develop cyclic dependencies, which require them to be merged in order to ensure that a complete set of code is deployed. In our example, we will create a model for a specific functionality, our Fleet Management System, and it will make sense to hold it in a separate stream. However, it would be a particularly bad idea to hold each module's modifications in its own model.

Creating the models

Before we create the models, we must be in the correct layer; the USR layer in this case. The model creation is done by following these steps:

1. Open the Dynamics AX Client, and open the development environment (Ctrl + D).
2. From the main menu, go to Tools | Model management | Create model.
3. Complete the Create model form as shown here:

   | Field | Description |
   |---|---|
   | Model name | This is the name of the model. It can contain spaces, but it is normally the same as the display name. |
   | Model Publisher | Your organization or department. |
   | Layer | The layer the model should be created in. This should be the current layer. |
   | Version | The version of the model. This becomes part of the strong name of the model during signing. |
   | Model description | Long description of the model. It is a good idea to link this to a functional or technical design document. |

4. Leave Set as current model checked and press OK.

[Screenshot: an example of a completed Create model form]

If we have version control enabled for TFS, we will also be asked for the Model repository folder. It will suggest, in our example, C:ProjectsVCSAX6015GettingStarted<Model folder>. Replace <Model folder> with the model name.

[Screenshot: the Model repository folder with the model name filled in]

Using the preceding instructions as a guide, we will need to create the following models. Usefully, AX will remember the previous information, so we only need to populate Model name and Model display name:

* ConFleetManagement
* ConLabels
* ConFleetManagementSecurity

Designing the technical solution

In most implementations, we have several roles involved in the solution design, build, and implementation. Our role is to design and develop a technical solution to a business requirement, and as discussed earlier, we will follow the design and build of a Fleet Management System. The first steps in this are to analyze the business requirement and design a solution within Dynamics AX. This work will typically be led by a consultant, who will (in short) perform the following:

* Match the business requirements to the AX functionality. The requirement may require new functionality or an extension of existing functionality.
* Discuss the technical solution with a technical consultant/developer in order to design a solution that is feasible in AX in a suitable time frame.
* Work through the solution with the solution architect to ensure that it fits in the overall solution design.
* Create functional design documents. These will be signed by the customer stakeholders and process owners.

The consultant may propose table structures as parts of the functional design, but these are only to reinforce the requirement. The technical designer may find a more appropriate solution to this.

The process is intended to leverage the skills of all parties in the solution delivery, allowing all parties to use their skills by abstracting the solution. Here is its summary:

* The customer or process owner understands the process
* The consultant is an expert in AX and focuses on the solution, creating a functional design document (FDD)
* The solution architect validates the FDD against the solution design
* The technical architect (or lead/analyst developer) creates the technical design while validating that it fits with the overall technical solution

These roles often merge, but there should always be a separation of business requirement definition and technical design definition.

This freedom over the technical design does not mean we have total freedom over the technical solution. At all levels, it has to both match the original requirement and fit in with the overall solution. Just because it is technically cool does not mean it is appropriate. Our purpose is to create a technical solution to a business requirement.

We will evolve the design throughout this article, and the reason for each decision will be explained. It is important to understand and follow these design goals:

* Upgrades and system maintainability: Minimize the footprint on standard AX.
* Design for code reuse: This could span from creating a general framework to a useful static function on a global class.
* Design for a service-oriented architecture: Always consider that your code might be used as a service or as part of a service. This paradigm also promotes code reuse.
* Validate the design: Always validate the technical design against the original requirement; a mismatch here is a very common cause of time and cost overruns. A prototype can be useful for this.
* Use design patterns: Do not reinvent the wheel. Patterns save design time, reduce mistakes, and promote a solution that better conforms to best practice.

The technical design will include decisions on what technologies, frameworks, and patterns we will use. We may revisit some decisions later on, but we need to be sure we get the majority right the first time. One such design element is the data structures. Once we start using them in code and the UI, changing them becomes more and more difficult.

Some elements can't easily be reverted, such as whether to use table inheritance or not. Table inheritance is a little like class inheritance. For example, we may have a core vehicle table, and specialized tables that inherit its properties (fields) and methods. As a more specific example, an articulated truck will have different attributes compared to a company car.

Data structure design considerations

The data structure architecture within Dynamics AX is breathtaking. When designing the technical structure, the tables and views should be considered along with classes as part of your static structure. We are not making the classes persist in the database. The table definition may be designed on object-oriented (OO) principles, but we are using this to define physical tables that are transacted on reliably. The key concepts within this are described in the following sections.

Extended data types

In traditional database design, a field tended to be one of a limited set of primitive types, such as string, number, and so on. The extended data type (EDT) system in AX allows us to define types with extended properties. With this, we can control the following categories of properties:

* Appearance: For example, the label, help text, size (string), alignment, and other type-specific properties.
* Behavior: Direction (for example, RTL) and presence information.
* Business intelligence: Information used by the system to generate the OLAP database.
* Data: A reference table, internal ID information, or a reference form (the form used to open the record identified by the database relation).
* Relationships: For the example of the ItemId EDT, this would specify that this EDT references InventTable.ItemId. In this way, the system knows when the EDT is associated with a field on a child table and which table and record it references. Additionally, the table will often have a reference to the form that is used to edit the data, allowing the user to quickly navigate to the details form.

The specific properties aren't important for now, but understanding the concept is. Using EDTs ensures database consistency (primarily type and size) and user interface consistency (label, help text, and so on). This is done with very little effort, as we only have to change the EDT properties. Even more powerfully, changing the size of an EDT will change the size of all fields that reference it. Therefore, we will always use an EDT when creating a field, and almost always use EDTs as variable declarations and method parameters.

We can override most of the other properties on the table field, but we rarely do this. A key benefit is that we can control these properties with little effort, ensuring consistency throughout the user interface. In some cases, we need to have a minor difference; for instance, we may wish to change a label when used in a specific context. Rather than changing the field label, it is better to create a new EDT that extends the primary EDT. It is possible to change standard EDTs, but great care must be taken, as we need to know the full effects of the change.

Base enums – enumerated types

Base enums are what is more commonly known as enumerated types.
They provide a list of options that are stored as a number in the database; the user interface will always display the corresponding text. They are equivalent to integers, and can be cast between the integer and the symbol (text). Enums are great for status fields, where we need to have code written against a specific value. Writing code against number or string literals is bad practice. If the code references an option that does not exist or has been removed from the enum's definition, we will get a compilation error when the code is compiled.

An example enum is SalesStatus, which contains the following elements:

| Symbol | Label (based on en-us) | Enum value |
|---|---|---|
| None | None | 0 |
| Backorder | Open order | 1 |
| Delivered | Delivered | 2 |
| Invoiced | Invoiced | 3 |
| Canceled | Canceled | 4 |

You should always reference the symbol in the code. AX will understand that and translate it into the enum value.

Tables

The table definitions stored within AX synchronize with the business database in SQL Server; often, this is automatic. No changes should be made to the SQL Server table definition, as this will be overwritten whenever the database is synchronized. The table definition information controls both the user interface and how the physical table in SQL Server is created. This includes table properties, field definitions, and indexes.

Relationships and referential integrity constraints are not created within SQL Server; these are managed within AX. They control what happens to a child table when a record is deleted or the primary key is renamed.

The field definitions also control both the user interface and the physical field in SQL Server. These are usually set by the EDT the field is associated with.

A key differentiator for tables is that we can create methods and override table event methods such as validateField, validateWrite, modifiedField, insert, delete, and so on. This allows us to place table-level validation and events on the table centrally and not on the interface.
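Two ideas from this section, writing code against enum symbols rather than literals and keeping validation on the table rather than on the form, carry over to other languages. The sketch below is a Python analogy, not X++: the SalesStatus values are the ones listed earlier, while the `SalesOrder` class, its `validate_write` method, and the rule it enforces are hypothetical and only illustrate where such a check would live.

```python
from enum import IntEnum


class SalesStatus(IntEnum):
    """Symbols and stored integer values of the SalesStatus example."""
    NONE = 0
    BACKORDER = 1
    DELIVERED = 2
    INVOICED = 3
    CANCELED = 4


class SalesOrder:
    """Hypothetical stand-in for a table record with a central validation hook."""

    def __init__(self, order_id, status=SalesStatus.NONE):
        self.order_id = order_id
        self.status = status

    def validate_write(self):
        # Illustrative rule only: every caller (form, service, batch job)
        # goes through this one check instead of re-implementing it.
        if self.status == SalesStatus.INVOICED and not self.order_id:
            return False
        return True


order = SalesOrder("SO-001", SalesStatus.BACKORDER)

# The database would store the integer; the code refers to the symbol
print(int(order.status), order.status.name)

# Casting from the stored integer back to the symbol
print(SalesStatus(3) is SalesStatus.INVOICED)

# A misspelled symbol such as SalesStatus.INVOICD only fails when the line
# runs in Python (AttributeError); X++ rejects it at compile time.
```

The design point is the same in both languages: a literal such as `3` says nothing to the reader and survives a renumbering silently, whereas a symbol documents itself and breaks loudly when it is removed.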
In AX 2012, we can now have inheritance within tables and valid date/time states. Inheritance will allow us to have a base table, such as a vehicle table, and specialized tables that extend it, such as a bulk/loose product truck with silos and a vehicle designed to take pallets. The interface natively understands this relationship and can display records for all inherited tables in one grid control.

The valid time state provides a way to version records. For example, as data about an employee changes, we can at any time ask the system what the data was at a specific point in time.

The key consideration here in the design is to determine the structure and events we need to handle, which in turn drives the required EDTs and base enums.

Views

Views are SQL view definitions, created by constructing a query of one or more tables. They can contain aggregates and also calculated view fields, which are essentially a piece of Transact-SQL that equates to a column in the view. These are useful when flattening data from normalized tables.

The only real drawback is that they are read-only views, and when they are placed on the user interface in a grid control, the grid control becomes read-only. This means that if we add a column from a related table that is editable, it will become read-only in the grid control.

Maps

Maps provide a method of sharing code for similar tables. A good example of this is the pricing logic for sales and purchase order lines that is handled by the SalesPurchLine map. Maps contain a list of reference fields, details of how these reference fields map to the actual table fields, and methods.

This is best explained with an example, such as calculating the stocking unit quantity from the quantity ordered (for example, stored in cases and sold in each). Rather than write this code on the sales order line and the purchase order line, we can do this once using a map. On the SalesPurchLine map, there are fields for PurchSalesUnit, ItemId, and SalesPurchQty. They are mapped as follows:

| Map field | SalesLine field | PurchLine field |
|---|---|---|
| ItemId | ItemId | ItemId |
| PurchSalesUnit | SalesUnit | PurchUnit |
| SalesPurchQty | SalesQty | PurchQty |

We can create a simplified method on the map that contains the following code snippet:

    InventQty calcQtyOrdered(Qty _qtySalesPurch = realMin())
    {
        InventQty   qty;
        InventTable inventTable;
        Qty         qtySalesPurch = _qtySalesPurch;
        ;

        if (qtySalesPurch == realMin())
            qtySalesPurch = this.SalesPurchQty;

        if (!qtySalesPurch)
            return 0;

        // this is actually calling a method that should exist
        // on the actual table, e.g SalesLine
        inventTable = this.inventTable();

        qty = UnitConvert::qty(qtySalesPurch,
                               this.PurchSalesUnit,
                               inventTable.inventUnitId(),
                               this.ItemId);

        return decround(qty, InventTable::inventDecimals(this.ItemId));
    }

On the sales and purchase order line tables, we can call methods on the map. An example from the PurchLine table is as follows:

    AmountCur calcLineAmount(Qty qty = this.PurchQty)
    {
        AmountCur ret;

        if (this.LineDeliveryType != LineDeliveryType::OrderLineWithMultipleDeliveries)
        {
            ret = this.SalesPurchLine::calcLineAmount(qty);
        }
        return ret;
    }

The map will then automatically construct itself using the field mapping. Hence, we call methods on the map as if they are static methods with the :: sign. So, when called from the sales order line, this.PurchSalesUnit becomes SalesLine.SalesUnit and this.SalesPurchQty becomes SalesLine.SalesQty. This can be a useful feature for reusing code across tables that provide similar functionalities.

Classes

Class definitions within AX provide functionality similar to C++, C#, or Java classes, in that they support inheritance and encapsulation. Interfaces can also be used and implemented in a way similar to C#.
Classes are created for the following purposes:

* User interface interaction
* Table event handling
* Services
* General processing (for example data updates, batch routines, and so on)

Although it is common to use a class to handle table events, the table itself will handle the interaction with the database. Form interaction classes are not mandatory for list pages, such as Accounts receivable | Common | Customers | All customers, but should be used on data entry forms that require logic. This ensures consistency and allows easier maintenance of logic.

When designing the static (mainly table and class) structure, we should break the design down so that we can easily expose each task as a service. An example can be a class that takes a vehicle out of service. This may perform many tasks: checking whether it was planned on loads, replacing the vehicle on the basis of a rule set, changing the status of the vehicle, and so on. We may have other requirements to simply change the status of a vehicle or associate suitable vehicles with unallocated loads. In that case, we have already written the code to change the status and the code to find a suitable vehicle. If we had classes for finding a suitable vehicle, changing status, and so on, we could have reused them, be it on a form, from code, or linked to a service that could be used by a mobile application.

The point here is that we should break discrete pieces of functionality down into separate classes, as it then becomes much easier to reuse them later on.

Forms

Most forms will be built from templates, which help us provide a consistent user interface that will look and feel much like the rest of AX. This helps reduce training time, reduce user error, and improve end user acceptance.

The following templates are available:

* A list page
* A details form (master)
* A details form (transaction)
* A simple list
* A simple list, with a details section
* A table of contents
* A dialog
* A drop dialog

The list page is our main entry point to both master data (items, customers, and so on) and transactional forms, such as sales orders. The list page provides the user with a searchable list of records, with a button ribbon allowing the user to interact with the record, for example, Post sales invoice. List pages also provide key business intelligence about the current record, which means we don't have to navigate to the details form to make a decision on the record in question.

There are two types of details forms: master and transaction. Master forms are like customer and item forms, while sales order details and purchase order details are transaction forms. Details forms are designed to be opened from a list page.

Simple lists are useful for setup groups, where the form contains only a grid of a few fields. The simple list with details version contains a grid on the left and a details section that can be arranged into tabs to present many fields. Both of these form types are designed to be opened from the content area or menu.

Tables of contents are designed to be used for parameter forms. Although they may act on more than one data source (table), the tables will typically have a single record.

Dialog and drop dialog are similar in that both are designed to ask for limited information and then trigger an action. Both are usually called from another form. The difference is in how they are presented. The drop dialog will appear to drop down and be a part of the calling form, whereas a dialog appears as a pop-up window. The difference is cosmetic, but drop dialogs are often preferred as they can't fall behind the current window.
Designing test plans

There are two main types of testing: unit testing and integration (process) testing. We are more concerned with unit testing. Unit testing primarily ensures that the code performs the functions specified in the functional design. We may also have performance requirements, where we need to test the code under a simulated load. AX provides a method to do this through a test project, where we can extend the test framework to write specific test cases. These work well when simulating the load against the live hardware environment. We can use a range of performance tools to ascertain where performance bottlenecks may lie and correct them. It is better to know before "go live" that we have a bottleneck.

Even with this framework in place, manual testing is often the best method, especially since we are typically writing code based on database and UI interaction. Let's take an example of a vehicle status change requirement. In this case, we will list the conditions that allow the status change to occur, and what should happen.

| Status changed to | Condition | Result |
|---|---|---|
| Available | Status: created; Vehicle: not acquired | Error "Vehicle not yet acquired"; status remains unchanged |
| Available | Status: created; Vehicle: acquired | Success; status changed to Available |
| Not available | Status: created; Vehicle: not acquired | Error "Vehicle not yet acquired"; status remains unchanged |

We then test our code to ensure that these rules are being followed. Because we have one class that handles the status change, the form button, service call, and code call should also work. "Should" does not mean "will" of course, so each should be tested individually.

One of the biggest fears and causes of end user complaints is regression. The users involved in testing are usually key users or process owners, who are already busy with additional work brought on by an implementation. It is often their job to train their users, and "sell" the system's benefits; user buy-in is critical for successful user adoption.
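One way to guard against such regression is to capture a test plan as a repeatable, table-driven check. The following Python sketch does this for the vehicle status plan above. It is an illustration under stated assumptions, not AX test framework code: the `Vehicle` class, the `change_status` function, and the `StatusChangeError` exception are hypothetical names, and the single rule implemented (a created vehicle must be acquired before its status may change) is inferred from the three rows of the plan.

```python
class StatusChangeError(Exception):
    """Raised when a requested status change is not allowed."""


class Vehicle:
    def __init__(self, status="Created", acquired=False):
        self.status = status
        self.acquired = acquired


def change_status(vehicle, new_status):
    # One central place for the rule, so the form button, the service
    # call and any other caller behave the same way.
    if vehicle.status == "Created" and not vehicle.acquired:
        raise StatusChangeError("Vehicle not yet acquired")
    vehicle.status = new_status


# (requested status, acquired?, expected error, expected final status)
TEST_PLAN = [
    ("Available", False, "Vehicle not yet acquired", "Created"),
    ("Available", True, None, "Available"),
    ("Not available", False, "Vehicle not yet acquired", "Created"),
]


def run_test_plan():
    """Run every row of the plan and return the rows that did not match."""
    failures = []
    for requested, acquired, expected_error, expected_status in TEST_PLAN:
        vehicle = Vehicle(status="Created", acquired=acquired)
        error = None
        try:
            change_status(vehicle, requested)
        except StatusChangeError as exc:
            error = str(exc)
        if (error, vehicle.status) != (expected_error, expected_status):
            failures.append((requested, acquired, error, vehicle.status))
    return failures


print(run_test_plan())  # an empty list means every row of the plan passed
```

Keeping the plan as data means a new condition from the functional design becomes one more row, and rerunning the whole list after any change is cheap.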
There are two causes of regression: code that breaks other code, or a change to a process that is incompatible with another process. The latter is mitigated by having a solution architect or lead consultant who is responsible for the solution as a whole.

Code regression can be caused by the simplest change, and such changes are often the main cause of regression, as testing is often skipped in these cases. This is mitigated by thinking of testing as a component of the technical design, and by having good technical documentation. The risks are further reduced if the developer notes points where regression might occur as the code is being written. Since the code that might be affected is commented with the TDD or FDD reference, it should be easy to locate the test plan to check for regression.

Summary

In this article, we covered the steps that we take to start up a new project. We covered both the theory and the practical steps to be followed when starting work on a new solution. This includes creating a model and designing the technical solution.
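The vehicle status test plan from the Designing test plans section can also be written down as a small table-driven check. The following is only a sketch in Python, used here for illustration rather than X++ or the AX test framework the article refers to. The Vehicle class, the change_status function, and the rule that no status change is allowed until the vehicle is acquired are assumptions inferred from the three rows of the table, not the article's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vehicle:
    """Hypothetical stand-in for the vehicle record."""
    status: str = "created"
    acquired: bool = False


def change_status(vehicle: Vehicle, new_status: str) -> Tuple[bool, str]:
    """Try to change the status; return (success, message).

    Assumed rule, inferred from the test plan table: a vehicle that
    has not been acquired cannot change status at all.
    """
    if not vehicle.acquired:
        return False, "Vehicle not yet acquired"
    vehicle.status = new_status
    return True, "Status changed to " + new_status


# One tuple per row of the test plan:
# (requested status, vehicle acquired?, expect success?, expected final status)
TEST_PLAN = [
    ("Available", False, False, "created"),
    ("Available", True, True, "Available"),
    ("Not available", False, False, "created"),
]


def run_test_plan() -> List[str]:
    """Run every row and return a description of each row that fails."""
    failures = []
    for requested, acquired, expect_ok, expect_status in TEST_PLAN:
        vehicle = Vehicle(status="created", acquired=acquired)
        ok, message = change_status(vehicle, requested)
        if ok != expect_ok or vehicle.status != expect_status:
            failures.append(f"{requested!r} (acquired={acquired}): {message}")
    return failures


if __name__ == "__main__":
    print(run_test_plan() or "all rows pass")
```

Keeping the rows as data means a new condition in the test plan becomes one more tuple, which makes it easier to rerun the whole plan when checking for regression.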


Packt

05 May 2015

7 min read

Solving Some Not-so-common vCenter Issues


In this article by Chuck Mills, author of the book vCenter Troubleshooting, we will review some of the not-so-common vCenter issues that administrators could face while they work with the vSphere environment. The article covers the following issues and provides solutions:

- The vCenter inventory shows no objects after you log in
- You get the VPXD must be stopped to perform this operation message
- Removing the vCenter plugins when they are no longer needed

Solving the problem of no objects in vCenter

After successfully completing a fresh vSphere 5.5 installation (not an upgrade) with no error messages whatsoever, you log in to vCenter with the account you used for the installation. In this case, it is the local administrator account. Surprisingly, you are presented with an inventory of 0.

The first thing to do is make sure you have given vCenter enough time to start. Since this account was used to install vCenter, you would assume that it has been granted the rights needed to manage your vCenter Server. Yet you can log in and still receive no objects from vCenter. Then, you might try logging in with your domain administrator account. This makes you wonder, what is going on here?

After installing vCenter 5.5 using the Windows option, remember that the administrator@vsphere.local user will have administrator privileges for both the vCenter Single Sign-On Server and vCenter Server. You log in using the administrator@vsphere.local account with the password you defined during the installation of the SSO server.

vSphere assigns the administrator role, along with its permissions, to the default account administrator@vsphere.local. These privileges are given for both the vCenter Single Sign-On server and the vCenter Server system. You must log in with this account after the installation is complete.
After logging in with this account, you can configure your domain as an identity source. You can also give your domain administrator access to vCenter Server. Remember, the installation does not assign any administrator rights to the user account that was used to install vCenter. For additional information, review the Prerequisites for Installing vCenter Single Sign-On, Inventory Service, and vCenter Server document found at https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.install.doc%2FGUID-C6AF2766-1AD0-41FD-B591-75D37DDB281F.html.

Now that you understand what is going on with the vCenter account, use the following steps to enable the use of your Active Directory account for managing vCenter.

Add or verify your AD domain as an identity source using the following procedure:

1. Log in with administrator@vsphere.local.
2. Select Administration from the menu.
3. Choose Configuration under the Single Sign-On option. You will see the Single Sign-On | Configuration option only when you log in using the administrator@vsphere.local account.
4. Select the Identity Sources tab and verify that the AD domain is listed. If not, choose Active Directory (Integrated Windows Authentication) found at the top of the window.
5. Enter your Domain name and click on OK at the bottom of the window.
6. Verify that your domain was added to Identity Sources.

Add the permissions for the AD account using the following steps:

1. Click on Home at the top left of the window.
2. Select vCenter from the menu options.
3. Select vCenter Servers and then choose the vCenter Server object.
4. Select the Manage tab and then the Permissions tab found in the vCenter Object window.
5. Click on the green + icon to add permission.
6. Choose the Add button located at the bottom of the window.
7. Select the AD domain found in the drop-down option at the top of the window.
8. Choose a user or group you want to assign permission to (the account named Chuck was selected for this example).
9. Verify that the user or group is selected in the window.
10. Use the drop-down options to choose the level of permissions (verify that Propagate to children is checked).

Now, you should be able to log into vCenter with your AD account. With the permissions added to the account, you can log into vCenter using your AD credentials, and the result is very different from the earlier attempt.

Fixing the VPXD must be stopped to perform this operation message

It has been mentioned several times that the vCenter Server Appliance (VCSA) is the direction VMware is moving in when it comes to managing vCenter. As the number of administrators using it keeps increasing, the number of problems will also increase. One of the components an administrator might have problems with is the vCenter Server service (vpxd). This service should not be running during any changes to the database or the account settings. However, as with most vSphere components, there are times when something happens and you need to stop or start a service in order to fix the problem.

An administrator who works within the VCSA may encounter the VPXD must be stopped to perform this operation error. The service can be stopped using the web console, by performing the following steps:

1. Log into the console using https://ip-of-vcsa:5480.
2. Enter your username and password.
3. Choose vCenter Server after logging in.
4. Make sure the Summary tab is selected.
5. Click on the Stop button to stop the server.

This should work most of the time, but if the web console is not working, log into the VCSA directly and use the following procedure to stop the server:

1. Connect to the appliance by using an SSH client such as PuTTY or mRemote.
2. Type the command chkconfig. This will list all the services and their current status.
3. Verify that vmware-vpxd is on.
4. Stop the server by using the service vmware-vpxd stop command.

After completing your work, you can start the server using one of the following methods:

- Restart the VCSA
- Use the web console by clicking on the Start button on the vCenter Summary page
- Type service vmware-vpxd start on the SSH command line

This should fix the issues that occur when you see the VPXD must be stopped to perform this operation message.

Removing unwanted plugins in vSphere

Administrators add and remove tools from their environment based on their needs and on the life of the tool. This is no different for the vSphere environment. As the needs of the administrator change, so does the usage of the plugins used in vSphere. If you have plugins that are no longer needed, use the following procedure to remove them from your current vCenter:

1. Log into your vCenter using http://vCenter_name or IP_address/mob and enter your username and password.
2. Click on the content link under Properties.
3. Click on ExtensionManager, which is found in the VALUE column.
4. Highlight, right-click, and Copy the extension to be removed. Check out the Knowledge Base 1025360 found at http://Kb.vmware.com/kb/1025360 to get an overview of the plugins and their names.
5. Select UnregisterExtension near the bottom of the page.
6. Right-click in the Value field and Paste the plugin name into it.
7. Click on Invoke Method to remove the plugin.

This will give you the Method Invocation Result: void message, which informs you that the selected plugin has been removed. You can repeat this process for each plugin that you want to remove.

Summary

In this article, we covered some of the not-so-common challenges an administrator could encounter in the vSphere environment. It provided troubleshooting steps and solutions for the following issues:

- Seeing no objects after logging into vCenter with the account you used to install it
- Getting past the VPXD must be stopped error when performing certain tasks within vCenter
- Removing unwanted plugins from vCenter Server
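The SSH procedure for the VPXD must be stopped message includes a manual check that vmware-vpxd is listed as on in the chkconfig output. That check could be scripted along the following lines. This is a rough Python sketch, not part of the original procedure: it assumes the output has one service per line in a "name on" or "name off" layout, and the SAMPLE text is made up for illustration rather than captured from a real appliance.

```python
from typing import Dict


def service_states(chkconfig_output: str) -> Dict[str, bool]:
    """Map each service name to True (on) or False (off).

    Assumes a 'name  on|off' layout, one service per line; lines in
    any other shape are ignored rather than guessed at.
    """
    states = {}
    for line in chkconfig_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("on", "off"):
            states[parts[0]] = parts[1] == "on"
    return states


# Made-up sample output, for illustration only.
SAMPLE = """\
sshd          on
vmware-vpxd   on
example-svc   off
"""

if __name__ == "__main__":
    states = service_states(SAMPLE)
    if states.get("vmware-vpxd"):
        # The stop command itself is the one given in the article.
        print("vmware-vpxd is on; stop it with: service vmware-vpxd stop")
    else:
        print("vmware-vpxd is not listed as on")
```

In practice the text would come from running chkconfig over the SSH session; the parsing is kept separate from any command execution so that it can be checked against saved output first.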


Packt

05 May 2015

31 min read

Symmetric Messages and Asynchronous Messages (Part 1)


Packt

05 May 2015

8 min read

Installation and Upgrade


