
How-To Tutorials

7019 Articles

BLE and the Internet of Things

Packt
02 Feb 2017
11 min read
In this article by Muhammad Usama bin Aftab, the author of the book Building Bluetooth Low Energy (BLE) Systems, we take a practical tour of the world of the Internet of Things (IoT), where readers will not only learn the theoretical concepts of the Internet of Things but also work through a number of practical examples. The purpose of this article is to bridge the gap between the knowledge base and its interpretation. Much literature is available for understanding this domain, but it is difficult to find something that follows a hands-on approach to the technology. Readers will get an introduction to the Internet of Things with a special focus on Bluetooth Low Energy (BLE). It is easy to argue that the most important technology for the Internet of Things is Bluetooth Low Energy, as it is widely available throughout the world and almost every cell phone user carries it in their pocket. The article will then go beyond Bluetooth Low Energy and discuss many other technologies available for the Internet of Things. In this article we'll explore the following topics:

- Introduction to the Internet of Things
- Current statistics about IoT and how we are living in a world that is moving towards Machine to Machine (M2M) communication
- Technologies in IoT (Bluetooth Low Energy, Bluetooth beacons, Bluetooth mesh, wireless gateways, and so on)
- Typical examples of IoT devices (covering wearables, sports gadgets, autonomous vehicles, and so on)

(For more resources related to this topic, see here.)

Internet of Things

The Internet is a system of interconnected devices that uses a full stack of protocols over a number of layers. In the late 1960s, the first packet-switched network, ARPANET, was introduced by the United States Department of Defense (DOD), and it used a variety of protocols. Later, with the invention of the TCP/IP protocols, the possibilities became practically infinite. Many standards evolved over time to facilitate communication between devices over a network. Application layer protocols, routing layer protocols, access layer protocols, and physical layer protocols were designed to successfully transfer Internet packets from a source address to a destination address. Security risks were also addressed along the way, and we now live in a world where the Internet is an essential part of our lives.

The world has progressed far beyond ARPANET, and the scientific community realized that the need to connect more and more devices was inevitable. Thus came the need for more Internet addresses. Internet Protocol version 6 (IPv6) was developed to support an almost unlimited number of devices. It uses 128-bit addresses, allowing 2^128 (about 3.4 x 10^38) devices to transmit packets over the Internet. With this powerful addressing mechanism, it became possible to think beyond traditional communication over the Internet, since the availability of more addresses opened the way to connecting more and more devices. Although there are other limitations to expanding the number of connected devices, the addressing scheme removed a significant one.

Modern Day IoT

The idea of the modern day Internet of Things is not very old. Around 2013, the perception of the Internet of Things evolved, driven by the convergence of wireless technologies, an increase in the range of wireless communication, and significant advances in embedded technology.
It became possible to connect devices, buildings, light bulbs, and theoretically any device that has a power source and can be connected wirelessly. The combination of electronics, software, and network connectivity has already shown enough marvels in the computer industry over the last century, and the Internet of Things is no different. The Internet of Things is a network of connected devices that are aware of their surroundings. Those devices constantly or eventually transfer data to their neighboring devices in order to fulfil certain responsibilities. These devices can be automobiles, sensors, lights, solar panels, refrigerators, heart monitoring implants, or any day-to-day device. These things have their own dedicated software and electronics to support wireless connectivity, and they implement the protocol stack and the application-level programming needed to achieve the required functionality:

An illustration of connected devices in the Internet of Things

Real life examples of the Internet of Things

The Internet of Things is fascinatingly widespread in our surroundings, and the best way to check this is to go to a shopping mall and turn on your Bluetooth. The devices you will see are merely a drop in the bucket of the Internet of Things. Cars, watches, printers, jackets, cameras, light bulbs, street lights, and other devices that used to be simple are now connected and continuously transferring data. Keep in mind that this progress in the Internet of Things is only about three years old, and it is not improbable to expect that the adoption rate of this technology will be something we have never seen before. The last decade shows that the increase in Internet users was exponential: the number reached its first billion in 2005, its second in 2010, and its third in 2014. Currently, there are 3.4 billion Internet users in the world. Although this trend looks unrealistic, the adoption rate of the Internet of Things is even steeper. Reports say that by 2020 there will be 50 billion connected devices in the world and 90 percent of vehicles will be connected to the Internet. This expansion is projected to bring $19 trillion in profits by the same year. By the end of this year, wearables will become a $6 billion market with 171 million devices sold.

As the article suggests, we will discuss the different kinds of IoT devices available in the market today. The article will not cover them all, but enough for the reader to get an idea of the possibilities of the future. The reader will also be able to define and identify potential candidates for future IoT devices.

Wearables

The most important and widely recognized form of the Internet of Things is wearables. In the traditional definition, a wearable can be any item that can be worn, so wearable technology can range from fashion accessories to smart watches. The Apple Watch is a prime example of wearable technology. It contains fitness tracking and health-oriented sensors/apps that work with iOS and other Apple products. A competitor of the Apple Watch is the Samsung Gear S2, which provides compatibility with Android devices and fitness sensors. Likewise, there are many other manufacturers building smart watches, including Motorola, Pebble, Sony, Huawei, Asus, LG, and Tag Heuer. These devices are more than just watches, as they form a part of the Internet of Things: they can transfer data, talk to your phone, read your heart rate, and connect directly to Wi-Fi.
For example, a watch can now keep track of your steps and transfer this information to your cellphone:

Fitbit Blaze and Apple Watch

The fitness tracker

The fitness tracker is another important example of the Internet of Things, where the physical activities of an athlete are monitored and maintained. Fitness wearables are not confined to bands; there are smart shirts that monitor the fitness goals and progress of the athlete. We will discuss two examples of fitness trackers in this article: Fitbit and Athos smart apparel. The Blaze is a new product from Fitbit that resembles a smart watch. Although it resembles a smart watch, it is a fitness-first watch targeted at the fitness market. It provides step tracking, sleep monitoring, and 24/7 heart rate monitoring. Some of Fitbit's competitors, such as Garmin's vívoactive watch, provide built-in GPS capability as well. Athos apparel is another example of a fitness wearable, providing heart rate and EMG sensors. Unlike a watch-style fitness tracker, its sensors are spread across the apparel. The theoretical definition of wearables may also include augmented and virtual reality headsets and Bluetooth earphones/headphones.

Smart home devices

The evolution of the Internet of Things is transforming the way we live our daily lives as people use wearables and other Internet of Things devices every day. Another growing technology in the field of the Internet of Things is the smart home. Home automation, sometimes referred to as the smart home, results from extending the home with automated controls for things like heating, ventilation, lighting, air-conditioning, and security. This concept is fully supported by the Internet of Things, which demands the connection of devices in an environment. Although the concept of smart homes has existed for several decades, it remained a niche technology that was either too expensive to deploy or too limited in capability. In the last decade, many smart home devices have been introduced to the market by major technology companies, lowering costs and opening the doors to mass adoption.

Amazon Echo

A significant development in the world of home automation was the launch of the Amazon Echo in late 2014. The Amazon Echo is a voice-enabled device that performs tasks just by recognizing voice commands. The device responds to the name Alexa, a keyword that wakes up the device and can be followed by a command to perform specific tasks. Some basic commands that can be used to fulfil home automation tasks are:

- Alexa, play some Adele.
- Alexa, play playlist XYZ.
- Alexa, turn the bedroom lights on (Bluetooth-enabled light bulbs, for example Philips Hue, should be present in order to fulfil this command).
- Alexa, turn the heat up to 80 (a connected thermostat should be present to execute this command).
- Alexa, what is the weather?
- Alexa, what is my commute?
- Alexa, play the audiobook A Game of Thrones.
- Alexa, Wikipedia Packt Publishing.
- Alexa, how many teaspoons are in one cup?
- Alexa, set a timer for 10 minutes.

With these voice commands, Alexa is fully operable:

Amazon Echo, Amazon Tap and Amazon Dot (from left to right)

The Amazon Echo's main connectivity is through Bluetooth and Wi-Fi. Wi-Fi connectivity enables it to connect to the Internet and to other devices present on the network or worldwide. Bluetooth Low Energy, on the other hand, is used to connect to other devices in the home that are Bluetooth Low Energy capable.
For example, Philips Hue bulbs and a connected thermostat can be controlled through Bluetooth Low Energy. At Google I/O 2016, Google announced a competing smart home device that will use Google's services as a backbone to perform various tasks, similar to Alexa. Google intends to use this device to further increase its presence in the smart home market, challenging Amazon and Alexa. Amazon has also launched the Amazon Dot and the Amazon Tap. The Amazon Dot is a smaller version of the Echo that does not have speakers; external speakers can be connected to the Dot in order to get full access to Alexa. The Amazon Tap is a more affordable, wireless version of the Amazon Echo.

Wireless bulbs

The Philips Hue wireless bulb is another example of a smart home device. It is a Bluetooth Low Energy connected light bulb that gives full control to the user through a smartphone. These colored bulbs can display millions of colors and can also be controlled remotely through the away-from-home feature. The lights are even smart enough to sync with music:

Illustration of controlling Philips Hue bulbs with smartphones

Smart refrigerators

Our discussion of home automation would not be complete without discussing the kitchen and other household electronics, as several major vendors such as Samsung have begun offering smart appliances for a smarter home. The Family Hub refrigerator is a smart fridge that lets you access the Internet and run applications. It is also categorized as an Internet of Things device, as it is fully connected to the Internet and provides various controls to the user:

Samsung Family Hub refrigerator with touch controls

Summary

In this article we spoke about Internet of Things technology and how it is taking root in our everyday lives. The introduction to the Internet of Things discussed wearable devices, autonomous vehicles, smart light bulbs, and portable media streaming devices. Internet of Things technologies such as Wireless Local Area Network (WLAN), Mobile Ad-hoc Networks (MANETs), and Zigbee were discussed in order to give a better understanding of the choices available in IoT.

Resources for Article:

Further resources on this subject:

- Get Connected – Bluetooth Basics [article]
- IoT and Decision Science [article]
- Building Voice Technology on IoT Projects [article]


Introduction to Magento 2

Packt
02 Feb 2017
10 min read
In this article, Gabriel Guarino, the author of the book Magento 2 Beginners Guide, covers the following topics:

- Magento as a life style: Magento as a platform and the Magento community
- Competitors: hosted and self-hosted e-commerce platforms
- New features in Magento 2
- What do you need to get started?

(For more resources related to this topic, see here.)

Magento as a life style

Magento is an open source e-commerce platform. That is the short definition, but I would like to define Magento in light of the seven years that I have been part of the Magento ecosystem. Over those seven years, Magento has evolved to the point where it is today: a complete solution backed by people with a passion for e-commerce. If you choose Magento as the platform for your e-commerce website, you will receive updates for the platform on a regular basis. Those updates include new features, improvements, and bug fixes to enhance the overall experience of your website. As a Magento specialist, I can confirm that Magento is a platform that can be customized to fit any requirement. This means that you can add new features, include third-party libraries, and customize the default behavior of Magento. As the saying goes, the only limit is your imagination.

Whenever I have to talk about Magento, I always take some time to talk about its community. Sherrie Rohde is the Magento Community Manager, and she has shared some really interesting facts about the Magento community in 2016:

- Delivered over 725 talks on Magento or at Magento-centric events
- Produced over 100 podcast episodes around Magento
- Organized and produced conferences and meetup groups in over 34 countries
- Written over 1000 blog posts about Magento

Types of e-commerce solutions

There are two types of e-commerce solutions: hosted and self-hosted. We will analyze each e-commerce solution type, and we will cover the general information, pros, and cons of platforms from each category.

Self-hosted e-commerce solutions

A self-hosted e-commerce solution is a platform that runs on your server, which means that you can download the code, customize it based on your needs, and then deploy it on the server that you prefer. Magento is a self-hosted e-commerce solution, which means that you have absolute control over the customization and implementation of your Magento store.

WooCommerce

WooCommerce is a free shopping cart plugin for WordPress that can be used to create a full-featured e-commerce website. WooCommerce has been created following the same architecture and standards as WordPress, which means that you can customize it with themes and plugins. The plugin currently has more than 18,000,000 downloads, which represents over 39% of all online stores.

Pros:

- It can be downloaded for free
- Easy setup and configuration
- A lot of themes available
- Almost 400 extensions in the marketplace
- Support through the WooCommerce help desk

Cons:

- WooCommerce cannot be used without WordPress
- Some essential features are not included out of the box, such as PayPal as a payment method, which means that you need to buy several extensions to add those features
- Adding custom features to WooCommerce through extensions can be expensive

PrestaShop

PrestaShop is a free open source e-commerce platform. The platform is currently used by more than 250,000 online stores and is backed by a community of more than 1,000,000 members. The company behind PrestaShop provides a range of paid services, such as technical support, migration, and training, to run, manage, and maintain the store.
Pros:

- Free and open source
- 310 integrated features
- 3,500 modules and templates in the marketplace
- Downloaded over 4 million times
- 63 languages

Cons:

- As with WooCommerce, many basic features are not included by default, and adding those features through extensions is expensive
- Multiple bugs and complaints from the PrestaShop community

OpenCart

OpenCart is an open source platform for e-commerce, available under the GNU General Public License. OpenCart is a good choice for a basic e-commerce website.

Pros:

- Free and open source
- Easy learning curve
- More than 13,000 extensions available
- More than 1,500 themes available

Cons:

- Limited features
- Not ready for SEO
- No cache management page in the admin panel
- Hard to customize

Hosted e-commerce solutions

A hosted e-commerce solution is a platform that runs on the servers of the company providing the service, which means the solution is easier to set up, but there are limitations and you don't have the freedom to customize the solution according to your needs. The monthly or annual fees increase when the store attracts more traffic and has more customers and orders placed.

Shopify

Shopify is a cloud-based e-commerce platform for small and medium-sized businesses. The platform currently powers over 325,000 online stores in approximately 150 countries.

Pros:

- No technical skills required to use the platform
- Tool to import products from another platform during the sign-up process
- More than 1,500 apps and integrations
- 24/7 support through phone, chat, and e-mail

Cons:

- The source code is not provided
- Recurring fee to use the platform
- Hard to migrate from Shopify to another platform

BigCommerce

BigCommerce is one of the most popular hosted e-commerce platforms, powering more than 95,000 stores in 150 countries.

Pros:

- No technical skills required to use the platform
- More than 300 apps and integrations available
- More than 75 themes available

Cons:

- The source code is not provided
- Recurring fee to use the platform
- Hard to migrate from BigCommerce to another platform

New features in Magento 2

Magento 2 is the new generation of the platform, with new features, technologies, and improvements that make Magento one of the most robust and complete e-commerce solutions available at the moment. In this section, we will describe the main differences between Magento 1 and Magento 2.

New technologies

- Composer: This is a dependency manager for PHP. Dependencies can be declared, and Composer will manage them by installing and updating them. In Magento 2, Composer simplifies the process of installing and upgrading extensions and upgrading Magento itself.
- Varnish 4: This is an open source HTTP accelerator. Varnish stores pages and other assets in memory to reduce response time and network bandwidth consumption.
- Full Page Caching: In Magento 1, Full Page Caching was only included in the Magento Enterprise Edition. In Magento 2, Full Page Caching is included in all editions, allowing the content of static pages to be cached, increasing performance and reducing server load.
- Elasticsearch: This is a search engine that improves the search quality in Magento and provides background re-indexing and horizontal scaling.
- RequireJS: This is a library that loads JavaScript files on the fly, reducing the number of HTTP requests and improving the speed of the Magento store.
- jQuery: The frontend in Magento 1 was implemented using the Prototype JavaScript framework. In Magento 2, JavaScript code is written with jQuery.
- Knockout.js: This is an open source JavaScript library that implements the Model-View-ViewModel (MVVM) pattern, providing a great way of creating interactive frontend components.
- LESS: This is an open source CSS preprocessor that allows the developer to write styles for the store in a more maintainable and extendable way.
- Magento UI Library: This is a modular frontend library that uses a set of mix-ins for general elements and allows developers to work more efficiently on frontend tasks.

New tools

- Magento Performance Toolkit: This is a tool that allows merchants and developers to test the performance of the Magento installation and customizations.
- Magento 2 command-line tool: This is a tool to run a set of commands against the Magento installation to clear the cache, re-index the store, create database backups, enable maintenance mode, and more (a few example commands are shown after this list).
- Data Migration Tool: This tool allows developers to migrate existing data from Magento 1.x to Magento 2. The tool includes verification, progress tracking, logging, and testing functions.
- Code Migration Toolkit: This allows developers to migrate Magento 1.x extensions and customizations to Magento 2. Manual verification and updates are required in order to make Magento 1.x extensions compatible with Magento 2.
- Magento 2 Developer Documentation: One of the complaints from the Magento community was that Magento 1 didn't have enough documentation for developers. In order to resolve this problem, the Magento team created the official Magento 2 Developer Documentation, with information for developers, system administrators, designers, and QA specialists.
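To give a more concrete feel for this tooling, here is a minimal, hedged sketch of how a Magento 2 project is typically created with Composer and driven from the command-line tool. The project directory name and the need for repo.magento.com access keys are assumptions for illustration, not details taken from this article:

# Create a new Magento 2 (Community Edition) project via Composer;
# Composer will prompt for your repo.magento.com access keys.
composer create-project --repository-url=https://repo.magento.com/ magento/project-community-edition my-store

cd my-store

# A few common tasks handled by the Magento 2 command-line tool
php bin/magento cache:clean          # clear the cache
php bin/magento indexer:reindex      # re-index the store
php bin/magento setup:backup --db    # create a database backup
php bin/magento maintenance:enable   # put the store into maintenance mode

These commands only scratch the surface of what the platform's tooling offers.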
Admin panel changes

- Better UI: The admin panel has a new look and feel, which is more intuitive and easier to use. In addition, the admin panel is now responsive and can be viewed from any device at any resolution.
- Inline editing: The admin panel grids allow inline editing to manage data in a more effective way.
- Step-by-step product creation: The product add/edit page is one of the most important pages in the admin panel. The Magento team worked hard to create a different experience for adding and editing products in the Magento admin panel, and the result is that you can manage products with a step-by-step page that has the fields and import tools separated into different sections.

Frontend changes

- Integrated video on the product page: Magento 2 allows uploading a video for a product, introducing a new way of displaying products in the catalog.
- Simplified checkout: The steps on the checkout page have been reduced to allow customers to place orders in less time, increasing the conversion rate of the Magento store.
- Register section removed from the checkout page: In Magento 1, the customer had the opportunity to register from step 1 of the checkout page. This required the customer to think about their account and password before completing the order. In order to make the checkout simpler, Magento 2 allows the customer to register from the order success page without delaying the checkout process.

What do you need to get started?

Magento is a really powerful platform and there is always something new to learn. Just when you think you know everything about Magento, a new version is released with new features to discover. This makes Magento fun, and this makes Magento unique as an e-commerce platform. That being said, this book will be your guide to discovering everything you need to know to implement, manage, and maintain your first Magento store. In addition to that, I would like to highlight additional resources that will be useful in your journey of mastering Magento:

- Official Magento Blog (https://magento.com/blog): Get the latest news from the Magento team: best practices, customer stories, information related to events, and general Magento news
- Magento Resources Library (https://magento.com/resources): Videos, webinars, and publications covering useful information organized by category: order management, marketing and merchandising, international expansion, customer experience, mobile architecture and technology, performance and scalability, security, payments and fraud, retail innovation, and business flexibility
- Magento Release Information (http://devdocs.magento.com/guides/v2.1/release-notes/bk-release-notes.html): This is the place where you will get all the information about the latest Magento releases, including the highlights of each release, security enhancements, information about known issues, new features, and instructions for upgrading
- Magento Security Center (https://magento.com/security): Information about each of the Magento security patches as well as best practices and guidelines to keep your Magento store secure
- Upcoming Events and Webinars (https://magento.com/events): The official list of upcoming Magento events, including live events and webinars
- Official Magento Forums (https://community.magento.com): Get feedback from the Magento community in the official Magento Forums

Summary

In this article, we reviewed Magento 2 and the changes that have been introduced in the new version of the platform. We also analyzed the types of e-commerce solutions and the most important platforms available.

Resources for Article:

Further resources on this subject:

- Installing Magento [article]
- Magento: Payment and shipping method [article]
- Magento 2 – the New E-commerce Era [article]


What is Azure API Management?

Packt
01 Feb 2017
15 min read
In this article by Martin Abbott, Ashish Bhambhani, James Corbould, Gautam Gyanendra, Abhishek Kumar, and Mahindra Morar, authors of the book Robust Cloud Integration with Azure, we learn that it is important to know how to control and manage API assets that exist or are built as part of any enterprise development.

(For more resources related to this topic, see here.)

Typically, modern APIs are used to achieve one of the following two outcomes:

- First, to expose on-premises line of business applications, such as Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) solutions, to other applications that need to consume and interact with these enterprise assets, both on-premises and in the cloud
- Second, to provide access to the API for commercial purposes, in order to monetize access to the assets exposed by the API

The latter use case is important as it allows organizations to extend the use of their API investment, and it has led to what has become known as the API economy. The API economy provides a mechanism to gain additional value from data contained within the organizational boundary, whether that data exists in the cloud or on-premises. When providing access to information via an API, two considerations are important:

- Compliance: This ensures that access to the API and the use of the API meet requirements around internal or legal policies and procedures, and it provides reporting and auditing information
- Governance: This ensures the API is accessed and used only by those authorized to do so, in a way that is controlled and, if necessary, metered, and it provides reporting and auditing information that can be used, for example, to provide usage information for billing

In order to achieve this at scale in an organization, a tool is required that can apply both compliance and governance structures to an exposed endpoint. This is required to ensure that the usage of the information behind that endpoint is limited to those who should be allowed access, and only in a way that meets the requirements and policies of the organization. This is where API Management plays a significant role. There are two main types of tools in this landscape that broadly fall under the banner of API Management:

- API Management: These tools provide the compliance and governance control required to ensure that the exposed API is used appropriately and that data is presented in the correct format. For example, a message may be received in XML format, but the consuming service may need the data in JSON format. They can also provide monitoring tools and access control that allow organizations to gain insight into the use of the API, perhaps with a view to charging a fee for access.
- API Gateway: These tools provide the same or a similar level of management as API Management tools, but often include other functionality that allows some message mediation and message orchestration, thereby allowing more complex interactions and business processes to be modeled, exposed, and governed.

Microsoft Azure API Management falls under the first category, whilst Logic Apps provide the capabilities (and more) that API Gateways offer. Another important aspect of managing APIs is creating documentation that can be used by consumers, so they know how to interact with and get the best out of the API.
For APIs, it is generally not a case of build it and they will come, so some form of documentation that includes endpoint and operation information, along with sample code, can lead to greater uptake of the API. Azure API Management is currently offered in three tiers: Developer, Standard, and Premium. The details associated with these tiers at the time of writing are shown in the following table:

Feature | Developer | Standard | Premium
API Calls (per unit) | 32 K / day (~1 M / month) | 7 M / day (~217 M / month) | 32 M / day (~1 B / month)
Data Transfer (per unit) | 161 MB / day (~5 GB / month) | 32 GB / day (~1 TB / month) | 161 GB / day (~5 TB / month)
Cache | 10 MB | 1 GB | 5 GB
Scale-out | N/A | 4 units | Unlimited
SLA | N/A | 99.9% | 99.95%
Multi-Region Deployment | N/A | N/A | Yes
Azure Active Directory Integration | Unlimited user accounts | N/A | Unlimited user accounts
VPN | Yes | N/A | Yes

Key items of note in the table are Scale-out, Multi-Region Deployment, and Azure Active Directory Integration:

- Scale-out: This defines how many instances, or units, of the API Management instance are possible; this is configured through the Azure Classic Portal
- Multi-region deployment: When using the Premium tier, it is possible to deploy the API Management instance to many locations to provide geographically distributed load
- Azure Active Directory Integration: If an organization synchronizes an on-premises Active Directory domain to Azure, access to the API endpoints can be configured to use Azure Active Directory to provide same sign-on capabilities

The main use case for the Premium tier is an organization that has many hundreds or even thousands of APIs to expose to developers, or cases where scale and integration with line of business APIs is critical.

The anatomy of Azure API Management

To understand how to get the best out of an API, it is important to understand some terms that are used for APIs and within Azure API Management; these are described here.

API and operations

An API provides an abstraction layer through an endpoint that allows interaction with entities or processes that would otherwise be difficult to consume. Most API developers favor a RESTful approach to API applications, since this makes it easy to understand how to work with the operations the API exposes and provides scalability, modifiability, reliability, and performance. Representational State Transfer (REST) is an architectural style that was introduced by Roy Fielding in his doctoral thesis in 2000 (http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm). Typically, modern APIs are exposed using HTTP, since this makes it easier for different types of clients to interact with them, and this increased interoperability provides the greatest opportunity to offer additional value and greater adoption across different technology stacks. When building an API, a set of methods or operations is exposed that a user can interact with in a predictable way. While RESTful services do not have to use HTTP as a transfer method, nearly all modern APIs do, since the HTTP standard is well known to most developers and is simple and straightforward to use. Since the operations are called via HTTP, a distinct endpoint or Uniform Resource Identifier (URI) is required to ensure sufficient modularity of the API service. When calling an endpoint, which may, for example, represent an entity in a line of business system, HTTP verbs (GET, POST, PUT, and DELETE, for example) are used to provide a standard way of interacting with the object.
An example of how these verbs are used by a developer to interact with an entity is given in the following table:

Type | GET | POST | PUT | DELETE
Collection | Retrieve a list of entities and their URIs | Create a new entity in the collection | Replace (update) a collection | Delete the entire collection
Entity | Retrieve a specific entity and its information, usually in a particular data format | Create a new entity in the collection (not generally used) | Replace (update) an entity in the collection, or if it does not exist, create it | Delete a specific entity from a collection

When passing data to and receiving data from an API operation, the data needs to be encapsulated in a specific format. When services and entities were exposed through SOAP-based services, this data format was typically XML. For modern APIs, JavaScript Object Notation (JSON) has become the norm. JSON has become the format of choice because it has a smaller payload than XML and a smaller processing overhead, which suits the limited resources of mobile devices (often running on battery power). JavaScript (as the acronym JSON implies) also has good support for processing and generating JSON, and this suits developers, who can leverage existing toolsets and knowledge.

API operations should abstract small amounts of work to be efficient, and in order to provide scalability they should be stateless, so they can be scaled independently. Furthermore, PUT and DELETE operations must be created so that they ensure a consistent state regardless of how many times the specific operation is performed; this leads to the need for those operations to be idempotent. Idempotency describes an operation that, when performed multiple times, produces the same result on the object being operated on. This is an important concept in computing, particularly where you cannot guarantee that an operation will only be performed once, such as with interactions over the Internet. Another outcome of using a URI to expose entities is that the operation is easily modified and versioned, because any new version can simply be made available on a different URI. Because HTTP is used as a transport mechanism, endpoint calls can be cached to provide better performance, and HTTP headers can be used to provide additional information, for example for security.

By default, when an instance of API Management is provisioned, it has a single API already available, named Echo API. This has the following operations:

- Creating resource
- Modifying resource
- Removing resource
- Retrieving header only
- Retrieving resource
- Retrieving resource (cached)

In order to get some understanding of how objects are connected, this API can be used, and some information is given in the next section.
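As a quick, hedged illustration of what a call through the gateway looks like (the hostname, path, and key below are placeholders for this sketch, not values taken from this article), a subscriber could invoke one of these operations with curl, passing their subscription key in the Ocp-Apim-Subscription-Key header:

# Placeholder hostname, path, and key; replace with your own API Management values
curl -X GET "https://contoso.azure-api.net/echo/resource?param1=sample" -H "Ocp-Apim-Subscription-Key: <your-subscription-key>"

If the key is missing or invalid, the gateway rejects the request before it ever reaches the backend, which is the essence of the governance described earlier.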
Objects within API Management

Within Azure API Management, there are a number of key objects that help define a structure and provide the governance, compliance, and security artifacts required to get the best out of a deployed API, as shown in the following diagram:

As can be seen, the most important object is a PRODUCT. A product has a title and description and is used to define a set of APIs that are exposed to developers for consumption. Products can be Open or Protected, with an Open product being publicly available and a Protected product requiring a subscription once published. Groups provide a mechanism to organize the visibility of, and access to, the APIs within a product for the development community wishing to consume the exposed APIs. By default, a product has three standard groups that cannot be deleted:

- Administrators: Subscription administrators are included by default, and the members of this group manage API service instances, API creation, API policies, operations, and products
- Developers: The members of this group have authenticated access to the Developer Portal; they are the developers who have chosen to build applications that consume APIs exposed as a specific product
- Guests: Guests are able to browse products through the Developer Portal and examine documentation, and they have read-only access to information about the products

In addition to these built-in groups, it is possible to create new groups as required, including the use of groups within an Azure Active Directory tenant. When a new instance of API Management is provisioned, it has the following two products already configured:

- Starter: This product limits subscribers to a maximum of five calls per minute, up to a maximum of 100 calls per week
- Unlimited: This product has no limits on use, but subscribers can only use it with administrator approval

Both of these products are protected, meaning that they need to be subscribed to and published. They can be used to help gain some understanding of how the objects within API Management interact. These products are configured with a number of sample policies that can be used as a starting point.

Azure API Management policies

API Management policies are the mechanism used to provide governance structures around the API. They can define, for instance, the number of call requests allowed within a period, cross-origin resource sharing (CORS), or certificate authentication to a service backend. Policies are defined using XML and can be stored in source control to provide active management. Policies are discussed in greater detail later in the article.

Working with Azure API Management

Azure API Management is the outcome of Microsoft's acquisition of Apiphany, and as such it has its own management interfaces. Therefore, it has a slightly different look and feel to the standard Azure Portal content. The Developer and Publisher Portals are described in detail in this section, but first a new instance of API Management is required. Once it is created (and provisioning in the Azure infrastructure can take some time), most interactions take place through the Developer and Publisher Portals.

Policies in Azure API Management

In order to provide control over interactions with Products or APIs in Azure API Management, policies are used. Policies make it possible to change the default behavior of an API in the Product, for example, to meet the governance needs of your company or Product, and they are a series of statements executed sequentially on each request or response of an API. Three demo scenarios provide a "taster" of this powerful feature of Azure API Management.

How to use Policies in Azure API Management

Policies are created and managed through the Publisher Portal. The first step in policy creation is to determine at what scope the policy should be applied. Policies can be assigned to all Products, individual Products, the individual APIs associated with a Product, and finally the individual operations associated with an API.

Secure your API in Azure API Management

We have previously discussed how it is possible to organize APIs into Products, with those Products further refined through the use of Policies.
Access to and visibility of Products is controlled through the use of Groups and developer subscriptions for those APIs requiring subscriptions. In most enterprise scenarios where you are providing access to some line of business system on-premises, it is necessary to provide sufficient security on the API endpoint to ensure that the solution remains compliant. There are a number of ways to achieve this level of security using Azure API Management, such as using certificates, Azure Active Directory, or extending the corporate network into Microsoft Azure using a Virtual Private Network (VPN) and creating a hybrid cloud solution.

Securing your API backend with Mutual Certificates

Certificate exchange allows Azure API Management and an API to create a trust boundary based on encryption that is well understood and easy to use. In this scenario, because Azure API Management is communicating with an API that has been provided, a self-signed certificate is allowed, as the key exchange for the certificate is via a trusted party. For an in-depth discussion on how to configure Mutual Certificate authentication to secure your API, please refer to the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/).

Securing your API backend with Azure Active Directory

If an enterprise already uses Azure Active Directory to provide single or same sign-on to cloud-based services, for instance through on-premises Active Directory synchronization via ADConnect or DirSync, then this provides a good opportunity to leverage Azure Active Directory to provide a security and trust boundary to on-premises API solutions. For an in-depth discussion on how to add Azure Active Directory to an API Management instance, please see the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-protect-backend-with-aad/).

VPN connection in Azure API Management

Another way of providing a security boundary between Azure API Management and the API is the creation of a virtual private network. A VPN creates a tunnel between the corporate network edge and Azure, essentially creating a hybrid cloud solution. Azure API Management supports site-to-site VPNs, and these are created using the Classic Portal. If an organization already has an ExpressRoute circuit provisioned, this can also be used to provide connectivity via private peering. Because a VPN needs to communicate with on-premises assets, a number of firewall port exclusions need to be created to ensure traffic can flow between the Azure API Management instance and the API endpoint.

Monitoring your API

Any application tool is only as good as the insight you can gain from its operation. Azure API Management is no exception and provides a number of ways of getting information about how the APIs are being used and are performing.

Summary

API Management can be used to provide developer access to key information in your organization, information that could be sensitive or that needs to be limited in use. Through the use of Products, Policies, and Security, it is possible to ensure that firm control is maintained over the API estate. The developer experience can be tailored to provide a virtual storefront for your APIs, along with information and blogs to help drive deeper developer engagement.
Although not discussed in this article, it is also possible for developers to publish their own applications to the API Management instance for other developers to use.

Resources for Article:

Further resources on this subject:

- Creating Multitenant Applications in Azure [article]
- Building A Recommendation System with Azure [article]
- Putting Your Database at the Heart of Azure Solutions [article]


Building A Search Geo Locator with Elasticsearch and Spark

Packt
31 Jan 2017
12 min read
In this article, Alberto Paro, the author of the book Elasticsearch 5.x Cookbook - Third Edition, discusses how to use and manage Elasticsearch, covering topics such as installation/setup, mapping management, indices management, queries, aggregations/analytics, scripting, building custom plugins, and integration with Python, Java, Scala, and some big data tools such as Apache Spark and Apache Pig.

(For more resources related to this topic, see here.)

Background

Elasticsearch is a common answer to every need for search on data, and with its aggregation framework it can provide analytics in real time. Elasticsearch was one of the first pieces of software able to bring search to the big data world. Its cloud-native design, JSON as the standard format for both data and search, and its HTTP-based approach are only the solid foundations of this product. Elasticsearch solves a growing list of search, log analysis, and analytics challenges across virtually every industry. It's used by big companies such as LinkedIn, Wikipedia, Cisco, eBay, Facebook, and many others (source: https://www.elastic.co/use-cases). In this article, we will show how to easily build a simple search geolocator with Elasticsearch, using Apache Spark for ingestion.

Objective

In this article, we will develop a search geolocator application using the world geonames database. To make this happen, the following steps will be covered:

- Data collection
- Optimized index creation
- Ingestion via Apache Spark
- Searching for a location name
- Searching for a city given a location position
- Executing some analytics on the dataset

All the article code is available on GitHub at https://github.com/aparo/elasticsearch-geonames-locator. All the commands below need to be executed in the code directory on Linux/Mac OS X. The requirements are a local Elasticsearch server instance, a working local Spark installation, and SBT installed (http://www.scala-sbt.org/).

Data collection

To populate our application we need a database of geo locations. One of the most famous and widely used datasets is the GeoNames geographical database, which is available for download free of charge under a Creative Commons attribution license. It contains over 10 million geographical names and consists of over 9 million unique features, of which 2.8 million are populated places, plus 5.5 million alternate names. It can easily be downloaded from http://download.geonames.org/export/dump. The dump directory provides CSV files divided by country, but in our case we'll take the dump with all the countries, the allCountries.zip file. To download the data we can use wget:

wget http://download.geonames.org/export/dump/allCountries.zip

Then we need to unzip it and put it in the downloads folder:

unzip allCountries.zip
mv allCountries.txt downloads

The Geoname dump has the following fields: No. Attribute name Explanation 1 geonameid Unique ID for this geoname 2 name The name of the geoname 3 asciiname ASCII representation of the name 4 alternatenames Other forms of this name.
Generally in several languages 5 latitude Latitude in decimal degrees of the Geoname 6 longitude Longitude in decimal degrees of the Geoname 7 fclass Feature class see http://www.geonames.org/export/codes.html 8 fcode Feature code see http://www.geonames.org/export/codes.html 9 country ISO-3166 2-letter country code 10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code 11 admin1 Fipscode (subject to change to iso code 12 admin2 Code for the second administrative division, a county in the US 13 admin3 Code for third level administrative division 14 admin4 Code for fourth level administrative division 15 population The population of Geoname 16 elevation The elevation in meters of Geoname 17 gtopo30 Digital elevation model 18 timezone The timezone of Geoname 19 moddate The date of last change of this Geoname Table 1: Dataset characteristics Optimized Index creation Elasticsearch provides automatic schema inference for your data, but the inferred schema is not the best possible. Often you need to tune it for: Removing not-required fields Managing Geo fields. Optimizing string fields that are index twice in their tokenized and keyword version. Given the Geoname dataset, we will add a new field location that is a GeoPoint that we will use in geo searches. Another important optimization for indexing, it’s define the correct number of shards. In this case we have only 11M records, so using only 2 shards is enough. The settings for creating our optimized index with mapping and shards is the following one: { "mappings": { "geoname": { "properties": { "admin1": { "type": "keyword", "ignore_above": 256 }, "admin2": { "type": "keyword", "ignore_above": 256 }, "admin3": { "type": "keyword", "ignore_above": 256 }, "admin4": { "type": "keyword", "ignore_above": 256 }, "alternatenames": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "asciiname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "cc2": { "type": "keyword", "ignore_above": 256 }, "country": { "type": "keyword", "ignore_above": 256 }, "elevation": { "type": "long" }, "fclass": { "type": "keyword", "ignore_above": 256 }, "fcode": { "type": "keyword", "ignore_above": 256 }, "geonameid": { "type": "long" }, "gtopo30": { "type": "long" }, "latitude": { "type": "float" }, "location": { "type": "geo_point" }, "longitude": { "type": "float" }, "moddate": { "type": "date" }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "population": { "type": "long" }, "timezone": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } }, "settings": { "index": { "number_of_shards": "2", "number_of_replicas": "1" } } } We can store the above JSON in a file called settings.json and we can create an index via the curl command: curl -XPUT http://localhost:9200/geonames -d @settings.json Now our index is created and ready to receive our documents. Ingestion via Apache Spark Apache Spark is very hardy for processing CSV and manipulate the data before saving it in a storage both disk or NoSQL. Elasticsearch provides easy integration with Apache Spark allowing write Spark RDD with a single command in Elasticsearch. 
We will build a spark job called GeonameIngester that will execute the following steps: Initialize the Spark Job Parse the CSV Defining our required structures and conversions Populating our classes Writing the RDD in Elasticsearch Executing the Spark Job Initialize the Spark Job We need to import required classes: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types._ import org.elasticsearch.spark.rdd.EsSpark import scala.util.Try We define the GeonameIngester object and the SparkSession: object GeonameIngester { def main(args: Array[String]) { val sparkSession = SparkSession.builder .master("local") .appName("GeonameIngester") .getOrCreate() To easy serialize complex datatypes, we switch to use the Kryo encoder: import scala.reflect.ClassTag implicit def kryoEncoder[A](implicit ct: ClassTag[A]) = org.apache.spark.sql.Encoders.kryo[A](ct) import sparkSession.implicits._ Parse the CSV For parsing the CSV, we need to define the Geoname schema to be used to read: val geonameSchema = StructType(Array( StructField("geonameid", IntegerType, false), StructField("name", StringType, false), StructField("asciiname", StringType, true), StructField("alternatenames", StringType, true), StructField("latitude", FloatType, true), StructField("longitude", FloatType, true), StructField("fclass", StringType, true), StructField("fcode", StringType, true), StructField("country", StringType, true), StructField("cc2", StringType, true), StructField("admin1", StringType, true), StructField("admin2", StringType, true), StructField("admin3", StringType, true), StructField("admin4", StringType, true), StructField("population", DoubleType, true), // Asia population overflows Integer StructField("elevation", IntegerType, true), StructField("gtopo30", IntegerType, true), StructField("timezone", StringType, true), StructField("moddate", DateType, true))) Now we can read all the geonames from CSV via: val GEONAME_PATH = "downloads/allCountries.txt" val geonames = sparkSession.sqlContext.read .option("header", false) .option("quote", "") .option("delimiter", "t") .option("maxColumns", 22) .schema(geonameSchema) .csv(GEONAME_PATH) .cache() Defining our required structures and conversions The plain CSV data is not suitable for our advanced requirements, so we define new classes to store our Geoname data. We define a GeoPoint object to store the Geo Point location of our geoname. case class GeoPoint(lat: Double, lon: Double) We define also our Geoname class with optional and list types: case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) To reduce the boilerplate of the conversion we define an implicit method that convert a String in an Option[String] if it is empty or null. 
implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean) } } During processing, in case of the population value is null we need a function to fix this value and set it to 0: to do this we define a function to fixNullInt: def fixNullInt(value: Any): Int = { if (value == null) 0 else { Try(value.asInstanceOf[Int]).toOption.getOrElse(0) } } Populating our classes We can populate the records that we need to store in Elasticsearch via a map on geonames DataFrame. val records = geonames.map { row => val id = row.getInt(0) val lat = row.getFloat(4) val lon = row.getFloat(5) Geoname(id, row.getString(1), row.getString(2), Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil), lat, lon, GeoPoint(lat, lon), row.getString(6), row.getString(7), row.getString(8), row.getString(9), row.getString(10), row.getString(11), row.getString(12), row.getString(13), row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17), row.getDate(18).toString ) } Writing the RDD in Elasticsearch The final step is to store our new build DataFrame records in Elasticsearch via: EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" -> "geonameid")) The value “geonames/geoname” are the index/type to be used for store the records in Elasticsearch. To maintain the same ID of the geonames in both CSV and Elasticsearch we pass an additional parameter es.mapping.id that refers to where find the id to be used in Elasticsearch geonameid in the above example. Executing the Spark Job To execute a Spark job you need to build a Jar with all the required library and than to execute it on spark. The first step is done via sbt assembly command that will generate a fatJar with only the required libraries. To submit the Spark Job in the jar, we can use the spark-submit command: spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator-assembly-1.0.jar Now you need to wait (about 20 minutes on my machine) that Spark will send all the documents to Elasticsearch and that they are indexed. Searching for a location name After having indexed all the geonames, you can search for them. In case we want search for Moscow, we need a complex query because: City in geonames are entities with fclass=”P” We want skip not populated cities We sort by population descendent to have first the most populated The city name can be in name, alternatenames or asciiname field To achieve this kind of query in Elasticsearch we can use a simple Boolean with several should queries for match the names and some filter to filter out unwanted results. We can execute it via curl via: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "minimum_should_match": 1, "should": [ { "term": { "name": "moscow"}}, { "term": { "alternatenames": "moscow"}}, { "term": { "asciiname": "moscow" }} ], "filter": [ { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' We used “moscow” lowercase because it’s the standard token generate for a tokenized string (Elasticsearch text type). 
The result will be similar to this one: { "took": 14, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 9, "max_score": null, "hits": [ { "_index": "geonames", "_type": "geoname", "_id": "524901", "_score": null, "_source": { "name": "Moscow", "location": { "lat": 55.752220153808594, "lon": 37.61555862426758 }, "latitude": 55.75222, "population": 10381222, "moddate": "2016-04-13", "timezone": "Europe/Moscow", "alternatenames": [ "Gorad Maskva", "MOW", "Maeskuy", .... ], "country": "RU", "admin1": "48", "longitude": 37.61556, "admin3": null, "gtopo30": 144, "asciiname": "Moscow", "admin4": null, "elevation": 0, "admin2": null, "fcode": "PPLC", "fclass": "P", "geonameid": 524901, "cc2": null }, "sort": [ 10381222 ] }, Searching for cities given a location position We have processed the geoname so that in Elasticsearch, we were able to have a GeoPoint field. Elasticsearch GeoPoint field allows to enable search for a lot of geolocation queries. One of the most common search is to find cities near me via a Geo Distance Query. This can be achieved modifying the above search in curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "filter": [ { "geo_distance" : { "distance" : "100km", "location" : { "lat" : 55.7522201, "lon" : 36.6155586 } } }, { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' Executing an analytic on the dataset. Having indexed all the geonames, we can check the completes of our dataset and executing analytics on them. For example, it’s useful to check how many geonames there are for a single country and the feature class for every single top country to evaluate their distribution. This can be easily achieved using an Elasticsearch aggregation in a single query: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d ' { "size": 0, "aggs": { "geoname_by_country": { "terms": { "field": "country", "size": 5 }, "aggs": { "feature_by_country": { "terms": { "field": "fclass", "size": 5 } } } } } }’ The result can be will be something similar: { "took": 477, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 11301974, "max_score": 0, "hits": [ ] }, "aggregations": { "geoname_by_country": { "doc_count_error_upper_bound": 113415, "sum_other_doc_count": 6787106, "buckets": [ { "key": "US", "doc_count": 2229464, "feature_by_country": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 82076, "buckets": [ { "key": "S", "doc_count": 1140332 }, { "key": "H", "doc_count": 506875 }, { "key": "T", "doc_count": 225276 }, { "key": "P", "doc_count": 192697 }, { "key": "L", "doc_count": 79544 } ] } },…truncated… These are simple examples how to easy index and search data with Elasticsearch. Integrating Elasticsearch with Apache Spark it’s very trivial: the core of part is to design your index and your data model to efficiently use it. After having correct indexed your data to cover your use case, Elasticsearch is able to provides your result or analytics in few microseconds. Summary In this article, we learned how to easily build a simple search geolocator with Elasticsearch using Apache Spark for ingestion. Resources for Article: Further resources on this subject: Basic Operations of Elasticsearch [article] Extending ElasticSearch with Scripting [article] Integrating Elasticsearch with the Hadoop ecosystem [article]

Internet of Things Technologies

Packt
30 Jan 2017
9 min read
In this article by Rubén Oliva Ramos author of the book Internet of Things Programming with JavaScript we will understand different platform for the use of Internet of Things, and how to install Raspbian on micro SD card. (For more resources related to this topic, see here.) Technology has played a huge role in increasing efficiency in the work place, improving living conditions at home, monitoring health and environmental conditions and saving energy and natural resources. This has been made possible through continuous development of sensing and actuation devices. Due to the huge data handling requirements of these devices, the need for a more sophisticated, yet versatile data handling and storage medium, such as the Internet, has arisen. That’s why many developers are adopting the different Internet of Things platforms for prototyping that are available. There are several different prototyping platforms that one can use to add internet connectivity to his or her prototypes. It is important for a developer to understand the capabilities and limitations of the different platforms if he or she wants to make the best choice. So, here is a quick look at each platform. Arduino Arduino is a popular open-source hardware prototyping platform that is used in many devices. It comprises several boards including the UNO, Mega, YUN and DUE among several others. However, out of all the boards, only the Arduino Yun has built-in capability to connect to Wi-Fi and LAN networks. The rest of the boards rely on the shields, such as the Wi-Fi shield and Ethernet shield, to connect to the Internet. The official Arduino Ethernet shield allows you to connect to a network using an RJ45 cable. It has a Wiznet W5100 Ethernet chip that provides a network IP stack that is compatible with UDP and TCP. Using the Ethernet library provided by Arduino you will be able to connect to your network in a few simple steps. If you are thinking of connecting to the Internet wirelessly, you could consider getting the Arduino Wi-Fi shield. It is based on the HDG204 wireless LAN 802.11b/g System in-Package and has an AT32UC3 that provides a network IP stack that is compatible with UDP and TCP. Using this shield and the Wi-Fi library provided by Arduino, you can quickly connect to your prototypes to the web. The Arduino Yun is a hybrid board that has both Ethernet and Wi-Fi connectivity. It is based on the ATmega32u4 and the Atheros AR9331 which supports OpenWrt-Yun. The Atheros processor handles the WiFi and Ethernet interfaces while the ATmega32u4 handles the USB communication. So, if you are looking for a more versatile Arduino board for an Internet of Things project, this is it. There are several advantages to using the Arduino platform for internet of things projects. For starters the Arduino platform is easy to use and has a huge community that you can rely on for technical support. It also is easy to create a prototype using Arduino, since you can design a PCB based on the boards. Moreover, apart from Arduino official shields, the Arduino platform can also work with third party Wi-Fi and Ethernet shields such as the WiFi shield. Therefore, your options are limitless. On the down side, all Arduino boards, apart from Yun, need an external module so as to connect to the internet. So, you have to invest more. In addition to this, there are very many available shields that are compatible with the Arduino. This makes it difficult for you to choose. 
Also, you still need to choose the right Internet of Things platform for your project, such as Xively or EasyIoT. Raspberry Pi The Raspberry Pi is an open source prototyping platform that features credit-card sized computer boards. The boards have USB ports for a keyboard and a mouse, a HDMI port for display, an Ethernet port for network connection and an SD card to store the operating system. There are several versions of the Raspberry Pi available in the market. They include the Raspberry Pi 1 A, B, A+ and B+ and the Raspberry Pi 2 B. When using a Raspberry Pi board you can connect to the Internet either wirelessly or via an Ethernet cable. Raspberry Pi boards, except version A and A+, have an Ethernet port where you can connect an Ethernet cable. Normally, the boards gain internet connection immediately after you connect the Ethernet cable. However, your router must be configured for Dynamic Host Configuration Protocol (DHCP) for this to happen. Otherwise, you will have to set the IP address of the Raspberry Pi manually and the restart it. To connect your Raspberry Pi to the internet wirelessly, you have to use a WiFi adapter, preferably one that supports the RTL8192cu chipset. This is because Raspbian and Ocidentalis distributions have built-in support for that chip. However, there is no need to be choosy. Almost all Wi-Fi adapters in the market, including the very low cost budget adapters will work without any trouble. Using Raspberry Pi boards for IoT projects is advantageous because you don’t need extra shields or hardware to connect to the internet. Moreover, connecting to your wireless or LAN network happens automatically, so long as the router that has DHCP configured (most routers do). Also, you don’t have to worry if you are a newbie, since there is a huge Raspberry Pi community. You can get help quickly. The disadvantage of using the Raspberry Pi platform to make IoT devices is that it is not easy to use. It would take tremendous time for newbies to learn how to set up everything and code apps. Another demerit is that the Raspberry Pi boards cannot be easily integrated into a product. There are also numerous operating systems that the Pi boards can run on and it is not easy to decide on the best operating system for the device you are creating. Which Platform to Choose? The different features that come with each platform make it ideal for certain applications, but not all. Therefore, if at all you have not yet made up your mind on the platform to use, try selecting them based on usage. The Raspberry Pi is ideal for IoT applications that are server based. This is because it has high storage space and RAM and a powerful processor. Moreover, it supports many programming languages that can create Server-Side apps, such as Node.js. You can also use the Raspberry Pi in instances where you want to access web pages and view data posted on online servers, since you can connect a display to it and view the pages on the web browser. The Raspberry Pi can connect to both LAN and Wi-Fi networks. However, it is not advisable to use the raspberry Pi for projects where you would want to integrate it into a finished product or create a custom made PCB. On the other hand, Arduino comes in handy as a client. It can log data on an online server and retrieve data from the server. It is ideal for logging sensor data and controlling actuators via commands posted on the server by another client. 
However, there are instances where you can use Arduino boards for server functions such as hosting a simple web page that you can use to control your Arduino from the local network it is connected to. The Arduino platform can connect to both LAN and Wi-Fi networks. The ESP8266 has a very simple hardware architecture and is best used in client applications such as data logging and control of actuators from online server applications. You can use it as a webserver as well, but for applications or web pages that you would want to access from the local network that the module is connected to. The Spark Core platform is ideal for both server and client functions. It can be used to log sensor data onto the Spark.io cloud or receive commands from the cloud. Moreover, you don’t have to worry about getting online server space, since the Spark cloud is available for free. You can also create Graphical User Interfaces based on Node.js to visualize incoming data from sensors connected to the Spark Core and send commands to the Spark Core for activation of actuators. Setting up Raspberry Pi Zero Raspberry Pi is low-cost board dedicated for project purpose this will use a Raspberry Pi Zero board. See the following link https://www.adafruit.com/products/2816 for the Raspberry Pi Zero board and Kit. In order to make the Raspberry Pi work, we need an operating system that acts as a bridge between the hardware and the user, we will be using the Raspbian Jessy that you can download from https://www.raspberrypi.org/downloads/, in this link you will find all the information that you need to download all the software that it’s necessary to use with your Raspberry Pi to deploy Raspbian, we need a micro SD card of at least 4 GB. The kit that I used for testing the Raspberry Pi Zero includes all the necessary items for installing every thing and ready the board. Preparing the SD card The Raspberry Pi Zero only boots from a SD card, and cannot boot from an external drive or USB stick. For now it is recommended to use a Micro SD 4 GB of space. Installing the Raspian Operating System When we have the image file for preparing the SD card that has previously got from the page. Also we insert the micro SD into the adapter, download Win32DiskImager from https://sourceforge.net/projects/win32diskimager/ In the following screenshot you will see the files after downloading the folder It appears the window, open the file image and select the path you have the micro SD card and click on the Write button. After a few seconds we have here the download image and converted files into the micro SD In the next screenshot you can see the progress of the installation Summary In this article we discussed about the different platform for the use of Internet of Things like Ardunio and Raspberry Pi. We also how to setup and prepare Raspberry Pi for further use. Resources for Article: Further resources on this subject: Classes and Instances of Ember Object Model [article] Using JavaScript with HTML [article] Introducing the Ember.JS framework [article]

Working with NAV and Azure App Service

Packt
30 Jan 2017
12 min read
In this article by Stefano Demiliani, the author of the book Building ERP Solutions with Microsoft Dynamics NAV, we will experience the quick solution to solve complex technical architectural scenarios and create external applications within a blink of an eye using Microsoft Dynamics NAV. (For more resources related to this topic, see here.) The business scenario Imagine a requirement where many Microsoft Dynamics NAV instances (physically located at different places around the world) have to interact with an external application. A typical scenario could be a headquarter of a big enterprise company that has a business application (called HQAPP) that must collect data about item shipments from the ERP of the subsidiary companies around the world (Microsoft Dynamics NAV): The cloud could help us to efficiently handle this scenario. Why not place the interface layer in the Azure Cloud and use the scalability features that Azure could offer? Azure App Service could be the solution to this. We can implement an architecture like the following schema: Here, the interface layer is placed on Azure App Service. Every NAV instance has the business logic (in our scenario, a query to retrieve the desired data) exposed as an NAV Web Service. The NAV instance can have an Azure VPN in place for security. HQAPP performs a request to the interface layer in Azure App Service with the correct parameters. The cloud service then redirects the request to the correct NAV instance and retrieves the data, which in turn is forwarded to HQAPP. Azure App Service can be scaled (manually or automatically) based on the resources requested to perform the data retrieval process. Azure App Service overview Azure App Service is a PaaS service for building scalable web and mobile apps and enabling interaction with on-premises or on-cloud data. With Azure App Service, you can deploy your application to the cloud and you can quickly scale your application to handle high traffic loads and manage traffic and application availability without interacting with the underlying infrastructure. This is the main difference with Azure VM, where you can run a web application on the cloud but in a IaaS environment (you control the infrastructure like OS, configuration, installed services, and so on). Some key features of Azure App Service are as follows: Support for many languages and frameworks Global scale with high availability (scaling up and out manually or automatically) Security Visual Studio integration for creating, deploying and debugging applications Application templates and connectors available Azure App Service offers different types of resources for running a workload, which are as follows: Web Apps: This hosts websites and web applications Mobile Apps: This hosts mobile app backends API Apps: This hosts RESTful APIs Logic Apps: This automates business processes across the cloud Azure App Service has the following different service plans where you can scale from depending on your requirements in terms of resources: Free: This is ideal for testing and development, no custom domains or SSL are required, you can deploy up to 10 applications. Shared: This has a fixed per-hour charge. This is ideal for testing and development, supports for custom domains and SSL, you can deploy up to 100 applications. Basic: This has a per-hour charge based on the number of instances. It runs on a dedicated instance. This is ideal for low traffic requirements, you can deploy an unlimited number of apps. 
It supports only a single SSL certificate per plan (not ideal if you need to connect to an Azure VPN or use deployment slots). Standard: This has a per-hour charge based on the number of instances. This provides full SSL support. This provides up to 10 instances with auto-scaling, automated backups, up to five deployment slots, ideal for production environments. Premium: This has per-hour charge based on the number of instances. This provides up to 50 instances with auto-scaling, up to 20 deployment slots, different daily backups, dedicated App Service Environment. Ideal for enterprise scale and integration. Regarding the application deployment, Azure App Service supports the concept of Deployment Slot (only on the Standard and Premium tiers). Deployment Slot is a feature that permits you to have a separate instance of an application that runs on the same VM but is isolated from the other deployment slots and production slots active in the App Service. Always remember that all Deployment Slots share the same VM instance and the same server resources. Developing the solution Our solution is essentially composed of two parts: The NAV business logic The interface layer (cloud service) The following steps will help you retrieve the required data from an external application: In the NAV instances of the subsidiary companies, we need to retrieve the sales shipment's data for every item. To do so, we need to create a Query object that reads Sales Shipment Header and Sales Shipment Line and exposes them as web services (OData).The Query object will be designed as follows: For every Sales Shipment Header web service, we retrieve the corresponding Sales Shipment Lines web service that have Type as DataItem: I've changed the name of the field No. in Sales Shipment Line in DataItem as ItemNo because the default name was in used in Sales Shipment Header in DataItem. Compile and save the Query object (here, I've used Object ID as 50009 and Service Name as Item Shipments). Now, we will publish the Query object as web service in NAV, so open the Web Services page and create the following entries:    Object Type: Query    Object ID: 50009    Service Name: Item Shipments    Published: TRUE When published, NAV returns the OData service URL. This Query object must be published as web service on every NAV instances in the subsidiary companies. To develop our interface layer, we need first to download and install (if not present) the Azure SDK for Visual Studio from https://azure.microsoft.com/en-us/downloads/. After that, we can create a new Azure Cloud Service project by opening Visual Studio and navigate to File | New | Project, select the Cloud templates, and choose Azure Cloud Service. Select the project's name (here, it is NAVAzureCloudService) and click on OK. After clicking on OK, Visual Studio asks you to select a service type. Select WCF Service Web Role, as shown in the following screenshot: Visual Studio now creates a template for our solution. Now right-click on the NAVAzureCloudService project and select New Web Role Project, and in the Add New .NET Framework Role Project window, select WCF Service Web Role and give it a proper name (here, we have named it WCFServiceWebRoleNAV): Then, rename Service1.svc with a better name (here, it is NAVService.svc). Our WCF Service Web Role must have the reference to all the NAV web service URLs for the various NAV instances in our scenario and (if we want to use impersonation) the credentials to access the relative NAV instance. 
You can right-click the WCFServiceWebRoleNAV project, select Properties and then the Settings tab. Here you can add the URL for the various NAV instances and the relative web service credentials. Let's start writing our service code. We create a class called SalesShipment that defines our data model as follows: public class SalesShipment { public string No { get; set; } public string CustomerNo { get; set; } public string ItemNo { get; set; } public string Description { get; set; } public string Description2 { get; set; } public string UoM { get; set; } public decimal? Quantity { get; set; } public DateTime? ShipmentDate { get; set; } } In next step, we have to define our service contract (interface). Our service will have a single method to retrieve shipments for a NAV instance and with a shipment date filter. The service contract will be defined as follows: public interface INAVService { [OperationContract] [WebInvoke(Method = "GET", ResponseFormat = WebMessageFormat.Xml, BodyStyle = WebMessageBodyStyle.Wrapped, UriTemplate = "getShipments?instance={NAVInstanceName}&date={shipmentDateFilter}"] List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter); //Date format parameter: YYYY-MM-DD } The WCF service definition will implement the previously defined interface as follows: public class NAVService : INAVService { } The GetShipments method is implemented as follows: public List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter) { try { DataAccessLayer.DataAccessLayer DAL = new DataAccessLayer.DataAccessLayer(); List<SalesShipment> list = DAL.GetNAVShipments(NAVInstanceName, shipmentDateFilter); return list; } catch(Exception ex) { // You can handle exceptions here… throw ex; } } This method creates an instance of a DataAccessLayer class (which we will discuss in detail later) and calls a method called GetNAVShipments by passing the NAV instance name and ShipmentDateFilter. To call the NAV business logic, we need to have a reference to the NAV OData web service (only to generate a proxy class, the real service URL will be dynamically called by code) so right-click on your project (WCFServiceWebRoleNAV) and navigate to Add | Service Reference. In the Add Service Reference window, paste the OData URL that comes from NAV and when the service is discovered, give it a reference name (here, it is NAVODATAWS). Visual Studio automatically adds a service reference to your project. The DataAccessLayer class will be responsible for handling calls to the NAV OData web service. This class defines a method called GetNAVShipments with the following two parameters: NAVInstanceName: This is the name of the NAV instance to call shipmentDateFilter: This filters date for the NAV shipment lines (greater than or equal to) According to NAVInstanceName, the method retrieves from the web.config file (appSettings) the correct NAV OData URL and credentials, calls the NAV query (by passing filters), and retrieves the data as a list of SalesShipment records (our data model). 
The DataAccessLayer class is defined as follows: public List<SalesShipment> GetNAVShipments(string NAVInstanceName, string shipmentDateFilter) { try { string URL = Properties.Settings.Default[NAVInstanceName].ToString(); string WS_User = Properties.Settings.Default[NAVInstanceName + "_User"].ToString(); string WS_Pwd = Properties.Settings.Default[NAVInstanceName + "_Pwd"].ToString(); string WS_Domain = Properties.Settings.Default[NAVInstanceName + "_Domain"].ToString(); DataServiceContext context = new DataServiceContext(new Uri(URL)); NAVODATAWS.NAV NAV = new NAVODATAWS.NAV(new Uri(URL)); NAV.Credentials = new System.Net.NetworkCredential(WS_User, WS_Pwd, WS_Domain); DataServiceQuery<NAVODATAWS.ItemShipments> q = NAV.CreateQuery<NAVODATAWS.ItemShipments>("ItemShipments"); if (shipmentDateFilter != null) { string FilterValue = string.Format("Shipment_Date ge datetime'{0}'", shipmentDateFilter); q = q.AddQueryOption("$filter", FilterValue); } List<NAVODATAWS.ItemShipments> list = q.Execute().ToList(); List<SalesShipment> sslist = new List<SalesShipment>(); foreach (NAVODATAWS.ItemShipments shpt in list) { SalesShipment ss = new SalesShipment(); ss.No = shpt.No; ss.CustomerNo = shpt.Sell_to_Customer_No; ss.ItemNo = shpt.ItemNo; ss.Description = shpt.Description; ss.Description2 = shpt.Description_2; ss.UoM = shpt.Unit_of_Measure; ss.Quantity = shpt.Quantity; ss.ShipmentDate = shpt.Shipment_Date; sslist.Add(ss); } return sslist; } catch (Exception ex) { throw ex; } } The method returns a list of the SalesShipment objects. It creates an instance of the NAV OData web service, applies the OData filter to the NAV query, reads the results, and loads the list of the SalesShipment objects. Deployment to Azure App Service Now that your service is ready, you have to deploy it to the Azure App Service by performing the following steps: Right-click on the NAVAzureCloudService project and select Package… as shown in the following screenshot: In the Package Azure Application window, select Service configuration as Cloud and Build configuration as Release, and then click on Package as shown in the following screenshot: This operation creates two files in the <YourProjectName>binReleaseapp.publish folder as shown in the following screenshot: These are the packages that must be deployed to Azure. To do so, you have to log in to the Azure Portal and navigate to Cloud Services | Add from the hub menu at the left. In the next window, set the following cloud service parameters: DNS name: This depicts name of your cloud service (yourname.cloudapp.net) Subscription: This is the Azure Subscription where the cloud service will be added Resource group: This creates a new resource group for your cloud service or use existing one Location: This is the Azure location where the cloud service is to be added Finally, you can click on the Create button to create your cloud service. Now, deploy the previously created cloud packages to your cloud service that was just created. 
In the cloud services list, click on NAVAZureCloudService, and in the next window, select the desired slot (for example, Production slot) and click on Upload as shown in the following screenshot: In the Upload a package window, provide the following parameters: Storage account: This is a previously created storage account for your subscription Deployment label: This is the name of your deployment Package: Select the .cspkg file previously created for your cloud service Configuration: Select the .cspkg file previously created for your cloud service configuration You can take a look at the preceding parameters in the following screenshot: Select the Start deployment checkbox and click on the OK button at the bottom to start the deployment process to Azure. Now you can start your cloud service and manage it (swap, scaling, and so on) directly from the Azure Portal: When running, you can use your deployed service by reaching this URL: http://navazurecloudservice.cloudapp.net/NAVService.svc This is the URL that the HQAPP in our business scenario has to call for retrieving data from the various NAV instances of the subsidiary companies around the world. In this way, you have deployed a service to the cloud, you can manage the resources in a central way (via the Azure Portal), and you can easily have different environments by using slots. Summary In this article, you learned to enable NAV instances placed at different locations to interact with an external application through Azure App Service and also the features that it provides. Resources for Article: Further resources on this subject: Introduction to NAV 2017 [article] Code Analysis and Debugging Tools in Microsoft Dynamics NAV 2009 [article] Exploring Microsoft Dynamics NAV – An Introduction [article]

Creating Dynamic Maps

Packt
27 Jan 2017
15 min read
In this article by Joel Lawhead, author of the book, QGIS Python Programming Cookbook - Second Edition, we will cover the following recipes: Setting a transparent layer fill Using a filled marker symbol Rendering a single band raster using a color ramp algorithm Setting a feature's color using a column in a CSV file Creating a complex vector layer symbol Using an outline for font markers Using arrow symbols (For more resources related to this topic, see here.) Setting a transparent layer fill Sometimes, you may just want to display the outline of a polygon in a layer and have the insides of the polygon render transparently, so you can see the other features and background layers inside that space. For example, this technique is common with political boundaries. In this recipe, we will load a polygon layer onto the map, and then interactively change it to just an outline of the polygon. Getting ready Download the zipped shapefile and extract it to your qgis_data directory into a folder named ms from https://github.com/GeospatialPython/Learn/raw/master/Mississippi.zip. How to do it… In the following steps, we'll load a vector polygon layer, set up a properties dictionary to define the color and style, apply the properties to the layer's symbol, and repaint the layer. In Python Console, execute the following: Create the polygon layer: lyr = QgsVectorLayer("/qgis_data/ms/mississippi.shp", "Mississippi", "ogr") Load the layer onto the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Now, we’ll create the properties dictionary: properties = {} Next, set each property for the fill color, border color, border width, and a style of no meaning no-brush. Note that we’ll still set a fill color; we are just making it transparent: properties["color"] = '#289e26' properties["color_border"] = '#289e26' properties["width_border"] = '2' properties["style"] = 'no' Now, we create a new symbol and set its new property: sym = QgsFillSymbolV2.createSimple(properties) Next, we access the layer's renderer: renderer = lyr.rendererV2() Then, we set the renderer's symbol to the new symbol we created: renderer.setSymbol(sym) Finally, we repaint the layer to show the style updates: lyr.triggerRepaint() How it works… In this recipe, we used a simple dictionary to define our properties combined with the createSimple method of the QgsFillSymbolV2 class. Note that we could have changed the symbology of the layer before adding it to the canvas, but adding it first allows you to see the change take place interactively. Using a filled marker symbol A newer feature of QGIS is filled marker symbols. Filled marker symbols are powerful features that allow you to use other symbols, such as point markers, lines, and shapebursts as a fill pattern for a polygon. Filled marker symbols allow for an endless set of options for rendering a polygon. In this recipe, we'll do a very simple filled marker symbol that paints a polygon with stars. Getting ready Download the zipped shapefile and extract it to your qgis_data directory into a folder named ms from https://github.com/GeospatialPython/Learn/raw/master/Mississippi.zip. How to do it… A filled marker symbol requires us to first create the representative star point marker symbol. Then, we'll add that symbol to the filled marker symbol and change it with the layer's default symbol. 
Finally, we'll repaint the layer to update the symbology: First, create the layer with our polygon shapefile: lyr = QgsVectorLayer("/qgis_data/ms/mississippi.shp", "Mississippi", "ogr") Next, load the layer onto the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Now, set up the dictionary with the properties of the star marker symbol: marker_props = {} marker_props["color"] = 'red' marker_props["color_border"] = 'black' marker_props["name"] = 'star' marker_props["size"] = '3' Now, create the star marker symbol: marker = QgsMarkerSymbolV2.createSimple(marker_props) Then, we create our filled marker symbol: filled_marker = QgsPointPatternFillSymbolLayer() We need to set the horizontal and vertical spacing of the filled markers in millimeters: filled_marker.setDistanceX(4.0) filled_marker.setDistanceY(4.0) Now, we can add the simple star marker to the filled marker symbol: filled_marker.setSubSymbol(marker) Next, access the layer's renderer: renderer = lyr.rendererV2() Now, we swap the first symbol layer of the first symbol with our filled marker using zero indexes to reference them: renderer.symbols()[0].changeSymbolLayer(0, filled_marker) Finally, we repaint the layer to see the changes: lyr.triggerRepaint() Verify that the result looks similar to the following screenshot: Rendering a single band raster using a color ramp algorithm A color ramp allows you to render a raster using just a few colors to represent different ranges of cell values that have a similar meaning in order to group them. The approach that will be used in this recipe is the most common way to render elevation data. Getting ready You can download a sample DEM from https://github.com/GeospatialPython/Learn/raw/master/dem.zip, which you can unzip in a directory named rasters in your qgis_data directory. How to do it... In the following steps, we will set up objects to color a raster, create a list establishing the color ramp ranges, apply the ramp to the layer renderer, and finally, add the layer to the map. To do this, we need to perform the following: First, we import the QtGui library for color objects in Python Console: from PyQt4 import QtGui Next, we load the raster layer, as follows: lyr = QgsRasterLayer("/qgis_data/rasters/dem.asc", "DEM") Now, we create a generic raster shader object: s = QgsRasterShader() Then, we instantiate the specialized ramp shader object: c = QgsColorRampShader() We must name a type for the ramp shader. 
In this case, we use an INTERPOLATED shader: c.setColorRampType(QgsColorRampShader.INTERPOLATED) Now, we'll create a list of our color ramp definitions: i = [] Then, we populate the list with the color ramp values that correspond to the elevation value ranges: i.append(QgsColorRampShader.ColorRampItem(400, QtGui.QColor('#d7191c'), '400')) i.append(QgsColorRampShader.ColorRampItem(900, QtGui.QColor('#fdae61'), '900')) i.append(QgsColorRampShader.ColorRampItem(1500, QtGui.QColor('#ffffbf'), '1500')) i.append(QgsColorRampShader.ColorRampItem(2000, QtGui.QColor('#abdda4'), '2000')) i.append(QgsColorRampShader.ColorRampItem(2500, QtGui.QColor('#2b83ba'), '2500')) Now, we assign the color ramp to our shader: c.setColorRampItemList(i) Now, we tell the generic raster shader to use the color ramp: s.setRasterShaderFunction(c) Next, we create a raster renderer object with the shader: ps = QgsSingleBandPseudoColorRenderer(lyr.dataProvider(), 1, s) We assign the renderer to the raster layer: lyr.setRenderer(ps) Finally, we add the layer to the canvas in order to view it: QgsMapLayerRegistry.instance().addMapLayer(lyr) How it works… While it takes a stack of four objects to create a color ramp, this recipe demonstrates how flexible the PyQGIS API is. Typically, the more number of objects it takes to accomplish an operation in QGIS, the richer the API is, giving you the flexibility to make complex maps. Notice that in each ColorRampItem object, you specify a starting elevation value, the color, and a label as the string. The range for the color ramp ends at any value less than the following item. So, in this case, the first color will be assigned to the cells with a value between 400 and 899. The following screenshot shows the applied color ramp: Setting a feature's color using a column in a CSV file Comma Separated Value (CSV) files are an easy way to store basic geospatial information. But you can also store styling properties alongside the geospatial data for QGIS to use in order to dynamically style the feature data. In this recipe, we'll load some points into QGIS from a CSV file and use one of the columns to determine the color of each point. Getting ready Download the sample zipped CSV file from the following URL: https://github.com/GeospatialPython/Learn/raw/master/point_colors.csv.zip Extract it and place it in your qgis_data directory in a directory named shapes. How to do it… We'll load the CSV file into QGIS as a vector layer and create a default point symbol. Then we'll specify the property and the CSV column we want to control. Finally we'll assign the symbol to the layer and add the layer to the map: First, create the URI string needed to load the CSV: uri = "file:///qgis_data/shapes/point_colors.csv?" 
uri += "type=csv&" uri += "xField=X&yField=Y&" uri += "spatialIndex=no&" uri += "subsetIndex=no&" uri += "watchFile=no&" uri += "crs=epsg:4326" Next, create the layer using the URI string: lyr = QgsVectorLayer(uri,"Points","delimitedtext") Now, create a default symbol for the layer's geometry type: sym = QgsSymbolV2.defaultSymbol(lyr.geometryType()) Then, we access the layer's symbol layer: symLyr = sym.symbolLayer(0) Now, we perform the key step, which is to assign a symbol layer property to a CSV column: symLyr.setDataDefinedProperty("color", '"COLOR"') Then, we change the existing symbol layer with our data-driven symbol layer: lyr.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layer to the map and verify that each point has the correct color, as defined in the CSV: QgsMapLayerRegistry.instance().addMapLayers([lyr]) How it works… In this example, we pulled feature colors from the CSV, but you could control any symbol layer property in this manner. CSV files can be a simple alternative to databases for lightweight applications or for testing key parts of a large application before investing the overhead to set up a database. Creating a complex vector layer symbol The true power of QGIS symbology lies in its ability to stack multiple symbols in order to create a single complex symbol. This ability makes it possible to create virtually any type of map symbol you can imagine. In this recipe, we'll merge two symbols to create a single symbol and begin unlocking the potential of complex symbols. Getting ready For this recipe, we will need a line shapefile, which you can download and extract from https://github.com/GeospatialPython/Learn/raw/master/paths.zip. Add this shapefile to a directory named shapes in your qgis_data directory. How to do it… Using Python Console, we will create a classic railroad line symbol by placing a series of short, rotated line markers along a regular line symbol. 
To do this, we need to perform the following steps: First, we load our line shapefile: lyr = QgsVectorLayer("/qgis_data/shapes/paths.shp", "Route", "ogr") Next, we get the symbol list and reference the default symbol: symbolList = lyr.rendererV2().symbols() symbol = symbolList[0] Then,we create a shorter variable name for the symbol layer registry: symLyrReg = QgsSymbolLayerV2Registry Now, we set up the line style for a simple line using a Python dictionary: lineStyle = {'width':'0.26', 'color':'0,0,0'} Then, we create an abstract symbol layer for a simple line: symLyr1Meta = symLyrReg.instance().symbolLayerMetadata("SimpleLine") We instantiate a symbol layer from the abstract layer using the line style properties: symLyr1 = symLyr1Meta.createSymbolLayer(lineStyle) Now, we add the symbol layer to the layer's symbol: symbol.appendSymbolLayer(symLyr1) Now,in order to create the rails on the railroad, we begin building a marker line style with another Python dictionary, as follows: markerStyle = {} markerStyle['width'] = '0.26' markerStyle['color'] = '0,0,0' markerStyle['interval'] = '3' markerStyle['interval_unit'] = 'MM' markerStyle['placement'] = 'interval' markerStyle['rotate'] = '1' Then, we create the marker line abstract symbol layer for the second symbol: symLyr2Meta = symLyrReg.instance().symbolLayerMetadata("MarkerLine") We instatiate the symbol layer, as shown here: symLyr2 = symLyr2Meta.createSymbolLayer(markerStyle) Now, we must work with a subsymbol that defines the markers along the marker line: sybSym = symLyr2.subSymbol() We must delete the default subsymbol: sybSym.deleteSymbolLayer(0) Now, we set up the style for our rail marker using a dictionary: railStyle = {'size':'2', 'color':'0,0,0', 'name':'line', 'angle':'0'} Now, we repeat the process of building a symbol layer and add it to the subsymbol: railMeta = symLyrReg.instance().symbolLayerMetadata("SimpleMarker") rail = railMeta.createSymbolLayer(railStyle) sybSym.appendSymbolLayer(rail) Then, we add the subsymbol to the second symbol layer: symbol.appendSymbolLayer(symLyr2) Finally, we add the layer to the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) How it works… First, we must create a simple line symbol. The marker line, by itself, will render correctly, but the underlying simple line will be a randomly chosen color. We must also change the subsymbol of the marker line because the default subsymbol is a simple circle. Using an outline for font markers Font markers open up broad possibilities for icons, but a single-color shape can be hard to see across a varied map background. Recently, QGIS added the ability to place outlines around font marker symbols. In this recipe, we'll use font marker symbol methods to place an outline around the symbol to give it contrast and, therefore, visibility on any type of background. Getting ready Download the following zipped shapefile. Extract it and place it in a directory named ms in your qgis_data directory: https://github.com/GeospatialPython/Learn/raw/master/tourism_points.zip How to do it… This recipe will load a layer from a shapefile, set up a font marker symbol, put an outline on it, and then add it to the layer. 
We'll use a simple text character, an @ sign, as our font marker to keep things simple: First, we need to import the QtGUI library, so we can work with color objects: from PyQt4.QtGui import * Now, we create a path string to our shapefile: src = "/qgis_data/ms/tourism_points.shp" Next, we can create the layer: lyr = QgsVectorLayer(src, "Points of Interest", "ogr") Then, we can create the font marker symbol specifying the font size and color in the constructor: symLyr = QgsFontMarkerSymbolLayerV2(pointSize=16, color=QColor("cyan")) Now, we can set the font family, character, outline width, and outline color: symLyr.setFontFamily("'Arial'") symLyr.setCharacter("@") symLyr.setOutlineWidth(.5) symLyr.setOutlineColor(QColor("black")) We are now ready to assign the symbol to the layer: lyr.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layer to the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Verify that your map looks similar to the following image: How it works… We used class methods to set this symbol up, but we also could have used a property dictionary just as easily. Note that the font size and color were set in the object constructor for the font maker symbol instead of using setter methods. QgsFontMarkerSymbolLayerV2 doesn't have methods for these two properties. Using arrow symbols Line features convey location, but sometimes you also need to convey a direction along a line. QGIS recently added a symbol that does just that by turning lines into arrows. In this recipe, we'll symbolize some line features showing historical human migration routes around the world. This data requires directional arrows for us to understand it: Getting ready We will use two shapefiles in this example. One is a world boundaries shapefile and the other is a route shapefile. You can download the countries shapefile here: https://github.com/GeospatialPython/Learn/raw/master/countries.zip You can download the routes shapefile here: https://github.com/GeospatialPython/Learn/raw/master/human_migration_routes.zip Download these ZIP files and unzip the shapefiles into your qgis_data directory. How to do it… We will load the countries shapefile as a background reference layer and then, the route shapefile. Before we display the layers on the map, we'll create the arrow symbol layer, configure it, and then add it to the routes layer. Finally, we'll add the layers to the map. First, we'll create the URI strings for the paths to the two shapefiles: countries_shp = "/qgis_data/countries.shp" routes_shp = "/qgis_data/human_migration_routes.shp" Next, we'll create our countries and routes layers: countries = QgsVectorLayer(countries_shp, "Countries", "ogr") routes = QgsVectorLayer(routes_shp, "Human Migration Routes", "ogr") Now, we’ll create the arrow symbol layer: symLyr = QgsArrowSymbolLayer() Then, we’ll configure the layer. We'll use the default configuration except for two paramters--to curve the arrow and to not repeat the arrow symbol for each line segment: symLyr.setIsCurved(True) symLyr.setIsRepeated(False) Next, we add the symbol layer to the map layer: routes.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layers to the map: QgsMapLayerRegistry.instance().addMapLayers([routes,countries]) Verify that your map looks similar to the following image: How it works… The symbol calculates the arrow's direction based on the order of the feature's points. 
You may find that you need to edit the underlying feature data to produce the desired visual effect, especially when using curved arrows. You have limited control over the arc of the curve using the end points plus an optional third vertex. This symbol is one of the several new powerful visual effects added to QGIS, which would have normally been done in a vector illustration program after you produced a map. Summary In this article, weprogrammatically created dynamic maps using Python to control every aspect of the QGIS map canvas. We learnt to dynamically apply symbology from data in a CSV file. We also learnt how to use some newer QGIS custom symbology including font markers, arrow symbols, null symbols, and the powerful new 2.5D renderer for buildings. Wesaw that every aspect of QGIS is up for grabs with Python, to write your own application. Sometimes, the PyQGIS API may not directly support our application goal, but there is nearly always a way to accomplish what you set out to do with QGIS. Resources for Article: Further resources on this subject: Normal maps [article] Putting the Fun in Functional Python [article] Revisiting Linux Network Basics [article]

Common PHP Scenarios

Packt
24 Jan 2017
11 min read
Introduction In this article by Tim Butler, author of the book Nginx 1.9 Cookbook, we'll go through examples of the more common PHP scenarios and how to implement them with Nginx. PHP is a thoroughly tested product to use with Nginx because it is the most popular web-based programming language. It powers sites, such as Facebook, Wikipedia, and every WordPress-based site, and its popularity hasn't faded as other languages have grown. (For more resources related to this topic, see here.) As WordPress is the most popular of the PHP systems, I've put some additional information to help with troubleshooting. Even if you're not using WordPress, some of this information may be helpful if you run into issues with other PHP frameworks. Most of the recipes expect that you have a working understanding of the PHP systems, so not all of the setup steps for the systems will be covered. In order to keep the configurations as simple as possible, I haven't included details such as cache headers or SSL configurations in these recipes. Configuring Nginx for WordPress Covering nearly 30 percent of all websites, WordPress is certainly the Content Management System (CMS) of choice by many. Although it came from a blogging background, WordPress is a very powerful CMS for all content types and powers some of the world's busiest websites. By combing it with Nginx, you can deploy a highly scalable web platform. You can view the official WordPress documentation on Nginx at https://codex.wordpress.org/Nginx. We'll also cover some of the more complex WordPress scenarios, including multisite configurations with subdomains and directories. Let's get started. Getting ready To compile PHP code and run it via Nginx, the preferred method is via PHP-FPM, a high speed FastCGI Process Manager. We'll also need to install PHP itself and for the sake of simplicity, we'll stick with the OS supplied version. Those seeking the highest possible performance should ensure they're running PHP 7 (released December 3, 2015), which can offer a 2-3x speed improvement for WordPress. To install PHP-FPM, you should run the following on a Debian/Ubuntu system: sudo apt-get install php5-fpm For those running CentOS/RHEL, you should run the following: sudo yum install php-fpm As PHP itself is a prerequisite for the php-fpm packages, it will also be installed. Note: Other packages such as MySQL will be required if you're intending on running this on a single VPS instance. Consult the WordPress documentation for a full list of requirements. How to do it… At this instance, we're simply using a standalone WordPress site, which would be deployed in many personal and business scenarios. This is the typical deployment for WordPress. For ease of management, I've created a dedicated config file just for the WordPress site (/etc/nginx.conf.d/generic-wordpress.conf): server { listen 80; server_name wordpressdemo.nginxcookbook.com; access_log /var/log/nginx/access.log combined; location / { root /var/www/html; try_files $uri $uri/ /index.php?$args; } location ~ .php$ { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } Restart Nginx to pickup the new configuration file and then check your log files if there are any errors. If you're installing WordPress from scratch, you should see the following: You can complete the WordPress installation if you haven't already. 
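Before looking at how the configuration works, you can also verify the virtual host from a script. The following is a minimal, illustrative Python sketch (the hostname is the example one used above, so adjust it to your own server_name or IP, and it assumes the requests package is installed); it checks that Nginx answers and that PHP is actually being executed rather than returned as source:
import requests

# Hypothetical example hostname from this recipe; replace with your own.
resp = requests.get("http://wordpressdemo.nginxcookbook.com/", timeout=5)
print(resp.status_code, resp.headers.get("Content-Type"))

# If the body starts with raw "<?php", requests are not reaching PHP-FPM,
# which usually points at a missing or broken "location ~ .php$" block.
if resp.text.lstrip().startswith("<?php"):
    print("PHP is being served as plain text - check the fastcgi_pass configuration")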
How it works… For the root URL call, we have a new try_files directive, which will attempt to load the files in the order specified, but will fallback to the last parameter if they all fail. For this WordPress example, it means that any static files will be served if they exist on the system, then fallback to /index.php?args if this fails. This can also be very handy for automatic maintenance pages too. The args rewrite allows the permalinks of the site to be in a much more human form. For example, if you have a working WordPress installation, you can see links such as the one shown in the following image: Lastly, we process all PHP files via the FastCGI interface to PHP-FPM. In the preceding example, we're referencing the Ubuntu/Debian standard; if you're running CentOS/RHEL, then the path will be /var/run/php-fpm.sock. Nginx is simply proxying the connection to the PHP-FPM instance, rather than being part of Nginx itself. This separation allows for greater resource control, especially since the number of incoming requests to the webserver don't necessarily match the number of PHP requests for a typical website. There’s more… Take care when copying and pasting any configuration files. It's very easy to miss something and have one thing slightly different in your environment, which will cause issues with the website working as expected. Here's a quick lookup table of various other issues which you may come across: Error What to check 502 Bad Gateway File ownership permissions for the PHP-FPM socket file 404 File Not Found Check for the missing index index.php directive 403 Forbidden Check for the correct path in the root directive Your error log (defaults to /var/log/nginx/error.log) will generally contain a lot more detail in regard to the issue you're seeing compared with what's displayed in the browser. Make sure you check the error log if you receive any errors. Hint: Nginx does not support .htaccess files. If you see examples on the web referencing a .htaccess files, these are Apache specific. Make sure any configurations you're looking at are for Nginx. WordPress multisite with Nginx WordPress multisites (also referred to as network sites) allow you to run multiple websites from the one codebase. This can reduce the management burden of having separate WordPress installs when you have similar sites. For example, if you have a sporting site with separate news and staff for different regions, you can use a Multisite install to accomplish this. How to do it... To convert a WordPress site into a multisite, you need to add the configuration variable into your config file: define( 'WP_ALLOW_MULTISITE', true ); Under the Tools menu, you'll now see an extra menu called Network Setup. This will present you with two main options, Sub-domains and Sub-directories. This is the two different ways the multisite installation will work. The Sub-domains option have the sites separated by domain names, for example, site1.nginxcookbook.com and site2.nginxcookbook.com. The Sub-directories option mean that the sites are separated by directories, for example, www.nginxcookbook.com/site1 and www.nginxcookbook.com/site2. There's no functional difference between the two, it's simply an aesthetic choice. However, once you've made your choice, you cannot return to the previous state. Once you've made the choice, it will then provide the additional code to add to your wp-config.php file. 
Here's the code for my example instance, which is subdirectory based: define('MULTISITE', true); define('SUBDOMAIN_INSTALL', false); define('DOMAIN_CURRENT_SITE', 'wordpress.nginxcookbook.com'); define('PATH_CURRENT_SITE', '/'); define('SITE_ID_CURRENT_SITE', 1); define('BLOG_ID_CURRENT_SITE', 1); Because Nginx doesn't support .htaccess files, the second part of the WordPress instructions will not work. Instead, we need to modify the Nginx configuration to provide the rewrite rules ourselves. In the existing /etc/nginx/conf.d/wordpress.conf file, you'll need to add the following just after the location / directive: if (!-e $request_filename) { rewrite /wp-admin$ $scheme://$host$uri/ permanent; rewrite ^(/[^/]+)?(/wp-.*) $2 last; rewrite ^(/[^/]+)?(/.*.php) $2 last; } Although the if statements are normally avoided if possible, at this instance, it will ensure the subdirectory multisite configuration works as expected. If you're expecting a few thousand concurrent users on your site, then it may be worthwhile investigating the static mapping of each site. There are plugins to assist with the map generations for this, but they are still more complex compared to the if statement. Subdomains If you've selected subdomains, your code to put in wp-config.php will look like this: define('MULTISITE', true); define('SUBDOMAIN_INSTALL', true); define('DOMAIN_CURRENT_SITE', 'wordpressdemo.nginxcookbook.com'); define('PATH_CURRENT_SITE', '/'); define('SITE_ID_CURRENT_SITE', 1); define('BLOG_ID_CURRENT_SITE', 1); You'll also need to modify the Nginx config as well to add the wildcard in for the server name: server_name *.wordpressdemo.nginxcookbook.com wordpressdemo.nginxcookbook.com; You can now add in the additional sites such as site1.wordpressdemo.nginxcookbook.com and there won't be any changes required for Nginx. See also Nginx recipe page: https://www.nginx.com/resources/wiki/start/topics/recipes/wordpress/ WordPress Codex page: https://codex.wordpress.org/Nginx Running Drupal using Nginx With version 8 recently released and a community of over 1 million supporters, Drupal remains a popular choice when it comes to a highly flexible and functional CMS platform. Version 8 has over 200 new features compared to version 7, aimed at improving both the usability and manageability of the system. This cookbook will be using version 8.0.5. Getting ready This example assumes you already have a working instance of Drupal or are familiar with the installation process. You can also follow the installation guide available at https://www.drupal.org/documentation/install. How to do it... This recipe is for a basic Drupal configuration, with the Drupal files located in /var/www/vhosts/drupal. Here's the configuration to use: server { listen 80; server_name drupal.nginxcookbook.com; access_log /var/log/nginx/drupal.access.log combined; index index.php; root /var/www/vhosts/drupal/; location / { try_files $uri $uri/ /index.php?$args; } location ~ (^|/). { return 403; } location ~ /vendor/.*.php$ { deny all; return 404; } location ~ .php$|^/update.php { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_split_path_info ^(.+?.php)(|/.*)$; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } How it works… Based on a simple PHP-FPM structure, we make a few key changes specific for the Drupal environment. The first change is as follows: location ~ (^|/). 
{ return 403; } We put a block in for any files beginning with a dot, which are normally hidden and/or system files. This is to prevent accidental information leakage: location ~ /vendor/.*.php$ { deny all; return 404; } Any PHP file within the vendor directory is also blocked, as they shouldn't be called directly. Blocking the PHP files limits any potential exploit opportunity which could be discovered in third-party code. Lastly, Drupal 8 changed the way the PHP functions are called for updates, which causes any old configuration to break. The location directive for the PHP files looks like this: location ~ .php$|^/update.php { This is to allow the distinct pattern that Drupal uses, where the PHP filename could be midway through the URI. We also modify how the FastCGI process splits the string, so that we ensure we always get the correct answer: fastcgi_split_path_info ^(.+?.php)(|/.*)$; See also Nginx Recipe: https://www.nginx.com/resources/wiki/start/topics/recipes/drupal/ Using Nginx with MediaWiki MediaWiki, most recognized by its use with Wikipedia, is the most popular open source wiki platform available. With features heavily focused on the ease of editing and sharing content, MediaWiki makes a great system to store information you want to continually edit: Getting ready This example assumes you already have a working instance of MediaWiki or are familiar with the installation process. For those unfamiliar with the process, it's available online at https://www.mediawiki.org/wiki/Manual:Installation_guide. How to do it... The basic Nginx configuration for MediaWiki is very similar to many other PHP platforms. It has a flat directory structure which easily runs with basic system resources. Here's the configuration: server { listen 80; server_name mediawiki.nginxcookbook.com; access_log /var/log/nginx/mediawiki.access.log combined; index index.php; root /var/www/vhosts/mediawiki/; location / { try_files $uri $uri/ /index.php?$args; } location ~ .php$ { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } The default installation doesn't use any rewrite rules, which means you'll get URLs such as index.php?title=Main_Page instead of the neater (and more readable) /wiki/Main_Page. To enable this, we need to edit the LocalSettings.php file and add the following lines: $wgArticlePath = "/wiki/$1"; $wgUsePathInfo = TRUE; This allows the URLs to be rewritten in a much neater format. See also NGINX Recipe: https://www.nginx.com/resources/wiki/start/topics/recipes/mediawiki/ Summary In this article we learned common PHP scenarios and how to configure them with Nginx. The first recipe talks about how to configure Nginx for WordPress. Then we learned how to set up a WordPress multisite. In third recipe we discussed how to configure and run Drupal using Nginx. In the last recipe we learned how to configure Nginx for MediaWiki. Resources for Article: Further resources on this subject: A Configuration Guide [article] Nginx service [article] Getting Started with Nginx [article]

Packt
24 Jan 2017
17 min read

Understanding Container Scenarios and Overview of Docker

Docker is one of the most successful recent open source projects; it provides the packaging, shipping, and running of any application as lightweight containers. We can compare Docker containers to shipping containers, as they provide a standard, consistent way of shipping any application. Docker is a fairly new project, and with the help of this article it will be easy to troubleshoot some of the common problems which Docker users face while installing and using Docker containers. In this article by Rajdeep Dua, Vaibhav Kohli, and John Wooten, authors of the book Troubleshooting Docker, the emphasis will be on the following topics:
Decoding containers
Diving into Docker
Advantages of Docker containers
Docker lifecycle
Docker design patterns
Unikernels
(For more resources related to this topic, see here.)
Decoding containers
Containerization is an alternative to virtual machines and involves encapsulating applications and providing them with their own operating environment. The basic foundation for containers is Linux Containers (LXC), a user-space interface for the Linux kernel's containment features. With the help of a powerful API and simple tools, it lets Linux users create and manage application containers. LXC containers sit in between chroot and a full-fledged virtual machine. Another key difference between containerization and traditional hypervisors is that containers share the Linux kernel used by the operating system running on the host machine, so multiple containers running on the same machine use the same Linux kernel. This gives the advantage of being fast, with almost zero performance overhead compared to VMs. The major use cases of containers are listed in the following sections.
OS containers
OS containers can easily be imagined as virtual machines (VMs), but unlike a VM they share the kernel of the host operating system while providing user-space isolation. Similar to a VM, dedicated resources can be assigned to containers, and we can install, configure, and run different applications and libraries, just as you would on any VM. OS containers are helpful in scalability testing, where a fleet of containers can be deployed easily with different flavors of distros, which is far less expensive than deploying VMs. Containers are created from templates or images that determine the structure and contents of the container. This allows us to create containers with an identical environment, the same package versions, and the same configuration across all containers, which is mostly used for development environment setups. There are various container technologies, such as LXC, OpenVZ, Docker, and BSD jails, which are suitable for OS containers. Figure 1: OS based container
Application containers
Application containers are designed to run a single service in the package, while OS containers, as explained previously, can support multiple processes. Application containers have been attracting a lot of attention since the launch of Docker and Rocket. Whenever a container is launched, it runs a single process. This process runs an application process, whereas in the case of OS containers multiple services run on the same OS. Containers usually have a layered approach, as in the case of Docker containers, which helps in reducing duplication and increasing re-use. Containers can be started with a base image common to all components, and then we can go on adding layers in the file system that are specific to each component. The layered file system helps to roll back changes, as we can simply switch to old layers if required.
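The layering just described is easiest to see in a Dockerfile, where each instruction produces a new read-only layer on top of the previous one. The following minimal sketch is only an illustration; the base image, package, and paths are example choices, not taken from the article:

# Base layer shared by every image built from this file
FROM ubuntu:16.04

# One layer containing the web server package
RUN apt-get update && apt-get -y install nginx

# One layer containing only the static content
COPY ./site /usr/share/nginx/html

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Because the layers are cached independently, changing only the site content rebuilds just the final COPY layer, and a bad change can be discarded by rebuilding from an earlier layer.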
The run command which is specified in Dockerfile adds a new layer for the container. The main purpose of application containers is to package different component of the application in separate container. The different component of the application which are packaged separately in container then interact with help of API's and services. The distributed multi-component system deployment is the basic implementation of micro-service architecture. In the preceding approach developer gets the freedom to package the application as per his requirement and IT team gets the privilege to deploy the container on multiple platforms in order to scale the system both horizontally as well as vertically. Hypervisor is virtual machine monitor (VMM), used to allow multiple operation system to run and share the hardware resources from the host. Each virtual machine is termed as guest machine. The following simple example explains the difference between application container and OS containers: Figure 2: Docker layers Let's consider the example of web three-tier architecture we have a database tier such as MySQL, Nginx for load balancer and application tier as Node.js: Figure 3: OS container In case of OS container we can pick up by default Ubuntu as the base container and install services MySQL, Nginx, Node.js using Dockerfile. This type of packaging is good for testing or for development setup where all the services are packaged together and can be shipped and shared across developer's. But deploying this architecture for production cannot be done with OS containers as there is no consideration of data scalability and isolation. Application containers helps to meet this use case as we can scale the required component by deploying more application specific containers and it also helps to meet load-balancing and recovery use-case. For the preceding three-tier architecture each of the services will be packaged into separate containers in order to fulfill the architecture deployment use-case. Figure 4: Application containers scaled up Main difference between OS and application containers are: OS container Application container Meant to run multiple services on same OS container Meant to run single service Natively, No layered filesystem Layered filesystem Example: LXC, OpenVZ, BSD Jails Example: Docker, Rocket Diving into Docker Docker is a container implementation that has gathered enormous interest in recent years. It neatly bundles various Linux Kernel features and services like namespaces, cgroups, SELinux, and AppArmor profiles and so on with Union files systems like AUFS, BTRFS to make modular images. These images provides highly configurable virtualized environment for applications and follows write-once-run-anywhere principle. Application can be as simple as running a process to a highly scalable and distributed processes working together. Docker is getting a lot of traction in industry, because of its performance savvy, and universal replicability architecture, meanwhile providing the following four cornerstones of modern application development: Autonomy Decentralization Parallelism Isolation Furthermore, wide-scale adaptation of Thoughtworks's micro services architecture or Lots of Small Applications (LOSA) is further bringing potential in Docker technology. As a result, big companies like Google, VMware, and Microsoft have already ported Docker to their infrastructure, and the momentum is continued by the launch of myriad of Docker startups namely Tutum, Flocker, Giantswarm and so on. 
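Returning to the three-tier example above, a minimal sketch of running each tier as its own application container might look like the following shell commands. The network name, image names, and password are illustrative assumptions only: my-node-app stands for an image you would build yourself, and the Nginx container would still need its own upstream configuration to actually balance load.

# A user-defined network so the tiers can reach each other by name
docker network create demo-net

# Data tier
docker run -d --name db --net demo-net -e MYSQL_ROOT_PASSWORD=example mysql:5.7

# Application tier (scale it by starting more containers from the same image)
docker run -d --name app1 --net demo-net my-node-app
docker run -d --name app2 --net demo-net my-node-app

# Load-balancer tier, the only tier published to the outside world
docker run -d --name lb --net demo-net -p 80:80 nginx

Each tier can now be scaled, upgraded, or recovered independently, which is exactly the property the application container approach is after.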
Since Docker containers replicate their behavior anywhere, be it your development machine, a bare-metal server, a virtual machine, or a datacenter, application designers can focus their attention on development, while the operational semantics are left to DevOps. This makes the team workflow modular, efficient, and productive. Docker is not to be confused with a VM, even though they are both virtualization technologies. Docker shares the host OS while providing a sufficient level of isolation and security to the applications running in containers; a VM completely abstracts out the OS and gives strong isolation and security guarantees. But Docker's resource footprint is minuscule in comparison to a VM, and hence it is preferred for economy and performance. However, it still cannot completely replace VMs, and is therefore complementary to VM technology: Figure 5: VM and Docker architecture
Advantages of Docker containers
Listed below are some of the advantages of using Docker containers in a microservice architecture:
Rapid application deployment: With a minimal runtime, containers can be deployed quickly because of their reduced size, as only the application is packaged.
Portability: An application with its operating environment (dependencies) can be bundled together into a single Docker container that is independent of the OS version or deployment model. The Docker container can easily be transferred to another machine that runs Docker and executed there without any compatibility issues. Windows support is also going to be part of future Docker releases.
Easily sharable: Pre-built container images can easily be shared with the help of public repositories, as well as hosted private repositories for internal use.
Lightweight footprint: Docker images are very small and have a minimal footprint for deploying new applications with the help of containers.
Reusability: Successive versions of Docker containers can easily be built, as well as rolled back to previous versions whenever required. This makes them noticeably lightweight, as components from the pre-existing layers can be reused.
Docker lifecycle
These are some of the basic steps involved in the lifecycle of a Docker container:
Build the Docker image with the help of a Dockerfile, which contains all the commands required for packaging. It can be run in the following way:
docker build
A tag name can be added in the following way:
docker build -t username/my-imagename
If the Dockerfile exists at a different path, then the docker build command can be executed by providing the -f flag:
docker build -t username/my-imagename -f /path/Dockerfile
After the image creation, docker run can be used to deploy the container. The running containers can be checked with the help of the docker ps command, which lists the currently active containers. There are two more commands to be discussed:
docker pause: This command uses the cgroups freezer to suspend all the processes running in a container; internally it uses the SIGSTOP signal. Using this command, processes can easily be suspended and later resumed (with docker unpause) whenever required.
docker start: This command is used to start a stopped container.
After the usage of a container is done, it can either be stopped or killed:
docker stop: This command will gracefully stop the running container by sending SIGTERM, followed by SIGKILL. In this case, the container can still be listed by using the docker ps -a command.
docker kill: This command will kill the running container by sending SIGKILL to the main process running inside the container.
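Put together, a typical terminal session over a container's lifecycle might look like the following sketch; the image and container names used here are only illustrative examples:

# Build an image from the Dockerfile in the current directory
docker build -t username/my-imagename .

# Deploy a container from the image, detached and with a name
docker run -d --name demo username/my-imagename

# List the currently active containers
docker ps

# Freeze all processes in the container (SIGSTOP), then resume them
docker pause demo
docker unpause demo

# Graceful stop (SIGTERM, then SIGKILL), restart, and forced kill
docker stop demo
docker start demo
docker kill demo

# Stopped and killed containers are still listed with -a
docker ps -a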
If there are some changes made to the container while it is running, which are likely to be preserved, container can be converted back to image by using the Docker commit after container has been stopped. Figure 6: Docker lifecycle Docker design patterns Following listed are some of the Docker design patterns with examples. Dockerfile is the base structure from which we define a Docker image it contains all the commands to assemble an image. Using Docker build command we can create automated build that executes all the previously mentioned command-line instructions to create an image: $ Docker build Sending build context to Docker daemon 6.51 MB ... Design patterns listed further can help in creating Docker images that persist in volumes and provides various other flexibility so that they can be re-created or replaced easily at any time. The base image sharing For creating a web-based application or blog we can create a base image which can be shared and help to deploy the application with ease. This patterns helps out as it tries to package all the required services on top of one base image, so that this web application blog image can be re-used anywhere: FROM debian:wheezy RUN apt-get update RUN apt-get -y install ruby ruby-dev build-essential git # For debugging RUN apt-get install -y gdb strace # Set up my user RUN useradd vkohli -u 1000 -s /bin/bash --no-create-home RUN gem install -n /usr/bin bundler RUN gem install -n /usr/bin rake WORKDIR /home/vkohli/ ENV HOME /home/vkohli VOLUME ["/home"] USER vkohli EXPOSE 8080 The preceding Dockerfile shows the standard way of creating an application-based image. Docker image is a zipped file which is a snapshot of all the configuration parameters as well as the changes made in the base image (Kernel of the OS). It installs some specific tools (Ruby tools rake and bundler) on top of Debian base image. It creates a new user adds it to the container image and specifies the working directory by mounting /home directory from the host which is explained in detail in next section. Shared volume Sharing the volume at host level allows other containers to pick up the shared content required by them. This helps in faster rebuilding of Docker image or add/modify/remove dependencies. Example if we are creating the homepage deployment of the previously mentioned blog only directory required to be shared is /home/vkohli/src/repos/homepage directory with this web app container through the Dockerfile in the following way: FROM vkohli/devbase WORKDIR /home/vkohli/src/repos/homepage ENTRYPOINT bin/homepage web For creating the dev version of the blog we can share the folder /home/vkohli/src/repos/blog where all the related developer files can reside. And for creating the dev-version image we can take the base image from pre-created devbase: FROM vkohli/devbase WORKDIR / USER root # For Graphivz integration RUN apt-get update RUN apt-get -y install graphviz xsltproc imagemagick USER vkohli WORKDIR /home/vkohli/src/repos/blog ENTRYPOINT bundle exec rackup -p 8080 Dev-tools container For development purpose we have separate dependencies in dev and production environment which easily gets co-mingled at some point. Containers can be helpful in differentiating the dependencies by packaging them separately. 
As shown in the following example we can derive the dev tools container image from the base image and install development dependencies on top of it even allowing ssh connection so that we to work upon the code: FROM vkohli/devbase RUN apt-get update RUN apt-get -y install openssh-server emacs23-nox htop screen # For debugging RUN apt-get -y install sudo wget curl telnet tcpdump # For 32-bit experiments RUN apt-get -y install gcc-multilib # Man pages and "most" viewer: RUN apt-get install -y man most RUN mkdir /var/run/sshd ENTRYPOINT /usr/sbin/sshd -D VOLUME ["/home"] EXPOSE 22 EXPOSE 8080 As can be seen previously basic tools such as wget, curl, tcpdump are installed which are required during development. Even SSHD service is installed which allows to ssh connection into the dev container. Test environment container Testing the code in different environment always eases the process and helps to find more bugs in isolation. We can create a ruby environment in separate container to spawn a new ruby shell and use it to test the code base: FROM vkohli/devbase RUN apt-get update RUN apt-get -y install ruby1.8 git ruby1.8-dev In the preceding Dockerfile we are using the base image as devbase and with help of just one command Docker run can easily create a new environment by using the image created from this Dockerfile to test the code. The build container We have built steps involved in the application that are sometimes expensive. In order to overcome this we can create a separate a build container which can use the dependencies needed during build process. Following Dockerfile can be used to run a separate build process: FROM sampleapp RUN apt-get update RUN apt-get install -y build-essential [assorted dev packages for libraries] VOLUME ["/build"] WORKDIR /build CMD ["bundler", "install","--path","vendor","--standalone"] /build directory is the shared directory that can be used to provide the compiled binaries also we can mount the /build/source directory in the container to provide updated dependencies. Thus by using build container we can decouple the build process and final packaging part in separate containers. It still encapsulates both the process and dependencies by breaking the previous process in separate containers. The installation container The purpose of this container is to package the installation steps in separate container. Basically, in order to provide deployment of container in production environment. Sample Dockerfile to package the installation script inside Docker image as follows: ADD installer /installer CMD /installer.sh The installer.sh can contain the specific installation command to deploy container in production environment and also to provide the proxy setup with DNS entry in order to have the cohesive environment deployed. Service-in-a-box container In order to deploy the complete application in a container we can bundle multiple services to provide the complete deployment container. In this case we bundle web app, API service and database together in one container. 
It helps to ease the pain of inter-linking various separate containers: services: web: git_url: git@github.com:vkohli/sampleapp.git git_branch: test command: rackup -p 3000 build_command: rake db:migrate deploy_command: rake db:migrate log_folder: /usr/src/app/log ports: ["3000:80:443", "4000"] volumes: ["/tmp:/tmp/mnt_folder"] health: default api: image: quay.io/john/node command: node test.js ports: ["1337:8080"] requires: ["web"] databases: - "mysql" - "redis" Infrastructure container As we have talked about the container usage in development environment, there is one big category missing the usage of container for infrastructure services such as proxy setup which provides a cohesive environment in order to provide the access to application. In the following mentioned Dockerfile example we can see that haproxy is installed and links to its configuration file is provided: FROM debian:wheezy ADD wheezy-backports.list /etc/apt/sources.list.d/ RUN apt-get update RUN apt-get -y install haproxy ADD haproxy.cfg /etc/haproxy/haproxy.cfg CMD ["haproxy", "-db", "-f", "/etc/haproxy/haproxy.cfg"] EXPOSE 80 EXPOSE 443 Haproxy.cfg is the configuration file responsible for authenticating a user: backend test acl authok http_auth(adminusers) http-request auth realm vkohli if !authok server s1 192.168.0.44:8084 Unikernels Unikernels compiles source code into a custom operating system that includes only the functionality required by the application logic producing a specialized single address space machine image, eliminating unnecessary code. Unikernels is built using library operating system, which has the following benefits compared to traditional OS: Fast Boot time: Unikernels make provisioning highly dynamic and can boot in less than second. Small Footprint: Unikernel code base is smaller than the traditional OS equivalents and pretty much easy to manage. Improved security: As unnecessary code is not deployed, the attack surface is drastically reduced. Fine-grained optimization: Unikernels are constructed using compile tool chain and are optimized for device drivers and application logic to be used. Unikernels matches very well with the micro-services architecture as both source code and generated binaries can be easily version-controlled and are compact enough to be rebuild. Whereas on other side modifying VM's is not permitted and changes can be only made to source code which is time-consuming and hectic. For example, if the application doesn't require disk access and display facility. Unikernels can help to remove this unnecessary device drivers and display functionality from the Kernel. Thus production system becomes minimalistic only packaging the application code, runtime environment and OS facilities which is the basic concept of immutable application deployment where new image is constructed if any application change is required in production servers: Figure 7: Transition from traditional container to Unikernel based containers Container and Unikernels are best fit for each other. Recently, Unikernel system has become part of Docker and the collaboration of both this technology will be seen sooner in the next Docker release. As it is explained in the preceding diagram the first one shows the traditional way of packaging one VM supporting multiple Docker containers. The next step shows 1:1 map (one container per VM) which allows each application to be self-contained and gives better resource usage but creating a separate VM for each container adds an overhead. 
In the last step we can see the collaboration of Unikernels with the existing Docker tools and ecosystem, where each container gets the low-level kernel library environment specific to its needs. Adoption of Unikernels in the Docker toolchain will accelerate the progress of Unikernels; they will be widely used and understood as a packaging model and runtime framework, making Unikernels another type of container. Once the Unikernel abstraction is available to Docker developers, we will be able to choose either a traditional Docker container or a Unikernel container when creating the production environment.
Summary
In this article we studied the basic concepts of containerization with the help of application and OS-based containers. The differences between them explained in this article will help developers choose the containerization approach that best fits their system. We have also thrown some light on the Docker technology, its advantages, and the lifecycle of a Docker container. The eight Docker design patterns explained in this article show how to implement Docker containers in a production environment.
Resources for Article:
Further resources on this subject:
Orchestration with Docker Swarm [article]
Benefits and Components of Docker [article]
Docker Hosts [article]

Packt
23 Jan 2017
8 min read

Welcome to the New World

We live in very exciting times. Technology is changing at a pace so rapid, that it is becoming near impossible to keep up with these new frontiers as they arrive. And they seem to arrive on a daily basis now. Moore's Law continues to stand, meaning that technology is getting smaller and more powerful at a constant rate. As I said, very exciting. In this article by Jason Odom, the author of the book HoloLens Beginner's Guide, we will be discussing about one of these new emerging technologies that finally is reaching a place more material than science fiction stories, is Augmented or Mixed Reality. Imagine the world where our communication and entertainment devices are worn, and the digital tools we use, as well as the games we play, are holographic projections in the world around us. These holograms know how to interact with our world and change to fit our needs. Microsoft has to lead the charge by releasing such a device... the HoloLens. (For more resources related to this topic, see here.) The Microsoft HoloLens changes the paradigm of what we know as personal computing. We can now have our Word window up on the wall (this is how I am typing right now), we can have research material floating around it, we can have our communication tools like Gmail and Skype in the area as well. We are finally no longer trapped to a virtual desktop, on a screen, sitting on a physical desktop. We aren't even trapped by the confines of a room anymore. What exactly is the HoloLens? The HoloLens is a first of its kind, head-worn standalone computer with a sensor array which includes microphones and multiple types of cameras, spatial sound speaker array, a light projector, and an optical waveguide. The HoloLens is not only a wearable computer; it is also a complete replacement for the standard two-dimensional display. HoloLens has the capability of using holographic projection to create multiple screens throughout and environment as well as fully 3D- rendered objects as well. With the HoloLens sensor array these holograms can fully interact with the environment you are in. The sensor array allows the HoloLens to see the world around it, to see input from the user's hands, as well as for it to hear voice commands. While Microsoft has been very quiet about what the entire sensor array includes we have a good general idea about the components used in the sensor array, let's have a look at them: One IMU: The Inertia Measurement Unit (IMU) is a sensor array that includes, an Accelerometer, a Gyroscope, and a Magnetometer. This unit handles head orientation tracking and compensates for drift that comes from the Gyroscopes eventual lack of precision. Four environment understanding sensors: These together form the spatial mapping that the HoloLens uses to create a mesh of the world around the user. One depth camera:Also known as a structured--light 3D scanner. This device is used for measuring the three-dimensional shape of an object using projected light patterns and a camera system. Microsoft first used this type of camera inside the Kinect for the Xbox 360 and Xbox One.  One ambient light sensor:Ambient light sensors or photosensors are used for ambient light sensing as well as proximity detection. 2 MP photo/HD video camera:For taking pictures and video. Four-microphone array: These do a great job of listening to the user and not the sounds around them. Voice is one of the primary input types with HoloLens. 
Putting all of these elements together forms a Holographic computer that allows the user to see, hear and interact with the world around in new and unique ways. What you need to develop for the HoloLens The HoloLens development environment breaks down to two primary tools, Unity and Visual Studio. Unity is the 3D environment that we will do most of our work in. This includes adding holograms, creating user interface elements, adding sound, particle systems and other things that bring a 3D program to life. If Unity is the meat on the bone, Visual Studio is a skeleton. Here we write scripts or machine code to make our 3D creations come to life and add a level of control and immersion that Unity can not produce on its own. Unity Unity is a software framework designed to speed up the creation of games and 3D based software. Generally speaking, Unity is known as a game engine, but as the holographic world becomes more apparently, the more we will use such a development environment for many different kinds of applications. Unity is an application that allows us to take 3D models, 2D graphics, particle systems, and sound to make them interact with each other and our user. Many elements are drag and drop, plug and play, what you see is what you get. This can simplify the iteration and testing process. As developers, we most likely do not want to build and compile forever little change we make in the development process. This allows us to see the changes in context to make sure they work, then once we hit a group of changes we can test on the HoloLens ourselves. This does not work for every aspect of HoloLens--Unity development but it does work for a good 80% - 90%. Visual Studio community Microsoft Visual Studio Community is a great free Integrated Development Environment (IDE). Here we use programming languages such as C# or JavaScript to code change in the behavior of objects, and generally, make things happen inside of our programs. HoloToolkit - Unity The HoloToolkit--Unity is a repository of samples, scripts, and components to help speed up the process of development. This covers a large selection of areas in HoloLens Development such as: Input:Gaze, gesture, and voice are the primary ways in which we interact with the HoloLens Sharing:The sharing repository helps allow users to share holographic spaces and connect to each other via the network. Spatial Mapping:This is how the HoloLens sees our world. A large 3D mesh of our space is generated and give our holograms something to interact with or bounce off of. Spatial Sound:The speaker array inside the HoloLens does an amazing work of giving the illusion of space. Objects behind us sound like they are behind us. HoloLens emulator The HoloLens emulator is an extension to Visual Studio that will simulate how a program will run on the HoloLens. This is great for those who want to get started with HoloLens development but do not have an actual HoloLens yet. This software does require the use of Microsoft Hyper-V , a feature only available inside of the Windows 10 Pro operating system. Hyper-V is a virtualization environment, which allows the creation of a virtual machine. This virtual machine emulates the specific hardware so one can test without the actual hardware. Visual Studio tools for Unity This collection of tools adds IntelliSense and debugging features to Visual Studio. If you use Visual Studio and Unity this is a must have: IntelliSense:An intelligent code completion tool for Microsoft Visual Studio. 
This is designed to speed up many processes when writing code. The version that comes with Visual Studio Tools for Unity has Unity-specific updates.
Debugging: Before this extension existed, debugging Unity apps proved to be a little tedious. With this tool, we can now debug Unity applications inside Visual Studio, speeding up the bug-squashing process considerably.
Other useful tools
The following are some other useful tools:
Image editor: Photoshop or GIMP are both good examples of programs that allow us to create 2D UI elements and textures for objects in our apps.
3D modeling software: 3D Studio Max, Maya, and Blender are all programs that allow us to make 3D objects that can be imported into Unity.
Sound editing software: There are a few resources for free sounds on the web; with that in mind, Sound Forge is a great tool for editing those sounds and layering them together to create new ones.
Summary
In this article, we have gotten to know a little bit about the HoloLens, so we can begin our journey into this new world. Here the only limitations are our imaginations.
Resources for Article:
Further resources on this subject:
Creating a Supercomputer [article]
Building Voice Technology on IoT Projects [article]
C++, SFML, Visual Studio, and Starting the first game [article]
Packt
23 Jan 2017
42 min read

The Storage - Apache Cassandra

In this article by Raúl Estrada, the author of the book Fast Data Processing Systems with SMACK Stack, we will learn about Apache Cassandra. We have reached the part where we talk about storage: the C in the SMACK stack refers to Cassandra. The reader may wonder: why not use a conventional database? The answer is that Cassandra is the database that propels giants like Walmart, CERN, Cisco, Facebook, Netflix, and Twitter. Spark makes heavy use of Cassandra's power, and application efficiency is greatly increased by using the Spark Cassandra Connector. This article has the following sections:
A bit of history
NoSQL
Apache Cassandra installation
Authentication and authorization (roles)
Backup and recovery
Spark + a connector
(For more resources related to this topic, see here.)
A bit of history
In Greek mythology, there was a priestess who was chastised for her treason against the god Apollo. She asked for the power of prophecy in exchange for a carnal meeting; however, she failed to fulfill her part of the deal. So, she received a punishment: she would have the power of prophecy, but no one would ever believe her forecasts. This priestess's name was Cassandra.
Moving to more recent times, say 50 years ago, the world of computing has seen big changes. In 1960, the HDD (Hard Disk Drive) took precedence over magnetic tape, which made data handling easier. In 1966, IBM created the Information Management System (IMS) for the Apollo space program, from whose hierarchical model IBM DB2 later developed. In the 1970s, a model appeared that fundamentally changed the existing data storage methods: the relational data model, devised by Codd as an alternative to IBM's IMS and its way of organizing and storing data. His 1985 work presented 12 rules that a database should meet in order to be considered a relational database.
The Web (especially social networks) appeared and demanded the storage of large amounts of data. With a Relational Database Management System (RDBMS), the actual costs scale with the number of users, the amount of data, and the response time, that is, the time it takes to make a specific query on the database. In the beginning, it was possible to solve this through vertical scaling: the server machine is upgraded with more RAM, faster processors, and larger and faster HDDs. This mitigates the problem, but it does not make it disappear. When the same problem occurs again and the server cannot be upgraded any further, the only solution is to add a new server, which itself may hide unplanned costs: OS licenses, the Database Management System (DBMS), and so on, not to mention data replication, transactions, and data consistency under normal use.
One solution to such problems is the use of NoSQL databases. NoSQL was born from the need to process large amounts of data on large hardware platforms built by clustering servers. The term NoSQL is perhaps not precise; a more appropriate term would be Not Only SQL. It is used for several non-relational databases such as Apache Cassandra, MongoDB, Riak, and Neo4J, which have become more widespread in recent years.
NoSQL
We will read NoSQL as Not Only SQL (SQL, Structured Query Language). NoSQL is a distributed database with an emphasis on scalability, high availability, and ease of administration; the opposite of established relational databases. Don't think of it as a direct replacement for RDBMS, but rather as an alternative or a complement.
The focus is in avoiding unnecessary complexity, the solution for data storage according to today’s needs, and without a fixed scheme. Due its distributed, the cloud computing is a great NoSQL sponsor. A NoSQL database model can be: Key-value/Tuple based For example, Redis, Oracle NoSQL (ACID compliant), Riak, Tokyo Cabinet / Tyrant, Voldemort, Amazon Dynamo, and Memcached and is used by Linked-In, Amazon, BestBuy, Github, and AOL. Wide Row/Column-oriented-based For example, Google BigTable, Apache Cassandra, Hbase/Hypertable, and Amazon SimpleDB and used by Amazon, Google, Facebook, and RealNetworks Document-based For example, CouchDB (ACID compliant), MongoDB, TerraStore, and Lotus Notes (possibly the oldest) and used in various financial and other relevant institutions: the US army, SAP, MTV, and SourceForge Object-based For example, db4o, Versant, Objectivity, and NEO and used by Siemens, China Telecom, and the European Space Agency. Graph-based For example, Neo4J, InfiniteGraph, VertexDb, and FlocDb and used by Twitter, Nortler, Ericson, Qualcomm, and Siemens. XML, multivalue, and others In Table 4-1, we have a comparison ofthe mentioned data models: Model Performance Scalability Flexibility Complexity Functionality key-value high high high low depends column high high high low depends document high high high low depends graph depends depends high high graph theory RDBMS depends depends low moderate relational algebra Table 4-1: Categorization and comparison NoSQL data model of Scofield and Popescu NoSQL or SQL? This is thewrong question. It would be better to ask the question: What do we need? Basically, it all depends on the application’s needs. Nothing is black and white. If consistency is essential, use RDBMS. If we need high-availability, fault tolerance, and scalability then use NoSQL. The recommendation is that in a new project, evaluate the best of each world. It doesn’t make sense to force NoSQL where it doesn’t fit, because its benefits (scalability, read/write speed in entire order of magnitude, soft data model) are only conditioned advantages achieved in a set of problems that can be solved, per se. It is necessary to carefully weigh, beyond marketing, what exactly is needed, what kind of strategy is needed, and how they will be applied to solve our problem. Consider using a NoSQL database only when you decide that this is a better solution than SQL. The challenges for NoSQL databases are: elastic scaling, cost-effective, simple and flexible. In table 4-2, we compare the two models: NoSQL RDBMS Schema-less Relational schema Scalable read/write Scalable read Auto high availability Custom high availability Limited queries Flexible queries Eventual consistency Consistency BASE ACID Table 4-2: Comparison of NoSQL and RDBMS CAP Brewer’s theorem In 2000, in Portland Oregon, the United States held the nineteenth international symposium on principles of distributed computing where keynote speaker Eric Brewer, a professor at UC Berkeley talked. In his presentation, among other things, he said that there are three basic system requirements which have a special relationship when making the design and implementation of applications in a distributed environment, and that a distributed system can have a maximum of two of the three properties (which is the basis of his theorem). 
The three properties are: Consistency: This property says that the data on one node must be the same data when read from a second node, the second node must show exactly the same data (could be a delay, if someone else in between is performing an update, but not different). Availability: This property says that a failure on one node doesn’t mean the loss of its data; the system must be able to display the requested data. Partition tolerance: This property says that in the event of a breakdown in communication between two nodes, the system should still work, meaning the data will still be available. In Figure 4-1, we show the CAP Brewer’s theorem with some examples.   Figure 4-1 CAP Brewer’s theorem Apache Cassandra installation In the Facebook laboratories, although not visible to the public, new software is developed, for example, the junction between two concepts involving the development departments of Google and Amazon. In short, Cassandra is defined as a distributed database. Since the beginning, the authors took the task of creating a scalable database massively decentralized, optimized for read operations when possible, painlessly modifying data structures, and with all this, not difficult to manage. The solution was found by combining two existing technologies: Google’s BigTable and Amazon’s Dynamo.One of the two authors, A. Lakshman, had earlier worked on BigTable and he borrowed the data model layout, while Dynamo contributed with the overall distributed architecture. Cassandra is written in Java and for good performance it requires the latest possible JDK version. In Cassandra 1.0, they used another open source project Thriftfor client access, which also came from Facebook and is currently an Apache Software project. In Cassandra 2.0, Thrift was removed in favor of CQL. Initially, thrift was not made just for Cassandra, but it is a software library tool and code generator for accessing backend services. Cassandra administration is done with the command-line tools or via the JMX console, the default installation allows us to use additional client tools. Since this is a server cluster, it hasdifferent administration rules and it is always good to review thedocumentation to take advantage of other people’s experiences. Cassandra managed the very demanding taskssuccessfully. Often used on site, serving a huge number of users (such as Twitter, Digg, Facebook, and Cisco) that, relatively, often change their complex data models to meet the challenges that will come later, and usually do not have to dealwith expensive hardware or licenses. At the time of writing, the Cassandra homepage (http://cassandra.apache.org) says that Apple Inc. for example, has a 75000 node cluster storing 10 Petabytes. Data model The storage model of Cassandra could be seen as a sorted HashMap of sorted HashMaps. Cassandra is a database that stores the rows in the form of key-value. In this model, the number of columns is not predefined in advance as in standard relational databases, but a single row can contain several columns. The column (Figure 4-2, Column) is the smallest atomic unit model. Each element in the column consists of a triplet: a name, a value (stored as a series of bytes without regard to the source type), and a timestamp (the time used to determine the most recent record). Figure4-2: Column All data triplets are obtained from the client, and even a timestamp. 
Thus, the row consists of a key and a set of data triplets (Figure 4-3).Here is how the super column will look: Figure 4-3: Super column In addition, the columns can be grouped into so-called column families (Figure 4-4, Column family), which would be somehow equivalent to the table and can be indexed: Figure 4-4: Column family A higher logical unit is the super column (as shown in the followingFigure 4-5, Super column family), in which columns contain other columns: Figure 4-5: Super column family Above all is the key space (As shown in Figure 4-6, Cluster with Key Spaces), which would be equivalent to a relational schema andis typically used by one application. The data model is simple, but at the same time very flexible and it takes some time to become accustomed to the new way of thinking while rejecting all the SQL’s syntax luxury. The replication factor is unique per keyspace. Moreover, keyspace could span multiple clusters and have different replication factors for each of them. This is used in geo-distributed deployments. Figure 4-6: Cluster with key spaces Data storage Apache Cassandra is designed to process large amounts of data in a short time; this way of storing data is taken from her big brother, Google’s Bigtable. Cassandra has a commit log file in which all the new data is recorded in order to ensure their sustainability. When data is successfully written on the commit log file, the recording of the freshest data is stored in a memory structure called memtable (Cassandra considers a writing failure if the same information is in the commit log and in memtable). Data within memtables issorted by Row key. When memtable is full, its contents are copied to the hard drive in a structure called Sorted String Table (SSTable). The process of copying content from memtable into SSTable is called flush. Data flush is performed periodically, although it could be carried out manually (for example, before restarting a node) through node tool flush commands. The SSTable provides a fixed, sorted map of row and value keys. Data entered in one SSTable cannot be changed, but is possible to enter new data. The internal structure of SSTable consists of a series of blocks of 64Kb (the block size can be changed), internally a SSTable is a block index used to locate blocks. One data row is usually stored within several SSTables so reading a single data row is performed in the background combining SSTables and the memtable (which have not yet made flush). In order to optimize the process of connecting, Cassandra uses a memory structure called Bloomfilter. Every SSTable has a bloom filter that checks if the requested row key is in the SSTable before look up in the disk. In order to reduce row fragmentation through several SSTables, in the background Cassandra performs another process: the compaction, a merge of several SSTables into a single SSTable. Fragmented data iscombined based on the values ​​of a row key. After creating a new SSTable, the old SSTable islabeled as outdated and marked in the garbage collector process for deletion. Compaction has different strategies: size-tiered compaction and leveled compaction and both have their own benefits for different scenarios. Installation To install Cassandra, go to http://www.planetcassandra.org/cassandra/. Installation is simple. After downloading the compressed files, extract them and change a couple of settings in the configuration files (set the new directory path). Run the startup scripts to activate a single node, and the database server. 
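On a Linux machine using the tarball distribution, the steps just described usually reduce to something like the following sketch; the archive name, version, and directories are assumptions that will vary with the download you pick:

# Extract the downloaded archive (file name is an example)
tar -xzf apache-cassandra-3.0.9-bin.tar.gz
cd apache-cassandra-3.0.9

# Point the storage paths at your own directories by editing conf/cassandra.yaml:
#   data_file_directories:
#       - /var/lib/cassandra/data
#   commitlog_directory: /var/lib/cassandra/commitlog

# Start a single node in the foreground (-f) ...
bin/cassandra -f

# ... and, from another terminal, check that the node is up
bin/nodetool status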
Of course, it is possible to use Cassandra in only one node, but we lose its main power, the distribution. The process of adding new servers to the cluster is called bootstrap and is generally not a difficult operation. Once all the servers are active, they form a ring of nodes, none of which is central meaning without a main server. Within the ring, the information propagation on all servers is performed through a gossip protocol. In short, one node transmits information about the new instances to only some of their known colleagues, and if one of them already knows from other sources about the new node, the first node propagation is stopped. Thus, the information about the node is propagated in an efficient and rapid way through the network. It is necessary for a new node activation to seed its information to at least one existing server in the cluster so the gossip protocol works. The server receives its numeric identifier, and each of the ring nodes stores its data. Which nodes store the information depends on the hash MD5 key-value (a combination of key-value) as shown in Figure 4-7, Nodes within a cluster. Figure 4-7: Nodes within a cluster The nodes are in a circular stack, that is, a ring, and each record is stored on multiple nodes. In case of failure of one of them, the data isstill available. Nodes are occupied according to their identifier integer range, that is, if the calculated value falls into a node range, then the data is saved there. Saving is not performed on only one node, more is better, an operation is considered a success if the data is correctly stored at the most possible nodes. All this is parameterized. In this way, Cassandra achieves sufficient data consistency and provides greater robustness of the entire system, if one node in the ring fails, is always possible to retrieve valid information from the other nodes. In the event that a node comes back online again, it is necessary to synchronize the data on it, which is achieved through the reading operation. The data is read from all the ring servers, a node saves just the data accepted as valid, that is, the most recent data, the data comparison is made according to the timestamp records. The nodes that don’t have the latest information, refresh theirdata in a low priority back-end process. Although this brief description of the architecture makes it sound like it is full of holes, in reality everything works flawlessly. Indeed, more servers in the game implies a better general situation. DataStax OpsCenter In this section, we make the Cassandra installation on a computer with a Windows operating system (to prove that nobody is excluded). Installing software under the Apache open license can be complicated on a Windows computer, especially if it is new software, such as Cassandra. To make things simpler we will use a distribution package for easy installation, start-up and work with Cassandra on a Windows computer. The distribution used in this example is called DataStax Community Edition. DataStax contains Apache Cassandra, along with the Cassandra Query Language (CQL) tool and the free version of DataStax OpsCenter for management and monitoring the Cassandra cluster. We can say that OpsCenter is a kind of DBMS for NoSQL databases. 
After downloading the installer from the DataStax’s official site, the installation process is quite simple, just keep in mind that DataStax supports Windows 7 and Windows Server 2008 and that DataStax used on a Windows computer must have the Chrome or Firefox web browser (Internet explorer is not supported). When starting DataStax on a Windows computer, DataStax will open asin Figure 4-8, DataStax OpsCenter. Figure 4-8: DataStax OpsCenter DataStax consists of a control panel (dashboard), in which we review the events, performance, and capacity of the cluster and also see how many nodes belong to our cluster (in this case a single node). In cluster control, we can see the different types of views (ring, physical, list). Adding a new key space (the equivalent to creating a database in the classic DBMS) is done through the CQLShell using CQL or using the DataStax data modeling. Also, using the data explorer we can view the column family and the database. Creating a key space The main tool for managing Cassandra CQL runs in a console interface and this tool is used to add new key spaces from which we will create a column family. The key space is created as follows: cqlsh> create keyspace hr with strategy_class=‘SimpleStrategy’ and strategy_options_replication_factor=1; After opening CQL Shell, the command create keyspace will make a new key space, the strategy_class = ‘SimpleStrategy’parameter invokes class replication strategy used when creating new key spaces. Optionally,strategy_options:replication_factor = 1command creates a copy of each row in each cluster node, and the value replication_factor set to 1 produces only one copy of each row on each node (if we set to 2, we will have two copies of each row on each node). cqlsh> use hr; cqlsh:hr> create columnfamily employee (sid int primary key, ... name varchar, ... last_name varchar); There are two types of keyspaces: SimpleStrategy and NetworkTopologyStrategy, whose syntax is as follows: { ‘class’ : ‘SimpleStrategy’, ‘replication_factor’ : <integer> }; { ‘class’ : ‘NetworkTopologyStrategy’[, ‘<data center>‘ : <integer>, ‘<data center>‘ : <integer>] . . . }; When NetworkTopologyStrategyis configured as the replication strategy, we set up one or more virtual data centers. To create a new column family, we use the create command; select the desired Key Space, and with the command create columnfamily example, we create a new table in which we define the id an integer as a primary key and other attributes like name and lastname. To make a data entry in column family, we use the insert command: insert into <table name> (<attribute_1>, < attribute_2> ... < attribute_n>); When filling data tables we use the common SQL syntax: cqlsh:hr>insert into employee (sid, name, lastname) values (1, ‘Raul’, ‘Estrada’); So we enter data values. With the selectcommand we can review our insert: cqlsh:hr> select * from employee; sid | name | last_name ----+------+------------ 1 | Raul | Estrada Authentication and authorization (roles) In Cassandra, the authentication and authorization must be configured on the cassandra.yamlfile and two additional files. The first file is to assign rights to users over the key space and column family, while the second is to assign passwords to users. These files are called access.properties and passwd.properties, and are located in the Cassandra installation directory. These files can be opened using our favorite text editor in order to be successfully configured. 
Setting up a simple authentication and authorization The following steps are: In the access.properitesfile we add the access rights to users and the permissions to read and write certain key spaces and columnfamily.Syntax: keyspace.columnfamily.permits = users Example 1: hr <rw> = restrada Example 2: hr.cars <ro> = restrada, raparicio In example 1, we give full rights in the Key Space hr to restrada while in example 2 we give read-only rights to users to the column family cars. In the passwd.propertiesfile, user names are matched to passwords, onthe left side of the equal sign we write username and onthe right side the password: Example: restrada = Swordfish01 After we change the files, before restarting Cassandra it is necessary to type the following command in the terminal in order to reflect the changes in the database: $ cd <installation_directory> $ sh bin/cassandra -f -Dpasswd.properties = conf/passwd.properties -Daccess.properties = conf/access.properties Note: The third step of setting up authentication and authorization doesn’t work onWindows computers and is just needed on Linux distributions. Also, note that user authentication and authorization should not be solved through Cassandra, for safety reasons, in the latest Cassandra versions this function is not included. Backup The purpose of making Cassandra a NoSQL database is because when we create a single node, we make a copy of it. Copying the database to other nodes and the exact number of copies depend on the replication factor established when we create a new key space. But as any other standard SQL database, Cassandra offers to create a backup on the local computer. Cassandra creates a copy of the base using snapshot. It is possible to make a snapshot of all the key spaces, or just one column family. It is also possible to make a snapshot of the entire cluster using the parallel SSH tool (pssh). If the user decides to snapshot the entire cluster, it can be reinitiated and use an incremental backup on each node. Incremental backups provide a way to get each node configured separately, through setting the incremental_backupsflagto truein cassandra.yaml. When incremental backups are enabled, Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups offsite without transferring entire snapshots. To snapshot a key space we use the nodetool command: Syntax: nodetool snapshot -cf <ColumnFamily><keypace> -t <snapshot_name> Example: nodetool snapshot -cf cars hr snapshot1 The snapshot is stored in the Cassandra installation directory: C:Program FilesDataStax Communitydatadataenexamplesnapshots Compression The compression increases the cluster nodes capacity reducing the data size on the disk. With this function, compression also enhances the server’s disk performance. Compression in Cassandra works better when compressing a column family with a lot of columns, when each row has the same columns, or when we have a lot of common columns with the same data. A good example of this is a column family that contains user information such as user name and password because it is possible that they have the same data repeated. As the greater number of the same data to be extended through the rows, the compression ratio higher is. Column family compression is made with the Cassandra-CLI tool. 
It is possible to update existing columns families or create a new column family with specific compression conditions, for example, the compression shown here: CREATE COLUMN FAMILY users WITH comparator = ‘UTF8Type’ AND key_validation_class = ‘UTF8Type’ AND column_metadata = [ (column_name: name, validation_class: UTF8Type) (column_name: email, validation_class: UTF8Type) (column_name: country, validation_class: UTF8Type) (column_name: birth_date, validation_class: LongType) ] AND compression_options=(sstable_compression:SnappyCompressor, chunk_length_kb:64); We will see this output: Waiting for schema agreement.... ... schemas agree across the cluster After opening the Cassandra-CLI, we need to choose thekey space where the new column family would be. When creating a column family, it is necessary to state that the comparator (UTF8 type) and key_validation_class are of the same type. With this we will ensure that when executing the command we won’t have an exception (generated by a bug). After printing the column names, we set compression_options which has two possible classes: SnappyCompresor that provides faster data compression or DeflateCompresor which provides a higher compression ratio. The chunk_length adjusts compression size in kilobytes. Recovery Recovering a key space snapshot requests all the snapshots made for a certain column family. If you use an incremental backup, it is also necessary to provide the incremental backups created after the snapshot. There are multiple ways to perform a recovery from the snapshot. We can use the SSTable loader tool (used exclusively on the Linux distribution) or can recreate the installation method. Restart node If the recovery is running on one node, we must first shutdown the node. If the recovery is for the entire cluster, it is necessary to restart each node in the cluster. Here is the procedure: Shut down the node Delete all the log files in:C:Program FilesDataStax Communitylogs Delete all .db files within a specified key space and column family:C:Program FilesDataStax Communitydatadataencars Locate all Snapshots related to the column family:C:Program FilesDataStax Communitydatadataencarssnapshots1,351,279,613,842, Copy them to: C:Program FilesDataStax Communitydatadataencars Re-start the node. Printing schema Through DataStax OpsCenter or Apache Cassandra CLI we can obtain the schemes (Key Spaces) with the associated column families, but there is no way to make a data export or print it. Apache Cassandra is not RDBMS and it is not possible to obtain a relational model scheme from the key space database. Logs Apache Cassandra and DataStax OpsCenter both use the Apache log4j logging service API. In the directory where DataStax is installed, under Apache-Cassandra and opsCenter is the conf directory where the file log4j-server.properties is located, log4j-tools.properties for apache-cassandra andlog4j.properties for OpsCenter. The parameters of the log4j file can be modified using a text editor, log files are stored in plain text in the...DataStax Communitylogsdirectory, here it is possible to change the directory location to store the log files. Configuring log4j log4j configuration files are divided into several parts where all the parameters are set to specify how collected data is processed and written in the log files. For RootLoger: # RootLoger level log4j.rootLogger = INFO, stdout, R This section defines the data level, respectively, to all the events recorded in the log file. 
As we can see in Table 4-3, the log level can be one of the following:

Level: Record
ALL: The lowest level; all the events are recorded in the log file
DEBUG: Detailed information about events
ERROR: Information about runtime errors or unexpected events
FATAL: Critical error information
INFO: Information about the state of the system
OFF: The highest level; logging is turned off
TRACE: Detailed debug information
WARN: Information about potentially adverse events (unwanted/unexpected runtime errors)

Table 4-3: log4j log levels

For standard out (stdout):

# stdout
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %5p %d{HH:mm:ss,SSS} %m%n

Through the stdout appender we define the appearance of the logged data: the ConsoleAppender class is used to write the data, and the ConversionPattern defines how each record looks when it is written to the log, as defined by the previous configuration.

Log file rotation

In this example, we rotate the log when it reaches 20 MB and we retain just 50 log files:

# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n

This part sets up the log files. The RollingFileAppender class inherits from FileAppender, and its role is to back up a log file when it reaches a given size (in this case, 20 MB). The RollingFileAppender class has several methods; these two are the most used:

public void setMaxFileSize( String value )

Defines the maximum file size and can take a value from 0 to 2^63 using the abbreviations KB, MB, and GB. The integer value is converted automatically (in the example, the file size is limited to 20 MB).

public void setMaxBackupIndex( int maxBackups )

Defines how many backup files are kept before the oldest log file is deleted (in this case, 50 log files are retained).

To set the location where the log files will be stored, use:

# Edit the next line to point to your logs directory
log4j.appender.R.File=C:/Program Files (x86)/DataStax Community/logs/cassandra.log

User activity log

The log4j API has the ability to store user activity logs. In production, it is not recommended to use the DEBUG or TRACE log levels.

Transaction log

As mentioned earlier, any new data is stored in the commit log file. Within the cassandra.yaml configuration file, we can set the location where the commit log files will be stored:

# commit log
commitlog_directory: "C:/Program Files (x86)/DataStax Community/data/commitlog"

SQL dump

It is not possible to make an SQL dump of the database; we can only snapshot it.

CQL

CQL stands for Cassandra Query Language and is a language similar to SQL. With this language, we run queries against a key space. There are several ways to interact with a key space; in the previous section we showed how to do it using the CQL shell. Since the CQL shell is the primary way to interact with Cassandra, Table 4-4, Shell command summary, lists the main commands that can be used in it:

Command: Description
cqlsh: Starts the CQL interactive shell.
CAPTURE: Captures command output and appends it to a file.
CONSISTENCY: Shows the current consistency level, or given a level, sets it.
COPY: Imports and exports CSV (comma-separated values) data to and from Cassandra.
DESCRIBE: Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
EXPAND: Formats the output of a query vertically.
EXIT: Terminates cqlsh.
PAGING: Enables or disables query paging.
SHOW: Shows the Cassandra version, host, or tracing information for the current cqlsh client session.
SOURCE: Executes a file containing CQL statements.
TRACING: Enables or disables request tracing.

Table 4-4: Shell command summary

For more detailed information on shell commands, visit: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlshCommandsTOC.html

CQL commands

CQL is very similar to SQL, as we have already seen in this article. Table 4-5, CQL command summary, lists the language commands. CQL, like SQL, is based on sentences/statements. These sentences are for data manipulation and work with their logical container, the key space. Like SQL statements, they must end with a semicolon (;).

Command: Description
ALTER KEYSPACE: Change property values of a keyspace.
ALTER TABLE: Modify the column metadata of a table.
ALTER TYPE: Modify a user-defined type. Cassandra 2.1 and later.
ALTER USER: Alter existing user options.
BATCH: Write multiple DML statements.
CREATE INDEX: Define a new index on a single column of a table.
CREATE KEYSPACE: Define a new keyspace and its replica placement strategy.
CREATE TABLE: Define a new table.
CREATE TRIGGER: Registers a trigger on a table.
CREATE TYPE: Create a user-defined type. Cassandra 2.1 and later.
CREATE USER: Create a new user.
DELETE: Removes entire rows or one or more columns from one or more rows.
DESCRIBE: Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
DROP INDEX: Drop the named index.
DROP KEYSPACE: Remove the keyspace.
DROP TABLE: Remove the named table.
DROP TRIGGER: Removes registration of a trigger.
DROP TYPE: Drop a user-defined type. Cassandra 2.1 and later.
DROP USER: Remove a user.
GRANT: Provide access to database objects.
INSERT: Add or update columns.
LIST PERMISSIONS: List permissions granted to a user.
LIST USERS: List existing users and their superuser status.
REVOKE: Revoke user permissions.
SELECT: Retrieve data from a Cassandra table.
TRUNCATE: Remove all data from a table.
UPDATE: Update columns in a row.
USE: Connect the client session to a keyspace.

Table 4-5: CQL command summary

For more detailed information on CQL commands, visit: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlCommandsTOC.html

DBMS Cluster

The idea behind Cassandra is a database working in a cluster, that is, databases on multiple nodes. Although Cassandra is primarily intended for building clusters on Linux servers, it also offers the possibility to build clusters on Windows computers. The first task that must be done prior to setting up a cluster on Windows computers is opening the firewall for the Cassandra DBMS and DataStax OpsCenter. The ports that must be open for Cassandra are 7000 and 9160; for OpsCenter, the ports are 7199, 8888, 61620, and 61621. These are the default ports when we install Cassandra and OpsCenter; however, we can specify new ports if necessary. Immediately after installing Cassandra and OpsCenter on a Windows computer, it is necessary to stop the DataStax OpsCenter service and the DataStax OpsCenter agent service, as shown in Figure 4-9, Microsoft Windows display services.
Figure 4-9: Microsoft Windows display services

One of Cassandra's advantages is that it automatically distributes data across the computers of the cluster by applying a partitioning algorithm to the incoming data. To perform this successfully, it is necessary to assign a token to each computer in the cluster. A token is a numeric identifier that indicates the computer's position in the cluster and the range of data that computer is responsible for. Tokens can be generated with the Python interpreter that ships with the Cassandra installation, located in the DataStax installation directory. In the token-generating code, the variable num=2 refers to the number of computers in the cluster:

$ python -c "num=2; print '\n'.join(['token %d: %d' % (i, i * (2 ** 127) / num) for i in range(0, num)])"

We will see an output like this:

token 0: 0
token 1: 88743298547982745894789547895490438209

It is necessary to preserve the token values because they will be required in the following steps. We now need to configure the cassandra.yaml file, which we already met in the authentication and authorization section. The cassandra.yaml file must be configured separately on each computer in the cluster. After opening the file, you need to make the following changes:

initial_token: Copy one of the generated tokens to each computer in the cluster, starting from token 0, so that each computer gets a unique token.
listen_address: Enter the IP address of the computer being configured.
seeds: Enter the IP address of the primary (main) node in the cluster.

Once the file is modified and saved, you must restart the DataStax Community Server as we already saw. This should be done only on the primary node. After that, it is possible to check whether the cluster nodes can communicate using nodetool. In nodetool, enter the following command:

nodetool -h localhost ring

If the cluster works, we will see a result similar to the following:

Address  DC           Rack   Status  State   Load      Owns   Token
         datacenter1  rack1  Up      Normal  13.41 Kb  50.0%  88743298547982745894789547895490438209
         datacenter1  rack1  Up      Normal  6.68 Kb   50.0%  88743298547982745894789547895490438209

If the cluster is operating normally, select which computer will run the primary OpsCenter (it may not be the primary node). Then, on that computer, open opscenter.conf, which can be found in the DataStax installation directory. In that file, you need to find the webserver interface section and set the interface parameter to the value 0.0.0.0. After that, in the agent section, change the incoming_interface parameter to your computer's IP address. In the DataStax installation directory (on each computer in the cluster) we must also configure the address.yaml file. Within this file, set the stomp_interface and local_interface parameters to the IP address of the computer where the file is configured. Now start the DataStax OpsCenter Community and DataStax OpsCenter agent services on the primary computer, and after that run the DataStax OpsCenter agent service on all the other nodes. At this point it is possible to open DataStax OpsCenter in an Internet browser, and OpsCenter should look like Figure 4-10, Display cluster in OpsCenter.

Figure 4-10: Display cluster in OpsCenter

Deleting the database

In Apache Cassandra, there are several ways to delete the database (key space) or parts of the database (a column family, individual rows within a column family, and so on); a few hedged CQL examples are sketched after this paragraph, followed by the available tools and commands.
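Before looking at the tools, here is a minimal, hedged sketch of what such deletions look like from the CQL shell. It assumes the hr key space and the cars column family used earlier, and a partition key column that happens to be named key; adjust the names to your own schema:

-- Hedged examples only; the keyspace, table, and key column names are assumptions.
USE hr;
DELETE FROM cars WHERE key = 'car001';   -- removes a single row
TRUNCATE cars;                           -- removes all data but keeps the definition
DROP TABLE cars;                         -- removes the column family and its data
DROP KEYSPACE hr;                        -- removes the key space and everything in it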
Although the easiest way to make a deletion is using the DataStax OpsCenter data modeling tool, there are also commands that can be executed through the Cassandra-CLI or the CQL shell.

CLI delete commands

In Table 4-6, we have the CLI delete commands:

CLI command: Function
part: Used to delete a super column, a column from the column family, or rows within certain columns
drop columnfamily: Deletes the column family and all the data contained in it
drop keyspace: Deletes the key space, all its column families, and the data contained in them
truncate: Deletes all the data from the selected column family

Table 4-6: CLI delete commands

CQL shell delete commands

In Table 4-7, we have the CQL shell delete commands:

CQL shell command: Function
alter_drop: Deletes the specified column from the column family
delete: Deletes one or more columns from one or more rows of the selected column family
delete_columns: Deletes columns from the column family
delete_where: Deletes individual rows
drop_table: Deletes the selected column family and all the data contained in it
drop_columnfamily: Deletes the column family and all the data contained in it
drop_keyspace: Deletes the key space, all its column families, and all the data contained in them
truncate: Deletes all data from the selected column family

Table 4-7: CQL shell delete commands

DB and DBMS optimization

Cassandra optimization is specified in the cassandra.yaml file; these properties are used to adjust performance and to control the use of system resources such as disk I/O, memory, and CPU.

column_index_size_in_kb
Initial value: 64 KB
Range of values: -
A column index is added to a row once its data grows beyond this default size of 64 kilobytes.

commitlog_segment_size_in_mb
Initial value: 32 MB
Range of values: 8-1024 MB
Determines the size of a commit log segment. A commit log segment is archived, deleted, or recycled after its data has been flushed to SSTables.

commitlog_sync
Initial value: -
Range of values: -
The method Cassandra uses to acknowledge writes. It is closely correlated with commitlog_sync_period_in_ms, which controls how often the log is synchronized with the disk.

commitlog_sync_period_in_ms
Initial value: 1000 ms
Range of values: -
Decides how often to flush the commit log to disk when commitlog_sync is in periodic mode.

commitlog_total_space_in_mb
Initial value: 4096 MB
Range of values: -
When the size of the commit log reaches this value, Cassandra removes the oldest parts of the commit log. This reduces the amount of data that has to be replayed on start-up.

compaction_preheat_key_cache
Initial value: true
Range of values: true/false
When this value is set to true, cached row keys are tracked during compaction and re-saved to their new location in the compacted SSTable.

compaction_throughput_mb_per_sec
Initial value: 16
Range of values: 0-32
Throttles compaction to the given total throughput across the system. The faster data is inserted, the faster it needs to be compacted.

concurrent_compactors
Initial value: 1 per CPU core
Range of values: depends on the number of CPU cores
Adjusts the number of simultaneous compaction processes on the node.

concurrent_reads
Initial value: 32
Range of values: -
Controls how many reads may run concurrently. When there is more data than can fit in memory, reads become bottlenecked by disk I/O.

concurrent_writes
Initial value: 32
Range of values: -
Writes in Cassandra are rarely I/O-bound, so the sensible number of concurrent writes depends on the number of CPU cores; the recommended value is eight per core.
flush_largest_memtables_at Initial value: 0.75 Range of values: - This parameter clears the biggest memtable to free disk space. This parameter can be used as an emergency measure to prevent memory loss (out of memory errors) in_memory_compaction_limit_in_mb Initial value: 64 Range of values: Limit order size on the memory. Larger orders use a slower compression method. index_interval Initial value: 128 Value range: 128-512 Controlled sampling records from the first row of the index in the ratio of space and time, that is, the larger the time interval to be sampled the less effective. In technical terms, the interval corresponds to the number of index samples skipped between taking each sample. memtable_flush_queue_size Initial value: 4 Range of values: a minimum set of the maximum number of secondary indexes that make more than one Column family Indicates the total number of full-memtable to allow a flush, that is, waiting to the write thread. memtable_flush_writers Initial value: 1 (according to the data map) Range of values: - Number of memtable flush writer threads. These threads are blocked by the disk I/O, and each thread holds a memtable in memory until it is blocked. memtable_total_space_in_mb Initial value: 1/3 Java Heap Range of values: - Total amount of memory used for all the Column family memtables on the node. multithreaded_compaction Initial value: false Range of values: true/false Useful only on nodes using solid state disks reduce_cache_capacity_to Initial value: 0.6 Range of values: - Used in combination with reduce_cache_capacity_at. When Java Heap reaches the value of reduce_cache_size_at, this value is the total cache size to reduce the percentage to the declared value (in this case the size of the cache is reduced to 60%). Used to avoid unexpected out-of-memory errors. reduce_cache_size_at Initial value: 0.85 Range of values: 1.0 (disabled) When Java Heap marked to full sweep by the garbage Collector reaches a percentage stated on this variable (85%), Cassandra reduces the size of the cache to the value of the variable reduce_cache_capacity_to. stream_throughput_outbound_megabits_per_sec Initial value: off, that is, 400 Mbps (50 Mb/s) Range of values: - Regulate the stream of output file transfer in a node to a given throughput in Mbps. This is necessary because Cassandra mainly do sequential I/O when it streams data during system startup or repair, which can lead to network saturation and affect Remote Procedure Call performance. Bloom filter Every SSTable has a Bloom filter. In data requests, the Bloom filter checks whether the requested order exists in the SSTable before any disk I/O. If the value of the Bloom filter is too low, it may cause seizures of large amounts of memory, respectively, a higher Bloom filter value, means less memory use. The Bloom filter range of values ​​is from 0.000744 to 1.0. It is recommended keep the minimum value of the Bloom filter less than 0.1. The value of the Bloom filter column family is adjusted through the CQL shell as follows: ALTER TABLE <column_family> WITH bloom_filter_fp_chance = 0.01; Data cache Apache Cassandra has two caches by which it achieves highly efficient data caching. These are: cache key (default: enabled): cache index primary key columns families row cache (default: disabled): holding a row in memory so that reading can be done without using the disc If the key and row cache set, the query of data is accomplished in the way shown in Figure 4-11, Apache Cassandra Cache. 
Figure 4-11: Apache Cassandra cache When information is requested, first it checks in the row cache, if the information is available, then row cache returns the result without reading from the disk. If it has come from a request and the row cache can return a result, it checks if the data can be retrieved through the key cache, which is more efficient than reading from the disk, the retrieved data is finally written to the row cache. As the key cache memory stores the key location of an individual column family, any increase in key cache has a positive impact on reading data for the column family. If the situation permits, a combination of key cache and row cache increases the efficiency. It is recommended that the size of the key cache is set in relation to the size of the Java heap. Row cache is used in situations where data access patterns follow a normal (Gaussian) distribution of rows that contain often-read data and queries often returning data from the most or all the columns. Within cassandra.yaml files, we have the following options to configure the data cache: key_cache_size_in_mb Initial value: empty, meaning“Auto” (min (5% Heap (in MB), 100MB)) Range of values: blank or 0 (disabled key cache) Variable that defines the key cache size per node row_cache_size_in_mb Initial value: 0 (disabled) Range of values: - Variable that defines the row cache size per node key_cache_save_period Initial value: 14400 (i.e. 4 hours) Range of values: - Variable that defines the save frequency of key cache to disk row_cache_save_period Initial value: 0 (disabled) Range of values: - Variable that defines the save frequency of row cache to disk row_cache_provider Initial value: SerializingCacheProvider Range of values: ConcurrentLinkedHashCacheProvider or SerializingCacheProvider Variable that defines the implementation of row cache Java heap tune up Apache Cassandra interacts with the operating system using the Java virtual machine, so the Java heap size plays an important role. When starting Cassandra, the size of the Java Heap is set automatically based on the total amount of RAM (Table 4-8, Determination of the Java heap relative to the amount of RAM). The Java heap size can be manually adjusted by changing the values ​​of the following variables contained on the file cassandra-env.sh located in the directory...apache-cassandraconf. # MAX_HEAP_SIZE = “4G” # HEAP_NEWSIZE = “800M” Total system memory Java heap size < 2 Gb Half of the system memory 2 Gb - 4 Gb 1 Gb > 4 Gb One quarter of the system memory, no more than 8 Gb Table 4-8: Determination of the Java heap relative to the amount of RAM Java garbage collection tune up Apache Cassandra has a GC Inspector which is responsible for collecting information on each garbage collection process longer than 200ms. The Garbage Collection Processes that occur frequently and take a lot of time (as concurrent mark-sweep which takes several seconds) indicate that there is a great pressure on garbage collection and in the JVM. The recommendations to address these issues include: Add new nodes Reduce the cache size Adjust items related to the JVM garbage collection Views, triggers, and stored procedures By definition (In RDBMS) view represents a virtual table that acts as a real (created) table, which in reality does not contain any data. The obtained data isthe result of a SELECT query. View consists of a rows and columns combination of one or more different tables. 
Correspondingly, in NoSQL Cassandra all the data for key-value rows is placed in one column family. Since there are no JOIN commands and no flexible ad hoc queries, the SELECT command lists the actual data, but there is no way to display a virtual table, that is, a view. Since Cassandra does not belong to the RDBMS group, there is also no possibility of creating triggers or stored procedures. Referential integrity (RI) constraints can be enforced only in the application code. Also, as Cassandra does not belong to the RDBMS group, we cannot apply Codd's rules.

Client-server architecture

At this point, we have probably already noticed that Apache Cassandra runs on a client-server architecture. By definition, the client-server architecture allows distributed applications, since the tasks are divided into two main parts: on one hand, the service providers, the servers; on the other hand, the service petitioners, the clients. In this architecture, several clients are allowed to access the server; the server is responsible for meeting the requests and handling each one according to its own rules. So far, we have only used one client, managed from the same machine, that is, from the same data network. The CQL shell allows us to connect to Cassandra, access a key space, and send CQL statements to the Cassandra server. This is the most immediate method, but in daily practice it is common to access key spaces from different execution contexts (other systems and other programming languages). Thus, we require clients other than the CQL shell; in the Apache Cassandra context, we require connection drivers.

Drivers

A driver is just a software component that gives access to a key space in order to run CQL statements. Fortunately, there are already a lot of drivers for creating Cassandra clients in almost any modern programming language; you can see an extensive list at this URL: http://wiki.apache.org/cassandra/ClientOptions. Typically, in a client-server architecture, different clients access the server from different machines, distributed across different networks. Our implementation needs will dictate the required clients.

Summary

NoSQL is not just hype, nor a young technology; it is an alternative, with known limitations and capabilities. It is not an RDBMS killer. It's more like a younger brother who is slowly growing up and taking on some of the burden. Acceptance is increasing, and it will get even better as NoSQL solutions mature. Skepticism may be justified, but only for concrete reasons. Since Cassandra is an easy and free working environment, suitable for application development, it is recommended, especially with the additional utilities that ease and accelerate database administration. Cassandra has some faults (for example, user authentication and authorization are still insufficiently supported in Windows environments) and is best used when there is a need to store large amounts of data. For start-up companies that need to manipulate large amounts of data with the aim of reducing costs, implementing Cassandra in a Linux environment is a must-have.

Resources for Article:

Further resources on this subject:

Getting Started with Apache Cassandra [article]
Apache Cassandra: Working in Multiple Datacenter Environments [article]
Apache Cassandra: Libraries and Applications [article]

Get Familiar with Angular

Packt
23 Jan 2017
26 min read
This article by Minko Gechev, the author of the book Getting Started with Angular - Second Edition, will help you understand what is required for the development of a new version of Angular from scratch and why its new features make intuitive sense for the modern Web in building high-performance, scalable, single-page applications. Some of the topics that we'll discuss are as follows: Semantic versioning and what chances to expect from Angular 2, 3, 5 and so on. How the evolution of the Web influenced the development of Angular. What lessons we learned from using Angular 1 in the wild for the last a couple of years. What TypeScript is and why it is a better choice for building scalable single-page applications than JavaScript. (For more resources related to this topic, see here.) Angular adopted semantic versioning so before going any further let's make an overview of what this actually means. Angular and semver Angular 1 was rewritten from scratch and replaced with its successor, Angular 2. A lot of us were bothered by this big step, which didn't allow us to have a smooth transition between these two versions of the framework. Right after Angular 2 got stable, Google announced that they want to follow the so called semantic versioning (also known as semver). Semver defines the version of given software project as the triple X.Y.Z, where Z is called patch version, Y is called minor version, and X is called major version. A change in the patch version means that there are no intended breaking changes between two versions of the same project but only bug fixes. The minor version of a project will be incremented when new functionality is introduced, and there are no breaking changes. Finally, the major version will be increased when in the API are introduced incompatible changes. This means that between versions 2.3.1 and 2.10.4, there are no introduced breaking changes but only a few added features and bug fixes. However, if we have version 2.10.4 and we want to change any of the already existing public APIs in a backward-incompatible manner (for instance, change the order of the parameters that a method accepts), we need to increment the major version, and reset the patch and minor versions, so we will get version 3.0.0. The Angular team also follows a strict schedule. According to it, a new patch version needs to be introduced every week; there should be three monthly minor release after each major release, and finally, one major release every six months. This means that by the end of 2018, we will have at least Angular 6. However, this doesn't mean that every six months we'll have to go through the same migration path like we did between Angular 1 and Angular 2. Not every major release will introduce breaking changes that are going to impact our projects . For instance, support for newer version of TypeScript or change of the last optional argument of a method will be considered as a breaking change. We can think of these breaking changes in a way similar to that happened between Angular 1.2 and Angular 1.3. We'll refer to Angular 2 as either Angular 2 or only Angular. If we explicitly mention Angular 2, this doesn't mean that the given paragraph will not be valid for Angular 4 or Angular 5; it most likely will. In case you're interested to know what are the changes between different versions of the framework, you can take a look at the changelog at https://github.com/angular/angular/blob/master/CHANGELOG.md. 
If we're discussing Angular 1, we will be more explicit by mentioning a version number, or the context will make it clear that we're talking about a particular version. Now that we introduced the Angular's semantic versioning and conventions for referring to the different versions of the framework, we can officially start our journey! The evolution of the Web - time for a new framework In the past couple of years, the Web has evolved in big steps. During the implementation of ECMAScript 5, the ECMAScript 6 standard started its development (now known as ECMAScript 2015 or ES2015). ES2015 introduced many changes in JavaScript, such as adding built-in language support for modules, block scope variable definition, and a lot of syntactical sugar, such as classes and destructuring. Meanwhile, Web Components were invented. Web Components allow us to define custom HTML elements and attach behavior to them. Since it is hard to extend the existing set of HTML elements with new ones (such as dialogs, charts, grids, and more), mostly because of the time required for consolidation and standardization of their APIs, a better solution is to allow developers to extend the existing elements the way they want. Web Components provide us with a number of benefits, including better encapsulation, the better semantics of the markup we produce, better modularity, and easier communication between developers and designers. We know that JavaScript is a single-threaded language. Initially, it was developed for simple client-side scripting, but over time, its role has shifted quite a bit. Now, with HTML5, we have different APIs that allow audio and video processing, communication with external services through a two-directional communication channel, transferring and processing big chunks of raw data, and more. All these heavy computations in the main thread may create a poor user experience. They may introduce freezing of the user interface when time-consuming computations are being performed. This led to the development of WebWorkers, which allow the execution of the scripts in the background that communicate with the main thread through message passing. This way, multithreaded programming was brought to the browser. Some of these APIs were introduced after the development of Angular 1 had begun; that's why the framework wasn't built with most of them in mind. Taking advantage of the APIs gives developers many benefits, such as the following: Significant performance improvements Development of software with better quality characteristics Now, let's briefly discuss how each of these technologies has been made part of the new Angular core and why. The evolution of ECMAScript Nowadays, browser vendors are releasing new features in short iterations, and users receive updates quite often. This helps developers take advantage of bleeding-edge Web technologies. ES2015 that is already standardized. The implementation of the latest version of the language has already started in the major browsers. Learning the new syntax and taking advantage of it will not only increase our productivity as developers but also will prepare us for the near future when all the browsers will have full support for it. This makes it essential to start using the latest syntax now. Some projects' requirements may enforce us to support older browsers, which do not support any ES2015 features. In this case, we can directly write ECMAScript 5, which has different syntax but equivalent semantics to ES2015. 
On the other hand, a better approach will be to take advantage of the process of transpilation. Using a transpiler in our build process allows us to take advantage of the new syntax by writing ES2015 and translating it to a target language that is supported by the browsers. Angular has been around since 2009. Back then, the frontend of most websites was powered by ECMAScript 3, the last main release of ECMAScript before ECMAScript 5. This automatically meant that the language used for the framework's implementation was ECMAScript 3. Taking advantage of the new version of the language requires porting of the entirety of Angular 1 to ES2015. From the beginning, Angular 2 took into account the current state of the Web by bringing the latest syntax in the framework. Although new Angular is written with a superset of ES2016 (TypeScript), it allows developers to use a language of their own preference. We can use ES2015, or if we prefer not to have any intermediate preprocessing of our code and simplify the build process, we can even use ECMAScript 5. Note that if we use JavaScript for our Angular applications we cannot use Ahead-of-Time (AoT) compilation. Web Components The first public draft of Web Components was published on May 22, 2012, about three years after the release of Angular 1. As mentioned, the Web Components standard allows us to create custom elements and attach behavior to them. It sounds familiar; we've already used a similar concept in the development of the user interface in Angular 1 applications. Web Components sound like an alternative to Angular directives; however, they have a more intuitive API and built-in browser support. They introduced a few other benefits, such as better encapsulation, which is very important, for example, in handling CSS-style collisions. A possible strategy for adding Web Components support in Angular 1 is to change the directives implementation and introduce primitives of the new standard in the DOM compiler. As Angular developers, we know how powerful but also complex the directives API is. It includes a lot of properties, such as postLink, preLink, compile, restrict, scope, controller, and much more, and of course, our favorite transclude. Approved as standard, Web Components will be implemented on a much lower level in the browsers, which introduces plenty of benefits, such as better performance and native API. During the implementation of Web Components, a lot of web specialists met with the same problems the Angular team did when developing the directives API and came up with similar ideas. Good design decisions behind Web Components include the content element, which deals with the infamous transclusion problem in Angular 1. Since both the directives API and Web Components solve similar problems in different ways, keeping the directives API on top of Web Components would have been redundant and added unnecessary complexity. That's why, the Angular core team decided to start from the beginning by building a framework compatible with Web Components and taking full advantage of the new standard. Web Components involve new features; some of them were not yet implemented by all browsers. In case our application is run in a browser, which does not support any of these features natively, Angular emulates them. An example for this is the content element polyfilled with the ng-content directive. WebWorkers JavaScript is known for its event loop. 
Usually, JavaScript programs are executed in a single thread and different events are scheduled by being pushed in a queue and processed sequentially, in the order of their arrival. However, this computational strategy is not effective when one of the scheduled events requires a lot of computational time. In such cases, the event's handling will block the main thread, and all other events will not be handled until the time-consuming computation is complete and passes the execution to the next one in the queue. A simple example of this is a mouse click that triggers an event, in which callback we do some audio processing using the HTML5 audio API. If the processed audio track is big and the algorithm running over it is heavy, this will affect the user's experience by freezing the UI until the execution is complete. The WebWorker API was introduced in order to prevent such pitfalls. It allows execution of heavy computations inside the context of a different thread, which leaves the main thread of execution free, capable of handling user input and rendering the user interface. How can we take advantage of this in Angular? In order to answer this question, let's think about how things work in Angular 1. What if we have an enterprise application, which processes a huge amount of data that needs to be rendered on the screen using data binding? For each binding, the framework will create a new watcher. Once the digest loop is run, it will loop over all the watchers, execute the expressions associated with them, and compare the returned results with the results gained from the previous iteration. We have a few slowdowns here: The iteration over a large number of watchers The evaluation of the expression in a given context The copy of the returned result The comparison between the current result of the expression's evaluation and the previous one All these steps could be quite slow, depending on the size of the input. If the digest loop involves heavy computations, why not move it to a WebWorker? Why not run the digest loop inside WebWorker, get the changed bindings, and then apply them to the DOM? There were experiments by the community, which aimed for this result. However, their integration into the framework wasn't trivial. One of the main reasons behind the lack of satisfying results was the coupling of the framework with the DOM. Often, inside the watchers' callbacks, Angular 1 directly manipulates the DOM, which makes it impossible to move the watchers inside WebWorkers since the WebWorkers are invoked in an isolated context, without access to the DOM. In Angular 1, we may have implicit or explicit dependencies between the different watchers, which require multiple iterations of the digest loop in order to get stable results. Combining the last two points, it is quite hard to achieve practical results in calculating the changes in threads other than the main thread of execution. Fixing this in Angular 1 introduces a great deal of complexity in the internal implementation. The framework simply was not built with this in mind. Since WebWorkers were introduced before the Angular 2 design process started, the core team took them into mind from the beginning. Lessons learned from Angular 1 in the wild It's important to remember that we're not starting completely from scratch. We're taking what we've learned from Angular 1 with us. In the period since 2009, the Web is not the only thing that evolved. We also started building more and more complex applications. 
Today, single-page applications are not something exotic, but more like a strict requirement for all the web applications solving business problems, which are aiming for high performance and a good user experience. Angular 1 helped us to efficiently build large-scale, single-page applications. However, by applying it in various use cases, we've also discovered some of its pitfalls. Learning from the community's experience, Angular's core team worked on new ideas aiming to answer the new requirements. Controllers Angular 1 follows the Model View Controller (MVC) micro-architectural pattern. Some may argue that it looks more like Model View ViewModel (MVVM) because of the view model attached as properties to the scope or the current context in case of "controller as syntax". It could be approached differently again, if we use the Model View Presenter pattern (MVP). Because of all the different variations of how we can structure the logic in our applications, the core team called Angular 1 a Model View Whatever (MVW) framework. The view in any Angular 1 application is supposed to be a composition of directives. The directives collaborate together in order to deliver fully functional user interfaces. Services are responsible for encapsulating the business logic of the applications. That's the place where we should put the communication with RESTful services through HTTP, real-time communication with WebSockets and even WebRTC. Services are the building block where we should implement the domain models and business rules of our applications. There's one more component, which is mostly responsible for handling user input and delegating the execution to the services--the controller. Although the services and directives have well-defined roles, we can often see the anti-pattern of the Massive View Controller, which is common in iOS applications. Occasionally, developers are tempted to access or even manipulate the DOM directly from their controllers. Initially, this happens while you want to achieve something simple, such as changing the size of an element, or quick and dirty changing elements' styles. Another noticeable anti-pattern is the duplication of the business logic across controllers. Often developers tend to copy and paste logic, which should be encapsulated inside services. The best practices for building Angular 1 applications state is that the controllers should not manipulate the DOM at all, instead, all DOM access and manipulations should be isolated in directives. If we have some repetitive logic between controllers, most likely we want to encapsulate it into a service and inject this service with the dependency injection mechanism of Angular in all the controllers that need that functionality. This is where we're coming from in Angular 1. All this said, it seems that the functionality of controllers could be moved into the directive's controllers. Since directives support the dependency injection API, after receiving the user's input, we can directly delegate the execution to a specific service, already injected. This is the main reason why now Angular uses a different approach by removing the ability to put controllers everywhere by using the ng-controller directive. Scope Data-binding in Angular 1 is achieved using the scope object. We can attach properties to it and explicitly declare in the template that we want to bind to these properties (one- or two-way). 
Although the idea of the scope seems clear, it has two more responsibilities, including event dispatching and the change detection-related behavior. Angular beginners have a hard time understanding what scope really is and how it should be used. Angular 1.2 introduced something called controller as syntax. It allows us to add properties to the current context inside the given controller (this), instead of explicitly injecting the scope object and later adding properties to it. This simplified syntax can be demonstrated through the following snippet: <div ng-controller="MainCtrl as main"> <button ng-click="main.clicked()">Click</button> </div> function MainCtrl() { this.name = 'Foobar'; } MainCtrl.prototype.clicked = function () { alert('You clicked me!'); }; The latest Angular took this even further by removing the scope object. All the expressions are evaluated in the context of the given UI component. Removing the entire scope API introduces higher simplicity; we don't need to explicitly inject it anymore, instead we add properties to the UI components to which we can later bind. This API feels much simpler and more natural. Dependency injection Maybe the first framework on the market that included inversion of control (IoC) through dependency injection (DI) in the JavaScript world was Angular 1. DI provides a number of benefits, such as easier testability, better code organization and modularization, and simplicity. Although the DI in the first version of the framework does an amazing job, Angular 2 took this even further. Since latest Angular is on top of the latest Web standards, it uses the ECMAScript 2016 decorators' syntax for annotating the code for using DI. Decorators are quite similar to the decorators in Python or annotations in Java. They allow us to decorate the behavior of a given object, or add metadata to it, using reflection. Since decorators are not yet standardized and supported by major browsers, their usage requires an intermediate transpilation step; however, if you don't want to take it, you can directly write a little bit more verbose code with ECMAScript 5 syntax and achieve the same semantics. The new DI is much more flexible and feature-rich. It also fixes some of the pitfalls of Angular 1, such as the different APIs; in the first version of the framework, some objects are injected by position (such as the scope, element, attributes, and controller in the directives' link function) and others, by name (using parameters names in controllers, directives, services, and filters). Server-side rendering The bigger the requirements of the Web are, the more complex the web applications become. Building a real-life, single-page application requires writing a huge amount of JavaScript, and including all the required external libraries may increase the size of the scripts on our page to a few megabytes. The initialization of the application may take up to several seconds or even tens of seconds on mobile until all the resources get fetched from the server, the JavaScript is parsed and executed, the page gets rendered, and all the styles are applied. On low-end mobile devices that use a mobile Internet connection, this process may make the users give up on visiting our application. Although there are a few practices that speed up this process, in complex applications, there's no silver bullet. In the process of trying to improve the user experience, developers discovered something called server-side rendering. 
It allows us to render the requested view of a single-page application on the server and directly provide the HTML for the page to the user. Later, once all the resources are processed, the event listeners and bindings can be added by the script files. This sounds like a good way to boost the performance of our application. One of the pioneers in this was React, which allowed prerendering of the user interface on the server side using Node.js DOM implementations. Unfortunately, the architecture of Angular 1 does not allow this. The showstopper is the strong coupling between the framework and the browser APIs, the same issue we had in running the change detection in WebWorkers. Another typical use case for the server-side rendering is for building Search Engine Optimization (SEO)-friendly applications. There were a couple of hacks used in the past for making the Angular 1 applications indexable by the search engines. One such practice, for instance, is the traversal of the application with a headless browser, which executes the scripts on each page and caches the rendered output into HTML files, making it accessible by the search engines. Although this workaround for building SEO-friendly applications works, server-side rendering solves both of the above-mentioned issues, improving the user experience and allowing us to build SEO-friendly applications much more easily and far more elegantly. The decoupling of Angular with the DOM allows us to run our Angular applications outside the context of the browser. Applications that scale MVW has been the default choice for building single-page applications since Backbone.js appeared. It allows separation of concerns by isolating the business logic from the view, allowing us to build well-designed applications. Taking advantage of the observer pattern, MVW allows listening for model changes in the view and updating it when changes are detected. However, there are some explicit and implicit dependencies between these event handlers, which make the data flow in our applications not obvious and hard to reason about. In Angular 1, we are allowed to have dependencies between the different watchers, which requires the digest loop to iterate over all of them a couple of times until the expressions' results get stable. The new Angular makes the data flow one-directional; this has a number of benefits: More explicit data flow. No dependencies between bindings, so no time to live (TTL) of the digest. Better performance of the framework: The digest loop is run only once. We can create apps, which are friendly to immutable or observable models, that allows us to make further optimizations. The change in the data flow introduces one more fundamental change in Angular 1 architecture. We may take another perspective on this problem when we need to maintain a large codebase written in JavaScript. Although JavaScript's duck typing makes the language quite flexible, it also makes its analysis and support by IDEs and text editors harder. Refactoring of large projects gets very hard and error-prone because in most cases, the static analysis and type inference are impossible. The lack of compiler makes typos all too easy, which are hard to notice until we run our test suite or run the application. The Angular core team decided to use TypeScript because of the better tooling possible with it and the compile-time type checking, which help us to be more productive and less error-prone. 
As the following diagram shows, TypeScript is a superset of ECMAScript; it introduces explicit type annotations and a compiler:  Figure 1 The TypeScript language is compiled to plain JavaScript, supported by today's browsers. Since version 1.6, TypeScript implements the ECMAScript 2016 decorators, which makes it the perfect choice for Angular. The usage of TypeScript allows much better IDE and text editors' support with static code analysis and type checking. All this increases our productivity dramatically by reducing the mistakes we make and simplifying the refactoring process. Another important benefit of TypeScript is the performance improvement we implicitly get by the static typing, which allows runtime optimizations by the JavaScript virtual machine. Templates Templates are one of the key features in Angular 1. They are simple HTML and do not require any intermediate translation, unlike most template engines, such as mustache. Templates in Angular combine simplicity with power by allowing us to extend HTML by creating an internal domain-specific language (DSL) inside it, with custom elements and attributes. This is one of the main purposes of Web Components as well. We already mentioned how and why Angular takes advantage of this new technology. Although Angular 1 templates are great, they can still get better! The new Angular templates took the best parts of the ones in the previous release of the framework and enhanced them by fixing some of their confusing parts. For example, let's say we have a directive and we want to allow the user to pass a property to it using an attribute. In Angular 1, we can approach this in the following three different ways: <user name="literal"></user> <user name="expression"></user> <user name="{{interpolate}}"></user> In the user directive, we pass the name property using three different approaches. We can either pass a literal (in this case, the string "literal"), a string, which will be evaluated as an expression (in our case "expression"), or an expression inside, {{ }}. Which syntax should be used completely depends on the directive's implementation, which makes its API tangled and hard to remember. It is a frustrating task to deal with a large amount of components with different design decisions on a daily basis. By introducing a common convention, we can handle such problems. However, in order to have good results and consistent APIs, the entire community needs to agree with it. The new Angular deals with this problem by providing special syntax for attributes, whose values need to be evaluated in the context of the current component, and a different syntax for passing literals. Another thing we're used to, based on our Angular 1 experience, is the microsyntax in template directives, such as ng-if and ng-for. For instance, if we want to iterate over a list of users and display their names in Angular 1, we can use: <div ng-for="user in users">{{user.name}}</div> Although this syntax looks intuitive to us, it allows limited tooling support. However, Angular 2 approached this differently by bringing a little bit more explicit syntax with richer semantics: <template ngFor let-user [ngForOf]="users"> {{user.name}} </template> The preceding snippet explicitly defines the property, which has to be created in the context of the current iteration (user), the one we iterate over (users). 
Since this syntax is too verbose for typing, developers can use the following syntax, which later gets translated to the more verbose one: <li *ngFor="let user of users"> {{user.name}} </li> The improvements in the new templates will also allow better tooling for advanced support by text editors and IDEs. Change detection We already mentioned the opportunity to run the digest loop in the context of a different thread, instantiated as WebWorker. However, the implementation of the digest loop in Angular 1 is not quite memory-efficient and prevents the JavaScript virtual machine from doing further code optimizations, which allows significant performance improvements. One such optimization is the inline caching ( http://mrale.ph/blog/2012/06/03/explaining-js-vms-in-js-inline-caches.html ). The Angular team did a lot of research in order to discover different ways the performance and the efficiency of the change detection could be improved. This led to the development of a brand new change detection mechanism. As a result, Angular performs change detection in code that the framework directly generates from the components' templates. The code is generated by the Angular compiler. There are two built-in code generation (also known as compilation) strategies: Just-in-Time (JiT) compilation: At runtime, Angular generates code that performs change detection on the entire application. The generated code is optimized for the JavaScript virtual machine, which provides a great performance boost. Ahead-of-Time (AoT) compilation: Similar to JiT with the difference that the code is being generated as part of the application's build process. It can be used for speeding the rendering up by not performing the compilation in the browser and also in environments that disallow eval(), such as CSP (Content-Security-Policy) and Chrome extensions. Summary In this article, we considered the main reasons behind the decisions taken by the Angular core team and the lack of backward compatibility between the last two major versions of the framework. We saw that these decisions were fueled by two things--the evolution of the Web and the evolution of the frontend development, with the lessons learned from the development of Angular 1 applications. We learned why we need to use the latest version of the JavaScript language, why to take advantage of Web Components and WebWorkers, and why it's not worth it to integrate all these powerful tools in version 1. We observed the current direction of frontend development and the lessons learned in the last few years. We described why the controller and scope were removed from Angular 2, and why Angular 1's architecture was changed in order to allow server-side rendering for SEO-friendly, high-performance, single-page applications. Another fundamental topic we took a look at was building large-scale applications, and how that motivated single-way data flow in the framework and the choice of the statically typed language, TypeScript. The new Angular reuses some of the naming of the concepts introduced by Angular 1, but generally changes the building blocks of our single-page applications completely. We will take a peek at the new concepts and compare them with the ones in the previous version of the framework. We'll make a quick introduction to modules, directives, components, routers, pipes, and services, and describe how they could be combined for building classy, single-page applications. 
Resources for Article: Further resources on this subject: Angular.js in a Nutshell [article] Angular's component architecture [article] AngularJS Performance [article]

How to Add Custom Slot Types to Intents

Antonio Cucciniello
20 Jan 2017
6 min read
Have you created an intent for use with Alexa where you wanted to add your own slot types to it? If this is you, hopefully after following this guide you should be on your way to creating as many custom slot types as you like. Before we go any further, let us define what a slot is in Amazon Echo skill development. A slot is essentially a way to access what the user says when requesting something and then use it in your code to execute the skill's functionality properly. For example, let's say I created a skill that repeated a name back to me, and the request was "Alexa, ask Repeater to respond with John." Alexa would then respond with "John." In this case, the slot value would be John. In order to make slots, you need to do a couple of things in your Developer Portal and in your code.

Amazon Developer Portal

First, we will visit the steps to implement a slot in your Developer Portal. There are three individual parts to this.

Intent Schema

Once you have logged in to your Developer Portal for the skill you would like to add a slot to, click on the Interaction Model tab on the left-hand side. Go to the Intent Schema section. This is a JSON object that holds all of your intents. Here we are going to create a new intent for our skill. For example, if our skill's name was Repeater and we wanted Alexa to respond with a name the user said back to them, our intent schema would look like this:

{
  "intents": [
    {
      "intent": "RepeatNameIntent",
      "slots": [
        {
          "name": "repeatName",
          "type": "REPEAT_NAME"
        }
      ]
    }
  ]
}

Here we specify the intent name as RepeatNameIntent, then specify that the intent will have one slot named repeatName that is of the custom slot type REPEAT_NAME. Now we have created an intent for Alexa to handle while using this skill. It is time to define what the custom slot type REPEAT_NAME is.

Custom Slot Types

Now that the intent has been added, in order to define the custom slot type REPEAT_NAME, scroll down to the Custom Slot Type section. Now click on Add Slot Type. Enter the type as the same type you used in your Intent Schema (for this example, we will create it as type REPEAT_NAME). Then it will ask you to enter values. Here Amazon is looking for example things a user would say for this slot, in order to know when the slot is being used. In the case of REPEAT_NAME, I have placed a bunch of different names as values for Alexa to handle. Here is an image of the custom slot type with its values:

Sample Utterances

Once you have defined what your custom slot type will look like by giving it sample values, it is time to create a couple of Sample Utterances so the skill knows where the slot will appear when the intent is invoked. In order to specify where the slot will be in the Sample Utterance, you must use the format {nameOfSlot}. For example, here are a couple of Sample Utterances I implemented for the RepeatNameIntent:

RepeatNameIntent to repeat the name {repeatName}
RepeatNameIntent to say the name {repeatName}
RepeatNameIntent to respond with the name {repeatName}

Now, by giving these Sample Utterances, Alexa knows that what the user says after the word "name" in the skill is what you would like repeated back to you. In order to implement this, we need to access the slot in our code.

Code

Now that you have the logistics of setting up the intent and custom slot type in your Developer Portal, you can move on to implementing the intent's functionality.
// repeat-name-intent.js
module.exports = RepeatNameIntent

function RepeatNameIntent (intent, session, response) {
  // Read the spoken slot value and have Alexa say it back
  var name = intent.slots.repeatName.value
  response.tell(name)
  return
}

The way to access the value of what the user is saying is through intent.slots.slotName.value. For your skill, replace slotName with the actual name of the slot that you used in your Intent Schema in the Developer Portal. In this example, we access the value, store it in a variable called name, and then have Alexa respond with that name to the user.

Now, to make sure the intent is handled in your code, head over to the main JS file for this skill (the file your AWS Lambda function points to). Add the following line to pull your intent function into the main file:

// main.js
var RepeatNameIntent = require('./repeat-name-intent.js')

Once you have added that line, add this next entry to your intentHandlers so that your skill knows which intent in your Developer Portal's intent schema maps to which function in your code:

RepeaterService.prototype.intentHandlers = {
  'RepeatNameIntent': RepeatNameIntent
}

This takes the form of 'IntentNameInDevPortal': IntentNameInCode.

Conclusion

If you have made it this far, you have successfully added a custom slot type to your intent! Briefly, here is what happens when the skill is invoked with your custom slot type:

The user says "Alexa, ask Repeater to say the name Joe."
Alexa listens to what you are saying.
She recognizes that you are invoking the RepeatNameIntent and that the slot value should be "Joe."
She executes the RepeatNameIntent function, because your intent handler tells her that is how you would like that intent handled.
She responds with "Joe."

Possible Resources

Use my skill here: Edit Docs
Check out the code for my skill on GitHub
Alexa Skills Kit Custom Interaction Model Reference
Migrating to the Improved Built-in and Custom Slot Types

About the author

Antonio is a software engineer with a background in C, C++, and JavaScript (Node.js) from New Jersey. His most recent project, called Edit Docs, is an Amazon Echo skill that allows users to edit Google Drive files using their voice. He loves building cool things with software, and reading books on self-help and improvement, finance, and entrepreneurship. To contact Antonio, e-mail him at Antonio.cucciniello16@gmail.com, follow him on Twitter @antocucciniello, and follow him on GitHub.

Installing QuickSight Application

Packt
20 Jan 2017
4 min read
In this article by Rajesh Nadipalli, the author of the book Effective Business Intelligence with QuickSight, we will see how you can install the Amazon QuickSight app from the Apple iTunes Store at no cost. You can search for the app in the iTunes Store and then proceed to download and install it, or alternatively you can follow this link to download the app.

(For more resources related to this topic, see here.)

The Amazon QuickSight app is certified to work with iOS devices running iOS v9.0 and above. Once you have the app installed, you can proceed to log in to your QuickSight account as shown in the following screenshot:

Figure 1.1: QuickSight sign in

The Amazon QuickSight app is designed to access dashboards and analyses on your mobile device. All interactions in the app are read-only, and changes you make on your device are not applied to the original visuals, so you can explore without any worry.

Dashboards on the go

After you log in to the QuickSight app, you will first see the list of dashboards associated with your QuickSight account for easy access. If you don't see dashboards, click on the Dashboards icon in the menu at the bottom of your mobile device as shown in the following screenshot:

Figure 1.2: Accessing dashboards

You will now see the list of dashboards associated with your user ID.

Dashboard detailed view

From the dashboard listing, select the USA Census Dashboard, which will redirect you to the detailed dashboard view. In the detailed dashboard view you will see all the visuals that are part of that dashboard. You can click on the arrow at the extreme top right of each visual to open the specific chart in full-screen mode as shown in the following screenshot. In the scatter plot analysis shown below, you can further click on any of the dots to get specific values about that bubble. In the following screenshot the selected circle is for zip code 94027, which has a PopulationCount of 7,089, a MedianIncome of $216,905, and a MeanIncome of $336,888:

Figure 1.3: Dashboard visual

Dashboard search

The QuickSight mobile app also provides a search feature, which is handy if you know only part of a dashboard's name. Follow these steps to search for a dashboard:

First, ensure you are in the Dashboards tab by clicking on the Dashboards icon in the bottom menu.
Next, click on the search icon in the top right corner.
Next, type the partial name. In the following example, I have typed Usa. QuickSight now searches for all dashboards that have the word Usa in them and lists them out. You can then click on a dashboard to get details about that specific dashboard as shown in the following screenshot:

Figure 1.4: Dashboard search

Favorite a dashboard

QuickSight provides a convenient way to bookmark your dashboards by setting them as favorites. To use this feature, first identify which dashboards you use often and click on the star icon to their right, as shown in the following screenshot. Then, to access all of your favorites, click on the Favorites tab and the list is refined to only those dashboards you had previously marked as favorites:

Figure 1.5: Dashboard favorites

Limitations of mobile app

While dashboards are fairly easy to interact with on the mobile app, there are key limitations when compared to the standard browser version, which I am listing as follows:

You cannot create or share dashboards with others using the mobile app.
You cannot zoom in or out of a visual, which would be really useful in scenarios where the charts are dense.
Chart legends are not shown.

Summary

We have seen how to install the Amazon QuickSight app, and how you can use it to browse, search, and view dashboards. We have covered how to access dashboards, search for them, mark favorites, and open the detailed view. We have also seen some limitations of the mobile app.

Resources for Article:

Further resources on this subject:

Introduction to Practical Business Intelligence [article]
MicroStrategy 10 [article]
Making Your Data Everything It Can Be [article]

Background jobs on Django with Celery

Jean Jung
19 Jan 2017
7 min read
While building web applications, you usually need to run some operations in the background, either to improve the application's performance or because a job really needs to run outside of the application environment. In both cases, if you are on Django, you are in good hands, because you have Celery, the Distributed Task Queue written in Python. Celery is a tiny but complete project; you can find more information on the project page. In this post, we will see how easy it is to integrate Celery with an existing project, and although we are focusing on Django here, creating a standalone Celery worker is a very similar process.

Installing Celery

The first step is to install Celery. If you already have it, please move on to the next section! Like every good Python package, Celery is distributed on pip. You can install it just by entering:

pip install celery

Choosing a message broker

The second step is choosing a message broker to act as the job queue. Celery can talk with a great variety of brokers; the main ones are:

RabbitMQ
Redis ¹
Amazon SQS ²

Check for support on other brokers here. If you're already using any of these brokers for other purposes, choose it as your primary option. In this section there is nothing more you have to do. Celery is very transparent and does not require any source modification to move from one broker to another, so feel free to try more than one after we are done here. OK, let's move on, but first do not forget to read the little notes below.

¹: For Redis (a great choice in my opinion), you have to install the celery[redis] package.
²: Celery has great features, like web monitoring, that do not work with this broker.

Celery worker entrypoint

When running Celery in a directory, it will search for a file called celery.py, which is the application entrypoint, where the configs are loaded and the application object resides. Working with Django, this file is commonly stored in the project directory, along with the settings.py file; your file structure should look like this:

your_project_name
    your_project_name
        __init__.py
        settings.py
        urls.py
        wsgi.py
        celery.py
    your_app_name
        __init__.py
        models.py
        views.py
        ...

The settings read by that file will be in the same settings.py file that Django uses. At this point we can take a look at the official documentation's celery.py file example. This code is basically the same for every project; just replace proj with your project name and save the file. Each part is described in the file's comments.

from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
# This is not required, but as you can have more than one app
# with tasks, it's better to autoload them than to declare all tasks
# in this same file.
app.autodiscover_tasks()

Settings

By default, Celery depends only on the broker_url setting to work. As we've seen in the previous section, your settings will be stored alongside the Django ones, but with the 'CELERY_' prefix.
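Before drilling into the broker URL itself, here is a minimal sketch of how those prefixed keys end up looking in settings.py. Only the broker URL is strictly needed for this post; the other two lines are optional, illustrative extras you may or may not want:

# settings.py -- Celery reads these because of namespace='CELERY'
CELERY_BROKER_URL = 'amqp://guest:guest@localhost:5672//'   # example: a local RabbitMQ broker
CELERY_RESULT_BACKEND = 'redis://localhost:6379/1'          # optional: where task results are stored
CELERY_TASK_SERIALIZER = 'json'                             # optional: message serialization format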
The broker_url format is as follows:

CELERY_BROKER_URL = 'broker://[[user]:[password]@]host[:port[/resource]]'

Here broker is an identifier that specifies the chosen broker, like amqp or redis; user and password are the authentication for the service, if needed; host and port are the address of the service; and resource is a broker-specific path to the component resource. For example, if you've chosen a local Redis as your broker, your broker URL will be:

CELERY_BROKER_URL = 'redis://localhost:6379/0' ¹

¹: Considering a default Redis installation with database 0 being used.

With this we have a functioning Celery worker. How lucky! It's so simple! But wait, what about the tasks? How do we write and execute them? Let's see.

Creating and running tasks

Because of the superpowers Celery has, it can autoload tasks from Django app directories, as we've seen before; you just have to declare your app's tasks in a file called tasks.py in the app directory:

your_project_name
    your_project_name
        __init__.py
        settings.py
        urls.py
        wsgi.py
        celery.py
    your_app_name
        __init__.py
        models.py
        views.py
        tasks.py
        ...

In that file you just need to put functions decorated with the celery.shared_task decorator. So suppose we want to build a background mailer; the source will look like this:

from __future__ import absolute_import, unicode_literals

from celery import shared_task
from django.core.mail import send_mail


@shared_task
def mailer(subject, message, recipient_list, from_email='default@admin.com'):
    # Django's send_mail expects (subject, message, from_email, recipient_list)
    send_mail(subject, message, from_email, recipient_list)

Then in the Django application, at any place where you have to send an e-mail in the background, just do the following:

from __future__ import absolute_import

from app.tasks import mailer


def send_email_to_user(request):
    if request.user:
        mailer.delay('Alert Foo', 'The foo message', [request.user.email])

delay is probably the most used way to submit a job to a Celery worker, but it is not the only one. Check this reference to see what is possible. There are many features, like task chaining, future schedules, and more! As you may have noticed, in a great majority of the files we have used the from __future__ import absolute_import statement. This is very important, mainly with Python 2, because of the way Celery serializes messages to post tasks to brokers. You need to follow the same convention when creating and using tasks, as otherwise the namespace of the task will differ and the task will not get executed. The absolute import module forces you to use absolute imports, so you will avoid these problems. Check this link for more information.

Running the worker

If you take the source code above, put everything in the right place, and run the Django development server to test your background jobs, they will not work! Wait. This is because you don't have a Celery worker started yet. To start it, cd into the project's main directory (the same one where you run python manage.py runserver, for example) and run:

celery -A your_project_name worker -l info

Replace your_project_name with your project and info with the desired log level. Keep this process running, start the Django server, and yes, now you can see that everything works!

Where to go now?

Explore the Celery documentation and see all the available features, caveats, and help you can get from it. There is also an example project on the Celery GitHub page that you can use as a template for new projects or as a guide to add Celery to your existing project.
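Before heading off to the docs, here is a small, illustrative taste of the future-scheduling feature mentioned earlier, using the mailer task from above; the e-mail address is just a placeholder:

from app.tasks import mailer

# apply_async is the more flexible cousin of delay()
mailer.apply_async(args=['Alert Foo', 'The foo message', ['someone@example.com']])

# The countdown option asks the worker to run the task about sixty seconds from now
mailer.apply_async(
    args=['Alert Foo', 'The foo message', ['someone@example.com']],
    countdown=60,
)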
Summary

We've seen how to install and configure Celery to run alongside a new or existing Django project. We explored some of the broker options available and how simple it is to change between them, along with some hints about brokers that don't offer all of the features Celery has. We have seen an example of a mailer task, and how it was created and called from the Django application. Finally, I provided instructions for starting the worker to get things done.

References

[1] - Django project documentation
[2] - Celery project documentation
[3] - Redis project page
[4] - RabbitMQ project page
[5] - Amazon SQS page

About the author

Jean Jung is a Brazilian developer passionate about technology. He is currently a systems analyst at EBANX, an international payment processing company for Latin America. He's very interested in Python and artificial intelligence, specifically machine learning, compilers, and operating systems. As a hobby, he's always looking for IoT projects with Arduino.