
How-To Tutorials


Why choose OpenCV over MATLAB for your next Computer Vision project

Vincy Davis
20 Dec 2019
6 min read
Scientific computing relies on executing computer algorithms coded in different programming languages. One such interdisciplinary scientific field is Computer Vision, often abbreviated as CV: the study of techniques that automate tasks like acquiring, processing, analyzing, and understanding digital images, and that extract high-dimensional data from the real world to produce symbolic information. In simple words, Computer Vision gives computers the ability to see, understand, and process images and videos the way humans do. Advances in hardware, machine learning tools, and frameworks have led to Computer Vision being applied in fields like IoT, manufacturing, healthcare, and security, and major tech firms like Amazon, Google, Microsoft, and Facebook are investing heavily in its research and development. Among the many tools and libraries available for Computer Vision today, two stand out in terms of speed and efficiency: OpenCV and MATLAB. In this article, we take a detailed look at both of them.

Further reading: To learn how to build interesting image recognition models, like setting up license plate recognition using OpenCV, read the book "Computer Vision Projects with OpenCV and Python 3" by author Matthew Rever. The book will also guide you through designing and developing production-grade Computer Vision projects that tackle real-world problems.

OpenCV: an open-source, multiplatform solution tailored for Computer Vision

OpenCV, originally developed by Intel and later supported by Willow Garage, is released under the BSD 3-Clause license and is free for commercial use. It is one of the most popular computer vision tools, aimed at providing a well-optimized, well-tested, open-source C++ implementation of computer vision algorithms. The library has interfaces for multiple languages, including C++, Python, and Java, supports Linux, macOS, Windows, iOS, and Android, and has many of its functions implemented on the GPU. The first stable release, OpenCV version 1.0, came out in 2006, and the community has grown rapidly ever since. The latest release, OpenCV 4.1.1, brings improvements to the dnn (Deep Neural Networks) module, a popular part of the library that runs the forward pass (inference) of deep networks pre-trained in popular deep learning frameworks.

Some of the features offered by OpenCV include:

- The imread function, which reads images in the BGR (Blue-Green-Red) channel order by default.
- Easy upscaling and downscaling for resizing an image, with various interpolation and downsampling methods such as INTER_NEAREST for nearest-neighbor interpolation.
- Multiple variations of thresholding, such as adaptive thresholding, plus bitwise operations, edge detection, image filtering, image contours, and more.
- Image segmentation (the watershed algorithm) to classify each pixel in an image as background or foreground.
- Multiple feature-matching algorithms, such as brute-force matching and kNN feature matching, among others.

With its active community and regular updates for machine learning, OpenCV is only going to grow by leaps and bounds in the field of Computer Vision projects.
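To make a few of these features concrete, here is a minimal Python sketch exercising some of the operations listed above: imread (BGR by default), resizing with nearest-neighbor interpolation, adaptive thresholding, and edge detection. The file names and parameter values are placeholders for illustration, not taken from the article.

```python
import cv2

# imread returns pixels in BGR channel order by default.
img = cv2.imread("input.jpg")          # placeholder path
if img is None:
    raise FileNotFoundError("input.jpg not found")

# Downscale to half size using nearest-neighbor interpolation.
small = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_NEAREST)

# Convert to grayscale and apply adaptive thresholding.
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 11, 2)

# Simple Canny edge detection on the grayscale image.
edges = cv2.Canny(gray, 100, 200)

cv2.imwrite("edges.png", edges)
```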
MATLAB: a licensed quick-prototyping tool with OpenCV integration

One disadvantage of OpenCV that makes novice computer vision users lean towards MATLAB is OpenCV's complexity: it is comparatively harder to learn, owing to gaps in its documentation and error-handling code. MATLAB, developed by MathWorks, is a proprietary programming language with a multi-paradigm numerical computing environment. It has over 3 million users worldwide, is considered one of the easiest and most productive environments for engineers and scientists, and ships with a very powerful and fast matrix library.

MATLAB also integrates with OpenCV, which lets MATLAB users explore, analyze, and debug designs that incorporate OpenCV algorithms; the MATLAB support package includes the data type conversions needed to move between MATLAB and OpenCV. MathWorks' Computer Vision Toolbox provides algorithms, functions, and apps for designing and testing computer vision, 3D vision, and video processing systems, and supports detection, tracking, feature extraction, and matching of objects. MATLAB can also train custom object detectors using deep learning and machine learning algorithms such as YOLO v2, Faster R-CNN, and ACF. Most of the toolbox algorithms support C/C++ code generation for integrating with existing code, desktop prototyping, and embedded vision system deployment. However, MATLAB does not offer as many computer vision functions as OpenCV, which also has more of its functions implemented on the GPU. Another issue is that MATLAB is not open source: its license is costly and its programs are not portable.

Another factor that matters a great deal in computer vision is the performance of the code, especially when working on real-time video processing.

Which has the faster execution time, OpenCV or MATLAB?

Execution speed matters well beyond computer vision whenever you choose a programming language or library for implementing any function. This factor is analyzed in detail in a paper titled "Matlab vs. OpenCV: A Comparative Study of Different Machine Learning Algorithms". The paper provides a very practical comparative study between MATLAB and OpenCV using 20 different real datasets. The comparison is based on the execution time of various machine learning algorithms: Classification and Regression Trees (CART), Naive Bayes, Boosting, Random Forest, and k-Nearest Neighbors (KNN). The experiments were run on an Intel Core 2 Duo P7450 machine with 3 GB RAM and a 32-bit Ubuntu 11.04 operating system, using MATLAB version 7.12.0.635 (R2011a) and OpenCV C++ version 2.1.

The paper states, "To compare the speed of Matlab and OpenCV for a particular machine learning algorithm, we run the algorithm 1000 times and take the average of the execution times. Averaging over 1000 experiments is more than necessary since convergence is reached after a few hundred." The outcome of all the experiments was that, although MATLAB is a successful scientific computing environment, OpenCV outruns it in almost every experiment when execution time is considered. The paper points out that this could be due to a combination of the number of dimensions, the sample size, and the use of training sets. For one of the listed machine learning algorithms, KNN, the log time ratio was 0.8 and 0.9 on datasets D16 and D17, respectively.

Clearly, MATLAB is great for exploring and experimenting with computer vision concepts, for researchers and students at universities that can afford the software.
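The benchmarking recipe quoted above (run an algorithm many times and average the execution times) is easy to reproduce for your own workloads. The Python sketch below times OpenCV's KNN classifier on synthetic data; it only illustrates the run-and-average methodology, and does not reproduce the paper's datasets or experimental setup.

```python
import time
import numpy as np
import cv2

# Synthetic two-class dataset, standing in for a real benchmark dataset.
rng = np.random.default_rng(0)
train = rng.random((500, 10), dtype=np.float32)
labels = rng.integers(0, 2, size=(500, 1)).astype(np.float32)
test = rng.random((100, 10), dtype=np.float32)

# Train OpenCV's KNN implementation once.
knn = cv2.ml.KNearest_create()
knn.train(train, cv2.ml.ROW_SAMPLE, labels)

# Run inference many times and report the mean execution time,
# mirroring the averaging approach described in the paper.
runs = 1000
start = time.perf_counter()
for _ in range(runs):
    knn.findNearest(test, 5)
mean_ms = (time.perf_counter() - start) / runs * 1e3
print(f"mean KNN inference time over {runs} runs: {mean_ms:.3f} ms")
```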
However, when it comes to building production-ready, real-world computer vision projects, OpenCV beats MATLAB hands down. You can learn about building more Computer Vision projects, like human pose estimation using TensorFlow, from our book "Computer Vision Projects with OpenCV and Python 3".


Anatomy of a WordPress Plugin

Packt
25 Mar 2011
7 min read
WordPress is a popular content management system (CMS), most renowned for its use as a blogging/publishing application. According to the usage statistics tracker BuiltWith (http://builtWith.com), WordPress is considered to be the most popular blogging software on the planet—not bad for something that has only been around officially since 2003. Before we develop any substantial plugins of our own, let's take a few moments to look at what other people have done, so we get an idea of what the final product might look like. By this point, you should have a fresh version of WordPress installed and running somewhere for you to play with. It is important that your installation of WordPress is one with which you can tinker. In this article by Brian Bondari and Everett Griffiths, authors of WordPress 3 Plugin Development Essentials, we will purposely break a few things to help see how they work, so please don't try anything in this article on a live production site.

Deconstructing an existing plugin: "Hello Dolly"

WordPress ships with a simple plugin named "Hello Dolly". Its name is a whimsical take on the programmer's obligatory "Hello, World!", and it is trotted out only for pedantic explanations like the one that follows (unless, of course, you really do want random lyrics by Jerry Herman to grace your administration screens).

Activating the plugin

Let's activate this plugin so we can have a look at what it does:

1. Browse to your WordPress Dashboard at http://yoursite.com/wp-admin/.
2. Navigate to the Plugins section.
3. Under the Hello Dolly title, click on the Activate link.

You should now see a random lyric appear in the top-right portion of the Dashboard. Refresh the page a few times to get the full effect.

Examining the hello.php file

Now that we've tried out the "Hello Dolly" plugin, let's have a closer look. In your favorite text editor, open up the /wp-content/plugins/hello.php file. Can you identify the following integral parts?

- The Information Header, which describes details about the plugin (author and description). This is contained in a large PHP /* comment */.
- User-defined functions, such as the hello_dolly() function.
- The add_action() and/or add_filter() functions, which hook a WordPress event to a user-defined function.

It looks pretty simple, right? That's all you need for a plugin:

- An information header
- Some user-defined functions
- add_action() and/or add_filter() functions

Now that we've identified the critical component parts, let's examine them in more detail.

Information header

Don't just skim this section thinking it's a waste of breath on the self-explanatory header fields. Unlike a normal PHP file, in which the comments are purely optional, in WordPress plugin and theme files the Information Header is required! It is this block of text that causes a file to show up on WordPress' radar so that you can activate it or deactivate it. If your plugin is missing a valid information header, you cannot use it!

Exercise—breaking the header

To reinforce that the information header is an integral part of a plugin, try the following exercise:

1. In your WordPress Dashboard, ensure that the "Hello Dolly" plugin has been activated.
2. If applicable, use your preferred (s)FTP program to connect to your WordPress installation.
3. Using your text editor, temporarily delete the information header from /wp-content/plugins/hello.php and save the file (you can save the header elsewhere for now).
4. Refresh the Plugins page in your browser. You should get a warning from WordPress stating that the plugin does not have a valid header.
5. After you've seen the tragic consequences, put the header information back into the hello.php file.

This should make it abundantly clear to you that the information header is absolutely vital for every WordPress plugin. If your plugin has multiple files, the header should be inside the primary file—in this article we use index.php as our primary file, but many plugins use a file named after the plugin name as their primary file.

Location, name, and format

The header itself is similar in form and function to other content management systems, such as Drupal's module.info files or Joomla's XML module configurations—it offers a way to store additional information about a plugin in a standardized format. The values can be extended, but the most common header values are listed below:

- Author: Listed below the plugin name
- Author URI: Together with "Author", this creates a link to the author's site
- Description: Main block of text describing the plugin
- Plugin Name: The displayed name of the plugin
- Plugin URI: Destination of the "Visit plugin site" link
- Version: Use this to track your changes over time

For more information about header blocks, see the WordPress codex at http://codex.wordpress.org/File_Header.

In order for a PHP file to show up in WordPress' Plugins menu:

- The file must have a .php extension.
- The file must contain the information header somewhere in it (preferably at the beginning).
- The file must be either in the /wp-content/plugins directory, or in a subdirectory of the plugins directory. It cannot be more deeply nested.

Understanding the Includes

When you activate a plugin, the name of the file containing the information header is stored in the WordPress database. Each time a page is requested, WordPress goes through a laundry list of PHP files it needs to load, so activating a plugin ensures that your own files are on that list. To help illustrate this concept, let's break WordPress again.

Exercise—parse errors

Try the following exercise:

1. Ensure that the "Hello Dolly" plugin is active.
2. Open the /wp-content/plugins/hello.php file in your text editor.
3. Immediately before the line that contains function hello_dolly_get_lyric, type in some gibberish text, such as "asdfasdf", and save the file.
4. Reload the Plugins page in your browser.

This should generate a parse error, something like:

Parse error: syntax error, unexpected T_FUNCTION in /path/to/wordpress/html/wp-content/plugins/hello.php on line 16

Yikes! Your site is now broken. Why did this happen? We introduced errors into the plugin's main file (hello.php), so including it caused PHP and WordPress to choke. Delete the gibberish line from the hello.php file and save to return the plugin back to normal. The parse error only occurs if there is an error in an active plugin. Deactivated plugins are not included by WordPress and therefore their code is not parsed. You can try the same exercise after deactivating the plugin and you'll notice that WordPress does not raise any errors.

Bonus for the curious

In case you're wondering exactly where and how WordPress stores the information about activated plugins, have a look in the database.
Using your MySQL client, you can browse the wp_options table or execute the following query:

SELECT option_value FROM wp_options WHERE option_name='active_plugins';

The active plugins are stored as a serialized PHP hash, referencing the file containing the header. The following is an example of what the serialized hash might contain if you had activated a plugin named "Bad Example". You can use PHP's unserialize() function to parse the contents of this string into a PHP variable, as in the following script:

<?php
$active_plugin_str = 'a:1:{i:0;s:27:"bad-example/bad-example.php";}';
print_r( unserialize($active_plugin_str) );
?>

And here's its output:

Array
(
    [0] => bad-example/bad-example.php
)
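If you ever need to inspect this option outside PHP, say from a quick script while debugging, the serialized format is simple enough to pick apart by hand. The following Python sketch is purely illustrative (it is not part of WordPress) and extracts the plugin paths from a serialized active_plugins value with a regular expression:

```python
import re

# Example value of the active_plugins option, as stored by WordPress.
serialized = 'a:1:{i:0;s:27:"bad-example/bad-example.php";}'

# Each plugin path is stored as a PHP string: s:<length>:"<value>";
# This regex pulls out every quoted string value.
plugin_paths = re.findall(r's:\d+:"([^"]+)";', serialized)

print(plugin_paths)   # ['bad-example/bad-example.php']
```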


Jim Balsillie on Data Governance Challenges and 6 Recommendations to tackle them

Savia Lobo
05 Jun 2019
5 min read
The Canadian Parliament's Standing Committee on Access to Information, Privacy and Ethics hosted the hearing of the International Grand Committee on Big Data, Privacy and Democracy from Monday, May 27 to Wednesday, May 29. Witnesses from at least 11 countries appeared before representatives to testify on how governments can protect democracy and citizen rights in the age of big data. This section of the hearing, which took place on May 28, covers Jim Balsillie's take on data governance.

Jim Balsillie, Chair of the Centre for International Governance Innovation and retired Chairman and co-CEO of BlackBerry, starts off by saying that data governance is the most important public policy issue of our time. It is cross-cutting, with economic, social, and security dimensions, and it requires both national policy frameworks and international coordination. He applauded the seriousness and integrity of Messrs. Zimmer, Angus, and Erskine-Smith, who have spearheaded a Canadian bipartisan effort to deal with data governance over the past three years.

"My perspective is that of a capitalist and global tech entrepreneur for 30 years and counting. I'm the retired Chairman and co-CEO of Research in Motion, a Canadian technology company [that] we scaled from an idea to 20 billion in sales. While most are familiar with the iconic BlackBerry smartphones, ours was actually a platform business that connected tens of millions of users to thousands of consumer and enterprise applications via some 600 cellular carriers in over 150 countries. We understood how to leverage Metcalfe's law of network effects to create a category-defining company, so I'm deeply familiar with multi-sided platform business model strategies as well as navigating the interface between business and public policy," he adds.

He goes on to offer several observations about the nature, scale, and breadth of the collective challenges for the committee's consideration:

- Disinformation and fake news are just two of the negative outcomes of unregulated, attention-based business models. They cannot be addressed in isolation; they have to be tackled horizontally as part of an integrated whole. To agonize over social media's role in the proliferation of online hate, conspiracy theories, politically motivated misinformation, and harassment is to miss the root and scale of the problem.
- Social media's toxicity is not a bug, it's a feature. Technology works exactly as designed. Technology products, services, and networks are not built in a vacuum. Usage patterns drive product development decisions. Behavioral scientists involved with today's platforms helped design user experiences that capitalize on negative reactions because they produce far more engagement than positive reactions.
- Among the many valuable insights provided by whistleblowers inside the tech industry is this quote: "the dynamics of the attention economy are structurally set up to undermine the human will." Democracy and markets work when people can make choices aligned with their interests. The online-advertisement-driven business model subverts choice and represents a fundamental threat to markets, election integrity, and democracy itself.
- Technology gets its power through the control of data. Data at the micro-personal level gives technology unprecedented power to influence. "Data is not the new oil, it's the new plutonium: amazingly powerful, dangerous when it spreads, difficult to clean up, and with serious consequences when improperly used."
- Data deployed through next-generation 5G networks is transforming passive infrastructure into veritable digital nervous systems.
- Our current domestic and global institutions, rules, and regulatory frameworks are not designed to deal with any of these emerging challenges. Because cyberspace knows no natural borders, digital transformation effects cannot be hermetically sealed within national boundaries; international coordination is critical.

With these observations, Balsillie offers six recommendations:

1. Eliminate tax deductibility of specific categories of online ads.
2. Ban personalized online advertising for elections.
3. Implement strict data governance regulations for political parties.
4. Provide effective whistleblower protections.
5. Add explicit personal liability alongside corporate responsibility to affect the decision-making of CEOs and boards of directors.
6. Create a new institution for like-minded nations to address digital cooperation and stability.

Technology is becoming the new Fourth Estate

Technology is disrupting governance and, if left unchecked, could render liberal democracy obsolete. By displacing the print and broadcast media and influencing public opinion, technology is becoming the new Fourth Estate. In our system of checks and balances, this makes technology co-equal with the executive, the legislature, and the judiciary. When this new Fourth Estate declines to appear before this committee, as Silicon Valley executives are currently doing, it is symbolically asserting this aspirational co-equal status. But it is asserting that status and claiming its privileges without the traditions, disciplines, legitimacy, or transparency that checked the power of the traditional Fourth Estate. The work of this international grand committee is a vital first step towards redress of this untenable current situation.

Referring to what Professor Zuboff had said the previous night, he argues that Canadians are currently in a historic battle for the future of their democracy with a charade called Sidewalk Toronto. He concludes by saying, "I'm here to tell you that we will win that battle."

To know more, you can listen to the full hearing video titled "Meeting No. 152 ETHI - Standing Committee on Access to Information, Privacy, and Ethics" on ParlVU.


Internationalization

Packt
10 Nov 2016
16 min read
In this article by Jérémie Bouchet author of the book Magento Extensions Development. We will see how to handle this aspect of our extension and how it is handled in a complex extension using an EAV table structure. In this article, we will cover the following topics: The EAV approach Store relation table Translation of template interface texts (For more resources related to this topic, see here.) The EAV approach The EAV structure in Magento is used for complex models, such as customer and product entities. In our extension, if we want to add a new field for our events, we would have to add a new column in the main table. With the EAV structure, each attribute is stored in a separate table depending on its type. For example, catalog_product_entity, catalog_product_entity_varchar and catalog_product_entity_int. Each row in the subtables has a foreign key reference to the main table. In order to handle multiple store views in this structure, we will add a column for the store ID in the subtables. Let's see an example for a product entity, where our main table contains only the main attribute: The varchar table structure is as follows: The 70 attribute corresponds to the product name and is linked to our 1 entity. There is a different product name for the store view, 0 (default) and 2 (in French in this example). In order to create an EAV model, you will have to extend the right class in your code. You can inspire your development on the existing modules, such as customers or products. Store relation table In our extension, we will handle the store views scope by using a relation table. This behavior is also used for the CMS pages or blocks, reviews, ratings, and all the models that are not EAV-based and need to be store views-related. Creating the new table The first step is to create the new table to store the new data: Create the [extension_path]/Setup/UpgradeSchema.php file and add the following code: <?php namespace BlackbirdTicketBlasterSetup; use MagentoEavSetupEavSetup; use MagentoEavSetupEavSetupFactory; use MagentoFrameworkSetupUpgradeSchemaInterface; use MagentoFrameworkSetupModuleContextInterface; use MagentoFrameworkSetupSchemaSetupInterface; /** * @codeCoverageIgnore */ class UpgradeSchema implements UpgradeSchemaInterface { /** * EAV setup factory * * @varEavSetupFactory */ private $eavSetupFactory; /** * Init * * @paramEavSetupFactory $eavSetupFactory */ public function __construct(EavSetupFactory $eavSetupFactory) { $this->eavSetupFactory = $eavSetupFactory; } public function upgrade(SchemaSetupInterface $setup, ModuleContextInterface $context) { if (version_compare($context->getVersion(), '1.3.0', '<')) { $installer = $setup; $installer->startSetup(); /** * Create table 'blackbird_ticketblaster_event_store' */ $table = $installer->getConnection()->newTable( $installer->getTable('blackbird_ticketblaster_event_store') )->addColumn( 'event_id', MagentoFrameworkDBDdlTable::TYPE_SMALLINT, null, ['nullable' => false, 'primary' => true], 'Event ID' )->addColumn( 'store_id', MagentoFrameworkDBDdlTable::TYPE_SMALLINT, null, ['unsigned' => true, 'nullable' => false, 'primary' => true], 'Store ID' )->addIndex( $installer->getIdxName('blackbird_ticketblaster_event_store', ['store_id']), ['store_id'] )->addForeignKey( $installer->getFkName('blackbird_ticketblaster_event_store', 'event_id', 'blackbird_ticketblaster_event', 'event_id'), 'event_id', $installer->getTable('blackbird_ticketblaster_event'), 'event_id', MagentoFrameworkDBDdlTable::ACTION_CASCADE )->addForeignKey( 
$installer->getFkName('blackbird_ticketblaster_event_store', 'store_id', 'store', 'store_id'), 'store_id', $installer->getTable('store'), 'store_id', MagentoFrameworkDBDdlTable::ACTION_CASCADE )->setComment( 'TicketBlaster Event To Store Linkage Table' ); $installer->getConnection()->createTable($table); $installer->endSetup(); } } } The upgrade method will handle all the necessary updates in our database for our extension. In order to differentiate the update for a different version of the extension, we surround the script with a version_compare() condition. Once this code is set, we need to tell Magento that our extension has new database upgrades to process. Open the [extension_path]/etc/module.xml file and change the version number 1.2.0 to 1.3.0: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="../../../../../lib/internal/Magento/Framework/Module/etc/module.xsd"> <module name="Blackbird_TicketBlaster" setup_version="1.3.0"> <sequence> <module name="Magento_Catalog"/> <module name="Blackbird_AnotherModule"/> </sequence> </module> </config> In your terminal, run the upgrade by typing the following command: php bin/magentosetup:upgrade The new table structure now contains two columns: event_id and store_id. This table will store which events are available for store views: If you have previously created events, we recommend emptying the existing blackbird_ticketblaster_event table, because they won't have a default store view and this may trigger an error output. Adding the new input to the edit form In order to select the store view for the content, we will need to add the new input to the edit form. Before running this code, you should add a new store view: Here's how to do that: Open the [extension_path]/Block/Adminhtml/Event/Edit/Form.php file and add the following code in the _prepareForm() method, below the last addField() call: /* Check is single store mode */ if (!$this->_storeManager->isSingleStoreMode()) { $field = $fieldset->addField( 'store_id', 'multiselect', [ 'name' => 'stores[]', 'label' => __('Store View'), 'title' => __('Store View'), 'required' => true, 'values' => $this->_systemStore->getStoreValuesForForm(false, true) ] ); $renderer = $this->getLayout()->createBlock( 'MagentoBackendBlockStoreSwitcherFormRendererFieldsetElement' ); $field->setRenderer($renderer); } else { $fieldset->addField( 'store_id', 'hidden', ['name' => 'stores[]', 'value' => $this->_storeManager->getStore(true)->getId()] ); $model->setStoreId($this->_storeManager->getStore(true)->getId()); } This results in a new multiselect field in the form. Saving the new data in the new table Now we have the form and the database table, we have to write the code to save the data from the form: Open the [extension_path]/Model/Event.php file and add the following method at its end: /** * Receive page store ids * * @return int[] */ public function getStores() { return $this->hasData('stores') ? $this->getData('stores') : $this->getData('store_id'); } Open the [extension_path]/Model/ResourceModel/Event.php file and replace all the code with the following code: <?php namespace BlackbirdTicketBlasterModelResourceModel; class Event extends MagentoFrameworkModelResourceModelDbAbstractDb { [...] The afterSave() method is handling our insert queries in the new table. The afterload() and getLoadSelect() methods are handling the new load mode to select the right events. Your new table is now filled when you save your events; they are also properly loaded when you go back to your edit form. 
Showing the store views in the admin grid In order to inform admin users of the selected store views for one event, we will add a new column in the admin grid: Open the [extension_path]/Model/ResourceModel/Event/Collection.php file and replace all the code with the following code: <?php namespace BlackbirdTicketBlasterModelResourceModelEvent; class Collection extends MagentoFrameworkModelResourceModelDbCollectionAbstractCollection { [...] Open the [extention_path]/view/adminhtml/ui_component/ticketblaster_event_listing.xml file and add the following XML instructions before the end of the </filters> tag: <filterSelect name="store_id"> <argument name="optionsProvider" xsi_type="configurableObject"> <argument name="class" xsi_type="string">MagentoCmsUiComponentListingColumnCmsOptions</argument> </argument> <argument name="data" xsi_type="array"> <item name="config" xsi_type="array"> <item name="dataScope" xsi_type="string">store_id</item> <item name="label" xsi_type="string" translate="true">Store View</item> <item name="captionValue" xsi_type="string">0</item> </item> </argument> </filterSelect> Before the actionsColumn tag, add the new column: <column name="store_id" class="MagentoStoreUiComponentListingColumnStore"> <argument name="data" xsi_type="array"> <item name="config" xsi_type="array"> <item name="bodyTmpl" xsi_type="string">ui/grid/cells/html</item> <item name="sortable" xsi_type="boolean">false</item> <item name="label" xsi_type="string" translate="true">Store View</item> </item> </argument> </column> You can refresh your grid page and see the new column added at the end. Magento remembers the previous column's order. If you add a new column, it will always be added at the end of the table. You will have to manually reorder them by dragging and dropping them. Modifying the frontend event list Our frontend list (/events) is still listing all the events. 
In order to list only the events available for our current store view, we need to change a file: Edit the [extension_path]/Block/EventList.php file and replace the code with the following code: <?php namespace BlackbirdTicketBlasterBlock; use BlackbirdTicketBlasterApiDataEventInterface; use BlackbirdTicketBlasterModelResourceModelEventCollection as EventCollection; use MagentoCustomerModelContext; class EventList extends MagentoFrameworkViewElementTemplate implements MagentoFrameworkDataObjectIdentityInterface { /** * Store manager * * @var MagentoStoreModelStoreManagerInterface */ protected $_storeManager; /** * @var MagentoCustomerModelSession */ protected $_customerSession; /** * Construct * * @param MagentoFrameworkViewElementTemplateContext $context * @param BlackbirdTicketBlasterModelResourceModelEventCollectionFactory $eventCollectionFactory, * @param array $data */ public function __construct( MagentoFrameworkViewElementTemplateContext $context, BlackbirdTicketBlasterModelResourceModelEventCollectionFactory $eventCollectionFactory, MagentoStoreModelStoreManagerInterface $storeManager, MagentoCustomerModelSession $customerSession, array $data = [] ) { parent::__construct($context, $data); $this->_storeManager = $storeManager; $this->_eventCollectionFactory = $eventCollectionFactory; $this->_customerSession = $customerSession; } /** * @return BlackbirdTicketBlasterModelResourceModelEventCollection */ public function getEvents() { if (!$this->hasData('events')) { $events = $this->_eventCollectionFactory ->create() ->addOrder( EventInterface::CREATION_TIME, EventCollection::SORT_ORDER_DESC ) ->addStoreFilter($this->_storeManager->getStore()->getId()); $this->setData('events', $events); } return $this->getData('events'); } /** * Return identifiers for produced content * * @return array */ public function getIdentities() { return [BlackbirdTicketBlasterModelEvent::CACHE_TAG . '_' . 'list']; } /** * Is logged in * * @return bool */ public function isLoggedIn() { return $this->_customerSession->isLoggedIn(); } } Note that we have a new property available and instantiated in our constructor: storeManager. Thanks to this class, we can filter our collection with the store view ID by calling the addStoreFilter() method on our events collection. Restricting the frontend access by store view The events will not be listed in our list page if they are not available for the current store view, but they can still be accessed with their direct URL, for example http://[magento_url]/events/view/index/event_id/2. 
We will change this to restrict the frontend access by store view: Open the [extention_path]/Helper/Event.php file and replace the code with the following code: <?php namespace BlackbirdTicketBlasterHelper; use BlackbirdTicketBlasterApiDataEventInterface; use BlackbirdTicketBlasterModelResourceModelEventCollection as EventCollection; use MagentoFrameworkAppActionAction; class Event extends MagentoFrameworkAppHelperAbstractHelper { /** * @var BlackbirdTicketBlasterModelEvent */ protected $_event; /** * @var MagentoFrameworkViewResultPageFactory */ protected $resultPageFactory; /** * Store manager * * @var MagentoStoreModelStoreManagerInterface */ protected $_storeManager; /** * Constructor * * @param MagentoFrameworkAppHelperContext $context * @param BlackbirdTicketBlasterModelEvent $event * @param MagentoFrameworkViewResultPageFactory $resultPageFactory * @SuppressWarnings(PHPMD.ExcessiveParameterList) */ public function __construct( MagentoFrameworkAppHelperContext $context, BlackbirdTicketBlasterModelEvent $event, MagentoFrameworkViewResultPageFactory $resultPageFactory, MagentoStoreModelStoreManagerInterface $storeManager, ) { $this->_event = $event; $this->_storeManager = $storeManager; $this->resultPageFactory = $resultPageFactory; $this->_customerSession = $customerSession; parent::__construct($context); } /** * Return an event from given event id. * * @param Action $action * @param null $eventId * @return MagentoFrameworkViewResultPage|bool */ public function prepareResultEvent(Action $action, $eventId = null) { if ($eventId !== null && $eventId !== $this->_event->getId()) { $delimiterPosition = strrpos($eventId, '|'); if ($delimiterPosition) { $eventId = substr($eventId, 0, $delimiterPosition); } $this->_event->setStoreId($this->_storeManager->getStore()->getId()); if (!$this->_event->load($eventId)) { return false; } } if (!$this->_event->getId()) { return false; } /** @var MagentoFrameworkViewResultPage $resultPage */ $resultPage = $this->resultPageFactory->create(); // We can add our own custom page handles for layout easily. $resultPage->addHandle('ticketblaster_event_view'); // This will generate a layout handle like: ticketblaster_event_view_id_1 // giving us a unique handle to target specific event if we wish to. $resultPage->addPageLayoutHandles(['id' => $this->_event->getId()]); // Magento is event driven after all, lets remember to dispatch our own, to help people // who might want to add additional functionality, or filter the events somehow! $this->_eventManager->dispatch( 'blackbird_ticketblaster_event_render', ['event' => $this->_event, 'controller_action' => $action] ); return $resultPage; } } The setStoreId() method called on our model will load the model only for the given ID. The events are no longer available through their direct URL if we are not on their available store view. Translation of template interface texts In order to translate the texts written directly in the template file, for the interface or in your PHP class, you need to use the __('Your text here') method. Magento looks for a corresponding match within all the translation CSV files. There is nothing to be declared in XML; you simply have to create a new folder at the root of your module and create the required CSV: Create the [extension_path]/i18n folder. Create [extension_path]/i18n/en_US.csv and add the following code: "Event time:","Event time:" "Please sign in to read more details.","Please sign in to read more details." 
"Read more","Read more" Create [extension_path]/i18n/en_US.csv and add the following code: "Event time:","Date de l'évènement :" "Pleasesign in to read more details.","Merci de vous inscrire pour plus de détails." "Read more","Lire la suite" The CSV file contains the correspondences between the key used in the code and the value in its final language. Translation of e-mail templates: creating and translating the e-mails We will add a new form in the Details page to share the event to a friend. The first step is to declare your e-mail template. To declare your e-mail template, create a new [extension_path]/etc/email_templates.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Email:etc/email_templates.xsd"> <template id="ticketblaster_email_email_template" label="Share Form" file="share_form.html" type="text" module="Blackbird_TicketBlaster" area="adminhtml"/> </config> This XML line declares a new template ID, label, file path, module, and area (frontend or adminhtml). Next, create the corresponding template by creating the [extension_path]/view/adminhtml/email/share_form.html file and add the following code: <!--@subject Share Form@--> <!--@vars { "varpost.email":"Sharer Email", "varevent.title":"Event Title", "varevent.venue":"Event Venue" } @--> <p>{{trans "Your friend %email is sharing an event with you:" email=$post.email}}</p> {{trans "Title: %title" title=$event.title}}<br/> {{trans "Venue: %venue" venue=$event.venue}}<br/> <p>{{trans "View the detailed page: %url" url=$event.url}}</p> Note that in order to translate texts within the HTML file, we use the trans function, which works like the default PHP printf() function. The function will also use our i18n CSV files to find a match for the text. Your e-mail template can also be overridden directly from the backoffice: Marketing | Email templates. 
The e-mail template is ready; we will also add the ability to change it in the system configuration and allow users to determine the sender's e-mail and name: Create the [extension_path]/etc/adminhtml/system.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Config:etc/system_file.xsd"> <system> <section id="ticketblaster" translate="label" type="text" sortOrder="100" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Ticket Blaster</label> <tab>general</tab> <resource>Blackbird_TicketBlaster::event</resource> <group id="email" translate="label" type="text" sortOrder="50" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Options</label> <field id="recipient_email" translate="label" type="text" sortOrder="10" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Send Emails To</label> <validate>validate-email</validate> </field> <field id="sender_email_identity" translate="label" type="select" sortOrder="20" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Sender</label> <source_model>MagentoConfigModelConfigSourceEmailIdentity</source_model> </field> <field id="email_template" translate="label comment" type="select" sortOrder="30" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Template</label> <comment>Email template chosen based on theme fallback when "Default" option is selected.</comment> <source_model>MagentoConfigModelConfigSourceEmailTemplate</source_model> </field> </group> </section> </system> </config> Create the [extension_path]/etc/config.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Store:etc/config.xsd"> <default> <ticketblaster> <email> <recipient_email> <![CDATA[hello@example.com]]> </recipient_email> <sender_email_identity>custom2</sender_email_identity> <email_template>ticketblaster_email_email_template</email_template> </email> </ticketblaster> </default> </config> Thanks to these two files, you can change the configuration for the e-mail template in the Admin panel (Stores | Configuration). Let's create our HTML form and the controller that will handle our submission: Open the existing [extension_path]/view/frontend/templates/view.phtml file and add the following code at the end: <form action="<?php echo $block->getUrl('events/view/share', array('event_id' => $event->getId())); ?>" method="post" id="form-validate" class="form"> <h3> <?php echo __('Share this event to my friend'); ?> </h3> <input type="email" name="email" class="input-text" placeholder="email" /> <button type="submit" class="button"><?php echo __('Share'); ?></button> </form> Create the [extension_path]/Controller/View/Share.php file and add the following code: <?php namespace BlackbirdTicketBlasterControllerView; use MagentoFrameworkExceptionNotFoundException; use MagentoFrameworkAppRequestInterface; use MagentoStoreModelScopeInterface; use BlackbirdTicketBlasterApiDataEventInterface; class Share extends MagentoFrameworkAppActionAction { [...] This controller will get the necessary configuration entirely from the admin and generate the e-mail to be sent. Testing our code by sending the e-mail Go to the page of an event and fill in the form we prepared. When you submit it, Magento will send the e-mail immediately. Summary In this article, we addressed all the main processes that are run for internationalization. 
We can now create and control the availability of our events with regard to Magento's stores, and translate the contents of our pages and e-mails.


Introducing R, RStudio, and Shiny

Packt
25 Sep 2015
9 min read
In this article by Hernán G. Resnizky, author of the book Learning Shiny, the main objective is to learn how to install all the components needed to build an application in R with Shiny. Additionally, some general ideas about what R is will be covered, in order to be able to dive deeper into programming using R. The following topics will be covered:

- A brief introduction to R, RStudio, and Shiny
- Installation of R and Shiny
- General tips and tricks

About R

As stated on the R project's main website: "R is a language and environment for statistical computing and graphics." R is a successor of S and is a GNU project. This means, briefly, that anyone can have access to its source code and can modify or adapt it to their needs. Nowadays, it is gaining territory over classic commercial software, and it is, along with Python, the most used language for statistics and data science. Regarding R's main characteristics, the following can be considered:

- Object oriented: R is a language that is composed mainly of objects and functions.
- Can be easily contributed to: Like other GNU projects, R is constantly being enriched by users' contributions, either by making their code accessible via "packages" or libraries, or by editing/improving its source code. There are currently almost 7,000 packages in the common R repository, the Comprehensive R Archive Network (CRAN). Additionally, there are R repositories of public access, such as the Bioconductor project, which contains packages for bioinformatics.
- Runtime execution: Unlike C or Java, R does not need compilation. This means that you can, for instance, write 2 + 2 in the console and it will return the value.
- Extensibility: The R functionalities can be extended through the installation of packages and libraries. Standard proven libraries can be found in CRAN repositories and are accessible directly from R by typing install.packages().

Installing R

R can be installed on every operating system. It is highly recommended to download the program directly from http://cran.rstudio.com/ when working on Windows or Mac OS. On Ubuntu, R can be easily installed from the terminal as follows:

sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev

The installation of r-base-dev is highly recommended, as it is a package that enables users to compile R packages from source, that is, maintain the packages or install additional R packages directly from the R console using the install.packages() command. To install R on other UNIX-based operating systems, visit the following links:

http://cran.rstudio.com/
http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Obtaining-R

A quick guide to R

When working on Windows, R can be launched via its application; after the installation, it is available like any other program on Windows. When working on Linux, you can access the R console directly by typing R on the command line. In both cases, R executes at runtime. This means that you can type in code, press Enter, and the result will be given immediately, as follows:

> 2+2
[1] 4

The R application in any operating system does not provide an easy environment to develop code. For this reason, it is highly recommended (not only to write web applications in R with Shiny, but for any task you want to perform in R) to use an Integrated Development Environment (IDE).
About RStudio

As with other programming languages, there is a huge variety of IDEs available for R. IDEs are applications that make code development easier and clearer for the programmer. RStudio is one of the most important ones for R, and it is especially recommended for writing web applications in R with Shiny because it contains features specially designed for R. Additionally, RStudio provides facilities to write C++, LaTeX, or HTML documents, and also integrates them with R code. RStudio also provides version control, project management, and debugging features, among many others.

Installing RStudio

RStudio for desktop computers can be downloaded from its official website at http://www.rstudio.com/products/rstudio/download/, where you can get versions of the software for Windows, Mac OS X, Ubuntu, Debian, and Fedora.

Quick guide to RStudio

Before installing and running RStudio, it is important to have R installed. As RStudio is an IDE and not the programming language, it will not work at all without R. At first glance, the following four main windows are available:

- Text editor: This provides facilities to write R scripts, such as highlighting and a code completer (when hitting Tab, you can see the available options to complete the code written). It is also possible to include R code in an HTML, LaTeX, or C++ piece of code.
- Environment and history: In the Environment section, you can see the active objects in each environment. By clicking on Global Environment (the environment shown by default), you can change the environment and see its active objects. In the History tab, the pieces of code executed are stored line by line. You can select one or more lines and send them either to the editor or to the console. In addition, you can look for a specific piece of code by typing it in the textbox in the top-right part of this window.
- Console: This is an exact equivalent of the R console, as described in "A quick guide to R".
- Tabs: The different tabs are as follows:
  - Files: This consists of a file browser with several additional features (renaming, deleting, and copying). Clicking on a file will open it in the editor or the Environment tab, depending on the type of the file. If it is a .rda or .RData file, it will open in both. If it is a text file, it will open in one of them.
  - Plots: Whenever a plot is executed, it will be displayed in this tab.
  - Packages: This shows a list of available and active packages. When a package is active, it appears checked. Packages can also be installed interactively by clicking on Install Packages.
  - Help: This is a window to search and read the documentation of active packages.
  - Viewer: This enables us to see HTML-generated content within RStudio.

Along with numerous features, RStudio also provides keyboard shortcuts. A few of them are listed as follows:

- Complete the code: Tab (Windows/Linux), Tab (OS X)
- Run the selected piece of code; if no piece of code is selected, the active line is run: Ctrl + Enter (Windows/Linux), ⌘ + Enter (OS X)
- Comment the selected block of code: Ctrl + Shift + C (Windows/Linux), ⌘ + / (OS X)
- Create a section of code, which can be expanded or collapsed by clicking on the arrow to the left, and can also be reached from the menu at the bottom left of the editor: ##### (Windows/Linux), ##### (OS X)
- Find and replace: Ctrl + F (Windows/Linux), ⌘ + F (OS X)

Clicking on the arrow to the left of a section collapses that block of code, and clicking on the section's name in the bottom-left menu of the editor jumps straight to it. The full list of shortcuts can be found at https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts. For further information about other RStudio features, the full documentation is available at https://support.rstudio.com/hc/en-us/categories/200035113-Documentation.

About Shiny

Shiny is a package created by RStudio, which enables you to easily interface R with a web browser. As stated in its official documentation, Shiny is a web application framework for R that makes it incredibly easy to build interactive web applications with R. One of its main advantages is that there is no need to combine R code with HTML/JavaScript code, as the framework already contains prebuilt features that cover the most commonly used functionalities in an interactive web application. There is a wide range of software that has web application functionalities, especially oriented to interactive data visualization. What, then, are the advantages of using R/Shiny? They are as follows:

- It is free, not only in terms of money but, like all GNU projects, in terms of freedom. As stated on the GNU main page: "To understand the concept (GNU), you should think of free as in free speech, not as in free beer. Free software is a matter of the users' freedom to run, copy, distribute, study, change, and improve the software."
- All the possibilities of a powerful language such as R are available. Thanks to its contributive essence, you can develop a web application that can display any R-generated output. This means that you can, for instance, run complex statistical models and return the output in a friendly way in the browser, obtain and integrate data from various sources and formats (for instance, SQL, XML, JSON, and so on) the way you need, and subset, process, and dynamically aggregate the data the way you want. These options are not available (or are much more difficult to accomplish) under most commercial BI tools.

Installing and loading Shiny

As with any other package available in the CRAN repositories, the easiest way to install Shiny is by executing install.packages("shiny"). Due to R's extensibility, many of its packages use elements (mostly functions) from other packages. For this reason, these packages are loaded or installed when the package that depends on them is loaded or installed; this is called a dependency. Shiny (in its 0.10.2.1 version) depends on Rcpp, httpuv, mime, htmltools, and R6. An R session starts with only the minimal packages loaded, so if functions from other packages are used, they need to be loaded before using them. The corresponding command for this is:

library(shiny)

When installing a package, the package name must be quoted, but when loading the package, it must be unquoted.

Summary

After these instructions, the reader should be able to install all the fundamental elements needed to create a web application with Shiny. Additionally, he or she should have acquired at least a general idea of what R and the R project are.


4 popular algorithms for Distance-based outlier detection

Sugandha Lahoti
01 Dec 2017
7 min read
Note: This article is an excerpt from our book Mastering Java Machine Learning by Dr. Uday Kamath and Krishna Choppella. The book introduces you to an array of expert machine learning techniques, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modelling, and a lot more. The material below is extracted from Chapter 5 of the book, Real-time Stream Machine Learning, and explains 4 popular algorithms for distance-based outlier detection.

Distance-based outlier detection is the most studied, researched, and implemented method in the area of stream learning. There are many variants of the distance-based methods, based on sliding windows, the number of nearest neighbors, radius and thresholds, and other measures for considering outliers in the data. We will try to give a sampling of the most important algorithms in this article.

Inputs and outputs

Most algorithms take the following parameters as inputs:

- Window size w, corresponding to the fixed size on which the algorithm looks for outlier patterns.
- Slide size s, corresponding to the number of new instances that will be added to the window, and old ones removed.
- The count threshold k of instances when using nearest neighbor computation.
- The distance threshold R used to define the outlier threshold in distances.

Outliers, as labels or scores (based on neighbors and distance), are the outputs.

How does it work?

We present different variants of distance-based stream outlier algorithms, giving insights into what they do differently or uniquely. The unique elements in each algorithm define what happens when the slide expires, how a new slide is processed, and how outliers are reported.

Exact Storm

Exact Storm stores the data in the current window w in a well-known index structure, so that the range query search, or the query to find neighbors within the distance R of a given point, is done efficiently. It also stores the k preceding and succeeding neighbors of all data points:

- Expired slide: Instances in expired slides are removed from the index structure that serves range queries, but are preserved in the preceding lists of their neighbors.
- New slide: For each data point in the new slide, a range query R is executed, the results are used to update the preceding and succeeding lists for the instance, and the instance is stored in the index structure.
- Outlier reporting: In any window, after the processing of expired and new slide elements is complete, any instance with at least k elements from its succeeding list and non-expired preceding list is reported as an outlier.

Abstract-C

Abstract-C keeps an index structure similar to Exact Storm, but instead of preceding and succeeding lists for every object, it just maintains a list of counts of neighbors for the windows the instance participates in:

- Expired slide: Instances in expired slides are removed from the index structure that serves range queries, and the first element of the list of counts, corresponding to the last window, is removed.
- New slide: For each data point in the new slide, a range query R is executed and the results are used to update the list of counts. For existing instances, the count gets updated with new neighbors, and instances are added to the index structure.
- Outlier reporting: In any window, after the processing of expired and new slide elements is complete, all instances with a neighbor count less than k in the current window are considered outliers.
Direct Update of Events (DUE)
DUE keeps an index structure for efficient range queries, exactly like the other algorithms, but works on a different assumption: when a slide expires, not every instance is affected in the same way. It maintains two priority queues: the unsafe inlier queue and the outlier list. The unsafe inlier queue holds instances sorted in increasing order of the smallest expiration time of their preceding neighbors. The outlier list holds all the outliers in the current window:
Expired Slide: Instances in expired slides are removed from the index structure that serves range queries, and the unsafe inlier queue is updated for expired neighbors. Those unsafe inliers that become outliers are removed from the priority queue and moved to the outlier list.
New Slide: For each data point in the new slide, a range query with radius R is executed, the results are used to update the succeeding neighbors of the point, and only the most recent preceding points are updated for the instance. Based on these updates, the point is either added to the unsafe inlier priority queue or removed from the queue and added to the outlier list.
Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, all instances in the outlier list are reported as outliers.
Micro Clustering based Algorithm (MCOD)
Micro-clustering based outlier detection overcomes the computational cost of performing range queries for every data point. A micro-cluster data structure is used instead of range queries. A micro-cluster is centered around an instance and has a radius of R. All the points belonging to micro-clusters are inliers. The points outside any micro-cluster can be outliers or inliers and are stored in a separate list. MCOD also has a data structure similar to DUE's, keeping a priority queue of unsafe inliers:
Expired Slide: Instances in expired slides are removed from both the micro-clusters and the data structure holding outliers and potential inliers. The unsafe inlier queue is updated for expired neighbors, as in the DUE algorithm. Micro-clusters are also updated for non-expired data points.
New Slide: For each data point in the new slide, the instance either becomes the center of a micro-cluster, becomes part of an existing micro-cluster, or is added to the event queue and the outlier data structure. If the point is within distance R of an existing center, it is assigned to that micro-cluster; otherwise, if there are k points within R of it, it becomes the center of a new micro-cluster; if not, it goes into the two structures of the event queue and possible outliers.
Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, any instance in the outlier structure with fewer than k neighboring instances is reported as an outlier.
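The following rough Java sketch illustrates only the micro-cluster assignment decision just described for a newly arrived point. It is not an MCOD implementation: window expiry, the unsafe-inlier event queue, and outlier re-evaluation are omitted, and all names are hypothetical.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of MCOD-style micro-cluster assignment for a new point.
public class MicroClusterSketch {

    static class MicroCluster {
        final double[] center;
        final List<double[]> members = new ArrayList<>();
        MicroCluster(double[] center) { this.center = center; }
    }

    private final double radius;          // R: micro-cluster radius
    private final int countThreshold;     // k: neighbors needed to form a cluster
    private final List<MicroCluster> clusters = new ArrayList<>();
    private final List<double[]> unclustered = new ArrayList<>();  // possible outliers

    public MicroClusterSketch(double radius, int countThreshold) {
        this.radius = radius;
        this.countThreshold = countThreshold;
    }

    // Decide what happens to a newly arrived point.
    public void processNewPoint(double[] p) {
        // 1. If it is within R of an existing center, it joins that micro-cluster.
        for (MicroCluster mc : clusters) {
            if (distance(p, mc.center) <= radius) {
                mc.members.add(p);
                return;
            }
        }
        // 2. Otherwise, if at least k unclustered points lie within R of p,
        //    p becomes the center of a new micro-cluster and absorbs them.
        List<double[]> close = new ArrayList<>();
        for (double[] q : unclustered) {
            if (distance(p, q) <= radius) {
                close.add(q);
            }
        }
        if (close.size() >= countThreshold) {
            MicroCluster mc = new MicroCluster(p);
            mc.members.add(p);
            mc.members.addAll(close);
            unclustered.removeAll(close);
            clusters.add(mc);
            return;
        }
        // 3. Otherwise it stays in the unclustered list as a possible outlier.
        unclustered.add(p);
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

The design point to notice is that once a point joins a micro-cluster it is known to be an inlier without any further range queries, which is where MCOD saves memory and CPU compared with the earlier variants.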
Advantages and limitations
The advantages and limitations are as follows:
Exact Storm is demanding in storage and CPU for storing the lists and retrieving neighbors. It also introduces delays; even though they are implemented in efficient data structures, range queries can be slow.
Abstract-C has a small advantage over Exact Storm, as no time is spent on finding active neighbors for each instance in the window. The storage and time spent are still very much dependent on the window and slide chosen.
DUE has some advantage over Exact Storm and Abstract-C, as it can efficiently re-evaluate the "inlierness" of points (that is, whether unsafe inliers remain inliers or become outliers), but sorting the structure impacts both CPU and memory.
MCOD has distinct advantages in memory and CPU owing to the use of the micro-cluster structure and the removal of pairwise distance computation. Storing the neighborhood information in micro-clusters helps memory too.
Validation and evaluation of stream-based outliers is still an open research area. By varying parameters such as the window size, the number of neighbors within the radius, and so on, we determine the sensitivity to the performance metrics (time to evaluate in terms of CPU time per object, the number of outliers detected in the stream, and TP/precision/recall/area under the PRC curve) and thus determine the robustness.
If you liked the above article, check out our book Mastering Java Machine Learning to explore more advanced machine learning techniques using the best Java-based tools available.

Salesforce is buying Tableau in a $15.7 billion all-stock deal

Richard Gall
10 Jun 2019
4 min read
Salesforce, one of the world's leading CRM platforms, is buying data visualization software Tableau in an all-stock deal worth $15.7 billion. The news comes just days after it emerged that Google is buying one of Tableau's competitors in the data visualization market, Looker. Taken together, the stories highlight the importance of analytics to some of the planet's biggest companies. They suggest that despite years of the big data revolution, it's only now that market-leading platforms are starting to realise that their customers want the level of capabilities offered by the best in the data visualization space.
Salesforce will pay for Tableau entirely in its own stock. As the press release published on the Salesforce site explains, "each share of Tableau Class A and Class B common stock will be exchanged for 1.103 shares of Salesforce common stock, representing an enterprise value of $15.7 billion (net of cash), based on the trailing 3-day volume weighted average price of Salesforce's shares as of June 7, 2019." The acquisition is expected to be completed by the end of October 2019.
https://twitter.com/tableau/status/1138040596604575750
Why is Salesforce buying Tableau?
The deal is an incredible result for Tableau shareholders. At the end of last week, its market cap was $10.7 billion. This has led to some scepticism about just how good a deal this is for Salesforce. One commenter on Hacker News said "this seems really high for a company without earnings and a weird growth curve. Their ticker is cool and maybe sales force [sic] wants to be DATA on nasdaq. Otherwise, it will be hard to justify this high markup for a tool company."
With Salesforce shares dropping 4.5% as markets opened this week, it seems investors are inclined to agree: Salesforce is certainly paying a premium for Tableau. However, whatever the long-term impact of the acquisition, the price paid underlines the fact that Salesforce views Tableau as exceptionally important to its long-term strategy.
It opens up an opportunity for Salesforce to reposition and redefine itself as much more than just a CRM platform. It means it can start to compete with the likes of Microsoft, which has a full suite of professional and business intelligence tools. Moreover, it also provides the platform with another way of potentially onboarding customers: given that Tableau is well known as a powerful yet accessible data visualization tool, it creates an avenue through which new users can find their way to the Salesforce product.
Marc Benioff, Chair and co-CEO of Salesforce, said "we are bringing together the world's #1 CRM with the #1 analytics platform. Tableau helps people see and understand data, and Salesforce helps people engage and understand customers. It's truly the best of both worlds for our customers--bringing together two critical platforms that every customer needs to understand their world."
Tableau has been a target for Salesforce for some time. Leaked documents from 2016 found that the data visualization company was one of 14 companies that Salesforce had an interest in (another was LinkedIn, which would eventually be purchased by Microsoft).
Read next: Alteryx vs. Tableau: Choosing the right data analytics tool for your business
What's in it for Tableau (aside from the money...)?
For Tableau, there are many other benefits of being purchased by Salesforce alongside the money. Primarily this is about expanding the platform's reach; Salesforce users are people who are interested in data, with a huge range of use cases.
By joining up with Salesforce, Tableau will become their go-to data visualization tool.
"As our two companies began joint discussions," Tableau CEO Adam Selipsky said, "the possibilities of what we might do together became more and more intriguing. They have leading capabilities across many CRM areas including sales, marketing, service, application integration, AI for analytics and more. They have a vast number of field personnel selling to and servicing customers. They have incredible reach into the fabric of so many customers, all of whom need rich analytics capabilities and visual interfaces... On behalf of our customers, we began to dream about what we might accomplish if we could combine our ability to help people see and understand data with their ability to help people engage and understand customers."
What will happen to Tableau?
Tableau won't be going anywhere. It will continue to exist under its own brand, with the current leadership all remaining, including Selipsky.
What does this all mean for the technology market?
At the moment, it's too early to say, but the last year or so has seen some major high-profile acquisitions by tech companies. Perhaps we're seeing the emergence of a tooling arms race as the biggest organizations attempt to arm themselves with ecosystems of established market-leading tools. Whether this is good or bad for users remains to be seen, however.

Create a Local Ubuntu Repository using Apt-Mirror and Apt-Cacher

Packt
05 Oct 2009
7 min read
How can a company or organization minimize bandwidth costs when maintaining multiple Ubuntu installations? With bandwidth becoming the currency of the new millennium, being responsible with the bandwidth you have can be a real concern. In this article by Christer Edwards, we will learn how to create, maintain, and make available a local Ubuntu repository mirror, allowing you to save bandwidth and improve network efficiency with each machine you add to your network. (For more resources on Ubuntu, see here.)
Background
Before I begin, let me give you some background into what prompted me to initially come up with these solutions. I used to volunteer at local university user groups whenever they held "installfests". I always enjoyed offering my support and expertise to new Linux users. I have to say, the distribution that we helped deploy the most was Ubuntu. The fact that Ubuntu's parent company, Canonical, provided us with pressed CDs didn't hurt!
Despite having more CDs than we could handle, we still regularly ran into the same problem: bandwidth. Fetching updates and security errata was always our bottleneck, consistently slowing down our deployments. Imagine you are in a conference room with dozens of other Linux users, each of them trying to use the internet to fetch updates. Imagine how slow and bogged down that internet connection would be. If you've ever been to a large tech conference you've probably experienced this first hand!
After a few of these "installfests" I started looking for a solution to our bandwidth issue. I knew that all of these machines were downloading the same updates and security errata, therefore duplicating work that another user had already done. It just seemed so inefficient. There had to be a better way. Eventually I came across two solutions, which I will outline below.
Apt-Mirror
The first method makes use of a tool called apt-mirror. It is a Perl-based utility for downloading and mirroring the entire contents of a public repository. This will likely include packages that you don't use and will not use, but anything stored in a public repository will also be stored in your mirror. To configure apt-mirror you will need the following:
apt-mirror package (sudo aptitude install apt-mirror)
apache2 package (sudo aptitude install apache2)
roughly 15 GB of storage per release, per architecture
Once these requirements are met, you can begin configuring the apt-mirror tool. The main items that you'll need to define are:
storage location (base_path)
number of download threads (nthreads)
release(s) and architecture(s) you want
The configuration is done in the /etc/apt/mirror.list file. A default config was put in place when you installed the package, but it is generic. You'll need to update the values as mentioned above. Below I have included a complete configuration which will mirror Ubuntu 8.04 LTS for both 32-bit and 64-bit installations. It will require nearly 30 GB of storage space, which I will be putting on a removable drive. The drive will be mounted at /media/STORAGE/.
# apt-mirror configuration file
##
## The default configuration options (uncomment and change to override)
##
set base_path /media/STORAGE/
# set mirror_path $base_path/mirror
# set skel_path $base_path/skel
# set var_path $base_path/var
#
# set defaultarch <running host architecture>
set nthreads 20
#
# 8.04 "hardy" i386 mirror
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy main restricted universe multiverse
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted universe multiverse
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy-security main restricted universe multiverse
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy-backports main restricted universe multiverse
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy-proposed main restricted universe multiverse
deb-i386 http://us.archive.ubuntu.com/ubuntu hardy main/debian-installer restricted/debian-installer universe/debian-installer multiverse/debian-installer
deb-i386 http://packages.medibuntu.org/ hardy free non-free
# 8.04 "hardy" amd64 mirror
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy main restricted universe multiverse
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted universe multiverse
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-security main restricted universe multiverse
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-backports main restricted universe multiverse
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-proposed main restricted universe multiverse
deb-amd64 http://us.archive.ubuntu.com/ubuntu hardy main/debian-installer restricted/debian-installer universe/debian-installer multiverse/debian-installer
deb-amd64 http://packages.medibuntu.org/ hardy free non-free
# Cleaning section
clean http://us.archive.ubuntu.com/
clean http://packages.medibuntu.org/
It should be noted that each of the repository lines within the file should begin with deb-i386 or deb-amd64; line wrapping on the web may have changed the formatting. Once your configuration is saved, you can begin to populate your mirror by running the command:
apt-mirror
Be warned that, based on your internet connection speed, this could take quite a long time: hours, if not more than a day, for the initial 30 GB to download. Each subsequent run of the command will be much faster, as only incremental updates are downloaded. You may be wondering if it is possible to cancel the transfer before it is finished and begin again where it left off. Yes, I have done this countless times and I have not run into any issues.
You should now have a copy of the repository stored locally on your machine. In order for this to be available to other clients, you'll need to share the contents over HTTP. This is why we listed Apache as a requirement above. In my example configuration above, I mirrored the repository to a removable drive mounted at /media/STORAGE. What I need to do now is make that content available over the web. This can be done by way of a symbolic link:
cd /var/www/
sudo ln -s /media/STORAGE/mirror/us.archive.ubuntu.com/ubuntu/ ubuntu
The commands above tell the filesystem to follow any requests for "ubuntu" back up to the mounted external drive, where it will find your mirrored contents. If you have any problems with this linking, double-check your paths (as compared to those suggested here) and make sure your link points to the ubuntu directory, which is a subdirectory of the mirror you pulled from.
If you link to anything below this directory, your clients will not be able to find the contents properly.
The additional task of keeping this newly downloaded mirror updated with the latest packages can be automated by way of a cron job. By activating the cron job, your machine will automatically run the apt-mirror command on a regular daily basis, keeping your mirror up to date without any additional effort on your part. To activate the automated cron job, edit the file /etc/cron.d/apt-mirror. There will be a sample entry in the file. Simply uncomment the line and (optionally) change the "4" to an hour more convenient for you. If left at its defaults, it will run the apt-mirror command, updating and synchronizing your mirror, every morning at 4:00 a.m.
The final step, now that you have your repository created, shared over HTTP, and configured to automatically update, is to configure your clients to use it. Ubuntu clients define where they should grab errata and security updates in the /etc/apt/sources.list file. This will usually point to the default or regional mirror. To update your clients to use your local mirror instead, you'll need to edit /etc/apt/sources.list and comment out the existing entries (you may want to revert to them later!). Once you've commented out the entries, you can create new entries, this time pointing to your mirror. A sample entry pointing to a local mirror might look something like this:
deb http://192.168.0.10/ubuntu hardy main restricted universe multiverse
deb http://192.168.0.10/ubuntu hardy-updates main restricted universe multiverse
deb http://192.168.0.10/ubuntu hardy-security main restricted universe multiverse
Basically, what you are doing is recreating your existing entries but replacing archive.ubuntu.com with the IP of your local mirror. As long as the mirrored contents are made available over HTTP on the mirror server itself, you should have no problems. If you do run into problems, check your Apache logs for details. By following these simple steps you'll have your own private (even portable!) Ubuntu repository available within your LAN, saving you future bandwidth and avoiding redundant downloads.

Working with Kibana in Elasticsearch 5.x

Savia Lobo
26 Jan 2018
9 min read
[box type="note" align="" class="" width=""]Below given post is a book excerpt from Mastering Elasticsearch 5.x written by  Bharvi Dixit. This book introduces you to the new features of Elasticsearch 5.[/box] The following article showcases Kibana, a tool belongs to the Elastic Stack, and used for visualization and exploration of data residing in Elasticsearch. One can install Kibana and start to explore Elasticsearch indices in minutes — no code, no additional infrastructure required. If you have been using an older version of Kibana, you will notice that it has transformed altogether in terms of functionality. Note: This URL has all the latest changes done in Kibana 5.0: https://www.elastic.co/guide/en/kibana/current/breaking-changes- 5.0.html. Installing Kibana Similar to other Elastic Stack tools, you can visit the following URL to download Kibana 5.0.0, as per your operating system distribution: https://www.elastic.co/downloads/past-releases/kibana-5-0-0 An example of downloading and installing Kibana from the Debian package. First of all, download the package: https://artifacts.elastic.co/downloads/kibana/kibana-5.0.0-amd64.deb Then install it using the following command: sudo dpkg -i kibana-5.0.0-amd64.deb Kibana configuration Once installed, you can find the Kibana configuration file, kibana.yml, inside the/etc/kibana/ directory. All the settings related to Kibana are done only in this file. There is a big list of configuration options available inside the Kibana settings which you can learn about here: https://www.elastic.co/guide/en/kibana/current/settings.html. Starting Kibana Kibana can be started using the following command and it will be started on port 5601 bounded on localhost by default: sudo service kibana start Exploring and visualizing data on Kibana Now all the components of Elastic Stack are installed and configured, we can start exploring the awesomeness of Kibana visualizations. Kibana 5.x is supported on almost all of the latest major web browsers, including Internet Explorer 11+. To load Kibana, you just need to type localhost:5601 in your web browser. You will see different options available in the left panel of the screen, as shown in following figure: These different options are used for the following purposes: Discover: Used for data exploration where you get the access of each field along with a default time. Visualize: Used for creating visualizations of the data in your Elasticsearch indices. You can then build dashboards that display related visualizations. Dashboard: Used to display a collection of saved visualizations. Timelion: A time series data visualizer that enables you to combine totally independent data sources within a single visualization. It is based on simple expression  language. Management: A place where you perform your runtime configuration of Kibana, including both the initial setup and ongoing configuration of index patterns, advanced settings that tweak the behaviors of Kibana itself and saved objects. Dev Tools: Contains the console which is based on the Sense plugin and allows you to write Elasticsearch commands in one tab and see the responses of those commands in the other tab. 
Understanding the Kibana Management screen
The Management screen has three tabs available:
Index Patterns: For selecting and configuring index names
Saved Objects: Where all of your saved visualizations, searches, and dashboards are located
Advanced Settings: Contains the advanced settings of Kibana
As you can see on the Management screen, the very first tab is for Index Patterns. Kibana asks you to configure an index pattern so that it can load all the mappings and settings from the defined index. It defaults to logstash-*; you can add as many index patterns or absolute index names as you want and can select them while creating a visualization. Since we already have an index available with the logstash-* pattern, when you click on the Time-field name drop-down list, you will find that it shows two fields, @timestamp and received_at, which are of the date type, as shown in the following screenshot.
We will select the @timestamp field and hit the Create button. As soon as you do, the following screen appears.
In the above screenshot, you can see that Kibana has loaded all the mappings from our Logstash index. In addition, you can see three labels: blue (for marking this index as the default), yellow (for reloading the mappings; this is needed if you have updated the mapping after selecting the index pattern), and red (for deleting this index pattern altogether from Kibana).
The second tab on the Management screen is about saved objects, which contain all of your saved visualizations, searches, and dashboards, as you can see in the following screenshot. Please note that you can see here the dashboards and visualizations imported from Metricbeat, which we did a while ago.
The third option is for Advanced Settings, and you should not play with the settings shown on this page if you are not aware of their effects.
Discovering data on Kibana
When you move to the Discover page of Kibana, you will see a screen similar to the following.
Setting the time range and auto-refresh interval
Please note that Kibana by default loads the data of the last 15 minutes, which you can change by clicking on the clock sign in the top-right corner of the screen and selecting the desired time range. We have shown this in the following screenshot.
One more thing to look out for is that, after clicking on this clock sign, apart from the time-based settings, you will see one more option in the top corner with the name Auto-refresh. This setting tells Kibana how often it needs to query Elasticsearch. When you click on this setting, you will get the option either to turn off auto-refresh completely or to select the desired time interval.
Adding fields for exploration and using the search panel
As you can see in the following screenshot, you have all your fields available inside your index. On the Visualization screen, by default Kibana shows the timestamp and _source fields, but you can add your selected fields from the left panel by just moving the cursor over them and then clicking Add. Similarly, if you want to remove a field from the columns, just move the cursor to the field's name in the column heading and click on the cross icon. In addition, Kibana also provides you with a search panel in which you can write queries. For example, in the following screenshot, I have searched for the logstash keyword inside the syslog_message field.
When you hit the search button, the search text gets highlighted inside the rendered responses.
Exploring more options on the Visualization page
In Kibana, you will see lots of small arrow signs to open or collapse sections and settings. You will see one of these arrows in the following image, in the bottom-left corner (I have also added custom text on the image just beside the arrow).
When you click on this arrow, the time series histogram gets hidden and you get to see the following screen, which contains multiple properties: Table, which contains the histogram data in tabular format; Request, which contains the actual JSON query sent to Elasticsearch; Response, which contains the JSON response returned from Elasticsearch; and Statistics, which shows the query execution time and the number of hits matching the query.
Using the Dashboard screen to create/load dashboards
When you click on the Dashboard panel, you first get a blank screen with some options, such as New for creating a dashboard and Open for opening an existing dashboard, along with some more options. If you are creating a dashboard from scratch, you will have to add the previously built visualizations onto it and then save it under some name. But since we already have a dashboard available, which we imported using Metricbeat, we will click Open and you will see something similar to the following screenshot on your Kibana page.
Please note that if you do not have Apache installed on your system, selecting the first option, Metricbeat – Apache HTTPD server status, will load a blank dashboard. You can select any other title; for example, if you select the second option, you will see a dashboard similar to the following.
Editing an existing visualization
When you move the cursor over the visualizations presented on the dashboard, you will notice that a pencil sign appears, as shown in the following screenshot.
When you click on that pencil sign, it will open that particular visualization inside the visualization editor panel, as shown in the following screenshot. Here you can edit the properties and either override the same visualization or save it under some other name.
Please note that if you want to create a visualization from scratch, just click on the Visualize option on the left-hand side and it will guide you through the steps of creating the visualization. Kibana provides almost 10 types of visualizations. To get the details about working with each type of visualization, please follow the official documentation of Kibana at this link: https://www.elastic.co/guide/en/kibana/master/createvis.html.
Using Sense
Inside the Dev Tools option, you can find the console for Kibana, which was previously known as the Sense editor. This is one of the most wonderful tools to help you speed up the learning curve of Elasticsearch, since it provides auto-suggestions for all the endpoints and queries, as shown in the following screenshot.
You will see that the Kibana Console is divided into two parts; the left part is where you write your queries/requests, and after clicking the green arrow, the response from Elasticsearch is rendered inside the right-hand panel.
To summarize, we explained how to work with the Kibana tool in Elasticsearch 5.x. We explored the installation of Kibana and Kibana configuration, and moved ahead with exploring and visualizing data using Kibana.
If you enjoyed this excerpt and want to understand how you can scale your Elasticsearch cluster to contextualize it and improve its performance, check out the book Mastering Elasticsearch 5.x.

How SQL Server handles data under the hood

Sunith Shetty
27 Feb 2018
11 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Marek Chmel and Vladimír Mužný titled SQL Server 2017 Administrator's Guide. In this book, you will learn the required skills needed to successfully create, design, and deploy database using SQL Server 2017.[/box] Today, we will explore how SQL Server handles data as it is of utmost importance to get an understanding of what, when, and why data should be backed. Data structures and transaction logging We can think about a database as of physical database structure consisting of tables and indexes. However, this is just a human point of view. From the SQL Server's perspective, a database is a set of precisely structured files described in a form of metadata also saved in database structures. A conceptual imagination of how every database works is very helpful when the database has to be backed up correctly. How data is stored Every database on SQL Server must have at least two files: The primary data file with the usual suffix, mdf The transaction log file with the usual suffix, ldf For lots of databases, this minimal set of files is not enough. When the database contains big amounts of data such as historical tables, or the database has big data contention such as production tracking systems, it's good practise to design more data files. Another situation when a basic set of files is not sufficient can arise when documents or pictures would be saved along with relational data. However, SQL Server still is able to store all of our data in the basic file set, but it can lead to a performance bottlenecks and management issues. That's why we need to know all possible storage types useful for different scenarios of deployment. A complete structure of files is depicted in the following image: Database A relational database is defined as a complex data type consisting of tables with a given amount of columns, and each column has its domain that is actually a data type (such as an integer or a date) optionally complemented by some constraints. From SQL Server's perspective, the database is a record written in metadata and containing the name of the database, properties of the database, and names and locations of all files or folders representing storage for the database. This is the same for user databases as well as for system databases. System databases are created automatically during SQL Server installation and are crucial for correct running of SQL Server. We know five system databases. Database master Database master is crucial for the correct running of SQL Server service. In this database is stored data about logins, all databases and their files, instance configurations, linked servers, and so on. SQL Server finds this database at startup via two startup parameters, -d and -l, followed by paths to mdf and ldf files. These parameters are very important in situations when the administrator wants to move the master's files to a different location. Changing their values is possible in the SQL Server Configuration Manager in the SQL Server service Properties dialog on the tab called startup parameters. Database msdb The database msdb serves as the SQL Server Agent service, Database Mail, and Service Broker. In this database are stored job definitions, operators, and other objects needed for administration automation. This database also stores some logs such as backup and restore events of each database. If this database is corrupted or missing, SQL Server Agent cannot start. 
Database model
The model database can be understood as a template for every new database that is created. During database creation (see the CREATE DATABASE statement on MSDN), files are created on the defined paths, and all objects, data, and properties of the model database are created, copied, and set in the new database. This database must always exist on the instance, because the tempdb database is created from it every time the instance starts up; if model is corrupted, tempdb cannot be created at startup.
Database tempdb
Even if the tempdb database seems to be a regular database like many others, it plays a very special role in every SQL Server instance. This database is used by SQL Server itself, as well as by developers, to save temporary data such as table variables or static cursors. As this database is intended for a short lifespan (temporary data only, which can be stored during the execution of a stored procedure or until a session is disconnected), SQL Server clears this database by truncating all data from it, or by dropping and recreating it, every time it is started. As the tempdb database will never contain durable data, it has some special internal behavior, and this is the reason why accessing data in this database is several times faster than accessing durable data in other databases. If this database is corrupted, restart SQL Server.
Database resourcedb
The resourcedb is the fifth in our enumeration and consists of definitions for all the system objects of SQL Server, for example, sys.objects. This database is hidden and we don't need to care about it that much. It is not configurable, and we don't use regular backup strategies for it. It is always placed in the installation path of SQL Server (in the binn directory) and it is backed up within the filesystem backup. In case of an accident, it is recovered as a part of the filesystem as well.
Filegroup
A filegroup is an organizational metadata object containing one or more data files. A filegroup does not have its own representation in the filesystem; it's just a group of files. When any database is created, a filegroup called primary is always created. This primary filegroup always contains the primary data file. Filegroups can be divided into the following:
Row storage filegroups: These filegroups can contain data files (mdf or ndf).
Filestream filegroups: This kind of filegroup contains not files but folders, used to store binary data.
In-memory filegroup: Only one instance of this kind of filegroup can be created in a database. Internally, it is a special case of a filestream filegroup and it's used by SQL Server to persist data from in-memory tables.
Every filegroup has three simple properties:
Name: This is a descriptive name of the filegroup. The name must fulfill the naming convention criteria.
Default: In a set of filegroups of the same type, one of these filegroups has this option set to on. This means that when a new table or index is created without explicitly specifying which filegroup it should store data in, the default filegroup is used. By default, the primary filegroup is the default one.
Read-only: Every filegroup, except the primary filegroup, can be set to read-only. Let's say that a filegroup is created for last year's history. When data is moved from the current period to tables created in this historical filegroup, the filegroup can be set as read-only, and then the filegroup does not need to be backed up again and again. It is a very good approach to divide the database into smaller parts, that is, filegroups with more files.
It helps in distributing data across more physical storage and also makes the database more manageable; backups can be done part by part in shorter times, which fit better into a service window.
Data files
Every database must have at least one data file, called the primary data file. This file is always bound to the primary filegroup. It holds all the metadata of the database, such as structure descriptions (which can be seen through views such as sys.objects, sys.columns, and others), users, and so on. If the database does not have other data files (in the same or other filegroups), all user data is also stored in this file, but this approach is good enough only for smaller databases. Considering how the volume of data in a database grows over time, it is a good practice to add more data files. These files are called secondary data files. Secondary data files are optional and contain user data only.
Both types of data files have the same internal structure. Every file is divided into small 8 KB parts called data pages. SQL Server maintains several types of data pages, such as data pages, index pages, index allocation map (IAM) pages used to locate the data pages of tables or indexes, global allocation map (GAM) and shared global allocation map (SGAM) pages used to address objects in the database, and so on. Regardless of the type of a certain data page, SQL Server uses the data page as the smallest unit of I/O operations between hard disk and memory. Let's describe some common properties:
A data page never contains data of several objects
Data pages don't know about each other (and that's why SQL Server uses IAMs to allocate all pages of an object)
Data pages don't have any special physical ordering
A data row must always fit within a data page
These properties might seem unimportant, but when we know them, we can better optimize and manage our databases. Did you know that a data page is the smallest storage unit that can be restored from backup?
As a data page is quite a small storage unit, SQL Server groups data pages into bigger logical units called extents. An extent is a logical allocation unit containing eight contiguous data pages. When SQL Server requests data from disk, extents are read into memory. This is the reason why 64 KB NTFS clusters are recommended when formatting disk volumes for data files. Extents can be uniform or mixed. A uniform extent contains data pages belonging to one object only; a mixed extent, on the other hand, contains data pages of several objects.
Transaction log
When SQL Server processes any transaction, it works in a way called two-phase commit. When a client starts a transaction by sending a single DML request or by calling the BEGIN TRAN command, SQL Server reads the required data pages from disk into an area of memory called the buffer cache and makes the requested changes to these data pages in memory. When the DML request is fulfilled or the COMMIT command comes from the client, the first phase of the commit is finished, but the data pages in memory differ from their original versions in the data file on disk. A data page in memory in this state is called dirty.
When a transaction runs, the transaction log file is used by SQL Server to keep a very detailed chronological description of every single action done during the transaction. This mechanism is called write-ahead logging, WAL for short, and is one of the oldest processes known in SQL Server.
The second phase of the commit usually does not depend on the client's request and is an internal process called checkpoint. Checkpoint is a periodical action that:
searches for dirty pages in the buffer cache,
saves the dirty pages to their original data file location,
marks these data pages as clean (or drops them out of memory to free memory space),
marks the transaction as checkpointed or inactive in the transaction log.
Write-ahead logging is needed by SQL Server during the recovery process. The recovery process is started on every database every time the SQL Server service starts. When the SQL Server service stops, some pages may remain in a dirty state and their contents are lost from memory. This can lead to two possible situations:
The transaction is completely described in the transaction log, the new content of the data page is lost from memory, and the data pages are not changed in the data file
The transaction was not completed at the moment SQL Server stopped, so the transaction cannot be completely described in the transaction log either, the data pages in memory were not in a stable state (because the transaction was not finished and SQL Server cannot know whether a COMMIT or ROLLBACK will occur), and the original version of the data pages in the data files is intact
SQL Server decides between these two situations when it starts. If a transaction is complete in the transaction log but was not marked as checkpointed, SQL Server executes the transaction again, with both phases of the commit. If the transaction was not complete in the transaction log when SQL Server stopped, SQL Server will never know what the user's intention with the transaction was, and the incomplete transaction is erased from the transaction log as if it had never started.
The recovery process described above ensures that every database is in the last known consistent state after SQL Server's startup. It's crucial for DBAs to understand write-ahead logging when planning a backup strategy, because when restoring the database, the administrator has to recognize whether it's time to run the recovery process or not.
To summarize, we introduced how SQL Server handles data internally, as this is important not only when performing backups and restores but also when optimizing a database. If you are interested in knowing more about how to back up, recover, and secure SQL Server, do check out the book SQL Server 2017 Administrator's Guide.

Writing test functions in Golang [Tutorial]

Natasha Mathur
10 Jul 2018
9 min read
Go is a modern programming language built for 21st-century application development. Hardware and technology have advanced significantly over the past decade, and most other languages do not take advantage of these advancements. Go allows us to build network applications that take advantage of the concurrency and parallelism made available by multicore systems. Testing is an important part of programming, whether it is in Go or in any other language. Go has a straightforward approach to writing tests, and in this tutorial, we will look at some important tools to help with testing. This tutorial is an excerpt from the book 'Distributed Computing with Go', written by V.N. Nikhil Anurag.
There are certain rules and conventions we need to follow to test our code. They are as follows:
Source files and associated test files are placed in the same package/folder
The name of the test file for any given source file is <source-file-name>_test.go
Test functions need to have the "Test" prefix, and the next character in the function name should be capitalized
In the remainder of this tutorial, we will look at three files and their associated tests:
variadic.go and variadic_test.go
addInt.go and addInt_test.go
nil_test.go (there isn't any source file for these tests)
Along the way, we will introduce any concepts we might use.
variadic.go function
In order to understand the first set of tests, we need to understand what a variadic function is and how Go handles it. Let's start with the definition: a variadic function is a function that can accept any number of arguments during a function call. Given that Go is a statically typed language, the only limitation imposed by the type system on a variadic function is that the indefinite number of arguments passed to it should be of the same data type. However, this does not limit us from passing other variable types. The arguments are received by the function as a slice of elements if arguments are passed, else nil when none are passed. Let's look at the code to get a better idea:

// variadic.go
package main

func simpleVariadicToSlice(numbers ...int) []int {
    return numbers
}

func mixedVariadicToSlice(name string, numbers ...int) (string, []int) {
    return name, numbers
}

// Does not work.
// func badVariadic(name ...string, numbers ...int) {}

We use the ... prefix before the data type to define a function as a variadic function. Note that we can have only one variadic parameter per function and it has to be the last parameter. We can see this error if we uncomment the line for badVariadic and try to test the code.
variadic_test.go
We would like to test the two valid functions, simpleVariadicToSlice and mixedVariadicToSlice, for the various rules defined above.
However, for the sake of brevity, we will test these:
simpleVariadicToSlice: This is for no arguments, three arguments, and also to look at how to pass a slice to a variadic function
mixedVariadicToSlice: This is to accept a simple argument and a variadic argument
Let's now look at the code to test these two functions:

// variadic_test.go
package main

import "testing"

func TestSimpleVariadicToSlice(t *testing.T) {
    // Test for no arguments
    if val := simpleVariadicToSlice(); val != nil {
        t.Error("value should be nil", nil)
    } else {
        t.Log("simpleVariadicToSlice() -> nil")
    }

    // Test for random set of values
    vals := simpleVariadicToSlice(1, 2, 3)
    expected := []int{1, 2, 3}
    isErr := false
    for i := 0; i < 3; i++ {
        if vals[i] != expected[i] {
            isErr = true
            break
        }
    }
    if isErr {
        t.Error("value should be []int{1, 2, 3}", vals)
    } else {
        t.Log("simpleVariadicToSlice(1, 2, 3) -> []int{1, 2, 3}")
    }

    // Test for a slice
    vals = simpleVariadicToSlice(expected...)
    isErr = false
    for i := 0; i < 3; i++ {
        if vals[i] != expected[i] {
            isErr = true
            break
        }
    }
    if isErr {
        t.Error("value should be []int{1, 2, 3}", vals)
    } else {
        t.Log("simpleVariadicToSlice([]int{1, 2, 3}...) -> []int{1, 2, 3}")
    }
}

func TestMixedVariadicToSlice(t *testing.T) {
    // Test for simple argument & no variadic arguments
    name, numbers := mixedVariadicToSlice("Bob")
    if name == "Bob" && numbers == nil {
        t.Log("Received as expected: Bob, <nil slice>")
    } else {
        t.Errorf("Received unexpected values: %s, %s", name, numbers)
    }
}

Running tests in variadic_test.go
Let's run these tests and see the output. We'll use the -v flag while running the tests to see the output of each individual test:

$ go test -v ./{variadic_test.go,variadic.go}
=== RUN TestSimpleVariadicToSlice
--- PASS: TestSimpleVariadicToSlice (0.00s)
    variadic_test.go:10: simpleVariadicToSlice() -> nil
    variadic_test.go:26: simpleVariadicToSlice(1, 2, 3) -> []int{1, 2, 3}
    variadic_test.go:41: simpleVariadicToSlice([]int{1, 2, 3}...) -> []int{1, 2, 3}
=== RUN TestMixedVariadicToSlice
--- PASS: TestMixedVariadicToSlice (0.00s)
    variadic_test.go:49: Received as expected: Bob, <nil slice>
PASS
ok command-line-arguments 0.001s

addInt.go
The tests in variadic_test.go elaborated on the rules for the variadic function. However, you might have noticed that TestSimpleVariadicToSlice ran three tests in its function body, but go test treats it as a single test. Go provides a good way to run multiple tests within a single function, and we shall look at it in addInt_test.go. For this example, we will use a very simple function, as shown in this code:

// addInt.go
package main

func addInt(numbers ...int) int {
    sum := 0
    for _, num := range numbers {
        sum += num
    }
    return sum
}

addInt_test.go
You might have also noticed in TestSimpleVariadicToSlice that we duplicated a lot of logic, while the only varying factor was the input and expected values. One style of testing, known as Table-driven development, defines a table of all the required data to run a test, iterates over the "rows" of the table and runs tests against them. Let's look at the tests we will be running against no arguments and variadic arguments:

// addInt_test.go
package main

import (
    "testing"
)

func TestAddInt(t *testing.T) {
    testCases := []struct {
        Name     string
        Values   []int
        Expected int
    }{
        {"addInt() -> 0", []int{}, 0},
        {"addInt([]int{10, 20, 100}) -> 130", []int{10, 20, 100}, 130},
    }

    for _, tc := range testCases {
        t.Run(tc.Name, func(t *testing.T) {
            sum := addInt(tc.Values...)
            if sum != tc.Expected {
                t.Errorf("%d != %d", sum, tc.Expected)
            } else {
                t.Logf("%d == %d", sum, tc.Expected)
            }
        })
    }
}

Running tests in addInt_test.go
Let's now run the tests in this file. We expect each row in the testCases table to be treated as a separate test:

$ go test -v ./{addInt.go,addInt_test.go}
=== RUN TestAddInt
=== RUN TestAddInt/addInt()_->_0
=== RUN TestAddInt/addInt([]int{10,_20,_100})_->_130
--- PASS: TestAddInt (0.00s)
    --- PASS: TestAddInt/addInt()_->_0 (0.00s)
        addInt_test.go:23: 0 == 0
    --- PASS: TestAddInt/addInt([]int{10,_20,_100})_->_130 (0.00s)
        addInt_test.go:23: 130 == 130
PASS
ok command-line-arguments 0.001s

nil_test.go
We can also create tests that are not specific to any particular source file; the only criterion is that the filename needs to have the <text>_test.go form. The tests in nil_test.go illustrate some useful features of the language which the developer might find useful while writing tests. They are as follows:
httptest.NewServer: Imagine the case where we have to test our code against a server that sends back some data. Starting and coordinating a full blown server to access some data is hard. The httptest.NewServer function solves this issue for us.
t.Helper: If we use the same logic to pass or fail a lot of testCases, it would make sense to segregate this logic into a separate function. However, this would skew the test run call stack. We can see this by commenting t.Helper() in the tests and rerunning go test.
We can also format our command-line output to print pretty results. We will show a simple example of adding a tick mark for passed cases and a cross mark for failed cases.
In the test, we will run a test server, make GET requests on it, and then test the expected output versus the actual output:

// nil_test.go
package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "net/http/httptest"
    "testing"
)

const passMark = "\u2713"
const failMark = "\u2717"

func assertResponseEqual(t *testing.T, expected string, actual string) {
    t.Helper() // comment this line to see tests fail due to 'if expected != actual'
    if expected != actual {
        t.Errorf("%s != %s %s", expected, actual, failMark)
    } else {
        t.Logf("%s == %s %s", expected, actual, passMark)
    }
}

func TestServer(t *testing.T) {
    testServer := httptest.NewServer(
        http.HandlerFunc(
            func(w http.ResponseWriter, r *http.Request) {
                path := r.RequestURI
                if path == "/1" {
                    w.Write([]byte("Got 1."))
                } else {
                    w.Write([]byte("Got None."))
                }
            }))
    defer testServer.Close()

    for _, testCase := range []struct {
        Name     string
        Path     string
        Expected string
    }{
        {"Request correct URL", "/1", "Got 1."},
        {"Request incorrect URL", "/12345", "Got None."},
    } {
        t.Run(testCase.Name, func(t *testing.T) {
            res, err := http.Get(testServer.URL + testCase.Path)
            if err != nil {
                t.Fatal(err)
            }
            actual, err := ioutil.ReadAll(res.Body)
            res.Body.Close()
            if err != nil {
                t.Fatal(err)
            }
            assertResponseEqual(t, testCase.Expected, fmt.Sprintf("%s", actual))
        })
    }
    t.Run("Fail for no reason", func(t *testing.T) {
        assertResponseEqual(t, "+", "-")
    })
}

Running tests in nil_test.go
We run three tests, where two test cases will pass and one will fail. This way we can see the tick mark and cross mark in action:

$ go test -v ./nil_test.go
=== RUN TestServer
=== RUN TestServer/Request_correct_URL
=== RUN TestServer/Request_incorrect_URL
=== RUN TestServer/Fail_for_no_reason
--- FAIL: TestServer (0.00s)
    --- PASS: TestServer/Request_correct_URL (0.00s)
        nil_test.go:55: Got 1. == Got 1.
    --- PASS: TestServer/Request_incorrect_URL (0.00s)
        nil_test.go:55: Got None. == Got None.
    --- FAIL: TestServer/Fail_for_no_reason (0.00s)
        nil_test.go:59: + != -
FAIL
exit status 1
FAIL command-line-arguments 0.003s

We looked at how to write test functions in Go, and learned a few interesting concepts when dealing with a variadic function and other useful test functions. If you found this post useful, do check out the book 'Distributed Computing with Go' to learn more about testing, Goroutines, RESTful web services, and other concepts in Go.
Why is Go the go-to language for cloud-native development? – An interview with Mina Andrawos
Systems programming with Go in UNIX and Linux
How to build a basic server-side chatbot using Go

Getting started with big data analysis using Google's PageRank algorithm

Sugandha Lahoti
14 Dec 2017
8 min read
[box type="note" align="" class="" width=""]The article given below is a book excerpt from Java Data Analysis written by John R. Hubbard. Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages to perform your data analysis tasks. This book will help you learn the tools and techniques in Java to conduct data analysis without any hassle. [/box] This post aims to help you learn how to analyse big data using Google’s PageRank algorithm. The term big data generally refers to algorithms for the storage, retrieval, and analysis of massive datasets that are too large to be managed by a single file server. Commercially, these algorithms were pioneered by Google, Google’s PageRank being one of them is considered in this article. Google PageRank algorithm Within a few years of the birth of the web in 1990, there were over a dozen search engines that users could use to search for information. Shortly after it was introduced in 1995, AltaVista became the most popular among them. These search engines would categorize web pages according to the topics that the pages themselves specified. But the problem with these early search engines was that unscrupulous web page writers used deceptive techniques to attract traffic to their pages. For example, a local rug-cleaning service might list "pizza" as a topic in their web page header, just to attract people looking to order a pizza for dinner. These and other tricks rendered early search engines nearly useless. To overcome the problem, various page ranking systems were attempted. The objective was to rank a page based upon its popularity among users who really did want to view its contents. One way to estimate that is to count how many other pages have a link to that page. For example, there might be 100,000 links to https://en.wikipedia.org/wiki/Renaissance, but only 100 to https://en.wikipedia.org/wiki/Ernest_Renan, so the former would be given a much higher rank than the latter. But simply counting the links to a page will not work either. For example, the rug-cleaning service could simply create 100 bogus web pages, each containing a link to the page they want users to view. In 1996, Larry Page and Sergey Brin, while students at Stanford University, invented their PageRank algorithm. It simulates the web itself, represented by a very large directed graph, in which each web page is represented by a node in the graph, and each page link is represented by a directed edge in the graph. The directed graph shown in the figure below could represent a very small network with the same properties: This has four nodes, representing four web pages, A, B, C, and D. The arrows connecting them represent page links. So, for example, page A has a link to each of the other three pages, but page B has a link only to A. To analyze this tiny network, we first identify its transition matrix, M : This square has 16 entries, mij, for 1 ≤ i ≤ 4 and 1 ≤ j ≤ 4. If we assume that a web crawler always picks a link at random to move from one page to another, then mij, equals the probability that it will move to node i from node j, (numbering the nodes A, B, C, and D as 1, 2, 3, and 4). So m12 = 1 means that if it's at node B, there's a 100% chance that it will move next to A. Similarly, m13 = m43 = ½ means that if it's at node C, there's a 50% chance of it moving to A and a 50% chance of it moving to D. 
Suppose a web crawler picks one of those four pages at random, and then moves to another page, once a minute, picking each link at random. After several hours, what percentage of the time will it have spent at each of the four pages?
Here is a similar question. Suppose there are 1,000 web crawlers that obey the transition matrix we've just described, and that 250 of them start at each of the four pages. After several hours, how many will be on each of the four pages?
This process is called a Markov chain. It is a mathematical model that has many applications in physics, chemistry, computer science, queueing theory, economics, and even finance. The diagram in the above figure is called the state diagram for the process, and the nodes of the graph are called the states of the process. Once the state diagram is given, the meaning of the nodes (web pages, in this case) becomes irrelevant. Only the structure of the diagram defines the transition matrix M, and from that we can answer the question. A more general Markov chain would also specify transition probabilities between the nodes, instead of assuming that all transition choices are made at random. In that case, those transition probabilities become the non-zero entries of M.
A Markov chain is called irreducible if it is possible to get to any state from any other state. According to the mathematical theory of Markov chains, if the chain is irreducible, then we can compute the answer to the preceding question using the transition matrix. What we want is the steady state solution; that is, a distribution of crawlers that doesn't change. The crawlers themselves will change, but the number at each node will remain the same.
To calculate the steady state solution mathematically, we first have to realize how to apply the transition matrix M. The fact is that if x = (x1, x2, x3, x4) is the distribution of crawlers at one minute, and the next minute the distribution is y = (y1, y2, y3, y4), then y = Mx, using matrix multiplication. So now, if x is the steady state solution for the Markov chain, then Mx = x. This vector equation gives us four scalar equations in four unknowns:
One of these equations is redundant (linearly dependent). But we also know that x1 + x2 + x3 + x4 = 1, since x is a probability vector. So, we're back to four equations in four unknowns. The solution is:
The point of that example is to show that we can compute the steady state solution to a static Markov chain by solving an n × n matrix equation, where n is the number of states. By static here, we mean that the transition probabilities mij do not change. Of course, that does not mean that we can mathematically compute the web. In the first place, n > 30,000,000,000,000 nodes! And in the second place, the web is certainly not static. Nevertheless, this analysis does give some insight about the web, and it clearly influenced the thinking of Larry Page and Sergey Brin when they invented the PageRank algorithm.
The purpose of the PageRank algorithm is to rank the web pages according to some criteria that resemble their importance, or at least their frequency of access. The original simple (pre-PageRank) idea was to count the number of links to each page and use something proportional to that count for the rank. Following that line of thought, we can imagine that, if x = (x1, x2, ..., xn)T is the page rank for the web (that is, if xj is the relative rank of page j and ∑xj = 1), then Mx = x, at least approximately.
Another way to put that is that repeated applications of M to x should nudge x closer and closer to that (unattainable) steady state. That brings us (finally) to the PageRank formula, in which ε is a very small positive constant, z is the vector of all 1s, and n is the number of nodes. The vector expression on the right-hand side of the formula defines the transformation function f, which replaces a page rank estimate x with an improved page rank estimate. Repeated applications of this function gradually converge to the unknown steady state.

Note that in the formula, f is a function of more than just x. There are really four inputs: x, M, ε, and n. The estimate x is updated with each iteration, while M is the transition matrix, n is the number of nodes, and ε is a coefficient that determines how much influence the z/n vector has. For example, we might set ε to 0.00005, giving the z/n vector only a very small influence on each update. A short sketch of this update step is given below.

This is how Google's PageRank algorithm can be utilized for the analysis of very large datasets. To learn how to implement this algorithm and various other machine learning algorithms for big data, data visualization, and more using Java, check out the book Java Data Analysis.
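The formula itself appears in the book as a figure and is not reproduced here, so the following Java sketch assumes the standard damped form f(x) = (1 − ε)Mx + ε·z/n, which matches the description of the four inputs x, M, ε, and n; the class and method names are mine, and the matrix is the hypothetical one sketched earlier.

public class PageRankStep {

    // One application of f: y = (1 - eps) * M x + eps * z / n, with z the all-ones vector.
    // ASSUMPTION: this is the standard damped PageRank update, used here because the
    // book's figure with the exact formula is not reproduced in this excerpt.
    static double[] improve(double[][] m, double[] x, double eps) {
        int n = x.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double mx = 0;
            for (int j = 0; j < n; j++) {
                mx += m[i][j] * x[j];
            }
            y[i] = (1 - eps) * mx + eps / n;  // eps * z_i / n, where z_i = 1
        }
        return y;
    }

    public static void main(String[] args) {
        double eps = 0.00005;                    // the small constant mentioned in the text
        double[] x = {0.25, 0.25, 0.25, 0.25};   // start from a uniform page rank estimate
        for (int step = 0; step < 100; step++) {
            x = improve(TinyWebGraph.M, x, eps); // reuse the matrix sketched earlier
        }
        System.out.println(java.util.Arrays.toString(x));
    }
}

With ε this small, the z/n term barely perturbs Mx, so the iteration behaves almost exactly like the plain steady-state computation shown earlier.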

article-image-getting-started-kinect
Packt
30 Aug 2013
8 min read
Save for later

Getting Started with Kinect

(For more resources related to this topic, see here.)

Before the birth of Microsoft Kinect, few people were familiar with the technology of motion sensing. Similar devices were originally invented and developed for monitoring aerial and undersea aggressors during wars. In non-military settings, motion sensors are widely used in alarm systems, lighting systems, and so on; they detect when someone or something disrupts the waves travelling through a room and trigger predefined events. Although radar sensors and modern infrared motion sensors are quite common in our lives, we seldom notice their existence, and we can hardly make use of these devices in our own applications.

But Kinect changed everything from the time it was launched in North America at the end of 2010. Unlike most other user input controllers, Kinect enables users to interact with programs without touching a mouse or a pad, using only gestures. In a top-level view, a Kinect sensor is made up of an RGB camera, a depth sensor, an IR emitter, and a microphone array, which consists of several microphones for sound and voice recognition. A standard Kinect (for Windows) device is shown as follows:

The Kinect device

The Kinect drivers and software, which are either from Microsoft or from third-party companies, can even track and analyze advanced gestures and skeletons of multiple players. All these features make it possible to design brilliant and exciting applications with hands-free user inputs. Kinect has already brought a lot of games and software to an entirely new level. It is believed to be a bridge between the physical world we live in and the virtual reality we create, a completely new way of interacting with art, and a profitable business opportunity for individuals and companies.

In this article, we will try to make an interesting game that uses the popular Kinect technology for user input. As Kinect captures the camera and depth images as video streams, we can also merge this view of our real-world environment with virtual elements, which is called Augmented Reality (AR). This enables users to feel as if they are living in a nonexistent world, or as if something unbelievable exists in the physical world. In this article, we will first introduce the installation of Kinect hardware and software on personal computers, and then consider an idea that combines Kinect and augmented reality elements.

Before installing the Kinect device on your PC, you obviously need to buy the Kinect equipment first. In this article, we will depend on Kinect for Windows or Kinect for Xbox 360, which can be learned about and bought at:

http://www.microsoft.com/en-us/kinectforwindows/
http://www.xbox.com/en-US/kinect

Please note that you don't need to buy an Xbox 360 at all. Kinect will be connected to a PC so that we can write custom programs for it. An alternative choice is Kinect for Windows, which is located at:

http://www.microsoft.com/en-us/kinectforwindows/purchase/

For our purposes, both can be used and developed against in the same way.

Installation of Kinect

It is strongly suggested that you have a Windows 7 operating system or higher, either 32-bit or 64-bit, with a dual-core or faster processor. Linux developers can also benefit from third-party drivers and SDKs to manipulate Kinect components.
Before we start to discuss the software installation, you can download both the Microsoft Kinect SDK and the Developer Toolkit from:

http://www.microsoft.com/en-us/kinectforwindows/develop/developerdownloads.aspx

In this article, we prefer to develop Kinect-based applications using Kinect SDK Version 1.5 (or higher) and the C++ language. Later versions should be backward compatible, so the source code provided in this article shouldn't need to be changed.

Setting up your Kinect software on PCs

After we have downloaded the SDK and the Developer Toolkit, it's time for us to install them on the PC and ensure that they can work with the Kinect hardware. Let's perform the following steps:

1. Run the setup executable with administrator permissions.
2. Select I agree to the license terms and conditions after reading the License Agreement.

The Kinect SDK setup dialog

3. Follow the steps until the SDK installation has finished. Then, install the toolkit following similar instructions.
4. The hardware installation is easy: plug the power end of the cable into a power point and the USB end into your PC. Wait for the drivers to be found automatically.
5. Now, start the Developer Toolkit Browser, choose Samples: C++ from the tabs, and find and run the sample named Skeletal Viewer. You should see a new window demonstrating the depth/skeleton/color images of the current physical scene, similar to the following image:

The depth (left), skeleton (middle), and color (right) images read from Kinect

Why did I do that?

We chose to set up the SDK software first so that it installs the motor and camera drivers, the APIs, and the documentation, as well as the toolkit including resources and samples, onto the PC. If the steps are reversed, that is, the hardware is connected before installing the SDK, your Windows OS may not be able to recognize the device. Just start the SDK setup at this point and the device should be identified again during the installation process.

Before actually using Kinect, you still have to ensure there is nothing between the device and you (the player). It's best to keep the play space at least 1.8 m wide and about 1.8 m to 3.6 m long, measured from the sensor. If you have more than one Kinect device, don't keep them face-to-face, as there may be infrared interference between them.

If you have multiple Kinects to install on the same PC, please note that one USB root hub can have one and only one Kinect connected. The problem happens because Kinect takes over 50 percent of the USB bandwidth, and it needs an individual USB controller to run. So plugging more than one device into the same USB hub means only one of them will work.

The depth image at the left in the preceding image shows a human (in fact, the author) standing in front of the camera. Some parts may be totally black if they are too near (often less than 80 cm) or too far (often more than 4 m). If you are using Kinect for Windows, you can turn on Near Mode to show objects that are near the camera; however, Kinect for Xbox 360 doesn't have such a feature.

You can read more about the software and hardware setup at:

http://www.microsoft.com/en-us/kinectforwindows/purchase/sensor_setup.aspx

The idea of the AR-based Fruit Ninja game

Now it's time for us to define the goal we are going to achieve in this article.
As a quick but practical guide to Kinect and augmented reality, we should be able to make use of the depth detection, video streaming, and motion tracking functionalities in our project. 3D graphics APIs are also important here, because virtual elements should be included and interacted with through irregular user inputs (not common mouse or keyboard inputs). A fine example is the Fruit Ninja game, which is already very popular all over the world. Especially on mobile devices like smartphones and pads, you can see people destroying different kinds of fruit by touching and swiping their fingers on the screen.

With the help of Kinect, our arms can act as blades to cut off flying fruits, and our images can also be shown along with the virtual environment, so that we can judge the posture of our bodies and the position of our arms through the screen display.

Unfortunately, this idea is not all that fresh any more. There are already commercial products with similar purposes available in the market; for example:

http://marketplace.xbox.com/en-US/Product/Fruit-Ninja-Kinect/66acd000-77fe-1000-9115-d80258410b79

But please note that we are not going to design a completely different product here, or even bring it to market after finishing this article. We will only learn how to develop Kinect-based applications, work in our own way from the very beginning, and benefit from the experience in our professional work or as amateurs. So it is okay to reinvent the wheel this time, and to have fun with both the process and the results.

Summary

Kinect, which is a portmanteau of the words "kinetic" and "connect", is a motion sensor developed and released by Microsoft. It provides a natural user interface (NUI) for tracking and manipulating hands-free user inputs such as gestures and skeleton motions. It can be considered one of the most successful consumer electronics devices of recent years, and we will be using this novel device to build the Fruit Ninja game in this article. We will focus on developing Kinect and AR-based applications on Windows 7 or higher using the Microsoft Kinect SDK 1.5 (or higher) and the C++ programming language. In this article, we have mainly introduced how to install the Kinect for Windows SDK.

Resources for Article:

Further resources on this subject:

So, what is KineticJS? [Article]
Mission Running in EVE Online [Article]
Making Money with Your Game [Article]
article-image-getting-started-with-fortigate-troubleshooting
Packt
20 Nov 2013
6 min read
Save for later

Getting Started with FortiGate: Troubleshooting

Base system diagnostics

The status screen in the web-based manager includes a high-level overview of information such as the system time (which is important, for example, to have coherent error messages and log recording), CPU and memory usage, license information, and alerts, as we can see in the following screenshot:

Although this screen is useful for a rapid assessment of the situation, our diagnostic tools usually have to dig deeper. The first base command we will use in the CLI is get system. This command opens up more than eighty information options, dedicated to the different features of the FortiGate units. Among others, we are able to check counters related to performance, such as:

Startup configuration errors, with the get system startup-error-log command.
Firewall traffic statistics, with the get system performance firewall statistics command.
Firewall packet distribution statistics, with the get system performance firewall packet-distribution command.
Information about the most CPU-intensive processes, with the get system performance top command, which shows a screen divided into columns, as we can see in the following screenshot:

Another fundamental command we will use is diagnose hardware, which is used for problem-solving procedures related to certificates, devices, PCI, and system information. The devices menu is opened with the diagnose hardware deviceinfo command, and includes a disk option to recover information about internal disks (if present) and a nic option to display data from network interfaces. The latter also shows the errors and drops related to network packets on screen, as we can see in the following screenshot:

To access real-time information, we will use the diagnose debug command. The diagnose debug report command is not a troubleshooting tool, but is used to create a report for Fortinet technical support. We will talk about additional options for the diagnose debug command later, in relation to TCP/IP debugging.

Troubleshooting routing

The tools that we will see in the following paragraphs are required to troubleshoot the addressing and routing features of the TCP/IP protocol. Before we proceed to explain the individual tools and commands for troubleshooting, here is a real-world suggestion: in order to perform the troubleshooting steps in a more comfortable way, it is often advisable to use an SSH and Telnet client such as PuTTY (http://bit.ly/1kyS98) to launch two separate sessions on a FortiGate unit. One of the two consoles will be dedicated to watching the results of the debug commands. The second will be used to launch commands, such as ping and traceroute, that trigger actions visible in the first console.

In the following screenshot, we have a diagnose sniffer packet port1 icmp command running in the session opened on the left-hand side and an execute ping command in the session opened on the right-hand side:

Layer 2 and layer 3 TCP/IP diagnostics

Some issues can be solved only by correcting the ARP table that associates IP and MAC addresses. The diagnose ip arp list command shows the ARP cache, as shown in the following screenshot:

The following commands are used to manage the ARP cache:

The execute clear system arp table command, to remove the ARP cache.
The diagnose ip arp delete <interface name> <IP address> command, to remove a single ARP entry.
The diagnose ip arp flush <interface name> command, to remove all entries associated with a single interface.
The config system arp-table command, to add a static ARP entry. This command requires two further commands:

The config system arp-table command
The edit command, to create a new entry or to modify an existing one

The three mandatory parameters are:

set mac, to configure a MAC address for the entry
set ip, to configure an IP address for the entry
set interface, to select the interface that is connected to the MAC and IP

In the following screenshot, we can see all the required steps to add entry number 3 to our ARP cache with the following parameters: ip 192.168.12.1, with a mac of F0:DE:F1:E4:75:B9, on the internal interface:

We can now take care of layer 3, especially from the point of view of routing. As on any device that manages networking, the most used command (based on the ICMP protocol) is the ping command. A FortiGate unit supports two kinds of ping commands: execute ping <IP address>, and execute ping-options, a command dedicated to modifying the behavior of the ping command, which includes parameters such as:

data-size: To select the datagram size in bytes (between 0 and 65507)
interval: To set a value in seconds between two pings
repeat-count: To select the number of pings to send
source: To specify a source interface (the default value is auto-select)
view-settings: Used to show the current ping options
timeout: To specify the timeout in seconds

In the following screenshot, we have modified some ping parameters and verified them with the view-settings parameter:

Another fundamental command based on ICMP is execute traceroute <dest>, which allows us to see all the hops (networking devices) that a network packet traverses from the FortiGate to a destination (which can be an IP address or an FQDN). Having the full path shown can be important to detect a wrong or faulty hop along the path. The usefulness of traceroute depends on how many devices along the route allow the ICMP protocol, but even when it is used only to troubleshoot our internal corporate network, the results of this simple command are extremely useful.

To show the result of a traceroute and have fun along the way, we can use the so-called "Star Wars Traceroute": execute traceroute 216.81.59.173, which will show the opening crawl of Star Wars Episode IV (a result obtained by making clever use of hostnames and routing). We can see a (small) part of the result in the following screenshot:

The next logical step to debug problems at layer 3 of TCP/IP is to verify the routing table, something that we are able to do with the get router info routing-table all command. The resulting output can be very lengthy, so we can filter it using parameters including:

details: Show routing table detail information
rip: Show the RIP routing table
ospf: Show the OSPF routing table
isis: Show the ISIS routing table
static: Show the static routing table
connected: Show the connected routing table
database: Show the routing information base

The routing table shows the routing entries and their origin (the routing protocol that added an entry to the routing table).

Summary

In this article, the authors have explained base system diagnostics, the troubleshooting of routing, and layer 2 and layer 3 TCP/IP diagnostics.

Useful Links:

vCloud Networks
Network Virtualization and vSphere
Supporting hypervisors by OpenNebula

article-image-understanding-patterns-and-architecturesin-typescript
Packt
01 Jun 2016
19 min read
Save for later

Understanding Patterns and Architectures in TypeScript

In this article by Vilic Vane, author of the book TypeScript Design Patterns, we'll study architecture and patterns that are closely related to the language or its common applications. Many topics in this article are related to asynchronous programming. We'll start from a web architecture for Node.js that's based on Promise. This is a larger topic with interesting ideas involved, including abstractions of response and permission, as well as error handling tips. Then, we'll talk about how to organize modules with ES module syntax. Due to the limited length of this article, some of the related code is aggressively simplified, and nothing more than the idea itself can be applied practically.

(For more resources related to this topic, see here.)

Promise-based web architecture

The most exciting thing about Promise may be the benefits it brings to error handling. In a Promise-based architecture, throwing an error can be safe and pleasant. You don't have to explicitly handle errors when chaining asynchronous operations, and this makes it harder for mistakes to occur.

With the growing usage of ES2015-compatible runtimes, Promise is already there out of the box. We actually have plenty of polyfills for Promises (including my ThenFail, written in TypeScript), as people who write JavaScript are roughly the same group of people who like to create wheels. Promises work great with other Promises:

A Promises/A+ compatible implementation should work with other Promises/A+ compatible implementations
Promises do their best in a Promise-based architecture

If you are new to Promise, you may complain about trying Promise with a callback-based project. You may intend to use helpers provided by Promise libraries, such as Promise.all, but it turns out that you have better alternatives, such as the async library. So the reason that makes you decide to switch should not be these helpers (as there are a lot of them for callbacks). It should be that there's an easier way to handle errors, or that you want to take advantage of the ES async and await features, which are based on Promise.

Promisifying existing modules or libraries

Though Promises do their best with a Promise-based architecture, it is still possible to begin using Promise with a smaller scope by promisifying existing modules or libraries. Taking Node.js style callbacks as an example, this is how we use them:

import * as FS from 'fs';

FS.readFile('some-file.txt', 'utf-8', (error, text) => {
    if (error) {
        console.error(error);
        return;
    }

    console.log('Content:', text);
});

You may expect a promisified version of readFile to look like the following:

FS
    .readFile('some-file.txt', 'utf-8')
    .then(text => {
        console.log('Content:', text);
    })
    .catch(reason => {
        console.error(reason);
    });

Implementing the promisified version of readFile can be as easy as the following:

function readFile(path: string, options: any): Promise<string> {
    return new Promise((resolve, reject) => {
        FS.readFile(path, options, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

I am using any here for the options parameter to reduce the size of the demo code, but I would suggest that you avoid any whenever possible in practice.

There are libraries that are able to promisify methods automatically. Unfortunately, you may need to write declaration files yourself for the promisified methods if no declaration file of the promisified version is available.
Views and controllers in Express

Many of us may have already been working with frameworks such as Express. This is how we render a view or send back JSON data in Express:

import * as Path from 'path';
import * as express from 'express';

let app = express();

app.set('engine', 'hbs');
app.set('views', Path.join(__dirname, '../views'));

app.get('/page', (req, res) => {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
});

app.get('/data', (req, res) => {
    res.json({
        version: '0.0.0',
        items: []
    });
});

app.listen(1337);

We will usually separate controllers from routing, as follows:

import { Request, Response } from 'express';

export function page(req: Request, res: Response): void {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
}

Thus, we may have a better idea of existing routes, and we may have controllers managed more easily. Furthermore, automated routing can be introduced so that we don't always need to update routing manually:

import * as glob from 'glob';

let controllersDir = Path.join(__dirname, 'controllers');

let controllerPaths = glob.sync('**/*.js', {
    cwd: controllersDir
});

for (let path of controllerPaths) {
    let controller = require(Path.join(controllersDir, path));
    let urlPath = path.replace(/\\/g, '/').replace(/\.js$/, '');

    for (let actionName of Object.keys(controller)) {
        app.get(
            `/${urlPath}/${actionName}`,
            controller[actionName]
        );
    }
}

The preceding implementation is certainly too simple to cover daily usage. However, it displays a rough idea of how automated routing could work: via conventions based on file structures. Now, if we are working with asynchronous code that is written in Promises, an action in the controller could be like the following:

export function foo(req: Request, res: Response): void {
    Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}

We use destructuring of an array within a parameter. Promise.all returns a Promise of an array with elements corresponding to the values of the resolvables that are passed in. (A resolvable means a normal value or a Promise-like object that may resolve to a normal value.)

However, this is not enough; we need to handle errors properly, or in some cases, the preceding code may fail silently (which is terrible). In Express, when an error occurs, you should call next (the third argument that is passed into the callback) with the error object, as follows:

import { Request, Response, NextFunction } from 'express';

export function foo(
    req: Request,
    res: Response,
    next: NextFunction
): void {
    Promise
        // ...
        .catch(reason => next(reason));
}

Now, we are fine with the correctness of this approach, but this is simply not how Promises work. Explicit error handling with callbacks could be eliminated in the scope of controllers, and the easiest way to do this is to return the Promise chain and hand it over to the code that was previously performing the routing logic.
So, the controller could be written like the following:

export function foo(req: Request, res: Response) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}

Or, can we make this even better?

Abstraction of response

We've already been returning a Promise to tell whether an error occurs. So, for a server error, the Promise actually indicates the result, or in other words, the response of the request. However, why are we still calling res.render() to render the view? The returned Promise object could be an abstraction of the response itself. Think about the following controller again:

export class Response { }

export class PageResponse extends Response {
    constructor(view: string, data: any) { }
}

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return new PageResponse('foo', {
                post,
                comments
            });
        });
}

The response object that is returned could vary for different response outputs. For example, it could be a PageResponse as in the preceding example, a JSONResponse, a StreamResponse, or even a simple Redirection. As in most cases a PageResponse or a JSONResponse is applied, and the view of a PageResponse can usually be implied from the controller path and action name, it is useful to have these two responses automatically generated from a plain data object with the proper view to render, as follows:

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return {
                post,
                comments
            };
        });
}

This is how a Promise-based controller should respond. With this idea in mind, let's update the routing code with an abstraction of responses. Previously, we were passing controller actions directly as Express request handlers. Now, we need to do some wrapping up of the actions by resolving the return value and applying operations based on the resolved result, as follows:

If it fulfills and it's an instance of Response, apply it to the res object that is passed in by Express.
If it fulfills and it's a plain object, construct a PageResponse, or a JSONResponse if no view is found, and apply it to the res object.
If it rejects, call the next function with the reason.

As seen previously, our code was like the following:

app.get(`/${urlPath}/${actionName}`, controller[actionName]);

Now, it takes a few more lines, as follows:

let action = controller[actionName];

app.get(`/${urlPath}/${actionName}`, (req, res, next) => {
    Promise
        .resolve(action(req))
        .then(result => {
            if (result instanceof Response) {
                result.applyTo(res);
            } else if (existsView(actionName)) {
                new PageResponse(actionName, result).applyTo(res);
            } else {
                new JSONResponse(result).applyTo(res);
            }
        })
        .catch(reason => next(reason));
});

However, so far we can only handle GET requests, as we hardcoded app.get() in our router implementation. The poor view matching logic can hardly be used in practice either.
We need to make these actions configurable, and ES decorators could perform a good job here:

export default class Controller {
    @get({
        View: 'custom-view-path'
    })
    foo(req: Request) {
        return {
            title: 'Action foo',
            content: 'Content of action foo'
        };
    }
}

I'll leave the implementation to you, and feel free to make them awesome.

Abstraction of permission

Permission plays an important role in a project, especially in systems that have different user groups, for example, a forum. The abstraction of permission should be extendable to satisfy changing requirements, and it should be easy to use as well. Here, we are going to talk about the abstraction of permission at the level of controller actions.

Consider the eligibility to perform one or more actions a privilege. The permission of a user may consist of several privileges, and usually most of the users at the same level would have the same set of privileges. So, we may have a larger concept, namely groups. The abstraction could either work based on both groups and privileges, or work based on only privileges (groups are then just aliases for sets of privileges):

Abstraction that validates based on privileges and groups at the same time is easier to build. You do not need to create a large list of which actions can be performed for a certain group of users, as granular privileges are only required when necessary.
Abstraction that validates based on privileges only has better control and more flexibility to describe the permission. For example, you can easily remove a small set of privileges from the permission of a user.

However, both approaches have similar upper-level abstractions; they differ mostly in implementation. The general structure of the permission abstractions that we've talked about is shown in the following diagram. The participants include the following:

Privilege: This describes a detailed privilege corresponding to specific actions
Group: This defines a set of privileges
Permission: This describes what a user is capable of doing; it consists of the groups that the user belongs to and the privileges that the user has
Permission descriptor: This describes how the permission of a user works and consists of the possible groups and privileges

Expected errors

A great concern that is wiped away by using Promises is that we do not need to worry about whether throwing an error in a callback will crash the application most of the time. The error will flow through the Promise chain and, if not caught, will be handled by our router. Errors can be roughly divided into expected errors and unexpected errors. Expected errors are usually caused by incorrect input or foreseeable exceptions, while unexpected errors are usually caused by bugs or by other libraries that the project relies on.

For expected errors, we usually want to give users a friendly response with readable error messages and codes, so that users can help themselves by searching for the error, or report it to us with useful context. For unexpected errors, we also want a reasonable response (usually a message describing an unknown error), a detailed server-side log (including the real error name, message, stack information, and so on), and even alerts to let the team know as soon as possible.
Defining and throwing expected errors

The router will need to handle different types of errors, and an easy way to achieve this is to subclass a universal ExpectedError class and throw its instances out, as follows:

import ExtendableError from 'extendable-error';

class ExpectedError extends ExtendableError {
    constructor(
        message: string,
        public code: number
    ) {
        super(message);
    }
}

The extendable-error package is one of mine; it handles the stack trace and the message property. You can directly extend the Error class as well.

Thus, when receiving an expected error, we can safely output the error name and message as part of the response. If it is not an instance of ExpectedError, we can display a predefined unknown error message.

Transforming errors

Some errors, such as errors caused by unstable networks or remote services, are expected. We may want to catch these errors and throw them out again as expected errors. However, actually doing this everywhere could be rather tedious, so a centralized error transforming process can be applied to reduce the effort required to manage these errors.

The transforming process includes two parts: filtering (or matching) and transforming. These are the approaches to filtering errors:

Filter by error class: Many third-party libraries throw errors of certain classes. Taking Sequelize (a popular Node.js ORM) as an example, it has DatabaseError, ConnectionError, ValidationError, and so on. By checking whether errors are instances of a certain error class, we may easily pick up target errors from the pile.
Filter by string or regular expression: Sometimes a library might throw errors that are instances of the Error class itself instead of its subclasses. This makes these errors hard to distinguish from others. In this situation, we can filter these errors by their message, with keywords or regular expressions.
Filter by scope: It's possible that instances of the same error class with the same error message should result in different responses. One of the reasons may be that the operation throwing a certain error is at a lower level, but it is being used by upper structures within different scopes. Thus, a scope mark can be added to these errors to make them easier to filter.

There could be more ways to filter errors, and they can usually cooperate as well. By properly applying these filters and transforming errors, we can reduce noise, analyze what's going on within a system, and locate problems faster when they occur.

Modularizing project

Before ES2015, there were actually a lot of module solutions for JavaScript that worked. The most famous two of them might be AMD and CommonJS. AMD is designed for asynchronous module loading, which is mostly applied in browsers, while CommonJS performs module loading synchronously, which is the way the Node.js module system works. To make it work asynchronously, writing an AMD module takes more characters. Due to the popularity of tools such as browserify and webpack, CommonJS became popular even for browser projects.

Proper granularity of internal modules can help a project keep a healthy structure. Consider a project structure like the following:

project
├─controllers
├─core
│  │ index.ts
│  │
│  ├─product
│  │   index.ts
│  │   order.ts
│  │   shipping.ts
│  │
│  └─user
│      index.ts
│      account.ts
│      statistics.ts
│
├─helpers
├─models
├─utils
└─views

Let's assume that we are writing a controller file that's going to import a module defined by the core/product/order.ts file.
Previously, using CommonJS style require, we would write the following:

const Order = require('../core/product/order');

Now, with the new ES import syntax, this would be like the following:

import * as Order from '../core/product/order';

Wait, isn't this essentially the same? Sort of. However, you may have noticed several index.ts files that I've put into folders. Now, in the core/product/index.ts file, we could have the following:

import * as Order from './order';
import * as Shipping from './shipping';

export { Order, Shipping }

Or, we could also have the following:

export * from './order';
export * from './shipping';

What's the difference? The idea behind these two approaches to re-exporting modules can vary. The first style works better when we treat Order and Shipping as namespaces, under which the identifier names may not be easy to distinguish from one another. With this style, the files are the natural boundaries for building these namespaces. The second style weakens the namespace property of the two files, and instead uses them as tools to organize objects and classes under the same larger category. A good thing about using these files as namespaces is that multiple-level re-exporting is fine, while weakening namespaces makes it harder to understand different identifier names as the number of re-exporting levels grows.

Summary

In this article, we discussed some interesting ideas and an architecture formed by these ideas. Most of these topics focused on limited examples and did their own jobs. However, we also discussed ideas about putting a whole system together.

Resources for Article:

Further resources on this subject:

Introducing Object Oriented Programming with TypeScript [article]
Writing SOLID JavaScript code with TypeScript [article]
Optimizing JavaScript for iOS Hybrid Apps [article]